
Update & select a MySQL table as a FIFO queue with non-unique identifiers

I have this table documents based on a client’s set of data. The only provided key is documentId, but it’s non-unique (rowId is mine, not the client’s).

This table also shows which documents have been processed and which have not, plus some additional data. isProcessed is a generated column (isProcessed bool as (processed is not null)) for indexing purposes.
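
Simplified, the table definition looks roughly like this (the exact DDL is in the fiddle linked at the bottom; column types here are approximate):

CREATE TABLE documents (
  rowId       INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- my surrogate key; defines FIFO order
  documentId  INT UNSIGNED NOT NULL,                 -- client's key, non-unique
  received    DATETIME NOT NULL,
  processed   DATETIME NULL,
  `user`      VARCHAR(32) NULL,
  isProcessed BOOL AS (processed IS NOT NULL),       -- generated column for indexing
  PRIMARY KEY (rowId),
  KEY idx_unprocessed (documentId, isProcessed)      -- one possible index over the queue
);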

SELECT * FROM documents;

| rowId | documentId | received            | processed           | user  | isProcessed |
| ----- | ---------- | ------------------- | ------------------- | ----- | ----------- |
| 1     | 10020      | 2020-11-27 15:01:00 | 2020-11-27 17:17:17 | user1 | 1           |
| 2     | 10020      | 2020-11-27 15:02:00 | 2020-11-27 17:17:17 | user1 | 1           |
| 3     | 10021      | 2020-11-27 15:03:00 | 2020-11-27 17:17:17 | user1 | 1           |
| 4     | 10022      | 2020-11-27 15:04:00 | 2020-11-27 17:17:17 | user1 | 1           |
| 5     | 10022      | 2020-11-27 15:05:00 | 2020-11-27 17:17:17 | user1 | 1           |
| 6     | 10022      | 2020-11-27 15:06:00 |                     |       | 0           |
| 7     | 10022      | 2020-11-27 15:07:00 |                     |       | 0           |
| 8     | 10022      | 2020-11-27 15:08:00 |                     |       | 0           |
| 9     | 10023      | 2020-11-27 15:09:00 |                     |       | 0           |
| 10    | 10024      | 2020-11-27 15:10:00 |                     |       | 0           |
| 11    | 10024      | 2020-11-27 15:11:00 |                     |       | 0           |
| 12    | 10025      | 2020-11-27 15:12:00 |                     |       | 0           |
| 13    | 10025      | 2020-11-27 15:13:00 |                     |       | 0           |
| 14    | 10025      | 2020-11-27 15:14:00 |                     |       | 0           |
| 15    | 10025      | 2020-11-27 15:15:00 |                     |       | 0           |

Then, when employees process some documents, they pass their documentIds into an app, generating a JSON payload like this:

{
  "user": "user2",
  "documents": [
      10022,
      10022,
      10023,
      10024,
      10025,
      10025,
      10025
  ]
}

Now I need to figure out how to:

a) update documents to set processed and user for the first matching unprocessed rows (in rowId order), but no more than that!

That’s 2 x 10022 (rows 6, 7), 1 x 10023 (row 9), 1 x 10024 (row 10), 3 x 10025 (rows 12, 13, 14).

What I’ve got so far is:

SET @processedTS = NOW();

UPDATE documents SET user = 'user2', processed = @processedTS WHERE documentId = 10022 AND !isProcessed ORDER BY rowId LIMIT 2;
UPDATE documents SET user = 'user2', processed = @processedTS WHERE documentId = 10023 AND !isProcessed ORDER BY rowId LIMIT 1;
UPDATE documents SET user = 'user2', processed = @processedTS WHERE documentId = 10024 AND !isProcessed ORDER BY rowId LIMIT 1;
UPDATE documents SET user = 'user2', processed = @processedTS WHERE documentId = 10025 AND !isProcessed ORDER BY rowId LIMIT 3;

Is there a smarter way to approach this than generating multiple update statements?

b) select the updated documents to get their additional information and make a results file to send to the client

SELECT * FROM documents WHERE processed = @processedTS;

Is there a smarter way to select the updated data?

Expected results

a) table documents after updates

| rowId | documentId | received            | processed           | user  | isProcessed |
| ----- | ---------- | ------------------- | ------------------- | ----- | ----------- |
| 1     | 10020      | 2020-11-27 15:01:00 | 2020-11-27 17:17:17 | user1 | 1           |
| 2     | 10020      | 2020-11-27 15:02:00 | 2020-11-27 17:17:17 | user1 | 1           |
| 3     | 10021      | 2020-11-27 15:03:00 | 2020-11-27 17:17:17 | user1 | 1           |
| 4     | 10022      | 2020-11-27 15:04:00 | 2020-11-27 17:17:17 | user1 | 1           |
| 5     | 10022      | 2020-11-27 15:05:00 | 2020-11-27 17:17:17 | user1 | 1           |
| 6     | 10022      | 2020-11-27 15:06:00 | 2020-11-27 19:34:58 | user2 | 1           |
| 7     | 10022      | 2020-11-27 15:07:00 | 2020-11-27 19:34:58 | user2 | 1           |
| 8     | 10022      | 2020-11-27 15:08:00 |                     |       | 0           |
| 9     | 10023      | 2020-11-27 15:09:00 | 2020-11-27 19:34:58 | user2 | 1           |
| 10    | 10024      | 2020-11-27 15:10:00 | 2020-11-27 19:34:58 | user2 | 1           |
| 11    | 10024      | 2020-11-27 15:11:00 |                     |       | 0           |
| 12    | 10025      | 2020-11-27 15:12:00 | 2020-11-27 19:34:58 | user2 | 1           |
| 13    | 10025      | 2020-11-27 15:13:00 | 2020-11-27 19:34:58 | user2 | 1           |
| 14    | 10025      | 2020-11-27 15:14:00 | 2020-11-27 19:34:58 | user2 | 1           |
| 15    | 10025      | 2020-11-27 15:15:00 |                     |       | 0           |

b) results file

[
    { "documentId": 10022, "received": "2020-11-27 15:06:00", "processed": "2020-11-27 19:34:58", "user": "user2"},
    { "documentId": 10022, "received": "2020-11-27 15:07:00", "processed": "2020-11-27 19:34:58", "user": "user2"},
    { "documentId": 10023, "received": "2020-11-27 15:09:00", "processed": "2020-11-27 19:34:58", "user": "user2"},
    { "documentId": 10024, "received": "2020-11-27 15:10:00", "processed": "2020-11-27 19:34:58", "user": "user2"},
    { "documentId": 10025, "received": "2020-11-27 15:12:00", "processed": "2020-11-27 19:34:58", "user": "user2"},
    { "documentId": 10025, "received": "2020-11-27 15:13:00", "processed": "2020-11-27 19:34:58", "user": "user2"},
    { "documentId": 10025, "received": "2020-11-27 15:14:00", "processed": "2020-11-27 19:34:58", "user": "user2"}
]

(I know how to make that JSON; I’m looking for a smart way to select the updated data.)
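
For what it’s worth, the results array itself can also be built directly in MySQL 8 with JSON_ARRAYAGG / JSON_OBJECT; something like this (just a sketch, and not the part I’m asking about):

SELECT JSON_ARRAYAGG(
         JSON_OBJECT(
           'documentId', documentId,
           'received',   DATE_FORMAT(received,  '%Y-%m-%d %H:%i:%s'),
           'processed',  DATE_FORMAT(processed, '%Y-%m-%d %H:%i:%s'),
           'user',       `user`
         )
       ) AS results
FROM documents
WHERE processed = @processedTS;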

My question is…

In production, documents will have some 300k+ rows, and each save run will affect hundreds of rows.

Is there a smarter way to update in bulk & select updated data?

Ideally, I’m looking for a way to pass [10022, 10022, 10023, 10024, 10025, 10025, 10025] into the update in a way that respects FIFO order and matches each documentId only as many times as it is listed, no more. Running hundreds of update statements does not seem ideal.

I’m using MySQL 8.0. The app is written in JavaScript (TypeScript) and runs on Node.js v12. I’m primarily looking for a smarter MySQL solution; for suggestions with some logic in the app layer, the answer doesn’t have to be in JS, any pseudocode is appreciated too.

Fiddle w/ table schema here: https://www.db-fiddle.com/f/ecYLwEGpwZpPtec6VwzMBF/1

Thank you very much.


Answer

A way to find the records in the documents table that need an update:

set @json='{ "user": "user2", "documents": [ 10022, 10022, 10023, 10024, 10025, 10025, 10025 ]}';

WITH RECURSIVE json as (
   SELECT 0 as i
   UNION ALL
   SELECT i+1 FROM json WHERE i<10
),
to_update as (
   SELECT json_extract(@json,CONCAT("$.documents[",i,"]")) as document, count(*) as count
   FROM json
   WHERE not json_extract(@json,CONCAT("$.documents[",i,"]")) is null
   group by document
)
SELECT * FROM (
   SELECT documents.*,
          row_number() over (partition by documents.documentId order by documents.rowId) nr,
          to_update.count count
   FROM documents
   INNER JOIN to_update ON to_update.document = documents.documentId
   WHERE NOT documents.isProcessed  
   ) x WHERE nr<=count
;

This will output:

+-------+------------+---------------------+-----------+------+-------------+----+-------+
| rowId | documentId | received            | processed | user | isProcessed | nr | count |
+-------+------------+---------------------+-----------+------+-------------+----+-------+
|     6 |      10022 | 2020-11-27 15:06:00 | NULL      | NULL |           0 |  1 |     2 |
|     7 |      10022 | 2020-11-27 15:07:00 | NULL      | NULL |           0 |  2 |     2 |
|     9 |      10023 | 2020-11-27 15:09:00 | NULL      | NULL |           0 |  1 |     1 |
|    10 |      10024 | 2020-11-27 15:10:00 | NULL      | NULL |           0 |  1 |     1 |
|    12 |      10025 | 2020-11-27 15:12:00 | NULL      | NULL |           0 |  1 |     3 |
|    13 |      10025 | 2020-11-27 15:13:00 | NULL      | NULL |           0 |  2 |     3 |
|    14 |      10025 | 2020-11-27 15:14:00 | NULL      | NULL |           0 |  3 |     3 |
+-------+------------+---------------------+-----------+------+-------------+----+-------+
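
From there, one possible way to do the actual update for (a) and pick up the updated rows for (b) in one pass (a sketch along the same lines, not a tested solution; the temporary-table name is arbitrary) is to materialize the matched rowIds first, then update and re-select by primary key instead of by timestamp:

SET @json = '{ "user": "user2", "documents": [ 10022, 10022, 10023, 10024, 10025, 10025, 10025 ]}';
SET @processedTS = NOW();

-- 1. Materialize the rowIds to touch (the query above, keeping only rowId).
--    Note that i<10 caps the documents array at 11 entries; raise that bound
--    (or derive it from JSON_LENGTH(@json, '$.documents')) for real payloads.
CREATE TEMPORARY TABLE to_process (rowId INT UNSIGNED PRIMARY KEY);

INSERT INTO to_process (rowId)
WITH RECURSIVE json as (
   SELECT 0 as i
   UNION ALL
   SELECT i+1 FROM json WHERE i<10
),
to_update as (
   SELECT json_extract(@json,CONCAT("$.documents[",i,"]")) as document, count(*) as count
   FROM json
   WHERE not json_extract(@json,CONCAT("$.documents[",i,"]")) is null
   group by document
)
SELECT rowId FROM (
   SELECT documents.rowId,
          row_number() over (partition by documents.documentId order by documents.rowId) nr,
          to_update.count count
   FROM documents
   INNER JOIN to_update ON to_update.document = documents.documentId
   WHERE NOT documents.isProcessed
   ) x WHERE nr<=count;

-- 2. Update exactly those rows.
UPDATE documents
INNER JOIN to_process USING (rowId)
SET documents.user = 'user2', documents.processed = @processedTS;

-- 3. Select the updated rows for the results file, by primary key instead of timestamp.
SELECT documents.*
FROM documents
INNER JOIN to_process USING (rowId)
ORDER BY rowId;

DROP TEMPORARY TABLE to_process;

Joining the final SELECT on the temporary table avoids relying on @processedTS being unique to this batch.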