Skip to content
Advertisement

CosmosDB count using OrderBy multiple conditions

I am using Azure CosmosDB, the model contains Id, and creationTime properties. What I am trying to do is count entries within time interval. I looked how to achieve it using LINQ but could not find a solution. What I used than is SQL. This is what I tried:

SELECT VALUE COUNT(1) 
FROM root WHERE (root["DeviceId"] = "myId") 
GROUP BY (root["creationTime"] >= 50 and root["creationTime"] < 60),
(root["creationTime"] >= 40 and root["creationTime"] < 50),
(root["creationTime"] >= 30 and root["creationTime"] < 40),
(root["creationTime"] >= 20 and root["creationTime"] < 30)

Bare in mind the values i showed are for display purposes. The correct result is an array with count values e.g. [5,0,3,6] this means “>=50 and < 60” = 5, “>=40 and < 50” = 0, “>=30 and < 40″ = 3,”>=20 and < 30″ = 6,

The problems with the query are:

  1. The order of result does not match the query, I get [3,6,5]
  2. If count is 0 it get dismissed
  3. if the range did not cover all count, I get difference as result also. Lets say the DB contains the following ranges: from 5 to 55, because I query from 20 to 60 I get also as a result the range from 5 to 19. Summing all the counts will give the total.

Is there a better way to achieve this rather than iterating over each interval and then counting number of entries.

Thanks for your help.

Advertisement

Answer

If you are doing same-partition query..

.. then you can get around the problems by including the range descriptor to select (assuming the actual predicates will be more complex and mod or functions or other such tricks are not relevant for the problem at hand), a la:

SELECT 
    (root.creationTime >= 50 and root.creationTime < 60) as V50To60,
    (root.creationTime >= 40 and root.creationTime < 50) as V40To50,
    (root.creationTime >= 30 and root.creationTime < 40) as V30To40,
    (root.creationTime >= 20 and root.creationTime < 30) as V20To30,
    COUNT(1) as total
FROM root 
WHERE root.Id = "myId" 
GROUP BY 
    (root.creationTime >= 50 and root.creationTime < 60),
    (root.creationTime >= 40 and root.creationTime < 50),
    (root.creationTime >= 30 and root.creationTime < 40),
    (root.creationTime >= 20 and root.creationTime < 30)

This way client can order results as needed and determine any missing groups. Do note that CosmosDB does NOT support order by for group by queries (yet), so you have to do this in client.

If this is a cross-partition query ..

.. then you can only have one VALUE aggregate in output meaning you can’t get group descriptor info out of the query. Also, CosmosDB does not support uncorrelated joins, so I’m inclined to think that there are no magic join tricks possible as well. And even if there are, I doubt it will be pretty and maintainable.

So, I guess you are forced to do multiple queries, one per each range, and combine the results in client. Most likely this is simple and efficient (as long as you have it covered by index), though the different queries will not describe the same moment in time.

If you really-really need the totals be from the same point of time then you may need to keep an index document for tracking the range totals (i.e. with change feed) and query that index document instead.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement