I have an interesting issue that I’m struggling to solve.
I am retrieving data from a DB via a web API. I have direct DB access for testing, but my application will need to read the data via the API. In the API I can essentially provide the where clause for the SQL statement that retrieves my data from the table I specify for the web API. I need to retrieve the data in chunks of ~10,000 in order not to overload the server (the web service is not optimised, and gets exponentially slower the more results I retrieve). With most tables, there is a sequential numeric ID field that I can limit my queries with. So the full query looks like this:
SELECT * FROM TABLE WHERE ID > 0 and ID <= 10000
I am only able to provide the where clause component of that query via the API.
ID > 0 and ID <= 10000
For one specific table, the usual ID field is not available. There is a different ID field, which also contains sequential numeric values, but the field is of type String. Since the where clause I provide to the web API is processed as some sort of prepared statement, it will not process any functions. So I cannot cast the ID to an int:
CAST(ID2 as int) > 0 and CAST(ID2 as int) <= 10000
If I use > or < comparison for string values, it follows alphabetical order, and you get annoying behavior, such as 2,000 being greater than 100,000. This breaks the logic I had been using before.
Does anyone have any ideas how I could limit my retrieves to 10,000 entries with this string ID field? I’m sure there is some logic that will do it, but I haven’t been able to wrap my head around it.
Implicit casting in the where clause also returns an error:
(ID2 + 0) > 0 and (ID2 + 0) <= 10000
The values in the ID2 field range from ~140,000 to ~3,500,000.
I’d be happy to hear any ideas or suggestions! Please let me know if anything is unclear.
Advertisement
Answer
I think that I have figured this out. By using alphabetical sorting, we can still limit the query to ~11,000 results. We just have to respect that it starts sorting by the first digit/letter, and ignores the place value of the digits in the ID. So a number that is larger by magnitudes of 10, but starts with a lower digit, is treated as smaller:
By alphabetical sorting 9 > 100000
We can therefore use a where clause like this:
ID2 > '100000' and ID2 <= '101000'
That query will include all IDs from 100,000 to 101,000 and from 1,000,000 to 1,010,000. It would theoretically also include IDs from 10,000,000 to 10,100,000 and 100,000,000 to 101,000,000 etc. but I know that my values range from ~100,000 to ~4,000,000 (6 to 7 figures), so that range is irrelevant for this DB. I can then step my way through, all the way to:
ID2 > '999000' and ID2 <= '999999'
This will cover all values from 100,000 to 9,999,999 and let me get all data in 899 steps.
Thanks for all the other ideas provided!