Skip to content
Advertisement

SQL apply where clause to an arbitrary query results

I’m working on a system where the user introduces an SQL server/db connection and a valid SQL query, which I save on my system. I’m using python+sqlalchemy+pandas to accomplish this.

That query will return a table like this one, in which the only rule is that the query result must have a timestamp as DateTime field and at least another column with numeric values, but random name.

+-----------+----------+---------+-----+---------+
| timestamp | series_1 | serie_2 | ... | serie_n |
+-----------+----------+---------+-----+---------+
| <time>    | numeric  | numeric | ... | numeric |
+-----------+----------+---------+-----+---------+
|                     ...                        |
+-----------+----------+---------+-----+---------+
| <time>    | numeric  | numeric | ... | numeric |
+-----------+----------+---------+-----+---------+

Then I have a scheduler that runs the query using cron format and I work with the data and store it on my system DB. The problem is that every time I run the query with the scheduler I will have the old+new records, to handle this I’m filtering the results using pandas to get only the new records, which at this point is OK in performances (20s for 10k records with 20 series, thinking on I have 1 update per week is reasonable), but it’s not a good solution for a long time with a lot of more data.

So I’m thinking about if I can to wrap the user query results with a WHERE clause to filter the results by timestamp at the query level, without touching the query.

Advertisement

Answer

I did it in this way:

query = f"SELECT sub.* from ({query}) as sub WHERE sub.timestamp > '{latest_timestamp}' ORDER BY sub.timestamp ASC"
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement