I have a set of large tables with many records each. I’m writing a Python program that SELECTs a large number of records from these tables, based on the value of multiple columns on those records.
Essentially, these are going to be lots of queries of the form:
SELECT <some columns> FROM <some table> WHERE <column1=val1 AND column2=val2...>
Each table has a different set of columns, but otherwise the SELECT formula above holds.
By default, I was going to just run all these queries through the psycopg2 PostgreSQL database driver, each as a separate query. But I’m wondering if there’s a more efficient way of going about it, given that there will be a very large number of such queries: thousands or more.
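For concreteness, the baseline approach looks roughly like this (a minimal sketch; the connection string, table name, and column names are hypothetical placeholders, and identifiers are quoted with psycopg2’s sql module rather than plain string formatting):

import psycopg2
from psycopg2 import sql

conn = psycopg2.connect("dbname=mydb user=myuser")  # hypothetical DSN

filter_sets = [
    {"column1": "val1", "column2": "val2"},
    {"column1": "val3", "column2": "val4"},
    # ... thousands more ...
]

with conn, conn.cursor() as cur:
    for filters in filter_sets:
        query = sql.SQL("SELECT {cols} FROM {table} WHERE {conds}").format(
            cols=sql.SQL(", ").join(map(sql.Identifier, ["col_a", "col_b"])),
            table=sql.Identifier("my_table"),
            conds=sql.SQL(" AND ").join(
                sql.SQL("{} = %s").format(sql.Identifier(c)) for c in filters
            ),
        )
        # One execute() per filter set, i.e. one client-server round trip each.
        cur.execute(query, list(filters.values()))
        rows = cur.fetchall()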
Answer
If the SELECT list entries are the same for all queries (the same number of entries and the same data types), you can use UNION ALL to combine several such queries. This won’t reduce the amount of work for the database, but it will reduce the number of client-server round trips. This can be a huge improvement, because for short queries network latency is often the dominant cost.
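For illustration, the combined statement has this shape (values shown inline here, but bound as parameters in practice):

SELECT col_a, col_b FROM my_table WHERE column1 = 'v1' AND column2 = 'v2'
UNION ALL
SELECT col_a, col_b FROM my_table WHERE column1 = 'v3' AND column2 = 'v4'
UNION ALL
...

Here is a minimal psycopg2 sketch of this batching, assuming all the lookups hit the same (hypothetical) table and share a SELECT list; filter_sets is a list of {column: value} dicts, one per original query:

from psycopg2 import sql

def batched_select(cur, table, columns, filter_sets):
    # Build one UNION ALL statement out of many identically shaped SELECTs,
    # so the whole batch costs a single round trip. Identifiers are quoted
    # via psycopg2.sql; values are passed as bound parameters.
    template = sql.SQL("SELECT {cols} FROM {table} WHERE {conds}")
    parts, params = [], []
    for filters in filter_sets:
        parts.append(template.format(
            cols=sql.SQL(", ").join(map(sql.Identifier, columns)),
            table=sql.Identifier(table),
            conds=sql.SQL(" AND ").join(
                sql.SQL("{} = %s").format(sql.Identifier(c)) for c in filters
            ),
        ))
        params.extend(filters.values())
    cur.execute(sql.SQL(" UNION ALL ").join(parts), params)
    return cur.fetchall()

With thousands of lookups, you would likely chunk filter_sets (say, a few hundred per statement) to keep each statement a manageable size. Note also that the combined result set no longer indicates which original query each row came from, unless the filter columns themselves appear in the SELECT list.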
If all your queries have different SELECT lists, there is nothing you can do.