How can I perform the same query on multiple tables in Redshift?

Question

I'm working in SQL Workbench in Redshift. We have daily event tables for customer accounts, the same format each day just with updated info. There are currently 300+ tables. For a simple example, I would like to extract the top 10 rows from each table and place them in 1 table. Table name format is Even…

Accepted Answer

You've effectively invented a kind of pseudo-partitioning, where you manually partition the data by day.

To manually recombine the tables, create a view to union everything together...

    CREATE VIEW events_combined AS
      SELECT 1 AS partition_id, * FROM events_001
      UNION ALL
      SELECT 2 AS partition_id, * FROM events_002
      UNION ALL
      SELECT 3 AS partition_id, * FROM events_003
      etc, etc

That's a hassle: you need to recreate the view every time you add a new table.

That's why most modern databases have partitioning schemes built into them, so all the boilerplate is taken care of for you.

But Redshift doesn't do that. So, why not?

In general, because Redshift has many alternative mechanisms for dividing and conquering data. It's columnar, so you can avoid reading columns you don't use. It's horizontally partitioned across multiple nodes (sharded), to share the load with large volumes of data. It's sorted and compressed in pages, to avoid loading rows you don't want or need. It has dirty pages for newly arriving data, which can then be cleaned up with a VACUUM.

So, I would agree with others that it's not normal practice. Yet Amazon themselves do have a help page (briefly) describing your use case:

https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-time-series-tables.html

So, I'd disagree with "never do this". Still, it is a strong indication that you've accidentally walked into an anti-pattern and should seriously reconsider your design.
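
Since the combined view has to be rebuilt whenever a daily table is added, one option is to generate its SQL rather than type it by hand. The following is a minimal Python sketch, not a tested Redshift deployment: the helper name build_union_view is hypothetical, the table names simply follow the events_001 pattern from the example view, and it emits CREATE OR REPLACE VIEW (rather than plain CREATE VIEW) so that rerunning it overwrites the previous definition. In practice the table list would probably come from a catalog query, for example against SVV_TABLES, which is an assumption and not shown here.

```python
def build_union_view(view_name, table_names):
    """Return the SQL text for a view that unions the given tables.

    Hypothetical helper: identifiers are interpolated as-is, so the
    names are assumed to be trusted and valid without quoting.
    """
    if not table_names:
        raise ValueError("need at least one table name")

    # One SELECT per table, tagged with a running partition_id,
    # mirroring the hand-written view in the answer.
    selects = [
        f"SELECT {i} AS partition_id, * FROM {name}"
        for i, name in enumerate(table_names, start=1)
    ]
    body = "\n  UNION ALL\n  ".join(selects)
    return f"CREATE OR REPLACE VIEW {view_name} AS\n  {body};"


# Assumed table names, following the events_001 pattern from the answer.
tables = [f"events_{n:03d}" for n in range(1, 4)]
print(build_union_view("events_combined", tables))
```

The printed statement can be pasted into SQL Workbench and rerun each time a new table appears. This only removes the typing; it does not change the point that the one-table-per-day layout is worth reconsidering.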

Answer