I wonder if there is any way to get table access history in a Redshift cluster?
Our cluster has a lot of tables and it is costing us a lot. I would like to find out which specific tables have not been accessed for a given period and then drop those tables.
Is there any way to get table access history? If anyone has opinions or materials on this, please let me know.
Thanks.
Answer
To manage disk space, the STL logs (system tables such as STL_QUERY and STL_QUERYTEXT) only retain approximately two to five days of log history (at most seven days), depending on log usage and available disk space. If you want to retain the log data, you will need to periodically copy it to other tables or unload it to Amazon S3. If you have not copied or exported the STL logs previously, there is no way to access logs older than about a week.
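As a minimal sketch of that "copy it to other tables" step (the history schema and table names here are assumptions, not existing Redshift objects), you could persist STL_QUERY rows into a permanent table on a schedule:

-- One-time setup: an empty permanent copy of STL_QUERY (schema/table names are placeholders)
CREATE TABLE history.stl_query_archive AS
SELECT * FROM stl_query WHERE 1 = 0;

-- Run on a schedule (e.g. daily) so rows survive the short STL retention window
INSERT INTO history.stl_query_archive
SELECT q.*
FROM stl_query q
WHERE q.starttime > (SELECT NVL(MAX(starttime), '1900-01-01'::timestamp)
                     FROM history.stl_query_archive);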
-> In your case, you can only discover which specific tables have not been accessed within roughly the last week (assuming you have not exported the logs previously). It is a good idea to check the number of scans on each table with the query below to analyze how often it is accessed.
SELECT database, schema AS schemaname, table_id, "table" AS tablename, size, sortkey1, NVL(s.num_qs, 0) AS num_qs
FROM svv_table_info t
LEFT JOIN (SELECT tbl, perm_table_name, COUNT(DISTINCT query) AS num_qs
           FROM stl_scan s
           WHERE s.userid > 1
             AND s.perm_table_name NOT IN ('Internal Worktable', 'S3')
           GROUP BY tbl, perm_table_name) s ON s.tbl = t.table_id
WHERE t."schema" NOT IN ('pg_internal')
ORDER BY 7 DESC;
I came across a similar situation in the past. I would suggest first checking that the tables are not referenced by any procedures or views in Redshift, with the query below:
SELECT DISTINCT srcobj.oid AS src_oid
      ,srcnsp.nspname AS src_schemaname
      ,srcobj.relname AS src_objectname
      ,tgtobj.oid AS dependent_viewoid
      ,tgtnsp.nspname AS dependent_schemaname
      ,tgtobj.relname AS dependent_objectname
FROM pg_catalog.pg_class AS srcobj
INNER JOIN pg_catalog.pg_depend AS srcdep ON srcobj.oid = srcdep.refobjid
INNER JOIN pg_catalog.pg_depend AS tgtdep ON srcdep.objid = tgtdep.objid
JOIN pg_catalog.pg_class AS tgtobj ON tgtdep.refobjid = tgtobj.oid AND srcobj.oid <> tgtobj.oid
LEFT OUTER JOIN pg_catalog.pg_namespace AS srcnsp ON srcobj.relnamespace = srcnsp.oid
LEFT OUTER JOIN pg_catalog.pg_namespace tgtnsp ON tgtobj.relnamespace = tgtnsp.oid
WHERE tgtdep.deptype = 'i'   -- dependency_internal
  AND tgtobj.relkind = 'v'   -- i=index, v=view, s=sequence
  AND src_schemaname <> 'pg_catalog'
  AND src_schemaname <> 'information_schema';
-> Secondly, if time permits, start exporting the Redshift STL logs to S3 for a few weeks to better identify the least-accessed tables; a sketch of that export follows below.
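A hedged sketch of that export, assuming you first persist STL_SCAN into a permanent table (same pattern as the STL_QUERY sketch above); the archive table name, S3 path, and IAM role are all placeholders to substitute with your own:

-- One-time setup (placeholder names)
CREATE TABLE history.stl_scan_archive AS
SELECT * FROM stl_scan WHERE 1 = 0;

-- Run on a schedule: append new scan rows, then unload the archive to S3
INSERT INTO history.stl_scan_archive
SELECT s.*
FROM stl_scan s
WHERE s.starttime > (SELECT NVL(MAX(starttime), '1900-01-01'::timestamp)
                     FROM history.stl_scan_archive);

UNLOAD ('SELECT * FROM history.stl_scan_archive')
TO 's3://your-log-bucket/redshift-stl/stl_scan/'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-unload-role'
GZIP
ALLOWOVERWRITE;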
-> If the tables are critical and time does not permit, it is better to export the table data to S3 and retain it for a few days before dropping the tables from Redshift. It would serve as a backup in case something goes wrong.
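For example, a backup of a candidate table before dropping it could look like the following (the table name, bucket, and IAM role are placeholders for your own values):

UNLOAD ('SELECT * FROM analytics.some_unused_table')
TO 's3://your-backup-bucket/redshift-backups/some_unused_table/'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-unload-role'
GZIP
ALLOWOVERWRITE;

If the table turns out to be needed after all, the unloaded files can be loaded back with COPY from the same S3 prefix.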