I wonder if there is any way to get table access history in a Redshift cluster?
Our cluster has a lot of tables and it is costing us a lot. I would like to find out which specific tables have not been accessed for a given period and then drop those tables.
Is there any way to get table access history? If anyone has opinions or materials on this, please let me know.
Thanks.
Answer
To manage disk space, the STL logs (system tables such as STL_QUERY and STL_QUERYTEXT) only retain approximately two to five days of log history (at most seven days), depending on log usage and available disk space. If you want to retain the log data, you will need to periodically copy it to other tables or unload it to Amazon S3. If you have not copied or exported the STL logs previously, there is no way to access logs older than about a week.
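As a minimal sketch of that "copy it to other tables" step (the history schema and table names here are assumptions, not existing Redshift objects), you could persist STL_QUERY rows into a permanent table on a schedule:

-- One-time setup: an empty permanent copy of STL_QUERY (schema/table names are placeholders)
CREATE TABLE history.stl_query_archive AS
SELECT * FROM stl_query WHERE 1 = 0;

-- Run on a schedule (e.g. daily) so rows survive the short STL retention window
INSERT INTO history.stl_query_archive
SELECT q.*
FROM stl_query q
WHERE q.starttime > (SELECT NVL(MAX(starttime), '1900-01-01'::timestamp)
                     FROM history.stl_query_archive);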
-> In your case, you can only discover which specific tables have not been accessed within roughly the last week (assuming you have not exported the logs previously). It is a good idea to check the number of scans on each table with the query below to analyze how often it is accessed.
SELECT database, schema AS schemaname, table_id, "table" AS tablename, size, sortkey1, NVL(s.num_qs, 0) AS num_qs
FROM svv_table_info t
LEFT JOIN (SELECT tbl, perm_table_name, COUNT(DISTINCT query) AS num_qs
           FROM stl_scan s
           WHERE s.userid > 1
             AND s.perm_table_name NOT IN ('Internal Worktable', 'S3')
           GROUP BY tbl, perm_table_name) s ON s.tbl = t.table_id
WHERE t."schema" NOT IN ('pg_internal')
ORDER BY 7 DESC;
I came across a similar situation in the past. I would suggest first checking that the tables are not referenced by any procedures or views in Redshift, with the query below:
SELECT DISTINCT srcobj.oid AS src_oid
      ,srcnsp.nspname AS src_schemaname
      ,srcobj.relname AS src_objectname
      ,tgtobj.oid AS dependent_viewoid
      ,tgtnsp.nspname AS dependent_schemaname
      ,tgtobj.relname AS dependent_objectname
FROM pg_catalog.pg_class AS srcobj
INNER JOIN pg_catalog.pg_depend AS srcdep ON srcobj.oid = srcdep.refobjid
INNER JOIN pg_catalog.pg_depend AS tgtdep ON srcdep.objid = tgtdep.objid
JOIN pg_catalog.pg_class AS tgtobj ON tgtdep.refobjid = tgtobj.oid AND srcobj.oid <> tgtobj.oid
LEFT OUTER JOIN pg_catalog.pg_namespace AS srcnsp ON srcobj.relnamespace = srcnsp.oid
LEFT OUTER JOIN pg_catalog.pg_namespace tgtnsp ON tgtobj.relnamespace = tgtnsp.oid
WHERE tgtdep.deptype = 'i'   -- dependency_internal
  AND tgtobj.relkind = 'v'   -- i=index, v=view, s=sequence
  AND src_schemaname <> 'pg_catalog'
  AND src_schemaname <> 'information_schema';
-> Secondly, if time permits, start exporting the Redshift STL logs to S3 for a few weeks to better identify the least-accessed tables; a sketch of that export follows below.
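A hedged sketch of that export, assuming you first persist STL_SCAN into a permanent table (same pattern as the STL_QUERY sketch above); the archive table name, S3 path, and IAM role are all placeholders to substitute with your own:

-- One-time setup (placeholder names)
CREATE TABLE history.stl_scan_archive AS
SELECT * FROM stl_scan WHERE 1 = 0;

-- Run on a schedule: append new scan rows, then unload the archive to S3
INSERT INTO history.stl_scan_archive
SELECT s.*
FROM stl_scan s
WHERE s.starttime > (SELECT NVL(MAX(starttime), '1900-01-01'::timestamp)
                     FROM history.stl_scan_archive);

UNLOAD ('SELECT * FROM history.stl_scan_archive')
TO 's3://your-log-bucket/redshift-stl/stl_scan/'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-unload-role'
GZIP
ALLOWOVERWRITE;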
-> If the tables are critical and time does not permit, it is better to export the table data to S3 and retain it for a few days before dropping the tables from Redshift. It would serve as a backup in case something goes wrong.
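For example, a backup of a candidate table before dropping it could look like the following (the table name, bucket, and IAM role are placeholders for your own values):

UNLOAD ('SELECT * FROM analytics.some_unused_table')
TO 's3://your-backup-bucket/redshift-backups/some_unused_table/'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-unload-role'
GZIP
ALLOWOVERWRITE;

If the table turns out to be needed after all, the unloaded files can be loaded back with COPY from the same S3 prefix.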