Skip to content
Advertisement

Redshift table access history

I wonder are there any way to get table access history in Redshift cluster?

Our cluster has a lot of tables and it is costing us a lot. I would like to discover what specific tables have not been accessed for a given period and then I would drop those tables.

Are there any ways to get table access history? If someone has opinion or materials please let me know.

Thanks.

Advertisement

Answer

To manage disk space, the STL logs (system tables e.g STL_QUERY, STL_QUERYTEXT, ) only retain approximately two to five days of log history (max 7 days) , depending on log usage and available disk space. If you want to retain the log data, you will need to periodically copy it to other tables or unload it to Amazon S3. If you have not copied/exported the stl logs previously, there is no way to access logs of before 1 week.

–>In your case, you can discover which specific tables have not been accessed, only in last 1 week (assuming you have not exported the logs previously). Might be a good idea to check the number of scans on a table with below query to analyse its accessibility.

    SELECT 
    database, 
    schema AS schemaname, 
    table_id, 
    "table" AS tablename, 
    size, 
    sortkey1, 
    NVL(s.num_qs,0) num_qs
FROM svv_table_info t
LEFT JOIN (SELECT
   tbl, perm_table_name,
   COUNT(DISTINCT query) num_qs
FROM
   stl_scan s
WHERE 
   s.userid > 1
   AND s.perm_table_name NOT IN ('Internal Worktable','S3')
GROUP BY 
   tbl, perm_table_name) s ON s.tbl = t.table_id
AND t."schema" NOT IN ('pg_internal')
ORDER BY 7 desc;

I came across a similar situation in past, I would suggest to firstly check that the tables are not referred in any procedure or views in redshift with below query:

SELECT DISTINCT 
    srcobj.oid AS src_oid
    ,srcnsp.nspname AS src_schemaname
    ,srcobj.relname AS src_objectname
    ,tgtobj.oid AS dependent_viewoid
    ,tgtnsp.nspname AS dependent_schemaname
    ,tgtobj.relname AS dependent_objectname
FROM
    pg_catalog.pg_class AS srcobj
INNER JOIN
    pg_catalog.pg_depend AS srcdep
        ON srcobj.oid = srcdep.refobjid
INNER JOIN
    pg_catalog.pg_depend AS tgtdep
        ON srcdep.objid = tgtdep.objid
JOIN
    pg_catalog.pg_class AS tgtobj
        ON tgtdep.refobjid = tgtobj.oid
        AND srcobj.oid <> tgtobj.oid
LEFT OUTER JOIN
    pg_catalog.pg_namespace AS srcnsp
        ON srcobj.relnamespace = srcnsp.oid
LEFT OUTER JOIN
    pg_catalog.pg_namespace tgtnsp
        ON tgtobj.relnamespace = tgtnsp.oid
WHERE tgtdep.deptype = 'i' --dependency_internal
AND tgtobj.relkind = 'v' --i=index, v=view, s=sequence
and src_schemaname <> 'pg_catalog' and src_schemaname <> 'information_schema';

–>Secondly, if time permits start exporting the redshift stl logs to s3 for few weeks to better explore the least accessed tables.

–> If tables are critical and time does not permit , its better to export the data of the tables to s3 and retain it for few days prior dropping the tables from redshift. It would serve as a backup just in case something goes wrong.

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement