I have a table with what are essentially historical logs for project tasks.
Each row contains the ID for a project and the date of when one specific task was either started or ended.
I need to reduce the grain so that all of the start/end times are in a single row (see IDs 1 and 2 in the image)
But here’s the part that’s kicking me, some rows may have duplicate task entries, but with different timestamps – meaning the user had to go back and redo a task. For these, I only need the latest timestamp for that specific column (see ID 3 in the image).
There is another table that contains the IDs for the projects that is the main table for this. I have tried using a left join, but I seem to have trouble getting just the latest timestamp.
The way I’ve been going about it so far is like this:
SELECT ID, a1.A_START, a2.A_END [...] c2.C_END FROM PROJECT LEFT JOIN (SELECT ID, A_START FROM PROJECT_TASK ORDER BY A_START DESC) a1 ON a1.ID = p.ID LEFT JOIN (SELECT ID, A_END FROM PROJECT_TASK ORDER BY A_END DESC) a2 ON a2.ID = p.ID LEFT JOIN (SELECT ID, B_START FROM PROJECT_TASK ORDER BY B_START DESC) b1 ON b1.ID = p.ID ... LEFT JOIN (SELECT ID, C_END FROM PROJECT_TASK ORDER BY C_END DESC) c2 ON c2.ID = p.ID
Obviously this doesn’t work as intended because I need to weed out any older tasks that were redone – which is where I’m stuck.
Any tips or advice are appreciated.
NOTE: Neither table can be altered – I can only create a new fact table from the data given.
DISCLAIMER: It’s likely that a similar question has been asked here before, but I couldn’t think of what specifically to ask that actually came up with any helpful results. If someone links to a similar post, I’ll edit this message to include that link as a reference for future users.
Advertisement
Answer
You could use aggregation on all those 6 columns. You can change column names and aliases as desired
select id, max(col1), max(col2), max(col3), max(col4), max(col5), max(col6) from your_table group by id;