Skip to content
Advertisement

How can I make TO_DATE in Amazon Redshift work with multiple date formats in the same column?

I have a string column with date-time values that I need to convert into dates in order to calculate a dense rank.

This column can have any of the below 3 formats:

MM-DD-YYYY H:MM:SS PM
M-DD-YYYY H:MM:SS PM
MM-D-YYYY H:MM:SS PM

My SQL Statement to calculate the dense rank is based on this date-time column that I need to convert to date in order to rank descending:

DENSE_RANK() 
OVER (PARTITION by [columnA] ORDER BY
TO_DATE(SUBSTRING([dateColumn],0,8), 'MM-DD-YYYY') DESC)

However, I’m stuck with the TO_DATE function since I have three different formats in the same column.

How can I make TO_DATE in Amazon Redshift work with multiple date formats in the same column?

Advertisement

Answer

Instead of using SUBSTRING, pass the entire date-time field to the TO_DATE function with the MM-DD-YYYY format which encapsulates all 3 formats above.

Redshift has a strict mode that is turned off by default – TO_DATE(string, format, is_strict=false). When not in strict mode, the time will automatically be discarded & Redshift is clever enough to infer how to convert the single digits to double digits.

The rest of your code can then work as per normal.

To demonstrate:

--MM-DD-YYYY H:MM:SS PM
SELECT TO_DATE('05-15-2022 4:12:34 PM','MM-DD-YYYY');
--M-DD-YYYY H:MM:SS PM
SELECT TO_DATE('4-01-2022 8:45:06 PM','MM-DD-YYYY');
--MM-D-YYYY H:MM:SS PM
SELECT TO_DATE('04-9-2022 6:23:10 PM','MM-DD-YYYY');

Output:

2022-05-15
2022-04-01
2022-04-09

In your case, this should work:

DENSE_RANK()
OVER (partition BY [columnA] ORDER BY 
TO_DATE(datecolumn, 'MM-DD-YYYY') DESC)

P.S. To begin with, these dates should all be in the same format.

If you are responsible for producing the data, please only output dates in one format moving forward to make everyone’s lives’ much easier!

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement