How do you do date math that ignores the year?

Question

I am trying to select dates that have an anniversary in the next 14 days. How can I select based on dates excluding the year? I have tried something like the following:

    SELECT * FROM events WHERE …

Accepted Answer

If you don’t care for explanation and details, use the “Black magic version” below.

All queries presented in other answers so far operate with conditions that are not sargable – they cannot use an index and have to compute an expression for every single row in the base table to find matching rows. This doesn’t matter much with small tables. It matters (a lot) with big tables.

Given the following simple table:

    CREATE TABLE event (
      event_id   serial PRIMARY KEY
    , event_date date
    );

Query

Versions 1 and 2 below can use a simple index of the form:

    CREATE INDEX event_event_date_idx ON event(event_date);

But all of the following solutions are faster even without an index.

1. Simple version

    SELECT *
    FROM  (
       SELECT ((current_date + d) - interval '1 year' * y)::date AS event_date
       FROM       generate_series( 0,  14) d
       CROSS JOIN generate_series(13, 113) y
       ) x
    JOIN  event USING (event_date);

Subquery x computes all possible dates over a given range of years from a CROSS JOIN of two generate_series() calls. The selection is done with the final simple join.

2. Advanced version

    WITH val AS (
       SELECT extract(year FROM age(current_date + 14, min(event_date)))::int AS max_y
            , extract(year FROM age(current_date,      max(event_date)))::int AS min_y
       FROM   event
       )
    SELECT e.*
    FROM  (
       SELECT ((current_date + d.d) - interval '1 year' * y.y)::date AS event_date
       FROM   generate_series(0, 14) d
            ,(SELECT generate_series(min_y, max_y) AS y FROM val) y
       ) x
    JOIN  event e USING (event_date);

The range of years is deduced from the table automatically – thereby minimizing generated years.
You could go one step further and distill a list of existing years if there are gaps.

Effectiveness co-depends on the distribution of dates. Few years with many rows each make this solution more useful. Many years with few rows each make it less useful.

Simple SQL Fiddle to play with.

3. Black magic version

Updated 2016 to remove a “generated column”, which would block H.O.T.
updates; simpler and faster function.
Updated 2018 to calculate MMDD with IMMUTABLE expressions to allow function inlining.

Create a simple SQL function to calculate an integer from the pattern 'MMDD':

    CREATE FUNCTION f_mmdd(date)
      RETURNS int
      LANGUAGE sql IMMUTABLE AS
    'SELECT (EXTRACT(month FROM $1) * 100 + EXTRACT(day FROM $1))::int';

I had to_char(time, 'MMDD') at first, but switched to the above expression, which proved fastest in new tests on Postgres 9.6 and 10 (db<>fiddle).

It allows function inlining because EXTRACT (xyz FROM date) is implemented with the IMMUTABLE function date_part(text, date) internally. And it has to be IMMUTABLE to allow its use in the following essential multicolumn expression index:

    CREATE INDEX event_mmdd_event_date_idx ON event(f_mmdd(event_date), event_date);

Multicolumn for a number of reasons:
Can help with ORDER BY or with selecting from given years.
At almost no additional cost for the index. A date fits into the 4 bytes that would otherwise be lost to padding due to data alignment.
Also, since both index columns reference the same table column, no drawback with regard to H.O.T. updates.
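
The remark about ORDER BY and selecting from given years can be illustrated with a query like the one below. This is only a sketch: it assumes the event table, f_mmdd() and the multicolumn index exactly as defined above. The August window and the 1990 cutoff are arbitrary example values, and whether the planner actually chooses the index depends on table statistics and settings.

```sql
-- Anniversaries falling on Aug 1 through Aug 14, limited to events from 1990 on.
-- 801 and 814 are f_mmdd() values (month * 100 + day); both are example values.
SELECT *
FROM   event
WHERE  f_mmdd(event_date) BETWEEN 801 AND 814
AND    event_date >= date '1990-01-01'   -- filter on the second index column
ORDER  BY f_mmdd(event_date), event_date;  -- same order as the index columns
```

Because the sort order matches the index definition, Postgres may be able to return rows in index order without a separate sort step.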

One PL/pgSQL table function to rule them all

Fork to one of two queries to cover the turn of the year:

    CREATE OR REPLACE FUNCTION f_anniversary(date = current_date, int = 14)
      RETURNS SETOF event AS
    $func$
    DECLARE
       d  int := f_mmdd($1);
       d1 int := f_mmdd($1 + $2 - 1);  -- fix off-by-1 from upper bound
    BEGIN
       IF d1 >= d THEN  -- >= so that a 1-day window (d1 = d) does not take the wrap-around branch
          RETURN QUERY
          SELECT *
          FROM   event e
          WHERE  f_mmdd(e.event_date) BETWEEN d AND d1
          ORDER  BY f_mmdd(e.event_date), e.event_date;

       ELSE  -- wrap around end of year
          RETURN QUERY
          SELECT *
          FROM   event e
          WHERE  f_mmdd(e.event_date) >= d OR
                 f_mmdd(e.event_date) <= d1
          ORDER  BY (f_mmdd(e.event_date) >= d) DESC, f_mmdd(e.event_date), e.event_date;
          -- chronological across turn of the year
       END IF;
    END
    $func$ LANGUAGE plpgsql;

Call using defaults: 14 days beginning “today”:

    SELECT * FROM f_anniversary();

Call for 7 days beginning ‘2014-08-23’:

    SELECT * FROM f_anniversary(date '2014-08-23', 7);

SQL Fiddle comparing EXPLAIN ANALYZE.

February 29

When dealing with anniversaries or “birthdays”, you need to define how to deal with the special case “February 29” in leap years.

When testing for ranges of dates, Feb 29 is usually included automatically, even if the current year is not a leap year. The range of days is extended by 1 retroactively when it covers this day.

On the other hand, if the current year is a leap year, and you want to look for 15 days, you may end up getting results for 14 days in leap years if your data is from non-leap years.

Say, Bob is born on the 29th of February:
My queries 1 and 2 include February 29 only in leap years. Bob has a birthday only every ~ 4 years.
My query 3 includes February 29 in the range. Bob has a birthday every year.

There is no magical solution. You have to define what you want for every case.

Test

To substantiate my point, I ran an extensive test with all the presented solutions.
I adapted each of the queries to the given table and to yield identical results without ORDER BY.

The good news: all of them are correct and yield the same result – except for Gordon’s query that had syntax errors, and @wildplasser’s query that fails when the year wraps around (easy to fix).

Insert 108000 rows with random dates from the 20th century, which is similar to a table of living people (13 or older).

    INSERT INTO event (event_date)
    SELECT '2000-1-1'::date - (random() * 36525)::int
    FROM   generate_series (1, 108000);

Delete ~ 8 % to create some dead tuples and make the table more “real life”.

    DELETE FROM event WHERE random() < 0.08;
    ANALYZE event;

My test case had 99289 rows, 4012 hits.

C – Catcall

    WITH anniversaries as (
       SELECT event_id, event_date
             ,(event_date + (n || ' years')::interval)::date anniversary
       FROM   event, generate_series(13, 113) n
       )
    SELECT event_id, event_date  -- count(*)
    FROM   anniversaries
    WHERE  anniversary BETWEEN current_date AND current_date + interval '14' day;

C1 – Catcall’s idea rewritten

Aside from minor optimizations, the major difference is to add only the exact amount of years date_trunc('year', age(current_date + 14, event_date)) to get this year’s anniversary, which avoids the need for a CTE altogether:

    SELECT event_id, event_date
    FROM   event
    WHERE (event_date + date_trunc('year', age(current_date + 14, event_date)))::date
           BETWEEN current_date AND current_date + 14;

D – Daniel

    SELECT *  -- count(*)
    FROM   event
    WHERE  extract(month FROM age(current_date + 14, event_date))  = 0
    AND    extract(day   FROM age(current_date + 14, event_date)) <= 14;

E1 – Erwin 1

See “1. Simple version” above.

E2 – Erwin 2

See “2. Advanced version” above.

E3 – Erwin 3

See “3.
Black magic version” above.

G – Gordon

    SELECT *  -- count(*)
    FROM  (SELECT *, to_char(event_date, 'MM-DD') AS mmdd FROM event) e
    WHERE  to_date(to_char(now(), 'YYYY') || '-'
             || (CASE WHEN mmdd = '02-29' THEN '02-28' ELSE mmdd END)
                 ,'YYYY-MM-DD') BETWEEN date(now()) and date(now()) + 14;

H – a_horse_with_no_name

    WITH upcoming as (
       SELECT event_id, event_date
             ,CASE
                WHEN date_trunc('year', age(event_date)) = age(event_date)
                     THEN current_date
                ELSE cast(event_date + ((extract(year FROM age(event_date)) + 1)
                          * interval '1' year) AS date)
              END AS next_event
       FROM   event
       )
    SELECT event_id, event_date
    FROM   upcoming
    WHERE  next_event - current_date <= 14;

W – wildplasser

    CREATE OR REPLACE FUNCTION this_years_birthday(_dut date)
      RETURNS date AS
    $func$
    DECLARE
       ret date;
    BEGIN
       ret := date_trunc('year', current_timestamp)
            + (date_trunc('day', _dut) - date_trunc('year', _dut));
       RETURN ret;
    END
    $func$ LANGUAGE plpgsql;

Simplified to return the same as all the others:

    SELECT *
    FROM   event e
    WHERE  this_years_birthday(e.event_date::date)
           BETWEEN current_date AND current_date + '2weeks'::interval;

W1 – wildplasser’s query rewritten

The above suffers from a number of inefficient details (beyond the scope of this already sizable post).
The rewritten version is much faster:

    CREATE OR REPLACE FUNCTION this_years_birthday(_dut INOUT date) AS
    $func$
    SELECT (date_trunc('year', now()) + ($1 - date_trunc('year', $1)))::date
    $func$ LANGUAGE sql;

    SELECT *
    FROM   event e
    WHERE  this_years_birthday(e.event_date)
           BETWEEN current_date AND (current_date + 14);

Test results

I ran this test with a temporary table on PostgreSQL 9.1.7.
Results were gathered with EXPLAIN ANALYZE, best of 5.

Results

    Without index
    C:  Total runtime: 76714.723 ms
    C1: Total runtime:   307.987 ms  -- !
    D:  Total runtime:   325.549 ms
    E1: Total runtime:   253.671 ms  -- !
    E2: Total runtime:   484.698 ms  -- min() & max() expensive without index
    E3: Total runtime:   213.805 ms  -- !
    G:  Total runtime:   984.788 ms
    H:  Total runtime:   977.297 ms
    W:  Total runtime:  2668.092 ms
    W1: Total runtime:   596.849 ms  -- !

    With index
    E1: Total runtime:    37.939 ms  --!!
    E2: Total runtime:    38.097 ms  --!!

    With index on expression
    E3: Total runtime:    11.837 ms  --!!

All other queries perform the same with or without index because they use non-sargable expressions.

Conclusion

So far, @Daniel’s query was the fastest.
@wildplasser’s (rewritten) approach performs acceptably, too.
@Catcall’s version is something like the reverse approach of mine. Performance gets out of hand quickly with bigger tables.
The rewritten version performs pretty well, though. The expression I use is something like a simpler version of @wildplasser’s this_years_birthday() function.
My “simple version” is faster even without an index, because it needs fewer computations.
With an index, the “advanced version” is about as fast as the “simple version”, because min() and max() become very cheap with an index. Both are substantially faster than the rest, which cannot use the index.
My “black magic version” is fastest with or without an index. And it is very simple to call.
The updated version (after the benchmark) is a bit faster yet.
With a real-life table, an index will make an even greater difference.
More columns make the table bigger, and sequential scan more expensive, while the index size stays the same.
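
To check whether the expression index is actually used on your own table, you can inspect the query plan. The following is a minimal sketch, assuming that event, f_mmdd() and event_mmdd_event_date_idx exist as defined above. The bounds 801 and 814 are example values, and the plan and timings you see will depend on your data, settings and Postgres version.

```sql
-- Refresh planner statistics, including those for the expression index.
ANALYZE event;

-- Show the chosen plan together with actual run time and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   event
WHERE  f_mmdd(event_date) BETWEEN 801 AND 814;  -- Aug 1 through Aug 14, any year
```

If the plan shows a sequential scan on a large table, it is worth verifying that the function is declared IMMUTABLE and that the WHERE clause uses the same expression as the index.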
