I am trying to sum together content_len values that are entered into the database a few seconds apart (colour coded). The table currently breaks into a new row once the character count hits 999, and inserts each overflow row a few seconds apart. Due to errors, the overflow can be timestamped earlier than the previous body.
My current attempt is to round the timestamp to unix seconds, but issues occur when the rounding (unix_ceil) does not produce the same number for entries that should be grouped together (e.g. timestamps at :19 and :21 fall into different 20-second buckets). How can I ensure that entries within ~20 seconds of each other are summed together? Usually there is a separation of at least a few minutes between distinct records that should not be grouped together (e.g. 999+37 at ~09:50am and then 136 at ~09:59am for source = 1).
SELECT source
      ,entry_dt
      ,SUM(content_len) AS full_length
FROM (
    SELECT source
          ,entry_dt
          ,entry_time
          ,(TO_DATE(CONCAT(entry_dt, entry_time), 'yyyymmddHH24MISS')
             - TO_DATE('2020-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')) * 86400 AS unix_from_2020
          ,CEIL(86400 * (TO_DATE(CONCAT(entry_dt, entry_time), 'yyyymmddHH24MISS')
             - TO_DATE('2020-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')) / 20) * 20 AS unix_ceil -- round to 20 seconds
          ,content_len
    FROM schema.text_length_records
) s
GROUP BY source, entry_dt, unix_ceil
Answer
Do not store dates and times separately, and do not store them in non-DATE data types. In Oracle, a DATE is a binary data type consisting of 7 bytes containing the components century, year-of-century, month, day, hour, minute and second; it ALWAYS has all those components and is NEVER stored in any particular format.
You can therefore use a single DATE column to store both date and time, more efficiently and with better error checking than if you store the values separately as strings or numbers.
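For example, a minimal sketch of such a migration (the entry_ts column name is hypothetical, and it assumes entry_dt and entry_time are numbers like 20210910 and 95059, as in the sample data below):
-- Add a hypothetical DATE column and populate it from the numeric pair.
ALTER TABLE text_length_records ADD (entry_ts DATE);

-- Combine the numeric date and the zero-padded time into a real DATE value.
UPDATE text_length_records
SET    entry_ts = TO_DATE(entry_dt || LPAD(entry_time, 6, '0'), 'YYYYMMDDHH24MISS');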
From Oracle 12, you can use MATCH_RECOGNIZE to do row-by-row processing.
If you want each row to be within 20 seconds of the previous row then:
SELECT *
FROM (
    SELECT source,
           TO_DATE(entry_dt || LPAD(entry_time, 6, '0'), 'YYYYMMDDHH24MISS') AS entry_dt,
           content_len
    FROM   text_length_records
)
MATCH_RECOGNIZE(
    PARTITION BY source
    ORDER BY entry_dt
    MEASURES
        FIRST(entry_dt)  AS start_entry_dt,
        LAST(entry_dt)   AS end_entry_dt,
        SUM(content_len) AS content_len
    ONE ROW PER MATCH
    PATTERN (within_20* last_time)
    DEFINE within_20 AS entry_dt + INTERVAL '20' SECOND >= NEXT(entry_dt)
)
If you want each row to be within 20 seconds of the first row of the group then:
SELECT *
FROM (
    SELECT source,
           TO_DATE(entry_dt || LPAD(entry_time, 6, '0'), 'YYYYMMDDHH24MISS') AS entry_dt,
           content_len
    FROM   text_length_records
)
MATCH_RECOGNIZE(
    PARTITION BY source
    ORDER BY entry_dt
    MEASURES
        FIRST(entry_dt)  AS start_entry_dt,
        LAST(entry_dt)   AS end_entry_dt,
        SUM(content_len) AS content_len
    ONE ROW PER MATCH
    PATTERN (within_20*)
    DEFINE within_20 AS entry_dt <= FIRST(entry_dt) + INTERVAL '20' SECOND
)
Which, for the sample data:
CREATE TABLE text_length_records (source, entry_dt, entry_time, content_len) AS
SELECT 1, 20210910, 95059,  37 FROM DUAL UNION ALL
SELECT 1, 20210910, 95102, 999 FROM DUAL UNION ALL
SELECT 1, 20210910, 95959, 139 FROM DUAL UNION ALL
SELECT 2, 20210910, 83320, 999 FROM DUAL UNION ALL
SELECT 2, 20210910, 83322, 999 FROM DUAL UNION ALL
SELECT 2, 20210910, 83324, 456 FROM DUAL;
Both output:
SOURCE | START_ENTRY_DT      | END_ENTRY_DT        | CONTENT_LEN
-------+---------------------+---------------------+------------
     1 | 2021-09-10 09:50:59 | 2021-09-10 09:51:02 |        1036
     1 | 2021-09-10 09:59:59 | 2021-09-10 09:59:59 |         139
     2 | 2021-09-10 08:33:20 | 2021-09-10 08:33:24 |        2454
Note: Although the queries produce the same output for your sample data, they will produce slightly different outputs for any data where the third row of a group is not within 20 seconds of the first row of the group but is within 20 seconds of the second row.
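For example, three hypothetical rows (not from the question) at 09:50:00, 09:50:15 and 09:50:30 for one source would separate the two queries: each consecutive gap is 15 seconds, so the first query groups all three rows, while the third row is 30 seconds after the first, so the second query starts a new group at 09:50:30.
-- Hypothetical rows illustrating the difference between the two patterns:
INSERT INTO text_length_records (source, entry_dt, entry_time, content_len)
SELECT 3, 20210910, 95000, 999 FROM DUAL UNION ALL
SELECT 3, 20210910, 95015, 999 FROM DUAL UNION ALL
SELECT 3, 20210910, 95030,  42 FROM DUAL;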
db<>fiddle here