I have a small program that I use to track my progress in reading books and stuff like goodreads to know how much I read per day.
I created two tables for that, tbl_materials(material_id int, name varchar), tbl_progress(date_of_update timestamp, material_id int foreign key, read_pages int, skipped bit).
Whenever I read some pages I insert into tbl_progress the current page that I’ve finished
I may read in the book multiple times. And if I skipped some pages I insert them into tbl_progress and mark the bit skipped
to true. The problem is I can’t query the tbl_progress to know how much I read per day
What I have tried is to find the last inserted progress for every single material in every single day
so for example:
+-------------+------------+---------+---------------------+
| material_id | read_pages | skipped | last_update |
+-------------+------------+---------+---------------------+
| 4 | 1 | | 2017-09-22 00:56:02 |
| 3 | 1 | | 2017-09-22 00:56:14 |
| 12 | 1 | | 2017-09-24 20:13:01 |
| 4 | 30 | | 2017-09-25 01:56:38 |
| 4 | 34 | | 2017-09-25 02:19:47 |
| 54 | 1 | | 2017-09-29 04:22:11 |
| 59 | 9 | | 2017-10-14 15:25:14 |
| 4 | 68 | T | 2017-10-18 02:33:04 |
| 4 | 72 | | 2017-10-18 03:50:51 |
| 2 | 3 | | 2017-10-18 15:02:46 |
| 2 | 5 | | 2017-10-18 15:10:46 |
| 4 | 82 | | 2017-10-18 16:18:03 |
| 4 | 84 | | 2017-10-20 18:06:40 |
| 4 | 87 | | 2017-10-20 19:11:07 |
| 4 | 103 | T | 2017-10-21 19:50:29 |
| 4 | 104 | | 2017-10-22 19:56:14 |
| 4 | 108 | | 2017-10-22 20:08:08 |
| 2 | 6 | | 2017-10-23 00:35:45 |
| 4 | 111 | | 2017-10-23 02:29:32 |
| 4 | 115 | | 2017-10-23 03:06:15 |
+-------------+------------+---------+---------------------+
I calculate my total read pages per day = last read page in this day – last read page in a date prior to this date and this works but the problem is I can’t avoid skipped pages.
the first row in 2017-09-22 I read 1 page then another 1 page so the total read in this day = 2 (for only material_id = 4)
in 2017-09-25 the last update for material_id 4 is 34 pages which means I read 34-1 = 33 pages (last update in this day 34 – last update prior to this date 1) = 33
till now every thing works well but when it comes to considering skipped pages I could’t do it for example:
in 2017-10-18 the last number of read pages for material_id = 4 was 34 (in 2017-09-25) then I skipped 34 pages and now the current page is 68 then read 4 pages (2017-10-18 03:50:51 ) then another 10 pages (2017-10-18 16:18:03) so the total for material_id = 4 is 14
I created a view to select the most recent last_update for every book in every day
create view v_mostRecentPerDay as select material_id id, (select title from materials where materials.material_id = id) title, completed_pieces, last_update, date(last_update) dl, skipped from progresses where last_update = ( select max(last_update) from progresses s2 where s2.material_id = progresses.material_id and date(s2.last_update) = date(progresses.last_update) and s2.skipped = false );
so if there are many updates for single book in one day, this view retrieves the last one (with the max of last_update) which accompany the biggest number of read pages and so for every single book and another view to get the total read pages every day:
create view v_totalReadInDay as select dl, sum(diff) totalReadsInThisDay from ( select dl, completed_pieces - ifnull((select completed_pieces from progresses where material_id = id and date(progresses.last_update) < dl ORDER BY last_update desc limit 1 ), 0) diff from v_mostRecentPerDay where skipped = false ) omda group by dl;
but the problem is that the last view calculates skipped pages.
expected result:
+------------+------------------+ | day | total_read_pages | +------------+------------------+ | 2017-09-22 | 2 | +------------+------------------+ | 2017-09-24 | 1 | +------------+------------------+ | 2017-09-25 | 33 | +------------+------------------+ | 2017-09-29 | 1 | +------------+------------------+ | 2017-10-14 | 9 | +------------+------------------+ | 2017-10-18 | 19 | +------------+------------------+ | 2017-10-20 | 5 | +------------+------------------+ | 2017-10-21 | 0 | +------------+------------------+ | 2017-10-22 | 21 | +------------+------------------+ | 2017-10-23 | 8 | +------------+------------------+
mysql> SELECT VERSION(); +-----------------------------+ | VERSION() | +-----------------------------+ | 5.7.26-0ubuntu0.16.04.1-log | +-----------------------------+
Advertisement
Answer
You can make a view the uses the same columns of table progresses + another derived column which uses the same idea as @Arth suggested (pages_completed
column)
This column will contain the current completed_pages
– completed_pages with last update prior to the first completed pages which is the difference.
So for example if your progress table like this:
+-------------+------------+---------+---------------------+ | material_id | read_pages | skipped | last_update | +-------------+------------+---------+---------------------+ | 4 | 68 | T | 2017-10-18 02:33:04 | | 4 | 72 | | 2017-10-18 03:50:51 | | 2 | 3 | | 2017-10-18 15:02:46 | | 2 | 5 | | 2017-10-18 15:10:46 | | 4 | 82 | | 2017-10-18 16:18:03 | +-------------+------------+---------+---------------------+
we will add another derived column called diff.
where diff read_pages
in 2017-10-18 02:33:04
– read_pages
directly prior to 2017-10-18 02:33:04
+-------------+------------+---------+---------------------+------------------+ | material_id | read_pages | skipped | last_update | Derived_col_diff | +-------------+------------+---------+---------------------+------------------+ | | 68 | T | 2017-10-18T02:33:04 | 68 - null = 0 | | 4 | | | | | +-------------+------------+---------+---------------------+------------------+ | 4 | 72 | | 2017-10-18T03:50:51 | 72 - 68 = 4 | +-------------+------------+---------+---------------------+------------------+ | 2 | 3 | | 2017-10-18T15:02:46 | 3 - null = 0 | +-------------+------------+---------+---------------------+------------------+ | 2 | 5 | | 2017-10-18T15:10:46 | 5 - 3 = 2 | +-------------+------------+---------+---------------------+------------------+ | 4 | 82 | | 2017-10-18T16:18:03 | 82 - 72 = 10 | +-------------+------------+---------+---------------------+------------------+
note: that 68 - null
is null but I put it 0 for clarification
The derived column here is the difference between this read_pages – read_pages directly before this read_pages.
Here is a view
create view v_progesses_with_read_pages as select s0.*, completed_pieces - ifnull((select completed_pieces from progresses s1 where s1.material_id = s0.material_id and s1.last_update = ( select max(last_update) FROM progresses s2 where s2.material_id = s1.material_id and s2.last_update < s0.last_update )), 0) read_pages from progresses s0;
Then you can select the sum of this derived column per day:
select date (last_update) dl, sum(read_pages) totalReadsInThisDay from v_progesses_with_read_pages where skipped = false group by dl;
Which will result in something like this:
+-------------+-----------------------------+ | material_id | totalReadsInThisDay | +-------------+-----------------------------+ | 2017-10-18 | 16 | +-------------+-----------------------------+ | 2017-10-19 | 20 (just for clarification) | +-------------+-----------------------------+
Note that the last row is from my mind lol