I have a small program that I use to track my progress in reading books and stuff like goodreads to know how much I read per day.
I created two tables for that, tbl_materials(material_id int, name varchar), tbl_progress(date_of_update timestamp, material_id int foreign key, read_pages int, skipped bit).
Whenever I read some pages I insert into tbl_progress the current page that I’ve finished
I may read in the book multiple times. And if I skipped some pages I insert them into tbl_progress and mark the bit skipped
to true. The problem is I can’t query the tbl_progress to know how much I read per day
What I have tried is to find the last inserted progress for every single material in every single day
so for example:
+-------------+------------+---------+---------------------+
| material_id | read_pages | skipped | last_update |
+-------------+------------+---------+---------------------+
| 4 | 1 | | 2017-09-22 00:56:02 |
| 3 | 1 | | 2017-09-22 00:56:14 |
| 12 | 1 | | 2017-09-24 20:13:01 |
| 4 | 30 | | 2017-09-25 01:56:38 |
| 4 | 34 | | 2017-09-25 02:19:47 |
| 54 | 1 | | 2017-09-29 04:22:11 |
| 59 | 9 | | 2017-10-14 15:25:14 |
| 4 | 68 | T | 2017-10-18 02:33:04 |
| 4 | 72 | | 2017-10-18 03:50:51 |
| 2 | 3 | | 2017-10-18 15:02:46 |
| 2 | 5 | | 2017-10-18 15:10:46 |
| 4 | 82 | | 2017-10-18 16:18:03 |
| 4 | 84 | | 2017-10-20 18:06:40 |
| 4 | 87 | | 2017-10-20 19:11:07 |
| 4 | 103 | T | 2017-10-21 19:50:29 |
| 4 | 104 | | 2017-10-22 19:56:14 |
| 4 | 108 | | 2017-10-22 20:08:08 |
| 2 | 6 | | 2017-10-23 00:35:45 |
| 4 | 111 | | 2017-10-23 02:29:32 |
| 4 | 115 | | 2017-10-23 03:06:15 |
+-------------+------------+---------+---------------------+
I calculate my total read pages per day = last read page in this day – last read page in a date prior to this date and this works but the problem is I can’t avoid skipped pages.
the first row in 2017-09-22 I read 1 page then another 1 page so the total read in this day = 2 (for only material_id = 4)
in 2017-09-25 the last update for material_id 4 is 34 pages which means I read 34-1 = 33 pages (last update in this day 34 – last update prior to this date 1) = 33
till now every thing works well but when it comes to considering skipped pages I could’t do it for example:
in 2017-10-18 the last number of read pages for material_id = 4 was 34 (in 2017-09-25) then I skipped 34 pages and now the current page is 68 then read 4 pages (2017-10-18 03:50:51 ) then another 10 pages (2017-10-18 16:18:03) so the total for material_id = 4 is 14
I created a view to select the most recent last_update for every book in every day
create view v_mostRecentPerDay as
select material_id id,
(select title from materials where materials.material_id = id) title,
completed_pieces,
last_update,
date(last_update) dl,
skipped
from progresses
where last_update = (
select max(last_update)
from progresses s2
where s2.material_id = progresses.material_id
and date(s2.last_update) = date(progresses.last_update)
and s2.skipped = false
);
so if there are many updates for single book in one day, this view retrieves the last one (with the max of last_update) which accompany the biggest number of read pages and so for every single book and another view to get the total read pages every day:
create view v_totalReadInDay as
select dl, sum(diff) totalReadsInThisDay
from (
select dl,
completed_pieces - ifnull((select completed_pieces
from progresses
where material_id = id
and date(progresses.last_update) < dl
ORDER BY last_update desc
limit 1
), 0) diff
from v_mostRecentPerDay
where skipped = false
) omda
group by dl;
but the problem is that the last view calculates skipped pages.
expected result:
+------------+------------------+
| day | total_read_pages |
+------------+------------------+
| 2017-09-22 | 2 |
+------------+------------------+
| 2017-09-24 | 1 |
+------------+------------------+
| 2017-09-25 | 33 |
+------------+------------------+
| 2017-09-29 | 1 |
+------------+------------------+
| 2017-10-14 | 9 |
+------------+------------------+
| 2017-10-18 | 19 |
+------------+------------------+
| 2017-10-20 | 5 |
+------------+------------------+
| 2017-10-21 | 0 |
+------------+------------------+
| 2017-10-22 | 21 |
+------------+------------------+
| 2017-10-23 | 8 |
+------------+------------------+
mysql> SELECT VERSION();
+-----------------------------+
| VERSION() |
+-----------------------------+
| 5.7.26-0ubuntu0.16.04.1-log |
+-----------------------------+
Advertisement
Answer
You can make a view the uses the same columns of table progresses + another derived column which uses the same idea as @Arth suggested (pages_completed
column)
This column will contain the current completed_pages
– completed_pages with last update prior to the first completed pages which is the difference.
So for example if your progress table like this:
+-------------+------------+---------+---------------------+
| material_id | read_pages | skipped | last_update |
+-------------+------------+---------+---------------------+
| 4 | 68 | T | 2017-10-18 02:33:04 |
| 4 | 72 | | 2017-10-18 03:50:51 |
| 2 | 3 | | 2017-10-18 15:02:46 |
| 2 | 5 | | 2017-10-18 15:10:46 |
| 4 | 82 | | 2017-10-18 16:18:03 |
+-------------+------------+---------+---------------------+
we will add another derived column called diff.
where diff read_pages
in 2017-10-18 02:33:04
– read_pages
directly prior to 2017-10-18 02:33:04
+-------------+------------+---------+---------------------+------------------+
| material_id | read_pages | skipped | last_update | Derived_col_diff |
+-------------+------------+---------+---------------------+------------------+
| | 68 | T | 2017-10-18T02:33:04 | 68 - null = 0 |
| 4 | | | | |
+-------------+------------+---------+---------------------+------------------+
| 4 | 72 | | 2017-10-18T03:50:51 | 72 - 68 = 4 |
+-------------+------------+---------+---------------------+------------------+
| 2 | 3 | | 2017-10-18T15:02:46 | 3 - null = 0 |
+-------------+------------+---------+---------------------+------------------+
| 2 | 5 | | 2017-10-18T15:10:46 | 5 - 3 = 2 |
+-------------+------------+---------+---------------------+------------------+
| 4 | 82 | | 2017-10-18T16:18:03 | 82 - 72 = 10 |
+-------------+------------+---------+---------------------+------------------+
note: that 68 - null
is null but I put it 0 for clarification
The derived column here is the difference between this read_pages – read_pages directly before this read_pages.
Here is a view
create view v_progesses_with_read_pages as
select s0.*,
completed_pieces - ifnull((select completed_pieces
from progresses s1
where s1.material_id = s0.material_id
and s1.last_update = (
select max(last_update)
FROM progresses s2
where s2.material_id = s1.material_id and s2.last_update < s0.last_update
)), 0) read_pages
from progresses s0;
Then you can select the sum of this derived column per day:
select date (last_update) dl, sum(read_pages) totalReadsInThisDay from v_progesses_with_read_pages where skipped = false group by dl;
Which will result in something like this:
+-------------+-----------------------------+
| material_id | totalReadsInThisDay |
+-------------+-----------------------------+
| 2017-10-18 | 16 |
+-------------+-----------------------------+
| 2017-10-19 | 20 (just for clarification) |
+-------------+-----------------------------+
Note that the last row is from my mind lol