Skip to content
Advertisement

Calculate read pages through select query in reading progress table

I have a small program that I use to track my progress in reading books and stuff like goodreads to know how much I read per day.
I created two tables for that, tbl_materials(material_id int, name varchar), tbl_progress(date_of_update timestamp, material_id int foreign key, read_pages int, skipped bit).
Whenever I read some pages I insert into tbl_progress the current page that I’ve finished
I may read in the book multiple times. And if I skipped some pages I insert them into tbl_progress and mark the bit skipped to true. The problem is I can’t query the tbl_progress to know how much I read per day

What I have tried is to find the last inserted progress for every single material in every single day so for example: +-------------+------------+---------+---------------------+ | material_id | read_pages | skipped | last_update | +-------------+------------+---------+---------------------+ | 4 | 1 | | 2017-09-22 00:56:02 | | 3 | 1 | | 2017-09-22 00:56:14 | | 12 | 1 | | 2017-09-24 20:13:01 | | 4 | 30 | | 2017-09-25 01:56:38 | | 4 | 34 | | 2017-09-25 02:19:47 | | 54 | 1 | | 2017-09-29 04:22:11 | | 59 | 9 | | 2017-10-14 15:25:14 | | 4 | 68 | T | 2017-10-18 02:33:04 | | 4 | 72 | | 2017-10-18 03:50:51 | | 2 | 3 | | 2017-10-18 15:02:46 | | 2 | 5 | | 2017-10-18 15:10:46 | | 4 | 82 | | 2017-10-18 16:18:03 | | 4 | 84 | | 2017-10-20 18:06:40 | | 4 | 87 | | 2017-10-20 19:11:07 | | 4 | 103 | T | 2017-10-21 19:50:29 | | 4 | 104 | | 2017-10-22 19:56:14 | | 4 | 108 | | 2017-10-22 20:08:08 | | 2 | 6 | | 2017-10-23 00:35:45 | | 4 | 111 | | 2017-10-23 02:29:32 | | 4 | 115 | | 2017-10-23 03:06:15 | +-------------+------------+---------+---------------------+ I calculate my total read pages per day = last read page in this day – last read page in a date prior to this date and this works but the problem is I can’t avoid skipped pages.
the first row in 2017-09-22 I read 1 page then another 1 page so the total read in this day = 2 (for only material_id = 4)
in 2017-09-25 the last update for material_id 4 is 34 pages which means I read 34-1 = 33 pages (last update in this day 34 – last update prior to this date 1) = 33
till now every thing works well but when it comes to considering skipped pages I could’t do it for example:
in 2017-10-18 the last number of read pages for material_id = 4 was 34 (in 2017-09-25) then I skipped 34 pages and now the current page is 68 then read 4 pages (2017-10-18 03:50:51 ) then another 10 pages (2017-10-18 16:18:03) so the total for material_id = 4 is 14

I created a view to select the most recent last_update for every book in every day

create view v_mostRecentPerDay as
select material_id                                                    id,
       (select title from materials where materials.material_id = id) title,
       completed_pieces,
       last_update,
       date(last_update)                                              dl,
       skipped
from progresses
where last_update = (
    select max(last_update)
    from progresses s2
    where s2.material_id = progresses.material_id
      and date(s2.last_update) = date(progresses.last_update)
      and s2.skipped = false
);

so if there are many updates for single book in one day, this view retrieves the last one (with the max of last_update) which accompany the biggest number of read pages and so for every single book and another view to get the total read pages every day:

create view v_totalReadInDay as
select dl, sum(diff) totalReadsInThisDay
from (
         select dl,
                completed_pieces - ifnull((select completed_pieces
                                           from progresses
                                           where material_id = id
                                             and date(progresses.last_update) < dl
                                           ORDER BY last_update desc
                                           limit 1
                                          ), 0) diff
         from v_mostRecentPerDay
         where skipped = false
     ) omda
group by dl;

but the problem is that the last view calculates skipped pages.
expected result:

+------------+------------------+
| day        | total_read_pages |
+------------+------------------+
| 2017-09-22 | 2                |
+------------+------------------+
| 2017-09-24 | 1                |
+------------+------------------+
| 2017-09-25 | 33               |
+------------+------------------+
| 2017-09-29 | 1                |
+------------+------------------+
| 2017-10-14 | 9                |
+------------+------------------+
| 2017-10-18 | 19               |
+------------+------------------+
| 2017-10-20 | 5                |
+------------+------------------+
| 2017-10-21 | 0                |
+------------+------------------+
| 2017-10-22 | 21               |
+------------+------------------+
| 2017-10-23 | 8                |
+------------+------------------+
mysql> SELECT VERSION();
+-----------------------------+
| VERSION()                   |
+-----------------------------+
| 5.7.26-0ubuntu0.16.04.1-log |
+-----------------------------+

Advertisement

Answer

You can make a view the uses the same columns of table progresses + another derived column which uses the same idea as @Arth suggested (pages_completed column)
This column will contain the current completed_pages – completed_pages with last update prior to the first completed pages which is the difference.
So for example if your progress table like this:

+-------------+------------+---------+---------------------+
| material_id | read_pages | skipped | last_update         |
+-------------+------------+---------+---------------------+
|           4 |         68 | T       | 2017-10-18 02:33:04 |
|           4 |         72 |         | 2017-10-18 03:50:51 |
|           2 |          3 |         | 2017-10-18 15:02:46 |
|           2 |          5 |         | 2017-10-18 15:10:46 |
|           4 |         82 |         | 2017-10-18 16:18:03 |
+-------------+------------+---------+---------------------+

we will add another derived column called diff.
where diff read_pages in 2017-10-18 02:33:04read_pages directly prior to 2017-10-18 02:33:04

+-------------+------------+---------+---------------------+------------------+
| material_id | read_pages | skipped | last_update         | Derived_col_diff |
+-------------+------------+---------+---------------------+------------------+
|             | 68         | T       | 2017-10-18T02:33:04 | 68 - null = 0    |
| 4           |            |         |                     |                  |
+-------------+------------+---------+---------------------+------------------+
| 4           | 72         |         | 2017-10-18T03:50:51 | 72 - 68 = 4      |
+-------------+------------+---------+---------------------+------------------+
| 2           | 3          |         | 2017-10-18T15:02:46 | 3 - null = 0     |
+-------------+------------+---------+---------------------+------------------+
| 2           | 5          |         | 2017-10-18T15:10:46 | 5 - 3 = 2        |
+-------------+------------+---------+---------------------+------------------+
| 4           | 82         |         | 2017-10-18T16:18:03 | 82 - 72 = 10     |
+-------------+------------+---------+---------------------+------------------+

note: that 68 - null is null but I put it 0 for clarification
The derived column here is the difference between this read_pages – read_pages directly before this read_pages.
Here is a view

create view v_progesses_with_read_pages as
select s0.*,
       completed_pieces - ifnull((select completed_pieces
                                  from progresses s1
                                  where s1.material_id = s0.material_id
                                    and s1.last_update = (
                                      select max(last_update)
                                      FROM progresses s2
                                      where s2.material_id = s1.material_id and s2.last_update < s0.last_update
                                  )), 0) read_pages
from progresses s0;

Then you can select the sum of this derived column per day:

select date (last_update) dl, sum(read_pages) totalReadsInThisDay from v_progesses_with_read_pages where skipped = false group by dl;

Which will result in something like this:

+-------------+-----------------------------+
| material_id | totalReadsInThisDay         |
+-------------+-----------------------------+
| 2017-10-18  | 16                          |
+-------------+-----------------------------+
| 2017-10-19  | 20 (just for clarification) |
+-------------+-----------------------------+

Note that the last row is from my mind lol

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement