I am using HSQLDB for the database and have the following condition in which I have to avoid duplicate entries while joining 2 tables.
Table1
HMEXPENSE +--------+---------------+-------------+ | USERID | EXPENSEAMOUNT | EXPENSEDATE | +--------+---------------+-------------+ | a | 100 | 2018-10-10 | | a | 200 | 2018-10-11 | | a | 100 | 2018-10-11 | | a | 200 | 2018-10-13 | +--------+---------------+-------------+
Table2
HMINCOME +--------+---------------+-------------+ | USERID | EXPENSEAMOUNT | EXPENSEDATE | +--------+---------------+-------------+ | a | 200 | 2018-10-10 | | a | 100 | 2018-10-11 | | a | 200 | 2018-10-11 | | a | 100 | 2018-10-12 | +--------+---------------+-------------+
The current query which gives me the duplicate entries is as follows
SELECT e.expenseDate ,i.incomeDate , SUM(e.expenseAmount), SUM(i.incomeAmount) FROM HMINCOME i FULL JOIN HMEXPENSE e on i.incomeDate = e.expenseDate GROUP BY i.incomeDate,e.expenseDate, i.incomeAmount, e.expenseAmount
OUTPUT
+-------------+------------+-------+-------+ | EXPENSEDATE | INCOMEDATE | C3 | C4 | +-------------+------------+-------+-------+ | 2018-10-10 | 2018-10-10 | 100.0 | 200.0 | | 2018-10-11 | 2018-10-11 | 200.0 | 100.0 | | 2018-10-11 | 2018-10-11 | 100.0 | 100.0 | | 2018-10-11 | 2018-10-11 | 200.0 | 200.0 | | 2018-10-11 | 2018-10-11 | 100.0 | 200.0 | | <null> | 2018-10-12 | <null>| 100.0 | | 2018-10-13 | <null> | 200.0 | <null>| +-------------+------------+-------+-------+
And if I use this above-mentioned query to get the actual output which is required in my actual scenario is as follows
SELECT e.expenseDate, i.incomeDate , SUM(e.expenseAmount),SUM(i.incomeAmount) FROM HMINCOME i FULL JOIN HMEXPENSE e on i.incomeDate = e.expenseDate GROUP BY i.incomeDate,e.expenseDate
OUTPUT
+-------------+------------+-------+-------+ | EXPENSEDATE | INCOMEDATE | C3 | C4 | +-------------+------------+-------+-------+ | 2018-10-10 | 2018-10-10 | 100.0 | 200.0 | | 2018-10-11 | 2018-10-11 | 600.0 | 600.0 | | <null> | 2018-10-12 | <null>| 100.0 | | 2018-10-13 | <null> | 200.0 | <null>| +-------------+------------+-------+-------+
The requirement is to get the sum of amount for a single day and null entry for the date which is not present in another table.
Expected output is as follows
+-------------+------------+-------+-------+ | EXPENSEDATE | INCOMEDATE | C3 | C4 | +-------------+------------+-------+-------+ | 2018-10-10 | 2018-10-10 | 100.0 | 200.0 | | 2018-10-11 | 2018-10-11 | 300.0 | 300.0 | | <null> | 2018-10-12 | <null>| 100.0 | | 2018-10-13 | <null> | 200.0 | <null>| +-------------+------------+-------+-------+
C3 and C4 column values are not calculated correctly due to the duplicate entries.
Help…
Advertisement
Answer
One method to solve this uses union all
and group by
:
select dte, sum(incomeamount) as incomeamount, sum(expenseamount) as expenseamount from ((select incomedate as dte, incomeamount, 0 as expenseamount from hmincome ) union all (select expensedate, 0, expenseAmount from hmexpense ) ) ie group by dte order by dte;