SQL duplicates despite grouping on all key variabl…

I am new to SQL and use it for work, so I am going to censor the real names of columns going forward in the query below.

I am writing a query where the necessary data is spread across 3 tables. I have a network with users who send and receive packages from different nodes, and I want to see how many packages each user sends and receives from each node internally in the network in each period (these are not just grouping variables, but the target table where I am attempting to store it has the combination of these columns except the number of packages as the primary key). The correct result would give me a table with the identification columns, the period, node, and how many packages are received and sent.

The first table scm.flow has details about the user and how many packages they send and receive in each period. The columns I need to group by from here are 3 columns that identify the unique user and to which unit they belong, and this table also contains a column “number_packs” which has the number of packages received (+xxxx) and sent (-xxxx). Some of these packages are sent or received from sources inside the network and some outside. Those who are sent or received internally are logged in the table scm.internal with their short form name, and in the third table scm.node_names they can be linked with their full name which I want.

While I can run the query and get results which at first glance seem right, I can’t store it because I get error ORA-01452. I ran my code as a view where I used having and count(*) to check for duplicate values across grouping variables, and there are duplicates here despite me grouping these variables, and using aggregation functions on the remaining ones.

The query I wrote is this:

select id1, id2, id3, node, period, received, sent
from (
 select id1, id2, id3, 

 case when (node is not null) as node else 'Failed to record' end as node,
 period,
 sum(case when (trans_type = 'T1' and number_packs > 0) then number_packs else 0 end) as received,
 sum(case when (trans_type = 'T2' and number_packs < 0) then number_packs else 0 end) as sent

 from (
  tr.id1, tr.id2, tr.id3, tr.int_flag, tr.description, tr.period, tr.trans_t_no,
  it.trans_type,
  case when (it.trans_type = 'T1' and number_packs > 0) then kt.rec_from_node else tk.sent_from_node as node,
  number_packs,
  
  from scm.flow tr

  left join (
   select t.id1, t.trans_type, t.trans_t_no, t.rec_from_code, t.sent_from_code
   from (select id1, trans_type, trans_t_no, rec_from_code, sent_from_code
         from scm.internal) t) it
    on tr.id1 = it.id1 and tr.trans_t_no = it.trans_t_no and tr.int_flag = '1' and tr.description like 'GI%'
     
   left join (select code, node from scm.node_names) kt
      on kt.code = it.received_from_code
   left join (select code, node from scm.node_names) tk
      on tk.code = it.sent_from_code

 where tr.int_flag = '1')

group by id1, id2, id3, node, period) k

where received > 0 or sent < 0 ;

When using having and count(*) on the view from this query I noticed 2 things: 1. Duplicates only occur when node = ‘Failed to record’, and 2. both sent and received are 0, which shouldn’t happen since there is always a positive or negative number for the number_packs column, it is never null or 0.

I still do not see how my code could lead to these results. I’d like to know why the duplicates with zeros occur, and how I can fix my query to avoid the problem?

(I know adding some sample data is best practice, but I would like to avoid it since it is potentially sensitive data even with the column names changed.

Answer

You have atleast one syntax error…

case when (node is not null) as node else 'Failed to record' end as node,

The as node in the middle of the CASE expression shouldn’t be there…

(node is not null) as node

Should be THEN node

case when (node is not null) then node else 'Failed to record' end as node,

Or, just use COALESCE()…

COALESCE(node, 'Failed to record') AS node,

Advertisement

Answer