I have an Oracle database table with a field called Classification which is VARCHAR. The VARCHAR is a CSV(using semi colons). Example: I want to pull all the rows with ONLY a different value in the CSV from the others. It is ok if a row has a previously found value as long as it has a new different value.

SELECT rows with a new DISTINCT from a VARCHAR with CSV in it

I have an Oracle database table with a field called Classification which is VARCHAR. The VARCHAR is a CSV(using semi colons). Example:

;CHR;
;OTR;CHR;ROW;
;CHA;ROW;
;OTR;ROW;

​x
 
;CHR;;OTR;CHR;ROW;;CHA;ROW;;OTR;ROW;​

I want to pull all the rows with ONLY a different value in the CSV from the others. It is ok if a row has a previously found value as long as it has a new different value.

For instance from the above dataset it would be:

;CHR;
;OTR;CHR;ROW;
;CHA;ROW;

 
;CHR;;OTR;CHR;ROW;;CHA;ROW;​

If I do just:

Select DISTINCT Classification from Table1

 
Select DISTINCT Classification from Table1​

I get rows that overlap distinct values due to the overall VARCHAR being Distinct.

I can get all the distinct values using:

select LISTAGG(val,',') WITHIN GROUP ( ORDER BY val ) as final
FROM
(
select distinct  trim(regexp_substr("Classification",'[^;]+', 1, level) ) as val 
  from Table1
   connect by regexp_substr("Classification", '[^,]+', 1, level) is not null
  ORDER BY val
 )

 
select LISTAGG(val,',') WITHIN GROUP ( ORDER BY val ) as finalFROM(select distinct  trim(regexp_substr("Classification",'[^;]+', 1, level) ) as val   from Table1   connect by regexp_substr("Classification", '[^,]+', 1, level) is not null  ORDER BY val )​

which give me

FINAL
CHA,CHR,OTR,ROW

 
FINALCHA,CHR,OTR,ROW​

but am unable to make the link to pull out one record per unique value

Is this possible with SQL?

EDIT: This is a database created by a large corporation and mine purchased the product. Now I am tasked with data mining the backend database for BI and have absolutely no control of the database structure.

No offence but I see many answers in the questions I have researched stating ‘Do better database design/normalization’ and while I agree MOST I have read have no control over the database and are asking for SO assistance with a problem because of this, not ridicule on bad database design.

I apologize if I offend anyone

There is no parent/child relationship. I cannot see the object layer but I assume these values are changed in the object layer before propagating to the client as there is no link to them in the actual database

Clarification:

I see 2 ways to solve this:

1: One select statement that pulls out 1 row based on a new unique value within the VARCHAR CSV(Classification)

2: Use my select statement to loop through and pull one row containing that value in the VARCHAR CSV(Classification)

Thanks all for the input. I upvoted the ones that worked for me. In the end I will be using the one I developed just because I can easily manipulate the output(to a csv) for what the analyst wishes.

Answer

I figured it out this way and it runs fast(even once all my joins to other tables are added). Will test other answers as I can and decide best one(others look better than mine if they work as I would rather not use dbms_output)

DECLARE
 v_search_string varchar2(4000);
 v_classification varchar2(4000);
 
 BEGIN
    select LISTAGG(val,',') WITHIN GROUP ( ORDER BY val ) as final
    INTO v_search_string
FROM
(
select distinct  trim(regexp_substr("Classification",'[^;]+', 1, level) ) as val
  from mytable
   connect by regexp_substr("Classification", '[^,]+', 1, level) is not null
  ORDER BY val
 );
 
 FOR i IN
     (SELECT trim(regexp_substr(v_search_string, '[^,]+', 1, LEVEL)) l
     FROM dual
       CONNECT BY LEVEL <= regexp_count(v_search_string, ',')+1
     )
     LOOP
        SELECT   "Classification"
        INTO v_classification
       
        FROM mytable
 
        WHERE "Classification"  LIKE '%' || i.l || '%'
       
        FETCH NEXT 1 ROWS ONLY;
      dbms_output.put_line(v_classification);
    END LOOP;
 
 
 END;

 
DECLARE v_search_string varchar2(4000); v_classification varchar2(4000);  BEGIN    select LISTAGG(val,',') WITHIN GROUP ( ORDER BY val ) as final    INTO v_search_stringFROM(select distinct  trim(regexp_substr("Classification",'[^;]+', 1, level) ) as val  from mytable   connect by regexp_substr("Classification", '[^,]+', 1, level) is not null  ORDER BY val );  FOR i IN     (SELECT trim(regexp_substr(v_search_string, '[^,]+', 1, LEVEL)) l     FROM dual       CONNECT BY LEVEL <= regexp_count(v_search_string, ',')+1     )     LOOP        SELECT   "Classification"        INTO v_classification               FROM mytable         WHERE "Classification"  LIKE '%' || i.l || '%'               FETCH NEXT 1 ROWS ONLY;      dbms_output.put_line(v_classification);    END LOOP;   END;​

Advertisement

Answer