(I’m using mysql v5.7)
I have a csv file that has something like
mytable1
name sys id name1 aaa,bbb,ccc 8,3,5 name2 ccc,bbb 22,33 name3 ccc,ccc 11,56 name4 aaa,ccc,ccc,ccc 87,88,40,99
note that ‘aaa,bbb,ccc’ is one string and ‘8,3,5’ is one string.. I know it’s not a good practice putting data like this but this is what I have given from other department… so my first task was to extract number matched with ccc. like this:
mytable2
name id name1 5 name2 22 name3 11 name4 88
for the name3, there were two ‘ccc’ but it is ok to pick the first one.
I could get the right result by doing
select sub.name, case when x = 0 then SUBSTRING_INDEX(SUBSTRING_INDEX(sub.id, ",", 1),",",-1) when x= 1 then SUBSTRING_INDEX(SUBSTRING_INDEX(sub.id, ",", 2),",",-1) when x=2 then SUBSTRING_INDEX(SUBSTRING_INDEX(sub.id, ",", 3),",",-1) when x=3 then SUBSTRING_INDEX(SUBSTRING_INDEX(sub.id, ",", 4),",",-1) when x=4 then SUBSTRING_INDEX(SUBSTRING_INDEX(sub.id, ",", 5),",",-1) else SUBSTRING_INDEX(sub.id, ",", -1) end as y from ( SELECT DISTINCT mytable1.*, case when SUBSTRING_INDEX(SUBSTRING_INDEX(sys, ",", 1),",",-1) = 'ccc' then 0 when SUBSTRING_INDEX(SUBSTRING_INDEX(sys, ",", 2),",",-1) = 'ccc' then 1 when SUBSTRING_INDEX(SUBSTRING_INDEX(sys, ",", 3),",",-1) = 'ccc' then 2 when SUBSTRING_INDEX(SUBSTRING_INDEX(sys, ",", 4),",",-1) = 'ccc' then 3 when SUBSTRING_INDEX(SUBSTRING_INDEX(sys, ",", 5),",",-1) = 'ccc' then 4 else -1 end as x FROM mytable1) sub
so basically, I found the index of the ccc then use that index to find the matched id. (btw max number of items in sys column is 6 like ‘aaa, bbb, ccc, ccc, ccc, ccc’)
now I am wondering if there’s any way to find the ids that’s from ccc but not included in the mytable2. like this.
56 40 99
any help would be appreciated!!
Advertisement
Answer
The id
matched the first specified sys
may be found by simple
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(id, ',', FIND_IN_SET(@parameter, sys)), ',', -1) id FROM mytable1 HAVING id > '';
For to extract all another values use:
SELECT DISTINCT name, SUBSTRING_INDEX(SUBSTRING_INDEX(id, ',', num), ',', -1) id FROM mytable1 JOIN (SELECT 1 num UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) numbers ON SUBSTRING_INDEX(SUBSTRING_INDEX(sys, ',', num), ',', -1) = @parameter HAVING id NOT IN (SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(id, ',', FIND_IN_SET(@parameter, sys)), ',', -1) FROM mytable1);
You may remove name
from the output if you do not need in it.
Increase the numbers count in numbers
subquery accordingly if the sys
/id
value may contain more than 4 separate values in CSV.