I have a dataset (8.5 mill rows), where all values in all columns must be enclosed in quotation symbols (” “). I have discovered that there is a problem – some few records holds values in some columns with the last quotation symbol missing. Now I need to try to get an overview on the issue – which columns have examples of this error (it is due to truncation upstream in the solution).
From the example dummy data inserted below:
How do I write a query, which outputs the columns “Last name” and “Age” due to the missing end quotation in row 2 and 3 in these columns? To be clear – how do I identify columns with sporadic truncated values?
Example data with missing quotation symbols
Thanks,
knn
Advertisement
Answer
You need to check the last character in the string. It is done using the substring function, passing an argument of -1 means the last character. And you check to see if it is different than the double quote symbol.
SELECT * FROM YourTable WHERE substr("Last Name", -1) <> '"' OR substr(Age,-1) <> '"'
You can play around with it here http://sqlfiddle.com/#!4/10a77e/1