I am currently going through a process of validating a column that holds reference numbers for each customer but they have to be a specific format to be counted as a genuine number.
I initially received around 80 csv files and uploaded this into a single master table after going through a process of making sure all tables have the same number of columns etc. (following is an example as I have over 90 columns).
The following is my create table query and I decided to keep everything as a varchar column bar the first ID column just to ensure there’s no issues with data being missed.
CREATE TABLE IF NOT EXISTS `master_table` ( `id` int(11) NOT NULL, `Name` varchar(255) , `Business_Name` varchar(255) , `Contact_Person` varchar(255) , `Tel_Number` varchar(255) , `Tel_Number_2` varchar(255) , `Email` varchar(255) , `House_Name` varchar(255) , `Door_Number` varchar(255) , `Street` varchar(255) , `Town` varchar(255) , `Locality` varchar(255) , `County` varchar(255) , `Post_Code` varchar(255) , `Supplier` varchar(255) , `Annual_Usage` varchar(255) , `Start_Date` varchar(255) , `End_Date` varchar(255) , `PA_Nmbr` varchar(255) ;
I then uploaded all my data and this gave me around 900k rows of data.
The specific column that I am interested in is the PA_Nmbr column. For this to be a genuine reference number it needs to be 13 digit long so I then wrote the following query to see what my record count looks like:
SELECT SUM(CASE WHEN LEN <13 THEN CNT ELSE 0 END) AS UNDER_13, -- 201,112 SUM(CASE WHEN LEN =13 THEN CNT ELSE 0 END) AS PA_13, -- 701,019 SUM(CASE WHEN LEN >13 THEN CNT ELSE 0 END) AS OVER_13 -- 942 FROM ( SELECT LENGTH(PA_NMBR) AS LEN, COUNT(*) AS CNT FROM master_table GROUP BY LENGTH(PA_NMBR) ) A ;
So this told me that around 70% of my data has the right length but the next check is to make sure when I have added the data into the database the numbers have not been rounded up. So for example if there was a genuine number 1234567899999 and this has been rounded up to 1234567890000.
SO where I am stuck is that I would like to write a query where I can do a count on how many 0s they are in the column from the right side.
So if I want to check quickly and do a count of how many records I have that start off with numbers i.e. 1234 and then all 0s I could quickly do that. That would tell me that there is a problem with the data itself.
I would really appreciate it if somebody could advice on how I could write this query.
Thanks in advance.
Advertisement
Answer
have a look at the SELECT LEN(REPLACE(‘YYNYNYYNNNYYNY’, ‘N’, ”)) method. A similar question has been asked and answered here: How to count instances of character in SQL Column