I’m trying to import data from semicolon separated csv file into a SQL Server database. Here is the table structure
CREATE TABLE [dbo].[waste_facility] ( [Id] INT IDENTITY (1, 1) NOT NULL, [postcode] VARCHAR (50) NULL, [name] VARCHAR (50) NULL, [type] VARCHAR (255) NULL, [street] VARCHAR (255) NULL, [suburb] VARCHAR (255) NULL, [municipality] VARCHAR (255) NULL, [telephone] VARCHAR (255) NULL, [website] VARCHAR (255) NULL, [longtitude] DECIMAL (18, 8) NULL, [latitude] DECIMAL (18, 8) NULL, PRIMARY KEY CLUSTERED ([Id] ASC) );
The csv file is shown below:
Location Coordinate;Feature Extent;Projection;Postcode;Name Of Facility;Type Of Facility;Street;Suburb;Municipality;Telephone Number;Website;Easting Coordinate;Northing Coordinate;Longitude Coordinate;Latitude Coordinate;Google Maps Direction -37.9421182892,145.3193857967;"{""coordinates"": [145.3193857967, -37.9421182892], ""type"": ""Point""}";MGA zone 55;3156;Cleanaway Lysterfield Resource Recovery Centre;Recovery Centre;840 Wellington Road;LYSTERFIELD;Yarra Ranges;9753 5411;https://www.cleanaway.com.au/location/lysterfield/;352325;5799275;145.31938579674124;-37.94211828921733;https://www.google.com.au/maps/dir//-37.94211828921733,145.31938579674124/@your+location,17z/data=!4m2!4m1!3e0 -38.0529529215,145.2433557709;"{""coordinates"": [145.2433557709, -38.0529529215], ""type"": ""Point""}";MGA zone 55;3175;Smart Recycling (South Eastern Depot);Recycling Centre;185 Dandenong-Hastings Rd;LYNDHURST;Greater Dandenong;8787 3300;https://smartrecycling.com.au/;345876;5786853;145.24335577090602;-38.05295292152536;https://www.google.com.au/maps/dir//-38.05295292152536,145.24335577090602/@your+location,17z/data=!4m2!4m1!3e0 -38.0533129717,145.267610135;"{""coordinates"": [145.267610135, -38.0533129717], ""type"": ""Point""}";MGA zone 55;3976;Hampton Park Transfer Station (Outlook Environmental);Transfer Station;274 Hallam Road;HAMPTON PARK;Casey;9554 4502;https://www.suez.com.au/en-au/who-we-are/suez-in-australia-and-new-zealand/our-locations/waste-management-hampton-park-transfer-station;348005;5786853;145.2676101350274;-38.053312971691255;https://www.google.com.au/maps/dir//-38.053312971691255,145.2676101350274/@your+location,17z/data=!4m2!4m1!3e0 -38.1243050577,145.2183465487;"{""coordinates"": [145.2183465487, -38.1243050577], ""type"": ""Point""}";MGA zone 55;3977;Frankston Regional Recycling and Recovery Centre;Recycling Centre;20 Harold Road;SKYE;Frankston;1300 322 322;https://www.frankston.vic.gov.au/Environment-and-Waste/Waste-and-Recycling/Frankston-Regional-Recycling-and-Recovery-Centre-FRRRC/Accepted-Items-at-FRRRC;343833;5778893;145.21834654873447;-38.12430505770815;https://www.google.com.au/maps/dir//-38.12430505770815,145.21834654873447/@your+location,17z/data=!4m2!4m1!3e0 -38.0973208774,145.4920399066;"{""coordinates"": [145.4920399066, -38.0973208774], ""type"": ""Point""}";MGA zone 55;3810;Pakenham Waste Transfer Station (Future Recycling);Transfer Station;30-32 Exchange Drive;PAKENHAM;Cardinia;13Recycling;https://www.futurerecycling.com.au/;367776;5782313;145.4920399066473;-38.09732087738631;https://www.google.com.au/maps/dir//-38.09732087738631,145.4920399066473/@your+location,17z/data=!4m2!4m1!3e0
There are some columns that I don’t need, so I create a format file to import the data. The format file is shown as below
<?xml version="1.0"?> <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <RECORD> <FIELD ID="1" xsi:type="CharFixed" LENGTH="50"/> <FIELD ID="12" xsi:type="CharFixed" LENGTH="50"/> <FIELD ID="13" xsi:type="CharFixed" LENGTH="50"/> <FIELD ID="2" xsi:type="CharFixed" LENGTH="50" COLLATION="SQL_Latin1_General_CP1_CI_AS"/> <FIELD ID="3" xsi:type="CharFixed" LENGTH="50" COLLATION="SQL_Latin1_General_CP1_CI_AS"/> <FIELD ID="4" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/> <FIELD ID="5" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/> <FIELD ID="6" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/> <FIELD ID="7" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/> <FIELD ID="8" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/> <FIELD ID="9" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/> <FIELD ID="14" xsi:type="CharFixed" LENGTH="50"/> <FIELD ID="15" xsi:type="CharFixed" LENGTH="50"/> <FIELD ID="10" xsi:type="CharFixed" LENGTH="41"/> <FIELD ID="11" xsi:type="CharTerm" TERMINATOR="rn" MAX_LENGTH="41"/> <FIELD ID="16" xsi:type="CharFixed" LENGTH="50"/> </RECORD> <ROW> <COLUMN SOURCE="2" NAME="postcode" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="3" NAME="name" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="4" NAME="type" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="5" NAME="street" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="6" NAME="suburb" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="7" NAME="municipality" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="8" NAME="telephone" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="9" NAME="website" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="10" NAME="longtitude" xsi:type="SQLDECIMAL" PRECISION="18" SCALE="8"/> <COLUMN SOURCE="11" NAME="latitude" xsi:type="SQLDECIMAL" PRECISION="18" SCALE="8"/> </ROW> </BCPFORMAT>
Then I tried both bulk insert and bcp in
– neither of them works.
Here is the bulk insert command
USE [waste-facility-locations]; BULK INSERT [dbo].[waste_facility] FROM 'E:onboardingIterationwaste-facility-locations.csv' WITH (FORMATFILE = 'E:onboardingIterationwaste_facility_formatter.xml', FIRSTROW = 2, LASTROW = 6, FIELDTERMINATOR = ';', ROWTERMINATOR = 'n', ERRORFILE = 'E:onboardingIterationmyRubbishData.log');
But unlucky some error file were generated. Here is what myRubbishData.log
error says:
Row 2 File Offset 1993 ErrorFile Offset 0 – HRESULT 0x80004005
And the actual row stored in myRubbishData.txt
:
;Pakenham Waste Transfer Station (Future Recycling);Transfer Station;30-32 Exchange Drive;PAKENHAM;Cardinia;13Recycling;https://www.futurerecycling.com.au/;367776;5782313;145.4920399066473;-38.09732087738631;https://www.google.com.au/maps/dir//-38.09732087738631,145.4920399066473/@your+location,17z/data=!4m2!4m1!3e0;Pakenham Waste Transfer Station (Future Recycling);Transfer Station;30-32 Exchange Drive;PAKENHAM;Cardinia;13Recycling;https://www.futurerecycling.com.au/;367776;5782313;145.4920399066473;-38.09732087738631;https://www.google.com.au/maps/dir//-38.09
As you can see, it seems like rows are not correctly separated. So I tried to change the row delimiter to “n”,”r”,”nr”,”rn”, none of them work.
And I tried bcp
. It did not work either.
Here is the bcp
command I used:
bcp [waste-facility-locations].[dbo].[waste_facility] in "E:onboardingIterationwaste-facility-locations.csv" -f "E:onboardingIterationwaste_facility_formatter.xml" -T -S "(LocalDB)MSSQLLocalDB" -F 2 -t ";" -r "n"
Then I get an error said somehow the same thing
SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Unexpected EOF encountered in BCP data-file0 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 1
One interesting things is, if I create a new excel and choose “Get data” option to import the csv file, the file can be literally correctly parsed.
Basically I can’t find what I did wrong here. Can someone help me on this one?
Advertisement
Answer
The SQL Server import facilities are very intolerant of bad data and even just formatting variations or options. In my career, I have literally spent thousands of work-hours trying to develop and debug import procedures for customers. I can tell you right now, that trying to fix this with SQL alone is both difficult and time-consuming.
When you have this problem (bad data and/or inconsistent formatting) it is almost always easier to find or develop a more flexible tool to pre-process the data into the rigid standard that SQL expects. So I would say that if Excel can parse it then just use Excel automation to pre-process them and then use SQL to import the Excel output. If that’s not practical for you, then I’d advise writing your own tool in some client language (C#, Vb, Java, Python, etc.) to pre-process the files.
You can do it in SQL (and I have done it many times), but I promise you that it is a long complicated trek.
SSIS has more flexible error-handling for problems like this, but if you are not already familiar and using it, it has a very steep learning curve and your first SSIS project is likely to be very time-consuming also.