I am using Azure Synapse and I m querying the following CSV file:
1,"A","G" 2,"B","H" 3,"C","I" 4,"D","J" 5,"E,F","K"
Here is the T-SQL query:
SELECT * FROM OPENROWSET( BULK 'taxi/raw/Sample.csv', DATA_SOURCE = 'nyc_taxidata', FORMAT = 'CSV', PARSER_VERSION = '2.0', FIRSTROW=2, FIELDTERMINATOR = ',', ROWTERMINATOR = 'n', FIELDQUOTE = '"' ) AS [result]
Here is the output:
C1 C2 C3 1 A G 2 B H 3 C I 4 D J 5 E,F K
I then altered the CSV file to add a colon just before the letter G as so:
1,"A",:"G" 2,"B","H" 3,"C","I" 4,"D","J" 5,"E,F","K"
I get:
C1 C2 C3 1 A :"G" 2 B H 3 C I 4 D J 5 E,F K
This lead me to make the following conclusions about FIELDQUOTE:
- The query as a whole scans each row, character by character looking for either a FIELDQUOTE or a FIELDTERMINATOR. If it finds a FIELDQUOTE first then it ignores any FIELDTERMINATORS until it finds another FIELDQUOTE (I’m basing this on the last row).
- The FIELDQUOTE should have a FIELDTERMINATOR immediately before it in order for the FIELDQUOTE to work as expected (I’m basing this on the first row of the second example).
I could be well off the mark here but can anyone give me some feedback on this?.
When I tried running the query above on this CSV:
1,"A"":",G 2,"B","H" 3,"C","I" 4,"D","J" 5,"E,F","K"
It returned the following:
C1 C2 C3 1 A": G 2 B H 3 C I 4 D J 5 E,F K
So I tried with this CSV:
Number,Letter 1,"A""":",G 2,"B","H" 3,"C","I" 4,"D","J" 5,"E,F","K"
However I got the following error:
Error handling external file: 'Unexpected token ':' at [byte: 22]. Expecting tokens ',', ' ', or '"'. '. File/External table name: 'taxi/raw/Sample.csv'.
I then tried the CSV:
Number,Letter 1,"A":",G 2,"B","H" 3,"C","I" 4,"D","J" 5,"E,F","K"
And I got the following:
Error handling external file: 'Unexpected token ':' at [byte: 20]. Expecting tokens ',', ' ', or '"'. '. File/External table name: 'taxi/raw/Sample.csv'.
Lastly, I tried this CSV:
Number,Letter 1,"A":",G 2,"B","H" 3,"C","I" 4,"D","J" 5,"E,F","K"
I got the following error:
Error handling external file: 'Unexpected token ':' at [byte: 20]. Expecting tokens ',', ' ', or '"'. '. File/External table name: 'taxi/raw/Sample.csv'.
The strangest one was when I tried running:
Number,Letter 1,"A""",G 2,"B","H" 3,"C","I" 4,"D","J" 5,"E,F","K"
I got:
C1 C2 C3 1 A"",G2,B" H 3 C I 4 D J 5 E,F K
Advertisement
Answer
“The query as a whole scans each row, character by character looking for either a FIELDQUOTE or a FIELDTERMINATOR. If it finds a FIELDQUOTE first then it ignores any FIELDTERMINATORS until it finds another FIELDQUOTE (I’m basing this on the last row).”
From my comment: I believe your first conclusion is accurate. Specifying the character for string encapsulation or
FIELDQUOTE
in this dialect of csv parser, acts exactly as you describe. It allows row delimiters/terminators, field delimiters/terminators, and even the character used for string encapsulation itself (in most parsers when specifying an escape character) to exist in the string literal for that field.Oddly, I don’t think that
OPENROWSET()
allows for specifying an escape character, so I’m not sure how you would get a double quote character into your field/string-literal if it’s encapsulated. But that’s not part of your question just an observation reading through the documentation. I would be curious whatOPENROWSET()
would do with1,"A"":",G
or1,"A":",G
“The FIELDQUOTE should have a FIELDTERMINATOR immediately before it in order for the FIELDQUOTE to work as expected (I’m basing this on the first row of the second example).”
Yes. I believe this is true as well and is true with any CSV parser I’ve ever used. The string encapsulation character must follow either row or field terminator/delimiter and must be followed by the row or field terminator/delimiter for it to be recognized as an encapsulated string (or not error). I would be curious what
OPENROWSET()
would do with1,"A":,G
.