I have a problem passing data from R to SQL and then reading it back
The original data is from some excel files and have the following word: Průmyslový
Using latin1
for encoding depreciates the u within the word Prumyslový
Using latin2
for encoding changes the accent of the u Prùmyslový
Which encoding could i use?
I am using an MS SQL 2016 server and the package DBI
and usually the following code where the word is part of the data frame that I am writing to the server.
I am not using UTF-8
because then öffentlicher
becomes öffentlicher
DBI::dbConnect(odbc::odbc(), Driver = "SQL Server", Server = "DBABMZ0006", Database = "EA_DB", encoding = "latin1") DBI::dbWriteTable(con, Tabelle, df_temp, append=TRUE) df_test<-DBI::dbReadTable(con, Tabelle)
Advertisement
Answer
Latin1 Encoding does not support many special characters, especially not the “€” sign, but it is possible to save the “€” sign in columns of VARCHAR type with Latin1 database collations, here the Backgrounds and the solution for your ö
, ů
and other encoding problems:
Our Microsoft SQL Database is set to “Latin1_General_CI_AS” collation, this use “iso_1” character set. In short “iso_1” means “ISO-8859-1” but is “Windows-1252” (CP1252). Mislabeled by Microsoft Details here.
In our Rprofile.site config file of our R installations, we set the encoding with options(encoding = "UTF-8")
for each R Session to “UTF-8” as default.
To check which encoding your R sessions are using, execute getOption("encoding")
command. If you are using Windows and “native.enc” is returned, then I assume that “Windows-1252” encoding used (Encoding of your operating system).
We use VARCHAR Type (= 8-bit codepage) in columns of our tables, with setting encoding = "CP1252"
(Encoding of SQL database) our problems are gone:
DBI = { dbconnection <- DBI::dbConnect( drv = odbc::odbc(), Driver = "ODBC Driver 17 for SQL Server", Server = instance, Database = database, # Encoding of SQL-Server, not latin1(!) encoding = "CP1252", # Encoding of R sessions, Windows R default is "CP1252" (Windows-1252) clientcharset = "UTF-8" ) }
Also you can try to set the clientcharset
property to the encoding of your R sessions.
“ů” not exist in Windows-1252 encoding. If you use NVARCHAR columns (=Unicode) and if you set then encoding
to UTF-16 then you can also save the character “ů”.