I have a problem passing data from R to SQL and then reading it back
The original data is from some excel files and have the following word: Průmyslový
Using latin1
for encoding depreciates the u within the word Prumyslový
Using latin2
for encoding changes the accent of the u Prùmyslový
Which encoding could i use?
I am using an MS SQL 2016 server and the package DBI
and usually the following code where the word is part of the data frame that I am writing to the server.
I am not using UTF-8
because then öffentlicher
becomes öffentlicher
DBI::dbConnect(odbc::odbc(),
Driver = "SQL Server",
Server = "DBABMZ0006",
Database = "EA_DB",
encoding = "latin1")
DBI::dbWriteTable(con,
Tabelle,
df_temp,
append=TRUE)
df_test<-DBI::dbReadTable(con,
Tabelle)
Advertisement
Answer
Latin1 Encoding does not support many special characters, especially not the “€” sign, but it is possible to save the “€” sign in columns of VARCHAR type with Latin1 database collations, here the Backgrounds and the solution for your ö
, ů
and other encoding problems:
Our Microsoft SQL Database is set to “Latin1_General_CI_AS” collation, this use “iso_1” character set. In short “iso_1” means “ISO-8859-1” but is “Windows-1252” (CP1252). Mislabeled by Microsoft Details here.
In our Rprofile.site config file of our R installations, we set the encoding with options(encoding = "UTF-8")
for each R Session to “UTF-8” as default.
To check which encoding your R sessions are using, execute getOption("encoding")
command. If you are using Windows and “native.enc” is returned, then I assume that “Windows-1252” encoding used (Encoding of your operating system).
We use VARCHAR Type (= 8-bit codepage) in columns of our tables, with setting encoding = "CP1252"
(Encoding of SQL database) our problems are gone:
DBI = {
dbconnection <- DBI::dbConnect(
drv = odbc::odbc(),
Driver = "ODBC Driver 17 for SQL Server",
Server = instance,
Database = database,
# Encoding of SQL-Server, not latin1(!)
encoding = "CP1252",
# Encoding of R sessions, Windows R default is "CP1252" (Windows-1252)
clientcharset = "UTF-8"
)
}
Also you can try to set the clientcharset
property to the encoding of your R sessions.
“ů” not exist in Windows-1252 encoding. If you use NVARCHAR columns (=Unicode) and if you set then encoding
to UTF-16 then you can also save the character “ů”.