
Encoding special characters SQL to R and back

I have a problem passing data from R to SQL and then reading it back. The original data comes from some Excel files and contains the following word: Průmyslový

Using latin1 as the encoding strips the accent from the ů in the word: Prumyslový

Using latin2 as the encoding changes the accent on the ů: Prùmyslový

Which encoding could I use? I am using an MS SQL Server 2016 instance and the DBI package, and usually the following code, where the word is part of the data frame that I am writing to the server.

I am not using UTF-8 because then öffentlicher becomes Ã¶ffentlicher


Answer

Latin1 encoding does not support many special characters, notably the “€” sign. It is nevertheless possible to store the “€” sign in VARCHAR columns under a Latin1 database collation. Here is the background and the solution for your ö, ů and other encoding problems:

Our Microsoft SQL database is set to the “Latin1_General_CI_AS” collation, which uses the “iso_1” character set. The name “iso_1” suggests “ISO-8859-1”, but the character set is actually “Windows-1252” (CP1252); Microsoft mislabeled it.
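The difference between the two character sets can be checked from R itself. The following is only a sketch using base R's iconv(); the helper name survives is made up for this example, and the encoding names are assumed to be known to your platform's iconv. It checks whether “€” round-trips through CP1252 and through true ISO-8859-1, and shows what a CP1252 reader makes of the UTF-8 bytes of “ö”.

```r
# Hypothetical helper: TRUE if the UTF-8 string `x` survives a round trip
# through the character set `enc`, FALSE if it is lost or altered.
survives <- function(x, enc) {
  there <- iconv(x, from = "UTF-8", to = enc)
  back  <- iconv(there, from = enc, to = "UTF-8")
  !is.na(back) & back == x
}

euro <- "\u20ac"  # the euro sign, written as an escape to stay ASCII-safe

in_cp1252   <- survives(euro, "CP1252")      # Windows-1252
in_iso88591 <- survives(euro, "ISO-8859-1")  # the real ISO-8859-1

# The single byte that CP1252 assigns to the euro sign.
euro_byte <- iconv(euro, from = "UTF-8", to = "CP1252", toRaw = TRUE)[[1]]

# The two UTF-8 bytes of "ö", decoded as if they were CP1252 text.
utf8_bytes <- charToRaw(enc2utf8("\u00f6"))
misread    <- iconv(list(utf8_bytes), from = "CP1252", to = "UTF-8")

print(c(cp1252 = in_cp1252, iso_8859_1 = in_iso88591))
print(euro_byte)
print(misread)
```

If the last value looks like the garbled “ö” from the question, UTF-8 bytes are probably being interpreted as CP1252 somewhere between R and the server.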

In the Rprofile.site config file of our R installations, we set options(encoding = "UTF-8"), so every R session uses “UTF-8” as its default encoding.

To check which encoding your R session is using, run getOption("encoding"). If you are on Windows and “native.enc” is returned, I assume the “Windows-1252” encoding is in use (the encoding of your operating system).
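A short sketch of that check, extended with two other base R calls that describe the session's locale. The variable names are only illustrative, and the values you see depend on your operating system and R version.

```r
# Default encoding for connections (files, URLs). "native.enc" means
# "use the encoding of the current locale".
session_encoding <- getOption("encoding")

# The character-type part of the locale, e.g. a code page on Windows.
ctype_locale <- Sys.getlocale("LC_CTYPE")

# Reports whether the session runs in a UTF-8 or Latin-1 locale;
# on Windows the list also carries the active code page.
locale_info <- l10n_info()

print(session_encoding)
print(ctype_locale)
str(locale_info)
```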

We use the VARCHAR type (= 8-bit code page) for the columns of our tables. After setting encoding = "CP1252" (the encoding of the SQL database), our problems were gone.
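A minimal sketch of such a connection, assuming the odbc package is used as the DBI backend (its dbConnect() method accepts an encoding argument). The helper name connect_cp1252, the driver name, and the server and database values are placeholders, not taken from a real setup.

```r
# Hypothetical helper that opens a DBI connection whose text encoding is
# set to CP1252, matching VARCHAR columns under a Latin1 collation.
connect_cp1252 <- function(server, database,
                           driver = "ODBC Driver 17 for SQL Server") {
  DBI::dbConnect(
    odbc::odbc(),
    Driver             = driver,    # assumed driver name, adjust as needed
    Server             = server,
    Database           = database,
    Trusted_Connection = "Yes",     # assumes Windows authentication
    encoding           = "CP1252"   # encoding of the SQL database
  )
}

# Usage (not run here, needs a reachable server):
# con <- connect_cp1252("my-server", "my-database")
# DBI::dbWriteTable(con, "words", data.frame(word = "\u00f6ffentlicher"))
# DBI::dbReadTable(con, "words")
# DBI::dbDisconnect(con)
```

Writing a test word and reading it back, as in the commented usage, is a quick way to confirm that the chosen encoding matches the database.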

You can also try setting the clientcharset property to the encoding of your R sessions.

“ů” does not exist in the Windows-1252 encoding. If you use NVARCHAR columns (= Unicode) and set the encoding to UTF-16, you can also store the character “ů”.
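This can be verified without a database. The sketch below uses only base R's iconv(); the helper name survives is made up for this example. It checks which character sets can hold “ů”, and shows why the latin2 attempt in the question produced “ù”: latin2 stores “ů” as a single byte, and that same byte read back as latin1 is a different letter.

```r
# Hypothetical helper: TRUE if the UTF-8 string `x` survives a round trip
# through the character set `enc`, FALSE if it is lost or altered.
survives <- function(x, enc) {
  there <- iconv(x, from = "UTF-8", to = enc)
  back  <- iconv(there, from = enc, to = "UTF-8")
  !is.na(back) & back == x
}

u_ring <- "\u016f"  # "ů", written as an escape to stay ASCII-safe

in_cp1252 <- survives(u_ring, "CP1252")      # Windows-1252
in_latin2 <- survives(u_ring, "ISO-8859-2")  # latin2

# The byte latin2 uses for "ů", then the same byte decoded as latin1.
latin2_byte    <- iconv(u_ring, from = "UTF-8", to = "ISO-8859-2",
                        toRaw = TRUE)[[1]]
seen_as_latin1 <- iconv(list(latin2_byte), from = "ISO-8859-1", to = "UTF-8")

# UTF-16 (little endian, as an example) has room for the character.
utf16_bytes <- iconv(u_ring, from = "UTF-8", to = "UTF-16LE",
                     toRaw = TRUE)[[1]]

print(c(cp1252 = in_cp1252, latin2 = in_latin2))
print(latin2_byte)
print(seen_as_latin1)
print(utf16_bytes)
```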

User contributions licensed under: CC BY-SA