Encoding special characters SQL to R and back

I have a problem passing data from R to SQL and then reading it back. The original data comes from some Excel files and contains the following word: Průmyslový

Using latin1 for encoding drops the ring from the ů in the word: Prumyslový

Using latin2 for encoding changes the accent on the u: Prùmyslový

Which encoding could I use? I am using MS SQL Server 2016 and the DBI package, usually with the following code, where the word is part of the data frame that I am writing to the server.

I am not using UTF-8 because then öffentlicher becomes Ã¶ffentlicher

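# Keep the connection object so it can be reused below
con <- DBI::dbConnect(odbc::odbc(),
                      Driver = "SQL Server",
                      Server = "DBABMZ0006",
                      Database = "EA_DB",
                      encoding = "latin1")

# Tabelle holds the table name, df_temp the data frame to write
DBI::dbWriteTable(con,
                  Tabelle,
                  df_temp,
                  append = TRUE)

df_test <- DBI::dbReadTable(con,
                            Tabelle)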

Answer

The Latin1 encoding (ISO-8859-1) lacks many special characters, notably the “€” sign. It is nevertheless possible to store the “€” sign in VARCHAR columns under a Latin1 database collation. Here is the background, and the solution for your ö, ů and other encoding problems:

Our Microsoft SQL database is set to the “Latin1_General_CI_AS” collation, which uses the “iso_1” character set. The name “iso_1” suggests “ISO-8859-1”, but the character set is actually “Windows-1252” (CP1252); Microsoft mislabels it.
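
The difference between the two character sets can be checked from R with iconv(). This is only a minimal sketch; it assumes that the iconv on your platform knows the names "CP1252" and "ISO-8859-1", and the variable names are made up for illustration:

```r
# The euro sign as a UTF-8 string (written as an escape so the script is encoding-safe)
euro <- "\u20ac"

# With toRaw = TRUE, iconv() returns a list with one raw vector per input string,
# or NULL for a string that cannot be converted
bytes_cp1252 <- iconv(euro, from = "UTF-8", to = "CP1252", toRaw = TRUE)[[1]]
bytes_latin1 <- iconv(euro, from = "UTF-8", to = "ISO-8859-1", toRaw = TRUE)[[1]]

print(bytes_cp1252)  # one byte: CP1252 has a code point for the euro sign
print(bytes_latin1)  # NULL: ISO-8859-1 has none
```

If the first result is a single byte and the second is NULL, that matches the point above: a VARCHAR column under this collation can hold “€” only because the underlying code page is Windows-1252.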

In the Rprofile.site config file of our R installations, we set options(encoding = "UTF-8"), so every R session uses “UTF-8” as its default encoding.

To check which encoding your R session is using, run getOption("encoding"). If you are on Windows and “native.enc” is returned, then I assume that “Windows-1252” is in use (the encoding of your operating system).
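
A short sketch of that check, using only base R. The extra calls to l10n_info() and Sys.getlocale() are my addition, since "native.enc" alone does not say which encoding the operating system actually uses:

```r
# Default encoding for connections; "native.enc" means "use the OS encoding"
print(getOption("encoding"))

# Details of the current locale: whether it is UTF-8 or Latin-1,
# and on Windows also the code page
info <- l10n_info()
print(info)

# The character-type locale, for example "German_Germany.1252" on some Windows setups
print(Sys.getlocale("LC_CTYPE"))
```

The output depends on the machine and the R version, so compare it with what your database expects rather than relying on an assumed default.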

We use the VARCHAR type (= 8-bit code page) for the columns of our tables. After setting encoding = "CP1252" (the encoding of the SQL database), our problems were gone:

# instance and database are character variables defined elsewhere
dbconnection <- DBI::dbConnect(
  drv = odbc::odbc(),
  Driver = "ODBC Driver 17 for SQL Server",
  Server = instance,
  Database = database,
  # Encoding of the SQL Server: CP1252, not latin1(!)
  encoding = "CP1252",
  # Encoding of the R session (ours is UTF-8; the Windows R default is CP1252)
  clientcharset = "UTF-8"
)

You can also try setting the clientcharset property to the encoding of your R sessions.

“ů” does not exist in the Windows-1252 encoding. If you use NVARCHAR columns (= Unicode) and set the encoding to UTF-16, you can also store the character “ů”.
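
To see which strings would survive a given 8-bit code page before writing them to the server, something like the following could help. It is a sketch, not part of DBI or odbc: representable_in() is a hypothetical helper, and it relies on iconv() returning NA for strings it cannot convert, which is the documented default but can behave differently on some platforms:

```r
# Hypothetical helper: TRUE for each string in x that can be converted to `enc`
representable_in <- function(x, enc) {
  # With the default sub = NA, iconv() returns NA for non-convertible strings
  !is.na(iconv(x, from = "UTF-8", to = enc))
}

# The two words from the question, written with escapes: Průmyslový, öffentlicher
words <- c("Pr\u016fmyslov\u00fd", "\u00f6ffentlicher")

cp1252_ok <- representable_in(words, "CP1252")
latin2_ok <- representable_in(words, "ISO-8859-2")

print(data.frame(word = words, cp1252 = cp1252_ok, latin2 = latin2_ok))
```

Both words fit into ISO-8859-2, but there “ů” is the byte 0xF9, which Windows-1252 displays as “ù”. That matches the “Prùmyslový” from the question. Any value flagged FALSE for CP1252 needs an NVARCHAR column.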

User contributions licensed under: CC BY-SA