Is there a way to update ONLY the field.types in SQL using dbWriteTable, without having to pass the whole table's values?

So pretty much I am pulling a whole table from my database using:

ss_data <- dbConnect(MySQL(), user = , password = , host = , dbname = )
Combined_Query <- gsub("[\r\n]", " ", paste0("SELECT * FROM ", table_name))
Q_temp <- dbSendQuery(ss_data, Combined_Query)
temp <- fetch(Q_temp, n = -1)
report_pull <- rbind(report_pull, temp)
dbDisconnect(ss_data)

So this pulls my data as I need it. After that, I run some tests to check the size of each column and reassign the field types based on the results. Then I send the data back to the database using:

fieldtype<- setNames(newcoltypes, colnames(report_pull))
ss_data<-dbConnect(MySQL(), user =  ,password= , host= ,dbname= )
dbWriteTable(ss_data, value= report_pull, name="datatest1" ,overwrite=TRUE,row.names=FALSE, field.types = fieldtype)
dbDisconnect(ss_data)
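For context, newcoltypes above is the output of my size check; a minimal sketch of how such a check could be built (the thresholds here are illustrative only, not my exact rules):

# Illustrative sketch only: derive one MySQL type string per column of report_pull
newcoltypes <- vapply(report_pull, function(col) {
  col <- col[!is.na(col) & col != ""]                 # ignore blanks and NAs
  if (length(col) == 0) return("varchar(255)")        # empty column: fall back to a default
  if (all(grepl("^[0-9.]+$", col))) return("double")  # every value looks numeric
  n <- max(nchar(col))
  if (n > 255) "tinytext" else sprintf("varchar(%d)", n)
}, character(1))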

Because I am not manipulating any of the data, only sending it back, I was wondering if there is a way to send just the field.types information.

I am going to be doing this for ~500 GB worth of tables, so it would greatly reduce the time spent if I didn't have to push the row values back every time.

EDIT: After reading the comments, I started updating the column types in a loop.

for (x in 1:ncol(report_pull)) {
  ss_data <- dbConnect(MySQL(), user = , password = , host = , dbname = )
  # Drop blanks and NAs so the tests only see real values
  a <- unique(report_pull[x])
  a[a == ""] <- NA
  a <- na.omit(a)
  if (nrow(a) != 0) {
    if (all(grepl("^[0-9.]+$", a[, 1]))) {       # every value is numeric-looking
      if (all(grepl("^[0-9]+$", a[, 1]))) {      # every value is integer-looking
        if (typeof(report_pull[, x]) == "double") {
          dbExecute(ss_data, paste0("ALTER TABLE ", table, " MODIFY COLUMN `", colnames(report_pull[x]), "` double;"))
          listofcol[x] <- "double"
        } else { # For int, we need to make it tinytext; int isn't working on our DB
          dbExecute(ss_data, paste0("ALTER TABLE ", table, " MODIFY COLUMN `", colnames(report_pull[x]), "` tinytext;"))
          listofcol[x] <- "int - tinytext"
        }
      } else {
        dbExecute(ss_data, paste0("ALTER TABLE ", table, " MODIFY COLUMN `", colnames(report_pull[x]), "` double;"))
        listofcol[x] <- "double"
      }
    } else {
      if (dplyr::between(max(nchar(a[, 1])), 50, 255)) {
        dbExecute(ss_data, paste0("ALTER TABLE ", table, " MODIFY COLUMN `", colnames(report_pull[x]), "` varchar(255);"))
        listofcol[x] <- "varchar(255)"
      } else if (max(nchar(a[, 1])) > 255) {
        dbExecute(ss_data, paste0("ALTER TABLE ", table, " MODIFY COLUMN `", colnames(report_pull[x]), "` tinytext;"))
        listofcol[x] <- "tinytext"
      } else {
        dbExecute(ss_data, paste0("ALTER TABLE ", table, " MODIFY COLUMN `", colnames(report_pull[x]), "` varchar(", max(nchar(a[, 1])), ");"))
        listofcol[x] <- paste0("varchar(", max(nchar(a[, 1])), ")")
      }
    }

    # URL columns are forced to tinytext regardless of the tests above
    if (grepl("url", colnames(report_pull[x]), ignore.case = TRUE)) {
      dbExecute(ss_data, paste0("ALTER TABLE ", table, " MODIFY COLUMN `", colnames(report_pull[x]), "` tinytext;"))
      listofcol[x] <- "tinytext"
    }
  } else {
    print(paste0("Element ", x, " has 0 entries."))
    dbExecute(ss_data, paste0("ALTER TABLE ", table, " MODIFY COLUMN `", colnames(report_pull[x]), "` varchar(255);"))
    listofcol[x] <- "varchar(255)"
  }
  dbDisconnect(ss_data)
}

The issue I am facing is with this line of code:

dbExecute(ss_data,paste0("ALTER TABLE ",table," MODIFY COLUMN `", colnames(report_pull[x]) ,"` tinytext;"))

The error I get, both in the database and in R, is: Incorrect integer value: '' for column 'id' at row 1

I think the issue is that when this table was made, all the field types were set to TEXT. I'm just having a hard time setting it to int.
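For what it's worth, MySQL will not cast an empty string to an integer type, which is consistent with that error. One possible workaround (a sketch, assuming the open connection ss_data, the affected column id, and that the column allows NULLs) is to NULL out the empty strings before changing the type:

# Sketch only: empty strings cannot be converted to an integer type,
# so replace them with NULL before the ALTER
dbExecute(ss_data, paste0("UPDATE ", table, " SET `id` = NULL WHERE `id` = '';"))
dbExecute(ss_data, paste0("ALTER TABLE ", table, " MODIFY COLUMN `id` int;"))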


Answer

I’m not certain that this will fix all of your woes, but it’s an in-database method (with some help from R) for redefining column types. I’m assuming the data is always uploaded as TEXT regardless of its actual type, so this process attempts to fix that.

Docker setup

Not required for your setup, but useful if anybody else needs to test this.

$ docker pull mysql:8.0.26
$ docker run -p 3306:3306 --name some-mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql:8.0.26

Confirm connection in R:

my <- DBI::dbConnect(odbc::odbc(), Driver="MySQL ODBC 8.0 ANSI Driver", pwd="my-secret-pw", uid="root")
DBI::dbGetQuery(my, "SHOW VARIABLES LIKE 'version';")
#   Variable_name  Value
# 1       version 8.0.26
DBI::dbExecute(my, "create schema stackoverflow")
DBI::dbExecute(my, "use stackoverflow")

And when demos/tests are complete (after running the rest of this answer’s code),

DBI::dbExecute(my, "drop table sometable")
DBI::dbExecute(my, "drop schema stackoverflow")

Fake data

DBI::dbWriteTable(my, "sometable", data.frame(
  longstring = c("a very long string here that is going to trigger perhaps a varchar(255)",
                 "234", "3.1415"),
  somefloat = c("234", "3.1415", "2.718271828"),
  someint = c("234", "3", "2"),
  longerstring = c(strrep("A", 260), "A", "A")
))

Retrieve table stats

This is where most of the ‘pain’ will be felt: I have not tested this on gargantuan tables, so please provide some feedback. Since it uses a lot of REGEXP, I don’t assume that it will return in milliseconds, but perhaps it’s faster than downloading and loading into R.

tests <- do.call(rbind, lapply(DBI::dbListFields(my, "sometable"), function(nm) {
  DBI::dbGetQuery(my, sprintf("
    select 
      %s as column_name,
      count(*) as n,
      sum(%s regexp '^[-+]?.*[^0-9.].*$') as nonnumber,
      sum(%s regexp '^[-+]?.*[^0-9].*$') as nonint,
      max(length(%s)) as maxlen
    from sometable where %s is not null and %s <> ''", sQuote(nm), nm, nm, nm, nm, nm))
}))
tests
#    column_name n nonnumber nonint maxlen
# 1   longstring 3         1      2     71
# 2    somefloat 3         0      2     11
# 3      someint 3         0      0      3
# 4 longerstring 3         3      3    260

From here, we need to use these “rules” to determine what the new column type should be. I’m using dplyr::case_when because I find it much better than nested ifelse or the like. If you’re using dplyr in general, then this will fit right into your current dialect and toolset; if you’re using data.table, it should be rather straightforward to translate this to data.table::fcase (a sketch of that translation follows the output below).

tests$new_type <- with(tests, dplyr::case_when(
  n == 0                                        ~ "varchar(255)",
  grepl("url", column_name, ignore.case = TRUE) ~ "tinytext",
  nonint == 0                                   ~ "tinytext", # For int, we need to make it tinytext; int isn't working on our DB
  nonnumber == 0                                ~ "double",
  maxlen < 50                                   ~ "tinytext",
  dplyr::between(maxlen, 50, 255)               ~ "varchar(255)",
  TRUE                                          ~ sprintf("varchar(%s)", maxlen)))
tests
#    column_name n nonnumber nonint maxlen     new_type
# 1   longstring 3         1      2     71 varchar(255)
# 2    somefloat 3         0      2     11       double
# 3      someint 3         0      0      3     tinytext
# 4 longerstring 3         3      3    260 varchar(260)
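If you are on data.table rather than dplyr, a rough fcase translation of the same rules (an untested sketch) would be:

# Sketch: the same rules as the case_when above, expressed with data.table::fcase
tests$new_type <- with(tests, data.table::fcase(
  n == 0,                                        "varchar(255)",
  grepl("url", column_name, ignore.case = TRUE), "tinytext",
  nonint == 0,                                   "tinytext",
  nonnumber == 0,                                "double",
  maxlen < 50,                                   "tinytext",
  maxlen >= 50 & maxlen <= 255,                  "varchar(255)",
  maxlen > 255,                                  sprintf("varchar(%s)", maxlen)
))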

You will need to review these rules carefully to make sure they capture all of your intent. However, I think the premise of simple rules like this (and the four “meta” columns I stored into tests) is a good start.

Update the columns

Copying the method from https://dba.stackexchange.com/a/198635/156305, I’ll loop over the rows, create a new column as the new type, copy/CAST from the old column, remove the old column, then rename the new to the original name.

I’ll show the column data types before and after the conversion.

DBI::dbGetQuery(my, "select column_name, data_type, character_maximum_length from information_schema.columns where table_name='sometable'")
#    COLUMN_NAME DATA_TYPE CHARACTER_MAXIMUM_LENGTH
# 1 longerstring      text                    65535
# 2   longstring      text                    65535
# 3    somefloat      text                    65535
# 4      someint      text                    65535

for (rownum in seq_len(nrow(tests))) {
  nm <- tests$column_name[rownum]
  typ <- tests$new_type[rownum]
  typ2 <- 
    if (grepl("varchar", typ)) {
      gsub("var", "", typ)
    } else if (typ == "tinytext") {
      "char(50)"
    } else typ
  message(sprintf("Converting column '%s' to '%s' ('%s')", nm, typ, typ2))
  DBI::dbExecute(my, sprintf("ALTER TABLE sometable ADD COLUMN newcolumn %s", typ))
  DBI::dbExecute(my, sprintf("UPDATE sometable SET newcolumn = cast(%s as %s)", nm, typ2))
  DBI::dbExecute(my, sprintf("ALTER TABLE sometable DROP COLUMN %s", nm))
  DBI::dbExecute(my, sprintf("ALTER TABLE sometable RENAME COLUMN newcolumn to %s", nm))
}
# Converting column 'longstring' to 'varchar(255)' ('char(255)')
# Converting column 'somefloat' to 'double' ('double')
# Converting column 'someint' to 'tinytext' ('char(50)')
# Converting column 'longerstring' to 'varchar(260)' ('char(260)')

DBI::dbGetQuery(my, "select column_name, data_type, character_maximum_length from information_schema.columns where table_name='sometable'")
#    COLUMN_NAME DATA_TYPE CHARACTER_MAXIMUM_LENGTH
# 1 longerstring   varchar                      260
# 2   longstring   varchar                      255
# 3    somefloat    double                     <NA>
# 4      someint  tinytext                      255

And to prove that the data is still present and correct,

str(DBI::dbGetQuery(my, "select * from sometable"))
# 'data.frame': 3 obs. of  4 variables:
#  $ longstring  : chr  "a very long string here that is going to trigger perhaps a varchar(255)" "234" "3.1415"
#  $ somefloat   : num  234 3.14 2.72
#  $ someint     : chr  "234" "3" "2"
#  $ longerstring: chr  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "A" "A"

Caveats

  • I do not have much experience with the performance of REGEXP; I might be abusing a “heavy tool” here.
  • Copying data from one column to the other may not be cheap; I have not tested this with “large-ish data”.
  • I use char(.) and varchar(.); you may prefer nchar(.) and nvarchar(.), depending on your data.
  • The reason I create typ2 is that MySQL does not accept cast(.. as varchar(n)) (https://stackoverflow.com/a/59179977/3358272); it also appears not to like cast(.. as tinytext). Ergo the temporary translation during casting (a quick demonstration follows this list).
  • There may be much more MySQL-canonical ways to do many of these steps; I do not profess MySQL guru-status by any stretch.
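A quick demonstration of that casting quirk, against the same connection (purely a sanity check, not part of the conversion loop):

# CHAR is an accepted CAST target in MySQL ...
DBI::dbGetQuery(my, "select cast('3.1415' as char(10)) as ok")
# ... but VARCHAR is not; uncommenting the next line raises a SQL syntax error
# DBI::dbGetQuery(my, "select cast('3.1415' as varchar(10)) as nope")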