Is there a way to update ONLY field.type data onto SQL using dbWriteTable? Without having to pass the whole table’s value

Question

So pretty much I am pulling a whole table from my database using: So this pulls my data as I need. After that perform some tests and to check the size of each column. I reassign field types based on my test. After that I would send the data back to the database using: Because I am not manipulating any

Accepted Answer

I’m not certain that this will fix all of your woes, but it’s an in-database method (with some help from R) for redefining column types. I’m assuming the data is uploaded always as TEXT regardless of the actual data, so this process will attempt to fix this.Docker setupNot required for your setup, but useful if anybody else needs to test this.$ docker pull mysql:8.0.26$ docker run -p 3306:3306 --name some-mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql:8.0.26Confirm connection in R:my <- DBI::dbConnect(odbc::odbc(), Driver="MySQL ODBC 8.0 ANSI Driver", pwd="my-secret-pw", uid="root")DBI::dbGetQuery(my, "SHOW VARIABLES LIKE 'version';")# Variable_name Value# 1 version 8.0.26DBI::dbExecute(my, "create schema stackoverflow")DBI::dbExecute(my, "use stackoverflow")And when demos/tests are complete (after running the rest of this answer’s code),DBI::dbExecute(my, "drop table sometable")DBI::dbExecute(my, "drop schema stackoverflow")Fake dataDBI::dbWriteTable(my, "sometable", data.frame( longstring = c("a very long string here that is going to trigger perhaps a varchar(255)", "234", "3.1415"), somefloat = c("234", "3.1415", "2.718271828"), someint = c("234", "3", "2"), longerstring = c(strrep("A", 260), "A", "A")))Retrieve table statsThis is where most of the ‘pain’ will be felt: I have not tested this on gargantuan tables, so please provide some feedback. Since it uses a lot of REGEXP, I don’t assume that it will return in milliseconds, but perhaps it’s faster than downloading and loading into R.tests <- do.call(rbind, lapply(DBI::dbListFields(my, "sometable"), function(nm) { DBI::dbGetQuery(my, sprintf(" select %s as column_name, count(*) as n, sum(%s regexp '^[-+]?.*[^0-9.].*$') as nonnumber, sum(%s regexp '^[-+]?.*[^0-9].*$') as nonint, max(length(%s)) as maxlen from sometable where %s is not null and %s <> ''", sQuote(nm), nm, nm, nm, nm, nm))}))tests# column_name n nonnumber nonint maxlen# 1 longstring 3 1 2 71# 2 somefloat 3 0 2 11# 3 someint 3 0 0 3# 4 longerstring 3 3 3 260From here, we need to use these “rules” to determine what the new column type should be. I’m using dplyr::case_when because I find it much better than nested ifelse or the like. If you’re using dplyr in general, then this will fit right in to your current dialect and toolset; if you’re using data.table, then it should be rather straight-forward to translate this to data.table::fcase.tests$new_type <- with(tests, dplyr::case_when( n == 0 ~ "varchar(255)", grepl("url", column_name, ignore.case = TRUE) ~ "tinytext", nonint == 0 ~ "tinytext", # For int, we need to make it tiny text, int isnt working on our DB nonnumber == 0 ~ "double", maxlen < 50 ~ "tinytext", between(maxlen, 50, 255) ~ "varchar(255)", TRUE ~ sprintf("varchar(%s)", maxlen)))tests# column_name n nonnumber nonint maxlen new_type# 1 longstring 3 1 2 71 varchar(255)# 2 somefloat 3 0 2 11 double# 3 someint 3 0 0 3 tinytext# 4 longerstring 3 3 3 260 varchar(260)You will need to review these rules carefully to make sure they capture all of your intent. However, I think the premise of simple rules like this (and the four “meta” columns I stored into tests) is a good start.Update the columnsCopying the method from https://dba.stackexchange.com/a/198635/156305, I’ll loop over the rows, create a new column as the new type, copy/CAST from the old column, remove the old column, then rename the new to the original name.I’ll show the column data types before and after the conversion.DBI::dbGetQuery(my, "select column_name, data_type, character_maximum_length from information_schema.columns where table_name='sometable'")# COLUMN_NAME DATA_TYPE CHARACTER_MAXIMUM_LENGTH# 1 longerstring text 65535# 2 longstring text 65535# 3 somefloat text 65535# 4 someint text 65535for (rownum in seq_len(nrow(tests))) { nm <- tests$column_name[rownum] typ <- tests$new_type[rownum] typ2 <- if (grepl("varchar", typ)) { gsub("var", "", typ) } else if (typ == "tinytext") { "char(50)" } else typ message(sprintf("Converting column '%s' to '%s' ('%s')", nm, typ, typ2)) DBI::dbExecute(my, sprintf("ALTER TABLE sometable ADD COLUMN newcolumn %s", typ)) DBI::dbExecute(my, sprintf("UPDATE sometable SET newcolumn = cast(%s as %s)", nm, typ2)) DBI::dbExecute(my, sprintf("ALTER TABLE sometable DROP COLUMN %s", nm)) DBI::dbExecute(my, sprintf("ALTER TABLE sometable RENAME COLUMN newcolumn to %s", nm))}# Converting column 'longstring' to 'varchar(255)' ('char(255)')# Converting column 'somefloat' to 'double' ('double')# Converting column 'someint' to 'tinytext' ('char(50)')# Converting column 'longerstring' to 'varchar(260)' ('char(260)')DBI::dbGetQuery(my, "select column_name, data_type, character_maximum_length from information_schema.columns where table_name='sometable'")# COLUMN_NAME DATA_TYPE CHARACTER_MAXIMUM_LENGTH# 1 longerstring varchar 260# 2 longstring varchar 255# 3 somefloat double # 4 someint tinytext 255And to prove that the data is still present and correct,str(DBI::dbGetQuery(my, "select * from sometable"))# 'data.frame': 3 obs. of 4 variables:# $ longstring : chr "a very long string here that is going to trigger perhaps a varchar(255)" "234" "3.1415"# $ somefloat : num 234 3.14 2.72# $ someint : chr "234" "3" "2"# $ longerstring: chr "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "A" "A"CaveatsI do not have much experience with the performance of REGEXP, I might be abusing a “heavy tool” here.Copying data from one column to the other may not be cheap, I have not tested with “large-ish data”.I use char(.) and varchar(.), you may prefer to use nchar(.) and nvarchar(.), depending on your data.The reason I create typ2 is because MySQL does not accept cast(.. as varchar(n)) (https://stackoverflow.com/a/59179977/3358272); it also appears to not like cast(.. as tinytext). Ergo the temporary translation during casting.There may be much more MySQL-canonical ways to do many of these steps. I do not profess MySQL guru-status by any stretch.

Advertisement

Answer

Docker setup

Fake data

Retrieve table stats

Update the columns

Caveats