
Is there a way to update ONLY field.types data in SQL using dbWriteTable, without having to pass the whole table’s values?

So pretty much I am pulling a whole table from my database using:
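Roughly like this (the connection details and the table name mytable are placeholders):

    library(DBI)
    # connect to the database; backend and credentials are illustrative
    con <- dbConnect(RMariaDB::MariaDB(), host = "127.0.0.1",
                     user = "user", password = "password", dbname = "mydb")
    # pull the whole table into R
    dat <- dbReadTable(con, "mytable")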

So this pulls my data as I need. After that I perform some tests to check the size of each column and reassign field types based on those tests. Then I send the data back to the database using:
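Roughly this, where new_types is a named character vector of the reassigned types (the names and types shown are illustrative):

    # e.g. new_types <- c(id = "int", name = "varchar(20)", value = "decimal(10,2)")
    dbWriteTable(con, "mytable", dat,
                 overwrite = TRUE, field.types = new_types)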

Because I am not manipulating any of the data, only sending it back, I was wondering if there was a way to send only the field.types data.

I am going to be doing this for ~500GB worth of tables; it would greatly reduce the time spent if I didn’t have to push the row values back every time.

EDIT: After reading the comments, I started updating the column types in my loop.

The issue I am facing is with this line of code:
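It has roughly this shape (table name is illustrative):

    # try to convert the TEXT column to INT in place
    dbExecute(con, "ALTER TABLE mytable MODIFY COLUMN id INT")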

The error I get, both in the DB and in R, is: Incorrect integer value: '' for column 'id' at row 1

I think the issue is that when this table was made, all the field types were set to TEXT. I’m just having a hard time setting it to INT.


Answer

I’m not certain that this will fix all of your woes, but it’s an in-database method (with some help from R) for redefining column types. I’m assuming the data is always uploaded as TEXT regardless of the actual contents, so this process attempts to fix that.

Docker setup

Not required for your setup, but useful if anybody else needs to test this.

Confirm connection in R:
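A minimal sketch, assuming a local MySQL instance with the credentials shown (adjust to your setup):

    library(DBI)
    con <- dbConnect(RMariaDB::MariaDB(),
                     host = "127.0.0.1", port = 3306,
                     user = "root", password = "mysecretpassword",
                     dbname = "test")
    dbGetQuery(con, "SELECT 1 AS ok")   # should return a 1x1 frame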

And when demos/tests are complete (after running the rest of this answer’s code),
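disconnect from the database (and stop the container if you used one):

    dbDisconnect(con)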

Fake data
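A small frame where everything is uploaded as TEXT, mimicking the problem (the table name mt and its contents are illustrative):

    set.seed(42)
    dat <- data.frame(
      id    = as.character(1:5),                 # integers stored as text
      val   = as.character(round(runif(5), 3)),  # decimals stored as text
      label = c("a", "bb", "ccc", "dd", "e"),    # short strings
      stringsAsFactors = FALSE
    )
    # force every column to TEXT on upload
    dbWriteTable(con, "mt", dat, overwrite = TRUE,
                 field.types = c(id = "text", val = "text", label = "text"))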

Retrieve table stats

This is where most of the ‘pain’ will be felt: I have not tested this on gargantuan tables, so please provide some feedback. Since it uses a lot of REGEXP, I don’t assume that it will return in milliseconds, but perhaps it’s faster than downloading and loading into R.
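One way to collect such stats, sketched against the mt table above (the regexes and the four meta-column names n, maxlen, all_int, all_num are illustrative):

    cols <- dbGetQuery(con, "
      SELECT COLUMN_NAME
        FROM information_schema.COLUMNS
       WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'mt'")$COLUMN_NAME

    tests <- do.call(rbind, lapply(cols, function(cn) {
      qry <- sprintf("
        SELECT '%s'                                      AS colname,
               COUNT(*)                                  AS n,
               MAX(CHAR_LENGTH(`%s`))                    AS maxlen,
               MIN(`%s` REGEXP '^-?[0-9]+$')             AS all_int,
               MIN(`%s` REGEXP '^-?[0-9]*\\\\.?[0-9]+$') AS all_num
          FROM mt", cn, cn, cn, cn)
      dbGetQuery(con, qry)   # NULLs are ignored by MIN/MAX; adjust if needed
    }))
    tests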

From here, we need to use these “rules” to determine what the new column type should be. I’m using dplyr::case_when because I find it much better than nested ifelse or the like. If you’re using dplyr in general, then this will fit right into your current dialect and toolset; if you’re using data.table, then it should be rather straightforward to translate this to data.table::fcase.
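For instance, a sketch under the assumptions above (thresholds and type choices are illustrative; typ2 is a cast-friendly counterpart used while copying, see the caveats below):

    library(dplyr)

    tests <- tests %>%
      mutate(
        typ = case_when(
          all_int == 1           ~ "int",
          all_num == 1           ~ "double",
          maxlen <= 50           ~ sprintf("varchar(%s)", maxlen),
          TRUE                   ~ "text"
        ),
        # MySQL will not CAST to varchar(n) or text, so keep a type that
        # CAST does accept for the copy step
        typ2 = case_when(
          typ == "int"           ~ "signed",
          typ == "double"        ~ "decimal(30,10)",
          grepl("^varchar", typ) ~ sub("^varchar", "char", typ),
          TRUE                   ~ sprintf("char(%s)", maxlen)
        )
      )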

You will need to review these rules carefully to make sure they capture all of your intent. However, I think the premise of simple rules like this (and the four “meta” columns I stored into tests) is a good start.

Update the columns

Copying the method from https://dba.stackexchange.com/a/198635/156305, I’ll loop over the rows, create a new column with the new type, copy/CAST from the old column, remove the old column, then rename the new column to the original name.
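A sketch of that loop, using the mt table and the tests columns assumed above:

    for (i in seq_len(nrow(tests))) {
      cn   <- tests$colname[i]
      typ  <- tests$typ[i]
      typ2 <- tests$typ2[i]
      tmp  <- paste0(cn, "_new")
      # 1. add a new column with the target type
      dbExecute(con, sprintf("ALTER TABLE mt ADD COLUMN `%s` %s", tmp, typ))
      # 2. copy/CAST the data across (typ2 is the cast-friendly type)
      dbExecute(con, sprintf("UPDATE mt SET `%s` = CAST(`%s` AS %s)", tmp, cn, typ2))
      # 3. drop the original column
      dbExecute(con, sprintf("ALTER TABLE mt DROP COLUMN `%s`", cn))
      # 4. rename the new column back to the original name
      dbExecute(con, sprintf("ALTER TABLE mt CHANGE `%s` `%s` %s", tmp, cn, typ))
    }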

I’ll show the column data types before and after the conversion.
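For example (run it both before and after the update loop):

    dbGetQuery(con, "
      SELECT COLUMN_NAME, COLUMN_TYPE
        FROM information_schema.COLUMNS
       WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'mt'")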

And to prove that the data is still present and correct,
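for instance:

    str(dbReadTable(con, "mt"))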

Caveats

  • I do not have much experience with the performance of REGEXP; I might be abusing a “heavy tool” here.
  • Copying data from one column to another may not be cheap; I have not tested this with “large-ish data”.
  • I use char(.) and varchar(.); you may prefer nchar(.) and nvarchar(.), depending on your data.
  • The reason I create typ2 is because MySQL does not accept cast(.. as varchar(n)) (https://stackoverflow.com/a/59179977/3358272); it also appears to not like cast(.. as tinytext). Ergo the temporary translation during casting.
  • There may be much more MySQL-canonical ways to do many of these steps. I do not profess MySQL guru-status by any stretch.