Skip to content
Advertisement

How to deal with semantic version data type in BigQuery

I know that there is no data type in BigQuery. What would you prefer to deal with semantic versions in BigQuery?

I have the following schema:

software_version column is string but the data I store there is in semver format: `MAJOR.MINOR.PATCH-prerelease

I especially want to perform operators < > =.

This returns true, but according to the semver definition it’s false. Because the char - is used for prerelease.

Advertisement

Answer

Below is for BigQuery Standard SQL

You can use compareSemanticVersion UDF to compare two semantic versions
and/or use normaizedSemanticVersion UDF to sort output via ORDER BY clause.
See example below with both (compare and order by) use cases in one

with output

Note: I wrote above UDFs based on how I understood Semantic Versioning after reading reference you provided. There are potentially some edge cases that still needs to be addressed. But definitely should work for simple cases and I hope you will be able to simply adopt those UDFs and adjust output for your particular needs and maybe even to optimize the I ended up using here

One more as FYI: in the normaizedSemanticVersion UDF I am using zzzzzzzzzz just to address some edge-cases. Another option I tried was ..zzzzzzzzzz (note two extra dots) – I think this gives better result for more complex cases – but I was really out of time to complete testing. Please try

For example, in Semantic Versioning page there is an example: 1.0.0-alpha < 1.0.0-alpha.1 < 1.0.0-alpha.beta < 1.0.0-beta < 1.0.0-beta.2 < 1.0.0-beta.11 < 1.0.0-rc.1 < 1.0.0.

To make this same order as in that example – ..zzzzzzzzzz should be use – see below

with output that matches Semantic Versioning specification

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement