I’m trying to optimise my query for when an internal customer only wants to return one result (and its associated nested dataset). My aim is to reduce the amount of data the query processes. However, the processed size appears to be exactly the same whether I’m querying for 1 record (with an unnested 48,000-length array) or the whole dataset (10,000 records with …
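One likely explanation: BigQuery bills a query by the columns it scans across the whole table, not by the rows it returns, so a LIMIT or a plain WHERE does not shrink the bytes processed. The usual workaround is to partition and/or cluster the table on the filter columns so BigQuery can prune storage blocks; selecting only the nested columns you actually need also cuts the scanned bytes. A minimal sketch, with made-up table and column names (my_dataset.events, created_at, customer_id):

-- Rebuild the table partitioned by day and clustered by customer
-- (all names here are illustrative assumptions).
CREATE TABLE my_dataset.events_partitioned
PARTITION BY DATE(created_at)
CLUSTER BY customer_id AS
SELECT * FROM my_dataset.events;

-- Filtering on the partition/cluster columns now prunes storage blocks,
-- so bytes processed shrink with the filter instead of staying constant.
SELECT *
FROM my_dataset.events_partitioned
WHERE DATE(created_at) = '2021-01-01'
  AND customer_id = 'abc123';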
SQL Server: change column datatype on ~1 billion records
I am running up against the column’s size limit in my table and want to change the column type from INT to BIGINT. The table has around 1 billion rows, but whenever I try to change the datatype it takes too much time and eats all of my machine’s disk space. What is the best and fastest way to …
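A pattern that usually avoids the long lock and the log blow-up is to add a new BIGINT column, backfill it in batches, then swap it in. A rough T-SQL sketch, assuming hypothetical names dbo.BigTable, OldCol and NewCol:

-- Step 1: add the BIGINT column (metadata-only change, fast).
ALTER TABLE dbo.BigTable ADD NewCol BIGINT NULL;
GO

-- Step 2: backfill in small batches so each transaction stays small.
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (50000) dbo.BigTable
    SET NewCol = OldCol
    WHERE NewCol IS NULL;   -- assumes OldCol is NOT NULL, otherwise this loop never ends
    SET @rows = @@ROWCOUNT;
END
GO

-- Step 3: swap the columns during a short maintenance window.
ALTER TABLE dbo.BigTable DROP COLUMN OldCol;
EXEC sp_rename 'dbo.BigTable.NewCol', 'OldCol', 'COLUMN';

Taking log backups between batches (or running under the SIMPLE recovery model) keeps the transaction log from consuming all the disk space during the backfill.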
Google BigQuery – Subtract SUMs of a column based on values in another column
Hi, I need one query to get the top 10 countries with the largest [total(imports) – total(exports)] for goods_type medicines between 2019 and 2020. The data sample is as below: The returned data should include country, goods_type, and the value of [total(imports) – total(exports)]. I have come up with the query below, but I don’t know if it’s right or wrong, …
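One way to express this is conditional aggregation: sum imports and exports in the same pass and order by the difference. A sketch that assumes column names country, goods_type, flow ('import'/'export'), amount and year, since the full sample isn’t shown here:

SELECT
  country,
  goods_type,
  SUM(IF(flow = 'import', amount, 0))
    - SUM(IF(flow = 'export', amount, 0)) AS import_minus_export
FROM `my_project.my_dataset.trade`
WHERE goods_type = 'medicines'
  AND year BETWEEN 2019 AND 2020
GROUP BY country, goods_type
ORDER BY import_minus_export DESC
LIMIT 10;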
Is there a way to filter rows in BigQuery by the contents of an array?
I have data in a BigQuery table that looks like this: My question is, how can I find all rows where “key” = “a” with “value” = 1, but also “key” = “b” with “value” = 3? I’ve tried various forms of UNNEST but I haven’t been able to get it right. The CROSS JOIN leaves me with one row …
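A pattern that tends to work better than CROSS JOIN here is one EXISTS subquery per condition, so a row is kept only if its array contains both key/value pairs. A sketch under an assumed schema where the array column is kv ARRAY<STRUCT<key STRING, value INT64>>:

SELECT *
FROM `my_project.my_dataset.my_table` AS t
WHERE EXISTS (SELECT 1 FROM UNNEST(t.kv) AS p WHERE p.key = 'a' AND p.value = 1)
  AND EXISTS (SELECT 1 FROM UNNEST(t.kv) AS q WHERE q.key = 'b' AND q.value = 3);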
Process several billion records from Redshift using custom logic
I want to apply custom logic over a dataset stored in Redshift. Example of input data:
userid, event, fileid, timestamp, …
100000, start, 120, 2018-09-17 19:11:40
100000, done, 120, 2018-…
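The excerpt doesn’t spell out the custom logic, but if it amounts to pairing each file’s ‘start’ and ‘done’ events per user, it can often stay inside Redshift as plain SQL instead of being pulled out record by record. A sketch with an assumed table name events:

-- Pair start/done events per (userid, fileid) and compute elapsed time.
SELECT
  userid,
  fileid,
  MIN(CASE WHEN event = 'start' THEN "timestamp" END) AS started_at,
  MAX(CASE WHEN event = 'done'  THEN "timestamp" END) AS finished_at,
  DATEDIFF(second,
           MIN(CASE WHEN event = 'start' THEN "timestamp" END),
           MAX(CASE WHEN event = 'done'  THEN "timestamp" END)) AS duration_sec
FROM events
GROUP BY userid, fileid;

If the logic genuinely can’t be expressed in SQL, the usual route is to UNLOAD the table to S3 and process the files with an external engine such as Spark.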
Storing a huge amount of points (x, y, z) in a relational database
I need to store a very simple data structure on disk – the Point. Its fields are just: Moment – a 64-bit integer representing a time with high precision. EventType – a 32-bit integer, a reference to another object. Value – a 64-bit floating point number. Requirements: the pair (Moment + EventType) is the unique identifier of the Point, so I suspect it to …
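A minimal relational layout matching that description uses the (Moment, EventType) pair as a composite primary key; the table name and Postgres-style types below are assumptions:

CREATE TABLE point (
    moment     BIGINT           NOT NULL,  -- 64-bit integer, high-precision time
    event_type INTEGER          NOT NULL,  -- 32-bit reference to another object
    value      DOUBLE PRECISION NOT NULL,  -- 64-bit floating point number
    PRIMARY KEY (moment, event_type)       -- the pair uniquely identifies a Point
);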
How to create a large pandas DataFrame from an SQL query without running out of memory?
I have trouble querying a table of more than 5 million records from an MS SQL Server database. I want to select all of the records, but my code seems to fail when selecting too much data into memory. This works: … but this does not work: It returns this error: I have read here that a similar problem exists when creating a …
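The usual fix is to stream the result in chunks instead of materialising every row at once; pandas.read_sql accepts a chunksize argument that yields the DataFrame piece by piece. The same idea expressed on the SQL side is keyset pagination, fetching one slice per round trip (the table and column names below are assumptions):

-- Fetch the next slice after the last Id already processed; the client runs this
-- repeatedly, feeding in the highest Id it has seen so far.
DECLARE @last_id_seen BIGINT = 0;

SELECT TOP (100000) *
FROM dbo.BigTable
WHERE Id > @last_id_seen
ORDER BY Id;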
Best way to delete millions of rows by ID
I need to delete about 2 million rows from my PG database. I have a list of IDs that I need to delete, but every way I’ve tried is taking days. I tried putting them in a table and deleting in batches of 100; four days later, it is still running, with only 297,268 rows deleted.
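Batches of 100 are usually far too small for this; a single set-based delete that joins against the list of IDs is typically dramatically faster. A sketch assuming the IDs are already loaded into a helper table ids_to_delete(id) and the target is big_table(id), both names made up for illustration:

-- Index and analyse the helper table so the planner can join it efficiently.
CREATE INDEX IF NOT EXISTS ids_to_delete_id_idx ON ids_to_delete (id);
ANALYZE ids_to_delete;

-- One set-based delete instead of millions of tiny ones.
DELETE FROM big_table b
USING ids_to_delete d
WHERE b.id = d.id;

Unindexed foreign keys in child tables that reference the deleted rows, and per-row triggers, are the other common reasons a delete like this takes days.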