Skip to content
Advertisement

Row sorting and selection logic in Python on Sqlite db

hello Thanks for taking the time to go through my question. I work in the budget space for a small city and during these precarious time, I am learning some python for maybe in the future helping me with some financial data modelling. We use SAP currently but i also wanted to learn a new language.

I need some pointers on where to look for certain answers. for ex, I made a database with a few million records, sorted by date and time. I was able to strip off the data I did not need and now have a clean database to work on

At a high level, I want to know if based on the first record in a day, is there another entry the same day that is double of the first record.

based on the data above and the requirement, I want to get an output of

On Day 3, There were no values that was at least double so, we have none or null.

What I have done so far

I have been able to connect to database [python] 2. I was able to strip off the unnecessary information and depts from the database [sqlite] 3. I have been able to create new tables for result [Python]

Questions / best Practices

  1. How to get the first line per day. Do I start off with a variable before the loop that is assigned to Jan 1, 2019 and then pick the row number and store it in another table or what other options do we have here.
  2. Once the first row per day is stored/captured in another table or a array, How do I get the first occurrence of a value at least twice of the first line.

ex? begin meta code***********

end meta code*****************

is this the right approach, I feel like going through millions of records for each date change is not very optimized.

I can probably add a of condition to exit if date is not equal to table2.date[1] but am still not sure if this is the right way to approach this problem. This will be run only once or twice a year so system performance is not that important but still am thinking of approaching it the right way.

  1. Should I export the final data to excel for analysis or are thee good analysis modelling tools in Python. What would the professionals recommend?

Advertisement

Answer

You could use exists to check if another record exists on the same day and with a value that is twice greater, and window functions to filter on the top record per day:

In versions of SQLite where row_number() is not available, another option is to filter with a correlated subquery:

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement