How to optimize querying multiple unrelated tables in SQLite?

I have scenario when I have to iterate through multiple tables in quite big sqlite database. In tables I store informations about planet position on sky through years. So e.g. for Mars I have tables Mars_2000, Mars_2001 and so on. Table structure is always the same:

|id:INTEGER|date:TEXT|longitude:REAL|

​x
 
|id:INTEGER|date:TEXT|longitude:REAL|​

Thing is that for certain task I need to iterate through this tables, which cost much time (for more than 10 queries it’s painful).

I suppose that if I merge all tables with years to one big table performance might be better as one query through one big table is better than 50 through smaller tables. I wanted to make sure that this might work, as database is humongous (around 20Gb), and reshaping it would cost a while.

Is this plan I just described viable? Is there any other solution for such case?

It might be helpfull so I attach function that produces my SQL query that is unique for each table:

pub fn transition_query(
    select_param: &str, // usually asterix
    table_name: &str, // table I'd like to query
    birth_degree: &f64, // constant number
    wanted_degree: &f64, // another constant number
    orb: &f64, // another constant number
    upper_date_limit: DateTime<Utc>, // casts to SQL-like string
    lower_date_limit: DateTime<Utc>, // casts to SQL-like string
) -> String {
    let parsed_upper_date_limit = CelestialBodyPosition::parse_date(upper_date_limit);
    let parsed_lower_date_limit = CelestialBodyPosition::parse_date(lower_date_limit);
    return format!("
    SELECT *,(SECOND_LAG>60 OR SECOND_LAG IS NULL) AS TRANSIT_START, (SECOND_LEAD > 60 OR SECOND_LEAD IS NULL) AS TRANSIT_END, time FROM (
        SELECT 
          *, 
          UNIX_TIME - LAG(UNIX_TIME,1) OVER (ORDER BY time) as SECOND_LAG,
          LEAD(UNIX_TIME,1) OVER (ORDER BY time) - UNIX_TIME as SECOND_LEAD FROM (
            SELECT {select_param}, 
              DATE(time) as day_scoped_date,
              CAST(strftime('%s', time) AS INT) AS UNIX_TIME,
              longitude
              FROM {table_name} 
                  WHERE  ((-{orb} <= abs(realModulo(longitude -{birth_degree} -{wanted_degree},360)) 
                  AND abs(realModulo(longitude -{birth_degree} -{wanted_degree},360)) <= {orb})
                  OR
                  (-{orb} <= abs(realModulo(longitude -{birth_degree} +{wanted_degree},360)) 
                  AND abs(realModulo(longitude -{birth_degree} +{wanted_degree},360)) <= {orb}))
                  AND time < '{parsed_upper_date_limit}' AND time > '{parsed_lower_date_limit}'
            )
        ) WHERE (TRANSIT_START AND NOT TRANSIT_END) OR (TRANSIT_END AND NOT TRANSIT_START) ;
    ");
}

 
pub fn transition_query(    select_param: &str, // usually asterix    table_name: &str, // table I'd like to query    birth_degree: &f64, // constant number    wanted_degree: &f64, // another constant number    orb: &f64, // another constant number    upper_date_limit: DateTime<Utc>, // casts to SQL-like string    lower_date_limit: DateTime<Utc>, // casts to SQL-like string) -> String {    let parsed_upper_date_limit = CelestialBodyPosition::parse_date(upper_date_limit);    let parsed_lower_date_limit = CelestialBodyPosition::parse_date(lower_date_limit);    return format!("    SELECT *,(SECOND_LAG>60 OR SECOND_LAG IS NULL) AS TRANSIT_START, (SECOND_LEAD > 60 OR SECOND_LEAD IS NULL) AS TRANSIT_END, time FROM (        SELECT           *,           UNIX_TIME - LAG(UNIX_TIME,1) OVER (ORDER BY time) as SECOND_LAG,          LEAD(UNIX_TIME,1) OVER (ORDER BY time) - UNIX_TIME as SECOND_LEAD FROM (            SELECT {select_param},               DATE(time) as day_scoped_date,              CAST(strftime('%s', time) AS INT) AS UNIX_TIME,              longitude              FROM {table_name}                   WHERE  ((-{orb} <= abs(realModulo(longitude -{birth_degree} -{wanted_degree},360))                   AND abs(realModulo(longitude -{birth_degree} -{wanted_degree},360)) <= {orb})                  OR                  (-{orb} <= abs(realModulo(longitude -{birth_degree} +{wanted_degree},360))                   AND abs(realModulo(longitude -{birth_degree} +{wanted_degree},360)) <= {orb}))                  AND time < '{parsed_upper_date_limit}' AND time > '{parsed_lower_date_limit}'            )        ) WHERE (TRANSIT_START AND NOT TRANSIT_END) OR (TRANSIT_END AND NOT TRANSIT_START) ;    ");}​

Answer

I solved the issue programmatically. Whole thing was done with Rust and r2d2_sqlite library. I’m still doing a lot of queries, but now it’s done in threads. It allowed me to reduce execution time from 25s to around 3s. Here’s the code:

use std::sync::mpsc;
use std::thread;
use r2d2_sqlite::SqliteConnectionManager;
use r2d2;

    let manager = SqliteConnectionManager::file("db_path");
    let pool = r2d2::Pool::builder().build(manager).unwrap();
    let mut result: Vec<CelestialBodyPosition> = vec![]; // Vector of structs
    let (tx, rx) = mpsc::channel(); // Allows ansynchronous communication
    let mut children = vec![]; //vector of join handlers (not sure if needed at all

    for query in queries {
        let pool = pool.clone(); // For each loop I clone connection to databse
        let inner_tx = tx.clone(); // and messager, as each thread should have spearated one.
        children.push(thread::spawn(move || {
            let conn = pool.get().unwrap();
            add_real_modulo_function(&conn); // this adds custom sqlite function I needed
            let mut sql =  conn.prepare(&query).unwrap();
            // this does query, and maps result to my internal type
            let positions: Vec<CelestialBodyPosition> = sql 
                .query_map(params![], |row| {
                    Ok(CelestialBodyPosition::new(row.get(1)?, row.get(2)?))
                })
                .unwrap()
                .map(|position| position.unwrap())
                .collect();
                // this sends partial result to receiver
                return inner_tx.send(positions).unwrap();
        }));
    }
    // first messenger has to be dropped, otherwise program will wait for its input
    drop(tx);
    for received in rx {
        result.extend(received); // combine all results
    }
    return result;

 
use std::sync::mpsc;use std::thread;use r2d2_sqlite::SqliteConnectionManager;use r2d2;​    let manager = SqliteConnectionManager::file("db_path");    let pool = r2d2::Pool::builder().build(manager).unwrap();    let mut result: Vec<CelestialBodyPosition> = vec![]; // Vector of structs    let (tx, rx) = mpsc::channel(); // Allows ansynchronous communication    let mut children = vec![]; //vector of join handlers (not sure if needed at all​    for query in queries {        let pool = pool.clone(); // For each loop I clone connection to databse        let inner_tx = tx.clone(); // and messager, as each thread should have spearated one.        children.push(thread::spawn(move || {            let conn = pool.get().unwrap();            add_real_modulo_function(&conn); // this adds custom sqlite function I needed            let mut sql =  conn.prepare(&query).unwrap();            // this does query, and maps result to my internal type            let positions: Vec<CelestialBodyPosition> = sql                 .query_map(params![], |row| {                    Ok(CelestialBodyPosition::new(row.get(1)?, row.get(2)?))                })                .unwrap()                .map(|position| position.unwrap())                .collect();                // this sends partial result to receiver                return inner_tx.send(positions).unwrap();        }));    }    // first messenger has to be dropped, otherwise program will wait for its input    drop(tx);    for received in rx {        result.extend(received); // combine all results    }    return result;​

As you can see no optimization happened from sqlite site, which kinda makes me feel I’m doing something wrong, but for now it’s alright. It might be good to press some more control over amount of spawned threads.

Advertisement

Answer