Skip to content
Advertisement

CSVJDBC – Interpreting Strings Instead of Integers in Aggregate Functions

I am using the CSVJDBC driver for retrieving results from CSV files. All record fields are interpreted as Strings. How can I utilise the MAX aggregate function in order to obtain the maximim Integer of a column? As far as I know, csvjdbc does not support casting.

Consider this sample file:

sequenceNumber,decimalNumber,randomInteger,email,testNumber
0,0.4868176550817932,560801,cleta.stroman@gmail.com,0.0
1,0.9889360969432277,903488,chelsie.roob@hotmail.com,1.0
2,0.8161798688893893,367870,hardy.waelchi@yahoo.com,2.0
3,0.926163166852633,588581,rafaela.white@hotmail.com,3.0
4,0.05084859872223901,563000,belle.hagenes@gmail.com,4.0
5,0.7636864392027013,375299,joey.beier@gmail.com,5.0
6,0.31433980690632457,544036,cornell.will@gmail.com,6.0
7,0.4061012200967966,41792,catalina.kemmer@gmail.com,7.0
8,0.3541002754332119,196272,raoul.bogisich@yahoo.com,8.0
9,0.4189826302561652,798405,clay.roberts@yahoo.com,9.0
10,0.9076084714059381,135783,angel.white@yahoo.com,10.0
11,0.565716974613909,865847,marlin.hoppe@gmail.com,11.0
12,0.9484076609924861,224744,anjali.stanton@gmail.com,12.0
13,0.05223710002804138,977787,harley.morar@hotmail.com,13.0
15,0.6270851001160621,469901,eldora.schmeler@yahoo.com,14.0

I use the following code snippet:

import org.relique.jdbc.csv.CsvDriver;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;


public class CSVDemo
{
    public static void main(String[] args)
    {
    try
    {
        // Load the driver.
        Class.forName("org.relique.jdbc.csv.CsvDriver");

        // Create a connection. The first command line parameter is
        // the directory containing the .csv files.
        // A single connection is thread-safe for use by several threads.

        String CSVDIRECTORY = "/tmp/csv-directory/";
        String CSVDB ="mediumList";
        Connection conn = DriverManager.getConnection("jdbc:relique:csv:" + CSVDIRECTORY);

        // Create a Statement object to execute the query with.
        // A Statement is not thread-safe.
        Statement stmt = conn.createStatement();

        ResultSet results = stmt.executeQuery("SELECT MAX(decimalNumber) FROM "+CSVDB);

        // Dump out the results to a CSV file with the same format
        // using CsvJdbc helper function
        boolean append = true;
        CsvDriver.writeToCsv(results, System.out, append);

        // Clean up
        conn.close();
    }
    catch(Exception e)
    {
        e.printStackTrace();
    }
    }
}

When I execute the query

I get as expected:

MAX([DECIMALNUMBER])
0.9889360969432277

But when I want the maximum sequenceNumber, which is 19 with this

ResultSet results = stmt.executeQuery("SELECT MAX(sequenceNumber)   FROM  "+CSVDB);

I get 9 as a result:

MAX([SEQUENCENUMBER])
9

It works for the decimalNumber and it also works for the text. It does not work for testNumber, as csvjdbs returns the lexicographic largest value instead of the Integer value. Is there a possibility to solve this directly or to I need to fetch all records and select the maximum value with Java?

Basic Solution:

This is my basic solution, which needs to fetch all the numbers first:

        ResultSet results = stmt.executeQuery("SELECT sequenceNumber FROM "+CSVDB);
        int max=-1;

        while(results.next()){
            String sum = results.getString(1);

            int currentSeq = Integer.parseInt(sum);
            System.out.println("current_ "+sum);
            if(currentSeq>max){
                max=currentSeq;
            }

Is there a more elegant way?

Solution based on Joop Eggen

public int getMaxSequenceAggregate() {
       int max = 0;
       try {
           Properties props = new Properties();
           Connection connection;

           props.put("columnTypes", "Int,Double,Int,String,Int");
           connection = DriverManager.getConnection("jdbc:relique:csv:" + this.directoryPath, props);
           PreparedStatement statement = null;
           ResultSet result;
           statement = connection.prepareStatement("SELECT MAX(sequenceNumber) FROM " + this.filePath);
           result = statement.executeQuery();

           while (result.next()) {
               max = result.getInt(1);
               LOGGER.info("maximum sequence: " + max);

           }

           connection.close();
       } catch (SQLException e) {
           e.printStackTrace();
       }

       return max;
   }

Advertisement

Answer

You should better specify the column types as it seems the first column is taken as String, where "9" > "10".

Properties props = new Properties();
props.put("columnTypes", "Integer,Double,Integer,String,Integer");
Connection conn = DriverManager.getConnection("jdbc:relique:csv:" + CSVDIRECTORY, props);
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement