shopkeeper table has following fields:
id (bigint),amount (numeric(19,2)),createddate (timestamp)
Let’s say, I have the above table. I want to get the records for yesterday and generate a report by having the amount printed to cents.
One way of doing is to perform calculations in my java application and execute a simple query
Date previousDate ;// $1 calculate in application Date todayDate;// $2 calculate in application select amount where createddate between $1 and $2
and then loop through the records and convert amount to cents in my java application and generate the report
Another way is like performing calculations in sql query itself:
select cast(amount * 100 as int) as "Cents" from shopkeeper where createddate between date_trunc('day', now()) - interval '1 day' and date_trunc('day', now())
and then loop through the records and generate the report
In one way , all my processing is done in java application and a simple query is fired. In other case all the conversions and calculations is done in Sql query.
The above use case is just an example, in a real scenario a table can have many columns that require processing of the similar kind.
Can you please tell me which approach is better in terms of performance and other aspects and why?
It depends on a lot of factors – but most crucially:
- complexity of calculations (prefer doing complex crunching on an app-server, since that scales out; rather than a db server, which scales up)
- volume of data (if you need to access/aggregate a lot of data, doing it at the db server will save bandwidth, and disk io if the aggregates can be done inside indexes)
- convenience (sql is not the best language for complex work – especially not great for procedural work, but very good for set-based work; lousy error-handling, though)
As always, if you do bring the data back to the app-server, minimising the columns and rows will be to your advantage. Making sure the query is tuned and appropriately indexed will help either scenario.
Re your note:
and then loop through the records
Looping through records is almost always the wrong thing to do in sql – writing a set-based operation is preferred.
As a general rule, I prefer to keep the database’s job to a minimum “store this data, fetch this data” – however, there are always examples of scenarios where an elegant query at the server can save a lot of bandwidth.
Also consider: if this is computationally expensive, can it be cached somewhere?
If you want an accurate “which is better”; code it both ways and compare it (noting that a first draft of either is likely not 100% tuned). But factor in typical usage to that: if, in reality, it is being called 5 times (separately) at once, then simulate that: don’t compare just a single “1 of these vs 1 of those”.