Skip to content
Advertisement

Store hourly data efficient way

There is a requirement to store hourly data in SQL Server 2016 and retrieve. It’s an OLTP database.

I will explain with an example: we need to capture temperature of each city of a country and store on hourly basis. What would be the best and efficient design to do this. The data would be stored for a year and then archived

This is my plan. Can some one review and let me know if this approach is fine?

CREATE TABLE [dbo].[CityMaster]
(
    [CityId] [int] NULL,
    [CityName] [varchar](300) NULL
) ON [PRIMARY]

--OPTION 1
CREATE TABLE [dbo].[WeatherData]
(
    [Id] [bigint] NULL,
    [CityId] [int] NULL,
    [HrlyTemp] [decimal](18, 1) NULL,
    [CapturedTIme] [datetime] NULL
) ON [PRIMARY]
GO

--OPTION2
CREATE TABLE [dbo].[WeatherData_JSon] 
(
    [Id] [bigint] NULL,
    [CityId] [int] NULL,
    [Month] [varchar](50) NULL,
    [Hrlytemp] [nvarchar](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

Advertisement

Answer

To some extent, this depends on how the data is going to be used. The most natural solution is to tweak the first option and use a partitioned table:

CREATE TABLE [dbo].CityHourlyTemperatures (
    Id bigint NULL,
    CityId int NULL,
    HrlyTemp decimal(6, 1) NULL,
    CapturedTIme datetime NULL
) ;

Note I changed the name to something that seems to better capture the name.

Even with global warming, I think that 4 or 5 digits of precision in the temperature is quite sufficient — 18 is way overkill.

Each row here has 8 + 4 + 5 + 8 bytes = 25 bytes (it may be rounded up if there are alignment restrictions). A year has about 8,766 hours. So, if you have 100 cities, this is less than a million rows per year and just a few tens of megabytes per year.

That is quite feasible, but you might want to consider partitioning the table — the older partitions can act like an “archive”.

Your second option stores the temperatures as a blob. This would make sense under only one circumstance: you don’t care about the temperatures but you need to return the data to an application that does.

The implication from the name is that you want to store the value as JSON. This usually requires more overhead than storing the data using native types — and is often less efficient. JSON is very powerful and useful for some types of data, particularly sparse data. However, your data is quite relational and can be stored in a relational format.

If you wanted to save space, you could consider the following:

  • Replacing the datetime value with an hourId column. This could possibly be a shortint if you only want a few years of data.
  • Removing the id column and defining the cityid/hourid as the primary key.

However, your volume of data does not seem to suggest that such an approach is necessary.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement