Explanation
I have a table that has no primary key (not even a composite key).
The table is for storing the time slots (opening hours and food delivery available hours) of the food shops. Let’s call the table “business_hours” and the main fields are as below.
- shop_id
- day (0 – 6, means Sunday – Saturday)
- type (open, delivery)
- start_time
- end_time
As an example, if shop A is open on Monday from 9:00 am to 1:00 pm and from 5:00 pm to 10:00 pm, there will be two records in the business_hours table for this scenario.
--------------------------------------------------
| shop_id | day | type | start_time | end_time |
--------------------------------------------------
| 1000    | 1   | open | 09:00:00   | 13:00:00 |
--------------------------------------------------
| 1000    | 1   | open | 17:00:00   | 22:00:00 |
--------------------------------------------------
When I query this table, I will always use shop_id as the first condition in the WHERE clause.
Ex:
SELECT COUNT(*)
FROM business_hours
WHERE shop_id = 1000
  AND day = 1
  AND type = 'open'
  AND start_time <= '13:29:00'
  AND end_time > '13:29:00';
Question
Is an index on “shop_id” enough, or should the “day” and “type” fields be indexed as well?
It would also help if you could explain how indexing really works.
Answer
It depends on several factors that you should specify:
- How fast will the data grow
- What is the estimated table size in rows
- What queries will be run against that table
- How fast do you expect the queries to run
Think of it this way: suppose some service makes thousands of inserts per hour, old records are archived nightly, and reports are generated nightly from that table. In such a case you may prefer not to create many indexes, since every index slows down inserts.
On the other hand, if your table grows and changes slowly and many users run queries against it, you need proper indexes to speed up those queries.
If you can, try to create a clustered unique primary key that most queries can benefit from. If your data forms a timeline and most queries fetch ranges of data using datetime criteria (like from – to), it is better to include the datetime column in the clustered index; range queries are then typically at their fastest.
So something like this should give you the best performance for the SELECT above. (Note that you can then no longer store duplicate business hours for the same shop, day and type.)
CREATE TABLE Business_hours (
    shop_id INT NOT NULL
  , day INT NOT NULL
  -- type, start_time, end_time and other columns go here
  , CONSTRAINT Business_hours_PK
        PRIMARY KEY (shop_id, day, type, start_time, end_time) -- your clustered index
);
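The uniqueness side effect of such a primary key is easy to check. The following is only a small sketch using Python's built-in sqlite3 module as a stand-in engine (your database is probably a different one, and the TEXT column types are an assumption made for SQLite): an exact duplicate slot is rejected, while a second, different slot on the same day is accepted.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in engine.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE business_hours (
        shop_id    INTEGER NOT NULL,
        day        INTEGER NOT NULL,
        type       TEXT    NOT NULL,  -- column types are assumptions
        start_time TEXT    NOT NULL,
        end_time   TEXT    NOT NULL,
        PRIMARY KEY (shop_id, day, type, start_time, end_time)
    )
""")

insert_sql = "INSERT INTO business_hours VALUES (?, ?, ?, ?, ?)"
morning = (1000, 1, "open", "09:00:00", "13:00:00")
evening = (1000, 1, "open", "17:00:00", "22:00:00")

conn.execute(insert_sql, morning)

# Inserting exactly the same slot again violates the primary key.
try:
    conn.execute(insert_sql, morning)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A different slot for the same shop, day and type is still allowed.
conn.execute(insert_sql, evening)
row_count = conn.execute("SELECT COUNT(*) FROM business_hours").fetchone()[0]

print(duplicate_rejected, row_count)
```

If you really need identical slots stored twice, this key does not fit and a plain (non-unique) index is the safer choice.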
Just creating an index on the fields used in the SELECT (all of them, or only the most frequently used ones) will speed up your query too:
CREATE INDEX BusinessHours_IX ON business_hours (shop_id,day,type, start_time, end_time);
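As a rough mental model of why the column order in such an index matters, you can picture a composite index as the rows' key columns kept in sorted order. The sketch below is a deliberate simplification in plain Python (real engines use B-tree structures, and the helper name prefix_lookup and the sample rows are made up for illustration): a binary search jumps to the first entry matching the leading columns, and all matches sit next to each other.

```python
from bisect import bisect_left

# Sample rows: (shop_id, day, type, start_time, end_time). Made-up data.
rows = [
    (1000, 1, "open", "17:00:00", "22:00:00"),
    (1001, 1, "delivery", "11:00:00", "14:00:00"),
    (1000, 1, "open", "09:00:00", "13:00:00"),
    (999, 1, "open", "08:00:00", "12:00:00"),
    (1000, 2, "open", "09:00:00", "18:00:00"),
]

# The "index": the same tuples, kept sorted by (shop_id, day, type, ...).
index = sorted(rows)

def prefix_lookup(index, prefix):
    """Return all entries whose leading columns equal `prefix`."""
    n = len(prefix)
    # Binary search finds the first entry that is >= the prefix.
    pos = bisect_left(index, prefix)
    matches = []
    # Matching entries are contiguous because the index is sorted.
    while pos < len(index) and index[pos][:n] == prefix:
        matches.append(index[pos])
        pos += 1
    return matches

candidates = prefix_lookup(index, (1000, 1, "open"))

def open_count(candidates, at_time):
    # Same condition as the SELECT: start_time <= t AND end_time > t.
    return sum(1 for r in candidates if r[3] <= at_time < r[4])

print(len(candidates), open_count(candidates, "12:00:00"))
```

Note that a lookup by day alone could not use this ordering, because entries for the same day are scattered across different shop_id values. That is why an index starting with shop_id suits queries that always filter on shop_id first.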
The difference between a clustered and a non-clustered index is that a clustered index determines the order in which the records are stored on disk.
You can use EXPLAIN to check whether a query uses an index and to spot missing indexes in your database; see this answer.
For more detail, see this blog.
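To illustrate the idea of checking the plan, here is a small sketch that compares the plan for the SELECT from the question before and after creating the index. It assumes SQLite through Python's sqlite3 module, where the statement is EXPLAIN QUERY PLAN; other engines use a different syntax and output format, and the helper name plan is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table without any key or index, as in the question (types assumed).
conn.execute(
    "CREATE TABLE business_hours ("
    " shop_id INTEGER NOT NULL, day INTEGER NOT NULL, type TEXT NOT NULL,"
    " start_time TEXT NOT NULL, end_time TEXT NOT NULL)"
)

QUERY = (
    "SELECT COUNT(*) FROM business_hours"
    " WHERE shop_id = 1000 AND day = 1 AND type = 'open'"
    " AND start_time <= '13:29:00' AND end_time > '13:29:00'"
)

def plan(conn, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Without an index the engine has to scan the whole table.
plan_before = plan(conn, QUERY)

conn.execute(
    "CREATE INDEX BusinessHours_IX ON business_hours"
    " (shop_id, day, type, start_time, end_time)"
)

# With the composite index the plan should mention BusinessHours_IX.
plan_after = plan(conn, QUERY)

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, so look for a scan of the table in the first line and the index name in the second rather than for a specific string.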