I am reading about sharding and I understood it upto some context. But most of the material I read says that sharding (horizontally scaling) RDBMS is a challenging task. But I don’t see why NO-SQL is easy to shard and RDBMS would be tough to shard?
My understanding is: some NO-SQL provides inbuilt sharding support which makes it easy to shard. But if the NO-SQL does not provide inbuilt sharding support, then sharding overhead in SQL/NO-SQL is same thing as it has to be implemented in application layer.
Is my understanding correct or did I miss anything?
Advertisement
Answer
I don’t think sharding is particularly “harder” in a SQL versus a NO-SQL database from the user perspective. After all, the complicated stuff is all done “under the hood”, so the interface for users is pretty similar.
Sharding means that rows of a given table are stored separately — often in local storage on different nodes. The issue is keeping them up-to-date.
One key difference is that SQL enforces ACID properties on the data, in particular “consistency”. This means that queries see the database only after transactions have been completed entirely or not at all.
NO-SQL databases typically implement eventual consistency. That is, a given transaction may take some time (typically measured in seconds up to a minute) before the transaction completes across all shards.
Consider the situation where a query is deleting one row in each shard. A SQL database will either see all rows deleted or none. A NO-SQL database might return intermediate results.
The advantage of NO-SQL is that large databases are often append-only and transactions only affect one shard — so eventual consistency is quite good-enough.
The advantage of SQL databases is that consistency is guaranteed (well, in some databases you can fiddle with settings to weaken it). However, there is a higher cost of waiting for all shards to agree that a transaction has completed.
I will note that in some situations SQL databases have a tremendous application advantage — because the applications do not need to deal with potentially inconsistent data.