ClickHouse: How to store JSON data the right way?

Question

I'm going to migrate data from a PostgreSQL database to Yandex's ClickHouse. One of the fields in a source table is of type JSON and is called additional_data. PostgreSQL allows me to access JSON attributes in, e.g., SELECT … queries with ->> and -> and so on. I need the same b…

Accepted Answer

Although ClickHouse uses fast JSON libraries (such as simdjson and rapidjson) for parsing, I think Nested fields should be faster.

If the JSON structure is fixed or changes predictably, consider denormalizing the data:

    ..
    created_at DateTime,
    updated_at DateTime,
    additional_data_message Nullable(String),
    additional_data_eventValue Nullable(String),
    additional_data_rating Nullable(String),
    additional_data_focalLength Nullable(Float64)
    ..

On the one hand, this can significantly increase the row count and disk space; on the other hand, it should give a significant increase in performance (especially with the right indexing). Moreover, the disk size can be reduced using the LowCardinality type and codecs.

Some other remarks:

- Avoid Nullable types; prefer a replacement value such as '', 0, etc. (see the explanation in "Clickhouse string field disk usage: null vs empty").
- The UUID type doesn't give index monotonicity; this sorting key should be much better (see "More secrets of ClickHouse Query Performance"):

      ..
      ORDER BY (created_at, uuid);

- Consider using Aggregating engines to significantly speed up the calculation of aggregated values.

In any case, before making a final decision, test manually on a data subset. This applies both to choosing the schema (JSON as String, Nested type, or denormalized columns) and to choosing the column codec.
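To make the denormalization advice concrete, here is a minimal Python sketch of how one additional_data value could be flattened into the columns listed above during the migration, with NULLs replaced by defaults so the target columns would not need to be Nullable. The JSON key names (message, eventValue, rating, focalLength) are inferred from the column names, and the default values and the helper name flatten_additional_data are assumptions for illustration, not part of the original answer.

```python
import json

# Assumed defaults used instead of NULL, so the target columns could be
# plain String / Float64 rather than Nullable(...).
COLUMN_DEFAULTS = {
    "additional_data_message": "",
    "additional_data_eventValue": "",
    "additional_data_rating": "",
    "additional_data_focalLength": 0.0,
}

PREFIX = "additional_data_"


def flatten_additional_data(raw_json):
    """Map one additional_data JSON string to a dict of flat column values."""
    data = json.loads(raw_json) if raw_json else {}
    if not isinstance(data, dict):
        # JSON null, arrays or scalars carry no named attributes.
        data = {}

    row = {}
    for column, default in COLUMN_DEFAULTS.items():
        key = column[len(PREFIX):]  # e.g. "focalLength"
        value = data.get(key)
        if value is None:
            row[column] = default       # missing or null -> default
        elif isinstance(default, float):
            row[column] = float(value)  # numeric column
        else:
            row[column] = str(value)    # string column
    return row


example = flatten_additional_data('{"message": "ok", "focalLength": 35}')
```

Each resulting dict has exactly one entry per denormalized column, so it can be handed to whatever bulk-insert mechanism the migration uses. Attributes that are not listed in COLUMN_DEFAULTS are dropped, which only suits the case where the JSON structure is fixed or changes predictably.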

Answer