I’m going to migrate data from PostgreSQL database to Yandex’s ClickHouse.
One of the fields in a source table is of type JSON – called
additional_data. So, PostgreSQL allows me to access json attributes during e.g.
SELECT ... queries with
-> and so on.
I need the same behavior to persist in my resulting table in ClickHouse storage. (i.e. the ability to parse JSON during select queries and/or when using filtering and aggregation clauses)
Here is what I’ve done during
CREATE TABLE ... in ClickHouse client:
create table if not exists analytics.events ( uuid UUID, ..., created_at DateTime, updated_at DateTime, additional_data Nested ( message Nullable(String), eventValue Nullable(String), rating Nullable(String), focalLength Nullable(Float64) ) ) engine = MergeTree ORDER BY (uuid, created_at) PRIMARY KEY uuid;
Is that a good choice how to store JSON-serializable data? Any Ideas?
Maybe It’s better to store a JSON data as a plain
String instead of
Nested and playing with It using special functions?
If the JSON structure is fixed or be changed predictably try to consider the way of denormalizing data:
.. created_at DateTime, updated_at DateTime, additional_data_message Nullable(String), additional_data_eventValue Nullable(String), additional_data_rating Nullable(String), additional_data_focalLength Nullable(Float64) ..
On one hand, it can significantly increase the count of rows and disk space, on another side, it should give a significant increase in performance (especially in the right indexing). Moreover, the disk size can be reduced using LowCardinality-type and Codecs.
- Some others remarks:
avoid to use Nullable types, prefer to use some replacement such as ”, 0, etc (see explanation Clickhouse string field disk usage: null vs empty)
.. ORDER BY (created_at, uuid);
- consider using Aggregating-engines to significantly increase the speed of calculation aggregated values
- In any case before making a final decision need to do manual testing on a data subset (this applies as to choose the schema (json as string/Nested type/denormalized way), as choosing the column codec).