ClickHouse April 2023 Release – Version 23.4

Ilkay Cetindag
Posted on May 4, 2023

Each new release includes new features, improvements, and numerous bug fixes, and the ChistaDATA team is always on top of the latest releases. On 26th April 2023, ClickHouse version 22.4 (April 2023) was released, and this version contains 15 new features, 11 performance optimizations, and 36 bug fixes.

For further details, please see the official ClickHouse change log list here.

Let’s look at some important new features:

SHOW COLUMNS

In the 23.4 release, SHOW COLUMNS is a command used to display the columns of a table or view in the database. It provides information about the columns’ names, types, and properties in a tabular format. The SHOW COLUMNS function is useful for quickly examining the structure of a table or view in ClickHouse and can be used as a reference when working with the database.

When you run the SHOW COLUMNS command in ClickHouse, you will get a result set that contains the following information for each column in the table:

field: The column name (String)
type: The data type of the column (String)
null: if the column datatype is nullable (UInt8)
key: PRI if the column is part of the primary key, SOR if the column is part of the sort key, otherwise empty (String)
default: The default expression of the column if it is of type ALIAS, DEFAULT, or MATERIALIZED; otherwise, NULL. (Nullable(String))
extra: additional information, currently unused (String)
collation: (only if a FULL keyword is specified) Collation of the column, always NULL as ClickHouse does not have per-column collations (Nullable(String))
comment: (only if a FULL keyword is specified) comment on the column (String)
privilege: (only if a FULL keyword was specified) The privilege you have for this column, currently unavailable (String)

Let’s have a look:

SHOW COLUMNS FROM kafka_table;

┌─field─────────────┬─type────┬─null─┬─key─────┬─default─┬─extra─┐
│ city              │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ county            │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ district          │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ duration          │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ locality          │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ newly_built       │ Bool    │    0 │         │ ᴺᵁᴸᴸ    │       │
│ paon              │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ postcode          │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ ppd_category_type │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ price             │ Float32 │    0 │         │ ᴺᵁᴸᴸ    │       │
│ property_type     │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ record_status     │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ saon              │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ street            │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
│ transaction       │ UUID    │    0 │ PRI SOR │ ᴺᵁᴸᴸ    │       │
│ transfer_date     │ String  │    0 │         │ ᴺᵁᴸᴸ    │       │
└───────────────────┴─────────┴──────┴─────────┴─────────┴───────┘

QuantileGK

ClickHouse’s QuantileGK is a probabilistic data structure that provides an efficient way to estimate quantiles of large data sets in real-time. It is based on the Greenwald-Khanna algorithm and is used to approximate quantiles with a configurable accuracy. QuantileGK works by dividing the data set into multiple levels, each level having a different granularity. At each level, it tracks the bounds of the data set within that granularity. This approach allows for a trade-off between accuracy and memory usage, where more accurate estimates require more memory.

In ClickHouse, QuantileGK is used as a built-in aggregate function that can be applied to columns in a SELECT statement. For example, to calculate the median of a column, you can use the quantile function like this.

SELECT quantilesGK(1, 0.25, 0.5, 0.75)(number + 1)
FROM numbers(1000)

┌─quantilesGK(1, 0.25, 0.5, 0.75)(plus(number, 1))─┐
│ [1,1,1]                                          │
└──────────────────────────────────────────────────┘

SELECT quantilesGK(10, 0.25, 0.5, 0.75)(number + 1)
FROM numbers(1000)

┌─quantilesGK(10, 0.25, 0.5, 0.75)(plus(number, 1))─┐
│ [156,413,659]                                     │
└───────────────────────────────────────────────────┘


SELECT quantilesGK(100, 0.25, 0.5, 0.75)(number + 1)
FROM numbers(1000)

┌─quantilesGK(100, 0.25, 0.5, 0.75)(plus(number, 1))─┐
│ [251,498,741]                                      │
└────────────────────────────────────────────────────┘

SELECT quantilesGK(1000, 0.25, 0.5, 0.75)(number + 1)
FROM numbers(1000)

┌─quantilesGK(1000, 0.25, 0.5, 0.75)(plus(number, 1))─┐
│ [249,499,749]                                       │
└─────────────────────────────────────────────────────┘

The QuantileGK feature is particularly useful for performing real-time analysis on large data sets, such as log files or user behavior data, where precise calculations of percentiles or other statistical measures are required, but the data set is too large to process in its entirety.

GRANT CURRENT GRANTS

In ClickHouse, The CURRENT GRANTS statement allows you to grant all of the specified privileges to the specified user or role. If none of the privileges are specified, then the given user or role is granted all the privileges available for CURRENT_USER.

CREATE ROLE accountant;
GRANT  CURRENT GRANTS TO accountant;
SHOW GRANTS FOR accountant FORMAT PrettyJSONEachRow;
{
    "GRANTS FOR accountant FORMAT PrettyJSONEachRow": "GRANT SHOW, SELECT, INSERT, ALTER, CREATE, DROP, UNDROP TABLE, TRUNCATE, OPTIMIZE, BACKUP, KILL QUERY, KILL TRANSACTION, MOVE PARTITION BETWEEN SHARDS, ACCESS MANAGEMENT, SYSTEM, dictGet, INTROSPECTION, SOURCES, CLUSTER ON *.* TO accountant"
}

extractKeyValuePairs

In ClickHouse, extractKeyValuePairs is a built-in function used to parse a string containing key-value pairs and return them as a set of rows. This function is useful for processing and extracting structured data stored in a single string column. The extractKeyValuePairs function takes two arguments: the string to be parsed and a delimiter to separate the key-value pairs. The delimiter is usually a comma (,) or a semicolon (;), but it can be any character that separates the pairs.

select extractKeyValuePairs('name:arda, surname:güler, age:18 team:fenerbahçe, nationality:turkish') as kv

OR

INSERT INTO test_kv VALUES
(1, 'color:red;size:large;material:cotton'),
(2, 'color:blue;size:medium;material:silk'),
(3, 'color:green;size:small;material:wool');

SELECT * FROM test_kv;
┌─id─┬─properties───────────────────────────┐
│  1 │ color:red;size:large;material:cotton │
│  2 │ color:blue;size:medium;material:silk │
│  3 │ color:green;size:small;material:wool │
└────┴──────────────────────────────────────┘

Let’s look at some important improvements:

  • The increased default value for connect_timeout_with_failover_ms to 1000ms, but hedged_connection_timeout_ms = 50. It starts connecting to other replicas after 50 ms but doesn’t drop the connection to the first replica and continues to connect to multiple replicas in parallel.
  • Added support for Iceberg v2. and partitioned and non-partitioned Iceberg, Hudi, and DeltaLake.
  • If we run a mutation with IN (subquery) like this: ALTER TABLE kafka UPDATE col='new value' WHERE id IN (SELECT id FROM huge_table) and the table kafka has multiple parts, then for each part a set for subquery SELECT id FROM huge_table is built-in memory. And if there are many parts, it can use a lot of memory and CPU. The solution is to introduce a short-lived cache of sets currently being built by mutation tasks. If another task with the same mutation is running at the same time, it can look up the set in the cache, wait for it to be built, and reuse it.

Note: If you want to test your scenarios, use the following Docker containers and container packages we have prepared. Here is the link to this installation.

Conclusion

ClickHouse 23.4 brings many exciting new features that further enhance its high-performance data analysis database management system capabilities. ClickHouse continues to evolve and provide users with powerful tools for processing, analyzing, and visualizing data. The performance enhancements and expanded integrations further solidify ClickHouse’s position as the first choice for high-speed data processing, real-time analytics, and advanced data analytics use cases. With its open-source nature and continuous innovation, ClickHouse remains a leading solution for organizations seeking fast, scalable, and feature-rich data analytics capabilities. Upgrade to ClickHouse 23.4 with ChistaDATA support to take advantage of these exciting new features and unlock the full potential of your data analysis workflows.