How regular expressions are implemented in ClickHouse?

Shiv Iyer
Posted on January 22, 2023

How regular expressions are implemented in ClickHouse?


In ClickHouse, regular expressions are implemented using the match function and the like operator.

  1. The match function is used to check if a string matches a specified regular expression. The syntax for the function is as follows: match(string, pattern). The string argument is the string that you want to check, and the pattern argument is the regular expression that you want to match against. The function returns 1 if the string matches the pattern, and 0 if it does not.
  2. The like operator is used to check if a string matches a specified wildcard pattern. The syntax for the operator is as follows: string like pattern. The string argument is the string that you want to check, and the pattern argument is the wildcard pattern that you want to match against. The operator returns 1 if the string matches the pattern, and 0 if it does not.

ClickHouse supports a broad set of regular expressions features, including:

  • Character classes
  • Alternation
  • Anchors
  • Quantifiers
  • Backreferences
  • Lookarounds
  • Named groups

You can use these features to match and extract specific information from a dataset.

Here’s an example of how to use the match function and the like operator in ClickHouse:

-- Check if a string matches a regular expression
SELECT match('abcde', 'abc(de|d)'); -- returns 1

-- Check if a string matches a wildcard pattern
SELECT 'abcde' like 'a%e'; -- returns 1

It’s important to note that the performance of the match function and the like operator can be impacted by the complexity of the regular expression and the size of the dataset. It’s recommended that you test the performance of your regular expressions on a representative sample of your data before using them in a production environment.