Photo Courtesy – Pexels
ClickHouse has MergeTree family of engines and data replication can be achieved through the replicated version of the MergeTree family engines. This replication works on an individual table level. ClickHouse has recently added support for database level replication via the Replicated database engine.
The Replicated database engine is only responsible for replicating the database and table metadata on the cluster. The data replication across the cluster happens only at the table level. That is, ReplicatedMergeTree* table engines are required to replicate the data. Let us look at an example of this Database engine.
Prerequisites:
- 2 node ClickHouse cluster
- Single node zookeeper
The below docker-compose file is used for this article.
version: '3' services: zookeeper: image: 'bitnami/zookeeper:latest' networks: - ch_replicated environment: - ALLOW_ANONYMOUS_LOGIN=yes - ZOOKEEPER_CLIENT_PORT=2181 ports: - "2182:2181" - "2888:2888" - "3888:3888" clickhouse1: image: clickhouse/clickhouse-server ports: - "8002:9000" - "9123:8123" ulimits: nproc: 65535 nofile: soft: 262144 hard: 262144 networks: - ch_replicated depends_on: - zookeeper clickhouse2: image: clickhouse/clickhouse-server ports: - "8003:9000" - "9124:8123" ulimits: nproc: 65535 nofile: soft: 262144 hard: 262144 networks: - ch_replicated depends_on: - zookeeper networks: ch_replicated: driver: bridge
Once the ClickHouse docker containers are up and running make the following changes in the config.xml file.
<remote_servers> <replicated_cluster> <shard> <internal_replication>true</internal_replication> <replica> <host>clickhouse1</host> <port>9000</port> </replica> <replica> <host>clickhouse2</host> <port>9000</port> </replica> </shard> </replicated_cluster> </remote_servers> <zookeeper> <node index="1"> <host>zookeeper</host> <port>2181</port> </node> </zookeeper> <macros> <shard>01</shard> <replica>ch1</replica> </macros>
Steps:
- Enable replicated database engine in users.xml file
<allow_experimental_database_replicated>1</allow_experimental_database_replicated>
- Create the database in both the nodes
CREATE DATABASE testdb ENGINE = Replicated('replicated_db', '{shard}', '{replica}');
- 3. Create a replicated MergeTree table in one of the node
CREATE TABLE testdb.test1 ( col1 Int32, col2 Int32) Engine = ReplicatedMergeTree() ORDER BY col1;
- Insert the data in one node and verify in both the nodes
INSERT INTO testdb.test1 VALUES (1,1); SELECT * FROM testdb.test1;
- Create a MergeTree table in one of the node
CREATE TABLE testdb.test2 ( col1 Int32, col2 Int32) Engine = MergeTree() ORDER BY col1;
- Insert a row and verify
INSERT INTO testdb.test2 VALUES (1,1); SELECT * FROM testdb.test2;
The MergeTree table will be available on the second server but the data will not be replicated automatically.
Conclusion
The new replicated database engine helps us to perform database level replication and will be quite useful in multi-node ClickHouse setup.
References:
https://clickhouse.com/docs/en/engines/database-engines/replicated/