diff --git a/content/en/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper.md b/content/en/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper.md
index c35a6037c2..9e0961d331 100644
--- a/content/en/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper.md
+++ b/content/en/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper.md
@@ -2,214 +2,267 @@
 title: "Using clickhouse-keeper"
 linkTitle: "Using clickhouse-keeper"
 description: >
-    Moving to the ClickHouse® alternative to Zookeeper
-keywords:
+    Current guidance for running ClickHouse Keeper as the ZooKeeper-compatible coordination service for ClickHouse
+keywords:
 - clickhouse keeper
 - clickhouse-keeper
+- zookeeper
 ---
 
-Since 2021 the development of built-in ClickHouse® alternative for Zookeeper is happening, whose goal is to address several design pitfalls, and get rid of extra dependency.
+ClickHouse Keeper is the ZooKeeper-compatible coordination service used by ClickHouse for replicated tables and `ON CLUSTER` DDL. For new self-managed deployments it is the default recommendation instead of Apache ZooKeeper.
 
-See slides: https://presentations.clickhouse.com/meetup54/keeper.pdf and video https://youtu.be/IfgtdU1Mrm0?t=2682
+This page is a practical Altinity KB summary. For the full upstream reference, use the official ClickHouse Keeper guide:
+https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-keeper/
 
-## Current status (last updated: July 2023)
+Background material that is still useful:
 
-Since version 23.3 we recommend using clickhouse-keeper for new installations.
+- slides: https://presentations.clickhouse.com/meetup54/keeper.pdf
+- video: https://youtu.be/IfgtdU1Mrm0?t=2682
 
-Even better if you will use the latest version of clickhouse-keeper (currently it's 23.7), and it's not necessary to use the same version of clickhouse-keeper as ClickHouse itself.
+## Current status (last updated: March 2026)
 
-For existing systems that currently use Apache Zookeeper, you can consider upgrading to clickhouse-keeper especially if you will [upgrade ClickHouse](https://altinity.com/clickhouse-upgrade-overview/) also.
+The old 2023 guidance in this article is obsolete. In particular, the recommendations around `23.3` and `23.7` should no longer be treated as the current baseline.
 
-But please remember that on very loaded systems the change can give no performance benefits or can sometimes lead to a worse performance.
+Current practical guidance:
 
-The development pace of keeper code is [still high](https://github.com/ClickHouse/ClickHouse/pulls?q=is%3Apr+keeper)
-so every new version should bring improvements / cover the issues, and stability/maturity grows from version to version, so
-if you want to play with clickhouse-keeper in some environment - please use [the most recent ClickHouse releases](https://altinity.com/altinity-stable/)! And of course: share your feedback :)
+- For new installations, prefer ClickHouse Keeper over Apache ZooKeeper.
+- Use a current supported stable release of ClickHouse / Keeper. Do not evaluate Keeper based on early `23.x` behavior.
+- `async_replication` is available in `23.9+` and is recommended once all Keeper nodes in the ensemble support it.
+- Keeper feature flags are visible in `system.zookeeper_connection` and `system.zookeeper_connection_log`.
+- Some Keeper feature flags are enabled by default in `25.7+`. If you plan to move directly from a version older than `24.9`, first upgrade the Keeper ensemble to `24.9+`.
+- Dynamic reconfiguration and quorum-loss recovery are documented workflows now; you do not need to rely only on old test configs and source code comments anymore.
 
-## How does clickhouse-keeper work?
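The upgrade-ordering rule above (pass through `24.9+` before moving to `25.7+` when coming from an older release) is easy to get wrong during planning. A minimal sketch of that rule only, assuming the usual `YY.M` ClickHouse version scheme; the helper names are illustrative, not an official tool:

```python
# Sketch of the Keeper upgrade-ordering rule described above:
# when moving from a release older than 24.9 to 25.7+,
# first upgrade the ensemble to 24.9+.

def parse_version(v: str) -> tuple:
    """Parse a 'YY.M' version string (e.g. '24.9') into a comparable tuple."""
    year, month = v.split(".")[:2]
    return (int(year), int(month))

def needs_intermediate_step(current: str, target: str) -> bool:
    """True if the ensemble must stop at 24.9+ before going to the target."""
    return parse_version(current) < (24, 9) and parse_version(target) >= (25, 7)

print(needs_intermediate_step("23.8", "25.7"))   # True: go through 24.9 first
print(needs_intermediate_step("24.10", "25.7"))  # False: direct upgrade is fine
```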
+## Compatibility and limits
 
-Official docs: https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-keeper/
+- Keeper speaks the ZooKeeper client protocol, so standard ZooKeeper clients can talk to it.
+- Keeper snapshots/logs are not format-compatible with ZooKeeper. Use `clickhouse-keeper-converter` for migration.
+- A mixed ZooKeeper / ClickHouse Keeper quorum is not possible.
+- Keeper is highly compatible with ZooKeeper for ClickHouse workloads, but not every ZooKeeper feature is implemented. Check the official `Unsupported features` section before depending on niche ZooKeeper APIs or non-ClickHouse external integrations.
 
-ClickHouse-keeper still need to be started additionally on few nodes (similar to 'normal' zookeeper) and speaks normal zookeeper protocol - needed to simplify A/B tests with real zookeeper.
+## Topology guidance
 
-To test that you need to run 3 instances of clickhouse-server (which will mimic zookeeper) with an extra config like that:
+The biggest problem with many older examples, including the original version of this page, is the 2-node Keeper layout. That layout is fine for a lab, but not for production: a 2-node Keeper cluster loses quorum after one failure.
 
-[https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_keeper_multinode_simple/configs/enable_keeper1.xml](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_keeper_multinode_simple/configs/enable_keeper1.xml)
+Practical guidance:
 
-[https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_keeper_snapshots/configs/enable_keeper.xml](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_keeper_snapshots/configs/enable_keeper.xml)
+- Use `3` or `5` Keeper nodes.
+- For a small 2-server ClickHouse cluster, a common pattern is `2` data nodes plus `1` Keeper-only tie-breaker node.
+- Keep the `server_id -> hostname` mapping stable across replacements.
+- Prefer hostnames over raw IP addresses.
+- If you use embedded Keeper on very busy data nodes, validate latency carefully. Keeper is usually the right choice, but it is not magic and very loaded systems can still behave worse after migration.
 
-or event single instance with config like that: [https://github.com/ClickHouse/ClickHouse/blob/master/tests/config/config.d/keeper_port.xml](https://github.com/ClickHouse/ClickHouse/blob/master/tests/config/config.d/keeper_port.xml)
-[https://github.com/ClickHouse/ClickHouse/blob/master/tests/config/config.d/zookeeper.xml](https://github.com/ClickHouse/ClickHouse/blob/master/tests/config/config.d/zookeeper.xml)
+## How to run Keeper
 
-And point all the ClickHouses (zookeeper config section) to those nodes / ports.
+Keeper can run embedded inside `clickhouse-server` or as the standalone `clickhouse-keeper` binary.
 
-Latest version is recommended (even testing / master builds). We will be thankful for any feedback.
+Standalone example:
 
-## systemd service file
+```bash
+clickhouse-keeper --config /etc/clickhouse-keeper/keeper_config.xml
+```
 
-See
-https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper-service/
+Related KB pages:
 
-## init.d script
+- systemd service file: https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper-service/
+- init.d script: https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper-initd/
 
-See
-https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper-initd/
+## Example: two ClickHouse data nodes with a 3-node Keeper ensemble
 
-## Example of a simple cluster with 2 nodes of ClickHouse using built-in keeper
+A better minimal production pattern is:
 
-For example you can start two ClickHouse nodes (hostname1, hostname2)
+- `ch1` - ClickHouse data node + Keeper
+- `ch2` - ClickHouse data node + Keeper
+- `ch3` - Keeper-only tie-breaker
 
-### hostname1
+### Keeper config
 
-```xml
-$ cat /etc/clickhouse-server/config.d/keeper.xml
+Use the same `raft_configuration` on all three Keeper nodes. The main per-node difference is `server_id`.
+
+Example for `ch1` (`server_id=1`):
 
-<yandex>
-    <keeper_server>
+```xml
+<clickhouse>
+    <keeper_server>
         <tcp_port>2181</tcp_port>
         <server_id>1</server_id>
         <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
         <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
+        <enable_reconfiguration>true</enable_reconfiguration>
 
         <coordination_settings>
             <operation_timeout_ms>10000</operation_timeout_ms>
             <session_timeout_ms>30000</session_timeout_ms>
-            <raft_logs_level>trace</raft_logs_level>
-            <rotate_log_storage_interval>10000</rotate_log_storage_interval>
+            <raft_logs_level>information</raft_logs_level>
+            <async_replication>true</async_replication>
         </coordination_settings>
 
         <raft_configuration>
-            <server>
-                <id>1</id>
-                <hostname>hostname1</hostname>
-                <port>9444</port>
-            </server>
-            <server>
-                <id>2</id>
-                <hostname>hostname2</hostname>
-                <port>9444</port>
-            </server>
+            <server>
+                <id>1</id>
+                <hostname>ch1</hostname>
+                <port>9234</port>
+            </server>
+            <server>
+                <id>2</id>
+                <hostname>ch2</hostname>
+                <port>9234</port>
+            </server>
+            <server>
+                <id>3</id>
+                <hostname>ch3</hostname>
+                <port>9234</port>
+            </server>
         </raft_configuration>
     </keeper_server>
-
-    <zookeeper>
-        <node>
-            <host>localhost</host>
-            <port>2181</port>
-        </node>
-    </zookeeper>
-
-    <distributed_ddl>
-        <path>/clickhouse/testcluster/task_queue/ddl</path>
-    </distributed_ddl>
-</yandex>
-
-$ cat /etc/clickhouse-server/config.d/macros.xml
-
-<yandex>
-    <macros>
-        <cluster>testcluster</cluster>
-        <replica>replica1</replica>
-        <shard>1</shard>
-    </macros>
-</yandex>
+</clickhouse>
 ```
 
-### hostname2
+On `ch2` use the same config with `<server_id>2</server_id>`. On `ch3` use `<server_id>3</server_id>`.
 
-```xml
-$ cat /etc/clickhouse-server/config.d/keeper.xml
-
-<yandex>
-    <keeper_server>
-        <tcp_port>2181</tcp_port>
-        <server_id>2</server_id>
-        <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
-        <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
+If you need encrypted connections:
 
-        <coordination_settings>
-            <operation_timeout_ms>10000</operation_timeout_ms>
-            <session_timeout_ms>30000</session_timeout_ms>
-            <raft_logs_level>trace</raft_logs_level>
-            <rotate_log_storage_interval>10000</rotate_log_storage_interval>
-        </coordination_settings>
+- use `tcp_port_secure` for client-to-Keeper TLS
+- use `<secure>true</secure>` in `raft_configuration` for Keeper inter-node encryption
 
-        <raft_configuration>
-            <server>
-                <id>1</id>
-                <hostname>hostname1</hostname>
-                <port>9444</port>
-            </server>
-            <server>
-                <id>2</id>
-                <hostname>hostname2</hostname>
-                <port>9444</port>
-            </server>
-        </raft_configuration>
-    </keeper_server>
+### ClickHouse config on data nodes
 
-    <zookeeper>
-        <node>
-            <host>localhost</host>
-            <port>2181</port>
-        </node>
-    </zookeeper>
+Point ClickHouse to the whole Keeper ensemble, not just to localhost:
 
-    <distributed_ddl>
-        <path>/clickhouse/testcluster/task_queue/ddl</path>
-    </distributed_ddl>
-</yandex>
+```xml
+<clickhouse>
+    <zookeeper>
+        <node>
+            <host>ch1</host>
+            <port>2181</port>
+        </node>
+        <node>
+            <host>ch2</host>
+            <port>2181</port>
+        </node>
+        <node>
+            <host>ch3</host>
+            <port>2181</port>
+        </node>
+    </zookeeper>
 
-$ cat /etc/clickhouse-server/config.d/macros.xml
+    <distributed_ddl>
+        <path>/clickhouse/task_queue/ddl</path>
+    </distributed_ddl>
+</clickhouse>
+```
 
-<yandex>
-    <macros>
-        <cluster>testcluster</cluster>
-        <replica>replica2</replica>
-        <shard>1</shard>
-    </macros>
-</yandex>
-```
+Example macros for `ch1`:
 
-### on both
+```xml
+<clickhouse>
+    <macros>
+        <shard>1</shard>
+        <replica>replica1</replica>
+    </macros>
+</clickhouse>
+```
 
-```xml
-$ cat /etc/clickhouse-server/config.d/clusters.xml
+Example macros for `ch2`:
 
-<yandex>
-    <remote_servers>
-        <testcluster>
-            <shard>
-                <replica>
-                    <host>hostname1</host>
-                    <port>9000</port>
-                </replica>
-            </shard>
-            <shard>
-                <replica>
-                    <host>hostname2</host>
-                    <port>9000</port>
-                </replica>
-            </shard>
-        </testcluster>
-    </remote_servers>
-</yandex>
-```
+```xml
+<clickhouse>
+    <macros>
+        <shard>1</shard>
+        <replica>replica2</replica>
+    </macros>
+</clickhouse>
+```
 
-Then create a table
+Cluster definition on both data nodes:
+
+```xml
+<clickhouse>
+    <remote_servers>
+        <cluster_1S_2R>
+            <shard>
+                <replica>
+                    <host>ch1</host>
+                    <port>9000</port>
+                </replica>
+                <replica>
+                    <host>ch2</host>
+                    <port>9000</port>
+                </replica>
+            </shard>
+        </cluster_1S_2R>
+    </remote_servers>
+</clickhouse>
+```
+
+### Test with a replicated table
+
+Use `{uuid}` in Keeper paths for new replicated tables. This avoids path reuse problems when tables are created and dropped frequently.
+
+```sql
+CREATE DATABASE db1 ON CLUSTER 'cluster_1S_2R';
+
+CREATE TABLE db1.test ON CLUSTER 'cluster_1S_2R'
+(
+    A Int64,
+    S String
+)
+ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{uuid}', '{replica}')
+ORDER BY A;
+
+INSERT INTO db1.test VALUES (1, 'a'), (2, 'b');
+
+SELECT hostName(), count()
+FROM clusterAllReplicas('cluster_1S_2R', 'db1', 'test')
+GROUP BY hostName()
+ORDER BY hostName();
+```
+
+## Operational checks
+
+Check Keeper connectivity and enabled feature flags from ClickHouse:
 
 ```sql
-create table test on cluster '{cluster}' ( A Int64, S String)
-Engine = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{database}/{table}','{replica}')
-Order by A;
+SELECT
+    name,
+    host,
+    port,
+    keeper_api_version,
+    enabled_feature_flags,
+    session_timeout_ms,
+    last_zxid_seen
+FROM system.zookeeper_connection;
+```
+
+Inspect the current Keeper cluster configuration:
 
-insert into test select number, '' from numbers(100000000);
+```bash
+clickhouse-keeper-client --host ch1 --port 2181 -q "get /keeper/config"
+```
+
+Basic health checks:
 
--- on both nodes:
-select count() from test;
-```
+```bash
+echo ruok | nc ch1 2181
+echo mntr | nc ch1 2181
+```
+
+`ruok` should return `imok`.
+
+If you need to change Keeper membership dynamically, use `clickhouse-keeper-client` `reconfig` commands and keep `enable_reconfiguration=true` on Keeper nodes.
+
+If you lose quorum, follow the official `Recovering after losing quorum` procedure instead of improvising edits in Keeper state directories.
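The `mntr` command above returns one tab-separated `key value` pair per line, which makes it easy to consume from a monitoring script. A minimal parsing sketch; the sample text is illustrative, not captured from a real server, and the keys shown (`zk_server_state`, `zk_num_alive_connections`) are standard `mntr` keys:

```python
def parse_mntr(raw: str) -> dict:
    """Parse Keeper/ZooKeeper `mntr` output (tab-separated key/value lines)."""
    stats = {}
    for line in raw.splitlines():
        if "\t" in line:
            key, value = line.split("\t", 1)
            stats[key] = value
    return stats

# Illustrative sample of what `echo mntr | nc ch1 2181` might return.
sample = (
    "zk_version\tv23.8\n"
    "zk_server_state\tleader\n"
    "zk_num_alive_connections\t2\n"
    "zk_znode_count\t1500\n"
)

stats = parse_mntr(sample)
print(stats["zk_server_state"])                # leader
print(int(stats["zk_num_alive_connections"]))  # 2
```

A script like this can alert when no node reports `zk_server_state` of `leader`, which is the practical symptom of lost quorum.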
+
+## Useful references
+
+- official Keeper guide: https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-keeper/
+- `clickhouse-keeper-client` utility: https://clickhouse.com/docs/en/operations/utilities/clickhouse-keeper-client
+- `system.zookeeper_connection`: https://clickhouse.com/docs/en/operations/system-tables/zookeeper_connection
+- `system.zookeeper_connection_log`: https://clickhouse.com/docs/en/operations/system-tables/zookeeper_connection_log
+
+Examples of current Keeper configs and workflows also exist in the ClickHouse integration tests under `tests/integration/test_keeper_*`.
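The topology guidance earlier in this page comes down to simple Raft quorum arithmetic: an ensemble of `n` nodes needs a majority of `n // 2 + 1` votes, so it tolerates `(n - 1) // 2` failures. A minimal illustration of why 2-node Keeper ensembles are a lab-only layout:

```python
def majority(n: int) -> int:
    """Votes needed for a Raft/Keeper quorum of n nodes."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """Nodes that can fail while the ensemble keeps quorum."""
    return n - majority(n)

for n in (1, 2, 3, 5):
    print(n, majority(n), tolerated_failures(n))
# a 2-node ensemble tolerates 0 failures - this is why 3 or 5 nodes
# (or 2 data nodes plus a tie-breaker) are recommended above
```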