---
title: "Using clickhouse-keeper"
linkTitle: "Using clickhouse-keeper"
description: >
  Current guidance for running ClickHouse Keeper as the ZooKeeper-compatible coordination service for ClickHouse
keywords:
- clickhouse keeper
- clickhouse-keeper
- zookeeper
---

ClickHouse Keeper is the ZooKeeper-compatible coordination service used by ClickHouse for replicated tables and `ON CLUSTER` DDL. For new self-managed deployments it is the default recommendation instead of Apache ZooKeeper.

This page is a practical Altinity KB summary. For the full upstream reference, use the official ClickHouse Keeper guide:
https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-keeper/

Background material that is still useful:

- slides: https://presentations.clickhouse.com/meetup54/keeper.pdf
- video: https://youtu.be/IfgtdU1Mrm0?t=2682

## Current status (last updated: March 2026)

The 2023 guidance that previously appeared here is obsolete. In particular, recommendations built around releases `23.3` and `23.7` should no longer be treated as the current baseline.

Current practical guidance:

- For new installations, prefer ClickHouse Keeper over Apache ZooKeeper.
- Use a current supported stable release of ClickHouse / Keeper. Do not evaluate Keeper based on early `23.x` behavior.
- `async_replication` is available in `23.9+` and is recommended once all Keeper nodes in the ensemble support it.
- Keeper feature flags are visible in `system.zookeeper_connection` and `system.zookeeper_connection_log`.
- Some Keeper feature flags are enabled by default in `25.7+`. If you plan to move directly from a version older than `24.9`, first upgrade the Keeper ensemble to `24.9+`.
- Dynamic reconfiguration and quorum-loss recovery are documented workflows now; you do not need to rely only on old test configs and source code comments anymore.

## Compatibility and limits

- Keeper speaks the ZooKeeper client protocol, so standard ZooKeeper clients can talk to it.
- Keeper snapshots/logs are not format-compatible with ZooKeeper. Use `clickhouse-keeper-converter` for migration.
- A mixed ZooKeeper / ClickHouse Keeper quorum is not possible.
- Keeper is highly compatible with ZooKeeper for ClickHouse workloads, but not every ZooKeeper feature is implemented. Check the official `Unsupported features` section before depending on niche ZooKeeper APIs or non-ClickHouse external integrations.

## Topology guidance

The biggest problem with many older examples, including the original version of this page, is the 2-node Keeper layout. That layout is fine for a lab, but not for production: a 2-node Keeper cluster loses quorum after one failure.

Practical guidance:

- Use `3` or `5` Keeper nodes.
- For a small 2-server ClickHouse cluster, a common pattern is `2` data nodes plus `1` Keeper-only tie-breaker node.
- Keep the `server_id -> hostname` mapping stable across replacements.
- Prefer hostnames over raw IP addresses.
- If you use embedded Keeper on very busy data nodes, validate latency carefully. Keeper is usually the right choice, but it is not magic and very loaded systems can still behave worse after migration.
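The quorum arithmetic behind this advice is plain majority math: a Raft ensemble of `n` nodes stays available only while `n // 2 + 1` members are up, which is why 2 Keeper nodes tolerate exactly as many failures as 1. A quick sketch:

```python
# Majority quorum: an ensemble of n Raft nodes needs (n // 2 + 1) members
# alive, so it survives (n - 1) // 2 simultaneous failures.
def keeper_fault_tolerance(n: int) -> int:
    """Number of Keeper nodes that can fail while the ensemble keeps quorum."""
    return (n - 1) // 2

for n in (1, 2, 3, 5):
    print(f"{n} node(s): survives {keeper_fault_tolerance(n)} failure(s)")
```

This is why the recommended sizes are 3 or 5: 2 nodes add failure surface without adding fault tolerance.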

## How to run Keeper

Keeper can run embedded inside `clickhouse-server` or as the standalone `clickhouse-keeper` binary.

Standalone example:

```bash
clickhouse-keeper --config /etc/clickhouse-keeper/keeper_config.xml
```

Related KB pages:

- systemd service file: https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper-service/
- init.d script: https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper-initd/

## Example: two ClickHouse data nodes with a 3-node Keeper ensemble

A better minimal production pattern is:

- `ch1` - ClickHouse data node + Keeper
- `ch2` - ClickHouse data node + Keeper
- `ch3` - Keeper-only tie-breaker

### Keeper config

Use the same `raft_configuration` on all three Keeper nodes. The main per-node difference is `server_id`.

Example for `ch1` (`server_id=1`):

```xml
<?xml version="1.0"?>
<clickhouse>
    <keeper_server>
        <tcp_port>2181</tcp_port>
        <server_id>1</server_id>
        <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
        <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
        <enable_reconfiguration>true</enable_reconfiguration>

        <coordination_settings>
            <operation_timeout_ms>10000</operation_timeout_ms>
            <session_timeout_ms>30000</session_timeout_ms>
            <raft_logs_level>information</raft_logs_level>
            <!-- Enable on 23.9+ once all Keeper nodes are upgraded -->
            <async_replication>true</async_replication>
        </coordination_settings>

        <raft_configuration>
            <server>
                <id>1</id>
                <hostname>ch1</hostname>
                <port>9234</port>
            </server>
            <server>
                <id>2</id>
                <hostname>ch2</hostname>
                <port>9234</port>
            </server>
            <server>
                <id>3</id>
                <hostname>ch3</hostname>
                <port>9234</port>
            </server>
        </raft_configuration>
    </keeper_server>
</clickhouse>
```

On `ch2` use the same config with `<server_id>2</server_id>`. On `ch3` use `<server_id>3</server_id>`.

If you need encrypted connections:

- use `tcp_port_secure` for client-to-Keeper TLS
- use `<raft_configuration><secure>true</secure></raft_configuration>` for Keeper inter-node encryption
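A minimal sketch of where those two settings live (the port number here is a placeholder, not a default to copy; check the official Keeper TLS docs for your version):

```xml
<keeper_server>
    <!-- TLS for ZooKeeper-protocol clients, instead of plain tcp_port -->
    <tcp_port_secure>9281</tcp_port_secure>
    <raft_configuration>
        <!-- encrypt Keeper-to-Keeper Raft traffic -->
        <secure>true</secure>
        <!-- <server> entries as usual -->
    </raft_configuration>
</keeper_server>
```

Clients connecting over TLS also need `<secure>1</secure>` inside the corresponding `<node>` of the `<zookeeper>` section on the ClickHouse side.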

### ClickHouse config on data nodes

Point ClickHouse to the whole Keeper ensemble, not just to localhost:

```xml
<?xml version="1.0"?>
<clickhouse>
    <zookeeper>
        <node index="1">
            <host>ch1</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>ch2</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>ch3</host>
            <port>2181</port>
        </node>
    </zookeeper>

    <distributed_ddl>
        <path>/clickhouse/task_queue/ddl</path>
    </distributed_ddl>
</clickhouse>
```

Example macros for `ch1`:

```xml
<?xml version="1.0"?>
<clickhouse>
    <macros>
        <shard>1</shard>
        <replica>replica1</replica>
    </macros>
</clickhouse>
```

Example macros for `ch2`:

```xml
<?xml version="1.0"?>
<clickhouse>
    <macros>
        <shard>1</shard>
        <replica>replica2</replica>
    </macros>
</clickhouse>
```

Cluster definition on both data nodes:

```xml
<?xml version="1.0"?>
<clickhouse>
    <remote_servers>
        <cluster_1S_2R>
            <shard>
                <replica>
                    <host>ch1</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>ch2</host>
                    <port>9000</port>
                </replica>
            </shard>
        </cluster_1S_2R>
    </remote_servers>
</clickhouse>
```

### Test with a replicated table

Use `{uuid}` in Keeper paths for new replicated tables. This avoids path reuse problems when tables are created and dropped frequently.

```sql
CREATE DATABASE db1 ON CLUSTER 'cluster_1S_2R';

CREATE TABLE db1.test ON CLUSTER 'cluster_1S_2R'
(
    A Int64,
    S String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{uuid}', '{replica}')
ORDER BY A;

INSERT INTO db1.test VALUES (1, 'a'), (2, 'b');

SELECT hostName(), count()
FROM clusterAllReplicas('cluster_1S_2R', 'db1', 'test')
GROUP BY hostName()
ORDER BY hostName();
```
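To confirm replication is healthy after the test insert, `system.replicas` can be queried across the cluster (a sketch; exact column availability can vary slightly by version):

```sql
SELECT hostName(), database, table, is_leader, total_replicas, active_replicas
FROM clusterAllReplicas('cluster_1S_2R', 'system', 'replicas')
WHERE database = 'db1';
```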

## Operational checks

Check Keeper connectivity and enabled feature flags from ClickHouse:

```sql
SELECT
    name,
    host,
    port,
    keeper_api_version,
    enabled_feature_flags,
    session_timeout_ms,
    last_zxid_seen
FROM system.zookeeper_connection;
```

Inspect the current Keeper cluster configuration:

```bash
clickhouse-keeper-client --host ch1 --port 2181 -q "get /keeper/config"
```
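For automation it can help to parse that membership list. The helper below is a hypothetical sketch that assumes the `server.<id>=<host>:<port>;<role>;<weight>` line format described in the dynamic reconfiguration docs; verify the format against your version before relying on it:

```python
# Hypothetical parser for the membership lines stored at /keeper/config.
# Assumed line format: server.<id>=<host>:<port>;<role>;<weight>
def parse_keeper_config(text: str) -> dict:
    servers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("server."):
            continue  # skip anything that is not a membership line
        key, value = line.split("=", 1)
        server_id = int(key.split(".", 1)[1])
        endpoint, role, weight = value.split(";")
        host, port = endpoint.rsplit(":", 1)
        servers[server_id] = {"host": host, "port": int(port),
                              "role": role, "weight": int(weight)}
    return servers

sample = ("server.1=ch1:9234;participant;1\n"
          "server.2=ch2:9234;participant;1\n"
          "server.3=ch3:9234;participant;1")
print(parse_keeper_config(sample)[3]["host"])  # ch3
```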

Basic health checks:

```bash
echo ruok | nc ch1 2181
echo mntr | nc ch1 2181
```

`ruok` should return `imok`.
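`mntr` returns tab-separated `key value` lines, which makes it easy to feed into monitoring. A small sketch (the sample values below are illustrative, not real output):

```python
# Sketch: parse `mntr` output (tab-separated key/value lines) into a dict
# so a health check can alert on fields such as zk_server_state.
def parse_mntr(text: str) -> dict:
    metrics = {}
    for line in text.splitlines():
        if "\t" in line:
            key, value = line.split("\t", 1)
            metrics[key] = value
    return metrics

sample = ("zk_server_state\tleader\n"
          "zk_num_alive_connections\t2\n"
          "zk_outstanding_requests\t0")
print(parse_mntr(sample)["zk_server_state"])  # leader
```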

If you need to change Keeper membership dynamically, use `clickhouse-keeper-client` `reconfig` commands and keep `enable_reconfiguration=true` on Keeper nodes.

If you lose quorum, follow the official `Recovering after losing quorum` procedure instead of improvising edits in Keeper state directories.

## Useful references

- official Keeper guide: https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-keeper/
- `clickhouse-keeper-client` utility: https://clickhouse.com/docs/en/operations/utilities/clickhouse-keeper-client
- `system.zookeeper_connection`: https://clickhouse.com/docs/en/operations/system-tables/zookeeper_connection
- `system.zookeeper_connection_log`: https://clickhouse.com/docs/en/operations/system-tables/zookeeper_connection_log

Examples of current Keeper configs and workflows also exist in the ClickHouse integration tests under `tests/integration/test_keeper_*`.