From d747af28694e01c7fb4545f7021c7625512c7047 Mon Sep 17 00:00:00 2001
From: filimonov <1549571+filimonov@users.noreply.github.com>
Date: Wed, 4 Mar 2026 12:33:04 +0100
Subject: [PATCH 01/10] Create users_in_keeper.md

---
 .../users_in_keeper.md | 418 ++++++++++++++++++
 1 file changed, 418 insertions(+)
 create mode 100644 content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md

diff --git a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md
new file mode 100644
index 0000000000..18df7f29c8
--- /dev/null
+++ b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md
@@ -0,0 +1,418 @@
---
title: "ClickHouse Users/Grants in ZooKeeper or ClickHouse Keeper"
linkTitle: "ClickHouse Users/Grants in ZooKeeper or ClickHouse Keeper"
weight: 100
description: >-
  Storing ClickHouse users and grants in ZooKeeper or ClickHouse Keeper.
---

# ClickHouse Users/Grants in ZooKeeper or ClickHouse Keeper

## 1. What this feature is

ClickHouse can store access-control entities in ZooKeeper/ClickHouse Keeper instead of (or together with) local files.

Access entities include:
- users
- roles
- grants/revokes
- row policies
- quotas
- settings profiles
- masking policies

In the server config this is the `replicated` access storage inside `user_directories`:

```xml
<clickhouse>
    <user_directories>
        <replicated>
            <zookeeper_path>/clickhouse/access</zookeeper_path>
            <allow_backup>true</allow_backup>
        </replicated>
    </user_directories>
</clickhouse>
```

From here on, this article uses **Keeper** as a short name for ZooKeeper/ClickHouse Keeper.

## 2. Basic concepts (quick glossary)

- `Access entity`: one RBAC object (for example, one user or one role).
- `ReplicatedAccessStorage`: the access storage implementation that persists entities in Keeper.
- `ZooKeeperReplicator`: the low-level component that performs Keeper reads/writes/watches and maintains the local mirror state.
- `ON CLUSTER`: the distributed DDL mechanism (queue-based fan-out) for running a query on many hosts.
- `system.user_directories`: system table that shows configured access storages and their precedence.

## 3. Why teams use it (pros/cons)

### Pros
- Single source of truth for RBAC across nodes.
- No manual file sync of `users.xml`/local access files.
- Immediate propagation through Keeper watches.
- Works naturally with SQL RBAC workflows (`CREATE USER`, `GRANT`, etc.).
- Integrates with backup/restore of access entities.

### Cons
- Writes depend on Keeper availability (reads continue from the local cache; writes fail when Keeper is unavailable).
- Operational complexity increases (Keeper health now affects auth/RBAC changes).
- Potential confusion with `ON CLUSTER` (two replication mechanisms can overlap).
- A corrupted entity payload in Keeper can be ignored or can fail startup, depending on settings.

## 4. Where data is stored in Keeper

Assume:
- the configured path is `/clickhouse/access`

Tree layout (`<uuid>` is the entity UUID, `<name>` is the escaped entity name):

```text
/clickhouse/access
  /uuid
    /<uuid>  -> serialized ATTACH statements of one entity
  /U
    /<name>  -> "<uuid>"
  /R
    /<name>  -> "<uuid>"
  /S
    /<name>  -> "<uuid>"
  /P
    /<name>  -> "<uuid>"
  /Q
    /<name>  -> "<uuid>"
  /M
    /<name>  -> "<uuid>"
```

The type-letter mapping comes from `AccessEntityTypeInfo`:
- `U` user
- `R` role
- `S` settings profile
- `P` row policy
- `Q` quota
- `M` masking policy

Important details:
- names are escaped with `escapeForFileName()`.
- `zookeeper_path` is normalized on startup: a trailing `/` is removed and a leading `/` is enforced.

## 5. What value format is stored under `/uuid/<uuid>`

Each entity is serialized as one or more one-line `ATTACH ...` statements:
- the first statement is the entity definition (`ATTACH USER`, `ATTACH ROLE`, and so on)
- users/roles can include attached grant statements (`ATTACH GRANT ...`)

So Keeper stores an SQL-like payload, not a binary protobuf/JSON object.

## 6. How reads/writes happen (from basic to advanced)

## 6.1 Startup and initialization

On startup (or reconnect), `ZooKeeperReplicator`:
1. gets a Keeper client
2. executes `sync(zookeeper_path)` to reduce stale reads after reconnect
3. creates root nodes if missing (`/uuid`, `/U`, `/R`, ...)
4. loads all entities into an internal `MemoryAccessStorage`
5. starts the watch thread

## 6.2 Insert/update/remove behavior

- Insert uses a Keeper `multi` to create both:
  - `/uuid/<uuid>` with the serialized entity
  - `/<type-letter>/<name>` with the value `<uuid>`
- Update uses versioned `set`/`multi`; a rename updates the type/name node too.
- Remove deletes both the uuid node and the type/name node in one `multi`.

This dual-index model enforces:
- uniqueness by UUID (`/uuid`)
- uniqueness by (type, name) (`/<type-letter>/<name>`)

## 6.3 Read/find behavior

Reads from the SQL path (`find`, `read`, `exists`) go to the in-memory mirror (`MemoryAccessStorage`), not directly to Keeper.

Keeper is the persistent source; memory is the fast serving layer.

## 7. Watches, refresh, and caches

Two watch patterns are used:
- a list watch on the `/uuid` children: detects new/deleted entities
- a per-entity watch on `/uuid/<uuid>`: detects changes of that entity's payload

Refresh queue:
- a `Nil` marker means "refresh the entity list"
- a concrete UUID means "refresh this entity"

Thread model:
- a dedicated watcher thread (`runWatchingThread`)
- on errors: reset the cached Keeper client, sleep, retry
- after a successful refresh: send `AccessChangesNotifier` notifications

Cache layers to know:
- the primary replicated-access cache: `MemoryAccessStorage` inside `ReplicatedAccessStorage`
- higher-level RBAC caches in `AccessControl`:
  - `RoleCache`
  - `RowPolicyCache`
  - `QuotaCache`
  - `SettingsProfilesCache`
- these caches subscribe to access-entity change notifications and recalculate/invalidate accordingly

## 8. Settings that strongly affect behavior

## 8.1 `ignore_on_cluster_for_replicated_access_entities_queries`

If enabled and a replicated access storage exists:
- access-control queries with `ON CLUSTER` are rewritten to a local query (the `ON CLUSTER` clause is removed).
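For illustration, with the setting enabled a statement like the following is executed as if `ON CLUSTER` were absent, and propagation happens only through Keeper (the user `alice`, the grant, and the cluster name `my_cluster` are hypothetical):

```sql
-- hypothetical names; assumes ignore_on_cluster_for_replicated_access_entities_queries = 1
-- both statements are rewritten to run locally; Keeper replicates the result to other nodes
CREATE USER alice ON CLUSTER 'my_cluster' IDENTIFIED WITH sha256_password BY 'secret';
GRANT ON CLUSTER 'my_cluster' SELECT ON default.* TO alice;
```

Without the setting, the same statements would additionally fan out through the distributed DDL queue.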
+ +Why: +- replicated access storage already replicates through Keeper. +- additional distributed DDL fan-out may cause duplicate/conflicting execution. + +Coverage includes grants/revokes too (`ASTGrantQuery` is included). + +## 8.2 `access_control_improvements.throw_on_invalid_replicated_access_entities` + +If enabled: +- parse errors in Keeper entity payload are fatal during full load (can fail server startup). + +If disabled: +- invalid entity is logged and skipped. + +This is tested by injecting invalid `ATTACH GRANT ...` into `/uuid/`. + +## 8.3 `access_control_improvements.on_cluster_queries_require_cluster_grant` + +Controls whether `CLUSTER` grant is required for `ON CLUSTER`. + +## 8.4 `distributed_ddl_use_initial_user_and_roles` (server setting) + +For `ON CLUSTER`, remote execution can preserve initiator user/roles. +This is relevant when mixing distributed DDL with access management. + +## 9. Relationship with `ON CLUSTER` (important) + +There are two independent propagation mechanisms: +- Replicated access storage: Keeper-based data replication. +- `ON CLUSTER`: distributed DDL queue execution. + +When replicated access storage is used, combining both can be redundant or problematic. + +Recommended practice: +- for access-entity SQL in replicated storage deployments, enable `ignore_on_cluster_for_replicated_access_entities_queries`. +- then you may keep existing `... ON CLUSTER ...` statements, but they are safely rewritten locally. + +## 10. Backup/restore behavior + +## 10.1 Access entities backup in replicated mode + +In `IAccessStorage::backup()`: +- non-replicated storage writes backup entry directly. +- replicated storage registers file path in backup coordination by: + - replication id = `zookeeper_path` + - access entity type + +Then backup coordination chooses a single host deterministically to store unified replicated-access files. + +## 10.2 Keeper structure for `BACKUP ... 
ON CLUSTER` + +Under backup coordination root: + +```text +/backup-/repl_access/ + / + / + -> "" +``` + +## 10.3 Restore coordination lock + +During restore: + +```text +/restore-/repl_access_storages_acquired/ + -> "" +``` + +Only the host that acquires this node restores that replicated access storage. + +## 11. Introspection and debugging + +Start here: + +```sql +SELECT name, type, params, precedence +FROM system.user_directories +ORDER BY precedence; +``` + +Inspect Keeper paths: + +```sql +SELECT path, name, value +FROM system.zookeeper +WHERE path IN ( + '/clickhouse/access', + '/clickhouse/access/uuid', + '/clickhouse/access/U', + '/clickhouse/access/R', + '/clickhouse/access/S', + '/clickhouse/access/P', + '/clickhouse/access/Q', + '/clickhouse/access/M' +); +``` + +Map user name to UUID then to payload: + +```sql +SELECT value AS uuid +FROM system.zookeeper +WHERE path = '/clickhouse/access/U' AND name = 'alice'; + +SELECT value +FROM system.zookeeper +WHERE path = '/clickhouse/access/uuid' AND name = ''; +``` + +Keeper connection and request visibility: + +```sql +SELECT * +FROM system.zookeeper_connection; + +SELECT * +FROM system.zookeeper_connection_log +ORDER BY event_time DESC +LIMIT 50; + +SELECT event_time, type, op_num, path, error +FROM system.zookeeper_log +WHERE path LIKE '/clickhouse/access/%' +ORDER BY event_time DESC +LIMIT 200; +``` + +Aggregated Keeper operations (if table is enabled): + +```sql +SELECT event_time, session_id, parent_path, operation, count, errors, average_latency +FROM system.aggregated_zookeeper_log +WHERE parent_path LIKE '/clickhouse/access/%' +ORDER BY event_time DESC +LIMIT 100; +``` + +Operational metrics: + +```sql +SELECT metric, value +FROM system.metrics +WHERE metric IN ( + 'ZooKeeperSession', + 'ZooKeeperSessionExpired', + 'ZooKeeperConnectionLossStartedTimestampSeconds', + 'ZooKeeperWatch', + 'ZooKeeperRequest', + 'DDLWorkerThreads', + 'DDLWorkerThreadsActive', + 'DDLWorkerThreadsScheduled' +) +ORDER BY 
metric; + +SELECT event, value +FROM system.events +WHERE event LIKE 'ZooKeeper%' +ORDER BY event; + +SELECT metric, value +FROM system.asynchronous_metrics +WHERE metric = 'ZooKeeperClientLastZXIDSeen'; +``` + +`ON CLUSTER` queue debugging: + +```sql +SELECT cluster, entry, host, status, query, exception_code, exception_text +FROM system.distributed_ddl_queue +ORDER BY query_create_time DESC +LIMIT 100; +``` + +Force reload of all user directories: + +```sql +SYSTEM RELOAD USERS; +``` + +## 12. Troubleshooting patterns + +- Symptom: writes fail, reads still work. + - Likely Keeper unavailable; replicated storage serves cached in-memory entities for reads. +- Symptom: startup failure after corrupted Keeper payload. + - Check `throw_on_invalid_replicated_access_entities`. + - Fix offending `/uuid/` payload in Keeper. +- Symptom: duplicate/“already exists in replicated” around `... ON CLUSTER ...`. + - Enable `ignore_on_cluster_for_replicated_access_entities_queries`. +- Symptom: grants seem stale after changes. + - Check watcher/connection metrics and `system.zookeeper_log`. + - Run `SYSTEM RELOAD USERS` as a recovery action. + +## 13. Developer-level internals + +- `ReplicatedAccessStorage` is now mostly a wrapper; Keeper logic is in `ZooKeeperReplicator`. +- On reconnect, code explicitly calls `sync(zookeeper_path)` to mitigate stale reads after session switch. +- Watch queue is unbounded and can accumulate work under churn; refresh loop drains it. +- Entity parse failures are wrapped with path context (`Could not parse `). +- Updates use optimistic versions via Keeper `set`/`multi`; conflicts become retryable or explicit exceptions. +- Backup integration uses `isReplicated()` and `getReplicationID()` hooks in `IAccessStorage`. +- Restore of replicated access uses explicit distributed lock (`acquireReplicatedAccessStorage`) to avoid duplicate restore writers. + +## 14. 
Important history and increments (Git timeline)

| Date | Commit / PR | Change | Why it matters |
|---|---|---|---|
| 2020-04-06 | [`42b8ed3ec64`](https://github.com/ClickHouse/ClickHouse/commit/42b8ed3ec64) | `ON CLUSTER` support for access control SQL | Foundation for distributed RBAC DDL. |
| 2021-07-21 | [`e33a2bf7bc9`](https://github.com/ClickHouse/ClickHouse/commit/e33a2bf7bc9) | Added `ReplicatedAccessStorage` | Initial Keeper-backed replicated access entities. |
| 2021-09-26 (plus later backports) | [`13db65f47c3`](https://github.com/ClickHouse/ClickHouse/commit/13db65f47c3), [`29388`](https://github.com/ClickHouse/ClickHouse/pull/29388) | Shutdown/misconfiguration fixes | Safer lifecycle when Keeper is unavailable or misconfigured. |
| 2022-01-25 | [`0105f7e0bcc`](https://github.com/ClickHouse/ClickHouse/commit/0105f7e0bcc), [`33988`](https://github.com/ClickHouse/ClickHouse/pull/33988) | Startup fix when replicated access depends on Keeper | Removed a critical startup dead path. |
| 2022-03-30 | [`01e1c5345a2`](https://github.com/ClickHouse/ClickHouse/commit/01e1c5345a2) | Separate `CLUSTER` grant + `on_cluster_queries_require_cluster_grant` | Better security model for `ON CLUSTER`. |
| 2022-06-15 | [`a0c558a17e8`](https://github.com/ClickHouse/ClickHouse/commit/a0c558a17e8) | Backup/restore for ACL system tables | Made access entities first-class in backup/restore flows. |
| 2022-08-08 | [`8f9f5c69daf`](https://github.com/ClickHouse/ClickHouse/commit/8f9f5c69daf) | Simplified with the `MemoryAccessStorage` mirror | Clearer in-memory serving model and a cleaner replication loop. |
| 2022-08-09 | [`646cd556905`](https://github.com/ClickHouse/ClickHouse/commit/646cd556905), [`39977`](https://github.com/ClickHouse/ClickHouse/pull/39977) | Recovery improvements after errors | Better resilience on Keeper issues. |
| 2022-09-16 | [`69996c960c8`](https://github.com/ClickHouse/ClickHouse/commit/69996c960c8) | Init retries for replicated access | Fewer startup failures on transient network/hardware errors. |
| 2022-09-16 | [`5365b105ccc`](https://github.com/ClickHouse/ClickHouse/commit/5365b105ccc), [`45198`](https://github.com/ClickHouse/ClickHouse/pull/45198) | `SYSTEM RELOAD USERS` | Explicit operator tool for reloading all access storages. |
| 2023-08-18 | [`14590305ad0`](https://github.com/ClickHouse/ClickHouse/commit/14590305ad0), [`52975`](https://github.com/ClickHouse/ClickHouse/pull/52975) | Added ignore settings for replicated-entity queries | Reduced conflict between Keeper replication and `ON CLUSTER`. |
| 2023-12-12 | [`b33f1245559`](https://github.com/ClickHouse/ClickHouse/commit/b33f1245559), [`57538`](https://github.com/ClickHouse/ClickHouse/pull/57538) | Extended ignore behavior to `GRANT`/`REVOKE` | Closed a major practical gap for replicated RBAC management. |
| 2024-09-04 | [`1ccd461c97d`](https://github.com/ClickHouse/ClickHouse/commit/1ccd461c97d) | Fix restoring dependent access entities | More reliable restore ordering/conflict handling. |
| 2024-09-06 | [`3c4d6509f3d`](https://github.com/ClickHouse/ClickHouse/commit/3c4d6509f3d) | Backup/restore refactor for access entities | Cleaner architecture and fewer edge-case restore issues. |
| 2024-09-18 | [`712a7261a9c`](https://github.com/ClickHouse/ClickHouse/commit/712a7261a9c) | Backup filenames changed to `access-.txt` | Deterministic naming across hosts for replicated access backups. |
| 2025-06-16 | [`d58a00754af`](https://github.com/ClickHouse/ClickHouse/commit/d58a00754af), [`81245`](https://github.com/ClickHouse/ClickHouse/pull/81245) | Split Keeper replication into `ZooKeeperReplicator` | Reusable replication core and cleaner separation of concerns. |
| 2025-09-12 | [`efa4d2b605e`](https://github.com/ClickHouse/ClickHouse/commit/efa4d2b605e) | ID/tag-based watches in the ZooKeeper client path | Lower watch/cache complexity and better correctness under churn. |
| 2025-09-12 | [`2bf08fc9a62`](https://github.com/ClickHouse/ClickHouse/commit/2bf08fc9a62) | Watch leftovers fix | Better long-run stability under frequent watch activity. |
| 2026-01-27 | [`21644efa780`](https://github.com/ClickHouse/ClickHouse/commit/21644efa780), [`95032`](https://github.com/ClickHouse/ClickHouse/pull/95032) | Option to throw on invalid replicated entities | Lets strict deployments fail fast on Keeper data corruption. |

## 15. Practical guidance

For most production clusters using replicated access entities:
1. Use replicated access storage as the RBAC source of truth.
2. Enable `ignore_on_cluster_for_replicated_access_entities_queries`.
3. Decide explicitly on strictness for invalid entities (`throw_on_invalid_replicated_access_entities`).
4. Monitor Keeper connection and request metrics and the `system.zookeeper_*` logs.
5. Use `SYSTEM RELOAD USERS` as a controlled recovery tool.

## 16.
Key files (for engineers reading source) + +- `src/Access/ReplicatedAccessStorage.{h,cpp}` +- `src/Access/ZooKeeperReplicator.{h,cpp}` +- `src/Access/Common/AccessEntityType.{h,cpp}` +- `src/Access/AccessEntityIO.cpp` +- `src/Access/AccessControl.cpp` +- `src/Access/AccessChangesNotifier.{h,cpp}` +- `src/Access/IAccessStorage.cpp` +- `src/Backups/BackupCoordinationReplicatedAccess.{h,cpp}` +- `src/Backups/BackupCoordinationOnCluster.cpp` +- `src/Backups/RestoreCoordinationOnCluster.cpp` +- `src/Interpreters/removeOnClusterClauseIfNeeded.cpp` +- `src/Interpreters/Access/InterpreterGrantQuery.cpp` +- `tests/integration/test_replicated_users/test.py` +- `tests/integration/test_replicated_access/test.py` +- `tests/integration/test_replicated_access/test_invalid_entity.py` +- `tests/integration/test_access_control_on_cluster/test.py` From 458d2cc7793b2063be75a73741e4c5064f2f99bb Mon Sep 17 00:00:00 2001 From: filimonov <1549571+filimonov@users.noreply.github.com> Date: Wed, 4 Mar 2026 13:05:14 +0100 Subject: [PATCH 02/10] Update users_in_keeper.md --- .../users_in_keeper.md | 59 +++++++++++++++++++ 1 file changed, 59 insertions(+) diff --git a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md index 18df7f29c8..7aa52dd1dd 100644 --- a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md +++ b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md @@ -233,6 +233,65 @@ During restore: Only the host that acquires this node restores that replicated access storage. +### 10.4 Support in clickhouse-backup tool + +`clickhouse-backup` supports replicated RBAC (`--rbac`) by directly reading and writing Keeper state for replicated access storages. +Its behavior is similar in goal to native `BACKUP`/`RESTORE`, but implementation is different: it does not use ClickHouse native backup-coordination `repl_access` znodes. 
Instead, it performs an explicit Keeper subtree dump/restore from the host running the tool.

#### 10.4.1 What is backed up

For `--rbac`, the tool backs up both:

- Local access files (`*.sql`) from the ClickHouse access storage path.
- Replicated access entities from Keeper for each replicated user directory.

Replicated directories are discovered via:

- `SELECT name FROM system.user_directories WHERE type='replicated'`

For each such directory, the tool:

- Resolves its Keeper path from `config.xml` (`user_directories/replicated/zookeeper_path`).
- Checks that the `/uuid` node under that path has children.
- Dumps the full subtree to a JSONL file:
  - `backup/<backup name>/access/<user directory name>.jsonl`

The RBAC entity kinds handled are:
- `USER`
- `ROLE`
- `ROW POLICY`
- `SETTINGS PROFILE`
- `QUOTA`

#### 10.4.2 Keeper connection details

Keeper connection settings are taken from the ClickHouse preprocessed `config.xml`:

- `/zookeeper/node` endpoints
- optional TLS (secure + `/openSSL/client/*`)
- optional digest auth
- optional Keeper root prefix

So the tool uses the same Keeper connectivity model as the ClickHouse server config.

#### 10.4.3 Restore behavior in replicated mode

During restore with `--rbac`, the tool:

1. Scans the backed-up RBAC (`*.sql` and `*.jsonl`) and resolves conflicts against existing RBAC.
2. Applies the conflict policy:
   - `general.rbac_conflict_resolution`: `recreate` (default) or `fail`
   - `--drop` also forces dropping existing conflicting entries
3. Restores local access files.
4. Restores replicated Keeper data from the JSONL files back into the replicated access paths.

JSONL-to-directory mapping rule:

- If the file name matches `<user directory name>.jsonl`, it is restored to that directory.
- If no match is found, it falls back to the first replicated user directory.

After the local RBAC restore, the tool creates `need_rebuild_lists.mark`, removes the `*.list` files, and restarts ClickHouse (same as with configs restore) so that access metadata is rebuilt correctly.

## 11.
Introspection and debugging Start here: From 4a0050c265cc3c074b2632c3dc9d923e458b89c6 Mon Sep 17 00:00:00 2001 From: filimonov <1549571+filimonov@users.noreply.github.com> Date: Thu, 5 Mar 2026 00:16:42 +0100 Subject: [PATCH 03/10] Update users_in_keeper.md --- .../users_in_keeper.md | 683 ++++++++---------- 1 file changed, 319 insertions(+), 364 deletions(-) diff --git a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md index 7aa52dd1dd..7b28f0752e 100644 --- a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md +++ b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md @@ -1,347 +1,338 @@ --- -title: "ClickHouse Users/Grants in ZooKeeper or ClickHouse Keeper" -linkTitle: "ClickHouse Users/Grants in ZooKeeper or ClickHouse Keeper" +title: "How to Replicate ClickHouse RBAC Users and Grants with ZooKeeper/Keeper" +linkTitle: "Replicate RBAC with Keeper" weight: 100 description: >- - ClickHouse Users/Grants in ZooKeeper or ClickHouse Keeper. + Practical guide to configure Keeper-backed RBAC replication for users, roles, grants, policies, quotas, and profiles across ClickHouse nodes, including migration and troubleshooting. --- -# ClickHouse Users/Grants in ZooKeeper or ClickHouse Keeper +# How can I replicate CREATE USER and other RBAC commands automatically between servers? -## 1. What this feature is +This KB explains how to make SQL RBAC changes (`CREATE USER`, `CREATE ROLE`, `GRANT`, row policies, quotas, settings profiles, masking policies) automatically appear on all servers by storing access entities in ZooKeeper/ClickHouse Keeper. -ClickHouse can store access control entities in ZooKeeper/ClickHouse Keeper instead of (or together with) local files. +`Keeper` below means either ClickHouse Keeper or ZooKeeper. -Access entities include: -- users -- roles -- grants/revokes -- row policies -- quotas -- settings profiles -- masking policies +## 1. 
Why use this instead of only `ON CLUSTER` for RBAC? -In config this is the `replicated` access storage inside `user_directories`. +`ON CLUSTER` executes DDL on hosts that exist at execution time. +It does not automatically replay old RBAC DDL for replicas/shards added later. -```xml - - - /clickhouse/access - true - - -``` - -From here on, this article uses **Keeper** as a short name for ZooKeeper/ClickHouse Keeper. - -## 2. Basic concepts (quick glossary) +Keeper-backed RBAC solves that: +- one shared RBAC state for the cluster; +- new servers read the same RBAC state when they join; +- no need to remember `ON CLUSTER` for every RBAC statement. -- `Access entity`: one RBAC object (for example one user or one role). -- `ReplicatedAccessStorage`: access storage implementation that persists entities in Keeper. -- `ZooKeeperReplicator`: low-level component that does Keeper reads/writes/watches and maintains local mirror state. -- `ON CLUSTER`: distributed DDL mechanism (queue-based fan-out) for running a query on many hosts. -- `system.user_directories`: system table that shows configured access storages and precedence. +### 1.1 Pros and Cons -## 3. Why teams use it (pros/cons) - -### Pros +Pros: - Single source of truth for RBAC across nodes. -- No manual file sync of `users.xml`/local access files. -- Immediate propagation through Keeper watches. -- Works naturally with SQL RBAC workflows (`CREATE USER`, `GRANT`, etc.). -- Integrates with backup/restore of access entities. - -### Cons -- Writes depend on Keeper availability (reads continue from local cache, writes fail when Keeper unavailable). -- Operational complexity increases (Keeper health now affects auth/RBAC changes). -- Potential confusion with `ON CLUSTER` (two replication mechanisms can overlap). -- Corrupted entity payload in Keeper can be ignored or fail startup, depending on settings. - -## 4. 
Where data is stored in Keeper - -Assume: -- configured path is `/clickhouse/access` - -Tree layout: - -```text -/clickhouse/access - /uuid - / -> serialized ATTACH statements of one entity - /U - / -> "" - /R - / -> "" - /S - / -> "" - /P - / -> "" - /Q - / -> "" - /M - / -> "" -``` - -Type-letter mapping is from `AccessEntityTypeInfo`: -- `U` user -- `R` role -- `S` settings profile -- `P` row policy -- `Q` quota -- `M` masking policy - -Important detail: -- names are escaped with `escapeForFileName()`. -- `zookeeper_path` is normalized on startup: trailing `/` removed, leading `/` enforced. - -## 5. What value format is stored under `/uuid/` +- No manual file sync of `users.xml` / local access files. +- Fast propagation through Keeper watch-driven refresh. +- Natural SQL RBAC workflow (`CREATE USER`, `GRANT`, `REVOKE`, etc.). +- Integrates with access-entity backup/restore. -Each entity is serialized as one or more one-line `ATTACH ...` statements: -- first statement is entity definition (`ATTACH USER`, `ATTACH ROLE`, and so on) -- users/roles can include attached grant statements (`ATTACH GRANT ...`) +Cons: +- Writes depend on Keeper availability (reads can continue from local cache, writes fail when Keeper is unavailable). +- Operational complexity increases (Keeper health directly affects RBAC operations). +- Can conflict with `ON CLUSTER` if both mechanisms are used without guard settings. +- Invalid/corrupted payload in Keeper can be skipped or be startup-fatal, depending on `throw_on_invalid_replicated_access_entities`. +- Very large RBAC sets (thousands of users/roles or very complex grants) can increase Keeper/watch pressure. +- If Keeper is unavailable during server startup and replicated RBAC storage is configured, startup can fail, so DBA login is unavailable until startup succeeds. -So Keeper stores SQL-like payload, not a binary protobuf/json object. +## 2. Backup and migration first (important) -## 6. 
How reads/writes happen (from basic to advanced) +Before switching to Keeper-backed RBAC, treat this as a migration. -## 6.1 Startup and initialization +Key facts: +- Changing `user_directories` storage or changing `zookeeper_path` does **not** move existing SQL RBAC objects automatically. +- If path changes, old users/roles are not deleted, but become effectively hidden from the new storage path. +- `zookeeper_path` cannot be changed at runtime via SQL. -On startup (or reconnect), `ZooKeeperReplicator`: -1. gets Keeper client -2. executes `sync(zookeeper_path)` to reduce stale reads after reconnect -3. creates root nodes if missing (`/uuid`, `/U`, `/R`, ...) -4. loads all entities into an internal `MemoryAccessStorage` -5. starts watch thread +Recommended migration sequence: +1. Back up RBAC objects. +2. Apply the new `user_directories` config on all nodes. +3. Restart/reload config as required by your environment. +4. Restore/recreate RBAC objects to the target storage. +5. Validate on all nodes. -## 6.2 Insert/update/remove behavior +### 2.1 Migration with pure SQL (no backup tool) -- Insert uses Keeper `multi` to create both: - - `/uuid/` with serialized entity - - `//` with value `` -- Update uses versioned `set`/`multi`; rename updates type/name node too. -- Remove deletes both uuid node and type/name node in one `multi`. +This path is useful when: +- RBAC DDL is already versioned in your repo, or +- you want to dump/replay access entities using SQL only. -This dual-index model enforces: -- uniqueness by UUID (`/uuid`) -- uniqueness by (type, name) (`//`) +Recommended SQL-only flow: +1. On source, check where entities are stored (local vs replicated): -## 6.3 Read/find behavior - -Reads from SQL path (`find`, `read`, `exists`) go to the in-memory mirror (`MemoryAccessStorage`), not directly to Keeper. - -Keeper is the persistent source; memory is the fast serving layer. - -## 7. 
Watches, refresh, and caches +```sql +SELECT name, storage FROM system.users ORDER BY name; +SELECT name, storage FROM system.roles ORDER BY name; +SELECT name, storage FROM system.settings_profiles ORDER BY name; +SELECT name, storage FROM system.quotas ORDER BY name; +SELECT name, storage FROM system.row_policies ORDER BY name; +SELECT name, storage FROM system.masking_policies ORDER BY name; +``` -Two watch patterns are used: -- list watch on `/uuid` children: detects new/deleted entities -- per-entity watch on `/uuid/`: detects changes of that entity payload +2. Export RBAC DDL from source: +- simplest full dump: -Refresh queue: -- `Nil` marker means “refresh entity list” -- concrete UUID means “refresh this entity” +```sql +SHOW ACCESS; +``` -Thread model: -- dedicated watcher thread (`runWatchingThread`) -- on errors: reset cached Keeper client, sleep, retry -- after successful refresh: sends `AccessChangesNotifier` notifications +Save output as SQL (for example `rbac_dump.sql`) in your repo/artifacts. -Cache layers to know: -- primary replicated-access cache: `MemoryAccessStorage` inside `ReplicatedAccessStorage` -- higher-level RBAC caches in `AccessControl`: - - `RoleCache` - - `RowPolicyCache` - - `QuotaCache` - - `SettingsProfilesCache` -- these caches subscribe to access-entity change notifications and recalculate/invalidate accordingly +You can also export individual objects with `SHOW CREATE USER/ROLE/...` when needed. -## 8. Settings that strongly affect behavior +3. Switch config to replicated `user_directories` on target cluster and restart/reload. +4. Replay exported SQL on one node (without `ON CLUSTER` in replicated mode). +5. Validate from another node (`SHOW CREATE USER ...`, `SHOW GRANTS FOR ...`). 
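The SQL-only flow above can be sketched end to end; the user `alice`, her grant, and the dump file name are illustrative, and in replicated mode the replayed statements carry no `ON CLUSTER` clause:

```sql
-- step 2 (source node): SHOW ACCESS returns one CREATE/GRANT statement per row;
-- save the output, for example as rbac_dump.sql
SHOW ACCESS;

-- step 4 (target node, replicated storage active): replay the saved statements, e.g.
CREATE USER alice IDENTIFIED WITH sha256_password BY '***';
GRANT SELECT ON default.* TO alice;

-- step 5 (any other node): verify that the entities arrived through Keeper
SHOW CREATE USER alice;
SHOW GRANTS FOR alice;
```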
-## 8.1 `ignore_on_cluster_for_replicated_access_entities_queries` +### 2.2 Migration with `clickhouse-backup` (external tool) -If enabled and replicated access storage exists: -- access-control queries with `ON CLUSTER` are rewritten to local query (ON CLUSTER removed). +```bash +# backup local RBAC users/roles/etc. +clickhouse-backup create --rbac --rbac-only users_bkp_20260304 -Why: -- replicated access storage already replicates through Keeper. -- additional distributed DDL fan-out may cause duplicate/conflicting execution. +# restore (on node configured with replicated user directory) +clickhouse-backup restore --rbac-only users_bkp_20260304 +``` -Coverage includes grants/revokes too (`ASTGrantQuery` is included). +Important: +- this applies to SQL/RBAC users (created with `CREATE USER ...`, `CREATE ROLE ...`, etc.); +- if your users are in `users.xml`, those are config-based (`--configs`) and this is not an automatic local->replicated RBAC conversion. -## 8.2 `access_control_improvements.throw_on_invalid_replicated_access_entities` +### 2.3 Migration with embedded ClickHouse SQL `BACKUP/RESTORE` -If enabled: -- parse errors in Keeper entity payload are fatal during full load (can fail server startup). +```sql +BACKUP + TABLE system.users, + TABLE system.roles, + TABLE system.row_policies, + TABLE system.quotas, + TABLE system.settings_profiles, + TABLE system.masking_policies +TO ; + +-- after switching config +RESTORE + TABLE system.users, + TABLE system.roles, + TABLE system.row_policies, + TABLE system.quotas, + TABLE system.settings_profiles, + TABLE system.masking_policies +FROM ; +``` -If disabled: -- invalid entity is logged and skipped. +`allow_backup` behavior for embedded SQL backup/restore: +- Storage-level flag in `user_directories` (``, ``, ``) controls whether that storage participates in backup/restore. +- Entity-level setting `allow_backup` (for users/roles/settings profiles) can exclude specific RBAC objects from backup. 
-This is tested by injecting invalid `ATTACH GRANT ...` into `/uuid/`. +Defaults in ClickHouse code: +- `users_xml`: `allow_backup = false` by default. +- `local_directory`: `allow_backup = true` by default. +- `replicated`: `allow_backup = true` by default. -## 8.3 `access_control_improvements.on_cluster_queries_require_cluster_grant` +Operational implication: +- If you disable `allow_backup` for replicated storage, embedded `BACKUP TABLE system.users ...` may skip those entities (or fail if no backup-allowed access storage remains). -Controls whether `CLUSTER` grant is required for `ON CLUSTER`. +About `clickhouse-backup --rbac/--rbac-only`: +- It is an external tool, not ClickHouse embedded backup by itself. +- If `clickhouse-backup` is configured with `use_embedded_backup_restore: true`, it delegates to SQL `BACKUP/RESTORE` and follows embedded rules. +- Otherwise it uses its own workflow; do not assume full equivalence with embedded `allow_backup` semantics. -## 8.4 `distributed_ddl_use_initial_user_and_roles` (server setting) +## 3. Minimal server configuration -For `ON CLUSTER`, remote execution can preserve initiator user/roles. -This is relevant when mixing distributed DDL with access management. +`user_directories` is the ClickHouse server configuration section that defines: +- where access entities are read from (`users.xml`, local SQL access files, Keeper, LDAP, etc.), +- and in which order those sources are checked (precedence). -## 9. Relationship with `ON CLUSTER` (important) +In short: it is the access-storage routing configuration for users/roles/policies/profiles/quotas. -There are two independent propagation mechanisms: -- Replicated access storage: Keeper-based data replication. -- `ON CLUSTER`: distributed DDL queue execution. +Apply on **every** ClickHouse node: -When replicated access storage is used, combining both can be redundant or problematic. 
+```xml + + + + /etc/clickhouse-server/users.xml + + + /clickhouse/access/ + + + +``` -Recommended practice: -- for access-entity SQL in replicated storage deployments, enable `ignore_on_cluster_for_replicated_access_entities_queries`. -- then you may keep existing `... ON CLUSTER ...` statements, but they are safely rewritten locally. +Why `replace="replace"` matters: +- without `replace="replace"`, your fragment can be merged with defaults; +- defaults include `local_directory`, so SQL RBAC may still be written locally; +- this can cause mixed behavior (some entities in Keeper, some in local files). -## 10. Backup/restore behavior +### 3.1 `user_directories` behavior, defaults, and coexistence -## 10.1 Access entities backup in replicated mode +What can be configured in `user_directories`: +- `users_xml` (read-only config users), +- `local_directory` (SQL users/roles in local files), +- `replicated` (SQL users/roles in Keeper), +- `memory`, +- `ldap` (read-only remote auth source). -In `IAccessStorage::backup()`: -- non-replicated storage writes backup entry directly. -- replicated storage registers file path in backup coordination by: - - replication id = `zookeeper_path` - - access entity type +Defaults if `user_directories` is **not** specified: +- ClickHouse uses legacy settings (`users_config` and `access_control_path`). +- In typical default deployments this means `users_xml` + `local_directory`. -Then backup coordination chooses a single host deterministically to store unified replicated-access files. +If `user_directories` **is** specified: +- ClickHouse uses storages from this section and ignores `users_config` / `access_control_path` paths for access storages. +- Order in `user_directories` defines precedence for lookup/auth. -## 10.2 Keeper structure for `BACKUP ... 
ON CLUSTER` +When several storages coexist: +- reads/auth checks storages by precedence order; +- `CREATE USER/ROLE/...` without explicit `IN ...` goes to the first writable target by that order (and may conflict with entities found in higher-precedence storages). -Under backup coordination root: +There is special syntax to target a storage explicitly: -```text -/backup-/repl_access/ - / - / - -> "" +```sql +CREATE USER my_user IDENTIFIED BY '***' IN replicated; ``` -## 10.3 Restore coordination lock - -During restore: - -```text -/restore-/repl_access_storages_acquired/ - -> "" +This is supported, but for access control we usually do **not** recommend mixing storages intentionally. +For sensitive access rights, a single source of truth (typically `replicated`) is safer and easier to operate. + +## 4. Altinity operator (CHI) example + +```yaml +apiVersion: clickhouse.altinity.com/v1 +kind: ClickHouseInstallation +metadata: + name: rbac-replicated +spec: + configuration: + files: + config.d/user_directories.xml: | + + + + /etc/clickhouse-server/users.xml + + + /clickhouse/access/ + + + ``` -Only the host that acquires this node restores that replicated access storage. - -### 10.4 Support in clickhouse-backup tool - -`clickhouse-backup` supports replicated RBAC (`--rbac`) by directly reading and writing Keeper state for replicated access storages. -Its behavior is similar in goal to native `BACKUP`/`RESTORE`, but implementation is different: it does not use ClickHouse native backup-coordination `repl_access` znodes. Instead, it performs explicit Keeper subtree dump/restore from the host running the tool. +## 5. Quick validation checklist -#### 10.4.1 What is backed up +Check active storages and precedence: -For `--rbac`, the tool backs up both: +```sql +SELECT name, type, params, precedence +FROM system.user_directories +ORDER BY precedence; +``` -- Local access files (`*.sql`) from ClickHouse access storage path. 
-- Replicated access entities from Keeper for each replicated user directory. +Check where users are stored: -Replicated directories are discovered via: +```sql +SELECT name, storage +FROM system.users +ORDER BY name; +``` -- `SELECT name FROM system.user_directories WHERE type='replicated'` +Smoke test: +1. On node A: `CREATE USER kb_test IDENTIFIED WITH no_password;` +2. On node B: `SHOW CREATE USER kb_test;` +3. On either node: `DROP USER kb_test;` -For each such directory, the tool: +Check Keeper data exists: -- Resolves its Keeper path from `config.xml` (`/user_directories//zookeeper_path`). -- Checks that `/uuid` has children. -- Dumps the full subtree to a JSONL file: - - `backup//access/.jsonl` +```sql +SELECT * +FROM system.zookeeper +WHERE path = '/clickhouse/access'; +``` -RBAC entity kinds handled are: -- `USER` -- `ROLE` -- `ROW POLICY` -- `SETTINGS PROFILE` -- `QUOTA` +## 6. Relationship with `ON CLUSTER` (important) -#### 10.4.2 Keeper connection details +There are two independent propagation mechanisms: +- Replicated access storage: Keeper-based replication of RBAC entities. +- `ON CLUSTER`: distributed DDL queue execution. -Keeper connection settings are taken from ClickHouse preprocessed `config.xml`: +When replicated access storage is enabled, combining both can be redundant or problematic. -- `/zookeeper/node` endpoints -- optional TLS (secure + `/openSSL/client/*`) -- optional digest auth -- optional Keeper root prefix +Recommended practice: +- Prefer RBAC SQL without `ON CLUSTER`, or enable ignore mode: -So the tool uses the same Keeper connectivity model as ClickHouse server config. +```sql +SET ignore_on_cluster_for_replicated_access_entities_queries = 1; +``` -#### 10.4.3 Restore behavior in replicated mode +With this setting, existing RBAC scripts containing `ON CLUSTER` can still be used safely: the clause is rewritten away for replicated-access queries. 
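As an illustration (cluster and user names below are placeholders):

```sql
-- With ignore_on_cluster_for_replicated_access_entities_queries = 1,
-- the ON CLUSTER clause below is stripped and the statement runs locally;
-- Keeper replication then propagates the user to every node.
CREATE USER app_user ON CLUSTER 'my_cluster' IDENTIFIED WITH sha256_password BY 'secret';
```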
-During restore `--rbac`, the tool: +For production, prefer configuring this in a profile (for example `default` in `users.xml`) rather than relying on session-level `SET`: -1. Scans backed-up RBAC (`*.sql` and `*.jsonl`) and resolves conflicts against existing RBAC. -2. Applies conflict policy: - - general.rbac_conflict_resolution: recreate (default) or fail - - `--drop` also forces dropping existing conflicting entries -3. Restores local access files. -4. Restores replicated Keeper data from JSONL files back into replicated access paths. +```xml + + + + 1 + + + +``` -JSONL-to-directory mapping rule: +Also decide your strictness for invalid replicated entities: -- If file name matches `.jsonl`, it is restored to that directory. -- If no match is found, it falls back to the first replicated user directory. +```xml + + true + +``` -After local RBAC restore, the tool creates `need_rebuild_lists.mark`, removes `*.list`, and restarts ClickHouse (same as with configs restore) so access metadata is rebuilt correctly. +- `true`: fail fast on invalid entity payload in Keeper. +- `false`: log and skip invalid entity. -## 11. Introspection and debugging +## 7. Common support issues (generalized) -Start here: +| Symptom | Typical root cause | What to do | +|---|---|---| +| User created on node A is missing on node B | RBAC still stored in `local_directory` | Verify `system.user_directories`; ensure `replicated` is configured on all nodes and active | +| RBAC objects “disappeared” after config change/restart | `zookeeper_path` or storage source changed | Restore from backup or recreate RBAC in the new storage; keep path stable | +| New replica has no historical users/roles | Team used only `... ON CLUSTER ...` before scaling | Enable Keeper-backed RBAC so new nodes load shared state | +| `CREATE USER ... 
ON CLUSTER` throws "already exists in replicated" | Query fan-out + replicated storage both applied | Remove `ON CLUSTER` for RBAC or enable `ignore_on_cluster_for_replicated_access_entities_queries` | +| RBAC writes still go local though `replicated` exists | `local_directory` remains first writable storage | Use `user_directories replace="replace"` and avoid writable local SQL storage in front of replicated | +| Server does not start when Keeper is down; no one can log in | Replicated access storage needs Keeper during initialization | Restore Keeper first, then restart; if needed use a temporary fallback config and keep a break-glass `users.xml` admin | +| Startup fails (or users are skipped) because of invalid RBAC payload in Keeper | Corrupted/invalid replicated entity and strict validation mode | Use `throw_on_invalid_replicated_access_entities` deliberately: `true` fail-fast, `false` skip+log; fix bad Keeper payload before re-enabling strict mode | +| Two independent clusters unexpectedly share the same users/roles | Both clusters point to the same Keeper ensemble and the same `zookeeper_path` | Use unique RBAC paths per cluster (recommended), or isolate with Keeper chroot (requires Keeper metadata repopulation/migration) | +| Cannot change RBAC keeper path with SQL at runtime | Not supported by design | Change config + controlled migration/restore | +| Trying to “sync” RBAC between independent clusters by pointing to another path | Wrong migration model | Use backup/restore or SQL export/import, not ad hoc path switching | +| Authentication errors from app/job, but local tests work | Network/IP/user mismatch, not replication itself | Check `system.query_log` and source IP; verify user host restrictions | +| Short window where user seems present/absent via load balancer | Propagation + node routing timing | Validate directly on each node; avoid assuming LB view is instantly consistent | +| Server fails after aggressive `user_directories` replacement | 
Required base users/profiles missing in config | Keep `users_xml` (or equivalent base definitions) intact | -```sql -SELECT name, type, params, precedence -FROM system.user_directories -ORDER BY precedence; -``` +## 8. Operational guardrails -Inspect Keeper paths: +- Keep the same `user_directories` config on all nodes. +- Keep `zookeeper_path` unique per cluster/tenant. +- Use a dedicated admin user for provisioning; avoid using `default` for automation. +- Track configuration rollouts (who/when/what) to avoid hidden behavior changes. +- Treat Keeper health as part of access-management SLO. +- Plan RBAC backup/restore before changing storage path or cluster topology. -```sql -SELECT path, name, value -FROM system.zookeeper -WHERE path IN ( - '/clickhouse/access', - '/clickhouse/access/uuid', - '/clickhouse/access/U', - '/clickhouse/access/R', - '/clickhouse/access/S', - '/clickhouse/access/P', - '/clickhouse/access/Q', - '/clickhouse/access/M' -); -``` +## 9. Debugging and observability -Map user name to UUID then to payload: +Keeper connectivity: ```sql -SELECT value AS uuid -FROM system.zookeeper -WHERE path = '/clickhouse/access/U' AND name = 'alice'; - -SELECT value -FROM system.zookeeper -WHERE path = '/clickhouse/access/uuid' AND name = ''; +SELECT * FROM system.zookeeper_connection; +SELECT * FROM system.zookeeper_connection_log ORDER BY event_time DESC LIMIT 100; ``` -Keeper connection and request visibility: +Keeper operations for RBAC path: ```sql -SELECT * -FROM system.zookeeper_connection; - -SELECT * -FROM system.zookeeper_connection_log -ORDER BY event_time DESC -LIMIT 50; - SELECT event_time, type, op_num, path, error FROM system.zookeeper_log WHERE path LIKE '/clickhouse/access/%' @@ -349,44 +340,19 @@ ORDER BY event_time DESC LIMIT 200; ``` -Aggregated Keeper operations (if table is enabled): - -```sql -SELECT event_time, session_id, parent_path, operation, count, errors, average_latency -FROM system.aggregated_zookeeper_log -WHERE parent_path 
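If you prefer to triage an exported copy of such rows outside ClickHouse, a toy sketch with standard shell tools can surface error hotspots (the inline sample rows are made up; a real export would come from the query above):

```bash
# Summarize non-ZOK Keeper errors per (path, error) pair from a
# space-separated "path error" export of system.zookeeper_log rows.
printf '%s\n' \
  '/clickhouse/access/uuid ZOK' \
  '/clickhouse/access/uuid ZCONNECTIONLOSS' \
  '/clickhouse/access/U ZCONNECTIONLOSS' \
  '/clickhouse/access/uuid ZCONNECTIONLOSS' \
| awk '$2 != "ZOK" {c[$1" "$2]++} END {for (k in c) print c[k], k}' \
| sort -rn
```

Non-`ZOK` spikes on the access path usually correlate with the Keeper-side incidents listed in the troubleshooting table.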
LIKE '/clickhouse/access/%' -ORDER BY event_time DESC -LIMIT 100; -``` - -Operational metrics: +Note: `system.zookeeper_log` is often disabled in production. +If it is unavailable, use server logs (usually `clickhouse-server.log`) with these patterns: -```sql -SELECT metric, value -FROM system.metrics -WHERE metric IN ( - 'ZooKeeperSession', - 'ZooKeeperSessionExpired', - 'ZooKeeperConnectionLossStartedTimestampSeconds', - 'ZooKeeperWatch', - 'ZooKeeperRequest', - 'DDLWorkerThreads', - 'DDLWorkerThreadsActive', - 'DDLWorkerThreadsScheduled' -) -ORDER BY metric; - -SELECT event, value -FROM system.events -WHERE event LIKE 'ZooKeeper%' -ORDER BY event; - -SELECT metric, value -FROM system.asynchronous_metrics -WHERE metric = 'ZooKeeperClientLastZXIDSeen'; +```text +Access(replicated) +ZooKeeperReplicator +Will try to restart watching thread after error +Initialization failed. Error: +Can't have Replicated access without ZooKeeper +ON CLUSTER clause was ignored for query ``` -`ON CLUSTER` queue debugging: +If troubleshooting mixed usage of distributed DDL: ```sql SELECT cluster, entry, host, status, query, exception_code, exception_text @@ -395,83 +361,72 @@ ORDER BY query_create_time DESC LIMIT 100; ``` -Force reload of all user directories: +Force access reload: ```sql SYSTEM RELOAD USERS; ``` -## 12. Troubleshooting patterns - -- Symptom: writes fail, reads still work. - - Likely Keeper unavailable; replicated storage serves cached in-memory entities for reads. -- Symptom: startup failure after corrupted Keeper payload. - - Check `throw_on_invalid_replicated_access_entities`. - - Fix offending `/uuid/` payload in Keeper. -- Symptom: duplicate/“already exists in replicated” around `... ON CLUSTER ...`. - - Enable `ignore_on_cluster_for_replicated_access_entities_queries`. -- Symptom: grants seem stale after changes. - - Check watcher/connection metrics and `system.zookeeper_log`. - - Run `SYSTEM RELOAD USERS` as a recovery action. - -## 13. 
Developer-level internals - -- `ReplicatedAccessStorage` is now mostly a wrapper; Keeper logic is in `ZooKeeperReplicator`. -- On reconnect, code explicitly calls `sync(zookeeper_path)` to mitigate stale reads after session switch. -- Watch queue is unbounded and can accumulate work under churn; refresh loop drains it. -- Entity parse failures are wrapped with path context (`Could not parse `). -- Updates use optimistic versions via Keeper `set`/`multi`; conflicts become retryable or explicit exceptions. -- Backup integration uses `isReplicated()` and `getReplicationID()` hooks in `IAccessStorage`. -- Restore of replicated access uses explicit distributed lock (`acquireReplicatedAccessStorage`) to avoid duplicate restore writers. - -## 14. Important history and increments (Git timeline) - -| Date | Commit / PR | Change | Why it matters | -|---|---|---|---| -| 2020-04-06 | [`42b8ed3ec64`](https://github.com/ClickHouse/ClickHouse/commit/42b8ed3ec64) | `ON CLUSTER` support for access control SQL | Foundation for distributed RBAC DDL. | -| 2021-07-21 | [`e33a2bf7bc9`](https://github.com/ClickHouse/ClickHouse/commit/e33a2bf7bc9) | Added `ReplicatedAccessStorage` | Initial Keeper-backed replicated access entities. | -| 2021-09-26 (plus later backports) | [`13db65f47c3`](https://github.com/ClickHouse/ClickHouse/commit/13db65f47c3), [`29388`](https://github.com/ClickHouse/ClickHouse/pull/29388) | Shutdown/misconfiguration fixes | Safer lifecycle when Keeper is unavailable/misconfigured. | -| 2022-01-25 | [`0105f7e0bcc`](https://github.com/ClickHouse/ClickHouse/commit/0105f7e0bcc), [`33988`](https://github.com/ClickHouse/ClickHouse/pull/33988) | Startup fix when replicated access depends on keeper | Removed critical startup dead path. | -| 2022-03-30 | [`01e1c5345a2`](https://github.com/ClickHouse/ClickHouse/commit/01e1c5345a2) | Separate `CLUSTER` grant + `on_cluster_queries_require_cluster_grant` | Better security model for `ON CLUSTER`. 
| -| 2022-06-15 | [`a0c558a17e8`](https://github.com/ClickHouse/ClickHouse/commit/a0c558a17e8) | Backup/restore for ACL system tables | Made access entities first-class in backup/restore flows. | -| 2022-08-08 | [`8f9f5c69daf`](https://github.com/ClickHouse/ClickHouse/commit/8f9f5c69daf) | Simplified with `MemoryAccessStorage` mirror | Clearer in-memory serving model and cleaner replication loop. | -| 2022-08-09 | [`646cd556905`](https://github.com/ClickHouse/ClickHouse/commit/646cd556905), [`39977`](https://github.com/ClickHouse/ClickHouse/pull/39977) | Recovery improvements after errors | Better resilience on Keeper issues. | -| 2022-09-16 | [`69996c960c8`](https://github.com/ClickHouse/ClickHouse/commit/69996c960c8) | Init retries for replicated access | Fewer startup failures on transient network/hardware errors. | -| 2022-09-16 | [`5365b105ccc`](https://github.com/ClickHouse/ClickHouse/commit/5365b105ccc), [`45198`](https://github.com/ClickHouse/ClickHouse/pull/45198) | `SYSTEM RELOAD USERS` | Explicit operator tool for reloading all access storages. | -| 2023-08-18 | [`14590305ad0`](https://github.com/ClickHouse/ClickHouse/commit/14590305ad0), [`52975`](https://github.com/ClickHouse/ClickHouse/pull/52975) | Added ignore settings for replicated-entity queries | Reduced conflict between Keeper replication and `ON CLUSTER`. | -| 2023-12-12 | [`b33f1245559`](https://github.com/ClickHouse/ClickHouse/commit/b33f1245559), [`57538`](https://github.com/ClickHouse/ClickHouse/pull/57538) | Extended ignore behavior to `GRANT/REVOKE` | Closed major practical gap for replicated RBAC management. | -| 2024-09-04 | [`1ccd461c97d`](https://github.com/ClickHouse/ClickHouse/commit/1ccd461c97d) | Fix restoring dependent access entities | More reliable restore ordering/conflict handling. 
| -| 2024-09-06 | [`3c4d6509f3d`](https://github.com/ClickHouse/ClickHouse/commit/3c4d6509f3d) | Backup/restore refactor for access entities | Cleaner architecture and fewer edge-case restore issues. | -| 2024-09-18 | [`712a7261a9c`](https://github.com/ClickHouse/ClickHouse/commit/712a7261a9c) | Backup filenames changed to `access-.txt` | Deterministic naming across hosts for replicated access backups. | -| 2025-06-16 | [`d58a00754af`](https://github.com/ClickHouse/ClickHouse/commit/d58a00754af), [`81245`](https://github.com/ClickHouse/ClickHouse/pull/81245) | Split Keeper replication into `ZooKeeperReplicator` | Reusable replication core and cleaner separation of concerns. | -| 2025-09-12 | [`efa4d2b605e`](https://github.com/ClickHouse/ClickHouse/commit/efa4d2b605e) | ID/tag based watches in ZooKeeper client path | Lower watch/cache complexity and better correctness under churn. | -| 2025-09-12 | [`2bf08fc9a62`](https://github.com/ClickHouse/ClickHouse/commit/2bf08fc9a62) | Watch leftovers fix | Better long-run stability under frequent watch activity. | -| 2026-01-27 | [`21644efa780`](https://github.com/ClickHouse/ClickHouse/commit/21644efa780), [`95032`](https://github.com/ClickHouse/ClickHouse/pull/95032) | Option to throw on invalid replicated entities | Lets strict deployments fail fast on Keeper data corruption. | - -## 15. Practical guidance - -For most production clusters using replicated access entities: -1. Use replicated access storage as the RBAC source of truth. -2. Enable `ignore_on_cluster_for_replicated_access_entities_queries`. -3. Decide explicitly on strictness for invalid entities (`throw_on_invalid...`). -4. Monitor Keeper connection + request metrics and `system.zookeeper_*` logs. -5. Use `SYSTEM RELOAD USERS` as a controlled recovery tool. - -## 16. 
Key files (for engineers reading source) - -- `src/Access/ReplicatedAccessStorage.{h,cpp}` -- `src/Access/ZooKeeperReplicator.{h,cpp}` -- `src/Access/Common/AccessEntityType.{h,cpp}` -- `src/Access/AccessEntityIO.cpp` +## 10. Keeper structure (advanced troubleshooting) + +If `zookeeper_path=/clickhouse/access`: + +```text +/clickhouse/access + /uuid/ -> serialized ATTACH statements for one entity + /U/ -> user name -> UUID + /R/ -> role name -> UUID + /S/ -> settings profile name -> UUID + /P/ -> row policy name -> UUID + /Q/ -> quota name -> UUID + /M/ -> masking policy name -> UUID +``` + +When these paths are accessed: +- startup/reconnect: ClickHouse syncs Keeper, creates roots if missing, loads all entities; +- `CREATE/ALTER/DROP` RBAC SQL: updates `uuid` and type/name index nodes in Keeper transactions; +- runtime: watch callbacks refresh changed entities into local in-memory mirror. + +Advanced note: +- each ClickHouse node keeps a local in-memory cache of all replicated access entities; +- cache is updated from Keeper watch notifications (list/entity watches), so auth/lookup paths use local memory and not direct Keeper reads on each request. +- watch patterns used: + - list watch on `/uuid` children for create/delete detection; + - per-entity watch on `/uuid/` for payload changes. +- thread model: + - dedicated watcher thread (`runWatchingThread`); + - on errors: reset cached Keeper client, sleep, retry; + - after refresh: send `AccessChangesNotifier` notifications. +- cache layers: + - primary cache: `MemoryAccessStorage` inside replicated access storage; + - higher-level caches in `AccessControl` (`RoleCache`, `RowPolicyCache`, `QuotaCache`, `SettingsProfilesCache`) are updated/invalidated via access change notifications. + +## 11. Low-level behavior that explains real incidents + +- Read path is memory-backed (`MemoryAccessStorage` mirror), not direct Keeper reads per query. 
+- Write path requires Keeper availability; if Keeper is down, RBAC writes fail while some reads can continue from loaded state. +- Insert target is selected by storage order and writeability in `MultipleAccessStorage`; this is why leftover `local_directory` can hijack SQL user creation. +- `ignore_on_cluster_for_replicated_access_entities_queries` is implemented as AST rewrite that removes `ON CLUSTER` for access queries when replicated access storage is enabled. + +## 12. History highlights + +| Date | Change | Why it matters | +|---|---|---| +| 2021-07-21 | `ReplicatedAccessStorage` introduced (`e33a2bf7bc9`, PR #27426) | First Keeper-backed RBAC replication | +| 2023-08-18 | Ignore `ON CLUSTER` for replicated access entities (`14590305ad0`, PR #52975) | Reduced duplicate/overlap behavior | +| 2023-12-12 | Extended ignore behavior to `GRANT/REVOKE` (`b33f1245559`, PR #57538) | Fixed common operational conflict with grants | +| 2025-06-03 | Keeper replication logic extracted to `ZooKeeperReplicator` (`39eb90b73ef`, PR #81245) | Cleaner architecture, shared replication core | +| 2026-01-24 | Optional strict mode on invalid replicated entities (`3d654b79853`) | Lets operators fail fast on corrupted Keeper payloads | + +## 13. 
References for engineers + - `src/Access/AccessControl.cpp` -- `src/Access/AccessChangesNotifier.{h,cpp}` +- `src/Access/MultipleAccessStorage.cpp` +- `src/Access/ReplicatedAccessStorage.cpp` +- `src/Access/ZooKeeperReplicator.cpp` +- `src/Interpreters/removeOnClusterClauseIfNeeded.cpp` - `src/Access/IAccessStorage.cpp` -- `src/Backups/BackupCoordinationReplicatedAccess.{h,cpp}` - `src/Backups/BackupCoordinationOnCluster.cpp` - `src/Backups/RestoreCoordinationOnCluster.cpp` -- `src/Interpreters/removeOnClusterClauseIfNeeded.cpp` -- `src/Interpreters/Access/InterpreterGrantQuery.cpp` - `tests/integration/test_replicated_users/test.py` -- `tests/integration/test_replicated_access/test.py` - `tests/integration/test_replicated_access/test_invalid_entity.py` -- `tests/integration/test_access_control_on_cluster/test.py` From 1e0906b24f41e6cfc0acc71a58c2a077b7fb08c6 Mon Sep 17 00:00:00 2001 From: filimonov <1549571+filimonov@users.noreply.github.com> Date: Thu, 5 Mar 2026 00:49:53 +0100 Subject: [PATCH 04/10] Update users_in_keeper.md --- .../users_in_keeper.md | 252 ++++++++++-------- 1 file changed, 135 insertions(+), 117 deletions(-) diff --git a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md index 7b28f0752e..0a04822457 100644 --- a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md +++ b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md @@ -12,9 +12,27 @@ This KB explains how to make SQL RBAC changes (`CREATE USER`, `CREATE ROLE`, `GR `Keeper` below means either ClickHouse Keeper or ZooKeeper. -## 1. Why use this instead of only `ON CLUSTER` for RBAC? +Before details, the core concept is: +- ClickHouse stores access entities in access storages configured by `user_directories`. 
+- By default, following the shared-nothing concept, SQL RBAC objects are local (`local_directory`), so changes done on one node do not automatically appear on another node unless you run `... ON CLUSTER ...`. +- With `user_directories.replicated`, ClickHouse stores the RBAC model in Keeper under a configured path (for example `/clickhouse/access`) and every node watches that path. +- Each node keeps a local in-memory mirror of replicated access entities and updates it from Keeper watch notifications. This is why normal access checks are local-memory fast, while RBAC writes depend on Keeper availability. + +Important mental model: +- this feature replicates RBAC state (users, roles, grants, policies, profiles, quotas, masking policies); +- it is not the same mechanism as distributed DDL queue execution used by `ON CLUSTER`. + +Flow of this KB: +1. Why this model helps. +2. How to configure it on a new cluster. +3. How to validate and operate it. +4. How to migrate existing RBAC safely. +5. Advanced troubleshooting and internals. + +## 1. Choose the RBAC replication model (`ON CLUSTER` vs Keeper) `ON CLUSTER` executes DDL on hosts that exist at execution time. +In practice, it fans out the query through the distributed DDL queue to currently known cluster nodes. It does not automatically replay old RBAC DDL for replicas/shards added later. Keeper-backed RBAC solves that: @@ -39,110 +57,7 @@ Cons: - Very large RBAC sets (thousands of users/roles or very complex grants) can increase Keeper/watch pressure. - If Keeper is unavailable during server startup and replicated RBAC storage is configured, startup can fail, so DBA login is unavailable until startup succeeds. -## 2. Backup and migration first (important) - -Before switching to Keeper-backed RBAC, treat this as a migration. - -Key facts: -- Changing `user_directories` storage or changing `zookeeper_path` does **not** move existing SQL RBAC objects automatically. 
-- If path changes, old users/roles are not deleted, but become effectively hidden from the new storage path. -- `zookeeper_path` cannot be changed at runtime via SQL. - -Recommended migration sequence: -1. Back up RBAC objects. -2. Apply the new `user_directories` config on all nodes. -3. Restart/reload config as required by your environment. -4. Restore/recreate RBAC objects to the target storage. -5. Validate on all nodes. - -### 2.1 Migration with pure SQL (no backup tool) - -This path is useful when: -- RBAC DDL is already versioned in your repo, or -- you want to dump/replay access entities using SQL only. - -Recommended SQL-only flow: -1. On source, check where entities are stored (local vs replicated): - -```sql -SELECT name, storage FROM system.users ORDER BY name; -SELECT name, storage FROM system.roles ORDER BY name; -SELECT name, storage FROM system.settings_profiles ORDER BY name; -SELECT name, storage FROM system.quotas ORDER BY name; -SELECT name, storage FROM system.row_policies ORDER BY name; -SELECT name, storage FROM system.masking_policies ORDER BY name; -``` - -2. Export RBAC DDL from source: -- simplest full dump: - -```sql -SHOW ACCESS; -``` - -Save output as SQL (for example `rbac_dump.sql`) in your repo/artifacts. - -You can also export individual objects with `SHOW CREATE USER/ROLE/...` when needed. - -3. Switch config to replicated `user_directories` on target cluster and restart/reload. -4. Replay exported SQL on one node (without `ON CLUSTER` in replicated mode). -5. Validate from another node (`SHOW CREATE USER ...`, `SHOW GRANTS FOR ...`). - -### 2.2 Migration with `clickhouse-backup` (external tool) - -```bash -# backup local RBAC users/roles/etc. 
-clickhouse-backup create --rbac --rbac-only users_bkp_20260304 - -# restore (on node configured with replicated user directory) -clickhouse-backup restore --rbac-only users_bkp_20260304 -``` - -Important: -- this applies to SQL/RBAC users (created with `CREATE USER ...`, `CREATE ROLE ...`, etc.); -- if your users are in `users.xml`, those are config-based (`--configs`) and this is not an automatic local->replicated RBAC conversion. - -### 2.3 Migration with embedded ClickHouse SQL `BACKUP/RESTORE` - -```sql -BACKUP - TABLE system.users, - TABLE system.roles, - TABLE system.row_policies, - TABLE system.quotas, - TABLE system.settings_profiles, - TABLE system.masking_policies -TO ; - --- after switching config -RESTORE - TABLE system.users, - TABLE system.roles, - TABLE system.row_policies, - TABLE system.quotas, - TABLE system.settings_profiles, - TABLE system.masking_policies -FROM ; -``` - -`allow_backup` behavior for embedded SQL backup/restore: -- Storage-level flag in `user_directories` (``, ``, ``) controls whether that storage participates in backup/restore. -- Entity-level setting `allow_backup` (for users/roles/settings profiles) can exclude specific RBAC objects from backup. - -Defaults in ClickHouse code: -- `users_xml`: `allow_backup = false` by default. -- `local_directory`: `allow_backup = true` by default. -- `replicated`: `allow_backup = true` by default. - -Operational implication: -- If you disable `allow_backup` for replicated storage, embedded `BACKUP TABLE system.users ...` may skip those entities (or fail if no backup-allowed access storage remains). - -About `clickhouse-backup --rbac/--rbac-only`: -- It is an external tool, not ClickHouse embedded backup by itself. -- If `clickhouse-backup` is configured with `use_embedded_backup_restore: true`, it delegates to SQL `BACKUP/RESTORE` and follows embedded rules. -- Otherwise it uses its own workflow; do not assume full equivalence with embedded `allow_backup` semantics. - -## 3. 
Minimal server configuration +## 2. Configure Keeper-backed RBAC on a new cluster `user_directories` is the ClickHouse server configuration section that defines: - where access entities are read from (`users.xml`, local SQL access files, Keeper, LDAP, etc.), @@ -170,7 +85,7 @@ Why `replace="replace"` matters: - defaults include `local_directory`, so SQL RBAC may still be written locally; - this can cause mixed behavior (some entities in Keeper, some in local files). -### 3.1 `user_directories` behavior, defaults, and coexistence +### 2.1 Understand `user_directories`: defaults, precedence, coexistence What can be configured in `user_directories`: - `users_xml` (read-only config users), @@ -200,7 +115,7 @@ CREATE USER my_user IDENTIFIED BY '***' IN replicated; This is supported, but for access control we usually do **not** recommend mixing storages intentionally. For sensitive access rights, a single source of truth (typically `replicated`) is safer and easier to operate. -## 4. Altinity operator (CHI) example +## 3. Altinity Operator (CHI) configuration example ```yaml apiVersion: clickhouse.altinity.com/v1 @@ -223,7 +138,7 @@ spec: ``` -## 5. Quick validation checklist +## 4. Validate the setup quickly Check active storages and precedence: @@ -254,11 +169,11 @@ FROM system.zookeeper WHERE path = '/clickhouse/access'; ``` -## 6. Relationship with `ON CLUSTER` (important) +## 5. Handle existing `ON CLUSTER` RBAC scripts safely There are two independent propagation mechanisms: - Replicated access storage: Keeper-based replication of RBAC entities. -- `ON CLUSTER`: distributed DDL queue execution. +- `ON CLUSTER`: query fan-out through the distributed DDL queue (also Keeper/ZooKeeper-dependent). When replicated access storage is enabled, combining both can be redundant or problematic. @@ -294,7 +209,110 @@ Also decide your strictness for invalid replicated entities: - `true`: fail fast on invalid entity payload in Keeper. - `false`: log and skip invalid entity. -## 7. 
Common support issues (generalized) +## 6. Migrate existing clusters/users + +Before switching to Keeper-backed RBAC, treat this as a storage migration. + +Key facts before migration: +- Changing `user_directories` storage or changing `zookeeper_path` does **not** move existing SQL RBAC objects automatically. +- If path changes, old users/roles are not deleted, but become effectively hidden from the new storage path. +- `zookeeper_path` cannot be changed at runtime via SQL. + +Recommended high-level steps: +1. Export/backup RBAC. +2. Apply the new `user_directories` config on all nodes. +3. Restart/reload as needed. +4. Restore/replay RBAC. +5. Validate from multiple nodes. + +### 6.1 SQL-only migration (export/import RBAC DDL) + +This path is useful when: +- RBAC DDL is already versioned in your repo, or +- you want to dump/replay access entities using SQL only. + +Recommended SQL-only flow: +1. On source, check where entities are stored (local vs replicated): + +```sql +SELECT name, storage FROM system.users ORDER BY name; +SELECT name, storage FROM system.roles ORDER BY name; +SELECT name, storage FROM system.settings_profiles ORDER BY name; +SELECT name, storage FROM system.quotas ORDER BY name; +SELECT name, storage FROM system.row_policies ORDER BY name; +SELECT name, storage FROM system.masking_policies ORDER BY name; +``` + +2. Export RBAC DDL from source: +- simplest full dump: + +```sql +SHOW ACCESS; +``` + +Save output as SQL (for example `rbac_dump.sql`) in your repo/artifacts. + +You can also export individual objects with `SHOW CREATE USER/ROLE/...` when needed. + +3. Switch config to replicated `user_directories` on target cluster and restart/reload. +4. Replay exported SQL on one node (without `ON CLUSTER` in replicated mode). +5. Validate from another node (`SHOW CREATE USER ...`, `SHOW GRANTS FOR ...`). + +### 6.2 Migration with `clickhouse-backup` (`--rbac-only`) + +```bash +# backup local RBAC users/roles/etc. 
+clickhouse-backup create --rbac --rbac-only users_bkp_20260304 + +# restore (on node configured with replicated user directory) +clickhouse-backup restore --rbac-only users_bkp_20260304 +``` + +Important: +- this applies to SQL/RBAC users (created with `CREATE USER ...`, `CREATE ROLE ...`, etc.); +- if your users are in `users.xml`, those are config-based (`--configs`) and this is not an automatic local->replicated RBAC conversion. + +### 6.3 Migration with embedded SQL `BACKUP/RESTORE` + +```sql +BACKUP + TABLE system.users, + TABLE system.roles, + TABLE system.row_policies, + TABLE system.quotas, + TABLE system.settings_profiles, + TABLE system.masking_policies +TO ; + +-- after switching config +RESTORE + TABLE system.users, + TABLE system.roles, + TABLE system.row_policies, + TABLE system.quotas, + TABLE system.settings_profiles, + TABLE system.masking_policies +FROM ; +``` + +`allow_backup` behavior for embedded SQL backup/restore: +- Storage-level flag in `user_directories` (``, ``, ``) controls whether that storage participates in backup/restore. +- Entity-level setting `allow_backup` (for users/roles/settings profiles) can exclude specific RBAC objects from backup. + +Defaults in ClickHouse code: +- `users_xml`: `allow_backup = false` by default. +- `local_directory`: `allow_backup = true` by default. +- `replicated`: `allow_backup = true` by default. + +Operational implication: +- If you disable `allow_backup` for replicated storage, embedded `BACKUP TABLE system.users ...` may skip those entities (or fail if no backup-allowed access storage remains). + +About `clickhouse-backup --rbac/--rbac-only`: +- It is an external tool, not ClickHouse embedded backup by itself. +- If `clickhouse-backup` is configured with `use_embedded_backup_restore: true`, it delegates to SQL `BACKUP/RESTORE` and follows embedded rules. +- Otherwise it uses its own workflow; do not assume full equivalence with embedded `allow_backup` semantics. + +## 7. 
Troubleshooting: common support issues | Symptom | Typical root cause | What to do | |---|---|---| @@ -312,7 +330,7 @@ Also decide your strictness for invalid replicated entities: | Short window where user seems present/absent via load balancer | Propagation + node routing timing | Validate directly on each node; avoid assuming LB view is instantly consistent | | Server fails after aggressive `user_directories` replacement | Required base users/profiles missing in config | Keep `users_xml` (or equivalent base definitions) intact | -## 8. Operational guardrails +## 8. Operational guardrails for production - Keep the same `user_directories` config on all nodes. - Keep `zookeeper_path` unique per cluster/tenant. @@ -321,7 +339,7 @@ Also decide your strictness for invalid replicated entities: - Treat Keeper health as part of access-management SLO. - Plan RBAC backup/restore before changing storage path or cluster topology. -## 9. Debugging and observability +## 9. Observability and debugging signals Keeper connectivity: @@ -367,7 +385,7 @@ Force access reload: SYSTEM RELOAD USERS; ``` -## 10. Keeper structure (advanced troubleshooting) +## 10. Keeper path structure and semantics (advanced) If `zookeeper_path=/clickhouse/access`: @@ -401,14 +419,14 @@ Advanced note: - primary cache: `MemoryAccessStorage` inside replicated access storage; - higher-level caches in `AccessControl` (`RoleCache`, `RowPolicyCache`, `QuotaCache`, `SettingsProfilesCache`) are updated/invalidated via access change notifications. -## 11. Low-level behavior that explains real incidents +## 11. Low-level internals behind real incidents - Read path is memory-backed (`MemoryAccessStorage` mirror), not direct Keeper reads per query. - Write path requires Keeper availability; if Keeper is down, RBAC writes fail while some reads can continue from loaded state. 
- Insert target is selected by storage order and writeability in `MultipleAccessStorage`; this is why leftover `local_directory` can hijack SQL user creation. - `ignore_on_cluster_for_replicated_access_entities_queries` is implemented as AST rewrite that removes `ON CLUSTER` for access queries when replicated access storage is enabled. -## 12. History highlights +## 12. Version and history highlights | Date | Change | Why it matters | |---|---|---| @@ -418,7 +436,7 @@ Advanced note: | 2025-06-03 | Keeper replication logic extracted to `ZooKeeperReplicator` (`39eb90b73ef`, PR #81245) | Cleaner architecture, shared replication core | | 2026-01-24 | Optional strict mode on invalid replicated entities (`3d654b79853`) | Lets operators fail fast on corrupted Keeper payloads | -## 13. References for engineers +## 13. Code references for deep dives - `src/Access/AccessControl.cpp` - `src/Access/MultipleAccessStorage.cpp` From 36c33d1d0b23fd6df2dabf68233807cb21206eec Mon Sep 17 00:00:00 2001 From: filimonov <1549571+filimonov@users.noreply.github.com> Date: Thu, 5 Mar 2026 01:06:22 +0100 Subject: [PATCH 05/10] Update users_in_keeper.md --- .../users_in_keeper.md | 61 +++++++++++++++---- 1 file changed, 50 insertions(+), 11 deletions(-) diff --git a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md index 0a04822457..523e0820c9 100644 --- a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md +++ b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md @@ -12,15 +12,16 @@ This KB explains how to make SQL RBAC changes (`CREATE USER`, `CREATE ROLE`, `GR `Keeper` below means either ClickHouse Keeper or ZooKeeper. +TL;DR: +- By default, SQL RBAC changes (`CREATE USER`, `GRANT`, etc.) are local to each server. +- Replicated access storage keeps RBAC entities in ZooKeeper/ClickHouse Keeper so changes automatically appear on all nodes. 
+- This guide shows how to configure replicated RBAC, validate it, and migrate existing users safely. + Before details, the core concept is: - ClickHouse stores access entities in access storages configured by `user_directories`. - By default, following the shared-nothing concept, SQL RBAC objects are local (`local_directory`), so changes done on one node do not automatically appear on another node unless you run `... ON CLUSTER ...`. - With `user_directories.replicated`, ClickHouse stores the RBAC model in Keeper under a configured path (for example `/clickhouse/access`) and every node watches that path. -- Each node keeps a local in-memory mirror of replicated access entities and updates it from Keeper watch notifications. This is why normal access checks are local-memory fast, while RBAC writes depend on Keeper availability. - -Important mental model: -- this feature replicates RBAC state (users, roles, grants, policies, profiles, quotas, masking policies); -- it is not the same mechanism as distributed DDL queue execution used by `ON CLUSTER`. +- Each node keeps a local in-memory mirror of replicated access entities and updates it from Keeper watch callbacks. This is why normal access checks are local-memory fast, while RBAC writes depend on Keeper availability. Flow of this KB: 1. Why this model helps. @@ -29,10 +30,10 @@ Flow of this KB: 4. How to migrate existing RBAC safely. 5. Advanced troubleshooting and internals. -## 1. Choose the RBAC replication model (`ON CLUSTER` vs Keeper) +## 1. ON CLUSTER vs Keeper-backed RBAC: when to use which `ON CLUSTER` executes DDL on hosts that exist at execution time. -In practice, it fans out the query through the distributed DDL queue to currently known cluster nodes. +In practice, it fans out the query through the distributed DDL queue (also Keeper/ZooKeeper-dependent) to currently known cluster nodes. It does not automatically replay old RBAC DDL for replicas/shards added later. 
Keeper-backed RBAC solves that: @@ -40,6 +41,8 @@ Keeper-backed RBAC solves that: - new servers read the same RBAC state when they join; - no need to remember `ON CLUSTER` for every RBAC statement. +Mental model: Keeper-backed RBAC replicates access state, while `ON CLUSTER` fans out DDL to currently known nodes. + ### 1.1 Pros and Cons Pros: @@ -50,12 +53,12 @@ Pros: - Integrates with access-entity backup/restore. Cons: -- Writes depend on Keeper availability (reads can continue from local cache, writes fail when Keeper is unavailable). +- Writes depend on Keeper availability. `CREATE/ALTER/DROP USER` and `CREATE/ALTER/DROP ROLE`, plus `GRANT/REVOKE`, fail if Keeper is unavailable, while existing authentication/authorization may continue from already loaded cache until restart. - Operational complexity increases (Keeper health directly affects RBAC operations). - Can conflict with `ON CLUSTER` if both mechanisms are used without guard settings. - Invalid/corrupted payload in Keeper can be skipped or be startup-fatal, depending on `throw_on_invalid_replicated_access_entities`. - Very large RBAC sets (thousands of users/roles or very complex grants) can increase Keeper/watch pressure. -- If Keeper is unavailable during server startup and replicated RBAC storage is configured, startup can fail, so DBA login is unavailable until startup succeeds. +- If Keeper is unavailable during server startup and replicated RBAC storage is configured, startup can fail, so you may be unable to log in until startup succeeds. ## 2. Configure Keeper-backed RBAC on a new cluster @@ -85,6 +88,11 @@ Why `replace="replace"` matters: - defaults include `local_directory`, so SQL RBAC may still be written locally; - this can cause mixed behavior (some entities in Keeper, some in local files). +Recommended configuration for clusters using replicated RBAC: +- `users_xml`: bootstrap/break-glass admin users and static defaults. 
+- `replicated`: all SQL RBAC objects (`CREATE USER`, `CREATE ROLE`, `GRANT`, policies, profiles, quotas). +- avoid `local_directory` as an active writable SQL RBAC storage to prevent mixed write behavior. + ### 2.1 Understand `user_directories`: defaults, precedence, coexistence What can be configured in `user_directories`: @@ -148,6 +156,14 @@ FROM system.user_directories ORDER BY precedence; ``` +Example expected result (values can vary by version/config; precedence values are relative and order matters): + +```text +name type precedence +users_xml users_xml 0 +replicated replicated 1 +``` + Check where users are stored: ```sql @@ -156,11 +172,20 @@ FROM system.users ORDER BY name; ``` +Example expected result for SQL-created user: + +```text +name storage +kb_test replicated +``` + Smoke test: 1. On node A: `CREATE USER kb_test IDENTIFIED WITH no_password;` 2. On node B: `SHOW CREATE USER kb_test;` 3. On either node: `DROP USER kb_test;` +RBAC changes usually propagate within milliseconds to seconds, depending on Keeper latency and cluster load. + Check Keeper data exists: ```sql @@ -213,6 +238,8 @@ Also decide your strictness for invalid replicated entities: Before switching to Keeper-backed RBAC, treat this as a storage migration. +**Important:** replay/restore RBAC on one node only. Objects are written to Keeper and then reflected on all nodes. + Key facts before migration: - Changing `user_directories` storage or changing `zookeeper_path` does **not** move existing SQL RBAC objects automatically. - If path changes, old users/roles are not deleted, but become effectively hidden from the new storage path. @@ -230,6 +257,7 @@ Recommended high-level steps: This path is useful when: - RBAC DDL is already versioned in your repo, or - you want to dump/replay access entities using SQL only. +- Replaying `SHOW ACCESS` output is idempotent only if you handle `IF NOT EXISTS`/cleanup; otherwise prefer restoring into an empty RBAC namespace. 
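+
+Since `SHOW ACCESS` emits plain `CREATE ...` statements, one hedged way to make a replay idempotent is to rewrite them with `IF NOT EXISTS` before applying (the object names below are illustrative, not taken from a real dump):
+
+```sql
+-- Hypothetical objects. Add IF NOT EXISTS by hand (or drop conflicting
+-- objects first) before replaying a dump on a non-empty target.
+-- GRANT statements are naturally repeatable and need no rewrite.
+CREATE USER IF NOT EXISTS app_reader IDENTIFIED WITH sha256_password BY '***';
+CREATE ROLE IF NOT EXISTS readonly_role;
+GRANT SELECT ON default.* TO readonly_role;
+GRANT readonly_role TO app_reader;
+```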
Recommended SQL-only flow: 1. On source, check where entities are stored (local vs replicated): @@ -271,6 +299,7 @@ clickhouse-backup restore --rbac-only users_bkp_20260304 Important: - this applies to SQL/RBAC users (created with `CREATE USER ...`, `CREATE ROLE ...`, etc.); - if your users are in `users.xml`, those are config-based (`--configs`) and this is not an automatic local->replicated RBAC conversion. +- run restore on one node only; entities will be replicated through Keeper. ### 6.3 Migration with embedded SQL `BACKUP/RESTORE` @@ -311,6 +340,7 @@ About `clickhouse-backup --rbac/--rbac-only`: - It is an external tool, not ClickHouse embedded backup by itself. - If `clickhouse-backup` is configured with `use_embedded_backup_restore: true`, it delegates to SQL `BACKUP/RESTORE` and follows embedded rules. - Otherwise it uses its own workflow; do not assume full equivalence with embedded `allow_backup` semantics. +- run restore on one node only; entities will be replicated through Keeper. ## 7. Troubleshooting: common support issues @@ -320,6 +350,7 @@ About `clickhouse-backup --rbac/--rbac-only`: | RBAC objects “disappeared” after config change/restart | `zookeeper_path` or storage source changed | Restore from backup or recreate RBAC in the new storage; keep path stable | | New replica has no historical users/roles | Team used only `... ON CLUSTER ...` before scaling | Enable Keeper-backed RBAC so new nodes load shared state | | `CREATE USER ... 
ON CLUSTER` throws "already exists in replicated" | Query fan-out + replicated storage both applied | Remove `ON CLUSTER` for RBAC or enable `ignore_on_cluster_for_replicated_access_entities_queries` | +| `CREATE USER`/`GRANT` fails with Keeper/ZooKeeper error | Keeper unavailable or connection lost | Check `system.zookeeper_connection`, `system.zookeeper_connection_log`, and server logs | | RBAC writes still go local though `replicated` exists | `local_directory` remains first writable storage | Use `user_directories replace="replace"` and avoid writable local SQL storage in front of replicated | | Server does not start when Keeper is down; no one can log in | Replicated access storage needs Keeper during initialization | Restore Keeper first, then restart; if needed use a temporary fallback config and keep a break-glass `users.xml` admin | | Startup fails (or users are skipped) because of invalid RBAC payload in Keeper | Corrupted/invalid replicated entity and strict validation mode | Use `throw_on_invalid_replicated_access_entities` deliberately: `true` fail-fast, `false` skip+log; fix bad Keeper payload before re-enabling strict mode | @@ -341,14 +372,14 @@ About `clickhouse-backup --rbac/--rbac-only`: ## 9. Observability and debugging signals -Keeper connectivity: +### 9.1 Check Keeper connectivity ```sql SELECT * FROM system.zookeeper_connection; SELECT * FROM system.zookeeper_connection_log ORDER BY event_time DESC LIMIT 100; ``` -Keeper operations for RBAC path: +### 9.2 Inspect RBAC activity in Keeper ```sql SELECT event_time, type, op_num, path, error @@ -358,6 +389,8 @@ ORDER BY event_time DESC LIMIT 200; ``` +### 9.3 Relevant server log patterns + Note: `system.zookeeper_log` is often disabled in production. 
If it is unavailable, use server logs (usually `clickhouse-server.log`) with these patterns: @@ -370,6 +403,8 @@ Can't have Replicated access without ZooKeeper ON CLUSTER clause was ignored for query ``` +### 9.4 Inspect distributed DDL queue activity (when `ON CLUSTER` is involved) + If troubleshooting mixed usage of distributed DDL: ```sql @@ -379,6 +414,8 @@ ORDER BY query_create_time DESC LIMIT 100; ``` +### 9.5 Force RBAC reload + Force access reload: ```sql @@ -387,6 +424,8 @@ SYSTEM RELOAD USERS; ## 10. Keeper path structure and semantics (advanced) +The following details are useful for advanced debugging or when inspecting Keeper paths manually. + If `zookeeper_path=/clickhouse/access`: ```text From 6a9d32d010a0e2830e24b9bbbf909ff432f37457 Mon Sep 17 00:00:00 2001 From: filimonov <1549571+filimonov@users.noreply.github.com> Date: Thu, 5 Mar 2026 01:16:07 +0100 Subject: [PATCH 06/10] Update users_in_keeper.md --- .../users_in_keeper.md | 35 ++----------------- 1 file changed, 3 insertions(+), 32 deletions(-) diff --git a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md index 523e0820c9..1a1148023c 100644 --- a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md +++ b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md @@ -223,17 +223,6 @@ For production, prefer configuring this in a profile (for example `default` in ` ``` -Also decide your strictness for invalid replicated entities: - -```xml - - true - -``` - -- `true`: fail fast on invalid entity payload in Keeper. -- `false`: log and skip invalid entity. - ## 6. Migrate existing clusters/users Before switching to Keeper-backed RBAC, treat this as a storage migration. 
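+
+Before starting, a quick inventory of which storage currently holds each object helps scope the migration (a sketch; extend the same pattern to the other entity types you use):
+
+```sql
+-- Count RBAC objects per access storage on the source node,
+-- so you know how much has to move into the replicated storage.
+SELECT storage, count() AS users FROM system.users GROUP BY storage;
+SELECT storage, count() AS roles FROM system.roles GROUP BY storage;
+```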
@@ -300,6 +289,7 @@ Important: - this applies to SQL/RBAC users (created with `CREATE USER ...`, `CREATE ROLE ...`, etc.); - if your users are in `users.xml`, those are config-based (`--configs`) and this is not an automatic local->replicated RBAC conversion. - run restore on one node only; entities will be replicated through Keeper. +- If `clickhouse-backup` is configured with `use_embedded_backup_restore: true`, it delegates to SQL `BACKUP/RESTORE` and follows embedded rules. (see below). ### 6.3 Migration with embedded SQL `BACKUP/RESTORE` @@ -336,12 +326,6 @@ Defaults in ClickHouse code: Operational implication: - If you disable `allow_backup` for replicated storage, embedded `BACKUP TABLE system.users ...` may skip those entities (or fail if no backup-allowed access storage remains). -About `clickhouse-backup --rbac/--rbac-only`: -- It is an external tool, not ClickHouse embedded backup by itself. -- If `clickhouse-backup` is configured with `use_embedded_backup_restore: true`, it delegates to SQL `BACKUP/RESTORE` and follows embedded rules. -- Otherwise it uses its own workflow; do not assume full equivalence with embedded `allow_backup` semantics. -- run restore on one node only; entities will be replicated through Keeper. - ## 7. Troubleshooting: common support issues | Symptom | Typical root cause | What to do | @@ -379,26 +363,13 @@ SELECT * FROM system.zookeeper_connection; SELECT * FROM system.zookeeper_connection_log ORDER BY event_time DESC LIMIT 100; ``` -### 9.2 Inspect RBAC activity in Keeper - -```sql -SELECT event_time, type, op_num, path, error -FROM system.zookeeper_log -WHERE path LIKE '/clickhouse/access/%' -ORDER BY event_time DESC -LIMIT 200; -``` - -### 9.3 Relevant server log patterns +### 9.2 Relevant server log patterns -Note: `system.zookeeper_log` is often disabled in production. 
-If it is unavailable, use server logs (usually `clickhouse-server.log`) with these patterns: +You can find feature-related line in the log, by those patterns: ```text Access(replicated) ZooKeeperReplicator -Will try to restart watching thread after error -Initialization failed. Error: Can't have Replicated access without ZooKeeper ON CLUSTER clause was ignored for query ``` From 64b441be258837cfabc403d3d640fe9bc179a20d Mon Sep 17 00:00:00 2001 From: filimonov <1549571+filimonov@users.noreply.github.com> Date: Thu, 5 Mar 2026 01:17:51 +0100 Subject: [PATCH 07/10] Update users_in_keeper.md --- .../users_in_keeper.md | 15 ++------------- 1 file changed, 2 insertions(+), 13 deletions(-) diff --git a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md index 1a1148023c..395104dbab 100644 --- a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md +++ b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md @@ -365,7 +365,7 @@ SELECT * FROM system.zookeeper_connection_log ORDER BY event_time DESC LIMIT 100 ### 9.2 Relevant server log patterns -You can find feature-related line in the log, by those patterns: +You can find feature-related lines in the log, by those patterns: ```text Access(replicated) @@ -374,18 +374,7 @@ Can't have Replicated access without ZooKeeper ON CLUSTER clause was ignored for query ``` -### 9.4 Inspect distributed DDL queue activity (when `ON CLUSTER` is involved) - -If troubleshooting mixed usage of distributed DDL: - -```sql -SELECT cluster, entry, host, status, query, exception_code, exception_text -FROM system.distributed_ddl_queue -ORDER BY query_create_time DESC -LIMIT 100; -``` - -### 9.5 Force RBAC reload +### 9.3 Force RBAC reload Force access reload: From bc4b328bdcc696e5a02a05a790561f789937e891 Mon Sep 17 00:00:00 2001 From: filimonov <1549571+filimonov@users.noreply.github.com> Date: Thu, 5 Mar 2026 01:21:08 +0100 Subject: [PATCH 08/10] 
Update users_in_keeper.md --- content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md | 1 + 1 file changed, 1 insertion(+) diff --git a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md index 395104dbab..448fc431f4 100644 --- a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md +++ b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md @@ -55,6 +55,7 @@ Pros: Cons: - Writes depend on Keeper availability. `CREATE/ALTER/DROP USER` and `CREATE/ALTER/DROP ROLE`, plus `GRANT/REVOKE`, fail if Keeper is unavailable, while existing authentication/authorization may continue from already loaded cache until restart. - Operational complexity increases (Keeper health directly affects RBAC operations). +- Keeper data loss or accidental Keeper path damage can remove replicated RBAC state, and users may lose access; keep regular RBAC backups and test restore procedures. - Can conflict with `ON CLUSTER` if both mechanisms are used without guard settings. - Invalid/corrupted payload in Keeper can be skipped or be startup-fatal, depending on `throw_on_invalid_replicated_access_entities`. - Very large RBAC sets (thousands of users/roles or very complex grants) can increase Keeper/watch pressure. 
From f6125ad43bb80edf83e3a71dc3ca4df1ba5a8231 Mon Sep 17 00:00:00 2001 From: filimonov <1549571+filimonov@users.noreply.github.com> Date: Thu, 5 Mar 2026 01:25:26 +0100 Subject: [PATCH 09/10] Update users_in_keeper.md --- .../en/altinity-kb-setup-and-maintenance/users_in_keeper.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md index 448fc431f4..a16a7e5662 100644 --- a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md +++ b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md @@ -383,6 +383,7 @@ Force access reload: SYSTEM RELOAD USERS; ``` + ## 10. Keeper path structure and semantics (advanced) The following details are useful for advanced debugging or when inspecting Keeper paths manually. @@ -405,6 +406,8 @@ When these paths are accessed: - `CREATE/ALTER/DROP` RBAC SQL: updates `uuid` and type/name index nodes in Keeper transactions; - runtime: watch callbacks refresh changed entities into local in-memory mirror. +## 11. Low-level internals + Advanced note: - each ClickHouse node keeps a local in-memory cache of all replicated access entities; - cache is updated from Keeper watch notifications (list/entity watches), so auth/lookup paths use local memory and not direct Keeper reads on each request. @@ -419,13 +422,12 @@ Advanced note: - primary cache: `MemoryAccessStorage` inside replicated access storage; - higher-level caches in `AccessControl` (`RoleCache`, `RowPolicyCache`, `QuotaCache`, `SettingsProfilesCache`) are updated/invalidated via access change notifications. -## 11. Low-level internals behind real incidents - - Read path is memory-backed (`MemoryAccessStorage` mirror), not direct Keeper reads per query. - Write path requires Keeper availability; if Keeper is down, RBAC writes fail while some reads can continue from loaded state. 
- Insert target is selected by storage order and writeability in `MultipleAccessStorage`; this is why leftover `local_directory` can hijack SQL user creation. - `ignore_on_cluster_for_replicated_access_entities_queries` is implemented as AST rewrite that removes `ON CLUSTER` for access queries when replicated access storage is enabled. + ## 12. Version and history highlights | Date | Change | Why it matters | From 9b564ff27700a2bff02b7b56014d4041f778119f Mon Sep 17 00:00:00 2001 From: filimonov <1549571+filimonov@users.noreply.github.com> Date: Thu, 5 Mar 2026 07:38:38 +0100 Subject: [PATCH 10/10] Update users_in_keeper.md --- content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md | 1 + 1 file changed, 1 insertion(+) diff --git a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md index a16a7e5662..32b0d55ae1 100644 --- a/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md +++ b/content/en/altinity-kb-setup-and-maintenance/users_in_keeper.md @@ -362,6 +362,7 @@ Operational implication: ```sql SELECT * FROM system.zookeeper_connection; SELECT * FROM system.zookeeper_connection_log ORDER BY event_time DESC LIMIT 100; +SELECT * FROM system.zookeeper WHERE path = '/clickhouse/access'; ``` ### 9.2 Relevant server log patterns
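+
+As one more advanced check (the path below assumes `zookeeper_path=/clickhouse/access`), the serialized `ATTACH` payload of each replicated entity can be read directly from Keeper via SQL:
+
+```sql
+-- Each child of /uuid stores the entity's one-line ATTACH statements
+-- (ATTACH USER / ATTACH ROLE / ATTACH GRANT ...) as its node value.
+SELECT name AS entity_uuid, value AS attach_payload
+FROM system.zookeeper
+WHERE path = '/clickhouse/access/uuid'
+LIMIT 5;
+```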