[PECOBLR-1928] Add AI coding agent detection to User-Agent header #739
vikrantpuppala wants to merge 1 commit into databricks:main
Pull request overview
This PR adds AI coding agent detection to the User-Agent header by introducing a new agent.py module that detects 7 different AI coding agents through their environment variables. The detection is integrated into both the Thrift path (via Session) and the SEA path (via build_client_context) to append agent/<product> to the User-Agent header when exactly one agent is detected.
Changes:
- Added new agent detection module with environment variable-based detection supporting 7 AI coding agents
- Integrated agent detection into Session initialization to append agent information to User-Agent header for Thrift path
- Integrated agent detection into build_client_context to append agent information to User-Agent header for SEA path
- Added comprehensive unit tests for agent detection covering all agents, edge cases, and environment handling
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/databricks/sql/common/agent.py | New module implementing environment variable-based detection for 7 AI coding agents with exactly-one detection rule |
| src/databricks/sql/session.py | Integrates agent detection into Session initialization to append agent info to User-Agent header (Thrift path) |
| src/databricks/sql/utils.py | Integrates agent detection into build_client_context to append agent info to User-Agent header (SEA path) |
| tests/unit/test_agent_detection.py | Comprehensive unit tests for agent detection module covering all agents, edge cases, and default environment handling |
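
A minimal sketch of the detection module summarized above, assuming the environment variable and product names listed in the PR description. Only the `env is None` fallback and the list comprehension appear in the diff excerpts on this page; the function signature, type hints, and final `return` are assumptions reconstructed from the tests (which expect `""` for zero or multiple matches), not the module's confirmed contents.

```python
import os
from typing import Mapping, Optional

# (environment variable, product string) pairs from the PR description.
KNOWN_AGENTS = [
    ("ANTIGRAVITY_AGENT", "antigravity"),
    ("CLAUDECODE", "claude-code"),
    ("CLINE_ACTIVE", "cline"),
    ("CODEX_CI", "codex"),
    ("CURSOR_AGENT", "cursor"),
    ("GEMINI_CLI", "gemini-cli"),
    ("OPENCODE", "opencode"),
]


def detect(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the product string if exactly one agent variable is set, else ""."""
    if env is None:
        env = os.environ

    # Any non-empty value counts as "set".
    detected = [product for var, product in KNOWN_AGENTS if env.get(var)]

    # Assumed return logic: zero or several matches are treated as ambiguous.
    return detected[0] if len(detected) == 1 else ""
```

Passing a plain dict, as in `detect({"CLAUDECODE": "1"})`, keeps tests independent of the real process environment.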

            assert detect({"CLAUDECODE": "1", "CURSOR_AGENT": "1"}) == ""

        def test_ignores_empty_env_var_values(self):
            assert detect({"CLAUDECODE": ""}) == ""

The test should also verify that environment variables set to "0" or "false" are ignored, not just empty strings. Currently, the detection logic treats any truthy string value (including "0" or "false") as a positive detection, which could lead to false positives if an agent sets these variables to "0" to indicate it's disabled.

Suggested change:

    - assert detect({"CLAUDECODE": ""}) == ""
    + assert detect({"CLAUDECODE": ""}) == ""
    + assert detect({"CLAUDECODE": "0"}) == ""
    + assert detect({"CLAUDECODE": "false"}) == ""

Same reasoning as above — the CLI reference implementation does not treat "0" or "false" as non-detection, so we intentionally match that behavior. The agents always set =1.

    if env is None:
        env = os.environ

    detected = [product for var, product in KNOWN_AGENTS if env.get(var)]

The detection logic treats any truthy value as a positive detection. This means environment variables set to "0" or "false" will be incorrectly detected as active agents. Consider checking if the value is non-empty and not a false-like value (e.g., "0", "false", "False", "FALSE"). For example: `detected = [product for var, product in KNOWN_AGENTS if env.get(var) and env.get(var).lower() not in ("0", "false")]`

Suggested change:

    - detected = [product for var, product in KNOWN_AGENTS if env.get(var)]
    + detected = [
    +     product
    +     for var, product in KNOWN_AGENTS
    +     if (val := env.get(var)) and str(val).lower() not in ("0", "false")
    + ]

This is intentional: we are mirroring the behavior of the reference implementation in databricks/cli `libs/agent/agent.go`, which only checks for non-empty (`!= ""`). The agents themselves always set their env var to "1" (e.g. `CLAUDECODE=1`), so "0" or "false" would indicate a misconfiguration, not a valid state. Keeping the logic identical across all drivers and the CLI avoids subtle behavioral divergence.
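
To make the disagreement in the two threads above concrete, here is a small sketch comparing the non-empty check the PR keeps with the stricter check proposed in review. The helper names `is_set` and `is_set_strict` are hypothetical and exist only for this illustration; neither appears in the PR.

```python
def is_set(value):
    """Non-empty check kept by the PR: any non-empty string counts."""
    return bool(value)


def is_set_strict(value):
    """Stricter variant proposed in review: also rejects "0" and "false"."""
    return bool(value) and str(value).lower() not in ("0", "false")


# Non-empty strings are truthy in Python, whatever they spell.
for value in ("", "0", "false", "1"):
    print(repr(value), is_set(value), is_set_strict(value))
# ''      False False
# '0'     True  False
# 'false' True  False
# '1'     True  True
```

The two checks differ only for values such as "0" and "false", which the author's reply treats as a misconfiguration rather than a supported state.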

    agent_product = detect_agent()
    if agent_product:
        self.useragent_header += " agent/{}".format(agent_product)

Consider adding an integration test in test_session.py that verifies the User-Agent header correctly includes the agent suffix when an agent environment variable is set. This would ensure the agent detection integration works end-to-end. For example, using monkeypatch to set an agent environment variable and verify the resulting User-Agent header contains "agent/claude-code".
The detection logic is fully covered by unit tests in test_agent_detection.py. The integration in session.py is a 3-line append that is straightforward. Adding an integration test here would require mocking the full Session constructor which adds complexity without meaningful coverage gain.

    agent_product = detect_agent()
    if agent_product:
        user_agent += f" agent/{agent_product}"

Consider adding a test for build_client_context that verifies the user_agent field correctly includes the agent suffix when an agent environment variable is set. This would ensure the agent detection integration works correctly in the SEA path.
Same as above — the detection is unit-tested and the integration is a trivial string append. The SEA path goes through build_client_context which uses the same detect() function already covered by tests.
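
For reference, the append step discussed in both integration threads can be exercised in isolation without constructing a `Session`. The sketch below assumes a hypothetical helper, `with_agent_suffix`, and a placeholder base User-Agent string; the PR itself inlines this logic in `session.py` and `utils.py`.

```python
def with_agent_suffix(user_agent: str, agent_product: str) -> str:
    """Append ' agent/<product>' when a product was detected (hypothetical helper)."""
    if agent_product:
        user_agent += f" agent/{agent_product}"
    return user_agent


# "ExampleClient/1.0" is a placeholder, not the connector's real User-Agent.
print(with_agent_suffix("ExampleClient/1.0", "claude-code"))
print(with_agent_suffix("ExampleClient/1.0", ""))
```

An empty product string, which is what detection returns for zero or multiple agents, leaves the header unchanged.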
Detect when the Python SQL connector is invoked by an AI coding agent (e.g. Claude Code, Cursor, Gemini CLI) by checking well-known environment variables, and append `agent/<product>` to the User-Agent string. This enables Databricks to understand how much driver usage originates from AI coding agents.

Detection only succeeds when exactly one agent is detected to avoid ambiguous attribution. Mirrors the approach in databricks/cli#4287.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
c99d661 to 5e42c33
Recreating from databricks org branch to fix CI permissions.
Summary
- `agent.py` module that detects 7 AI coding agents (Claude Code, Cursor, Gemini CLI, Cline, Codex, OpenCode, Antigravity) by checking well-known environment variables they set in spawned shell processes
- Integrated into `Session` (Thrift path) and `build_client_context` (SEA path) to append `agent/<product>` to the User-Agent header

Approach
Mirrors the implementation in databricks/cli#4287 and aligns with the latest agent list in `libs/agent/agent.go`.

| Product | Environment variable |
|---|---|
| antigravity | ANTIGRAVITY_AGENT |
| claude-code | CLAUDECODE |
| cline | CLINE_ACTIVE |
| codex | CODEX_CI |
| cursor | CURSOR_AGENT |
| gemini-cli | GEMINI_CLI |
| opencode | OPENCODE |

Adding a new agent requires only a new entry in the `KNOWN_AGENTS` list.

Changes
- `src/databricks/sql/common/agent.py`: environment-variable-based agent detection with injectable env dict for testability
- `src/databricks/sql/session.py`: appends `agent/<product>` to `useragent_header` (Thrift path)
- `src/databricks/sql/utils.py`: appends `agent/<product>` in `build_client_context()` (SEA path)
- `tests/unit/test_agent_detection.py`: 12 test cases covering all agents, no agent, multiple agents, and empty values

Test plan
- `test_agent_detection.py`: 12 unit tests pass
- User-Agent includes `agent/claude-code` when run from Claude Code
- Ran `SELECT 1` successfully against dogfood warehouse with the new header

🤖 Generated with Claude Code