[PECOBLR-1928] Add AI coding agent detection to User-Agent header #739

Closed
vikrantpuppala wants to merge 1 commit into databricks:main from vikrantpuppala:agent-detection

Conversation

@vikrantpuppala

Summary

  • Adds agent.py module that detects 7 AI coding agents (Claude Code, Cursor, Gemini CLI, Cline, Codex, OpenCode, Antigravity) by checking well-known environment variables they set in spawned shell processes
  • Integrates detection into both Session (Thrift path) and build_client_context (SEA path) to append agent/<product> to the User-Agent header
  • Uses exactly-one detection rule: if zero or multiple agent env vars are set, no agent is attributed (avoids ambiguity)

Approach

Mirrors the implementation in databricks/cli#4287 and aligns with the latest agent list in libs/agent/agent.go.

| Agent | Product String | Environment Variable |
| --- | --- | --- |
| Google Antigravity | antigravity | ANTIGRAVITY_AGENT |
| Claude Code | claude-code | CLAUDECODE |
| Cline | cline | CLINE_ACTIVE |
| OpenAI Codex | codex | CODEX_CI |
| Cursor | cursor | CURSOR_AGENT |
| Gemini CLI | gemini-cli | GEMINI_CLI |
| OpenCode | opencode | OPENCODE |

Adding a new agent requires only a new entry in the KNOWN_AGENTS list.
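
A minimal sketch of what such a module could look like, assuming the table above maps directly onto a list of (environment variable, product string) pairs. The name `KNOWN_AGENTS`, the `detect` function, the injectable `env` argument, and the empty-string result come from the excerpts quoted in this conversation; the exact layout of `agent.py` is not shown here, so treat the rest as an illustration rather than the merged code.

```python
import os
from typing import Mapping, Optional

# (environment variable, product string) pairs, copied from the table above.
KNOWN_AGENTS = [
    ("ANTIGRAVITY_AGENT", "antigravity"),
    ("CLAUDECODE", "claude-code"),
    ("CLINE_ACTIVE", "cline"),
    ("CODEX_CI", "codex"),
    ("CURSOR_AGENT", "cursor"),
    ("GEMINI_CLI", "gemini-cli"),
    ("OPENCODE", "opencode"),
]


def detect(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the product string if exactly one agent variable is set, else ""."""
    if env is None:
        env = os.environ
    # Any non-empty value counts as "set"; missing or empty values are ignored.
    detected = [product for var, product in KNOWN_AGENTS if env.get(var)]
    # Exactly-one rule: zero or multiple matches yield no attribution.
    return detected[0] if len(detected) == 1 else ""
```

Passing a plain dict as `env` keeps the function testable without touching the real process environment.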

Changes

  • New: src/databricks/sql/common/agent.py — environment-variable-based agent detection with injectable env dict for testability
  • Modified: src/databricks/sql/session.py — appends agent/<product> to useragent_header (Thrift path)
  • Modified: src/databricks/sql/utils.py — appends agent/<product> in build_client_context() (SEA path)
  • New: tests/unit/test_agent_detection.py — 12 test cases covering all agents, no agent, multiple agents, and empty values

Test plan

  • test_agent_detection.py — 12 unit tests pass
  • Manual: verified User-Agent contains agent/claude-code when run from Claude Code
    User-Agent: PyDatabricksSqlConnector/4.2.5 agent/claude-code
    User-Agent: PyDatabricksSqlConnector/4.2.5 (TestPartner) agent/claude-code
    
  • Executed SELECT 1 successfully against dogfood warehouse with the new header
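
The two header values in the test plan are consistent with a plain string append after the existing User-Agent value. A small sketch of that step, where `append_agent` is a hypothetical helper used only for illustration (the PR does this inline in `session.py` and `utils.py`):

```python
def append_agent(user_agent: str, agent_product: str) -> str:
    """Append " agent/<product>" when a product was detected (hypothetical helper)."""
    if agent_product:
        return f"{user_agent} agent/{agent_product}"
    # No agent detected (or ambiguous detection): leave the header untouched.
    return user_agent


print(append_agent("PyDatabricksSqlConnector/4.2.5", "claude-code"))
# PyDatabricksSqlConnector/4.2.5 agent/claude-code
print(append_agent("PyDatabricksSqlConnector/4.2.5 (TestPartner)", "claude-code"))
# PyDatabricksSqlConnector/4.2.5 (TestPartner) agent/claude-code
```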

🤖 Generated with Claude Code

Copilot AI left a comment

Pull request overview

This PR adds AI coding agent detection to the User-Agent header by introducing a new agent.py module that detects 7 different AI coding agents through their environment variables. The detection is integrated into both the Thrift path (via Session) and the SEA path (via build_client_context) to append agent/<product> to the User-Agent header when exactly one agent is detected.

Changes:

  • Added new agent detection module with environment variable-based detection supporting 7 AI coding agents
  • Integrated agent detection into Session initialization to append agent information to User-Agent header for Thrift path
  • Integrated agent detection into build_client_context to append agent information to User-Agent header for SEA path
  • Added comprehensive unit tests for agent detection covering all agents, edge cases, and environment handling

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/databricks/sql/common/agent.py | New module implementing environment variable-based detection for 7 AI coding agents with exactly-one detection rule |
| src/databricks/sql/session.py | Integrates agent detection into Session initialization to append agent info to User-Agent header (Thrift path) |
| src/databricks/sql/utils.py | Integrates agent detection into build_client_context to append agent info to User-Agent header (SEA path) |
| tests/unit/test_agent_detection.py | Comprehensive unit tests for agent detection module covering all agents, edge cases, and default environment handling |

        assert detect({"CLAUDECODE": "1", "CURSOR_AGENT": "1"}) == ""

    def test_ignores_empty_env_var_values(self):
        assert detect({"CLAUDECODE": ""}) == ""
Copilot AI Feb 24, 2026

The test should also verify that environment variables set to "0" or "false" are ignored, not just empty strings. Currently, the detection logic treats any truthy string value (including "0" or "false") as a positive detection, which could lead to false positives if an agent sets these variables to "0" to indicate it's disabled.

Suggested change
-         assert detect({"CLAUDECODE": ""}) == ""
+         assert detect({"CLAUDECODE": ""}) == ""
+         assert detect({"CLAUDECODE": "0"}) == ""
+         assert detect({"CLAUDECODE": "false"}) == ""

Contributor Author

Same reasoning as above — the CLI reference implementation does not treat "0" or "false" as non-detection, so we intentionally match that behavior. The agents always set =1.

    if env is None:
        env = os.environ

    detected = [product for var, product in KNOWN_AGENTS if env.get(var)]
Copilot AI Feb 24, 2026

The detection logic treats any truthy value as a positive detection. This means environment variables set to "0" or "false" will be incorrectly detected as active agents. Consider checking if the value is non-empty and not a false-like value (e.g., "0", "false", "False", "FALSE"). For example: detected = [product for var, product in KNOWN_AGENTS if env.get(var) and env.get(var).lower() not in ("0", "false")]

Suggested change
-     detected = [product for var, product in KNOWN_AGENTS if env.get(var)]
+     detected = [
+         product
+         for var, product in KNOWN_AGENTS
+         if (val := env.get(var)) and str(val).lower() not in ("0", "false")
+     ]

Contributor Author

This is intentional: we are mirroring the behavior of the reference implementation in databricks/cli libs/agent/agent.go, which only checks for non-empty (!= ""). The agents themselves always set their env var to "1" (e.g. CLAUDECODE=1), so "0" or "false" would indicate a misconfiguration, not a valid state. Keeping the logic identical across all drivers and the CLI avoids subtle behavioral divergence.
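
To make the difference between the two positions in this thread concrete, here is a small side-by-side sketch. Both function names are hypothetical and exist only for this comparison; the first follows the non-empty check the PR keeps, the second follows the filter from the review suggestion.

```python
def is_set_nonempty(value):
    """Check kept by the PR: any non-empty string counts as set."""
    return bool(value)


def is_set_excluding_false_like(value):
    """Variant from the review suggestion: also reject "0" and "false"."""
    return bool(value) and str(value).lower() not in ("0", "false")


# The two predicates only disagree on false-like, non-empty strings.
for value in ("1", "0", "false", "FALSE", "", None):
    print(repr(value), is_set_nonempty(value), is_set_excluding_false_like(value))
```

For `"1"`, `""`, and an unset variable the two checks agree; they diverge only for values such as `"0"` or `"false"`, which the reply above treats as a misconfiguration.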

Comment on lines +68 to +70
        agent_product = detect_agent()
        if agent_product:
            self.useragent_header += " agent/{}".format(agent_product)
Copilot AI Feb 24, 2026

Consider adding an integration test in test_session.py that verifies the User-Agent header correctly includes the agent suffix when an agent environment variable is set. This would ensure the agent detection integration works end-to-end. For example, using monkeypatch to set an agent environment variable and verify the resulting User-Agent header contains "agent/claude-code".

Contributor Author

The detection logic is fully covered by unit tests in test_agent_detection.py. The integration in session.py is a 3-line append that is straightforward. Adding an integration test here would require mocking the full Session constructor which adds complexity without meaningful coverage gain.

Comment on lines +925 to +927
    agent_product = detect_agent()
    if agent_product:
        user_agent += f" agent/{agent_product}"
Copilot AI Feb 24, 2026

Consider adding a test for build_client_context that verifies the user_agent field correctly includes the agent suffix when an agent environment variable is set. This would ensure the agent detection integration works correctly in the SEA path.

Contributor Author

Same as above — the detection is unit-tested and the integration is a trivial string append. The SEA path goes through build_client_context which uses the same detect() function already covered by tests.
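
For readers weighing the two sides of this thread, a rough sketch of what an environment-patching test could look like without constructing a full Session. Everything here is a stand-in: `detect_agent` is reduced to a single variable and `build_user_agent` is a hypothetical wrapper around the three-line append, so this shows the testing technique (`unittest.mock.patch.dict` on `os.environ`, used here instead of pytest's `monkeypatch` to stay within the standard library), not the connector's actual test suite.

```python
import os
from unittest import mock


def detect_agent():
    """Stand-in detector that only knows CLAUDECODE (assumption for this sketch)."""
    return "claude-code" if os.environ.get("CLAUDECODE") else ""


def build_user_agent(base):
    """Hypothetical wrapper around the append shown in the diff excerpts."""
    agent_product = detect_agent()
    return f"{base} agent/{agent_product}" if agent_product else base


def test_user_agent_includes_agent_suffix():
    # clear=True removes every other variable so the host environment cannot interfere.
    with mock.patch.dict(os.environ, {"CLAUDECODE": "1"}, clear=True):
        user_agent = build_user_agent("PyDatabricksSqlConnector/4.2.5")
    assert user_agent.endswith(" agent/claude-code")


def test_user_agent_unchanged_without_agent():
    with mock.patch.dict(os.environ, {}, clear=True):
        user_agent = build_user_agent("PyDatabricksSqlConnector/4.2.5")
    assert user_agent == "PyDatabricksSqlConnector/4.2.5"
```

`patch.dict` restores the original environment when the `with` block exits, so the tests do not leak state into each other.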

@vikrantpuppala changed the title from "Add AI coding agent detection to User-Agent header" to "[PECOBLR-1928] Add AI coding agent detection to User-Agent header" on Feb 24, 2026
Detect when the Python SQL connector is invoked by an AI coding agent
(e.g. Claude Code, Cursor, Gemini CLI) by checking well-known
environment variables, and append `agent/<product>` to the User-Agent
string.

This enables Databricks to understand how much driver usage originates
from AI coding agents. Detection only succeeds when exactly one agent
is detected to avoid ambiguous attribution.

Mirrors the approach in databricks/cli#4287.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
@vikrantpuppala
Contributor Author

Recreating from databricks org branch to fix CI permissions.
