Add AI bot classification for event enrichment#155

Open
jaredmixpanel wants to merge 4 commits into master from feature/ai-bot-classification

@jaredmixpanel
Contributor

Summary

Adds an AI bot classification consumer wrapper that automatically detects AI crawler requests and enriches tracked events with classification properties.

What it does

  • Classifies user-agent strings against a database of 12 known AI bots
  • Enriches events with $is_ai_bot, $ai_bot_name, $ai_bot_provider, and $ai_bot_category properties
  • Supports custom bot patterns that take priority over built-in patterns
  • Case-insensitive matching
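
The behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names (`make_bot`, `classify_user_agent`), the two-entry bot table, and the category values are all assumptions made for the example.

```python
import re

def make_bot(pattern, name, provider, category):
    """Build one bot entry; the pattern is compiled case-insensitively."""
    return {
        "pattern": re.compile(pattern, re.IGNORECASE),
        "name": name,
        "provider": provider,
        "category": category,
    }

# Illustrative subset only. The category strings are placeholders,
# not the values used by the PR's bot database.
BUILTIN_BOTS = [
    make_bot(r"GPTBot", "GPTBot", "OpenAI", "placeholder-category"),
    make_bot(r"ClaudeBot", "ClaudeBot", "Anthropic", "placeholder-category"),
]

def classify_user_agent(user_agent, custom_bots=None):
    """Return classification properties for a user-agent string.

    Custom entries are checked before the built-in ones, so a custom
    pattern wins when both would match. Empty or None input is treated
    as "not an AI bot".
    """
    if not user_agent:
        return {"$is_ai_bot": False}
    for bot in list(custom_bots or []) + BUILTIN_BOTS:
        if bot["pattern"].search(user_agent):
            return {
                "$is_ai_bot": True,
                "$ai_bot_name": bot["name"],
                "$ai_bot_provider": bot["provider"],
                "$ai_bot_category": bot["category"],
            }
    return {"$is_ai_bot": False}
```

For instance, `classify_user_agent("Mozilla/5.0 (compatible; gptbot/1.1)")` matches the built-in GPTBot entry despite the lowercase spelling, while passing a custom entry whose pattern also matches would return the custom entry's name instead.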

AI Bots Detected

GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, Google-Extended, PerplexityBot, Bytespider, CCBot, Applebot-Extended, Meta-ExternalAgent, cohere-ai

Files Added

  • mixpanel/ai_bot_classifier.py
  • mixpanel/ai_bot_consumer.py
  • mixpanel/ai_bot_helpers.py
  • test_ai_bot_classifier.py
  • test_ai_bot_consumer.py

Files Modified

  • mixpanel/__init__.py

Test Plan

  • All 12 AI bot user-agents correctly classified
  • Non-AI-bot user-agents return $is_ai_bot: false (Chrome, Googlebot, curl, etc.)
  • Empty string and None inputs handled gracefully
  • Case-insensitive matching works
  • Custom bot patterns checked before built-in
  • Event properties preserved through enrichment
  • No regressions in existing test suite

Part of AI bot classification feature for Python SDK.
@codecov

codecov bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 96.42857% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.66%. Comparing base (b0fc5e5) to head (0b6e320).

Files with missing lines       | Patch % | Lines
test_ai_bot_consumer.py        | 94.40%  | 4 Missing and 4 partials ⚠️
mixpanel/ai_bot_classifier.py  | 81.81%  | 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #155      +/-   ##
==========================================
+ Coverage   94.28%   94.66%   +0.38%     
==========================================
  Files           9       13       +4     
  Lines        1557     1893     +336     
  Branches      101      116      +15     
==========================================
+ Hits         1468     1792     +324     
- Misses         54       60       +6     
- Partials       35       41       +6     
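
The percentages in this report can be reproduced from the hit and line counts in the diff above. The snippet below is only an arithmetic check; it assumes the patch consists of the 336 added lines, which is consistent with 12 uncovered lines giving 96.42857%. The helper name `coverage_pct` is made up for the example.

```python
def coverage_pct(hits, lines):
    """Coverage as a percentage of covered lines."""
    return 100.0 * hits / lines

base = coverage_pct(1468, 1557)       # master: about 94.28
head = coverage_pct(1792, 1893)       # this PR: about 94.66
patch = coverage_pct(336 - 12, 336)   # assumed patch size: about 96.43

# Missing plus partial lines per file add up to the 12 flagged lines.
uncovered = (4 + 4) + (2 + 2)

print(f"base={base:.2f}% head={head:.2f}% delta={head - base:+.2f}%")
print(f"patch={patch:.5f}% uncovered={uncovered}")
```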

Contributor

Copilot AI left a comment

Pull request overview

This PR adds AI bot classification functionality to the Mixpanel Python SDK, enabling automatic detection and enrichment of events from AI crawler requests. The implementation follows the SDK's established patterns and provides both a core classification engine and a consumer wrapper that seamlessly integrates with the existing tracking infrastructure.

Changes:

  • Adds AI bot detection for 12 known AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) with extensible custom bot pattern support
  • Implements BotClassifyingConsumer wrapper that enriches events with $is_ai_bot, $ai_bot_name, $ai_bot_provider, and $ai_bot_category properties
  • Provides framework-specific helper functions for Django, Flask, and FastAPI to simplify user-agent extraction
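
A consumer wrapper of the kind described above might look roughly like the sketch below. It is not the PR's BotClassifyingConsumer: the class name `EnrichingConsumerSketch`, the toy `classify` function, the `RecordingConsumer` stand-in, and the `$user_agent` property key are all assumptions for illustration. It only assumes that consumers expose a `send(endpoint, json_message, ...)` method and that event messages are JSON objects with a `properties` dictionary.

```python
import json

class RecordingConsumer:
    """Stand-in for a real consumer; it just stores what it is sent."""
    def __init__(self):
        self.sent = []

    def send(self, endpoint, json_message, api_key=None, api_secret=None):
        self.sent.append((endpoint, json_message))

def classify(user_agent):
    """Toy classifier (hypothetical): flags user-agents containing 'gptbot'."""
    if user_agent and "gptbot" in user_agent.lower():
        return {"$is_ai_bot": True, "$ai_bot_name": "GPTBot"}
    return {"$is_ai_bot": False}

class EnrichingConsumerSketch:
    """Wraps another consumer and enriches messages sent to 'events'."""
    def __init__(self, wrapped, user_agent_key="$user_agent"):
        self._wrapped = wrapped
        self._ua_key = user_agent_key  # assumed property name

    def send(self, endpoint, json_message, api_key=None, api_secret=None):
        if endpoint == "events":
            event = json.loads(json_message)
            props = event.setdefault("properties", {})
            user_agent = props.get(self._ua_key)
            if user_agent is not None:
                # Only adds or overwrites the classification keys;
                # every other property is passed through unchanged.
                props.update(classify(user_agent))
            json_message = json.dumps(event)
        # Other endpoints (for example 'people') pass through untouched.
        self._wrapped.send(endpoint, json_message, api_key, api_secret)

    def flush(self):
        # Forward flush() when the wrapped consumer is a buffered one.
        inner_flush = getattr(self._wrapped, "flush", None)
        if callable(inner_flush):
            inner_flush()
```

Keeping the enrichment inside `send` means the wrapper can sit in front of either an immediate or a buffered consumer without the tracking code changing.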

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

File | Description
mixpanel/ai_bot_classifier.py | Core classification logic with bot database and pattern matching using compiled regex patterns
mixpanel/ai_bot_consumer.py | Consumer wrapper that intercepts events endpoint and enriches with bot classification
mixpanel/ai_bot_helpers.py | Framework integration helpers for extracting user-agent from Django, Flask, and FastAPI requests
mixpanel/__init__.py | Exports new BotClassifyingConsumer and classification functions
test_ai_bot_classifier.py | Comprehensive tests for classification logic covering all 12 bots plus edge cases
test_ai_bot_consumer.py | Tests for consumer wrapper including property preservation, endpoint filtering, and BufferedConsumer compatibility
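
For the framework integration helpers listed in the table, user-agent extraction could be sketched as below. The function names are hypothetical and not taken from mixpanel/ai_bot_helpers.py; the attribute lookups themselves are the standard ones for each framework's request object (`request.META` in Django, `request.headers` in Flask and FastAPI).

```python
from types import SimpleNamespace

def user_agent_from_django(request):
    """Django exposes HTTP headers through request.META."""
    return request.META.get("HTTP_USER_AGENT")

def user_agent_from_flask(request):
    """Flask request headers are looked up case-insensitively."""
    return request.headers.get("User-Agent")

def user_agent_from_fastapi(request):
    """FastAPI (Starlette) request headers are also case-insensitive."""
    return request.headers.get("user-agent")

# Stand-in request objects so the sketch runs without any framework installed.
fake_django = SimpleNamespace(META={"HTTP_USER_AGENT": "ClaudeBot/1.0"})
fake_flask = SimpleNamespace(headers={"User-Agent": "GPTBot/1.1"})
fake_fastapi = SimpleNamespace(headers={"user-agent": "PerplexityBot/1.0"})

print(user_agent_from_django(fake_django))
print(user_agent_from_flask(fake_flask))
print(user_agent_from_fastapi(fake_fastapi))
```

Each helper returns None when the header is absent, so the result can be handed straight to a classifier that already treats None as "not an AI bot".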

Address PR review: add $ai_bot_category assertions for
Bytespider, CCBot, Applebot-Extended, Meta-ExternalAgent.