Add AI bot classification for event enrichment #155
Open
jaredmixpanel wants to merge 4 commits into master
Part of AI bot classification feature for Python SDK.
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #155 +/- ##
==========================================
+ Coverage   94.28%   94.66%   +0.38%
==========================================
  Files           9       13      +4
  Lines        1557     1893    +336
  Branches      101      116     +15
==========================================
+ Hits         1468     1792    +324
- Misses         54       60      +6
- Partials       35       41      +6
Pull request overview
This PR adds AI bot classification functionality to the Mixpanel Python SDK, enabling automatic detection and enrichment of events from AI crawler requests. The implementation follows the SDK's established patterns and provides both a core classification engine and a consumer wrapper that seamlessly integrates with the existing tracking infrastructure.
Changes:
- Adds AI bot detection for 12 known AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) with extensible custom bot pattern support
- Implements BotClassifyingConsumer wrapper that enriches events with $is_ai_bot, $ai_bot_name, $ai_bot_provider, and $ai_bot_category properties
- Provides framework-specific helper functions for Django, Flask, and FastAPI to simplify user-agent extraction
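As a rough illustration of the user-agent extraction mentioned in the last bullet, the sketch below reads the header the way each framework exposes it. The function names are hypothetical and are not taken from mixpanel/ai_bot_helpers.py; the stand-in request objects exist only so the snippet runs without any framework installed.

```python
from types import SimpleNamespace
from typing import Any, Optional


def user_agent_from_django(request: Any) -> Optional[str]:
    # Django exposes request headers through the WSGI-style META mapping.
    return request.META.get("HTTP_USER_AGENT")


def user_agent_from_flask(request: Any) -> Optional[str]:
    # Flask (Werkzeug) offers a headers mapping with case-insensitive lookup.
    return request.headers.get("User-Agent")


def user_agent_from_fastapi(request: Any) -> Optional[str]:
    # FastAPI (Starlette) also offers a case-insensitive headers mapping.
    return request.headers.get("user-agent")


# Stand-in request objects (assumption: plain dicts replace the real
# framework header types, so the keys here must match exactly).
django_like = SimpleNamespace(META={"HTTP_USER_AGENT": "GPTBot/1.0"})
flask_like = SimpleNamespace(headers={"User-Agent": "ClaudeBot/1.0"})
fastapi_like = SimpleNamespace(headers={"user-agent": "PerplexityBot/1.0"})
no_header = SimpleNamespace(META={})

django_ua = user_agent_from_django(django_like)
flask_ua = user_agent_from_flask(flask_like)
fastapi_ua = user_agent_from_fastapi(fastapi_like)
missing_ua = user_agent_from_django(no_header)
```

Returning None when the header is absent leaves the decision about unclassifiable requests to the caller.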
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| mixpanel/ai_bot_classifier.py | Core classification logic with bot database and pattern matching using compiled regex patterns |
| mixpanel/ai_bot_consumer.py | Consumer wrapper that intercepts events endpoint and enriches with bot classification |
| mixpanel/ai_bot_helpers.py | Framework integration helpers for extracting user-agent from Django, Flask, and FastAPI requests |
| mixpanel/__init__.py | Exports new BotClassifyingConsumer and classification functions |
| test_ai_bot_classifier.py | Comprehensive tests for classification logic covering all 12 bots plus edge cases |
| test_ai_bot_consumer.py | Tests for consumer wrapper including property preservation, endpoint filtering, and BufferedConsumer compatibility |
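The consumer wrapper described in the table can be pictured with the minimal sketch below. It is not the code in mixpanel/ai_bot_consumer.py: the class name, the `$user_agent` property used to find the user agent, and the toy classifier are all assumptions. The `send(endpoint, json_message, api_key, api_secret)` shape is meant to mirror the SDK's consumer interface, and a recording stand-in replaces the real consumer so the snippet is self-contained.

```python
import json


class RecordingConsumer:
    """Stand-in for a real consumer; it only records what it is asked to send."""

    def __init__(self):
        self.sent = []

    def send(self, endpoint, json_message, api_key=None, api_secret=None):
        self.sent.append((endpoint, json_message))


class BotClassifyingConsumerSketch:
    """Hypothetical wrapper: enriches 'events' messages, passes others through."""

    def __init__(self, wrapped, classify, user_agent_property="$user_agent"):
        self._wrapped = wrapped
        self._classify = classify
        self._ua_prop = user_agent_property  # assumed property name

    def send(self, endpoint, json_message, api_key=None, api_secret=None):
        if endpoint == "events":
            event = json.loads(json_message)
            properties = event.setdefault("properties", {})
            user_agent = properties.get(self._ua_prop)
            if user_agent:
                # Merge classification results without dropping existing properties.
                properties.update(self._classify(user_agent))
                json_message = json.dumps(event, separators=(",", ":"))
        self._wrapped.send(endpoint, json_message, api_key, api_secret)

    def flush(self):
        # Forward flush() when the wrapped consumer buffers messages.
        inner_flush = getattr(self._wrapped, "flush", None)
        if inner_flush is not None:
            inner_flush()


def toy_classify(user_agent):
    # Placeholder classifier for the sketch only.
    return {"$is_ai_bot": "GPTBot" in user_agent}


inner = RecordingConsumer()
consumer = BotClassifyingConsumerSketch(inner, toy_classify)
consumer.send("events", json.dumps(
    {"event": "page_view", "properties": {"$user_agent": "GPTBot/1.0", "plan": "free"}}
))
consumer.send("people", '{"$set":{"name":"a"}}')
```

Because only the `events` endpoint is inspected, profile updates and other message types reach the wrapped consumer unchanged.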
Address PR review: add $ai_bot_category assertions for Bytespider, CCBot, Applebot-Extended, Meta-ExternalAgent.
Summary
Adds AI bot classification consumer wrapper that automatically detects AI crawler requests and enriches tracked events with classification properties.
What it does
- Enriches tracked events with $is_ai_bot, $ai_bot_name, $ai_bot_provider, and $ai_bot_category properties
AI Bots Detected
GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, Google-Extended, PerplexityBot, Bytespider, CCBot, Applebot-Extended, Meta-ExternalAgent, cohere-ai
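A minimal sketch of regex-based classification for a few of the bots listed above follows. The table layout, the function name `classify_user_agent_sketch`, and the category value `"ai_crawler"` are assumptions made for illustration; the real bot database in mixpanel/ai_bot_classifier.py may be structured differently and covers all 12 bots.

```python
import re
from typing import Any, Dict, Optional

# Illustrative subset only. The category value is a placeholder, not the PR's.
_BOT_TABLE = [
    # (name, provider, category, compiled pattern)
    ("GPTBot", "OpenAI", "ai_crawler", re.compile(r"GPTBot", re.IGNORECASE)),
    ("ClaudeBot", "Anthropic", "ai_crawler", re.compile(r"ClaudeBot", re.IGNORECASE)),
    ("PerplexityBot", "Perplexity", "ai_crawler", re.compile(r"PerplexityBot", re.IGNORECASE)),
]


def classify_user_agent_sketch(user_agent: Optional[str]) -> Dict[str, Any]:
    """Return enrichment properties for a user-agent string."""
    if user_agent:
        for name, provider, category, pattern in _BOT_TABLE:
            if pattern.search(user_agent):
                return {
                    "$is_ai_bot": True,
                    "$ai_bot_name": name,
                    "$ai_bot_provider": provider,
                    "$ai_bot_category": category,
                }
    # Browsers, non-AI crawlers, CLI tools, and missing headers all land here.
    return {"$is_ai_bot": False}


gpt = classify_user_agent_sketch(
    "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"
)
curl = classify_user_agent_sketch("curl/8.4.0")
googlebot = classify_user_agent_sketch(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
```

Compiling the patterns once at import time keeps the per-event cost to a handful of substring-style searches.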
Files Added
- mixpanel/ai_bot_classifier.py
- mixpanel/ai_bot_consumer.py
- mixpanel/ai_bot_helpers.py
- test_ai_bot_classifier.py
- test_ai_bot_consumer.py
Files Modified
- mixpanel/__init__.py
Test Plan
- $is_ai_bot: false (Chrome, Googlebot, curl, etc.)