Contributor

sidmohan0 commented Feb 1, 2026

Summary

  • New datafog/telemetry.py: Anonymous, opt-out usage telemetry via PostHog's /capture/ API using only stdlib (urllib.request) — zero new dependencies
  • Instrumented all public API surfaces: detect(), process(), DataFog class, TextService, core functions, and all 5 CLI commands
  • Wired track_error() into exception handlers across core.py, main.py, and client.py
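  • Fixed services/__init__.py: ImageService and SparkService imports are now wrapped in try/except, so TextService works on minimal installs without aiohttp, PIL, or pyspark
  • Fixed pre-existing bug: detect() in __init__.py referenced RegexAnnotator without first calling its lazy import, which raised a NameError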
  • Added README telemetry disclosure section
  • 44 new tests covering opt-out, privacy, non-blocking behavior, payload correctness, integration, and edge cases
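
As a rough illustration of the first bullet, here is a minimal sketch of opt-out, fire-and-forget event capture using only the standard library. It is not the code in datafog/telemetry.py: the host URL, the `API_KEY` placeholder, the function names `telemetry_enabled` and `track_event`, and the injectable `sender` parameter are all assumptions made for this example. Only the environment variable names and the general PostHog payload shape (`api_key`, `event`, `distinct_id`, `properties`) come from the PR text and PostHog's public capture format.

```python
import json
import os
import threading
import urllib.request

# Assumed values for illustration; the real host and project key are not shown in the PR.
POSTHOG_URL = "https://us.i.posthog.com/capture/"
API_KEY = "phc_placeholder"

OPT_OUT_VARS = ("DATAFOG_NO_TELEMETRY", "DO_NOT_TRACK")


def telemetry_enabled():
    """Return False when either opt-out variable is set to "1"."""
    return not any(os.environ.get(name) == "1" for name in OPT_OUT_VARS)


def _post(payload):
    """Send one event; swallow every error so telemetry never breaks the caller."""
    try:
        request = urllib.request.Request(
            POSTHOG_URL,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=2):
            pass
    except Exception:
        pass


def track_event(event, properties=None, distinct_id="anonymous", sender=_post):
    """Queue an event on a daemon thread and return the thread (None if opted out)."""
    if not telemetry_enabled():
        return None
    payload = {
        "api_key": API_KEY,
        "event": event,
        "distinct_id": distinct_id,
        "properties": dict(properties or {}),
    }
    # A daemon thread does not keep the interpreter alive at exit.
    thread = threading.Thread(target=sender, args=(payload,), daemon=True)
    thread.start()
    return thread
```

The `sender` hook exists only so the sketch can be exercised without network access, for example `track_event("detect_called", {"engine": "regex"}, sender=print)`.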

Test plan

  • 44 telemetry tests pass
  • Existing tests pass with zero regressions
  • DATAFOG_NO_TELEMETRY=1 disables all telemetry
  • Dependency audit confirms no disruption to conditional install pattern
  • Verify events appear in PostHog dashboard
  • Verify opt-out prevents events in PostHog
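
The conditional install pattern mentioned in the test plan relies on optional dependencies failing softly at import time. A minimal sketch of that idea follows; the helper name `optional_import` is hypothetical and is not taken from the PR, which wraps the imports in try/except directly inside services/__init__.py.

```python
import importlib


def optional_import(module_name, attr):
    """Return `attr` from `module_name`, or None if the module cannot be imported.

    Hypothetical helper for illustration only.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError too, e.g. when an optional extra is not installed.
        return None
    return getattr(module, attr, None)
```

A package could then expose something like `ImageService = optional_import("datafog.services.image_service", "ImageService")` and raise a helpful error only when the missing feature is actually used. The module path in that line is an assumption.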

sidmohan0 force-pushed the feature/posthog-telemetry branch 3 times, most recently from 0ac5c44 to 64f2736 on February 1, 2026 22:35
Add lightweight, privacy-preserving usage telemetry to understand which
engines, functions, and features are actually used. Zero new dependencies
(stdlib urllib.request only). Fire-and-forget daemon threads ensure zero
latency impact.

- Create datafog/telemetry.py with PostHog /capture/ integration
- Instrument detect, process, detect_pii, anonymize_text, scan_text,
  get_supported_entities, DataFog class, TextService, and CLI commands
- Wire track_error() into exception handlers for error visibility
- Opt-out via DATAFOG_NO_TELEMETRY=1 or DO_NOT_TRACK=1
- Anonymous ID via SHA-256 of machine info (no PII)
- Text lengths bucketed, error messages never sent
- Thread-local dedup prevents double-counting nested calls
- Fix services/__init__.py to lazy-import ImageService and SparkService,
  so TextService works on minimal installs without aiohttp/PIL/pyspark
- Fix pre-existing NameError in __init__.py detect() for RegexAnnotator
- 44 tests covering opt-out, privacy, non-blocking, payloads, integration,
  error tracking, and edge cases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
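
The privacy and dedup bullets in the commit message above can be sketched roughly as follows. This is an assumption-laden illustration, not the merged implementation: the machine fields that are hashed, the bucket boundaries, the `tracked` decorator, and the in-memory `recorded` list (a stand-in for sending an event) are all hypothetical.

```python
import functools
import hashlib
import platform
import threading


def anonymous_id():
    """SHA-256 hex digest of coarse machine info (assumed fields)."""
    raw = "|".join((platform.node(), platform.system(), platform.machine()))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def length_bucket(n):
    """Map an exact text length to a coarse bucket label (assumed boundaries)."""
    for limit, label in ((100, "<100"), (1000, "100-999"), (10000, "1000-9999")):
        if n < limit:
            return label
    return "10000+"


recorded = []  # stand-in for real event delivery
_state = threading.local()  # one "active" flag per thread


def tracked(name):
    """Record `name` once per outermost call on the current thread."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if getattr(_state, "active", False):
                # Nested inside another tracked call: run without recording.
                return func(*args, **kwargs)
            _state.active = True
            try:
                recorded.append(name)
                return func(*args, **kwargs)
            finally:
                _state.active = False
        return wrapper
    return decorator
```

With this shape, a tracked function that calls another tracked function produces one event for the outer call only, while calling the inner function directly still produces its own event.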
sidmohan0 force-pushed the feature/posthog-telemetry branch from 64f2736 to 68bc2e1 on February 1, 2026 22:50
sidmohan0 merged commit 63e6403 into dev on Feb 1, 2026
19 checks passed
sidmohan0 deleted the feature/posthog-telemetry branch on February 1, 2026 22:58