Switch to GCS default, remove Firestore, add Prometheus metrics #27
Merged
- Add comprehensive e2e test script (scripts/e2e_test.sh)
  - Tests full pipeline: API → Queue → RingBuffer → EventLoader → GCS
  - Uses GCS emulator via Docker
  - Verifies Parquet files in Hive-style partitions
  - All checks passing ✓
- Fix test warnings (0 warnings now)
  - Replace AsyncMock with simpler async mock in ring buffer tests
  - Change asyncio.get_event_loop() to asyncio.new_event_loop() for Python 3.12
- Clean up docker-compose.yml
  - Remove Firestore emulator (no longer used)

All 232 unit tests passing with 0 warnings. E2E test verifies full event pipeline end-to-end.
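The two test-warning fixes above can be sketched as follows. This is a minimal illustration, not the repository's test code: `FakePublisher` and `run_in_fresh_loop` are hypothetical names. It shows a hand-written async stub standing in for `AsyncMock`, and a loop created explicitly with `asyncio.new_event_loop()`, which avoids the deprecation warning that `asyncio.get_event_loop()` emits on Python 3.12 when no current loop is set.

```python
import asyncio


class FakePublisher:
    """Hand-written async stub (hypothetical name) used in place of AsyncMock."""

    def __init__(self) -> None:
        self.published: list[dict] = []

    async def publish(self, event: dict) -> bool:
        # Record the call so the test can assert on it afterwards.
        self.published.append(event)
        return True


def run_in_fresh_loop(coro):
    """Run a coroutine on an explicitly created loop and always close it."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


publisher = FakePublisher()
result = run_in_fresh_loop(publisher.publish({"type": "track"}))
```

Closing the loop in `finally` matters in test suites, since an unclosed loop is itself a source of `ResourceWarning`.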
- Add prometheus-client dependency
- Create core metrics infrastructure (src/eventkit/metrics.py)
  - Custom registry
  - Version info metric
  - Uptime gauge (auto-updated every 10s)
  - Component health gauges
  - Metrics HTTP server on port 9090
- Add metrics modules for all components:
  - API metrics (requests, duration)
  - Event processing metrics (received, processed, failed)
  - Storage metrics (files, bytes, operations, duration)
  - Queue metrics (enqueued, dequeued, depth)
  - Ring buffer metrics (written, published, size)
  - Warehouse loader metrics (discovered, loaded, pending)
- Instrument key components:
  - API middleware tracks all HTTP requests
  - Processor tracks event flow (received → processed → failed)
  - GCSEventStore tracks storage operations
- Config: EVENTKIT_METRICS_ENABLED and EVENTKIT_METRICS_PORT
- Tests: 238 passing (6 new metrics tests)
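The core infrastructure listed above might look roughly like the sketch below. It uses `prometheus_client` calls that exist in the library, but the metric names, label names, and version string are assumptions, not the contents of src/eventkit/metrics.py. One deliberate difference: the uptime gauge here is computed at scrape time with `set_function`, whereas the commit describes a periodic 10s update.

```python
import time

from prometheus_client import CollectorRegistry, Gauge, Info, generate_latest

# A custom registry keeps these metrics separate from the library's global default.
registry = CollectorRegistry()

# Version info metric; the version string is a placeholder.
version_info = Info("eventkit_version", "eventkit build information", registry=registry)
version_info.info({"version": "0.0.0"})

# Uptime gauge, evaluated each time the registry is collected.
_started = time.monotonic()
uptime = Gauge("eventkit_uptime_seconds", "Seconds since process start", registry=registry)
uptime.set_function(lambda: time.monotonic() - _started)

# Component health gauge: 1 = healthy, 0 = unhealthy (label name is an assumption).
component_health = Gauge(
    "eventkit_component_health",
    "Health of a pipeline component (1 healthy, 0 unhealthy)",
    ["component"],
    registry=registry,
)
component_health.labels(component="storage").set(1)

# Text exposition format, as a scraper would receive it.
exposition = generate_latest(registry).decode("utf-8")
```

In a running service, `prometheus_client.start_http_server(9090, registry=registry)` would serve this registry on the separate metrics port.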
Design principles:
- Counter-focused (prefer counters over gauges)
- Low cardinality labels (no unbounded values)
- Naming: eventkit_{verb_noun}_{unit}_{suffix}
- Separate metrics server (port 9090) isolates monitoring traffic
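As an illustration of the counter-focused, low-cardinality principles above, here is a hedged sketch. The metric name `eventkit_api_requests_total` and the `status_class` label are assumptions chosen for the example; the PR does not list the exact names.

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()

# Hypothetical metric name; the _total suffix marks it as a counter.
api_requests = Counter(
    "eventkit_api_requests_total",
    "HTTP requests handled by the API",
    ["method", "status_class"],
    registry=registry,
)


def status_class(status_code: int) -> str:
    """Collapse a status code into a bounded label value such as '2xx'."""
    return f"{status_code // 100}xx"


def record_request(method: str, status_code: int) -> None:
    # Raw paths, user IDs, or event IDs are deliberately not used as labels:
    # each distinct label value creates a new time series.
    api_requests.labels(method=method, status_class=status_class(status_code)).inc()


record_request("POST", 200)
record_request("POST", 201)
record_request("POST", 500)
```

Because the value only ever increases, rates and error ratios can be derived at query time instead of being maintained in the application.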
Next: Instrument queue/ring buffer, add documentation
- Ring buffer metrics:
  - Track writes, publishes (success/failure), marked published
  - Track total size and unpublished count (gauges)
  - Auto-update size metrics on write/mark operations
- Ring buffer publisher:
  - Track successful/failed publish attempts
- Async queue metrics:
  - Track enqueue/dequeue operations
  - Track processing success/failure
  - Track queue depth per worker (gauge)

All 238 tests passing. Metrics are now wired throughout the pipeline: API → Queue → RingBuffer → Processor → Storage
Metrics server runs on separate port (9090) from API (8000).

E2E test now verifies:
- Prometheus format
- API request metrics
- Event processing metrics
- Storage metrics
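A check of this kind could be sketched in Python as below. The real verification lives in scripts/e2e_test.sh; the helper `metric_names`, the sample scrape body, and the metric names in it are all hypothetical. The parser handles only the simple `name{labels} value` and `name value` sample lines of the Prometheus text format.

```python
def metric_names(exposition: str) -> set[str]:
    """Return the sample names found in Prometheus text-format output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comment lines
        # A sample line is "name{labels} value" or "name value".
        sample = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(sample)
    return names


# Hypothetical scrape body; real metric names may differ.
sample_scrape = """\
# HELP eventkit_api_requests_total HTTP requests handled by the API
# TYPE eventkit_api_requests_total counter
eventkit_api_requests_total{method="POST",status_class="2xx"} 3.0
# HELP eventkit_events_processed_total Events processed
# TYPE eventkit_events_processed_total counter
eventkit_events_processed_total 3.0
"""

found = metric_names(sample_scrape)
expected = {"eventkit_api_requests_total", "eventkit_events_processed_total"}
missing = expected - found
```

In an end-to-end run, the scrape body would be fetched from the metrics server on port 9090 rather than embedded as a string.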
Document all Prometheus metrics:
- API layer (requests, latency)
- Event processing (received, processed, failed)
- Storage (bytes, files written)
- Queue/ring buffer (depth, enqueue, dequeue)
- Warehouse loader (files processed, errors)
- System (uptime, health, version)

Includes:
- Metrics server configuration
- Example Grafana queries
- Design principles
- All available metrics with labels
Track PubSub-specific operations:
- Ring buffer enqueue (before Pub/Sub publish)
- Pub/Sub publish success/failure
- Messages received from subscription
- Ack/nack with reasons (success, decode_error, processing_error, no_loop)
- Queue depth per worker

Labels match AsyncQueue for consistency:
- queue_mode="pubsub" or "pubsub_published"
- result="ack_success", "nack_processing_error", etc.

All queue modes now fully instrumented for production observability.
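The shared label scheme might be sketched like this. The label names `queue_mode` and `result` and the values `ack_success` and `nack_processing_error` come from the commit message; the metric name, the helper `record`, and the values `nack_decode_error` and `nack_no_loop` are assumptions extrapolated from the listed reasons.

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()

# Hypothetical metric name; one counter serves every queue mode.
queue_messages = Counter(
    "eventkit_queue_messages_total",
    "Messages handled by a queue, by mode and outcome",
    ["queue_mode", "result"],
    registry=registry,
)

# A fixed vocabulary of outcomes keeps the `result` label bounded.
RESULTS = frozenset({
    "ack_success",
    "nack_decode_error",      # assumed spelling
    "nack_processing_error",
    "nack_no_loop",           # assumed spelling
})


def record(queue_mode: str, result: str) -> None:
    """Increment the shared counter, rejecting labels outside the vocabulary."""
    if result not in RESULTS:
        raise ValueError(f"unknown result label: {result}")
    queue_messages.labels(queue_mode=queue_mode, result=result).inc()


record("pubsub", "ack_success")
record("pubsub", "ack_success")
record("pubsub", "nack_processing_error")
```

Keeping the same label names across queue implementations means one dashboard query can compare modes by filtering on `queue_mode`.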
The ring buffer integration test relied on Firestore, which was removed. Coverage for the ring buffer → queue → storage flow is now provided by:
- Unit tests for individual components
- E2E test script (scripts/e2e_test.sh) with GCS emulator
Summary
This PR completes the transition to GCS + BigQuery as the production-ready storage backend and adds comprehensive Prometheus metrics for observability.
Changes
1. Switch to GCS Default & Remove Firestore (Issue #25)
- Default EVENTKIT_EVENT_STORE to "gcs"
2. Prometheus Metrics (Bonus)
3. Testing & CI
Documentation
Migration Notes
- EVENTKIT_EVENT_STORE=firestore no longer supported
- EventStore
- EVENTKIT_METRICS_ENABLED (default: true), EVENTKIT_METRICS_PORT (default: 9090)

Related Issues
Fixes #25
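
For the two metrics settings named in the Migration Notes, a reader might load them along these lines. This is a sketch, not the project's configuration code: `load_metrics_config` is a hypothetical helper, and the set of accepted truthy spellings is an assumption. Only the variable names and defaults come from the PR.

```python
import os


def load_metrics_config(env=os.environ) -> tuple[bool, int]:
    """Read metrics settings, falling back to the defaults stated in the PR."""
    raw_enabled = env.get("EVENTKIT_METRICS_ENABLED", "true")
    # Accepted truthy spellings are an assumption of this sketch.
    enabled = raw_enabled.strip().lower() in {"1", "true", "yes", "on"}
    port = int(env.get("EVENTKIT_METRICS_PORT", "9090"))
    return enabled, port


defaults = load_metrics_config({})
overridden = load_metrics_config(
    {"EVENTKIT_METRICS_ENABLED": "false", "EVENTKIT_METRICS_PORT": "9100"}
)
```

Passing the environment as a mapping keeps the function easy to test without mutating `os.environ`.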