feat(connectors): implement Redshift Sink Connector with S3 staging (#2540) #2557
GaneshPatil7517 wants to merge 17 commits into apache:master
Implements Issue apache#2540: Redshift Sink Connector with S3 staging support.

Features:
- S3 staging with automatic CSV file upload
- Redshift COPY command execution via PostgreSQL wire protocol
- IAM role authentication (recommended) or access key credentials
- Configurable batch size and compression (gzip, lzop, bzip2, zstd)
- Automatic table creation with customizable schema
- Retry logic with exponential backoff for transient failures
- Automatic cleanup of staged S3 files

Configuration options:
- connection_string: Redshift cluster connection URL
- target_table: Destination table name
- iam_role: IAM role ARN for S3 access (recommended)
- s3_bucket/s3_region/s3_prefix: S3 staging location
- batch_size: Messages per batch (default: 10000)
- compression: COPY compression format
- delete_staged_files: Auto-cleanup toggle (default: true)
- auto_create_table: Create table if missing (default: true)

Closes apache#2540
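The staging flow above ends in a single COPY statement sent to Redshift over the PostgreSQL wire protocol. The following is a minimal sketch of how such a statement could be assembled, not the connector's actual code: `Credentials`, `Compression`, `quote_literal`, `quote_ident`, and `build_copy_sql` are hypothetical names chosen for illustration. It assumes an unqualified table name, and zstd is left out because a later commit in this PR drops it.

```rust
/// How COPY authenticates to S3 (hypothetical type, for illustration only).
enum Credentials<'a> {
    /// IAM role ARN attached to the cluster (the recommended option).
    IamRole(&'a str),
    /// Static access key pair.
    AccessKey { id: &'a str, secret: &'a str },
}

/// Compression of the staged CSV file; `None` means plain CSV.
#[derive(Clone, Copy)]
enum Compression {
    None,
    Gzip,
    Lzop,
    Bzip2,
}

/// Wrap a value in single quotes, doubling any embedded single quote.
fn quote_literal(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}

/// Wrap an identifier in double quotes, doubling any embedded double quote.
/// Assumes an unqualified name (no `schema.table` splitting is done).
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Build a Redshift COPY statement for one staged CSV object.
fn build_copy_sql(
    table: &str,
    bucket: &str,
    key: &str,
    creds: &Credentials,
    compression: Compression,
) -> String {
    let source = format!("s3://{bucket}/{key}");
    let mut sql = format!("COPY {} FROM {}", quote_ident(table), quote_literal(&source));

    // Authorization clause: either an IAM role or an access key pair.
    match creds {
        Credentials::IamRole(arn) => {
            sql.push_str(&format!(" IAM_ROLE {}", quote_literal(arn)));
        }
        Credentials::AccessKey { id, secret } => {
            sql.push_str(&format!(
                " ACCESS_KEY_ID {} SECRET_ACCESS_KEY {}",
                quote_literal(id),
                quote_literal(secret)
            ));
        }
    }

    sql.push_str(" FORMAT AS CSV");

    // Optional compression keyword understood by COPY.
    let keyword = match compression {
        Compression::None => "",
        Compression::Gzip => " GZIP",
        Compression::Lzop => " LZOP",
        Compression::Bzip2 => " BZIP2",
    };
    sql.push_str(keyword);
    sql
}
```

Modelling the credentials as an enum makes the "IAM role or access keys" rule a compile-time choice, so the builder cannot emit both clauses or neither.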
- Fix markdown lint issues in README.md (table formatting, blank lines, code fence language)
- Fix trailing newline in Cargo.toml
- Apply TOML formatting via taplo
- Add missing dependencies to DEPENDENCIES.md (rust-s3, rxml, rxml_validation, static_assertions)
Please write a true end-to-end integration test, similar to the Postgres one.
OK, I'll update...
- Add Redshift sink integration test using PostgreSQL (Redshift-compatible) and LocalStack for S3
- Add s3_endpoint config option to support custom endpoints (LocalStack, MinIO)
- Add path-style S3 access for custom endpoints
- Add localstack feature to testcontainers-modules
- Create test configuration files for Redshift connector
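Path-style access matters for custom endpoints because LocalStack and MinIO are usually reached at a single host, so the bucket has to travel in the URL path rather than in the hostname. A minimal sketch of that difference follows; `object_url` is a hypothetical helper, not part of the connector or of rust-s3, and it does not percent-encode keys.

```rust
/// Build the URL of a staged object.
///
/// With a custom endpoint (e.g. LocalStack or MinIO) the bucket goes in the
/// path ("path-style"). Without one, the bucket becomes part of the AWS
/// hostname ("virtual-hosted-style").
/// Note: keys are used verbatim; no percent-encoding is applied in this sketch.
fn object_url(endpoint: Option<&str>, region: &str, bucket: &str, key: &str) -> String {
    match endpoint {
        // Path-style: <endpoint>/<bucket>/<key>
        Some(ep) => format!("{}/{}/{}", ep.trim_end_matches('/'), bucket, key),
        // Virtual-hosted-style: https://<bucket>.s3.<region>.amazonaws.com/<key>
        None => format!("https://{bucket}.s3.{region}.amazonaws.com/{key}"),
    }
}
```

For example, with an assumed local endpoint of `http://localhost:4566`, bucket `staging`, and key `batch.csv`, the helper yields `http://localhost:4566/staging/batch.csv`.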
- Add s3_endpoint: None to test_config() in lib.rs (fixes E0063)
- Add endpoint parameter to S3Uploader tests in s3.rs
- Fix formatting for long line in init_s3_uploader()
- Add iggy_connector_redshift_sink to DEPENDENCIES.md
- Add maybe-async, md5, minidom to DEPENDENCIES.md
Critical fixes:
- Change Rust edition from 2024 to 2021 in Cargo.toml
- Fix S3 cleanup to happen regardless of COPY result (prevents orphaned files)

Moderate fixes:
- Remove zstd from valid compression options (not supported by Redshift)
- Update README to remove zstd from compression list
- Handle bucket creation error in integration tests with expect()
- Log JSON serialization errors instead of silent unwrap_or_default()

Performance:
- Cache escaped quote string to avoid repeated format! allocations

Windows compatibility (for local testing):
- Add #[cfg(unix)] conditionals for Unix-specific code in sender/mod.rs
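The cleanup fix comes down to not returning early when COPY fails. A minimal sketch of that pattern, assuming the COPY and delete steps are passed in as closures: `copy_then_cleanup` is a hypothetical helper, and the real connector code may be structured differently.

```rust
/// Run COPY, then always attempt to delete the staged file, and finally
/// report the COPY outcome. A failed delete is logged but does not mask
/// the COPY result.
fn copy_then_cleanup<C, D>(copy: C, delete_staged: D) -> Result<(), String>
where
    C: FnOnce() -> Result<(), String>,
    D: FnOnce() -> Result<(), String>,
{
    // Keep the result instead of using `?`, which would skip the cleanup below.
    let copy_result = copy();

    // Cleanup runs on both the success and the failure path.
    if let Err(err) = delete_staged() {
        eprintln!("failed to delete staged file: {err}");
    }

    copy_result
}
```

Holding the COPY result in a variable, rather than propagating it with `?`, is what prevents orphaned files when the load fails.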
I'd greatly appreciate actual human interaction in this PR; otherwise, I'm closing it.
Fixes clippy warning about unused 'runtime' field in test setup struct. The runtime field is kept for future test expansion.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@GaneshPatil7517 do you plan to continue?
Yes, but I need some time...
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers - either @core on Discord or by mentioning them directly here on the PR. Thank you for your contribution!
- Changed CONFIG_ to PLUGIN_CONFIG_ for plugin configuration fields
- Changed TOPICS_0 to TOPICS with proper JSON array format
- Added CONSUMER_GROUP environment variable
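To illustrate the shape of these variables, here is a minimal sketch that builds the key/value pairs a test setup might export. Everything beyond the `PLUGIN_CONFIG_`, `TOPICS`, and `CONSUMER_GROUP` name fragments is an assumption: the `prefix` argument, the `CONNECTION_STRING` suffix, and both helper functions are hypothetical, and the real variable names used by the connector runtime may differ.

```rust
/// Render topic names as a JSON array of strings, e.g. ["a","b"].
/// Only `"` and `\` are escaped; control characters are assumed absent.
fn json_string_array(items: &[&str]) -> String {
    let quoted: Vec<String> = items
        .iter()
        .map(|item| {
            let escaped = item.replace('\\', "\\\\").replace('"', "\\\"");
            format!("\"{escaped}\"")
        })
        .collect();
    format!("[{}]", quoted.join(","))
}

/// Build environment variable pairs for a sink under a hypothetical prefix.
fn sink_env(
    prefix: &str,
    connection_string: &str,
    topics: &[&str],
    consumer_group: &str,
) -> Vec<(String, String)> {
    vec![
        // Plugin fields use the PLUGIN_CONFIG_ infix instead of CONFIG_.
        (
            format!("{prefix}PLUGIN_CONFIG_CONNECTION_STRING"),
            connection_string.to_string(),
        ),
        // One TOPICS variable holding a JSON array replaces the indexed TOPICS_0.
        (format!("{prefix}TOPICS"), json_string_array(topics)),
        (format!("{prefix}CONSUMER_GROUP"), consumer_group.to_string()),
    ]
}
```

With an illustrative prefix of `EXAMPLE_SINK_` and a single topic `orders`, the second pair would be `EXAMPLE_SINK_TOPICS` with the value `["orders"]`.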
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers - either @core on Discord or by mentioning them directly here on the PR. Thank you for your contribution!
Summary
Implements Issue #2540 - Redshift Sink Connector with S3 staging support.
Features
Configuration Options
- connection_string
- target_table
- iam_role
- s3_bucket
- s3_region
- batch_size
- compression
- delete_staged_files
- auto_create_table
*Either iam_role or aws_access_key_id + aws_secret_access_key required
Testing
-D warnings
Closes #2540