feat(connectors): implement Redshift Sink Connector with S3 staging (#2540) #2557

Open
GaneshPatil7517 wants to merge 17 commits into apache:master from GaneshPatil7517:feature/redshift-sink-connector

Conversation

@GaneshPatil7517

Summary

Implements Issue #2540 - Redshift Sink Connector with S3 staging support.

Features

  • ✅ S3 staging with automatic CSV file upload
  • ✅ Redshift COPY command execution via PostgreSQL wire protocol
  • ✅ IAM role authentication (recommended) or access key credentials
  • ✅ Configurable batch size and compression (gzip, lzop, bzip2, zstd)
  • ✅ Automatic table creation with customizable schema
  • ✅ Retry logic with exponential backoff for transient failures
  • ✅ Automatic cleanup of staged S3 files
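As a rough illustration of the COPY step listed above, the sketch below assembles a Redshift COPY statement for one staged CSV object. The names `CopyAuth`, `quote_literal`, `quote_ident`, and `build_copy_statement` are hypothetical and are not taken from this PR. Only gzip, lzop, and bzip2 are accepted here because a later commit in this PR removes zstd from the valid options.

```rust
/// How COPY authenticates to S3: an IAM role or a static access key pair.
/// (Hypothetical type, not the PR's actual code.)
enum CopyAuth<'a> {
    IamRole(&'a str),
    AccessKey { id: &'a str, secret: &'a str },
}

/// Wraps a value in single quotes, doubling embedded single quotes.
fn quote_literal(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}

/// Wraps an identifier in double quotes, doubling embedded double quotes.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Builds a COPY statement that loads one staged CSV object into `table`.
fn build_copy_statement(
    table: &str,
    s3_uri: &str,
    auth: &CopyAuth<'_>,
    region: &str,
    compression: Option<&str>,
) -> Result<String, String> {
    let auth_clause = match auth {
        CopyAuth::IamRole(arn) => format!("IAM_ROLE {}", quote_literal(arn)),
        CopyAuth::AccessKey { id, secret } => format!(
            "ACCESS_KEY_ID {} SECRET_ACCESS_KEY {}",
            quote_literal(id),
            quote_literal(secret)
        ),
    };
    // Map the configured compression name to the COPY keyword.
    let compression_clause = match compression {
        None => "",
        Some("gzip") => " GZIP",
        Some("lzop") => " LZOP",
        Some("bzip2") => " BZIP2",
        Some(other) => return Err(format!("unsupported compression: {other}")),
    };
    Ok(format!(
        "COPY {} FROM {} {} FORMAT AS CSV REGION {}{}",
        quote_ident(table),
        quote_literal(s3_uri),
        auth_clause,
        quote_literal(region),
        compression_clause
    ))
}
```

Quoting the table name as a single identifier is a simplifying assumption; a schema-qualified name such as `analytics.events` would need each part quoted separately.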

Configuration Options

| Option | Description | Default |
|---|---|---|
| connection_string | Redshift cluster connection URL | Required |
| target_table | Destination table name | Required |
| iam_role | IAM role ARN for S3 access | Optional* |
| s3_bucket | S3 bucket for staging | Required |
| s3_region | AWS region | Required |
| batch_size | Messages per batch | 10000 |
| compression | COPY compression format | None |
| delete_staged_files | Auto-cleanup toggle | true |
| auto_create_table | Create table if missing | true |

*Either iam_role or aws_access_key_id+aws_secret_access_key required
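The options and the credential rule in the footnote could be modelled roughly as follows. This is a minimal sketch: the struct layout, the `Credentials` enum, and both method names are assumptions, and only the option names and defaults come from the table. Preferring `iam_role` when both credential kinds are set is also an assumption, based on the PR calling IAM role authentication the recommended option.

```rust
/// Hypothetical mirror of the documented options; the PR's real struct may differ.
#[allow(dead_code)] // several fields are only stored in this sketch
#[derive(Debug, Clone)]
struct RedshiftSinkConfig {
    connection_string: String,
    target_table: String,
    iam_role: Option<String>,
    aws_access_key_id: Option<String>,
    aws_secret_access_key: Option<String>,
    s3_bucket: String,
    s3_region: String,
    batch_size: usize,
    compression: Option<String>,
    delete_staged_files: bool,
    auto_create_table: bool,
}

/// Which credential kind the COPY step should use.
#[derive(Debug, PartialEq)]
enum Credentials<'a> {
    IamRole(&'a str),
    AccessKey { id: &'a str, secret: &'a str },
}

impl RedshiftSinkConfig {
    /// Fills the required options and applies the documented defaults.
    fn new(connection_string: &str, target_table: &str, s3_bucket: &str, s3_region: &str) -> Self {
        Self {
            connection_string: connection_string.to_string(),
            target_table: target_table.to_string(),
            iam_role: None,
            aws_access_key_id: None,
            aws_secret_access_key: None,
            s3_bucket: s3_bucket.to_string(),
            s3_region: s3_region.to_string(),
            batch_size: 10_000,
            compression: None,
            delete_staged_files: true,
            auto_create_table: true,
        }
    }

    /// Enforces "either iam_role or access key id + secret".
    fn credentials(&self) -> Result<Credentials<'_>, String> {
        match (&self.iam_role, &self.aws_access_key_id, &self.aws_secret_access_key) {
            (Some(role), _, _) => Ok(Credentials::IamRole(role.as_str())),
            (None, Some(id), Some(secret)) => Ok(Credentials::AccessKey {
                id: id.as_str(),
                secret: secret.as_str(),
            }),
            _ => Err("set iam_role, or both aws_access_key_id and aws_secret_access_key".to_string()),
        }
    }
}
```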

Testing

  • 14 unit tests passing
  • Clippy clean with -D warnings

Closes #2540

Implements Issue apache#2540 - Redshift Sink Connector with S3 staging support.

Features:
- S3 staging with automatic CSV file upload
- Redshift COPY command execution via PostgreSQL wire protocol
- IAM role authentication (recommended) or access key credentials
- Configurable batch size and compression (gzip, lzop, bzip2, zstd)
- Automatic table creation with customizable schema
- Retry logic with exponential backoff for transient failures
- Automatic cleanup of staged S3 files
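The retry behaviour named in this list might look something like the sketch below. It is synchronous and uses only the standard library to stay self-contained; the connector itself may well use an async runtime. The function names, the `is_transient` predicate, and the delay cap are all assumptions.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Runs `op` until it succeeds, fails with a non-transient error,
/// or `max_retries` retries have been used.
fn retry_with_backoff<T, E>(
    max_retries: u32,
    base: Duration,
    max: Duration,
    is_transient: impl Fn(&E) -> bool,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt < max_retries && is_transient(&err) => {
                // Wait longer after each consecutive failure.
                sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
            Err(err) => return Err(err),
        }
    }
}
```

Production code would usually add random jitter to the delay so that many connectors do not retry in lockstep; that is omitted here to keep the sketch dependency-free.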

Configuration options:
- connection_string: Redshift cluster connection URL
- target_table: Destination table name
- iam_role: IAM role ARN for S3 access (recommended)
- s3_bucket/s3_region/s3_prefix: S3 staging location
- batch_size: Messages per batch (default: 10000)
- compression: COPY compression format
- delete_staged_files: Auto-cleanup toggle (default: true)
- auto_create_table: Create table if missing (default: true)
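One plausible reading of `batch_size` is that incoming messages are split into groups of at most that many before each group is staged and loaded. A minimal sketch of that reading, with a hypothetical helper name:

```rust
/// Splits `messages` into consecutive batches of at most `batch_size` items.
/// A `batch_size` of 0 is treated as 1, because `chunks` panics on zero.
fn split_into_batches<T>(messages: &[T], batch_size: usize) -> Vec<&[T]> {
    messages.chunks(batch_size.max(1)).collect()
}
```

With 25 messages and a batch size of 10, this yields batches of 10, 10, and 5.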

Closes apache#2540

- Fix markdown lint issues in README.md (table formatting, blank lines, code fence language)
- Fix trailing newline in Cargo.toml
- Apply TOML formatting via taplo
- Add missing dependencies to DEPENDENCIES.md (rust-s3, rxml, rxml_validation, static_assertions)
@hubcio
Contributor

hubcio commented Jan 14, 2026

Please write a true e2e integration test, similar to the one for Postgres.

@GaneshPatil7517
Author

> Please write a true e2e integration test, similar to the one for Postgres.

OK, I'll update.

- Add Redshift sink integration test using PostgreSQL (Redshift-compatible) and LocalStack for S3
- Add s3_endpoint config option to support custom endpoints (LocalStack, MinIO)
- Add path-style S3 access for custom endpoints
- Add localstack feature to testcontainers-modules
- Create test configuration files for Redshift connector
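The difference between the default addressing and path-style access against a custom endpoint can be sketched as a URL builder. The helper name and signature are hypothetical; in the PR this is presumably handled by the S3 client library rather than by hand.

```rust
/// Returns the object URL for `key`.
/// With a custom endpoint (LocalStack, MinIO) the bucket goes in the path;
/// otherwise the AWS virtual-hosted style puts the bucket in the host name.
fn object_url(endpoint: Option<&str>, bucket: &str, region: &str, key: &str) -> String {
    let key = key.trim_start_matches('/');
    match endpoint {
        Some(endpoint) => format!("{}/{}/{}", endpoint.trim_end_matches('/'), bucket, key),
        None => format!("https://{bucket}.s3.{region}.amazonaws.com/{key}"),
    }
}
```

Path-style is the safer choice for local emulators because a virtual-hosted name such as `staging.localhost` would need extra DNS setup to resolve.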
- Add s3_endpoint: None to test_config() in lib.rs (fixes E0063)
- Add endpoint parameter to S3Uploader tests in s3.rs
- Fix formatting for long line in init_s3_uploader()
- Add iggy_connector_redshift_sink to DEPENDENCIES.md
- Add maybe-async, md5, minidom to DEPENDENCIES.md
Copilot AI review requested due to automatic review settings January 15, 2026 06:59

This comment was marked as outdated.

Critical fixes:
- Change Rust edition from 2024 to 2021 in Cargo.toml
- Fix S3 cleanup to happen regardless of COPY result (prevents orphaned files)
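The cleanup fix can be pictured as running the delete step after COPY whether or not COPY succeeded, then returning the COPY result. The sketch below takes both steps as closures so it stays self-contained; the function name and the choice to only log a cleanup failure are assumptions.

```rust
use std::fmt::Display;

/// Runs `copy`, then always runs `cleanup` when `delete_staged_files` is set.
/// The COPY result is returned; a cleanup failure is logged but not propagated.
fn load_staged_batch<E: Display>(
    copy: impl FnOnce() -> Result<(), E>,
    cleanup: impl FnOnce() -> Result<(), E>,
    delete_staged_files: bool,
) -> Result<(), E> {
    // Do not use `?` here: an early return would skip the cleanup below.
    let copy_result = copy();
    if delete_staged_files {
        if let Err(err) = cleanup() {
            eprintln!("failed to delete staged S3 object: {err}");
        }
    }
    copy_result
}
```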

Moderate fixes:
- Remove zstd from valid compression options (not supported by Redshift)
- Update README to remove zstd from compression list
- Handle bucket creation error in integration tests with expect()
- Log JSON serialization errors instead of silent unwrap_or_default()

Performance:
- Cache escaped quote string to avoid repeated format! allocations
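The idea behind caching the escaped quote can be sketched as follows: the doubled-quote string is built once when the escaper is created, instead of being formatted again for every field. `CsvEscaper` and its fields are hypothetical names, not the PR's actual types.

```rust
/// CSV field escaper that precomputes the doubled quote once.
struct CsvEscaper {
    quote: char,
    delimiter: char,
    escaped_quote: String,
}

impl CsvEscaper {
    fn new(quote: char, delimiter: char) -> Self {
        Self {
            quote,
            delimiter,
            // Built once here rather than via format! on every escape call.
            escaped_quote: format!("{quote}{quote}"),
        }
    }

    /// Quotes the field only if it contains the quote, the delimiter, or a line break.
    fn escape(&self, field: &str) -> String {
        let needs_quoting = field
            .contains(|c: char| c == self.quote || c == self.delimiter || c == '\n' || c == '\r');
        if !needs_quoting {
            return field.to_string();
        }
        let mut out = String::with_capacity(field.len() + 2);
        out.push(self.quote);
        out.push_str(&field.replace(self.quote, &self.escaped_quote));
        out.push(self.quote);
        out
    }
}
```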

Windows compatibility (for local testing):
- Add #[cfg(unix)] conditionals for Unix-specific code in sender/mod.rs
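For readers unfamiliar with the pattern, `#[cfg(unix)]` gating generally pairs a Unix-only implementation with a fallback so that the crate still compiles on Windows. The example below is a generic illustration using file permission bits; it is not the code from sender/mod.rs, and `file_mode` is a hypothetical helper.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// On Unix, returns the permission bits of `path`.
#[cfg(unix)]
fn file_mode(path: &Path) -> io::Result<Option<u32>> {
    use std::os::unix::fs::PermissionsExt;
    Ok(Some(fs::metadata(path)?.permissions().mode()))
}

/// On other platforms there is no Unix mode, so only existence is checked.
#[cfg(not(unix))]
fn file_mode(path: &Path) -> io::Result<Option<u32>> {
    fs::metadata(path)?;
    Ok(None)
}
```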
@spetz
Contributor

spetz commented Jan 15, 2026

I'd greatly appreciate the actual human interaction in this PR, otherwise, I'm closing this.

Fixes clippy warning about unused 'runtime' field in test setup struct.
The runtime field is kept for future test expansion.
@hubcio
Contributor

hubcio commented Jan 21, 2026

@GaneshPatil7517 do you plan to continue?

@GaneshPatil7517
Author

Yes, but I need some time.

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs.

If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers - either @core on Discord or by mentioning them directly here on the PR.

Thank you for your contribution!

@github-actions github-actions bot added the "stale" label (Inactive issue or pull request) Jan 30, 2026
@github-actions github-actions bot removed the "stale" label (Inactive issue or pull request) Feb 1, 2026
@github-actions

github-actions bot commented Feb 8, 2026

@github-actions github-actions bot added the "stale" label (Inactive issue or pull request) Feb 8, 2026

@ryerraguntla ryerraguntla left a comment

Good.


@ryerraguntla ryerraguntla left a comment

Good

Development

Successfully merging this pull request may close these issues.

Implement Redshift Sink Connector with S3 staging

4 participants