Summary
The ADF Google Cloud Storage connector currently only supports HMAC key authentication. This feature request asks for Workload Identity Federation (WIF) as an additional authenticationType, enabling secretless, identity-based access to GCS from Azure Data Factory.
Problem
Organizations with many GCP projects face an O(n) scaling problem with HMAC keys:
| GCP Projects | HMAC Keys | Key Vault Secrets | Rotation Events/Year |
|---|---|---|---|
| 10 | 10 | 10 | 20–40 |
| 50 | 50 | 50 | 100–200 |
| 100 | 100 | 100 | 200–400 |
Each HMAC key is a long-lived static credential — created per-project, stored in Key Vault, manually rotated. This does not align with zero-trust principles.
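The rotation column above assumes each key is rotated two to four times per year (a common compliance cadence; the exact number varies by policy). A quick sketch of how the operational burden scales with project count:

```python
# Illustrative arithmetic for the scaling table: one HMAC key and one Key
# Vault secret per GCP project, rotated 2-4 times/year (assumed cadence).
def hmac_overhead(projects: int) -> dict:
    return {
        "hmac_keys": projects,
        "key_vault_secrets": projects,
        "rotation_events_per_year": (projects * 2, projects * 4),
    }

for n in (10, 50, 100):
    print(n, hmac_overhead(n))
```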
Proposed Solution
Add WorkloadIdentityFederation as an authenticationType on the GCS linked service. The token flow:
- ADF Managed Identity → Azure Entra ID access token (JWT, ~1h TTL)
- GCP Security Token Service → Federated access token (RFC 8693)
- GCP IAM Credentials API → Short-lived GCP access token via SA impersonation (1h)
- GCS API → Read objects using the SA token
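The four fields in the linked service proposed below are exactly what is needed to assemble a google-auth `external_account` configuration for steps 2–3. A hedged sketch of the mapping (field names follow the proposed schema, not any shipped ADF API):

```python
# Hypothetical mapping from the proposed linked-service fields to the
# google-auth "external_account" config that drives the STS exchange
# and service-account impersonation.
def to_external_account_config(props: dict) -> dict:
    audience = (
        "//iam.googleapis.com/projects/{gcpProjectNumber}"
        "/locations/global/workloadIdentityPools/{workloadIdentityPoolId}"
        "/providers/{workloadIdentityProviderId}"
    ).format(**props)
    return {
        "type": "external_account",
        "audience": audience,
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "token_url": "https://sts.googleapis.com/v1/token",
        "service_account_impersonation_url": (
            "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
            f"{props['serviceAccountEmail']}:generateAccessToken"
        ),
    }
```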
Proposed Linked Service Schema
```json
{
  "type": "GoogleCloudStorage",
  "typeProperties": {
    "authenticationType": "WorkloadIdentityFederation",
    "workloadIdentityFederation": {
      "gcpProjectNumber": "123456789012",
      "workloadIdentityPoolId": "azure-adf-pool",
      "workloadIdentityProviderId": "azure-entra-oidc",
      "serviceAccountEmail": "adf-reader@my-project.iam.gserviceaccount.com"
    }
  }
}
```
Benefits over HMAC
| Property | HMAC Keys | WIF |
|---|---|---|
| Credential lifetime | Indefinite | ~1 hour (auto-refreshed) |
| Secret storage | Key Vault per project | None |
| Rotation | Manual per project | Automatic |
| Blast radius | Full bucket access if leaked | N/A — no secret to leak |
| Multi-project scale | O(n) keys | O(1) pool + O(n) IAM bindings |
| Auditability | Key Vault logs only | Entra + GCP Cloud Audit Logs |
Precedent: Microsoft Defender for Cloud
Microsoft already ships this pattern in production. Defender for Cloud's GCP connector uses WIF to access GCP APIs via Entra ID — no stored credentials.
| Aspect | Defender for Cloud | Proposed ADF Connector |
|---|---|---|
| Identity source | Defender service principal | ADF Managed Identity |
| Token exchange | Entra JWT → GCP STS | Entra JWT → GCP STS |
| SA impersonation | Yes | Yes |
| Credential storage | None | None |
| Multi-project support | Yes (org connector) | Yes (per linked service) |
PoC Validation
We built a PoC that exercises the full credential chain end-to-end using msal, google-auth, google-cloud-storage, and azure-identity.
Successful Execution
```
============================================================
ADF <-> GCS Workload Identity Federation PoC
Azure Entra ID -> GCP STS -> SA Impersonation -> GCS
No HMAC keys. No long-lived secrets.
============================================================
INFO token_exchange Acquired Entra ID access token via client_secret (length=1247 chars)
INFO token_exchange GCP credentials acquired — SA: adf-gcs-reader@..., expires: 2026-03-03T13:00:02+00:00
GCS Bucket: gs://adf-gcs-wif-poc-xxxxxxxx
Objects found (1):
  - test-data/sample.csv
SUCCESS: GCS access via Workload Identity Federation confirmed.
============================================================
```
Test Suite (99 tests, 99.3% coverage)
```
$ python -m pytest tests/unit/ --cov --cov-report=term-missing -q
........................................................................ [72%]
...........................                                              [100%]
Name                       Stmts   Miss  Cover   Missing
--------------------------------------------------------
python/blob_sink.py           37      0   100%
python/config.py              61      0   100%
python/gcs_client.py          45      0   100%
python/main.py                72      0   100%
python/token_exchange.py      72      2    97%
--------------------------------------------------------
TOTAL                        287      2    99%
99 passed
```
Negative Test (Access Control Enforced)
When the IAM workloadIdentityUser binding is removed, the flow correctly fails with a 403, confirming access control is enforced at the identity level — not via static keys.
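The negative test can be sketched as follows. This is an illustrative stand-in, not the PoC's actual test code: `exchange_and_list` represents the full flow, and the stub `denied` simulates the 403 that the IAM Credentials API returns when the `workloadIdentityUser` binding is absent.

```python
# Sketch of the negative test: with the workloadIdentityUser binding removed,
# SA impersonation must fail with HTTP 403 and no objects may be read.
class ImpersonationDenied(Exception):
    status_code = 403

def exchange_and_list(impersonate):
    """Stand-in for the full flow; 'impersonate' models the IAM Credentials call."""
    token = impersonate()            # raises when the binding is missing
    return ["test-data/sample.csv"]  # would be a real GCS listing

def denied():
    raise ImpersonationDenied("caller lacks roles/iam.workloadIdentityUser")

try:
    exchange_and_list(denied)
    leaked = True
except ImpersonationDenied as err:
    leaked = False
    assert err.status_code == 403

assert not leaked  # access is denied at the identity layer, not via a key
```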
Integration Issues Discovered and Resolved
| # | Issue | Root Cause | Resolution |
|---|---|---|---|
| 1 | Token issuer mismatch | Default Entra tokens are v1 (`iss: sts.windows.net/{tid}`); GCP expects v2 | Set `accessTokenAcceptedVersion = 2` in the Entra app manifest (ref) |
| 2 | Audience mismatch | v2 tokens set `aud` to the client_id (GUID), not the App ID URI | Configure GCP `allowed_audiences` with `[client_id]` (ref) |
| 3 | SA impersonation 400 | `identity_pool.Credentials` requires OAuth scopes before the impersonation call | Call `.with_scopes(["cloud-platform"])` before `.refresh()` |
| 4 | Entra identifier URI policy | Tenant policy requires the tenant ID in `api://` URIs | Dynamically compute the URI as `api://{tenant_id}/app-name` |
| 5 | File encoding on Windows | PowerShell `>` redirect produces UTF-16 with BOM | Write files with UTF-8 (no BOM) encoding |
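Issues 1 and 2 can both be caught early by inspecting the Entra token's claims before handing it to GCP STS. A minimal diagnostic helper (pure stdlib; it decodes the payload without verifying the signature, so it is for debugging only):

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT signature verification (debugging only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

def check_for_gcp(claims: dict, tenant_id: str, client_id: str) -> list:
    """Flag the two token problems hit during integration (issues 1 and 2)."""
    problems = []
    if claims.get("iss") != f"https://login.microsoftonline.com/{tenant_id}/v2.0":
        problems.append("v1 issuer: set accessTokenAcceptedVersion = 2")
    if claims.get("aud") != client_id:
        problems.append("aud is not the client_id: fix allowed_audiences")
    return problems
```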
GCP-Side Configuration (What Customers Set Up)
OIDC Provider (Terraform)
```hcl
resource "google_iam_workload_identity_pool" "this" {
  project                   = var.gcp_project_id
  workload_identity_pool_id = "azure-adf-pool"
  display_name              = "Azure ADF Pool"
}

resource "google_iam_workload_identity_pool_provider" "azure_entra" {
  project                            = var.gcp_project_id
  workload_identity_pool_id          = google_iam_workload_identity_pool.this.workload_identity_pool_id
  workload_identity_pool_provider_id = "azure-entra-oidc"

  attribute_mapping = {
    "google.subject"   = "assertion.sub" # Azure SP object ID
    "attribute.tid"    = "assertion.tid" # Azure tenant ID
    "attribute.app_id" = "assertion.azp" # Azure client ID
  }

  # Only accept tokens from the specific Azure tenant
  attribute_condition = "attribute.tid == '${var.azure_tenant_id}'"

  oidc {
    issuer_uri        = "https://login.microsoftonline.com/${var.azure_tenant_id}/v2.0"
    allowed_audiences = [var.azure_client_id]
  }
}
```
Per-Project Service Account + IAM Binding
```hcl
resource "google_service_account" "adf_reader" {
  project      = var.gcp_project_id
  account_id   = "adf-gcs-reader"
  display_name = "ADF GCS Reader (WIF)"
}

resource "google_service_account_iam_member" "wif_impersonation" {
  service_account_id = google_service_account.adf_reader.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/projects/${var.gcp_project_number}/locations/global/workloadIdentityPools/azure-adf-pool/attribute.app_id/${var.azure_client_id}"
}

# Grant read access to the target bucket
resource "google_storage_bucket_iam_member" "reader" {
  bucket = var.gcs_bucket_name
  role   = "roles/storage.objectViewer"
  member = "serviceAccount:${google_service_account.adf_reader.email}"
}
```
Core Token Exchange Code
The actual 3-step flow that ADF would implement internally:
Step 1 — Acquire Entra token (Managed Identity in production):
```python
from azure.identity import ManagedIdentityCredential

credential = ManagedIdentityCredential(client_id=azure_client_id)
token = credential.get_token(f"{app_id_uri}/.default")
```
Steps 2+3 — STS exchange + SA impersonation (handled by google-auth):
```python
import google.auth.transport.requests
from google.auth import identity_pool

creds = identity_pool.Credentials.from_info({
    "type": "external_account",
    "audience": "//iam.googleapis.com/projects/{project_number}/locations/global/workloadIdentityPools/{pool_id}/providers/{provider_id}",
    "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
    "token_url": "https://sts.googleapis.com/v1/token",
    "service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/{sa_email}:generateAccessToken",
    "credential_source": {"file": "<path_to_azure_jwt>", "format": {"type": "text"}},
})
scoped = creds.with_scopes(["https://www.googleapis.com/auth/cloud-platform"])
scoped.refresh(google.auth.transport.requests.Request())
# scoped is now a valid GCP credential — pass to any GCS client
```
Step 4 — Use with GCS (unchanged from existing connector):
```python
from google.cloud import storage

client = storage.Client(credentials=scoped, project=None)
bucket = client.bucket("my-bucket")
blobs = list(bucket.list_blobs())
```
What the ADF Team Would Need to Build
- Add `WorkloadIdentityFederation` as an `authenticationType` option (backward-compatible — HMAC remains the default)
- Implement the 3-step token flow: Managed Identity → STS → SA impersonation
- Use the resulting GCP access token with the existing GCS data reader (no changes needed)
- Add configuration fields to Portal UI, ARM/Bicep, and REST API schemas
- Document the GCP-side WIF setup for customers
No changes to GCS APIs are required — only a new authentication path in the ADF connector.
Interim Workaround
Until native support ships, customers can use an ADF Custom Activity running on Azure Batch with a User-Assigned Managed Identity to execute the WIF token exchange and transfer data from GCS to Azure Blob Storage. We have a working implementation of this approach with dual auth mode (client_secret for dev, managed_identity for production).
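The dual auth mode can be sketched as below. This is an illustrative outline, not the actual implementation; the environment-variable names and the mode-selection rule are assumptions.

```python
import os

def resolve_mode(env=os.environ) -> str:
    """Prefer managed identity; use client_secret only when a secret is
    explicitly provided (local development). Env var names are illustrative."""
    return "client_secret" if env.get("AZURE_CLIENT_SECRET") else "managed_identity"

def build_credential(mode: str):
    """Return the azure-identity credential for the chosen auth mode."""
    if mode == "client_secret":        # local development only
        from azure.identity import ClientSecretCredential
        return ClientSecretCredential(
            tenant_id=os.environ["AZURE_TENANT_ID"],
            client_id=os.environ["AZURE_CLIENT_ID"],
            client_secret=os.environ["AZURE_CLIENT_SECRET"],
        )
    if mode == "managed_identity":     # Azure Batch pool in production
        from azure.identity import ManagedIdentityCredential
        return ManagedIdentityCredential(client_id=os.environ.get("AZURE_CLIENT_ID"))
    raise ValueError(f"unknown auth mode: {mode}")
```

The credential returned by either branch feeds the same WIF token exchange shown above, so the rest of the pipeline is identical in both modes.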
References
Microsoft Learn
- ADF GCS connector (current)
- Workload identity federation
- Managed identities for Azure resources
- Entra access tokens (v2 format)
- App manifest — accessTokenAcceptedVersion
- MSAL Python
- ManagedIdentityCredential (Python)
- ADF Custom Activity
- Azure Batch Managed Identity pools
- Defender for Cloud — GCP onboarding
- Defender for Cloud — GCP connector architecture
Google Cloud
- Workload Identity Federation overview
- Configure WIF with Azure
- Attribute conditions
- Attribute mapping
- Service account key best practices (recommends WIF)
- IAM roles — storage.objectViewer
Terraform Registry
- google_iam_workload_identity_pool
- google_iam_workload_identity_pool_provider
- google_service_account_iam_member