force-import manifest list into IST after push to fix multi-arch race#5016
force-import manifest list into IST after push to fix multi-arch race#5016joepvd wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
WalkthroughAdds manifest-list import after pushing manifests: new helper creates an ImageStreamImport referencing the pushed manifest list with retry/backoff and status checks; adds parsing helpers to split image stream references into namespace, name, and tag; includes tests for import and parsing logic. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: joepvd The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
pkg/manifestpusher/manifestpusher.go (1)
120-120: Use a timeout-bearing context for the create call.Line 120 uses
context.Background(), so a stalled apiserver call can hang this post-push path indefinitely even though the retry count is bounded. Please thread a caller context throughPushImageWithManifest/importManifestList, or wrap eachCreateattempt in acontext.WithTimeout.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/manifestpusher/manifestpusher.go` at line 120, The call to m.client.Create currently uses context.Background() (in importManifestList/PushImageWithManifest) which can hang; replace it with a timeout-bearing context by either threading the caller context into PushImageWithManifest/importManifestList and using that, or wrapping each Create attempt with a context.WithTimeout (e.g., create ctx, cancel := context.WithTimeout(parentCtx, timeout); defer cancel()) and pass ctx to m.client.Create for streamImport; ensure cancel is called and errors from context deadlines are handled/retried consistently with existing retry logic.pkg/manifestpusher/manifestpusher_test.go (1)
197-307: Add one case for the retry/error-status path.The new flake-prevention logic is in the backoff and
Status.Imageshandling, but this test only locks in request construction. A case forForbidden/Conflictretry or an emptyStatus.Imagesresponse would keep the core behavior from regressing.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/manifestpusher/manifestpusher_test.go` around lines 197 - 307, Add a test case in TestImportManifestList that exercises the retry/error-status path: create a case that targets manifestPusher.importManifestList where the importCapturingClient.Create simulates transient API errors (e.g., return apierrors.NewForbidden or apierrors.NewConflict for the first N calls and then succeed) and another case where Create returns an ImageStreamImport with an empty Status.Images slice to assert the function retries or returns the expected error; update importCapturingClient.Create to inspect a test-controlled flag or count (referencing importCapturingClient and its Create method) so it can return apierrors.NewForbidden/apierrors.NewConflict on early calls and later populate isi.Status.Images (or leave it empty) to validate retry and error handling in manifestPusher.importManifestList.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/manifestpusher/manifestpusher_test.go`:
- Around line 197-307: Add a test case in TestImportManifestList that exercises
the retry/error-status path: create a case that targets
manifestPusher.importManifestList where the importCapturingClient.Create
simulates transient API errors (e.g., return apierrors.NewForbidden or
apierrors.NewConflict for the first N calls and then succeed) and another case
where Create returns an ImageStreamImport with an empty Status.Images slice to
assert the function retries or returns the expected error; update
importCapturingClient.Create to inspect a test-controlled flag or count
(referencing importCapturingClient and its Create method) so it can return
apierrors.NewForbidden/apierrors.NewConflict on early calls and later populate
isi.Status.Images (or leave it empty) to validate retry and error handling in
manifestPusher.importManifestList.
In `@pkg/manifestpusher/manifestpusher.go`:
- Line 120: The call to m.client.Create currently uses context.Background() (in
importManifestList/PushImageWithManifest) which can hang; replace it with a
timeout-bearing context by either threading the caller context into
PushImageWithManifest/importManifestList and using that, or wrapping each Create
attempt with a context.WithTimeout (e.g., create ctx, cancel :=
context.WithTimeout(parentCtx, timeout); defer cancel()) and pass ctx to
m.client.Create for streamImport; ensure cancel is called and errors from
context deadlines are handled/retried consistently with existing retry logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 54f2c2c5-758a-4f85-b0c3-a6f2c654e3b1
📒 Files selected for processing (2)
pkg/manifestpusher/manifestpusher.gopkg/manifestpusher/manifestpusher_test.go
After PushManifestList pushes a manifest list to the registry via the Docker V2 API (manifest-tool), the OpenShift image controller's async reconciliation does not reliably update the ImageStreamTag. This leaves the IST at a stale digest, causing downstream builds to fail with "no image found in manifest list for architecture amd64" and burning 30-60 minutes on retries that will never succeed. Fix this by creating an ImageStreamImport with Import:true after the manifest push, which synchronously forces the IST to point at the correct manifest list digest. This uses ImportModePreserveOriginal to retain the manifest list structure for multi-arch resolution. Signed-off-by: Joep van de Leur <jdelft@redhat.com>
a081233 to
1a93112
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/manifestpusher/manifestpusher.go (1)
121-124: Retrying onForbiddenmay be unnecessary.
IsForbiddentypically indicates a permanent authorization issue (HTTP 403), which won't resolve on retry. Consider whether this retry condition was intentional or if it was carried over from other patterns. If RBAC eventually propagates and this is expected, a comment explaining the rationale would help maintainability.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/manifestpusher/manifestpusher.go` around lines 121 - 124, The current retry branch in manifestpusher.go treats kerrors.IsForbidden the same as kerrors.IsConflict which causes transient retries; update the condition in the function containing the block that sets lastErr and returns false (the if checking kerrors.IsConflict || kerrors.IsForbidden) to stop retrying on kerrors.IsForbidden by removing kerrors.IsForbidden from the OR, or if Forbidden should be retried due to expected RBAC propagation, add a concise comment above the if explaining that rationale and under what timeframe/conditions a retry is appropriate so future readers understand why IsForbidden is handled as transient.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/manifestpusher/manifestpusher.go`:
- Around line 127-130: The code currently only checks for a non-nil Image
(streamImport.Status.Images[0].Image) but ignores the import result; inspect the
corresponding ImageImportStatus.Status (e.g.,
streamImport.Status.Images[0].Status.Status) and if it indicates failure include
that error text in lastErr and return an error (or false plus an error) instead
of treating it as success. Update the branch that sets lastErr =
fmt.Errorf("import returned no image status") to also check the
ImageImportStatus.Status field, populate lastErr with the Status message when
present, and return the error so callers know the import actually failed.
---
Nitpick comments:
In `@pkg/manifestpusher/manifestpusher.go`:
- Around line 121-124: The current retry branch in manifestpusher.go treats
kerrors.IsForbidden the same as kerrors.IsConflict which causes transient
retries; update the condition in the function containing the block that sets
lastErr and returns false (the if checking kerrors.IsConflict ||
kerrors.IsForbidden) to stop retrying on kerrors.IsForbidden by removing
kerrors.IsForbidden from the OR, or if Forbidden should be retried due to
expected RBAC propagation, add a concise comment above the if explaining that
rationale and under what timeframe/conditions a retry is appropriate so future
readers understand why IsForbidden is handled as transient.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 03d469cc-e7ff-4b52-9e43-0d9376541350
📒 Files selected for processing (2)
pkg/manifestpusher/manifestpusher.gopkg/manifestpusher/manifestpusher_test.go
| if len(streamImport.Status.Images) == 0 || streamImport.Status.Images[0].Image == nil { | ||
| lastErr = fmt.Errorf("import returned no image status") | ||
| return false, nil | ||
| } |
There was a problem hiding this comment.
Consider checking the import status for errors.
The code checks that Status.Images[0].Image is populated but doesn't verify that the import actually succeeded. The ImageImportStatus can contain a Status.Status field with an error message if the import failed. Without this check, you might silently treat a failed import as successful.
🛠️ Proposed fix to add status error check
if len(streamImport.Status.Images) == 0 || streamImport.Status.Images[0].Image == nil {
lastErr = fmt.Errorf("import returned no image status")
return false, nil
}
+ if status := streamImport.Status.Images[0].Status; status.Status == metav1.StatusFailure {
+ lastErr = fmt.Errorf("import failed: %s", status.Message)
+ return false, nil
+ }
return true, nil📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if len(streamImport.Status.Images) == 0 || streamImport.Status.Images[0].Image == nil { | |
| lastErr = fmt.Errorf("import returned no image status") | |
| return false, nil | |
| } | |
| if len(streamImport.Status.Images) == 0 || streamImport.Status.Images[0].Image == nil { | |
| lastErr = fmt.Errorf("import returned no image status") | |
| return false, nil | |
| } | |
| if status := streamImport.Status.Images[0].Status; status.Status == metav1.StatusFailure { | |
| lastErr = fmt.Errorf("import failed: %s", status.Message) | |
| return false, nil | |
| } | |
| return true, nil |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/manifestpusher/manifestpusher.go` around lines 127 - 130, The code
currently only checks for a non-nil Image (streamImport.Status.Images[0].Image)
but ignores the import result; inspect the corresponding
ImageImportStatus.Status (e.g., streamImport.Status.Images[0].Status.Status) and
if it indicates failure include that error text in lastErr and return an error
(or false plus an error) instead of treating it as success. Update the branch
that sets lastErr = fmt.Errorf("import returned no image status") to also check
the ImageImportStatus.Status field, populate lastErr with the Status message
when present, and return the error so callers know the import actually failed.
|
@joepvd: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
PushManifestListpushes a manifest list to the registry via the Docker V2 API (manifest-tool), the OpenShift image controller's async reconciliation does not reliably update the ImageStreamTag. This leaves the IST at a stale digest, causing downstream builds (likebin,private-org-sync) to fail withno image found in manifest list for architecture "amd64"— burning 30-60 minutes on retries that will never succeed.ImageStreamImportwithImport:trueto synchronously force the IST to point at the correct manifest list digest. UsesImportModePreserveOriginalto preserve the manifest list structure for multi-arch resolution.Details
The
imagespresubmit job has been failing intermittently since #5004 merged (see job history). Affected failures include PRs #5010, #5001, #5011, #5013 — all showing the same pattern:srcbuilds succeed and manifest list is pushed topipeline:srcbin-amd64(or later steps) try to pullpipeline@sha256:<stale-digest>— a digest that does not match the just-pushed manifestamd64entry →PullBuilderImageFailed/FetchImageContentFailedRoot cause:
PushManifestListpushes via the Docker registry API, but the IST update depends on the image controller's async reconciliation which doesn't reliably fire for manifest-tool pushes. TheImageStreamImportAPI is the proven synchronous mechanism for this, already used elsewhere in the codebase (ImportTagWithRetriesinpkg/steps/utils/image.go).Test plan
importManifestListverify correctImageStreamImportparameters (namespace, imagestream name, tag, digest-based pull spec,ImportModePreserveOriginal,Insecure)parseImageStreamRefverify three-way split of image referencesmanifestEntriesandsplitImageStreamTagReftests continue to passimagespresubmit job should stop failing with the multi-arch IST race pattern/cc @openshift/test-platform
Made with Cursor