Skip to content

Conversation

@eas
Copy link
Contributor

@eas eas commented Jan 29, 2026

This consists of the following changes:

  • Merge several overloads of VPlanTransforms::runPass into a single
    function
    to avoid code duplication.

  • Add helper macro RUN_VPLAN_PASS to capture the transformation name
    and pass it to the helper above for printing.

  • Add new -vplan-print-after-all option (somewhat similar to existing
    -vplan-verify-each).

  • Add two empty passes printAfterInitialConstruction/printFinalVPlan
    so that initial/final
    VPlans would be supported in -vplan-print-after-all

This follows the original future plans in
#123640.

Re-commit of #175839 after
fixing build without LLVM_ENABLE_DUMP.

eas added 2 commits January 28, 2026 15:55
This consists of the following changes:

* Merge several overloads of `VPlanTransforms::runPass` into a single
function
  to avoid code duplication.

* Add helper macro `RUN_VPLAN_PASS` to capture the transformation name
  and pass it to the helper above for printing.

* Add new `-vplan-print-after-all` option (somewhat similar to existing
  `-vplan-verify-each`).

* Add two empty passes `printAfterInitialConstruction`/`printFinalVPlan`
so that initial/final
   VPlans would be supported in `-vplan-print-after-all`

This follows the original future plans in
llvm#123640.

Re-commit of llvm#175839 after
fixing build without `LLVM_ENABLE_DUMP`.
@llvmbot
Copy link
Member

llvmbot commented Jan 29, 2026

@llvm/pr-subscribers-llvm-transforms

Author: Andrei Elovikov (eas)

Changes

This consists of the following changes:

  • Merge several overloads of VPlanTransforms::runPass into a single
    function
    to avoid code duplication.

  • Add helper macro RUN_VPLAN_PASS to capture the transformation name
    and pass it to the helper above for printing.

  • Add new -vplan-print-after-all option (somewhat similar to existing
    -vplan-verify-each).

  • Add two empty passes printAfterInitialConstruction/printFinalVPlan
    so that initial/final
    VPlans would be supported in -vplan-print-after-all

This follows the original future plans in
#123640.

Re-commit of #175839 after
fixing build without LLVM_ENABLE_DUMP.


Full diff: https://github.com/llvm/llvm-project/pull/178547.diff

7 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+33-35)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+7)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp (+5)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+17-17)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+28-18)
  • (added) llvm/test/Transforms/LoopVectorize/vplan-print-after-all.ll (+67)
  • (modified) llvm/unittests/Transforms/Vectorize/VPlanUncountableExitTest.cpp (+2-2)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index b88779a7828fb..4e056ab30e2a6 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -366,6 +366,10 @@ cl::opt<bool>
                           cl::Hidden,
                           cl::desc("Verfiy VPlans after VPlan transforms."));
 
+cl::opt<bool> llvm::PrintAfterEachVPlanPass(
+    "vplan-print-after-all", cl::init(false), cl::Hidden,
+    cl::desc("Print after each VPlanTransforms::runPass."));
+
 // This flag enables the stress testing of the VPlan H-CFG construction in the
 // VPlan-native vectorization path. It must be used in conjuction with
 // -enable-vplan-native-path. -vplan-verify-hcfg can also be used to enable the
@@ -7388,17 +7392,16 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
     ++LoopsEarlyExitVectorized;
   // TODO: Move to VPlan transform stage once the transition to the VPlan-based
   // cost model is complete for better cost estimates.
-  VPlanTransforms::runPass(VPlanTransforms::unrollByUF, BestVPlan, BestUF);
-  VPlanTransforms::runPass(VPlanTransforms::materializePacksAndUnpacks,
-                           BestVPlan);
-  VPlanTransforms::runPass(VPlanTransforms::materializeBroadcasts, BestVPlan);
-  VPlanTransforms::runPass(VPlanTransforms::replicateByVF, BestVPlan, BestVF);
+  RUN_VPLAN_PASS(VPlanTransforms::unrollByUF, BestVPlan, BestUF);
+  RUN_VPLAN_PASS(VPlanTransforms::materializePacksAndUnpacks, BestVPlan);
+  RUN_VPLAN_PASS(VPlanTransforms::materializeBroadcasts, BestVPlan);
+  RUN_VPLAN_PASS(VPlanTransforms::replicateByVF, BestVPlan, BestVF);
   bool HasBranchWeights =
       hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator());
   if (HasBranchWeights) {
     std::optional<unsigned> VScale = CM.getVScaleForTuning();
-    VPlanTransforms::runPass(VPlanTransforms::addBranchWeightToMiddleTerminator,
-                             BestVPlan, BestVF, VScale);
+    RUN_VPLAN_PASS(VPlanTransforms::addBranchWeightToMiddleTerminator,
+                   BestVPlan, BestVF, VScale);
   }
 
   // Checks are the same for all VPlans, added to BestVPlan only for
@@ -8370,14 +8373,14 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
       // Now optimize the initial VPlan.
       VPlanTransforms::hoistPredicatedLoads(*Plan, PSE, OrigLoop);
       VPlanTransforms::sinkPredicatedStores(*Plan, PSE, OrigLoop);
-      VPlanTransforms::runPass(VPlanTransforms::truncateToMinimalBitwidths,
-                               *Plan, CM.getMinimalBitwidths());
-      VPlanTransforms::runPass(VPlanTransforms::optimize, *Plan);
+      RUN_VPLAN_PASS(VPlanTransforms::truncateToMinimalBitwidths, *Plan,
+                     CM.getMinimalBitwidths());
+      RUN_VPLAN_PASS(VPlanTransforms::optimize, *Plan);
       // TODO: try to put addExplicitVectorLength close to addActiveLaneMask
       if (CM.foldTailWithEVL()) {
-        VPlanTransforms::runPass(VPlanTransforms::addExplicitVectorLength,
-                                 *Plan, CM.getMaxSafeElements());
-        VPlanTransforms::runPass(VPlanTransforms::optimizeEVLMasks, *Plan);
+        RUN_VPLAN_PASS(VPlanTransforms::addExplicitVectorLength, *Plan,
+                       CM.getMaxSafeElements());
+        RUN_VPLAN_PASS(VPlanTransforms::optimizeEVLMasks, *Plan);
       }
       assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
       VPlans.push_back(std::move(Plan));
@@ -8584,18 +8587,15 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
 
   // Apply mandatory transformation to handle reductions with multiple in-loop
   // uses if possible, bail out otherwise.
-  if (!VPlanTransforms::runPass(VPlanTransforms::handleMultiUseReductions,
-                                *Plan))
+  if (!RUN_VPLAN_PASS(VPlanTransforms::handleMultiUseReductions, *Plan))
     return nullptr;
   // Apply mandatory transformation to handle FP maxnum/minnum reduction with
   // NaNs if possible, bail out otherwise.
-  if (!VPlanTransforms::runPass(VPlanTransforms::handleMaxMinNumReductions,
-                                *Plan))
+  if (!RUN_VPLAN_PASS(VPlanTransforms::handleMaxMinNumReductions, *Plan))
     return nullptr;
 
   // Create whole-vector selects for find-last recurrences.
-  if (!VPlanTransforms::runPass(VPlanTransforms::handleFindLastReductions,
-                                *Plan))
+  if (!RUN_VPLAN_PASS(VPlanTransforms::handleFindLastReductions, *Plan))
     return nullptr;
 
   // Transform recipes to abstract recipes if it is legal and beneficial and
@@ -8605,8 +8605,8 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
   if (!CM.foldTailWithEVL()) {
     VPCostContext CostCtx(CM.TTI, *CM.TLI, *Plan, CM, CM.CostKind, CM.PSE,
                           OrigLoop);
-    VPlanTransforms::runPass(VPlanTransforms::convertToAbstractRecipes, *Plan,
-                             CostCtx, Range);
+    RUN_VPLAN_PASS(VPlanTransforms::convertToAbstractRecipes, *Plan, CostCtx,
+                   Range);
   }
 
   for (ElementCount VF : Range)
@@ -8616,24 +8616,23 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
   // Interleave memory: for each Interleave Group we marked earlier as relevant
   // for this VPlan, replace the Recipes widening its memory instructions with a
   // single VPInterleaveRecipe at its insertion point.
-  VPlanTransforms::runPass(VPlanTransforms::createInterleaveGroups, *Plan,
-                           InterleaveGroups, RecipeBuilder,
-                           CM.isScalarEpilogueAllowed());
+  RUN_VPLAN_PASS(VPlanTransforms::createInterleaveGroups, *Plan,
+                 InterleaveGroups, RecipeBuilder, CM.isScalarEpilogueAllowed());
 
   // Replace VPValues for known constant strides.
-  VPlanTransforms::runPass(VPlanTransforms::replaceSymbolicStrides, *Plan, PSE,
-                           Legal->getLAI()->getSymbolicStrides());
+  RUN_VPLAN_PASS(VPlanTransforms::replaceSymbolicStrides, *Plan, PSE,
+                 Legal->getLAI()->getSymbolicStrides());
 
   auto BlockNeedsPredication = [this](BasicBlock *BB) {
     return Legal->blockNeedsPredication(BB);
   };
-  VPlanTransforms::runPass(VPlanTransforms::dropPoisonGeneratingRecipes, *Plan,
-                           BlockNeedsPredication);
+  RUN_VPLAN_PASS(VPlanTransforms::dropPoisonGeneratingRecipes, *Plan,
+                 BlockNeedsPredication);
 
   // Sink users of fixed-order recurrence past the recipe defining the previous
   // value and introduce FirstOrderRecurrenceSplice VPInstructions.
-  if (!VPlanTransforms::runPass(VPlanTransforms::adjustFixedOrderRecurrences,
-                                *Plan, Builder))
+  if (!RUN_VPLAN_PASS(VPlanTransforms::adjustFixedOrderRecurrences, *Plan,
+                      Builder))
     return nullptr;
 
   if (useActiveLaneMask(Style)) {
@@ -8916,7 +8915,7 @@ void LoopVectorizationPlanner::addReductionResultComputation(
   for (VPRecipeBase *R : ToDelete)
     R->eraseFromParent();
 
-  VPlanTransforms::runPass(VPlanTransforms::clearReductionWrapFlags, *Plan);
+  RUN_VPLAN_PASS(VPlanTransforms::clearReductionWrapFlags, *Plan);
 }
 
 void LoopVectorizationPlanner::attachRuntimeChecks(
@@ -9304,7 +9303,7 @@ static void preparePlanForMainVectorLoop(VPlan &MainPlan, VPlan &EpiPlan) {
     VPIRInst->eraseFromParent();
     ResumePhi->eraseFromParent();
   }
-  VPlanTransforms::runPass(VPlanTransforms::removeDeadRecipes, MainPlan);
+  RUN_VPLAN_PASS(VPlanTransforms::removeDeadRecipes, MainPlan);
 
   using namespace VPlanPatternMatch;
   // When vectorizing the epilogue, FindFirstIV & FindLastIV reductions can
@@ -10141,9 +10140,8 @@ bool LoopVectorizePass::processLoop(Loop *L) {
                            BestPlan);
     // TODO: Move to general VPlan pipeline once epilogue loops are also
     // supported.
-    VPlanTransforms::runPass(
-        VPlanTransforms::materializeConstantVectorTripCount, BestPlan, VF.Width,
-        IC, PSE);
+    RUN_VPLAN_PASS(VPlanTransforms::materializeConstantVectorTripCount,
+                   BestPlan, VF.Width, IC, PSE);
     LVP.addMinimumIterationCheck(BestPlan, VF.Width, IC,
                                  VF.MinProfitableTripCount);
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index bc51af00bedd1..9b22c399d7acf 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -912,6 +912,9 @@ bool VPlan::isExitBlock(VPBlockBase *VPBB) {
   return is_contained(ExitBlocks, VPBB);
 }
 
+/// To make RUN_VPLAN_PASS print final VPlan.
+static void printFinalVPlan(VPlan &) {}
+
 /// Generate the code inside the preheader and body of the vectorized loop.
 /// Assumes a single pre-header basic-block was created for this. Introduce
 /// additional basic-blocks as needed, and fill them all.
@@ -933,7 +936,11 @@ void VPlan::execute(VPTransformState *State) {
   LLVM_DEBUG(dbgs() << "Executing best plan with VF=" << State->VF
                     << ", UF=" << getUF() << '\n');
   setName("Final VPlan");
+  // TODO: RUN_VPLAN_PASS/VPlanTransforms::runPass should automatically dump
+  // VPlans after some specific stages when "-debug" is specified, but that
+  // hasn't been implemented yet. For now, just do both:
   LLVM_DEBUG(dump());
+  RUN_VPLAN_PASS(printFinalVPlan, *this);
 
   BasicBlock *ScalarPh = State->CFG.ExitBB;
   VPBasicBlock *ScalarPhVPBB = getScalarPreheader();
diff --git a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
index 4846abdab3c55..bf02c548723de 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
@@ -586,6 +586,9 @@ static void simplifyLiveInsWithSCEV(VPlan &Plan,
   }
 }
 
+/// To make RUN_VPLAN_PASS print initial VPlan.
+static void printAfterInitialConstruction(VPlan &) {}
+
 std::unique_ptr<VPlan>
 VPlanTransforms::buildVPlan0(Loop *TheLoop, LoopInfo &LI, Type *InductionTy,
                              DebugLoc IVDL, PredicatedScalarEvolution &PSE,
@@ -594,6 +597,8 @@ VPlanTransforms::buildVPlan0(Loop *TheLoop, LoopInfo &LI, Type *InductionTy,
   std::unique_ptr<VPlan> VPlan0 = Builder.buildPlainCFG();
   addInitialSkeleton(*VPlan0, InductionTy, IVDL, PSE, TheLoop);
   simplifyLiveInsWithSCEV(*VPlan0, PSE);
+
+  RUN_VPLAN_PASS_NO_VERIFY(printAfterInitialConstruction, *VPlan0);
   return VPlan0;
 }
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 27924ae1b6a36..89b60d4a8e3de 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -2680,23 +2680,23 @@ void VPlanTransforms::removeBranchOnConst(VPlan &Plan) {
 }
 
 void VPlanTransforms::optimize(VPlan &Plan) {
-  runPass(removeRedundantCanonicalIVs, Plan);
-  runPass(removeRedundantInductionCasts, Plan);
-
-  runPass(simplifyRecipes, Plan);
-  runPass(removeDeadRecipes, Plan);
-  runPass(simplifyBlends, Plan);
-  runPass(legalizeAndOptimizeInductions, Plan);
-  runPass(narrowToSingleScalarRecipes, Plan);
-  runPass(removeRedundantExpandSCEVRecipes, Plan);
-  runPass(simplifyRecipes, Plan);
-  runPass(removeBranchOnConst, Plan);
-  runPass(removeDeadRecipes, Plan);
-
-  runPass(createAndOptimizeReplicateRegions, Plan);
-  runPass(hoistInvariantLoads, Plan);
-  runPass(mergeBlocksIntoPredecessors, Plan);
-  runPass(licm, Plan);
+  RUN_VPLAN_PASS(removeRedundantCanonicalIVs, Plan);
+  RUN_VPLAN_PASS(removeRedundantInductionCasts, Plan);
+
+  RUN_VPLAN_PASS(simplifyRecipes, Plan);
+  RUN_VPLAN_PASS(removeDeadRecipes, Plan);
+  RUN_VPLAN_PASS(simplifyBlends, Plan);
+  RUN_VPLAN_PASS(legalizeAndOptimizeInductions, Plan);
+  RUN_VPLAN_PASS(narrowToSingleScalarRecipes, Plan);
+  RUN_VPLAN_PASS(removeRedundantExpandSCEVRecipes, Plan);
+  RUN_VPLAN_PASS(simplifyRecipes, Plan);
+  RUN_VPLAN_PASS(removeBranchOnConst, Plan);
+  RUN_VPLAN_PASS(removeDeadRecipes, Plan);
+
+  RUN_VPLAN_PASS(createAndOptimizeReplicateRegions, Plan);
+  RUN_VPLAN_PASS(hoistInvariantLoads, Plan);
+  RUN_VPLAN_PASS(mergeBlocksIntoPredecessors, Plan);
+  RUN_VPLAN_PASS(licm, Plan);
 }
 
 // Add a VPActiveLaneMaskPHIRecipe and related recipes to \p Plan and replace
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index e0d09a099647a..5af154d73ede6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -16,6 +16,7 @@
 #include "VPlan.h"
 #include "VPlanVerifier.h"
 #include "llvm/ADT/STLFunctionalExtras.h"
+#include "llvm/ADT/ScopeExit.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Compiler.h"
 
@@ -33,28 +34,37 @@ class VPRecipeBuilder;
 struct VFRange;
 
 LLVM_ABI_FOR_TEST extern cl::opt<bool> VerifyEachVPlan;
+LLVM_ABI_FOR_TEST extern cl::opt<bool> PrintAfterEachVPlanPass;
 LLVM_ABI_FOR_TEST extern cl::opt<bool> EnableWideActiveLaneMask;
 
 struct VPlanTransforms {
-  /// Helper to run a VPlan transform \p Transform on \p VPlan, forwarding extra
-  /// arguments to the transform. Returns the boolean returned by the transform.
-  template <typename... ArgsTy>
-  static bool runPass(bool (*Transform)(VPlan &, ArgsTy...), VPlan &Plan,
-                      typename std::remove_reference<ArgsTy>::type &...Args) {
-    bool Res = Transform(Plan, Args...);
-    if (VerifyEachVPlan)
-      verifyVPlanIsValid(Plan);
-    return Res;
-  }
-  /// Helper to run a VPlan transform \p Transform on \p VPlan, forwarding extra
-  /// arguments to the transform.
-  template <typename... ArgsTy>
-  static void runPass(void (*Fn)(VPlan &, ArgsTy...), VPlan &Plan,
-                      typename std::remove_reference<ArgsTy>::type &...Args) {
-    Fn(Plan, Args...);
-    if (VerifyEachVPlan)
-      verifyVPlanIsValid(Plan);
+  /// Helper to run a VPlan pass \p Pass on \p VPlan, forwarding extra arguments
+  /// to the pass. Performs verification/printing after each VPlan pass if
+  /// requested via command line options.
+  template <bool EnableVerify = true, typename PassTy, typename... ArgsTy>
+  static decltype(auto) runPass(StringRef PassName, PassTy &&Pass, VPlan &Plan,
+                                ArgsTy &&...Args) {
+    scope_exit PostTransformActions{[&]() {
+      // Make sure to print before verification, so that output is more useful
+      // in case of failures:
+      if (PrintAfterEachVPlanPass) {
+        dbgs() << "VPlan after " << PassName << '\n';
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+        dbgs() << Plan << '\n';
+#else
+        dbgs() << "LLVM DUMP is disabled!\n";
+#endif
+      }
+      if (VerifyEachVPlan && EnableVerify)
+        verifyVPlanIsValid(Plan);
+    }};
+
+    return std::forward<PassTy>(Pass)(Plan, std::forward<ArgsTy>(Args)...);
   }
+#define RUN_VPLAN_PASS(PASS, ...)                                              \
+  llvm::VPlanTransforms::runPass(#PASS, PASS, __VA_ARGS__)
+#define RUN_VPLAN_PASS_NO_VERIFY(PASS, ...)                                    \
+  llvm::VPlanTransforms::runPass<false>(#PASS, PASS, __VA_ARGS__)
 
   /// Create a base VPlan0, serving as the common starting point for all later
   /// candidates. It consists of an initial plain CFG loop with loop blocks from
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-print-after-all.ll b/llvm/test/Transforms/LoopVectorize/vplan-print-after-all.ll
new file mode 100644
index 0000000000000..96e15740cd413
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/vplan-print-after-all.ll
@@ -0,0 +1,67 @@
+; RUN: opt -passes=loop-vectorize -disable-output -vplan-print-after-all -force-vector-width=4 -vplan-verify-each < %s 2>&1 | FileCheck %s --implicit-check-not "VPlan after"
+; RUN: opt -passes=loop-vectorize -disable-output -vplan-print-after-all -force-vector-width=4 -vplan-verify-each < %s 2>&1 | FileCheck %s --check-prefix CHECK-DUMP
+; REQUIRES: asserts
+
+; Verify that `-vplan-print-after-all` option works.
+
+; CHECK: VPlan after printAfterInitialConstruction
+; CHECK: VPlan after VPlanTransforms::clearReductionWrapFlags
+; CHECK: VPlan after VPlanTransforms::handleMultiUseReductions
+; CHECK: VPlan after VPlanTransforms::handleMaxMinNumReductions
+; CHECK: VPlan after VPlanTransforms::handleFindLastReductions
+; CHECK: VPlan after VPlanTransforms::convertToAbstractRecipes
+; CHECK: VPlan after VPlanTransforms::createInterleaveGroups
+; CHECK: VPlan after VPlanTransforms::replaceSymbolicStrides
+; CHECK: VPlan after VPlanTransforms::dropPoisonGeneratingRecipes
+; CHECK: VPlan after VPlanTransforms::adjustFixedOrderRecurrences
+; CHECK: VPlan after VPlanTransforms::truncateToMinimalBitwidths
+; CHECK: VPlan after removeRedundantCanonicalIVs
+; CHECK: VPlan after removeRedundantInductionCasts
+; CHECK: VPlan after simplifyRecipes
+; CHECK: VPlan after removeDeadRecipes
+; CHECK: VPlan after simplifyBlends
+; CHECK: VPlan after legalizeAndOptimizeInductions
+; CHECK: VPlan after narrowToSingleScalarRecipes
+; CHECK: VPlan after removeRedundantExpandSCEVRecipes
+; CHECK: VPlan after simplifyRecipes
+; CHECK: VPlan after removeBranchOnConst
+; CHECK: VPlan after removeDeadRecipes
+; CHECK: VPlan after createAndOptimizeReplicateRegions
+; CHECK: VPlan after hoistInvariantLoads
+; CHECK: VPlan after mergeBlocksIntoPredecessors
+; CHECK: VPlan after licm
+; CHECK: VPlan after VPlanTransforms::optimize
+; CHECK: VPlan after VPlanTransforms::materializeConstantVectorTripCount
+; CHECK: VPlan after VPlanTransforms::unrollByUF
+; CHECK: VPlan after VPlanTransforms::materializePacksAndUnpacks
+; CHECK: VPlan after VPlanTransforms::materializeBroadcasts
+; CHECK: VPlan after VPlanTransforms::replicateByVF
+; CHECK: VPlan after printFinalVPlan
+
+; Also verify that VPlans are actually printed (we aren't interested in the
+; exact dump content, just that it's performed):
+
+; CHECK-DUMP:      VPlan after printAfterInitialConstruction
+; CHECK-DUMP-NEXT: VPlan ' for UF>=1' {
+;
+; CHECK-DUMP:      VPlan after VPlanTransforms::optimize
+; CHECK-DUMP-NEXT: VPlan 'Initial VPlan for VF={4},UF>=1' {
+;
+; CHECK-DUMP:      VPlan after printFinalVPlan
+; CHECK-DUMP-NEXT: VPlan 'Final VPlan for VF={4},UF={1}' {
+
+define void @foo(ptr %ptr, i64 %n) {
+entry:
+  br label %header
+
+header:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %header ]
+  %gep = getelementptr i64, ptr %ptr, i64 %iv
+  store i64 %iv, ptr %gep
+  %iv.next = add nsw i64 %iv, 1
+  %exitcond = icmp slt i64 %iv.next, %n
+  br i1 %exitcond, label %header, label %exit
+
+exit:
+  ret void
+}
diff --git a/llvm/unittests/Transforms/Vectorize/VPlanUncountableExitTest.cpp b/llvm/unittests/Transforms/Vectorize/VPlanUncountableExitTest.cpp
index 270537a333b8e..c701382c54e6b 100644
--- a/llvm/unittests/Transforms/Vectorize/VPlanUncountableExitTest.cpp
+++ b/llvm/unittests/Transforms/Vectorize/VPlanUncountableExitTest.cpp
@@ -48,7 +48,7 @@ TEST_F(VPUncountableExitTest, FindUncountableExitRecipes) {
   BasicBlock *LoopHeader = F->getEntryBlock().getSingleSuccessor();
   auto Plan = buildVPlan(LoopHeader, /*HasUncountableExit=*/true);
   VPlanTransforms::tryToConvertVPInstructionsToVPRecipes(*Plan, *TLI);
-  VPlanTransforms::runPass(VPlanTransforms::optimize, *Plan);
+  VPlanTransforms::optimize(*Plan);
 
   SmallVector<VPRecipeBase *> Recipes;
   SmallVector<VPRecipeBase *> GEPs;
@@ -84,7 +84,7 @@ TEST_F(VPUncountableExitTest, NoUncountableExit) {
   BasicBlock *LoopHeader = F->getEntryBlock().getSingleSuccessor();
   auto Plan = buildVPlan(LoopHeader);
   VPlanTransforms::tryToConvertVPInstructionsToVPRecipes(*Plan, *TLI);
-  VPlanTransforms::runPass(VPlanTransforms::optimize, *Plan);
+  VPlanTransforms::optimize(*Plan);
 
   SmallVector<VPRecipeBase *> Recipes;
   SmallVector<VPRecipeBase *> GEPs;

@llvmbot
Copy link
Member

llvmbot commented Jan 29, 2026

@llvm/pr-subscribers-vectorizers

Author: Andrei Elovikov (eas)

Changes

This consists of the following changes:

  • Merge several overloads of VPlanTransforms::runPass into a single
    function
    to avoid code duplication.

  • Add helper macro RUN_VPLAN_PASS to capture the transformation name
    and pass it to the helper above for printing.

  • Add new -vplan-print-after-all option (somewhat similar to existing
    -vplan-verify-each).

  • Add two empty passes printAfterInitialConstruction/printFinalVPlan
    so that initial/final
    VPlans would be supported in -vplan-print-after-all

This follows the original future plans in
#123640.

Re-commit of #175839 after
fixing build without LLVM_ENABLE_DUMP.


Full diff: https://github.com/llvm/llvm-project/pull/178547.diff

7 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+33-35)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+7)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp (+5)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+17-17)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+28-18)
  • (added) llvm/test/Transforms/LoopVectorize/vplan-print-after-all.ll (+67)
  • (modified) llvm/unittests/Transforms/Vectorize/VPlanUncountableExitTest.cpp (+2-2)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index b88779a7828fb..4e056ab30e2a6 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -366,6 +366,10 @@ cl::opt<bool>
                           cl::Hidden,
                           cl::desc("Verfiy VPlans after VPlan transforms."));
 
+cl::opt<bool> llvm::PrintAfterEachVPlanPass(
+    "vplan-print-after-all", cl::init(false), cl::Hidden,
+    cl::desc("Print after each VPlanTransforms::runPass."));
+
 // This flag enables the stress testing of the VPlan H-CFG construction in the
 // VPlan-native vectorization path. It must be used in conjuction with
 // -enable-vplan-native-path. -vplan-verify-hcfg can also be used to enable the
@@ -7388,17 +7392,16 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
     ++LoopsEarlyExitVectorized;
   // TODO: Move to VPlan transform stage once the transition to the VPlan-based
   // cost model is complete for better cost estimates.
-  VPlanTransforms::runPass(VPlanTransforms::unrollByUF, BestVPlan, BestUF);
-  VPlanTransforms::runPass(VPlanTransforms::materializePacksAndUnpacks,
-                           BestVPlan);
-  VPlanTransforms::runPass(VPlanTransforms::materializeBroadcasts, BestVPlan);
-  VPlanTransforms::runPass(VPlanTransforms::replicateByVF, BestVPlan, BestVF);
+  RUN_VPLAN_PASS(VPlanTransforms::unrollByUF, BestVPlan, BestUF);
+  RUN_VPLAN_PASS(VPlanTransforms::materializePacksAndUnpacks, BestVPlan);
+  RUN_VPLAN_PASS(VPlanTransforms::materializeBroadcasts, BestVPlan);
+  RUN_VPLAN_PASS(VPlanTransforms::replicateByVF, BestVPlan, BestVF);
   bool HasBranchWeights =
       hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator());
   if (HasBranchWeights) {
     std::optional<unsigned> VScale = CM.getVScaleForTuning();
-    VPlanTransforms::runPass(VPlanTransforms::addBranchWeightToMiddleTerminator,
-                             BestVPlan, BestVF, VScale);
+    RUN_VPLAN_PASS(VPlanTransforms::addBranchWeightToMiddleTerminator,
+                   BestVPlan, BestVF, VScale);
   }
 
   // Checks are the same for all VPlans, added to BestVPlan only for
@@ -8370,14 +8373,14 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
       // Now optimize the initial VPlan.
       VPlanTransforms::hoistPredicatedLoads(*Plan, PSE, OrigLoop);
       VPlanTransforms::sinkPredicatedStores(*Plan, PSE, OrigLoop);
-      VPlanTransforms::runPass(VPlanTransforms::truncateToMinimalBitwidths,
-                               *Plan, CM.getMinimalBitwidths());
-      VPlanTransforms::runPass(VPlanTransforms::optimize, *Plan);
+      RUN_VPLAN_PASS(VPlanTransforms::truncateToMinimalBitwidths, *Plan,
+                     CM.getMinimalBitwidths());
+      RUN_VPLAN_PASS(VPlanTransforms::optimize, *Plan);
       // TODO: try to put addExplicitVectorLength close to addActiveLaneMask
       if (CM.foldTailWithEVL()) {
-        VPlanTransforms::runPass(VPlanTransforms::addExplicitVectorLength,
-                                 *Plan, CM.getMaxSafeElements());
-        VPlanTransforms::runPass(VPlanTransforms::optimizeEVLMasks, *Plan);
+        RUN_VPLAN_PASS(VPlanTransforms::addExplicitVectorLength, *Plan,
+                       CM.getMaxSafeElements());
+        RUN_VPLAN_PASS(VPlanTransforms::optimizeEVLMasks, *Plan);
       }
       assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
       VPlans.push_back(std::move(Plan));
@@ -8584,18 +8587,15 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
 
   // Apply mandatory transformation to handle reductions with multiple in-loop
   // uses if possible, bail out otherwise.
-  if (!VPlanTransforms::runPass(VPlanTransforms::handleMultiUseReductions,
-                                *Plan))
+  if (!RUN_VPLAN_PASS(VPlanTransforms::handleMultiUseReductions, *Plan))
     return nullptr;
   // Apply mandatory transformation to handle FP maxnum/minnum reduction with
   // NaNs if possible, bail out otherwise.
-  if (!VPlanTransforms::runPass(VPlanTransforms::handleMaxMinNumReductions,
-                                *Plan))
+  if (!RUN_VPLAN_PASS(VPlanTransforms::handleMaxMinNumReductions, *Plan))
     return nullptr;
 
   // Create whole-vector selects for find-last recurrences.
-  if (!VPlanTransforms::runPass(VPlanTransforms::handleFindLastReductions,
-                                *Plan))
+  if (!RUN_VPLAN_PASS(VPlanTransforms::handleFindLastReductions, *Plan))
     return nullptr;
 
   // Transform recipes to abstract recipes if it is legal and beneficial and
@@ -8605,8 +8605,8 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
   if (!CM.foldTailWithEVL()) {
     VPCostContext CostCtx(CM.TTI, *CM.TLI, *Plan, CM, CM.CostKind, CM.PSE,
                           OrigLoop);
-    VPlanTransforms::runPass(VPlanTransforms::convertToAbstractRecipes, *Plan,
-                             CostCtx, Range);
+    RUN_VPLAN_PASS(VPlanTransforms::convertToAbstractRecipes, *Plan, CostCtx,
+                   Range);
   }
 
   for (ElementCount VF : Range)
@@ -8616,24 +8616,23 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
   // Interleave memory: for each Interleave Group we marked earlier as relevant
   // for this VPlan, replace the Recipes widening its memory instructions with a
   // single VPInterleaveRecipe at its insertion point.
-  VPlanTransforms::runPass(VPlanTransforms::createInterleaveGroups, *Plan,
-                           InterleaveGroups, RecipeBuilder,
-                           CM.isScalarEpilogueAllowed());
+  RUN_VPLAN_PASS(VPlanTransforms::createInterleaveGroups, *Plan,
+                 InterleaveGroups, RecipeBuilder, CM.isScalarEpilogueAllowed());
 
   // Replace VPValues for known constant strides.
-  VPlanTransforms::runPass(VPlanTransforms::replaceSymbolicStrides, *Plan, PSE,
-                           Legal->getLAI()->getSymbolicStrides());
+  RUN_VPLAN_PASS(VPlanTransforms::replaceSymbolicStrides, *Plan, PSE,
+                 Legal->getLAI()->getSymbolicStrides());
 
   auto BlockNeedsPredication = [this](BasicBlock *BB) {
     return Legal->blockNeedsPredication(BB);
   };
-  VPlanTransforms::runPass(VPlanTransforms::dropPoisonGeneratingRecipes, *Plan,
-                           BlockNeedsPredication);
+  RUN_VPLAN_PASS(VPlanTransforms::dropPoisonGeneratingRecipes, *Plan,
+                 BlockNeedsPredication);
 
   // Sink users of fixed-order recurrence past the recipe defining the previous
   // value and introduce FirstOrderRecurrenceSplice VPInstructions.
-  if (!VPlanTransforms::runPass(VPlanTransforms::adjustFixedOrderRecurrences,
-                                *Plan, Builder))
+  if (!RUN_VPLAN_PASS(VPlanTransforms::adjustFixedOrderRecurrences, *Plan,
+                      Builder))
     return nullptr;
 
   if (useActiveLaneMask(Style)) {
@@ -8916,7 +8915,7 @@ void LoopVectorizationPlanner::addReductionResultComputation(
   for (VPRecipeBase *R : ToDelete)
     R->eraseFromParent();
 
-  VPlanTransforms::runPass(VPlanTransforms::clearReductionWrapFlags, *Plan);
+  RUN_VPLAN_PASS(VPlanTransforms::clearReductionWrapFlags, *Plan);
 }
 
 void LoopVectorizationPlanner::attachRuntimeChecks(
@@ -9304,7 +9303,7 @@ static void preparePlanForMainVectorLoop(VPlan &MainPlan, VPlan &EpiPlan) {
     VPIRInst->eraseFromParent();
     ResumePhi->eraseFromParent();
   }
-  VPlanTransforms::runPass(VPlanTransforms::removeDeadRecipes, MainPlan);
+  RUN_VPLAN_PASS(VPlanTransforms::removeDeadRecipes, MainPlan);
 
   using namespace VPlanPatternMatch;
   // When vectorizing the epilogue, FindFirstIV & FindLastIV reductions can
@@ -10141,9 +10140,8 @@ bool LoopVectorizePass::processLoop(Loop *L) {
                            BestPlan);
     // TODO: Move to general VPlan pipeline once epilogue loops are also
     // supported.
-    VPlanTransforms::runPass(
-        VPlanTransforms::materializeConstantVectorTripCount, BestPlan, VF.Width,
-        IC, PSE);
+    RUN_VPLAN_PASS(VPlanTransforms::materializeConstantVectorTripCount,
+                   BestPlan, VF.Width, IC, PSE);
     LVP.addMinimumIterationCheck(BestPlan, VF.Width, IC,
                                  VF.MinProfitableTripCount);
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index bc51af00bedd1..9b22c399d7acf 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -912,6 +912,9 @@ bool VPlan::isExitBlock(VPBlockBase *VPBB) {
   return is_contained(ExitBlocks, VPBB);
 }
 
+/// To make RUN_VPLAN_PASS print final VPlan.
+static void printFinalVPlan(VPlan &) {}
+
 /// Generate the code inside the preheader and body of the vectorized loop.
 /// Assumes a single pre-header basic-block was created for this. Introduce
 /// additional basic-blocks as needed, and fill them all.
@@ -933,7 +936,11 @@ void VPlan::execute(VPTransformState *State) {
   LLVM_DEBUG(dbgs() << "Executing best plan with VF=" << State->VF
                     << ", UF=" << getUF() << '\n');
   setName("Final VPlan");
+  // TODO: RUN_VPLAN_PASS/VPlanTransforms::runPass should automatically dump
+  // VPlans after some specific stages when "-debug" is specified, but that
+  // hasn't been implemented yet. For now, just do both:
   LLVM_DEBUG(dump());
+  RUN_VPLAN_PASS(printFinalVPlan, *this);
 
   BasicBlock *ScalarPh = State->CFG.ExitBB;
   VPBasicBlock *ScalarPhVPBB = getScalarPreheader();
diff --git a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
index 4846abdab3c55..bf02c548723de 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
@@ -586,6 +586,9 @@ static void simplifyLiveInsWithSCEV(VPlan &Plan,
   }
 }
 
+/// To make RUN_VPLAN_PASS print initial VPlan.
+static void printAfterInitialConstruction(VPlan &) {}
+
 std::unique_ptr<VPlan>
 VPlanTransforms::buildVPlan0(Loop *TheLoop, LoopInfo &LI, Type *InductionTy,
                              DebugLoc IVDL, PredicatedScalarEvolution &PSE,
@@ -594,6 +597,8 @@ VPlanTransforms::buildVPlan0(Loop *TheLoop, LoopInfo &LI, Type *InductionTy,
   std::unique_ptr<VPlan> VPlan0 = Builder.buildPlainCFG();
   addInitialSkeleton(*VPlan0, InductionTy, IVDL, PSE, TheLoop);
   simplifyLiveInsWithSCEV(*VPlan0, PSE);
+
+  RUN_VPLAN_PASS_NO_VERIFY(printAfterInitialConstruction, *VPlan0);
   return VPlan0;
 }
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 27924ae1b6a36..89b60d4a8e3de 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -2680,23 +2680,23 @@ void VPlanTransforms::removeBranchOnConst(VPlan &Plan) {
 }
 
 void VPlanTransforms::optimize(VPlan &Plan) {
-  runPass(removeRedundantCanonicalIVs, Plan);
-  runPass(removeRedundantInductionCasts, Plan);
-
-  runPass(simplifyRecipes, Plan);
-  runPass(removeDeadRecipes, Plan);
-  runPass(simplifyBlends, Plan);
-  runPass(legalizeAndOptimizeInductions, Plan);
-  runPass(narrowToSingleScalarRecipes, Plan);
-  runPass(removeRedundantExpandSCEVRecipes, Plan);
-  runPass(simplifyRecipes, Plan);
-  runPass(removeBranchOnConst, Plan);
-  runPass(removeDeadRecipes, Plan);
-
-  runPass(createAndOptimizeReplicateRegions, Plan);
-  runPass(hoistInvariantLoads, Plan);
-  runPass(mergeBlocksIntoPredecessors, Plan);
-  runPass(licm, Plan);
+  RUN_VPLAN_PASS(removeRedundantCanonicalIVs, Plan);
+  RUN_VPLAN_PASS(removeRedundantInductionCasts, Plan);
+
+  RUN_VPLAN_PASS(simplifyRecipes, Plan);
+  RUN_VPLAN_PASS(removeDeadRecipes, Plan);
+  RUN_VPLAN_PASS(simplifyBlends, Plan);
+  RUN_VPLAN_PASS(legalizeAndOptimizeInductions, Plan);
+  RUN_VPLAN_PASS(narrowToSingleScalarRecipes, Plan);
+  RUN_VPLAN_PASS(removeRedundantExpandSCEVRecipes, Plan);
+  RUN_VPLAN_PASS(simplifyRecipes, Plan);
+  RUN_VPLAN_PASS(removeBranchOnConst, Plan);
+  RUN_VPLAN_PASS(removeDeadRecipes, Plan);
+
+  RUN_VPLAN_PASS(createAndOptimizeReplicateRegions, Plan);
+  RUN_VPLAN_PASS(hoistInvariantLoads, Plan);
+  RUN_VPLAN_PASS(mergeBlocksIntoPredecessors, Plan);
+  RUN_VPLAN_PASS(licm, Plan);
 }
 
 // Add a VPActiveLaneMaskPHIRecipe and related recipes to \p Plan and replace
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index e0d09a099647a..5af154d73ede6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -16,6 +16,7 @@
 #include "VPlan.h"
 #include "VPlanVerifier.h"
 #include "llvm/ADT/STLFunctionalExtras.h"
+#include "llvm/ADT/ScopeExit.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Compiler.h"
 
@@ -33,28 +34,37 @@ class VPRecipeBuilder;
 struct VFRange;
 
 LLVM_ABI_FOR_TEST extern cl::opt<bool> VerifyEachVPlan;
+LLVM_ABI_FOR_TEST extern cl::opt<bool> PrintAfterEachVPlanPass;
 LLVM_ABI_FOR_TEST extern cl::opt<bool> EnableWideActiveLaneMask;
 
 struct VPlanTransforms {
-  /// Helper to run a VPlan transform \p Transform on \p VPlan, forwarding extra
-  /// arguments to the transform. Returns the boolean returned by the transform.
-  template <typename... ArgsTy>
-  static bool runPass(bool (*Transform)(VPlan &, ArgsTy...), VPlan &Plan,
-                      typename std::remove_reference<ArgsTy>::type &...Args) {
-    bool Res = Transform(Plan, Args...);
-    if (VerifyEachVPlan)
-      verifyVPlanIsValid(Plan);
-    return Res;
-  }
-  /// Helper to run a VPlan transform \p Transform on \p VPlan, forwarding extra
-  /// arguments to the transform.
-  template <typename... ArgsTy>
-  static void runPass(void (*Fn)(VPlan &, ArgsTy...), VPlan &Plan,
-                      typename std::remove_reference<ArgsTy>::type &...Args) {
-    Fn(Plan, Args...);
-    if (VerifyEachVPlan)
-      verifyVPlanIsValid(Plan);
+  /// Helper to run a VPlan pass \p Pass on \p VPlan, forwarding extra arguments
+  /// to the pass. Performs verification/printing after each VPlan pass if
+  /// requested via command line options.
+  template <bool EnableVerify = true, typename PassTy, typename... ArgsTy>
+  static decltype(auto) runPass(StringRef PassName, PassTy &&Pass, VPlan &Plan,
+                                ArgsTy &&...Args) {
+    scope_exit PostTransformActions{[&]() {
+      // Make sure to print before verification, so that output is more useful
+      // in case of failures:
+      if (PrintAfterEachVPlanPass) {
+        dbgs() << "VPlan after " << PassName << '\n';
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+        dbgs() << Plan << '\n';
+#else
+        dbgs() << "LLVM DUMP is disabled!\n";
+#endif
+      }
+      if (VerifyEachVPlan && EnableVerify)
+        verifyVPlanIsValid(Plan);
+    }};
+
+    return std::forward<PassTy>(Pass)(Plan, std::forward<ArgsTy>(Args)...);
   }
+#define RUN_VPLAN_PASS(PASS, ...)                                              \
+  llvm::VPlanTransforms::runPass(#PASS, PASS, __VA_ARGS__)
+#define RUN_VPLAN_PASS_NO_VERIFY(PASS, ...)                                    \
+  llvm::VPlanTransforms::runPass<false>(#PASS, PASS, __VA_ARGS__)
 
   /// Create a base VPlan0, serving as the common starting point for all later
   /// candidates. It consists of an initial plain CFG loop with loop blocks from
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-print-after-all.ll b/llvm/test/Transforms/LoopVectorize/vplan-print-after-all.ll
new file mode 100644
index 0000000000000..96e15740cd413
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/vplan-print-after-all.ll
@@ -0,0 +1,67 @@
+; RUN: opt -passes=loop-vectorize -disable-output -vplan-print-after-all -force-vector-width=4 -vplan-verify-each < %s 2>&1 | FileCheck %s --implicit-check-not "VPlan after"
+; RUN: opt -passes=loop-vectorize -disable-output -vplan-print-after-all -force-vector-width=4 -vplan-verify-each < %s 2>&1 | FileCheck %s --check-prefix CHECK-DUMP
+; REQUIRES: asserts
+
+; Verify that `-vplan-print-after-all` option works.
+
+; CHECK: VPlan after printAfterInitialConstruction
+; CHECK: VPlan after VPlanTransforms::clearReductionWrapFlags
+; CHECK: VPlan after VPlanTransforms::handleMultiUseReductions
+; CHECK: VPlan after VPlanTransforms::handleMaxMinNumReductions
+; CHECK: VPlan after VPlanTransforms::handleFindLastReductions
+; CHECK: VPlan after VPlanTransforms::convertToAbstractRecipes
+; CHECK: VPlan after VPlanTransforms::createInterleaveGroups
+; CHECK: VPlan after VPlanTransforms::replaceSymbolicStrides
+; CHECK: VPlan after VPlanTransforms::dropPoisonGeneratingRecipes
+; CHECK: VPlan after VPlanTransforms::adjustFixedOrderRecurrences
+; CHECK: VPlan after VPlanTransforms::truncateToMinimalBitwidths
+; CHECK: VPlan after removeRedundantCanonicalIVs
+; CHECK: VPlan after removeRedundantInductionCasts
+; CHECK: VPlan after simplifyRecipes
+; CHECK: VPlan after removeDeadRecipes
+; CHECK: VPlan after simplifyBlends
+; CHECK: VPlan after legalizeAndOptimizeInductions
+; CHECK: VPlan after narrowToSingleScalarRecipes
+; CHECK: VPlan after removeRedundantExpandSCEVRecipes
+; CHECK: VPlan after simplifyRecipes
+; CHECK: VPlan after removeBranchOnConst
+; CHECK: VPlan after removeDeadRecipes
+; CHECK: VPlan after createAndOptimizeReplicateRegions
+; CHECK: VPlan after hoistInvariantLoads
+; CHECK: VPlan after mergeBlocksIntoPredecessors
+; CHECK: VPlan after licm
+; CHECK: VPlan after VPlanTransforms::optimize
+; CHECK: VPlan after VPlanTransforms::materializeConstantVectorTripCount
+; CHECK: VPlan after VPlanTransforms::unrollByUF
+; CHECK: VPlan after VPlanTransforms::materializePacksAndUnpacks
+; CHECK: VPlan after VPlanTransforms::materializeBroadcasts
+; CHECK: VPlan after VPlanTransforms::replicateByVF
+; CHECK: VPlan after printFinalVPlan
+
+; Also verify that VPlans are actually printed (we aren't interested in the
+; exact dump content, just that it's performed):
+
+; CHECK-DUMP:      VPlan after printAfterInitialConstruction
+; CHECK-DUMP-NEXT: VPlan ' for UF>=1' {
+;
+; CHECK-DUMP:      VPlan after VPlanTransforms::optimize
+; CHECK-DUMP-NEXT: VPlan 'Initial VPlan for VF={4},UF>=1' {
+;
+; CHECK-DUMP:      VPlan after printFinalVPlan
+; CHECK-DUMP-NEXT: VPlan 'Final VPlan for VF={4},UF={1}' {
+
+define void @foo(ptr %ptr, i64 %n) {
+entry:
+  br label %header
+
+header:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %header ]
+  %gep = getelementptr i64, ptr %ptr, i64 %iv
+  store i64 %iv, ptr %gep
+  %iv.next = add nsw i64 %iv, 1
+  %exitcond = icmp slt i64 %iv.next, %n
+  br i1 %exitcond, label %header, label %exit
+
+exit:
+  ret void
+}
diff --git a/llvm/unittests/Transforms/Vectorize/VPlanUncountableExitTest.cpp b/llvm/unittests/Transforms/Vectorize/VPlanUncountableExitTest.cpp
index 270537a333b8e..c701382c54e6b 100644
--- a/llvm/unittests/Transforms/Vectorize/VPlanUncountableExitTest.cpp
+++ b/llvm/unittests/Transforms/Vectorize/VPlanUncountableExitTest.cpp
@@ -48,7 +48,7 @@ TEST_F(VPUncountableExitTest, FindUncountableExitRecipes) {
   BasicBlock *LoopHeader = F->getEntryBlock().getSingleSuccessor();
   auto Plan = buildVPlan(LoopHeader, /*HasUncountableExit=*/true);
   VPlanTransforms::tryToConvertVPInstructionsToVPRecipes(*Plan, *TLI);
-  VPlanTransforms::runPass(VPlanTransforms::optimize, *Plan);
+  VPlanTransforms::optimize(*Plan);
 
   SmallVector<VPRecipeBase *> Recipes;
   SmallVector<VPRecipeBase *> GEPs;
@@ -84,7 +84,7 @@ TEST_F(VPUncountableExitTest, NoUncountableExit) {
   BasicBlock *LoopHeader = F->getEntryBlock().getSingleSuccessor();
   auto Plan = buildVPlan(LoopHeader);
   VPlanTransforms::tryToConvertVPInstructionsToVPRecipes(*Plan, *TLI);
-  VPlanTransforms::runPass(VPlanTransforms::optimize, *Plan);
+  VPlanTransforms::optimize(*Plan);
 
   SmallVector<VPRecipeBase *> Recipes;
   SmallVector<VPRecipeBase *> GEPs;

@eas
Copy link
Contributor Author

eas commented Jan 29, 2026

@fhahn , I've reproduced the failure locally using release build and verified that the second commit in this PR fixes it.

Comment on lines +369 to +372
cl::opt<bool> llvm::PrintAfterEachVPlanPass(
"vplan-print-after-all", cl::init(false), cl::Hidden,
cl::desc("Print after each VPlanTransforms::runPass."));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we wrap the entire cl::opt in if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) so the flag isn't available without dumping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but those don't seem to be guarded:

// Print IR out before/after specified passes.
static cl::list<std::string>
PrintBefore("print-before",
llvm::cl::desc("Print IR before specified passes"),
cl::CommaSeparated, cl::Hidden);
static cl::list<std::string>
PrintAfter("print-after", llvm::cl::desc("Print IR after specified passes"),
cl::CommaSeparated, cl::Hidden);
static cl::opt<bool> PrintBeforeAll("print-before-all",
llvm::cl::desc("Print IR before each pass"),
cl::init(false), cl::Hidden);
static cl::opt<bool> PrintAfterAll("print-after-all",
llvm::cl::desc("Print IR after each pass"),
cl::init(false), cl::Hidden);

and I thought I should follow the same pattern here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But LLVM IR is always printable so it will work even in release builds without LLVM_ENABLE_DUMP. If we just remove the option when it's not available then you don't need to worry about the dbgs() << "LLVM DUMP is disabled!\n"; error message

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants