
feat(plans): add Auto-Cleanup Architecture Plan for resolver record normalization#1664

Open
djstrong wants to merge 6 commits into main from
1061-enable-auto-cleanup-of-resolver-record-fields

Conversation

@djstrong
Contributor

Lite PR

Tip: Review docs on the ENSNode PR process

Summary

  • What changed (1-3 bullets, no essays).

Why

  • Why this change exists. Link to related GitHub issues where relevant.

Testing

  • How this was tested.
  • If you didn't test it, say why.

Notes for Reviewer (Optional)

  • Anything non-obvious or worth a heads-up.

Pre-Review Checklist (Blocking)

  • This PR does not introduce significant changes and is low-risk to review quickly.
  • Relevant changesets are included (or are not required)

Copilot AI review requested due to automatic review settings February 23, 2026 15:17
@vercel
Contributor

vercel bot commented Feb 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| admin.ensnode.io | Ready | Preview, Comment | Feb 25, 2026 0:18am |
| ensnode.io | Ready | Preview, Comment | Feb 25, 2026 0:18am |
| ensrainbow.io | Ready | Preview, Comment | Feb 25, 2026 0:18am |

@changeset-bot

changeset-bot bot commented Feb 23, 2026

⚠️ No Changeset found

Latest commit: c3c1572

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


Contributor

Copilot AI left a comment


Pull request overview

Adds a new architecture plan document describing a phased “resolver record normalization” feature, including proposed in-memory normalization types/ops (Phase 1) and planned API integration parameters/response shape (Phase 3).

Changes:

  • Introduces a detailed spec for key/value normalization operations, record-set consolidation, and key expansion.
  • Defines an initial registry of recognized text record keys and per-key validation/normalization rules.
  • Outlines API query params and response metadata strategy for integration.


Comment on lines 167 to 169
/** Key recognized, value attempted but could not be normalized — excluded from clean output */
| "UnnormalizableValue"
/** Another record already claimed this normalized key (lower priority variant) */

Copilot AI Feb 23, 2026


The spec marks UnnormalizableValue as "excluded from clean output", and later states records.texts always uses the stripped output when normalize=true. That effectively answers Open Question #3 (keys with no normalizable candidate will be omitted), but the question remains listed as unresolved. Either update the stripped-output behavior to match the desired semantics (omit vs null) or remove the question and document the chosen behavior explicitly.
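A minimal TypeScript sketch of the stripped-output behavior under discussion. The type and function names mirror the quoted spec (`stripNormalizationMetadata`, `NormalizedRecordSet` members), but the shapes here are inferred from the review comments, not the plan's actual code; the "omit" branch encodes the reading this comment says would resolve Open Question #3:

```typescript
// Hypothetical shapes inferred from the spec discussion above.
type ValueNormalizationOp =
  | "AlreadyNormalized"
  | "Normalized"
  | "UnnormalizableValue"
  | "UnrecognizedKeyAndValue";

interface NormalizedRecord {
  rawKey: string;
  rawValue: string | null;
  op: ValueNormalizationOp;
  normalizedKey?: string;
  normalizedValue?: string;
}

/** Strip metadata, keeping only clean key/value pairs. Under the
 *  "excluded from clean output" reading, UnnormalizableValue records
 *  are omitted entirely rather than emitted as null. */
function stripNormalizationMetadata(
  records: NormalizedRecord[],
): Record<string, string | null> {
  const out: Record<string, string | null> = {};
  for (const r of records) {
    switch (r.op) {
      case "AlreadyNormalized":
      case "Normalized":
        out[r.normalizedKey ?? r.rawKey] = r.normalizedValue ?? r.rawValue;
        break;
      case "UnrecognizedKeyAndValue":
        out[r.rawKey] = r.rawValue; // passed through as-is, null included
        break;
      case "UnnormalizableValue":
        break; // excluded from clean output — the "omit" answer to Question #3
    }
  }
  return out;
}
```

Whichever semantics (omit vs. null) the plan settles on, this is the single switch arm that would change.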

- Updated `url` type to allow null values when no URL can be derived.
- Improved clarity in `Unnormalizable` and `Unrecognized` operations to include handling of null values and reasons for normalization failures.
- Refined comments and invariants for better understanding of normalization logic.
@coderabbitai
Contributor

coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

Adds a new Architecture Specification document defining Phase 1 in-memory data models and a three-layer Resolver Record Normalization workflow, pre-resolution key expansion, resolver integration paths, Phase 3 API considerations, a nine-key transformation spec, registry invariants, and an open questions section.

Changes

Cohort / File(s) Summary
Architecture Specification
.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md
New comprehensive architecture plan (~554 lines) introducing Resolver Record Normalization: Phase 1 pure in-memory data model (TextRecordNormalizationDefs, TextRecordKeyDef), detailed per-key/per-value transformation types and results, three-layer normalization pipeline (per-record normalization, set-level winner selection, stripped output), pre-resolution key expansion rules, resolver integration (indexed vs RPC), Phase 3 API/response design (optional normalizationMetadata), nine-key initial transformation set, normalization invariants, normalization algorithms (two-pass set-level), and an open-questions list. Notes existing prototype code will be removed prior to implementation.

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Client
  participant ResolverService
  participant Normalizer
  participant Registry
  participant IndexStore

  Client->>ResolverService: resolve(name, normalize=true)
  ResolverService->>Registry: lookup(resolverPath)
  Registry-->>ResolverService: resolverConfig
  ResolverService->>IndexStore: fetchRawRecords(resolverConfig)
  IndexStore-->>ResolverService: rawRecordSets
  ResolverService->>Normalizer: normalize(rawRecordSets, normalizationDefs)
  Normalizer-->>ResolverService: normalizedRecords (layer1->layer2->layer3)
  ResolverService-->>Client: response{records, normalizationMetadata?}
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibbled through specs with curious delight,
Nine keys gleam under moon and morning light.
Layers of logic, records sorted and seen,
Winners and losers, neat and serene.
A rabbit's hop—one plan, tidy and bright!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is entirely the template structure with no substantive content; all required sections (Summary, Why, Testing) remain blank and unfilled. | Fill in the Summary, Why, and Testing sections with specific details about the architecture plan, its rationale, and how it was reviewed. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and concisely describes the main change—adding an Architecture Plan for resolver record normalization—and aligns well with the changeset content. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |




Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md:
- Around line 226-231: The docstring for stripNormalizationMetadata currently
omits what to do when an UnrecognizedKeyAndValue record has a null rawValue;
update the spec and implementation to explicitly include such keys with a null
value (i.e., produce { [rawKey]: null }) rather than omitting them. Modify the
description for stripNormalizationMetadata (and any mentions of
NormalizedRecordSet / UnrecognizedKeyAndValue) to state this behavior, and
ensure the function logic that iterates NormalizedRecordSet (referencing
stripNormalizationMetadata and ValueNormalizationResult) adds entries for
unrecognized rawKey even when rawValue === null, consistent with the declared
return type Record<string, string | null>.
- Around line 155-157: Clarify that AlreadyNormalized is determined by comparing
the normalized output to the original raw input: update the sentence around the
`rawValue` string path so it explicitly says "AlreadyNormalized iff
`normalizedValue === rawValue` (i.e., `validate` + `normalize` produced a value
identical to the original `rawValue`); otherwise treat it as `Normalized` when
the normalized value differs; on failure return `Unnormalizable` with the
failing reason." Reference `ValueNormalizationResult`, `rawValue`,
`normalizedValue`, `validate`, `normalize`, `AlreadyNormalized`, `Normalized`,
and `Unnormalizable` to make the baseline unambiguous.
- Around line 272-277: Add a formal TypeScript type for
TextRecordNormalizationRegistry matching the prose (two maps) so the three
functions normalizeRecord, normalizeRecordSet, and expandNormalizedKeys have a
concrete type to reference; define it as an exported type named
TextRecordNormalizationRegistry with byNormalizedKey: Map<string,
TextRecordKeyDef> and byUnnormalizedKey: Map<string, TextRecordKeyDef>, place it
alongside the other type declarations (near TextRecordKeyDef), and update any
function signatures/imports to use this new type.
- Line 86: The spec currently conflicts when normalizeRecord produces an
Unnormalizable (e.g., rawValue === null or invalid) and the rawKey ===
normalizedKey rule that “wins unconditionally” — change the priority algorithm
to a two-pass ordering: first consider only records where validate+normalize
succeeds (i.e., not Unnormalizable), ordering those by rawKey === normalizedKey
first then by unnormalizedKeys order; only if no normalizable records exist
consider the Unnormalizable records, ordered by the same key-priority rule, so
that null/invalid values cannot beat a usable normalized value; update the logic
in normalizeRecord selection and any code paths that implement the old
unconditional rawKey === normalizedKey rule (references: normalizeRecord,
rawKey, normalizedKey, rawValue, Unnormalizable, validate+normalize,
unnormalizedKeys) to implement this two-pass behavior.
- Around line 485-488: Change the default for the `normalize` parameter from
true to false to avoid silently changing response shape for existing callers of
GET /records/:name; update the parameter table so `normalize` default is
`false`, keep `normalizationMetadata` documented as only meaningful when
`normalize=true`, and add a short migration note advising a rollout plan (e.g.,
versioned endpoint or opt-in header and deprecation window) so integrators can
explicitly opt in before normalized responses are enabled.
- Line 334: The Farcaster/Warpcast entry currently requires scheme-prefixed URLs
("https://warpcast.com/alice", "http://warpcast.com/alice") while Twitter and
Telegram accept bare-domain formats; update the Warpcast handling so the
Warpcast URL rule accepts bare-domain forms like "warpcast.com/alice" (in
addition to scheme-prefixed) by normalizing/validating inputs without a scheme
in the same way as Twitter/Telegram, or if intentional, add an explicit note in
the plan next to the "Warpcast URL" entry stating this is a deliberate design
decision and why.
- Line 522: Fix the malformed Markdown in open question `#4` by moving the
backtick so the bold+code label renders correctly: change the segment that
currently shows `**displayValue` to either ``**`displayValue`**`` or
`**displayValue**` (depending on whether you want code formatting inside bold)
and ensure `url` is formatted similarly; confirm the sentence references to
records.texts and normalizationMetadata remain plain text (no stray backticks)
so the question reads: "Should these enrichment fields be part of records.texts
(when normalize=true) or only inside normalizationMetadata?" and update the open
question label to correctly show "#4." with proper Markdown syntax.
- Around line 181-190: Replace the broad RecordNormalizationResult object type
with a discriminated union keyed on the op field (RecordNormalizationResult) so
consumers get compile-time guarantees: define one variant where op: "Normalized"
(include individual: IndividualRecordNormalizationResult plus normalizedKey,
normalizedValue, displayKey, displayValue, and url), and one or more variants
for other op values (e.g., op: "Unnormalizable" | "Unchanged") that only include
individual and the op literal; keep RecordNormalizationOp as the discriminant
source and update code to narrow by result.op (switch/if) when accessing
normalized* or display* fields.
- Line 316: The username validation regex currently used (the pattern starting
with ^[a-zA-Z0-9]([a-zA-Z0-9-]{0,37}[a-zA-Z0-9])?$) allows consecutive hyphens
like "alice--bob"; update the validation to disallow consecutive hyphens by
adding a negative lookahead for "--" (e.g., ensure the pattern starts with a
check that no two hyphens appear adjacently) while preserving the existing
length and start/end character rules so the "extracted username" validation
enforces 1–39 chars, alphanumeric and hyphens, no leading/trailing hyphen, and
no consecutive hyphens.
- Around line 354-360: Update the "New format" and "Legacy format" regexes:
replace the New format regex `^[a-z0-9_.]{2,32}$` with one that forbids
consecutive periods, e.g. `^(?!.*\.\.)[a-z0-9_.]{2,32}$` (reference: "New
format" line), and replace the Legacy format regex `^.{2,32}#[0-9]{4}$` with one
that restricts the username to printable characters and anchors properly, e.g.
`^[^\x00-\x1F\x7F]{2,32}#[0-9]{4}$` (reference: "Legacy format" line).
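Several of the comments above converge on the same two pieces: a concrete `TextRecordNormalizationRegistry` type and a two-pass winner selection in which an unnormalizable record can never beat a normalizable one. A TypeScript sketch under those assumptions (field and function names mirror the review comments, not the plan's actual code; candidates are assumed to be drawn from the def's own keys):

```typescript
// Hypothetical types mirroring the review comments above.
interface TextRecordKeyDef {
  normalizedKey: string;
  unnormalizedKeys: string[]; // fallback variants, in priority order
}

type TextRecordNormalizationRegistry = {
  byNormalizedKey: Map<string, TextRecordKeyDef>;
  byUnnormalizedKey: Map<string, TextRecordKeyDef>;
};

interface CandidateRecord {
  rawKey: string;
  normalizable: boolean; // validate + normalize succeeded
}

/** Pass 1: only normalizable records compete; Pass 2 runs only if Pass 1
 *  is empty — so a null/invalid value can never beat a usable one, while
 *  rawKey === normalizedKey still outranks every variant within a pass. */
function selectWinner(
  def: TextRecordKeyDef,
  candidates: CandidateRecord[],
): CandidateRecord | undefined {
  const priority = (r: CandidateRecord) =>
    r.rawKey === def.normalizedKey
      ? -1 // canonical key outranks every variant
      : def.unnormalizedKeys.indexOf(r.rawKey);
  const byPriority = (a: CandidateRecord, b: CandidateRecord) =>
    priority(a) - priority(b);

  const pass1 = candidates.filter((r) => r.normalizable).sort(byPriority);
  if (pass1.length > 0) return pass1[0];
  return [...candidates].sort(byPriority)[0]; // Pass 2: best of the failures
}
```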

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9bffd55 and cd22b13.

📒 Files selected for processing (1)
  • .cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md

Comment on lines +485 to +488
| Parameter | Type | Default | Description |
| ----------------------- | ------- | ------- | -------------------------------------------------------------------------------------------------------------- |
| `normalize` | boolean | `true` | Normalize keys and values. If true, expand keys pre-resolution and run normalization pipeline post-resolution. |
| `normalizationMetadata` | boolean | `false` | Include the full `NormalizedRecordSet` metadata in the response. Only meaningful when `normalize=true`. |

🧹 Nitpick | 🔵 Trivial

normalize=true default is a silent breaking change for existing API consumers.

Defaulting normalize to true means any existing caller of GET /records/:name that doesn't opt in or out explicitly will suddenly receive normalized keys and values — a different shape than what they were getting before. Consider defaulting to false for the initial rollout, or explicitly describe a versioning/migration strategy (e.g., a new endpoint version, a Content-Type negotiation header, or a deprecation window) so existing integrators aren't broken silently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md around lines
485 - 488, Change the default for the `normalize` parameter from true to false
to avoid silently changing response shape for existing callers of GET
/records/:name; update the parameter table so `normalize` default is `false`,
keep `normalizationMetadata` documented as only meaningful when
`normalize=true`, and add a short migration note advising a rollout plan (e.g.,
versioned endpoint or opt-in header and deprecation window) so integrators can
explicitly opt in before normalized responses are enabled.
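The opt-in default recommended above can be sketched in a few lines. `parseRecordsQuery` is an illustrative helper, not part of the plan; the point is that `normalize` defaults to false so existing callers keep the current response shape, and `normalizationMetadata` is forced off unless normalization is requested:

```typescript
// Illustrative query-param parsing with the reviewer-recommended default.
interface RecordsQuery {
  normalize: boolean;
  normalizationMetadata: boolean;
}

function parseRecordsQuery(params: URLSearchParams): RecordsQuery {
  const normalize = params.get("normalize") === "true"; // opt-in, default false
  return {
    normalize,
    // Only meaningful when normalize=true; forced off otherwise.
    normalizationMetadata:
      normalize && params.get("normalizationMetadata") === "true",
  };
}
```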

- Introduced `TextRecordNormalizationDefs` to streamline key lookups with two maps for normalized and unnormalized keys.
- Updated normalization functions to utilize the new definitions, improving clarity and efficiency.
- Enhanced comments and invariants to better explain normalization operations and priority rules for handling records.
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (2)
.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md (2)

507-510: normalize=true default is a silent breaking change for existing API consumers.

Existing callers of GET /records/:name who don't explicitly pass normalize will silently receive a different response shape (normalized keys, normalized values). Consider defaulting to false until a versioning or migration strategy (e.g., new endpoint version, opt-in header, or explicit deprecation window) is in place.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md around lines
507 - 510, The default of normalize=true causes a silent breaking change for
existing API consumers of GET /records/:name; change the default for the
`normalize` parameter to false (and keep `normalizationMetadata` behavior tied
to `normalize=true`) or alternatively implement an explicit opt-in (e.g., a new
API version, query param/opt-in header) and add deprecation notes; update any
validation/default logic that sets `normalize` (and related docs/comments) so
existing callers keep the current response shape unless they explicitly opt in
to normalization.

91-91: ⚠️ Potential issue | 🟡 Minor

Inaccurate invariant: "loses to any record with a non-null rawValue" contradicts the two-pass algorithm.

A non-null rawValue that fails validation is also Unnormalizable and therefore also falls into Pass 2 — competing on equal footing with a null-rawValue record, not beating it. The statement should say "always loses to any record with a normalizable value (op AlreadyNormalized or Normalized)."

📝 Suggested fix
-The key is still recognized; the record participates in the set-level priority contest but always loses to any record with a non-null rawValue.
+The key is still recognized; the record participates in the set-level priority contest but always loses to any record with a normalizable value (op `AlreadyNormalized` or `Normalized`). A non-null rawValue that fails validation is also Unnormalizable and competes in Pass 2 with null-rawValue records, not ahead of them.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md at line 91,
Update the inaccurate invariant text: change the claim that a null rawValue
record "always loses to any record with a non-null rawValue" to state it "always
loses to any record with a normalizable value (op `AlreadyNormalized` or
`Normalized`)". Reference the `normalizeRecord` behavior and the two-pass
algorithm (Pass 1 vs Pass 2) and clarify that non-null rawValues that validate
as `Unnormalizable` participate in Pass 2 and therefore compete equally with
null rawValue records (both `Unnormalizable`), rather than automatically losing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md:
- Around line 455-457: Add a brief note next to the email validation and
canonicalization lines explaining that lowercasing the full email (the
"Canonical form: lowercased email address." statement) is an intentional
deviation from RFC 5321 which treats the local-part as case-sensitive; reference
the validation pattern `^[^\s@]+@[^\s@]+\.[^\s@]+$` and state succinctly that
this normalization is pragmatic because virtually all real-world mail servers
treat the local-part as case-insensitive, but it may theoretically mismatch
strictly case-sensitive mailboxes.
- Around line 260-266: The function expandNormalizedKeys has an implicit
precondition that every entry in the normalizedKeys parameter must be a
canonical key present in defs; make this explicit and either enforce it: update
the expandNormalizedKeys docstring to state "Precondition: every element of
normalizedKeys must be a valid normalized key defined in defs (caller error
otherwise)", and add a runtime sanity check at the start of expandNormalizedKeys
that validates each normalizedKey against defs (collect invalid keys and throw a
clear TypeError listing them), or if you prefer no runtime cost, at minimum add
the docstring/JSdoc note and consider adding an assert helper; refer to the
expandNormalizedKeys function and its normalizedKeys/defs parameters when making
the change.
- Around line 399-401: The spec currently accepts bare-domain "t.me/alice" but
not the equivalent bare-domain "telegram.me/alice"; update the
accepted-inputs/validation for Telegram links so that "telegram.me/alice" (no
scheme) is treated the same as "t.me/alice". Locate the section listing "- t.me
URL: `https://t.me/alice`, `http://t.me/alice`, `t.me/alice`" and the line "-
telegram.me URL: `https://telegram.me/alice`" and modify the pattern/allowed
list or regex used there to include the bare-domain form "telegram.me/alice"
(and ensure the validation accepts optional http/https variants as already done
for t.me).
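The Telegram point above reduces to one pattern that treats both hosts uniformly. A sketch (the 5–32 character username rule is an assumption about Telegram's limits, not something the plan states):

```typescript
// Illustrative Telegram rule: bare-domain telegram.me/alice accepted the
// same way as t.me/alice, with optional http/https on either host.
const TELEGRAM_URL =
  /^(?:https?:\/\/)?(?:t\.me|telegram\.me)\/([a-zA-Z0-9_]{5,32})$/;

function extractTelegramUsername(value: string): string | null {
  const m = TELEGRAM_URL.exec(value.trim());
  return m ? m[1] : null; // null when the value is not a recognized link
}
```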

---

Duplicate comments:
In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md:
- Around line 507-510: The default of normalize=true causes a silent breaking
change for existing API consumers of GET /records/:name; change the default for
the `normalize` parameter to false (and keep `normalizationMetadata` behavior
tied to `normalize=true`) or alternatively implement an explicit opt-in (e.g., a
new API version, query param/opt-in header) and add deprecation notes; update
any validation/default logic that sets `normalize` (and related docs/comments)
so existing callers keep the current response shape unless they explicitly opt
in to normalization.
- Line 91: Update the inaccurate invariant text: change the claim that a null
rawValue record "always loses to any record with a non-null rawValue" to state
it "always loses to any record with a normalizable value (op `AlreadyNormalized`
or `Normalized`)". Reference the `normalizeRecord` behavior and the two-pass
algorithm (Pass 1 vs Pass 2) and clarify that non-null rawValues that validate
as `Unnormalizable` participate in Pass 2 and therefore compete equally with
null rawValue records (both `Unnormalizable`), rather than automatically losing.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd22b13 and c9b1d62.

📒 Files selected for processing (1)
  • .cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md

Comment on lines +455 to +457
**Validation**: must match `^[^\s@]+@[^\s@]+\.[^\s@]+$` (basic structural check; full RFC 5322 compliance is not required).

**Canonical form**: lowercased email address.

⚠️ Potential issue | 🟡 Minor

Document the deliberate RFC deviation for email local-part lowercasing.

RFC 5321 treats the local part (before @) as case-sensitive. Lowercasing the full email address is a pragmatic normalization choice but could theoretically mismatch a server that distinguishes Alice@example.com from alice@example.com. The spec already disclaims full RFC 5322 compliance; add a brief note here that local-part case normalization is an intentional deviation and that real-world mail servers virtually universally treat it as case-insensitive.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md around lines
455 - 457, Add a brief note next to the email validation and canonicalization
lines explaining that lowercasing the full email (the "Canonical form:
lowercased email address." statement) is an intentional deviation from RFC 5321
which treats the local-part as case-sensitive; reference the validation pattern
`^[^\s@]+@[^\s@]+\.[^\s@]+$` and state succinctly that this normalization is
pragmatic because virtually all real-world mail servers treat the local-part as
case-insensitive, but it may theoretically mismatch strictly case-sensitive
mailboxes.

- Updated `normalizeRecord` to return an `IndividualRecordNormalizationResult` for null values, clarifying its treatment in the normalization process.
- Refined the `normalizedRecords` and `unnormalizedRecords` types to ensure only relevant operation values are included.
- Enhanced comments to provide clearer explanations of the priority rules and preconditions for key normalization.
@djstrong djstrong marked this pull request as ready for review February 23, 2026 18:07
@djstrong djstrong requested a review from a team as a code owner February 23, 2026 18:07
Copilot AI review requested due to automatic review settings February 23, 2026 18:07
@djstrong djstrong marked this pull request as draft February 23, 2026 18:08
@greptile-apps
Contributor

greptile-apps bot commented Feb 23, 2026

Greptile Summary

This PR adds a comprehensive architecture plan for ENS resolver record normalization, documenting a three-phase approach to validate and normalize resolver records across different social platform keys.

Key additions:

  • Detailed type system for key/value normalization with distinct operations (AlreadyNormalized, Normalized, Unnormalizable, Unrecognized)
  • Two-pass priority algorithm for handling multiple records mapping to the same normalized key
  • Validation specifications for 9 common platforms (Twitter/X, GitHub, Farcaster, Discord, Telegram, Reddit, URL, Email, Avatar)
  • API integration design with normalize and normalizationMetadata query parameters

Issues found:

  • Several validation regex patterns have logical errors (GitHub consecutive hyphens, Farcaster trailing hyphens)
  • Type inconsistencies in return values and edge case handling
  • Multiple unresolved design decisions flagged in "Open questions" section that affect the public API contract
  • Some platform validation rules may not match current official constraints

Confidence Score: 3/5

  • Safe to merge as a planning document, but requires resolution of open questions and validation fixes before implementation
  • This is an architecture plan document, not production code. While it contains logical errors in validation patterns and unresolved design decisions, it serves its purpose as a planning artifact. The critical issues identified need resolution before implementation begins.
  • The plan document requires attention to resolve the 7 open questions listed at the end, plus fixing the validation regex patterns for GitHub and Farcaster before implementation

Important Files Changed

Filename Overview
.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md Architecture plan for ENS resolver record normalization with comprehensive type definitions and validation specs; contains several ambiguous design decisions and validation pattern issues

Last reviewed commit: c3c1572

Contributor

@greptile-apps greptile-apps bot left a comment


1 file reviewed, 7 comments


- reddit.com URL: `https://reddit.com/u/alice`, `https://www.reddit.com/u/alice`, `https://reddit.com/user/alice`

**Validation**: extracted username must match `^[a-zA-Z0-9_-]{3,20}$`.


The regex ^[^\x00-\x1F\x7F]{2,32}#[0-9]{4}$ allows # within the username portion (before the discriminator), which could cause parsing ambiguity if a username contains #.

Suggested change
**Validation**: must match `^[^\x00-\x1F\x7F#]{2,32}#[0-9]{4}$` (printable characters only, no `#` in username before the discriminator).
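A quick sanity check of the tightened legacy-Discord pattern suggested above (the pattern comes from the suggested change, not from the plan itself):

```typescript
// Legacy Discord tag: 2–32 printable chars, no "#" in the username
// portion, then "#" and a 4-digit discriminator.
const LEGACY_DISCORD = /^[^\x00-\x1F\x7F#]{2,32}#[0-9]{4}$/;
```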


**Validation**: must begin with one of the recognized prefixes (`https://`, `http://`, `ipfs://`, `eip155:`, `data:image/`).

**Canonical form**: value as-is (no transformation applied).

The Reddit username pattern allows 3–20 characters, which matches the commonly documented limit, but verify this against Reddit's current username policy.

For the RPC path, ENS resolution already uses a multicall pattern: all record lookups for a given name are batched into a single `eth_call`. Key expansion therefore does **not** require extra round trips — the expanded key list is simply added to the same multicall batch. The overall overhead is one additional text slot per unnormalized variant per expanded key, within the same single RPC call.

### Parameters
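The multicall claim in the quoted paragraph can be sketched as follows. Everything here is illustrative (the injected `multicall` callback stands in for the real batched `eth_call`; `resolveTexts` and `expandKeys` are hypothetical names): the key expansion happens before the batch is built, so variants ride along in the single call rather than triggering follow-up requests.

```typescript
// Sketch: expanded fallback keys join the same multicall as every other
// record lookup, so normalization adds no extra round trips.
type TextCall = { node: string; key: string };

async function resolveTexts(
  node: string,
  canonicalKeys: string[],
  expandKeys: (key: string) => string[], // canonical key -> [canonical, ...variants]
  multicall: (calls: TextCall[]) => Promise<(string | null)[]>,
): Promise<Map<string, string | null>> {
  // Expansion happens before the batch is built, never as a follow-up call.
  const allKeys = canonicalKeys.flatMap(expandKeys);
  const results = await multicall(allKeys.map((key) => ({ node, key })));
  return new Map(allKeys.map((key, i) => [key, results[i] ?? null]));
}
```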


Email validation regex `^[^\s@]+@[^\s@]+\.[^\s@]+$` is overly permissive and accepts invalid addresses such as `a@b..c` (consecutive dots in the domain). Consider using a more robust pattern or a proper email validation library.
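One way to tighten the structural check flagged above while staying well short of full RFC 5322 validation. The stricter pattern is illustrative, not the plan's; it forbids empty domain labels and requires a final label (TLD) of at least 2 characters:

```typescript
// Loose pattern from the plan vs. an illustrative tightened version.
const EMAIL_LOOSE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
const EMAIL_STRICTER = /^[^\s@]+@[^\s@.]+(?:\.[^\s@.]+)*\.[^\s@.]{2,}$/;
```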

3. **RPC path**: all candidate keys are included in the same multicall batch used to resolve all other records. No additional RPC round trips are incurred.
4. The resolved raw records (potentially including fallback key variants) are then passed through the normalization pipeline.

### Response shape

Avatar validation only checks prefixes but doesn't validate the actual URI structure. For example, ipfs:// without a valid CID would pass. Consider adding format validation for each URI type.


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (2)
.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md (2)

518-520: normalize=true default silently breaks existing API consumers.

Any existing caller of GET /records/:name that doesn't explicitly opt in or out will receive a different response shape (normalized keys/values) after this rolls out. Consider defaulting to false for the initial rollout and providing a migration/versioning note.


466-468: Missing RFC 5321 deviation note for email local-part lowercasing.

RFC 5321 treats the local part (before @) as case-sensitive, but the spec silently lowercases the entire address. The pragmatic rationale (virtually all real-world servers are case-insensitive) should be documented here so implementors understand this is an intentional deviation.


ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9b1d62 and 463048d.

📒 Files selected for processing (1)
  • .cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md

**Validation**:

- New format: must match `^(?!.*\.\.)[a-z0-9_.]{2,32}$` (no consecutive periods).
- Legacy format: must match `^[^\x00-\x1F\x7F]{2,32}#[0-9]{4}$` (printable characters only before the `#` discriminator).

⚠️ Potential issue | 🟡 Minor

Discord legacy regex allows # inside the username segment.

^[^\x00-\x1F\x7F]{2,32}#[0-9]{4}$ — the character class only excludes control characters, so the input alice#bob#1234 matches (greedy), treating alice#bob as the username with discriminator 1234. Discord legacy usernames cannot contain #; the regex should exclude it from the username part:

🐛 Proposed fix
-- Legacy format: must match `^[^\x00-\x1F\x7F]{2,32}#[0-9]{4}$` (printable characters only before the `#` discriminator).
+- Legacy format: must match `^[^\x00-\x1F\x7F#]{2,32}#[0-9]{4}$` (printable, non-`#` characters before the `#` discriminator).
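The difference can be demonstrated directly. A sketch comparing the current pattern with the reviewer's proposed fix (the fixed pattern is the proposal, not yet in the spec):

```typescript
const LEGACY = /^[^\x00-\x1F\x7F]{2,32}#[0-9]{4}$/;
const LEGACY_FIXED = /^[^\x00-\x1F\x7F#]{2,32}#[0-9]{4}$/;

// The original pattern greedily treats "alice#bob" as the username:
const bad = LEGACY.test("alice#bob#1234"); // true — but should be rejected
// Excluding `#` from the username class fixes it:
const fixedRejects = LEGACY_FIXED.test("alice#bob#1234"); // false
// A well-formed legacy handle still matches:
const stillOk = LEGACY_FIXED.test("alice#1234"); // true
```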

- t.me URL: `https://t.me/alice`, `http://t.me/alice`, `t.me/alice`
- telegram.me URL: `https://telegram.me/alice`, `http://telegram.me/alice`, `telegram.me/alice`

**Validation**: extracted username must match `^[a-zA-Z0-9_]{5,32}$`.

⚠️ Potential issue | 🟡 Minor

Telegram regex missing leading/trailing/consecutive underscore constraints.

^[a-zA-Z0-9_]{5,32}$ accepts _alice, alice_, and alice__bob, all of which are rejected by Telegram.

Telegram usernames must be between 5 and 32 characters using only Latin letters, numbers, and underscores; usernames cannot start or end with underscores, and consecutive underscores are not permitted. This is also confirmed by the Telegram Desktop bug report showing usernames like @abcde_ or @_abcde are invalid.

🐛 Proposed fix
-**Validation**: extracted username must match `^[a-zA-Z0-9_]{5,32}$`.
+**Validation**: extracted username must match `^[a-zA-Z0-9](?:[a-zA-Z0-9]|_(?!_)){3,30}[a-zA-Z0-9]$` (5–32 chars, no leading/trailing underscore, no consecutive underscores).
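A sketch contrasting the two patterns (the strict pattern is the reviewer's proposal, not yet adopted by the spec):

```typescript
const TG_PERMISSIVE = /^[a-zA-Z0-9_]{5,32}$/;
const TG_STRICT = /^[a-zA-Z0-9](?:[a-zA-Z0-9]|_(?!_)){3,30}[a-zA-Z0-9]$/;

// The permissive pattern accepts a leading underscore Telegram rejects:
const leakyLeading = TG_PERMISSIVE.test("_alice"); // true — but invalid
// The strict pattern rejects leading/trailing/consecutive underscores:
const strictLeading = TG_STRICT.test("_alice"); // false
const strictDouble = TG_STRICT.test("alice__bob"); // false
// A single interior underscore is still fine:
const strictInterior = TG_STRICT.test("al_ice"); // true
```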


**displayValue**: same as canonical form.

**url**: for `https://`/`http://` — same as value; for `ipfs://` — convert to `https://ipfs.io/ipfs/{cid}`; for `eip155:` and `data:` URIs — `null` (requires off-chain resolution beyond this layer).

⚠️ Potential issue | 🟡 Minor

IPFS url() conversion underspecified for URIs with path components.

The spec states the url function converts ipfs://Qm... to https://ipfs.io/ipfs/{cid}, but doesn't define whether the {cid} placeholder includes any path suffix from the original URI (e.g., does ipfs://bafybei.../metadata.json become https://ipfs.io/ipfs/bafybei.../metadata.json?). Clarify whether the full URI path after the CID root is preserved or dropped in the gateway URL.

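One way to resolve the ambiguity is to preserve everything after the scheme verbatim. A sketch, assuming a hypothetical helper name not defined in the plan:

```typescript
// Convert ipfs://<cid>[/path][?query][#fragment] to the ipfs.io gateway URL,
// keeping everything after the scheme verbatim (no stripping, no re-encoding).
function ipfsToGatewayUrl(uri: string): string | null {
  const PREFIX = "ipfs://";
  if (!uri.startsWith(PREFIX)) return null;
  const rest = uri.slice(PREFIX.length); // "<cid>/path?query#fragment"
  if (rest.length === 0) return null; // no CID at all → unconvertible
  return `https://ipfs.io/ipfs/${rest}`;
}
```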

…ized records

- Expanded the logic for handling unrecognized keys and values in the normalization process, ensuring that duplicates are correctly categorized in `unnormalizedRecords`.
- Updated validation rules for extracted usernames to enforce a minimum length of 4 characters.
- Enhanced comments to clarify the treatment of `Unnormalizable` records and the priority of canonical keys in normalization decisions.
- t.me URL: `https://t.me/alice`, `http://t.me/alice`, `t.me/alice`
- telegram.me URL: `https://telegram.me/alice`, `http://telegram.me/alice`, `telegram.me/alice`

**Validation**: extracted username must match `^[a-zA-Z0-9_]{5,32}$`.

@vercel vercel bot Feb 25, 2026


Telegram username regex is too permissive and allows invalid formats (leading/trailing underscores and consecutive underscores)


**Accepted input formats**: any string matching a standard email format.

**Validation**: must match `^[^\s@]+@[^\s@]+\.[^\s@]+$` (basic structural check; full RFC 5322 compliance is not required).


@vercel vercel bot Feb 25, 2026


Missing documentation of email normalization deviation from RFC 5321 regarding case-sensitivity of the local part


…normalization

- Expanded the logic for unrecognized `rawKey` handling, ensuring the first occurrence is categorized as `UnrecognizedKeyAndValue` and subsequent duplicates are marked as `DuplicateNormalizedKey`.
- Enhanced comments to clarify the implications of normalization decisions and the treatment of records in `normalizedRecords` and `unnormalizedRecords`.
@djstrong
Copy link
Contributor Author

@lightwalker-eth The plan is ready for review. There are a few open questions:

  1. Normalized key for Twitter: com.twitter or com.x (reflecting the platform rebrand)? What unnormalized variants should be included?
  2. Parameter name for metadata field: normalizationMetadata, includeNormalizationMetadata, or another name?
  3. Unnormalizable behavior: When a key is recognized but no candidate produces a valid value, should records.texts contain null for that key, or should the key be omitted from the response entirely?
  4. displayValue and url placement: Should these enrichment fields be part of records.texts (when normalize=true) or only inside normalizationMetadata? Including them in the main response is more convenient for UI clients but changes the primary response shape significantly for all callers.
  5. Client requesting an unnormalized key directly: If a client passes texts=vnd.twitter (an unnormalized variant) instead of texts=com.twitter, should expandNormalizedKeys throw immediately (current spec — fail fast, caller error), or silently map it to the canonical key and expand from there (more forgiving for legacy integrations)? The issue does not address this case.
  6. Placement of UnrecognizedKeyAndValue records: The current spec places them in normalizedRecords (keyed by rawKey) so that stripNormalizationMetadata only needs to iterate one map to produce the full output. The issue's own description says normalizedRecords maps normalized keys and unnormalizedRecords holds "all records that were unnormalized for one reason or another" — which by that framing would put unrecognized records in unnormalizedRecords. Decision: should unnormalizedRecords mean "records that don't appear in output" (current spec) or "records that weren't normalized" (issue's framing)?

@djstrong djstrong marked this pull request as ready for review February 25, 2026 12:18

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.




@greptile-apps greptile-apps bot left a comment


1 file reviewed, 6 comments


- If `rawValue` is a string: run validate + normalize on rawValue.
- If succeeded: `valueResult` is `AlreadyNormalized` iff `normalizedValue === rawValue` (validate + normalize produced a value identical to the original rawValue); otherwise `valueResult` is `Normalized` (normalizedValue differs from rawValue).
- If failed: `valueResult` is `Unnormalizable` with the reason from the failing step.
- If not found: `keyResult` is Unrecognized; `valueResult` is Unrecognized. At the set level, the first record with a given unrecognized `rawKey` is written into `normalizedRecords` as `UnrecognizedKeyAndValue`; any subsequent record with the same `rawKey` gets `op: "DuplicateNormalizedKey"` and goes into `unnormalizedRecords` (consistent with how recognized-key duplicates are handled).
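The first-in-wins rule for unrecognized rawKeys described above can be sketched as follows. Types and the function name are illustrative assumptions, not the plan's actual definitions:

```typescript
type RawRecord = { rawKey: string; rawValue: string | null };
type TaggedRecord = RawRecord & { op: string };

function bucketUnrecognized(records: RawRecord[]) {
  const normalizedRecords: Record<string, TaggedRecord> = {};
  const unnormalizedRecords: TaggedRecord[] = [];
  for (const r of records) {
    if (!(r.rawKey in normalizedRecords)) {
      // first occurrence wins and passes through to the output
      normalizedRecords[r.rawKey] = { ...r, op: "UnrecognizedKeyAndValue" };
    } else {
      // later duplicates never overwrite; they are reported separately,
      // mirroring how recognized-key duplicates are handled
      unnormalizedRecords.push({ ...r, op: "DuplicateNormalizedKey" });
    }
  }
  return { normalizedRecords, unnormalizedRecords };
}
```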

Inconsistent handling: unrecognized duplicate keys get DuplicateNormalizedKey op, but this contradicts the type definition where DuplicateNormalizedKey should only apply to recognized keys with the same normalized key. For unrecognized keys, there's no normalization happening, so calling it "DuplicateNormalizedKey" is misleading.

): Record<string, string | null>
```

Returns only the `normalizedKey → normalizedValue` pairs from the "Normalized" records, plus `rawKey → rawValue` passthrough for "UnrecognizedKeyAndValue" records. Unrecognized keys are always included even when `rawValue` is null — producing `{ [rawKey]: null }` — so the caller receives a complete picture of every key that was present in the input.
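A sketch of the described passthrough, taking the spec text at face value (entry shapes are assumed for illustration):

```typescript
type Entry =
  | { op: "Normalized"; normalizedKey: string; normalizedValue: string }
  | { op: "UnrecognizedKeyAndValue"; rawKey: string; rawValue: string | null };

function stripNormalizationMetadata(
  entries: Entry[]
): Record<string, string | null> {
  const out: Record<string, string | null> = {};
  for (const e of entries) {
    if (e.op === "Normalized") {
      out[e.normalizedKey] = e.normalizedValue; // always a string here
    } else {
      out[e.rawKey] = e.rawValue; // null rawValue is preserved, not dropped
    }
  }
  return out;
}
```

Under this reading, `null` values can only come from unrecognized keys, which is the ambiguity the comment below raises.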

The return type Record<string, string | null> suggests null values are included for unrecognized keys. However, for "Normalized" records, the value is always a string (per line 193). This creates ambiguity - clarify whether the function returns Record<string, string> for normalized records only, or includes nulls for certain cases.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (6)
.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md (6)

513-516: ⚠️ Potential issue | 🟠 Major

Breaking default behavior change remains risky (normalize defaults to true).

Line 515 still defaults normalization on for all callers, which can silently change records.texts semantics for existing clients. Keep this opt-in for rollout safety.

Proposed change
-| `normalize`             | boolean | `true`  | Normalize keys and values. If true, expand keys pre-resolution and run normalization pipeline post-resolution. |
+| `normalize`             | boolean | `false` | Normalize keys and values. If true, expand keys pre-resolution and run normalization pipeline post-resolution. |

550-550: ⚠️ Potential issue | 🟡 Minor

Malformed Markdown in open question #4 is still present.

Line 550 has broken backtick/bold ordering and renders incorrectly.

Proposed change
-4. `**displayValue` and `url` placement**: Should these enrichment fields be part of `records.texts` (when `normalize=true`) or only inside `normalizationMetadata`? Including them in the main response is more convenient for UI clients but changes the primary response shape significantly for all callers.
+4. **`displayValue` and `url` placement**: Should these enrichment fields be part of `records.texts` (when `normalize=true`) or only inside `normalizationMetadata`? Including them in the main response is more convenient for UI clients but changes the primary response shape significantly for all callers.

469-470: ⚠️ Potential issue | 🟡 Minor

Email lowercasing policy needs explicit RFC-deviation note.

Line 469 defines full-address lowercasing but does not document the local-part case-sensitivity tradeoff (RFC 5321). The behavior is reasonable, but it should be explicit in the spec.


496-496: ⚠️ Potential issue | 🟡 Minor

IPFS url conversion remains underspecified for suffix components.

Line 496 does not state whether /path, ?query, and #fragment are preserved when mapping ipfs:// to gateway URLs. This can cause inconsistent client behavior.

Proposed wording
-**url**: for `https://`/`http://` — same as value; for `ipfs://` — convert to `https://ipfs.io/ipfs/{cid}`; for `eip155:` and `data:` URIs — `null` (requires off-chain resolution beyond this layer).
+**url**: for `https://`/`http://` — same as value; for `ipfs://` — convert to `https://ipfs.io/ipfs/{cid}` and preserve any subsequent path/query/fragment exactly; for `eip155:` and `data:` URIs — `null` (requires off-chain resolution beyond this layer).

394-394: ⚠️ Potential issue | 🟡 Minor

Discord legacy regex still permits # inside the username segment.

Line 394 can match alice#bob#1234 due to the current class. Exclude # from the username portion.

Proposed change
-- Legacy format: must match `^[^\x00-\x1F\x7F]{2,32}#[0-9]{4}$` (printable characters only before the `#` discriminator).
+- Legacy format: must match `^[^\x00-\x1F\x7F#]{2,32}#[0-9]{4}$` (printable, non-`#` characters only before the `#` discriminator).

415-415: ⚠️ Potential issue | 🟡 Minor

Telegram username regex is too permissive.

Line 415 still allows leading/trailing/consecutive underscores (_alice, alice_, alice__bob), which should be rejected.

Proposed change
-**Validation**: extracted username must match `^[a-zA-Z0-9_]{5,32}$`.
+**Validation**: extracted username must match `^(?!_)(?!.*__)[A-Za-z0-9_]{5,32}(?<!_)$`.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 463048d and c3c1572.

📒 Files selected for processing (1)
  • .cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md

Comment on lines +549 to +553
3. **Unnormalizable behavior**: When a key is recognized but no candidate produces a valid value, should `records.texts` contain `null` for that key, or should the key be omitted from the response entirely?
4. `**displayValue` and `url` placement**: Should these enrichment fields be part of `records.texts` (when `normalize=true`) or only inside `normalizationMetadata`? Including them in the main response is more convenient for UI clients but changes the primary response shape significantly for all callers.
5. **Client requesting an unnormalized key directly**: If a client passes `texts=vnd.twitter` (an unnormalized variant) instead of `texts=com.twitter`, should `expandNormalizedKeys` throw immediately (current spec — fail fast, caller error), or silently map it to the canonical key and expand from there (more forgiving for legacy integrations)? The issue does not address this case.
6. **Placement of `UnrecognizedKeyAndValue` records**: The current spec places them in `normalizedRecords` (keyed by `rawKey`) so that `stripNormalizationMetadata` only needs to iterate one map to produce the full output. The issue's own description says `normalizedRecords` maps *normalized keys* and `unnormalizedRecords` holds "all records that were unnormalized for one reason or another" — which by that framing would put unrecognized records in `unnormalizedRecords`. Decision: should `unnormalizedRecords` mean "records that don't appear in output" (current spec) or "records that weren't normalized" (issue's framing)?
7. **Verify validation rules against each service's official constraints**: The regexes and accepted input formats in this spec were derived from best-effort research and may not match each platform's current actual rules. Before finalising implementation, verify against official documentation or source code for: Twitter/X (username charset, 15-char limit), GitHub (39-char limit, hyphen rules), Farcaster (lowercase-only, length), Discord (new-format charset, period rules, legacy discriminator format), Telegram (5–32 chars, charset), Reddit (3–20 chars, charset), email (RFC compliance level), avatar (supported URI schemes). Flag any discrepancy as a bug in the transform definition.

⚠️ Potential issue | 🟠 Major

Core response-contract decisions are still open while earlier sections imply concrete behavior.

Lines 549–553 keep key API-shape decisions open (null vs omit, enrichment placement, unrecognized placement), but Layers 2/3 and response-shape sections already assume one behavior. This leaves implementation acceptance criteria ambiguous.

Please either:

  1. resolve these in this doc revision, or
  2. mark the earlier sections as provisional/TBD with explicit non-normative status.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md around lines
549 - 553, Summary: The spec leaves core response-shape decisions ambiguous
(null vs omit, enrichment placement, unrecognized-record placement, handling
unnormalized keys), while other sections (Layers 2/3, response-shape) assume
concrete behavior. Fix: update the plan so that for each item listed (1.
unnormalizable behavior for records.texts, 2. placement of displayValue and url
between records.texts vs normalizationMetadata, 3. behavior of
expandNormalizedKeys when given unnormalized keys like texts=vnd.twitter, 4.
placement semantics for UnrecognizedKeyAndValue between normalizedRecords and
unnormalizedRecords, and 5. validation regex verification) you either (A) choose
and document the canonical behavior explicitly in the earlier response-shape and
Layers 2/3 sections, referencing symbols records.texts, normalizationMetadata,
expandNormalizedKeys, normalizedRecords, unnormalizedRecords,
stripNormalizationMetadata, or (B) mark those earlier sections as
provisional/TBD with a clear non-normative flag and add a TODO linking to this
decision list; ensure the doc includes the verification task for
service-specific regexes as a required pre-implementation step.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (5)
.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md (5)

496-496: ⚠️ Potential issue | 🟡 Minor

IPFS url() conversion is silent on path/query/fragment after the CID.

ipfs://bafybei.../metadata.json → should it produce https://ipfs.io/ipfs/bafybei.../metadata.json (path preserved) or https://ipfs.io/ipfs/bafybei... (path dropped)? The current spec is ambiguous. Implementors need an explicit rule.

📝 Proposed clarification
-for `ipfs://` — convert to `https://ipfs.io/ipfs/{cid}`
+for `ipfs://` — convert to `https://ipfs.io/ipfs/{cid}` preserving any path, query, and fragment components from the original URI (e.g. `ipfs://bafybei.../metadata.json` → `https://ipfs.io/ipfs/bafybei.../metadata.json`); percent-encoding is preserved as-is.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md at line 496,
Clarify the IPFS url() conversion rule: update the specification for the IPFS
branch (the description that currently reads "for `ipfs://` — convert to
`https://ipfs.io/ipfs/{cid}`") to explicitly preserve the entire hierarchical
suffix (path), query string, and fragment — e.g.,
`ipfs://{cid}/path/file?query=1#frag` ->
`https://ipfs.io/ipfs/{cid}/path/file?query=1#frag`; reference the `url()`
conversion rule and the `ipfs://` handling text so implementors know to append
the original path/query/fragment after the CID rather than dropping them.
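The proposed "preserve everything after the scheme" rule can be sketched in a few lines of TypeScript. This is a minimal illustration, not project code: the function name is hypothetical, and the `ipfs.io` gateway base is the one named in the spec under review.

```typescript
// Sketch: convert an ipfs:// URI to a gateway URL, preserving everything
// after the scheme — CID, path, query string, and fragment — verbatim.
function ipfsToHttps(uri: string): string {
  const IPFS_SCHEME = "ipfs://";
  if (!uri.startsWith(IPFS_SCHEME)) {
    throw new Error(`not an ipfs:// URI: ${uri}`);
  }
  // Keep the hierarchical suffix as-is: "{cid}/path?query#frag".
  const rest = uri.slice(IPFS_SCHEME.length);
  return `https://ipfs.io/ipfs/${rest}`;
}
```

Slicing past the scheme (rather than parsing out the CID and rebuilding) is what guarantees the path, query, and fragment — and their percent-encoding — survive untouched.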

467-469: ⚠️ Potential issue | 🟡 Minor

Email local-part lowercasing deviates from RFC 5321 — add a note.

RFC 5321 §2.4 treats the local part (before @) as case-sensitive; lowercasing the full address is a pragmatic decision that could theoretically mismatch a server that honours case in the local part. The spec disclaims full RFC 5322 compliance for parsing but does not document this normalization choice as an intentional deviation.

📝 Proposed addition
 **Canonical form**: lowercased email address.
+> **Note**: lowercasing the local part (before `@`) is an intentional deviation from RFC 5321, which treats it as case-sensitive. This is pragmatic because virtually all real-world mail servers are case-insensitive for the local part, but it may theoretically mismatch a strictly case-sensitive mailbox.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md around lines
467 - 469, Add a short note under the "Canonical form: lowercased email
address." section clarifying that lowercasing the full address intentionally
deviates from RFC 5321's treatment of the local-part as potentially
case-sensitive; reference RFC 5321 §2.4, explain the pragmatic rationale
(compatibility and normalization benefits) and warn about the small risk of
mismatch with servers that honor local-part case, and mark this as an explicit
design decision tied to the "Validation" and "Canonical form" statements.
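The deviation being flagged amounts to a one-line transform. A sketch, with a hypothetical function name, of the normalization the plan describes:

```typescript
// Sketch: canonicalize an email address by lowercasing the whole string,
// including the local part — an intentional deviation from RFC 5321 §2.4,
// which treats the local part (before "@") as case-sensitive.
function canonicalizeEmail(raw: string): string {
  return raw.trim().toLowerCase();
}
```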

550-550: ⚠️ Potential issue | 🟡 Minor

Malformed markdown in open question #4 — stray backtick before bold marker.

The current text has a stray backtick immediately before the bold marker (`**displayValue`), which opens a code span rather than bold. The intended rendering is bold+code for both field names.

📝 Fix
-4. `**displayValue` and `url` placement`: Should these enrichment fields...
+4. **`displayValue` and `url` placement**: Should these enrichment fields...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md at line 550,
The markdown has a stray backtick before the bold marker in the open question so
"`**displayValue`" renders as a code span instead of bold+code for the field
names; fix the sentence to render both field names as bold+code (e.g.,
**`displayValue`** and **`url`**) and ensure the surrounding text references the
correct symbols (`records.texts`, `normalizationMetadata`, and `normalize=true`)
so the question reads clearly about whether these enrichment fields should be in
records.texts (when normalize=true) or only in normalizationMetadata.

513-516: ⚠️ Potential issue | 🟠 Major

normalize=true default silently breaks existing API consumers.

Any existing caller of GET /records/:name that doesn't explicitly pass normalize=false will immediately receive a different response shape (normalized keys, normalized values) after deployment — with no opt-in window. The safe default for any additive API feature is false, with a documented migration path (e.g. versioned endpoint, deprecation window, or Content-Type negotiation).

📝 Proposed fix
-| `normalize`             | boolean | `true`  | Normalize keys and values. If true, expand keys pre-resolution and run normalization pipeline post-resolution. |
+| `normalize`             | boolean | `false` | Normalize keys and values. If true, expand keys pre-resolution and run normalization pipeline post-resolution. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md around lines
513 - 516, The default for the new normalization feature is unsafe — change the
`normalize` parameter default from true to false in the API docs/table and
update the GET /records/:name implementation to respect that default (ensure
code paths in the request handler that reference `normalize` and
`normalizationMetadata` treat absent/undefined as false), keep
`normalizationMetadata` behavior gated on `normalize===true`, and add a short
migration note/changelog entry describing how consumers can opt in (pass
normalize=true or use the new versioned endpoint) and that NormalizedRecordSet
will only appear when normalize is true.
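The safe-default parsing this comment asks for is small enough to sketch. The handler shape and function name below are hypothetical, not the actual GET /records/:name implementation:

```typescript
// Sketch: parse the `normalize` query parameter so that an absent or
// unrecognized value defaults to false — existing callers keep the
// current response shape unless they explicitly opt in with normalize=true.
function parseNormalizeParam(value: string | undefined): boolean {
  return value === "true";
}
```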

415-415: ⚠️ Potential issue | 🟡 Minor

Telegram regex still permits leading/trailing and consecutive underscores.

^[a-zA-Z0-9_]{5,32}$ accepts _alice, alice_, and alice__bob. Telegram Desktop bug #569 confirms that usernames like @abcde_ or @_abcde are invalid. The fix proposed in prior review rounds has not yet been applied.

🐛 Proposed fix
-**Validation**: extracted username must match `^[a-zA-Z0-9_]{5,32}$`.
+**Validation**: extracted username must match `^(?!_)(?!.*__)[A-Za-z0-9_]{5,32}(?<!_)$` (5–32 chars, no leading/trailing underscore, no consecutive underscores).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md at line 415,
The current validation pattern `^[a-zA-Z0-9_]{5,32}$` (the "extracted username
must match" rule) allows leading, trailing and consecutive underscores; update
that validation to disallow usernames that start or end with '_' and to forbid
consecutive '__' while still enforcing the 5–32 length and allowed characters.
Replace the existing regex/pattern with one that (1) asserts total length 5–32,
(2) rejects any occurrence of "__", (3) rejects a leading "_" and (4) rejects a
trailing "_" so usernames like `_alice`, `alice_`, and `alice__bob` are invalid.
Ensure the new pattern is used wherever the original `^[a-zA-Z0-9_]{5,32}$` rule
is referenced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md:
- Around line 150-165: The spec incorrectly places set-level deduplication
behavior inside the per-record function description for normalizeRecord (which
returns IndividualRecordNormalizationResult and has no access to set state);
remove the paragraph beginning "At the set level, the first record with a given
unrecognized `rawKey`..." from the normalizeRecord logic and instead insert that
entire sentence into the normalizeRecordSet section (alongside the existing
two-pass algorithm) so that deduplication is described where the full candidate
list and set-level decisions are available.

---

Duplicate comments:
In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md:
- Line 496: Clarify the IPFS url() conversion rule: update the specification for
the IPFS branch (the description that currently reads "for `ipfs://` — convert
to `https://ipfs.io/ipfs/{cid}`") to explicitly preserve the entire hierarchical
suffix (path), query string, and fragment — e.g.,
`ipfs://{cid}/path/file?query=1#frag` ->
`https://ipfs.io/ipfs/{cid}/path/file?query=1#frag`; reference the `url()`
conversion rule and the `ipfs://` handling text so implementors know to append
the original path/query/fragment after the CID rather than dropping them.
- Around line 467-469: Add a short note under the "Canonical form: lowercased
email address." section clarifying that lowercasing the full address
intentionally deviates from RFC 5321's treatment of the local-part as
potentially case-sensitive; reference RFC 5321 §2.4, explain the pragmatic
rationale (compatibility and normalization benefits) and warn about the small
risk of mismatch with servers that honor local-part case, and mark this as an
explicit design decision tied to the "Validation" and "Canonical form"
statements.
- Line 550: The markdown has a stray backtick before the bold marker in the open
question so "`**displayValue`" renders as a code span instead of bold+code for
the field names; fix the sentence to render both field names as bold+code (e.g.,
**`displayValue`** and **`url`**) and ensure the surrounding text references the
correct symbols (`records.texts`, `normalizationMetadata`, and `normalize=true`)
so the question reads clearly about whether these enrichment fields should be in
records.texts (when normalize=true) or only in normalizationMetadata.
- Around line 513-516: The default for the new normalization feature is unsafe —
change the `normalize` parameter default from true to false in the API
docs/table and update the GET /records/:name implementation to respect that
default (ensure code paths in the request handler that reference `normalize` and
`normalizationMetadata` treat absent/undefined as false), keep
`normalizationMetadata` behavior gated on `normalize===true`, and add a short
migration note/changelog entry describing how consumers can opt in (pass
normalize=true or use the new versioned endpoint) and that NormalizedRecordSet
will only appear when normalize is true.
- Line 415: The current validation pattern `^[a-zA-Z0-9_]{5,32}$` (the
"extracted username must match" rule) allows leading, trailing and consecutive
underscores; update that validation to disallow usernames that start or end with
'_' and to forbid consecutive '__' while still enforcing the 5–32 length and
allowed characters. Replace the existing regex/pattern with one that (1) asserts
total length 5–32, (2) rejects any occurrence of "__", (3) rejects a leading "_"
and (4) rejects a trailing "_" so usernames like `_alice`, `alice_`, and
`alice__bob` are invalid. Ensure the new pattern is used wherever the original
`^[a-zA-Z0-9_]{5,32}$` rule is referenced.


Comment on lines +150 to +165
function normalizeRecord(
rawKey: string,
rawValue: string | null,
defs: TextRecordNormalizationDefs,
): IndividualRecordNormalizationResult
```

Logic:

- Look up `rawKey` in `defs` (via `byNormalizedKey` or `byUnnormalizedKey`).
- If found: determine `keyResult` (AlreadyNormalized if rawKey === normalizedKey, else Normalized).
- If `rawValue` is null: `valueResult` is Unnormalizable with reason `"no value set"`.
- If `rawValue` is a string: run validate + normalize on rawValue.
- If succeeded: `valueResult` is `AlreadyNormalized` iff `normalizedValue === rawValue` (validate + normalize produced a value identical to the original rawValue); otherwise `valueResult` is `Normalized` (normalizedValue differs from rawValue).
- If failed: `valueResult` is `Unnormalizable` with the reason from the failing step.
- If not found: `keyResult` is Unrecognized; `valueResult` is Unrecognized. At the set level, the first record with a given unrecognized `rawKey` is written into `normalizedRecords` as `UnrecognizedKeyAndValue`; any subsequent record with the same `rawKey` gets `op: "DuplicateNormalizedKey"` and goes into `unnormalizedRecords` (consistent with how recognized-key duplicates are handled).

⚠️ Potential issue | 🟡 Minor

Set-level deduplication logic is misplaced in normalizeRecord's spec.

Line 165 describes set-level behavior ("At the set level, the first record with a given unrecognized rawKey is written into normalizedRecords…") inside the logic section of normalizeRecord, but normalizeRecord is a pure per-record function whose return type is IndividualRecordNormalizationResult — it has no access to the accumulated set state and cannot determine whether a given rawKey is a "first occurrence." This rule is implementable only in normalizeRecordSet, where the full candidate list is available.

The sentence should be moved in its entirety to Section 1.5 (normalizeRecordSet algorithm), alongside the existing two-pass logic, so that implementors of normalizeRecord are not confused by behavior that the function's signature cannot support.

📝 Suggested placement fix
 - If not found: `keyResult` is Unrecognized; `valueResult` is Unrecognized.
-At the set level, the first record with a given unrecognized `rawKey` is written
-into `normalizedRecords` as `UnrecognizedKeyAndValue`; any subsequent record with
-the same `rawKey` gets `op: "DuplicateNormalizedKey"` and goes into
-`unnormalizedRecords` (consistent with how recognized-key duplicates are handled).

Then, in §1.5 after the two-pass algorithm (e.g. after line 242):

+**Unrecognized-key deduplication**: the first record with a given unrecognized
+`rawKey` is written into `normalizedRecords` as `UnrecognizedKeyAndValue`; any
+subsequent record with the same `rawKey` gets `op: "DuplicateNormalizedKey"` and
+goes into `unnormalizedRecords` (consistent with how recognized-key duplicates
+are handled).
🧰 Tools
🪛 LanguageTool

[style] ~162-~162: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...zable with reason "no value set". - If rawValue is a string: run validate + ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~165-~165: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ith the reason from the failing step. - If not found: keyResult is Unrecognized;...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.cursor/plans/auto-cleanup_architecture_plan_50f1bdfb.plan.md around lines
150 - 165, The spec incorrectly places set-level deduplication behavior inside
the per-record function description for normalizeRecord (which returns
IndividualRecordNormalizationResult and has no access to set state); remove the
paragraph beginning "At the set level, the first record with a given
unrecognized `rawKey`..." from the normalizeRecord logic and instead insert that
entire sentence into the normalizeRecordSet section (alongside the existing
two-pass algorithm) so that deduplication is described where the full candidate
list and set-level decisions are available.
