RSPEED-2465: Add character pattern validation to rlsapi v1 fields #1225

Open
major wants to merge 1 commit into lightspeed-core:main from
major:RSPEED-2465/add-pattern-validation-rlsapi-fields
Contributor

@major major commented Feb 25, 2026

Description

The os, version, arch, and system_id fields in RlsapiV1SystemInfo (and nevra, version in RlsapiV1CLA) accept arbitrary strings with no character restrictions. These values flow directly into Splunk telemetry via _queue_splunk_event(), so a client could inject control characters, newlines, or HTML/script tags into the pipeline.

This adds Pydantic pattern validators to all six fields, restricting each to the safe character set appropriate for its known value space. Empty strings are still accepted since the fields default to "". Invalid characters now result in a 422 rejection.
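As a rough illustration of the approach described above, the sketch below attaches a pattern to two string fields. It assumes Pydantic v2 (`Field(pattern=...)`). The model name `SystemInfoSketch` and the helper `is_accepted` are hypothetical, and the pattern is the one quoted in the review comments further down this thread; the patterns actually used for `os`, `arch`, and `nevra` are not shown on this page and are not guessed here.

```python
from pydantic import BaseModel, Field, ValidationError

# Pattern as quoted in the review excerpt later in this thread.
_MACHINE_ID_PATTERN = r"^[a-zA-Z0-9._\-]*$"


class SystemInfoSketch(BaseModel):
    """Hypothetical stand-in for RlsapiV1SystemInfo; only two fields shown."""

    # The "*" quantifier keeps the empty-string default valid.
    version: str = Field(default="", pattern=_MACHINE_ID_PATTERN)
    system_id: str = Field(default="", pattern=_MACHINE_ID_PATTERN)


def is_accepted(**fields: str) -> bool:
    """Return True if the sketch model accepts the given field values."""
    try:
        SystemInfoSketch(**fields)
    except ValidationError:
        return False
    return True
```

With this sketch, `is_accepted(version="9.3")` and `is_accepted()` succeed, while a value containing `<`, `;`, or a newline raises `ValidationError` and is reported as rejected.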

Type of change

  • New feature
  • Unit tests improvement

Tools used to create PR

  • Assisted-by: Claude
  • Generated by: N/A

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

41 new parameterized test cases across TestSystemInfoCharacterValidation and TestCLACharacterValidation:

uv run pytest tests/unit/models/rlsapi/test_requests.py -v

Valid values (RHEL, 9.3, x86_64, ULIDs, UUIDs, NEVRA strings) pass. Invalid values (XSS payloads, control chars, newlines, null bytes, semicolons) are rejected with ValidationError matching "String should match pattern".
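The valid/invalid split described above can be approximated with a table-driven check using only the standard library. This is a sketch, not the PR's test code (which uses pytest parameterization): the sample values are illustrative, the helper `accepts` is hypothetical, and the pattern is the one quoted in the review comments in this thread. Pydantic v2 may compile patterns with a different regex engine than stdlib `re`, so treat this as an approximation of the model's behavior.

```python
import re

# Pattern as quoted in the review excerpt in this thread.
VERSION_PATTERN = r"^[a-zA-Z0-9._\-]*$"

# Illustrative samples only; the PR's real test cases are not shown on this page.
VALID = [
    "",
    "9.3",
    "x86_64",
    "01JDKR8N7QW9ZMXVGK3PB5TQWZ",
    "c5260f0e-d5f2-4e9b-8f0a-1e2d3c4b5a69",
]
INVALID = [
    "<script>alert(1)</script>",
    "9.3\n",
    "9.3\x00",
    "9.3;rm",
    "a b",
    "tab\there",
]


def accepts(value: str) -> bool:
    """Check a value against the pattern, requiring the whole string to match."""
    # fullmatch is used because, with re.match, "$" also matches just
    # before a trailing newline, which would let "9.3\n" through.
    return re.fullmatch(VERSION_PATTERN, value) is not None


results = {value: accepts(value) for value in VALID + INVALID}
```

The choice of `re.fullmatch` matters only for the stdlib check; it guards against the trailing-newline case that `re.match` with `$` would accept.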

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced input validation for system information and package data fields to enforce stricter character constraints and prevent invalid data submission.
  • Tests

    • Added comprehensive validation tests to verify correct handling of system information and package field constraints.
    • Added edge-case tests for input source composition and formatting preservation.

Contributor

coderabbitai bot commented Feb 25, 2026

📥 Commits

Reviewing files that changed from the base of the PR and between 674d880 and 87fedd1.

📒 Files selected for processing (2)
  • src/models/rlsapi/requests.py
  • tests/unit/models/rlsapi/test_requests.py

Walkthrough

The pull request introduces input validation for API request models by adding regex pattern constants and applying them to specific fields in RlsapiV1SystemInfo and RlsapiV1CLA classes, along with comprehensive test coverage for the validation rules and edge cases.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Input Validation Patterns: src/models/rlsapi/requests.py | Added module-level regex pattern constants for safe text, machine IDs, NEVRA, and version strings. Applied patterns to RlsapiV1SystemInfo fields (os, version, arch, system_id) and RlsapiV1CLA fields (nevra, version). |
| Validation and Edge Case Tests: tests/unit/models/rlsapi/test_requests.py | Added three test classes: TestSystemInfoCharacterValidation and TestCLACharacterValidation for pattern validation testing with parameterized valid/invalid test cases; TestGetInputSourceEdgeCases with a fixture for testing multiline content preservation and source priority ordering. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change: adding character pattern validation to rlsapi v1 fields, which matches the changeset's primary objective. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (3)
tests/unit/models/rlsapi/test_requests.py (2)

576-600: Duplicate make_request fixture — extract to module level.

TestGetInputSourceEdgeCases.make_request_fixture (lines 576–600) is an exact copy of TestGetInputSource.make_request_fixture (lines 320–344), including the inner _RequestBuilder class. Hoist it to a module-level fixture so both test classes share a single definition.

♻️ Extract to module-level fixture and remove both class-level copies
+@pytest.fixture(name="make_request")
+def make_request_fixture() -> Any:
+    """Factory fixture to build requests with specific context values."""
+
+    class _RequestBuilder:  # pylint: disable=too-few-public-methods
+        """Helper to construct requests with variable context."""
+
+        @staticmethod
+        def build(
+            question: str = "q",
+            stdin: str = "",
+            attachment: str = "",
+            terminal: str = "",
+        ) -> RlsapiV1InferRequest:
+            """Build an RlsapiV1InferRequest with specified context values."""
+            return RlsapiV1InferRequest(
+                question=question,
+                context=RlsapiV1Context(
+                    stdin=stdin,
+                    attachments=RlsapiV1Attachment(contents=attachment),
+                    terminal=RlsapiV1Terminal(output=terminal),
+                ),
+            )
+
+    return _RequestBuilder
+

 class TestGetInputSource:
     ...
-    @pytest.fixture(name="make_request")
-    def make_request_fixture(self) -> Any:
-        ...  # remove

 class TestGetInputSourceEdgeCases:
     ...
-    @pytest.fixture(name="make_request")
-    def make_request_fixture(self) -> Any:
-        ...  # remove
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/models/rlsapi/test_requests.py` around lines 576 - 600, The
make_request fixture is duplicated in TestGetInputSourceEdgeCases and
TestGetInputSource; extract the pytest fixture (the function
make_request_fixture that returns the inner _RequestBuilder with its static
build method creating RlsapiV1InferRequest / RlsapiV1Context /
RlsapiV1Attachment / RlsapiV1Terminal) to module level, keep the
`@pytest.fixture`(name="make_request") decorator, and remove both class-level
copies so tests use the single shared module-level make_request fixture; ensure
imports remain valid and tests reference make_request as before.

612-628: result.index(...) position checks are redundant after the exact equality assertion.

Line 621 already asserts result == "Q\n\nS\n\nA\n\nT", which fully encodes both content and order. The subsequent index comparisons (lines 623–627) add no incremental value.

♻️ Remove the redundant position checks
     result = request.get_input_source()
     assert result == "Q\n\nS\n\nA\n\nT"
-    # Verify order by checking positions
-    assert (
-        result.index("Q")
-        < result.index("S")
-        < result.index("A")
-        < result.index("T")
-    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/models/rlsapi/test_requests.py` around lines 612 - 628, The
test_priority_order test contains redundant assertions: after asserting result
== "Q\n\nS\n\nA\n\nT" you should remove the subsequent position checks that
compare result.index("Q") < result.index("S") < result.index("A") <
result.index("T"); keep the exact equality assertion only. Locate
test_priority_order and the use of make_request.build / request.get_input_source
and delete the block performing the index() comparisons so the test remains
concise and non-duplicative.
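To see why the index comparisons add nothing once the exact-equality assertion holds, consider this minimal sketch. `join_sources` is a hypothetical stand-in for `get_input_source()`; the real method is not shown on this page, and the assumption here is that it joins the non-empty parts with blank lines in the order question, stdin, attachment, terminal.

```python
def join_sources(
    question: str,
    stdin: str = "",
    attachment: str = "",
    terminal: str = "",
) -> str:
    """Hypothetical stand-in: join non-empty parts in priority order."""
    parts = [question, stdin, attachment, terminal]
    # Assumption: empty parts are skipped so no stray separators appear.
    return "\n\n".join(part for part in parts if part)


result = join_sources("Q", "S", "A", "T")

# The equality check pins down both the content and the order...
assert result == "Q\n\nS\n\nA\n\nT"
# ...so this ordering check cannot fail once the line above has passed.
assert result.index("Q") < result.index("S") < result.index("A") < result.index("T")
```

Any string equal to `"Q\n\nS\n\nA\n\nT"` necessarily has the four markers in that order, which is why the suggestion keeps only the first assertion.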
src/models/rlsapi/requests.py (1)

15-21: _VERSION_PATTERN is identical to _MACHINE_ID_PATTERN — consider consolidating or aliasing.

Both constants resolve to r"^[a-zA-Z0-9._\-]*$". They're semantically distinct, but having two independent string literals with the same value risks them diverging silently in future edits.

♻️ Alias one to the other
 # Machine IDs: alphanumeric, dots, underscores, and hyphens (no spaces).
 _MACHINE_ID_PATTERN = r"^[a-zA-Z0-9._\-]*$"

-# Version strings: alphanumeric, dots, underscores, and hyphens.
-_VERSION_PATTERN = r"^[a-zA-Z0-9._\-]*$"
+# Version strings share the same allowed set as machine IDs.
+_VERSION_PATTERN = _MACHINE_ID_PATTERN
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/models/rlsapi/requests.py` around lines 15 - 21, Consolidate the duplicated
regex by making _VERSION_PATTERN an alias of _MACHINE_ID_PATTERN (or vice-versa)
so the same literal isn't repeated; locate the constants _MACHINE_ID_PATTERN and
_VERSION_PATTERN in src/models/rlsapi/requests.py and replace the second literal
with a reference to the first (e.g., set _VERSION_PATTERN = _MACHINE_ID_PATTERN)
and add a short comment explaining they intentionally share the same pattern to
prevent future divergence.
ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71b93ea and 674d880.

📒 Files selected for processing (2)
  • src/models/rlsapi/requests.py
  • tests/unit/models/rlsapi/test_requests.py

Add Pydantic pattern validators to RlsapiV1SystemInfo (os, version, arch, system_id) and RlsapiV1CLA (nevra, version) to restrict the character set on fields that flow into Splunk telemetry. Prevents injection of control characters, newlines, and HTML/script tags.

Signed-off-by: Major Hayden <major@redhat.com>
@major major force-pushed the RSPEED-2465/add-pattern-validation-rlsapi-fields branch from 674d880 to 87fedd1 Compare February 25, 2026 16:52