Fix generate_logformat_regex to escape regex metacharacters in literal format segments #85

Draft
Copilot wants to merge 2 commits into main from copilot/fix-get-configured-variables

Copilot AI commented Mar 11, 2026

generate_logformat_regex did not escape regex metacharacters in the literal parts of a log format string. For the audit log format type=<Type> msg=audit(<Time>): <Content>, the unescaped ( and ) flanking <Time> were compiled as regex group delimiters instead of literal characters. Nothing in the pattern then consumed the real parentheses in the log line, so the Time group captured them along with the timestamp:

# Before fix — Time captures parentheses, breaking timestamp parsing
regex.match("type=LOGIN msg=audit(1642723741.076:377): pid=10125 ...")
# → Time = "(1642723741.076:377)"  ← invalid for int(float(...))

# After fix — Time captures the bare timestamp
# → Time = "1642723741.076:377"   ← parses correctly
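
The before-fix behaviour above can be reproduced with a small standalone sketch. This is not the project's code: naive_logformat_regex is a hypothetical stand-in, and it assumes that each <Field> placeholder becomes a lazy named group (?P<Field>.*?) while literal segments only have runs of spaces rewritten to \s+.

```python
import re


def naive_logformat_regex(logformat):
    """Hypothetical pre-fix builder: literal segments are NOT escaped."""
    headers = []
    pattern = ""
    # The capturing group keeps the <Field> tokens in the split result,
    # so literals sit at even indices and placeholders at odd indices.
    for i, part in enumerate(re.split(r"(<[^<>]+>)", logformat)):
        if i % 2 == 0:
            # Only runs of spaces are rewritten; ( and ) pass through raw.
            pattern += re.sub(r" +", r"\\s+", part)
        else:
            name = part.strip("<>")
            pattern += f"(?P<{name}>.*?)"  # assumed placeholder pattern
            headers.append(name)
    return headers, re.compile("^" + pattern + "$")


headers, regex = naive_logformat_regex("type=<Type> msg=audit(<Time>): <Content>")
match = regex.match("type=LOGIN msg=audit(1642723741.076:377): pid=10125 ...")

print(regex.pattern)
print(match.group("Time"))  # (1642723741.076:377)
```

Printing regex.pattern shows msg=audit((?P<Time>.*?)): in the compiled expression, which makes the accidental extra group around Time visible.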

_extract_timestamp in CoreDetector.run evaluates int(float(time.split(":")[0])). With the leading ( still attached to the value, float() raised ValueError, so detection failed for every log entry processed through a detector with this format. The failure was most visible for type=LOGIN (the third distinct line in audit.log).
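
The timestamp failure can be checked in isolation. The function below is a hypothetical stand-in that contains only the expression quoted above; the real _extract_timestamp may do more.

```python
def extract_timestamp(time_field):
    """Hypothetical stand-in built from the quoted expression only."""
    # Keep the part before the first colon, then truncate to whole seconds.
    return int(float(time_field.split(":")[0]))


# Bare timestamp, as captured after the fix.
print(extract_timestamp("1642723741.076:377"))  # 1642723741

# Parenthesised value, as captured before the fix.
try:
    extract_timestamp("(1642723741.076:377)")
except ValueError as exc:
    print(f"ValueError: {exc}")
```

The split leaves "(1642723741.076" as the first element, which float() cannot parse.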

Changes

  • utils/log_format_utils.py, generate_logformat_regex: replace the bare re.sub(r' +', ...) substitution with splitting on whitespace runs, applying re.escape() to each part, and rejoining with \s+. This correctly escapes all regex metacharacters while preserving multi-space-to-\s+ semantics.

  • tests/test_utils/test_log_format_utils.py — Add TestGenerateLogformatRegex covering: basic format parsing, parentheses-in-literal escaping across all three audit log entry types (including LOGIN), and verifying the extracted timestamp is parseable by _extract_timestamp.
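
The split, escape, and rejoin approach from the first bullet can be sketched as follows. This is an illustration under assumptions, not the code in utils/log_format_utils.py: escaped_logformat_regex is a hypothetical name, the (?P<Field>.*?) placeholder pattern is assumed, and the split is done on runs of spaces to mirror the original r' +' substitution.

```python
import re


def escaped_logformat_regex(logformat):
    """Hypothetical sketch of the fixed builder, not the project's code."""
    headers = []
    pattern = ""
    # Literals sit at even indices, <Field> placeholders at odd indices.
    for i, part in enumerate(re.split(r"(<[^<>]+>)", logformat)):
        if i % 2 == 0:
            # Split the literal on runs of spaces, escape each piece so
            # characters such as ( ) [ ] . match literally, then rejoin
            # with \s+ to keep the "one or more whitespace" behaviour.
            pieces = re.split(r" +", part)
            pattern += r"\s+".join(re.escape(piece) for piece in pieces)
        else:
            name = part.strip("<>")
            pattern += f"(?P<{name}>.*?)"  # assumed placeholder pattern
            headers.append(name)
    return headers, re.compile("^" + pattern + "$")


headers, regex = escaped_logformat_regex("type=<Type> msg=audit(<Time>): <Content>")
match = regex.match("type=LOGIN msg=audit(1642723741.076:377): pid=10125 ...")
time_field = match.group("Time")

print(time_field)                            # 1642723741.076:377
print(int(float(time_field.split(":")[0])))  # 1642723741
```

Escaping each piece after the split keeps the whitespace handling separate from the metacharacter escaping, so the \s+ joiner is never itself escaped.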

Original prompt

This section details on the original issue you should resolve

<issue_title>get_configured_variables does not detect type=login (third line in audit.log)</issue_title>
<issue_description>renaming to LOGIN1 for example makes it work.</issue_description>

Comments on the Issue (you are @copilot in this section)

@viktorbeck98 Any idea what the issue is? I'll assign this to copilot. I would like to see if it is able to find the cause.

…teral parts

Co-authored-by: viktorbeck98 <95294548+viktorbeck98@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix get_configured_variables detection for type=login" to "Fix generate_logformat_regex to escape regex metacharacters in literal format segments" on Mar 11, 2026

Development

Successfully merging this pull request may close these issues.

get_configured_variables does not detect type=login (third line in audit.log)

2 participants