
Fix training bugs, improve API, retrain model v3, and expand test coverage #12

Merged
mostafa merged 8 commits into main from improvements
Feb 21, 2026
Conversation

@mostafa
Member

@mostafa mostafa commented Feb 21, 2026

Ticket(s)

Description

This PR addresses multiple bugs, improves the API, retrains model v3 with a deterministic tokenizer, and expands test coverage.

Bug fixes

  • Fix broken evaluation in train.py: np.argmax(y_pred, axis=1) on a single-neuron sigmoid output (shape [N, 1]) always returned 0, making all post-training metrics meaningless. Replaced with threshold-based classification.
  • Fix variable shadowing in train_v3.py: Local f1_score variable shadowed the sklearn import.
  • Fix non-deterministic tokenizer vocabulary: SQLTokenizer.fit_on_texts() used set() then list(), producing different token orderings across runs. Now sorts tokens before slicing.
  • Remove unused numpy import from train.py.
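The argmax bug above can be reproduced in a few lines. This is a minimal sketch of the failure mode, not the actual train.py code; the 0.5 threshold is illustrative:

```python
import numpy as np

# Simulated single-neuron sigmoid output: shape [N, 1].
y_pred = np.array([[0.91], [0.12], [0.77]])

# Bug: argmax over axis=1 of a [N, 1] tensor always yields index 0,
# so every sample is labeled class 0 regardless of its score.
buggy_labels = np.argmax(y_pred, axis=1)      # [0, 0, 0]

# Fix: threshold the sigmoid probability instead.
labels = (y_pred.ravel() >= 0.5).astype(int)  # [1, 0, 1]
```

With every prediction collapsed to class 0, accuracy, precision, and recall computed afterward are meaningless, which is exactly why the post-training metrics had to be discarded.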

Model retraining

  • Retrained CNN-BiLSTM model v3 with the deterministic tokenizer.
  • Cross-validation results: 99% accuracy, 100% precision, 99% recall, 99% F1.
  • Major improvements over previous model:
    • UNION injections: 0.99 confidence (was 0.01)
    • pg_sleep injections: 0.85 confidence (was 0.02)
    • Legitimate WHERE id=10000: 0.001 confidence (was 0.73 — false positive risk)
  • Known limitation: or 1=1; with trailing semicolon is still a false negative (needs dataset enrichment).

API improvements

  • Load vocab from sql_tokenizer_vocab.json instead of re-fitting from the full CSV dataset on every startup (saves memory and startup time).
  • Replace print() with Python logging module.
  • Add /health endpoint for container health checks.
  • Sanitize error responses to avoid leaking internal details.
  • Update Dockerfile: Pin base image to tensorflow/tensorflow:2.16.1, only copy model v3.
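The API-side changes can be sketched as follows. The /health route and the sql_tokenizer_vocab.json filename come from this PR; the handler bodies and the load_vocab helper name are assumptions for illustration, not the actual API code:

```python
import json
import logging

from flask import Flask, jsonify

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = Flask(__name__)


def load_vocab(path="sql_tokenizer_vocab.json"):
    # Load the pre-built vocabulary from JSON instead of re-fitting
    # the tokenizer on the full CSV dataset at every startup.
    with open(path) as f:
        return json.load(f)


@app.route("/health")
def health():
    # Lightweight liveness probe for container health checks.
    return jsonify({"status": "ok"})


@app.errorhandler(Exception)
def handle_error(exc):
    # Log the full exception server-side, but return a generic
    # message so internal details never leak to clients.
    logger.exception("Unhandled error")
    return jsonify({"error": "internal server error"}), 500
```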

Version alignment

  • Align API Python version to ^3.12 and TensorFlow to ^2.16.1 to match training.
  • Remove redundant training/requirements.txt (outdated, conflicted with pyproject.toml).

CI/CD

  • Update actions/checkout v3→v4, actions/setup-python v2→v5.
  • Add separate lint step for API code.
  • Add API tests to CI pipeline.

New tests

  • Flask test client tests for the API: health endpoint, prediction with SQLi/legitimate/empty queries, missing body, error sanitization.
  • Updated model v3 test expectations to match retrained model.
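A test of this shape might look like the sketch below. The /predict route and the "query" body key are assumptions based on the PR description, and the prediction is stubbed out rather than calling the real model:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/health")
def health():
    return jsonify({"status": "ok"})


@app.route("/predict", methods=["POST"])
def predict():
    body = request.get_json(silent=True)
    if not body or "query" not in body:
        # Reject a missing body or missing query key with a clear 400.
        return jsonify({"error": "missing 'query' in request body"}), 400
    # Stub: the real handler would run the model here.
    return jsonify({"confidence": 0.0})


def test_health():
    resp = app.test_client().get("/health")
    assert resp.status_code == 200


def test_missing_query_key():
    resp = app.test_client().post("/predict", json={})
    assert resp.status_code == 400
```

Using Flask's test_client keeps these tests in-process, so CI can run them without starting a server or a container.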

Related PRs

Development Checklist

  • I have added a descriptive title to this PR.
  • I have performed a self-review of my own code.
  • I have added tests for my changes.

Legal Checklist

  • I have read and agreed to the LICENSE (required).

- Update actions/checkout from v3 to v4
- Update actions/setup-python from v2 to v5
- Add separate lint step for API code
- Pin Docker base image to tensorflow/tensorflow:2.16.1 for reproducibility
- train.py: Replace np.argmax(y_pred, axis=1) with threshold-based
  classification for single-neuron sigmoid output. The argmax on a
  shape [N, 1] tensor always returned 0, making all post-training
  metrics meaningless.
- train_v3.py: Rename local f1_score/f2_score variables to f1/f2
  to avoid shadowing the sklearn f1_score import.
…ements.txt

- Sort tokens before slicing in SQLTokenizer.fit_on_texts() to ensure
  deterministic vocabulary ordering across training runs
- Align API Python version to ^3.12 and TensorFlow to ^2.16.1 to
  match training environment and avoid model loading incompatibilities
- Remove training/requirements.txt which was redundant with pyproject.toml
  and contained outdated versions (e.g. black instead of ruff)
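The determinism fix above comes down to sorting before slicing. This is a simplified stand-in for SQLTokenizer.fit_on_texts(), not the actual implementation; the function name and whitespace tokenization are illustrative:

```python
def build_vocab(texts, max_tokens=1000):
    # set() iteration order varies across Python runs (hash
    # randomization), so slicing an unsorted list of tokens yields a
    # different vocabulary each run. Sorting first makes the
    # vocabulary, and therefore the trained model, reproducible.
    tokens = set()
    for text in texts:
        tokens.update(text.lower().split())
    return sorted(tokens)[:max_tokens]
```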
…ize errors

- Load tokenizer vocabulary from sql_tokenizer_vocab.json instead of
  re-fitting from the full CSV dataset on every startup, saving memory
  and startup time
- Replace print() with Python logging module
- Add /health endpoint for container health checks
- Sanitize error responses to avoid leaking internal details
- Update Dockerfile to only copy model v3 (the active model) and
  remove unnecessary dataset copy
- Add Flask test client tests covering: health endpoint, prediction
  with SQLi queries, legitimate queries, empty queries, missing body,
  missing query key, and error message sanitization
- Update CI workflow to install API deps and run API tests
Model v3 (CNN-BiLSTM) underperforms v1/v2 (LSTM) on key samples:
classic SQLi patterns score below the 0.8 threshold while legitimate
queries score dangerously close to it. The model needs retraining
with the now-deterministic tokenizer.
@mostafa mostafa self-assigned this Feb 21, 2026
…tions

- Retrained CNN-BiLSTM model with sorted tokenizer vocabulary,
  achieving 99% accuracy, 100% precision, 99% recall, 99% F1
- Major improvements over previous model:
  * UNION injections now correctly detected (0.99 vs 0.01 before)
  * pg_sleep injections detected (0.85 vs 0.02 before)
  * Legitimate "WHERE id=10000" no longer a false positive (0.001 vs 0.73)
- Save deterministic vocab from training script automatically
- Remove unused numpy import from train.py
- Known limitation: "or 1=1;" with trailing semicolon still a false
  negative, likely needs dataset enrichment
@mostafa mostafa changed the title Improvements Fix training bugs, improve API, retrain model v3, and expand test coverage Feb 21, 2026
pip install flask was installing Flask into the system Python, but the API tests run via poetry run pytest, which uses the Poetry virtualenv.
@mostafa mostafa merged commit 2c0b0fe into main Feb 21, 2026
1 check passed
@mostafa mostafa deleted the improvements branch February 21, 2026 23:02