Fix training bugs, improve API, retrain model v3, and expand test coverage#12
Merged
Conversation
- Update actions/checkout from v3 to v4
- Update actions/setup-python from v2 to v5
- Add separate lint step for API code
- Pin Docker base image to tensorflow/tensorflow:2.16.1 for reproducibility
- train.py: Replace np.argmax(y_pred, axis=1) with threshold-based classification for single-neuron sigmoid output. The argmax on a shape [N, 1] tensor always returned 0, making all post-training metrics meaningless.
- train_v3.py: Rename local f1_score/f2_score variables to f1/f2 to avoid shadowing the sklearn f1_score import.
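The argmax bug above can be reproduced in a few lines. This is an illustrative sketch only: the 0.5 cutoff below is a stand-in threshold, not necessarily the one the service uses.

```python
import numpy as np

# Single-neuron sigmoid head: predictions have shape [N, 1].
y_pred = np.array([[0.97], [0.03], [0.85], [0.40]])

# Buggy: argmax over axis 1 of an [N, 1] array is always index 0, because
# there is only one column to choose from -> every sample labelled class 0.
buggy = np.argmax(y_pred, axis=1)

# Fixed: threshold the sigmoid probability instead (0.5 for illustration).
fixed = (y_pred[:, 0] >= 0.5).astype(int)

print(buggy.tolist())  # [0, 0, 0, 0]
print(fixed.tolist())  # [1, 0, 1, 0]
```

With argmax, every post-training metric was computed against an all-zeros prediction vector, which is why they were meaningless.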
…ements.txt
- Sort tokens before slicing in SQLTokenizer.fit_on_texts() to ensure deterministic vocabulary ordering across training runs
- Align API Python version to ^3.12 and TensorFlow to ^2.16.1 to match the training environment and avoid model-loading incompatibilities
- Remove training/requirements.txt, which was redundant with pyproject.toml and contained outdated versions (e.g. black instead of ruff)
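The determinism fix hinges on one line: iterating a raw set() gives an arbitrary, run-dependent order, so slicing it to the vocabulary size kept different tokens on different runs. A minimal sketch of the fixed fit (not the project's full SQLTokenizer class):

```python
class SQLTokenizer:
    """Minimal sketch of a deterministic fit_on_texts (illustrative only)."""

    def __init__(self, vocab_size=1000):
        self.vocab_size = vocab_size
        self.word_index = {}

    def fit_on_texts(self, texts):
        tokens = set()
        for text in texts:
            tokens.update(text.lower().split())
        # Sorting before slicing makes the kept vocabulary identical across
        # runs; slicing an unsorted set() would not.
        ordered = sorted(tokens)[: self.vocab_size]
        self.word_index = {tok: i + 1 for i, tok in enumerate(ordered)}
```

Two fits over the same corpus now always produce the same word_index, which is what makes a saved model reloadable against a rebuilt vocabulary.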
…ize errors
- Load tokenizer vocabulary from sql_tokenizer_vocab.json instead of re-fitting from the full CSV dataset on every startup, saving memory and startup time
- Replace print() with the Python logging module
- Add /health endpoint for container health checks
- Sanitize error responses to avoid leaking internal details
- Update Dockerfile to copy only model v3 (the active model) and remove the unnecessary dataset copy
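The vocabulary handoff between training and serving can be sketched as a plain JSON round-trip. The helper names save_vocab/load_vocab are hypothetical; the filename sql_tokenizer_vocab.json is the one from this PR:

```python
import json
import os
import tempfile

def save_vocab(word_index, path):
    # Training side: persist the fitted token -> index mapping once.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(word_index, fh, sort_keys=True)

def load_vocab(path):
    # API side: load the mapping at startup instead of re-fitting the
    # tokenizer from the full CSV dataset on every boot.
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)

# Round-trip demo with a throwaway file.
vocab = {"select": 1, "union": 2, "where": 3}
path = os.path.join(tempfile.mkdtemp(), "sql_tokenizer_vocab.json")
save_vocab(vocab, path)
restored = load_vocab(path)
```

Startup no longer needs the training CSV at all, which is also what allows the Dockerfile to drop the dataset copy.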
- Add Flask test client tests covering: the health endpoint, prediction with SQLi queries, legitimate queries, empty queries, missing body, missing query key, and error message sanitization
- Update CI workflow to install API deps and run API tests
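The request-validation cases those tests exercise (missing body, missing query key, empty query) can be sketched framework-free. validate_request is a hypothetical helper standing in for the endpoint's input checks, not code from the PR:

```python
def validate_request(payload):
    """Sketch of the input checks the API tests cover.

    Returns (query, None) on success, or (None, (message, status)) on
    failure. Messages are generic so no internal details leak to clients.
    """
    if payload is None:
        return None, ("Missing JSON body", 400)
    if "query" not in payload:
        return None, ("Missing 'query' field", 400)
    query = payload["query"]
    if not isinstance(query, str) or not query.strip():
        return None, ("Empty query", 400)
    return query, None
```

Keeping validation in a pure function like this lets the edge cases be asserted directly, with the Flask test client reserved for wiring-level checks such as /health and response sanitization.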
Model v3 (CNN-BiLSTM) underperforms v1/v2 (LSTM) on key samples: classic SQLi patterns score below the 0.8 threshold while legitimate queries score dangerously close to it. The model needs retraining with the now-deterministic tokenizer.
…tions
- Retrained CNN-BiLSTM model with sorted tokenizer vocabulary, achieving 99% accuracy, 100% precision, 99% recall, 99% F1
- Major improvements over the previous model:
  * UNION injections now correctly detected (0.99 vs 0.01 before)
  * pg_sleep injections detected (0.85 vs 0.02 before)
  * Legitimate "WHERE id=10000" no longer a false positive (0.001 vs 0.73)
- Save deterministic vocab from the training script automatically
- Remove unused numpy import from train.py
- Known limitation: "or 1=1;" with trailing semicolon is still a false negative; likely needs dataset enrichment
`pip install flask` was installing into the system Python, but the API tests run via `poetry run pytest`, which uses the Poetry virtualenv.
Ticket(s)
Description
This PR addresses multiple bugs, improves the API, retrains model v3 with a deterministic tokenizer, and expands test coverage.
Bug fixes
- `train.py`: `np.argmax(y_pred, axis=1)` on a single-neuron sigmoid output (shape `[N, 1]`) always returned 0, making all post-training metrics meaningless. Replaced with threshold-based classification.
- `train_v3.py`: Local `f1_score` variable shadowed the sklearn import.
- `SQLTokenizer.fit_on_texts()` used `set()` then `list()`, producing different token orderings across runs. Now sorts tokens before slicing.
- Removed unused `numpy` import from `train.py`.

Model retraining
- Legitimate `WHERE id=10000`: 0.001 confidence (was 0.73, a false positive risk).
- `or 1=1;` with trailing semicolon is still a false negative (needs dataset enrichment).

API improvements
- Load tokenizer vocabulary from `sql_tokenizer_vocab.json` instead of re-fitting from the full CSV dataset on every startup (saves memory and startup time).
- Replace `print()` with the Python `logging` module.
- Add a `/health` endpoint for container health checks.
- Dockerfile: pin base image to `tensorflow/tensorflow:2.16.1`, only copy model v3.

Version alignment
- Align API Python version to `^3.12` and TensorFlow to `^2.16.1` to match training.
- Remove `training/requirements.txt` (outdated, conflicted with `pyproject.toml`).

CI/CD
- Update `actions/checkout` v3→v4, `actions/setup-python` v2→v5.

New tests
Related PRs
Development Checklist
Legal Checklist