Fix training bugs, improve API, retrain model v3, and expand test coverage#12
Merged
Conversation
- Update actions/checkout from v3 to v4
- Update actions/setup-python from v2 to v5
- Add separate lint step for API code
- Pin Docker base image to tensorflow/tensorflow:2.16.1 for reproducibility
- train.py: Replace np.argmax(y_pred, axis=1) with threshold-based classification for single-neuron sigmoid output. The argmax on a shape [N, 1] tensor always returned 0, making all post-training metrics meaningless.
- train_v3.py: Rename local f1_score/f2_score variables to f1/f2 to avoid shadowing the sklearn f1_score import.
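The argmax bug above can be reproduced in a few lines. This is an illustrative sketch only: the 0.5 cutoff below is a stand-in threshold, not necessarily the one the service uses.

```python
import numpy as np

# Single-neuron sigmoid head: predictions have shape [N, 1].
y_pred = np.array([[0.97], [0.03], [0.85], [0.40]])

# Buggy: argmax over axis 1 of an [N, 1] array is always index 0, because
# there is only one column to choose from -> every sample labelled class 0.
buggy = np.argmax(y_pred, axis=1)

# Fixed: threshold the sigmoid probability instead (0.5 for illustration).
fixed = (y_pred[:, 0] >= 0.5).astype(int)

print(buggy.tolist())  # [0, 0, 0, 0]
print(fixed.tolist())  # [1, 0, 1, 0]
```

With argmax, every post-training metric was computed against an all-zeros prediction vector, which is why they were meaningless.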
…ements.txt
- Sort tokens before slicing in SQLTokenizer.fit_on_texts() to ensure deterministic vocabulary ordering across training runs
- Align API Python version to ^3.12 and TensorFlow to ^2.16.1 to match the training environment and avoid model-loading incompatibilities
- Remove training/requirements.txt, which was redundant with pyproject.toml and contained outdated versions (e.g. black instead of ruff)
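The determinism fix hinges on one line: iterating a raw set() gives an arbitrary, run-dependent order, so slicing it to the vocabulary size kept different tokens on different runs. A minimal sketch of the fixed fit (not the project's full SQLTokenizer class):

```python
class SQLTokenizer:
    """Minimal sketch of a deterministic fit_on_texts (illustrative only)."""

    def __init__(self, vocab_size=1000):
        self.vocab_size = vocab_size
        self.word_index = {}

    def fit_on_texts(self, texts):
        tokens = set()
        for text in texts:
            tokens.update(text.lower().split())
        # Sorting before slicing makes the kept vocabulary identical across
        # runs; slicing an unsorted set() would not.
        ordered = sorted(tokens)[: self.vocab_size]
        self.word_index = {tok: i + 1 for i, tok in enumerate(ordered)}
```

Two fits over the same corpus now always produce the same word_index, which is what makes a saved model reloadable against a rebuilt vocabulary.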
…ize errors
- Load tokenizer vocabulary from sql_tokenizer_vocab.json instead of re-fitting from the full CSV dataset on every startup, saving memory and startup time
- Replace print() with the Python logging module
- Add /health endpoint for container health checks
- Sanitize error responses to avoid leaking internal details
- Update Dockerfile to copy only model v3 (the active model) and remove the unnecessary dataset copy
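The vocabulary handoff between training and serving can be sketched as a plain JSON round-trip. The helper names save_vocab/load_vocab are hypothetical; the filename sql_tokenizer_vocab.json is the one from this PR:

```python
import json
import os
import tempfile

def save_vocab(word_index, path):
    # Training side: persist the fitted token -> index mapping once.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(word_index, fh, sort_keys=True)

def load_vocab(path):
    # API side: load the mapping at startup instead of re-fitting the
    # tokenizer from the full CSV dataset on every boot.
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)

# Round-trip demo with a throwaway file.
vocab = {"select": 1, "union": 2, "where": 3}
path = os.path.join(tempfile.mkdtemp(), "sql_tokenizer_vocab.json")
save_vocab(vocab, path)
restored = load_vocab(path)
```

Startup no longer needs the training CSV at all, which is also what allows the Dockerfile to drop the dataset copy.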
- Add Flask test client tests covering: the health endpoint, prediction with SQLi queries, legitimate queries, empty queries, missing body, missing query key, and error message sanitization
- Update CI workflow to install API deps and run API tests
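The request-validation cases those tests exercise (missing body, missing query key, empty query) can be sketched framework-free. validate_request is a hypothetical helper standing in for the endpoint's input checks, not code from the PR:

```python
def validate_request(payload):
    """Sketch of the input checks the API tests cover.

    Returns (query, None) on success, or (None, (message, status)) on
    failure. Messages are generic so no internal details leak to clients.
    """
    if payload is None:
        return None, ("Missing JSON body", 400)
    if "query" not in payload:
        return None, ("Missing 'query' field", 400)
    query = payload["query"]
    if not isinstance(query, str) or not query.strip():
        return None, ("Empty query", 400)
    return query, None
```

Keeping validation in a pure function like this lets the edge cases be asserted directly, with the Flask test client reserved for wiring-level checks such as /health and response sanitization.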
Model v3 (CNN-BiLSTM) underperforms v1/v2 (LSTM) on key samples: classic SQLi patterns score below the 0.8 threshold while legitimate queries score dangerously close to it. The model needs retraining with the now-deterministic tokenizer.
…tions
- Retrained CNN-BiLSTM model with sorted tokenizer vocabulary, achieving 99% accuracy, 100% precision, 99% recall, 99% F1
- Major improvements over the previous model:
  * UNION injections now correctly detected (0.99 vs 0.01 before)
  * pg_sleep injections detected (0.85 vs 0.02 before)
  * Legitimate "WHERE id=10000" no longer a false positive (0.001 vs 0.73)
- Save deterministic vocab from the training script automatically
- Remove unused numpy import from train.py
- Known limitation: "or 1=1;" with trailing semicolon is still a false negative; likely needs dataset enrichment
`pip install flask` was installing into the system Python, but the API tests run via `poetry run pytest`, which uses the Poetry virtualenv.
Ticket(s)
Description
This PR addresses multiple bugs, improves the API, retrains model v3 with a deterministic tokenizer, and expands test coverage.
Bug fixes
- `train.py`: `np.argmax(y_pred, axis=1)` on a single-neuron sigmoid output (shape `[N, 1]`) always returned 0, making all post-training metrics meaningless. Replaced with threshold-based classification.
- `train_v3.py`: Local `f1_score` variable shadowed the sklearn import.
- `SQLTokenizer.fit_on_texts()` used `set()` then `list()`, producing different token orderings across runs. Now sorts tokens before slicing.
- Removed unused `numpy` import from `train.py`.

Model retraining
- Legitimate `WHERE id=10000`: 0.001 confidence (was 0.73, a false positive risk).
- `or 1=1;` with trailing semicolon is still a false negative (needs dataset enrichment).

API improvements
- Load tokenizer vocabulary from `sql_tokenizer_vocab.json` instead of re-fitting from the full CSV dataset on every startup (saves memory and startup time).
- Replace `print()` with the Python `logging` module.
- Add a `/health` endpoint for container health checks.
- Dockerfile: pin base image to `tensorflow/tensorflow:2.16.1`, only copy model v3.

Version alignment
- Align API Python version to `^3.12` and TensorFlow to `^2.16.1` to match training.
- Remove `training/requirements.txt` (outdated, conflicted with `pyproject.toml`).

CI/CD
- Update `actions/checkout` v3→v4, `actions/setup-python` v2→v5.

New tests
Related PRs
Development Checklist
Legal Checklist