Wordlists for web fuzzing: curated micro, categorized short/long, and combined final lists.
- `onelistforallmicro.txt`: curated list (maintained manually).
- `dict/<category>_short.txt`: per-category curated wordlist (small/quality sources).
- `dict/<category>_long.txt`: per-category comprehensive wordlist (all sources).
- `onelistforall.txt`: micro + all `*_short.txt`, deduplicated.
- `onelistforall_big.txt`: everything combined, deduplicated.
- Sync (`update`): Clones ~36 wordlist repos into `sources/`.
- Classify (`classify`): Walks every `.txt` file in `sources/`, classifies each into categories using path structure, filename keywords, and content sampling. Produces `classification_index.json`.
- Build (`build --all-categories`): For each category, merges classified source files into `dict/{cat}_short.txt` and `dict/{cat}_long.txt` with filtering and deduplication.
- Assemble (`assemble`): Combines micro + all shorts into `onelistforall.txt`, and everything into `onelistforall_big.txt`.
- Package (`package`): Creates 7z archives with checksums.
- Publish (`publish`): Auto-commits `dict/` changes and pushes to remote.
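The merge-and-deduplicate behavior at the heart of the build and assemble steps can be sketched in Go. This is a minimal illustration, not olfa's actual implementation; the `mergeDedup` name and signature are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// mergeDedup merges several wordlists into one, trimming whitespace,
// dropping empty lines, and keeping only the first occurrence of each entry
// (so earlier, higher-priority lists determine ordering).
func mergeDedup(lists ...[]string) []string {
	seen := make(map[string]struct{})
	var out []string
	for _, list := range lists {
		for _, line := range list {
			line = strings.TrimSpace(line)
			if line == "" {
				continue
			}
			if _, dup := seen[line]; dup {
				continue
			}
			seen[line] = struct{}{}
			out = append(out, line)
		}
	}
	return out
}

func main() {
	short := []string{"admin", "login", ""}
	long := []string{"login", "  admin  ", "backup"}
	fmt.Println(mergeDedup(short, long)) // [admin login backup]
}
```

The real tool streams files from disk rather than holding lists in memory, but the first-occurrence-wins dedup shown here is the core idea behind combining micro + shorts into the final lists.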
- Go 1.22+
- `git` (for `update`)
- `7z` (only for `package`)
```sh
# Check dependencies
go run ./cmd/olfa check

# List sources and categories
go run ./cmd/olfa list
go run ./cmd/olfa list --categories

# Sync source repos
go run ./cmd/olfa update
go run ./cmd/olfa update --source SecLists

# Classify source files
go run ./cmd/olfa classify
go run ./cmd/olfa classify --format json

# Build all category wordlists
go run ./cmd/olfa build --all-categories
go run ./cmd/olfa build --category wordpress --variant short

# Assemble final combined lists
go run ./cmd/olfa assemble

# Validate and package
go run ./cmd/olfa validate-categories
go run ./cmd/olfa stats
go run ./cmd/olfa package

# Full pipeline (all steps in one command)
go run ./cmd/olfa pipeline
go run ./cmd/olfa pipeline --dry-run
go run ./cmd/olfa pipeline --skip-publish  # skip git commit+push
go run ./cmd/olfa pipeline --skip-update   # skip git fetch
go run ./cmd/olfa pipeline --commit-msg "my custom message"
```

```sh
# Option A: single command
go run ./cmd/olfa pipeline

# Option B: step by step
go run ./cmd/olfa check
go run ./cmd/olfa update
go run ./cmd/olfa classify
go run ./cmd/olfa build --all-categories
go run ./cmd/olfa assemble
go run ./cmd/olfa validate-categories
go run ./cmd/olfa package
git add dict/ && git commit -m "chore: update dict/ wordlists" && git push
```

The repo includes ~418 category wordlists in `dict/` (~930 MB). However, 6 files exceed GitHub's 100 MB file size limit and are not included in the repository:
| File | Size |
|---|---|
| `dict/subdomains_long.txt` | 493 MB |
| `dict/passwords_long.txt` | 351 MB |
| `dict/passwords_short.txt` | 296 MB |
| `dict/fuzz_general_long.txt` | 178 MB |
| `dict/directories_long.txt` | 153 MB |
| `dict/dns_long.txt` | 112 MB |
To generate them locally, run:

```sh
go run ./cmd/olfa pipeline
```

Running the full pipeline (syncing sources + building all categories) requires ~15 GB of disk space.
Everything is controlled by two files:

- `configs/pipeline.yml` — sources, filters, classification rules, dedup settings, release config.
- `configs/taxonomy.json` — category taxonomy (236 categories with aliases).
Edit `configs/pipeline.yml` and add an entry to the `sources` array:

```json
{
  "name": "my-wordlists",
  "repo": "username/repo-name",
  "branch": "main",
  "paths": ["all"],
  "tags": ["directories", "api"],
  "priority": "medium"
}
```

| Field | Description |
|---|---|
| `name` | Unique identifier for the source |
| `repo` | GitHub `owner/repo` (cloned via `https://github.com/...`) |
| `branch` | Branch to track |
| `paths` | Directories to scan inside the repo (`["all"]` = everything) |
| `tags` | Fallback categories when auto-classification can't determine the category |
| `priority` | `high`, `medium`, or `low` — high-priority sources go into `*_short.txt`, all sources go into `*_long.txt` |
After adding a source, run the pipeline to pull and classify it:

```sh
go run ./cmd/olfa pipeline
```

Or sync just the new source:

```sh
go run ./cmd/olfa update --source my-wordlists
```

Global filters in `pipeline.yml` control what lines are kept or dropped:
- `regex_denylist` — lines matching any pattern are removed (e.g., URLs, UUIDs, image extensions)
- `max_line_len` — lines longer than this are dropped (default: 100 chars)
- `trim` — strip leading/trailing whitespace
- `drop_empty` — remove blank lines

Per-category filters can be set in `category_filters` (e.g., subdomains only allow valid hostname characters).
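The filter chain described above can be sketched in Go. The `keep` function and the concrete denylist patterns are hypothetical stand-ins for what the tool reads from `pipeline.yml`:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical settings mirroring the pipeline.yml filter options.
var (
	denylist = []*regexp.Regexp{
		regexp.MustCompile(`^https?://`),         // URLs
		regexp.MustCompile(`\.(png|jpe?g|gif)$`), // image extensions
	}
	maxLineLen = 100
)

// keep applies trim, drop_empty, max_line_len, and regex_denylist in order,
// returning the cleaned line and whether it survives the filters.
func keep(line string) (string, bool) {
	line = strings.TrimSpace(line) // trim
	if line == "" {                // drop_empty
		return "", false
	}
	if len(line) > maxLineLen { // max_line_len
		return "", false
	}
	for _, re := range denylist { // regex_denylist
		if re.MatchString(line) {
			return "", false
		}
	}
	return line, true
}

func main() {
	for _, l := range []string{"  admin  ", "https://example.com/x", "logo.png", "login"} {
		if out, ok := keep(l); ok {
			fmt.Println(out) // prints: admin, then login
		}
	}
}
```

A per-category filter (e.g., the hostname-character rule for subdomains) would simply be one more regexp applied after the global checks.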
Each category produces two wordlists:
- `*_short.txt` — only from `high` priority sources, or files with fewer than 5000 lines, or filenames containing keywords like `common`, `short`, `top`, `default`.
- `*_long.txt` — all sources combined.
Thresholds and keywords are configurable via `classification.short_line_threshold`, `classification.short_keywords`, and `classification.long_keywords`.
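The short-list eligibility rules above amount to a simple predicate. A sketch in Go, with the function name and hard-coded threshold/keywords standing in for the configurable values:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical mirror of the short-list selection rules: a source file feeds
// *_short.txt if it comes from a high-priority source, has fewer than 5000
// lines, or its filename contains one of the short keywords.
var shortKeywords = []string{"common", "short", "top", "default"}

func isShort(priority string, lineCount int, filename string) bool {
	if priority == "high" {
		return true
	}
	if lineCount < 5000 {
		return true
	}
	name := strings.ToLower(filename)
	for _, kw := range shortKeywords {
		if strings.Contains(name, kw) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isShort("medium", 120000, "common-paths.txt")) // true: keyword match
	fmt.Println(isShort("low", 120000, "everything.txt"))      // false
}
```

Every file, short-eligible or not, still lands in the corresponding `*_long.txt`.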
Categories are defined in `configs/taxonomy.json`. Each has a canonical name and aliases. Source files are matched to categories by:

- Explicit path rules (e.g., `Discovery/DNS/*` → subdomains)
- Directory path keywords matched against taxonomy
- Filename keywords matched against taxonomy
- Content sampling with regex patterns (fallback)
- Source-level tags (last resort)
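The fallback chain above can be sketched as a single function in Go. The rule tables here are hypothetical stand-ins for what olfa derives from `configs/taxonomy.json` and the source entries in `pipeline.yml`:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical rule tables: path prefixes and taxonomy keywords.
var (
	pathRules = map[string]string{"Discovery/DNS/": "subdomains"}
	keywords  = map[string]string{"wordpress": "wordpress", "passwd": "passwords"}
)

// classify walks the fallback chain: path rules, then path/filename
// keywords, then (omitted here) content sampling, then source tags.
func classify(filePath string, sourceTags []string) string {
	// 1. Explicit path rules win outright.
	for prefix, cat := range pathRules {
		if strings.HasPrefix(filePath, prefix) {
			return cat
		}
	}
	// 2-3. Directory and filename keywords matched against the taxonomy.
	lower := strings.ToLower(filePath)
	for kw, cat := range keywords {
		if strings.Contains(lower, kw) {
			return cat
		}
	}
	// 4. Content sampling with regex patterns would run here (omitted).
	// 5. Last resort: source-level tags.
	if len(sourceTags) > 0 {
		return sourceTags[0]
	}
	return "uncategorized"
}

func main() {
	fmt.Println(classify("Discovery/DNS/top-1k.txt", nil))   // subdomains
	fmt.Println(classify("CMS/wordpress-plugins.txt", nil))  // wordpress
	fmt.Println(classify("misc/stuff.txt", []string{"api"})) // api
}
```

Ordering matters: the cheap, high-precision checks (paths, filenames) run before the expensive, lower-precision ones (content sampling), with tags only as a backstop.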
To list all available categories:

```sh
go run ./cmd/olfa list --categories
go run ./cmd/olfa list --categories --format json
```