
Add latest versions of scripts #3

Open
adihebbalae wants to merge 26 commits into UTAustin-SwarmLab:main from adihebbalae:main

Conversation

@adihebbalae
Collaborator

No description provided.

adihebbalae and others added 26 commits February 23, 2026 16:01
Bug fixes:
- Fix .r13.r13 double-append in video_paths
- Fix video_paths to use MP4 slot-grouped format (/mp4s/{date}/{hour}/{slot}/)
- Fix counting/best_camera/spatial/summary having too many video_paths
  (was globbing entire hour directory, now uses clip_files from debug_info)
- Fix best_camera using nonexistent event.clip_file (now uses event.video_file)
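The `.r13.r13` double-append and the slot-grouped MP4 layout above can be sketched with a hypothetical path helper; the directory layout and function name are assumptions, not the pipeline's actual code:

```python
from pathlib import Path

def clip_mp4_path(root, date, hour, slot, clip_name):
    """Build a slot-grouped MP4 path (/mp4s/{date}/{hour}/{slot}/) without
    double-appending the .r13 suffix."""
    stem = clip_name if clip_name.endswith(".r13") else clip_name + ".r13"
    return Path(root) / "mp4s" / date / hour / slot / f"{stem}.mp4"
```

Guarding the suffix before appending, rather than appending unconditionally, is what prevents the `.r13.r13` duplication.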

Temporal quality:
- Selection now prefers strong > medium > weak connection strength
- Priority: strong+MEVID > strong > medium > any (score-sorted)
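The selection priority above could be expressed as a sort key; the pair schema and function name here are illustrative, not the pipeline's actual code:

```python
def selection_key(pair):
    """Priority: strong+MEVID > strong > medium > any, score-sorted within tiers."""
    if pair["strength"] == "strong" and pair.get("mevid"):
        tier = 0
    elif pair["strength"] == "strong":
        tier = 1
    elif pair["strength"] == "medium":
        tier = 2
    else:
        tier = 3
    return (tier, -pair["score"])  # lower tuple sorts first

pairs = [
    {"strength": "medium", "mevid": False, "score": 0.9},
    {"strength": "strong", "mevid": True, "score": 0.7},
    {"strength": "strong", "mevid": False, "score": 0.8},
]
best = min(pairs, key=selection_key)
```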

Data files added:
- slot_index.json, canonical_slots.json, geom_slot_index.json
- person_database.json, person_database_yolo.json
- mevid_supported_slots.json, annotated_activity_slots.txt
- These are required by the v10 pipeline at runtime

New:
- render_question_validation.py (validation video renderer)
- Updated run.sh to use v10 pipeline
Naturalize pipeline overhaul (Session 56):
- Refactored naturalize.py (removed two-pass mode, -292 LOC)
  * Deleted _naturalize_two_pass(), _grammar_check_one() functions
  * Deleted SYSTEM_PROMPT_NATURALIZE, GRAMMAR_CHECKER_PROMPT
  * Deleted --two-pass CLI argument and dual temperature constants
  * Simplified _naturalize_question() with labeled plaintext input format
  * Consolidated _short_activity_label() from 25 to 3 lines
  * Removed redundant article-agreement fixes, delegated to GPT
- Applied HoE feedback:
  * Kept JSON output via response_format=json_object (safer than plaintext splitting)
  * Added 4 explicit DO NOT constraints to prevent semantic drift
  * Lowered temperature from 0.4 to 0.3 for naturalization stability

Critical bug fix (person_descriptions.py):
- Fixed MEVID slot format mismatch (HH-MM vs HH-MM-SS)
  * Added _resolve_all_mevid_slots() and _resolve_mevid_slot() to bridge formats
  * Updated is_mevid_supported(), get_mevid_persons_for_slot(),
    get_mevid_persons_with_cameras() to merge across raw slot variants
  * MEVID lookup now works with both canonical (HH-MM) and raw (HH-MM-SS) formats
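A minimal sketch of the canonical-to-raw bridge described above, assuming slot names of the form `DATE.HH-MM[-SS].site`; the regex and function name are illustrative:

```python
import re

def resolve_mevid_slot(slot, raw_slots):
    """Return raw (HH-MM-SS) slot variants matching a canonical (HH-MM) slot name."""
    m = re.match(r"^(\d{4}-\d{2}-\d{2})\.(\d{2}-\d{2})(?:-\d{2})?\.(\w+)$", slot)
    if not m:
        return []
    date, hhmm, site = m.groups()
    prefix = f"{date}.{hhmm}"  # seconds stripped, so both formats match
    return [s for s in raw_slots if s.startswith(prefix) and s.endswith(f".{site}")]
```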

Test run (2018-03-07.11-00.school):
- Generated 12 questions (2 temporal, 2 perception, 3 spatial, 1 summarization,
  1 counting, 3 best_camera)
- Naturalized 12/12 (0 failures, 10,572 tokens, temp=0.3)
- Rendered 12 validation videos (16.6 MB)
- 27 MEVID entities resolved, cross-validated across 11 cameras

Other updates in v10 scripts:
- render_question_validation.py: major enhancements
- generate_temporal.py: additional event selection logic
- generate_spatial.py, generate_numerical.py: refinements
- Added review_qa.py for QA visualization

Fixed naturalization prompt
- Add MEVA_ENTITY_DESC_DIR env var (default: /nas/mars/dataset/MEVA/entity_descriptions)
  so geom-color descriptions load correctly regardless of MEVA_OUTPUT_DIR
- Fix run_pipeline.py: desc_path (Step 3.5) was hardcoded to /home/ah66742/data/
- Fix person_descriptions.py: _GEOM_DESC_DIR now uses _ENTITY_DESC_DIR not _OUTPUT
- Without this fix: all entities fall back to 'someone' -> 0 temporal/spatial/ordering questions

Other path fixes from session 57 (collaborator onboarding):
- All scripts use _REPO_DATA for read-only data files (slot_index, person_db, etc.)
- All scripts use MEVA_OUTPUT_DIR for writable output dirs
- run.sh and QUICKSTART.md updated with correct invocation
- batch_extract_all_slots.py: SLOT_INDEX_PATH, OUTPUT_DIR
- build_geom_slot_index.py: SLOT_INDEX_PATH, OUTPUT_PATH
- review_qa.py: QA_DIR

Verified with strace: pipeline now has zero runtime reads from /home/ah66742/data
batch_extract_all_slots.py:
  - SLOT_INDEX_PATH: _REPO_DATA/geom_slot_index.json
  - OUTPUT_DIR: MEVA_ENTITY_DESC_DIR env var (default: /nas/mars/dataset/MEVA/entity_descriptions)
  - LOG_DIR: _OUTPUT/extraction_logs
  - EXTRACTION_SCRIPT: repo-relative (scripts/v10/extract_entity_descriptions.py)

build_geom_slot_index.py:
  - SLOT_INDEX_PATH/OUTPUT_PATH: _REPO_DATA relative
  - EXTRACTION_SCRIPT: repo-relative

review_qa.py:
  - QA_DIR/VIDEO_DIR/AUDIT_DIR: _OUTPUT (MEVA_OUTPUT_DIR env var)

Verified: zero Path('/home/ah66742/') code-level references remain in v10/
export_to_multicam_format.py:
  - OUTPUT_DIR now uses MEVA_MULTICAM_OUT env var
  - Default: /nas/neurosymbolic/multi-cam-dataset/meva/data/qa_pairs
  - Was hardcoded to .../meva/qa_pairs (wrong path)

run.sh:
  - Fix RAW_JSON path: remove extra $SLOT/ subdirectory
  - Was: $OUTPUT_DIR/qa_pairs/$SLOT/$SLOT.final.raw.json
  - Now: $OUTPUT_DIR/qa_pairs/$SLOT.final.raw.json

Verified: full 3-step pipeline (generate + naturalize + export) with
MEVA_OUTPUT_DIR=/nas/neurosymbolic/multi-cam-dataset/meva/data
writes all output to NAS, zero reads from ~/data
Wide-field outdoor cameras (G328, G336, G339, G424, G638, G639) annotate
people that appear much smaller in frame (~6k-11k px²) compared to close-up
entrance cameras (G419, G420, G421) where people appear at ~100k px².

MIN_BBOX_AREA = 41472 (2% of 1920x1080) was silently dropping every single
actor on KRTD cameras, producing 0 spatial questions on all slots.

Fix: add MIN_BBOX_AREA_KRTD = 2048 (0.1% of 1920x1080, ~46x46 px) and
apply it when cameras[cam_id].has_krtd is True. Non-KRTD cameras keep
the original 41472 threshold.

Verified: 2018-03-13.16-20.school now yields 455 entities with 3D positions
and 3 spatial questions (was 0 before).
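The dual-threshold fix above amounts to a one-line branch; the constants mirror the commit message, while the filter function itself is a hypothetical sketch:

```python
MIN_BBOX_AREA = 41472       # 2% of 1920x1080 (close-up entrance cameras)
MIN_BBOX_AREA_KRTD = 2048   # 0.1% of 1920x1080, ~46x46 px (wide-field KRTD cameras)

def passes_area_filter(bbox_area, has_krtd):
    """Wide-field KRTD cameras get the relaxed threshold; others keep the original."""
    threshold = MIN_BBOX_AREA_KRTD if has_krtd else MIN_BBOX_AREA
    return bbox_area >= threshold
```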
- Add build_mevid_slots.py: standalone script to rebuild mevid_supported_slots.json
  from authoritative MEVID annotation sources (train/test_name.txt + video URLs)
- Rebuild mevid_supported_slots.json: 80 cross-camera persons across 168 slots
  (was 23 persons / 887 slots from stale source)
- Add MEVA_MEVID_DATA_DIR / MEVA_MEVID_URLS env var overrides in both
  build_mevid_slots.py and utils/mevid.py (NAS defaults preserved)
- Remove all /home/ah66742 references from live code (docstrings, run.sh)
…ertical splits

- Add analyze_crops_segformer() using mattmdjaga/segformer_b2_clothes model
  - 18 semantic body-part classes (hair, upper-clothes, pants/skirt/dress, shoes, etc.)
  - Per-region HSV color extraction instead of crude 10-45%/55-90% vertical slices
  - Detects accessories: hat, sunglasses, bag, scarf
  - Distinguishes pants vs skirt vs dress
- Update build_description() for richer output:
  'a person with black hair, wearing a blue top and black pants, black shoes'
- Add --method CLI flag: segformer (default), yolo, color-only
- Fix MIN_BBOX_WIDTH: 144 → 48 (person bboxes are tall & narrow, median W/H=0.54)
  - Actor yield: 70 → 394 on test slot
- Fix OUTPUT_DIR to match person_descriptions.py reader default
- Backward compatible: still outputs upper_color, lower_color, description keys
- Tested: 394 actors, 145 unique descriptions (36.8%) on 2018-03-11.11-25.school
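The richer description format shown above could be composed like this; the attribute keys are illustrative, not the script's actual schema:

```python
def build_description(attrs):
    """Compose 'a person with X hair, wearing a Y top and Z pants, W shoes'."""
    desc = "a person"
    if attrs.get("hair"):
        desc += f" with {attrs['hair']} hair"
    worn = []
    if attrs.get("upper"):
        worn.append(f"a {attrs['upper']} top")
    if attrs.get("lower"):
        worn.append(f"{attrs['lower']} pants")
    if worn:
        desc += ", wearing " + " and ".join(worn)
    if attrs.get("shoes"):
        desc += f", {attrs['shoes']} shoes"
    return desc
```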
- Introduced `extract_slot_lists.py` to extract slot lists from `slot_index.json` and save them to a text file.
- Added `run_all_slots.sh` script to automate the processing of slots listed in `slot_list_from_slot_index.txt`.
- Created `slot_list_from_slot_index.txt` containing a comprehensive list of slots for processing.
- Updated `.gitignore` to ensure proper handling of the `data/` directory.
- Introduced timestamped log files for `1_run_all_slots.sh`, `2_run_naturalize_all_slots.sh`, and `3_convert_naturalized_to_standard.sh`.
- Created a dedicated logs directory within the MEVA output directory to store logs for each script execution.
Phase 1 (Foundation):
- Issue 1: Color-match MEVID assignment scoring in person_descriptions.py
- Issue 2: Change default extraction method to SegFormer

Phase 2 (Bug Fixes):
- Issue 5: First-instance pre-filter in temporal + event ordering
- Issue 6: Cross-camera dedup in event ordering + 3D matching in counting
- Issue 8: Same-person spatial filter via cluster/IoU/proximity
- Issue 9: Deterministic _build_reasoning() for counting questions
- Issue 10: Key frames in counting verification output
- Issue 11: Annotation verification check in validator (Check 8)

Phase 3 (Policy Enforcement):
- Issue 12: Remove camera IDs/timestamps from restricted categories,
  add forbidden_pattern validator (Check 7), update naturalize hints
- Issue 4: Camera leak removal (covered by Issue 12)
- Issue 3: Rename TARGET_* to MAX_* soft ceilings in run_pipeline.py
- Issue 7: Perception enrichment with entity descriptions + uniqueness

10 files modified across the V10 pipeline. All imports validated,
integration test passes on 2018-03-11.11-25.school (13 Qs, 12.4s).
forbidden_pattern and annotation_verify checks both PASS.
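The first-instance pre-filter (Issue 5) might look like the following debounce over sorted events; the event schema and 30-second default are assumptions drawn from this log, not the pipeline's actual code:

```python
def first_instances(events, threshold=30.0):
    """Keep an event only if no event of the same (entity, activity) pair
    occurred within `threshold` seconds before it."""
    last_seen = {}
    kept = []
    for ev in sorted(events, key=lambda e: e["t"]):
        key = (ev["entity"], ev["activity"])
        prev = last_seen.get(key)
        if prev is None or ev["t"] - prev >= threshold:
            kept.append(ev)
        last_seen[key] = ev["t"]  # repeats extend the suppression window
    return kept
```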
- extract_entity_descriptions: two-tier approach (SegFormer ≥144px, HSV color-only ≥40px)
  Coverage: 674→3256 actors on test slot, 0 'a person' fallbacks
- activity_hierarchy: fix a/an article before vowels (a object → an object)
- generate_perception: exclude all cameras with same activity from distractors
- validate_qa: minor adjustments to check weights

Test results (4 slots):
  admin:    100/100
  hospital:  85/100
  school-1:  89/100
  school-2:  51/100
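The a/an article fix mentioned above ("a object" → "an object") reduces to a small vowel-letter heuristic; this sketch ignores vowel-*sound* exceptions like "an hour" or "a user":

```python
def fix_articles(text):
    """Replace 'a' with 'an' before words starting with a vowel letter."""
    words = text.split()
    for i, w in enumerate(words[:-1]):
        if w == "a" and words[i + 1][0].lower() in "aeiou":
            words[i] = "an"
    return " ".join(words)
```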
…ng reasoning, fix validator FPs

- Comment out perception category (import, generation call, MAX_PERCEPTION)
- Rewrite generate_spatial.py: closest-approach distance across multi-frame
  sampling with 3D projection via KRTD. Categories: near/moderate/far/cross_paths
- Fix counting reasoning in naturalize.py: pass original reasoning as GPT
  context, post-validate that correct count appears in naturalized text
- Fix validator false positives: raise near-dup threshold 80->90%, exclude
  structural template categories, add clothing-context negative lookahead
  for generic_description regex
- batch_extract: slot name normalization, dedup, fixed entity count check

Test scores: admin=100, hospital=92, school-10-15=84, school-11-25=86 (up from 51)
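The closest-approach computation over multi-frame sampling could be sketched as below, assuming each trajectory maps a frame index to a 3D point (the real script projects via KRTD first):

```python
import math

def closest_approach(traj_a, traj_b):
    """Return (min_distance, frame) across frames present in both trajectories."""
    best_d, best_t = float("inf"), None
    for t in sorted(set(traj_a) & set(traj_b)):
        d = math.dist(traj_a[t], traj_b[t])
        if d < best_d:
            best_d, best_t = d, t
    return best_d, best_t
```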
…mLab#2) + confidence-based filtering (#5)

- UTAustin-SwarmLab#2 Texture/pattern: _detect_texture() analyzes SegFormer mask regions for
  solid vs patterned/striped via HSV hue/saturation variance + directional
  gradient analysis. Adds light/dark brightness qualifier for chromatic colors.
  Skips texture on very dark clothing (V<70) to avoid compression noise FP.
  build_description() integrates: 'a patterned dark blue top', 'light teal pants'
  Excludes redundant qualifiers on inherently dark/light colors (black, white, etc)

- #5 Multi-crop consistency: _majority_vote_with_confidence() returns agreement
  score per attribute. Low-confidence (<40%) colors dropped from description
  rather than stating wrong color. Crop selection prefers middle-of-track frames
  (inner 80%) for more stable pose/lighting.

- New output fields: upper_texture, lower_texture, upper_brightness,
  lower_brightness, confidence dict (per-attribute agreement 0-1)

Test scores: admin=100, hospital=92, school-10-15=90(+6), school-11-25=88(+2)
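The multi-crop consistency vote (#5) can be sketched as a majority vote that reports agreement and drops low-confidence attributes; the function name and 40% cutoff mirror the log, the rest is illustrative:

```python
from collections import Counter

def majority_vote_with_confidence(values, min_agreement=0.4):
    """Return (winner, agreement); winner is None when agreement is below the cutoff."""
    if not values:
        return None, 0.0
    winner, count = Counter(values).most_common(1)[0]
    agreement = count / len(values)
    return (winner if agreement >= min_agreement else None), agreement
```

Dropping the attribute entirely below the cutoff matches the stated policy of preferring no color over a wrong color.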
…biguity metric

- person_descriptions.py: Add cross-camera merge (majority-vote attributes),
  within-camera height differentiation, VLM description loader, 68b-aware
  _build_description (texture/brightness qualifiers), color consolidation
- run_pipeline.py: Add Step 4b (cross-camera clustering), Step 13 (ambiguity
  analysis with pct_unique), fix imports, add VLM stats tracking
- vlm_describe_entities.py: NEW - VLM captioning script using InternVL2.5-8B
  via vLLM for entity description enrichment
- activity_hierarchy: 28 humanization overrides for vague/vehicle activities
- generate_temporal: vehicle-as-person fix, camera overlap dedup
- generate_numerical: overlap-aware dedup (8s window for adjacent cams)
- naturalize: vehicle entity preservation in preprocess + GPT prompt
- person_descriptions: unknown texture exclusion, article agreement (a/an),
  clean_geom_description, revert color consolidation (keep specific colors)
- utils/camera_overlap.py: new KRTD-based overlap detection module
…ch infrastructure

TEMPORAL (generate_temporal.py):
- Rewrote to V10: camera-proximity-based pair selection (adjacent > same-site)
- All questions use 'which occurred first?' format (removed before/after)
- Uniqueness gate: rejects pairs with indistinguishable entity descriptions
- Same-area relaxation for 2-camera sites (admin)
- First-instance filter aligned to 30s threshold
- Removed dead code (.bak deleted)

SPATIAL (generate_spatial.py):
- Closest-approach returns full distance trajectory (5-tuple)
- True crossing detection: far→close→far pattern (>5m → ≤2m → >5m) + bbox position swap
- Options A/D clearly differentiated (approach/stay near vs walk past/swap positions)
- SAMPLE_EVERY reduced 30→15 for finer spatial resolution
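The far→close→far crossing test above is a three-state machine over the sampled distance trajectory; the 5 m / 2 m thresholds come from this log, the rest is an illustrative sketch (the real check also requires a bbox position swap):

```python
def is_crossing(distances, far=5.0, close=2.0):
    """True crossing: distance goes > far, then <= close, then > far again."""
    state = 0  # 0: seeking far, 1: seeking close, 2: seeking far again
    for d in distances:
        if state == 0 and d > far:
            state = 1
        elif state == 1 and d <= close:
            state = 2
        elif state == 2 and d > far:
            return True
    return False
```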

EVENT ORDERING (generate_event_ordering.py):
- Combinatorial chain building
- First-instance filtering with 30s threshold

COUNTING (generate_numerical.py):
- Entity-identity-aware dedup

ENTITY DESCRIPTIONS (person_descriptions.py, extract_entity_descriptions.py):
- 5-tier description priority: MEVID-GPT > SegFormer/geom > VLM > MEVID-YOLO > fallback
- Cross-camera cluster differentiation

NEW FILES:
- batch_run_all_slots.py: multiprocessing batch runner (8 workers)
- batch_verify.py: automated quality validation (structural, temporal, description checks)
- reextract_missing_cameras.py: re-extract entity descriptions for missing cameras
- utils/camera_proximity.py: camera proximity tiers (same_area/adjacent/same_site)

CLEANUP:
- Removed hardcoded /home/ah66742 paths (reextract uses shutil.which, run.sh uses $HOME)
- Added *.bak to .gitignore
- Deleted .bak backup files
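The `same_area`/`adjacent`/`same_site` tiers from `utils/camera_proximity.py` could be modeled as a cascading lookup; the table structures here are hypothetical:

```python
def proximity_tier(cam_a, cam_b, area_of, adjacency, site_of):
    """Classify a camera pair into the tightest matching proximity tier."""
    if area_of.get(cam_a) is not None and area_of.get(cam_a) == area_of.get(cam_b):
        return "same_area"
    if cam_b in adjacency.get(cam_a, set()):
        return "adjacent"
    if site_of.get(cam_a) == site_of.get(cam_b):
        return "same_site"
    return None
```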