Skip to content

fix: Docker container fails to start with misleading OOM error#22

Merged
prakersh merged 2 commits intoonllm-dev:mainfrom
kapdon:fix/docker-nonroot-permissions
Mar 11, 2026
Merged

fix: Docker container fails to start with misleading OOM error#22
prakersh merged 2 commits intoonllm-dev:mainfrom
kapdon:fix/docker-nonroot-permissions

Conversation

@kapdon
Copy link
Contributor

@kapdon kapdon commented Mar 8, 2026

Summary

  • The distroless nonroot image (UID 65532) couldn't write to the bind-mounted data directory owned by the host user, causing SQLite to report SQLITE_CANTOPEN as "out of memory (14)"
  • Create /data with correct ownership in the builder stage and COPY --chown=65532:65532 to the runtime stage
  • Switch from bind mount to named volume so Docker handles permissions automatically on first use

Test plan

  • docker compose build succeeds
  • docker compose up starts without errors (previously crash-looped with OOM)
  • Container runs at ~10MB RSS, well within the 64MB limit
  • Verify data persists across container restarts with named volume

🤖 Generated with Claude Code

The distroless nonroot image (UID 65532) couldn't write to the
bind-mounted data directory owned by the host user, causing SQLite
to report SQLITE_CANTOPEN as "out of memory (14)".

- Create /data with correct ownership in builder stage and COPY
  it to runtime stage with --chown=65532:65532
- Switch from bind mount to named volume so Docker handles
  permissions automatically on first use

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@prakersh
Copy link
Contributor

prakersh commented Mar 9, 2026

Thanks for tracking this down - the permission issue with distroless + bind mounts is a real pain point.

One concern before merging: this switches existing compose users away from ./onwatch-data with no migration path. Users upgrading would lose their historical data.

Suggested fix - migration-aware entrypoint:

The idea is to detect when a user is upgrading from bind mount to named volume and copy their existing DB files once.

1. Add a Docker entrypoint in main.go (or separate file)

// +build docker

func dockerEntrypoint() {
    legacyPath := "/legacy-data/onwatch.db"
    newPath := "/data/onwatch.db"
    
    // Only migrate if:
    // - legacy DB exists (user upgrading from bind mount)
    // - new DB doesn't exist (named volume is fresh)
    if fileExists(legacyPath) && !fileExists(newPath) {
        log.Println("Migrating DB from legacy bind mount to named volume...")
        
        // Copy main DB and WAL/SHM files
        copyFile(legacyPath, newPath)
        copyFileIfExists("/legacy-data/onwatch.db-wal", "/data/onwatch.db-wal")
        copyFileIfExists("/legacy-data/onwatch.db-shm", "/data/onwatch.db-shm")
        
        log.Println("Migration complete")
    }
}

2. Update Dockerfile - call entrypoint before main binary

ENTRYPOINT ["/onwatch"]
CMD ["--docker-entrypoint"]

Or use a shell wrapper if you prefer separation.

3. Update docker-compose.yml - mount legacy path read-only for upgrades

volumes:
  - onwatch-data:/data                    # new named volume (read-write)
  - ./onwatch-data:/legacy-data:ro        # old bind mount (read-only, for migration)

Why this works:

  • Fresh installs: /legacy-data is empty, no migration runs
  • Upgrades: DB copied from /legacy-data/data on first start, then ignored
  • The :ro mount ensures we never write to the old location

Validation approach:

  1. Create a test DB in ./onwatch-data with a sentinel value
  2. Start container with both mounts
  3. Verify sentinel exists in the named volume

If you'd like, I can implement this and push to your branch - just let me know. Either way, I'd hold off merging until we have an upgrade path for existing users.

@kapdon
Copy link
Contributor Author

kapdon commented Mar 10, 2026

Thanks for the review! I don't think migration logic is needed here though - users are either using a bind mount or a named volume, never both at the same time. There's no scenario where data silently disappears:

  • Bind mount users keep using ./onwatch-data:/data and nothing changes for them
  • Named volume users start fresh with onwatch-data:/data
  • If someone explicitly switches from bind mount to named volume, they can just cp ./onwatch-data/* /var/lib/docker/volumes/onwatch-data/_data/ beforehand

Adding an entrypoint migration path with a legacy mount adds complexity for a case that doesn't organically occur.

@prakersh
Copy link
Contributor

The scenario I'm concerned about:

  1. User has been running with current docker-compose.yml (bind mount to ./onwatch-data)
  2. User pulls the updated version with your changes
  3. User runs docker-compose down && docker-compose up -d
  4. New compose uses named volume → container starts fresh, historical data not visible

The data isn't deleted (still in ./onwatch-data), but from the user's perspective their dashboard is empty after a routine update. They wouldn't know why or how to recover it.

Just want to make sure we don't surprise existing users. How would you approach handling this?

The Dockerfile fix (creating /data owned by UID 65532) resolves
permission errors for both bind mounts and named volumes, so there's
no need to change the default volume strategy in docker-compose.yml.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kapdon
Copy link
Contributor Author

kapdon commented Mar 10, 2026

Good point - I've reverted the docker-compose.yml changes. The Dockerfile fix alone handles this: creating /data owned by UID 65532 in the image means the nonroot user has write access regardless of whether a bind mount or named volume is used.

So now:

  • docker-compose.yml stays unchanged - bind mount to ./onwatch-data:/data remains the default
  • No migration needed - existing users' data and workflow are untouched
  • The Dockerfile fix (mkdir -p /data && chown 65532:65532 /data + COPY --chown) resolves the permission error for both volume types

Simpler change, no breaking behavior.

Copy link
Contributor

@prakersh prakersh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick iteration and for reverting the compose default change to avoid user migration surprises. Approving this as an incremental improvement; we will follow up with DB-path preflight diagnostics to replace the SQLite OOM-style startup error with actionable permission guidance.

@prakersh prakersh merged commit f66b4c2 into onllm-dev:main Mar 11, 2026
prakersh added a commit that referenced this pull request Mar 11, 2026
Fail fast with actionable diagnostics when the SQLite DB path is not writable,
including existing read-only database files. Preserve valid SQLite file: URIs
and memory DSNs, and update Docker bind-mount guidance to use recursive chown.

Follow-up to #22.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants