26 changes: 24 additions & 2 deletions .github/workflows/ci.yml
@@ -1,25 +1,30 @@
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

concurrency:
  group: ci-${{ github.head_ref }}
  group: ci-${{ github.head_ref || github.ref_name }}
  cancel-in-progress: true

jobs:
  quality:
    name: Quality Gates
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [18, 22]
    steps:
      - uses: actions/checkout@v4

      - uses: pnpm/action-setup@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 22
          node-version: ${{ matrix.node-version }}
          cache: pnpm

      - run: pnpm install --frozen-lockfile
@@ -35,3 +40,20 @@ jobs:

      - name: Test
        run: pnpm test

      # Coverage is best-effort: vitest v3 + v8 provider has a known worker
      # timeout bug (onTaskUpdate) on CI runners. Tests pass above; coverage
      # collection may time out without affecting correctness.
      - name: Coverage
        if: matrix.node-version == 22
        continue-on-error: true
        run: pnpm vitest run --coverage --pool forks --no-file-parallelism

      - name: Upload coverage
        if: matrix.node-version == 22 && always()
        uses: actions/upload-artifact@v4
        with:
          name: coverage-report
          path: coverage/
          retention-days: 14
          if-no-files-found: ignore
105 changes: 105 additions & 0 deletions AGENTS.md
@@ -0,0 +1,105 @@
# Codebase Intelligence

TypeScript codebase analysis engine. Parses source, builds dependency graphs, computes architectural metrics.

## Setup

```bash
npx codebase-intelligence@latest <path> # One-shot (no install)
npm install -g codebase-intelligence # Global install
```

## Interfaces

### MCP (for AI agents — preferred)

Start the MCP stdio server:

```bash
codebase-intelligence ./path/to/project
```

15 tools available: `codebase_overview`, `file_context`, `get_dependents`, `find_hotspots`, `get_module_structure`, `analyze_forces`, `find_dead_exports`, `get_groups`, `symbol_context`, `search`, `detect_changes`, `impact_analysis`, `rename_symbol`, `get_processes`, `get_clusters`.

2 prompts: `detect_impact`, `generate_map`.
3 resources: `codebase://clusters`, `codebase://processes`, `codebase://setup`.

### CLI (for humans and CI)

15 commands — full parity with MCP tools:

```bash
codebase-intelligence overview ./src # Codebase snapshot
codebase-intelligence hotspots ./src # Rank files by metric
codebase-intelligence file ./src auth/login.ts # File context
codebase-intelligence search ./src "auth" # Keyword search
codebase-intelligence changes ./src # Git diff analysis
codebase-intelligence dependents ./src types.ts # File blast radius
codebase-intelligence modules ./src # Module architecture
codebase-intelligence forces ./src # Force analysis
codebase-intelligence dead-exports ./src # Unused exports
codebase-intelligence groups ./src # Directory groups
codebase-intelligence symbol ./src parseCodebase # Symbol context
codebase-intelligence impact ./src getUserById # Symbol blast radius
codebase-intelligence rename ./src old new # Rename references
codebase-intelligence processes ./src # Execution flows
codebase-intelligence clusters ./src # File clusters
```

Add `--json` for machine-readable output. All commands auto-cache the index.

### Tool Selection

| Question | MCP Tool | CLI Command |
|----------|----------|-------------|
| What does this codebase look like? | `codebase_overview` | `overview` |
| Tell me about file X | `file_context` | `file` |
| What are the riskiest files? | `find_hotspots` | `hotspots` |
| Find files related to X | `search` | `search` |
| What changed? | `detect_changes` | `changes` |
| What breaks if I change file X? | `get_dependents` | `dependents` |
| What breaks if I change function X? | `impact_analysis` | `impact` |
| What's architecturally wrong? | `analyze_forces` | `forces` |
| Who calls this function? | `symbol_context` | `symbol` |
| Find all references for rename | `rename_symbol` | `rename` |
| What files naturally group together? | `get_clusters` | `clusters` |
| What can I safely delete? | `find_dead_exports` | `dead-exports` |
| How are modules organized? | `get_module_structure` | `modules` |
| What are the main areas? | `get_groups` | `groups` |
| How does data flow? | `get_processes` | `processes` |

## Documentation

- `docs/architecture.md` — Pipeline, module map, data flow
- `docs/data-model.md` — All TypeScript interfaces
- `docs/metrics.md` — Per-file and module metrics, force analysis
- `docs/mcp-tools.md` — 15 MCP tools with inputs, outputs, use cases
- `docs/cli-reference.md` — CLI commands with examples
- `llms.txt` — AI-consumable doc index
- `llms-full.txt` — Full documentation for context injection

## Metrics

Key file metrics: PageRank, betweenness, fan-in/out, coupling, tension, churn, cyclomatic complexity, blast radius, dead exports, test coverage.

Module metrics: cohesion, escape velocity, verdict (LEAF/COHESIVE/MODERATE/JUNK_DRAWER).

Force analysis: tension files, bridge files, extraction candidates.

## Project Structure

```
src/
  cli.ts                  Entry point + CLI commands
  core/index.ts           Shared computation (used by MCP + CLI)
  types/index.ts          All interfaces (single source of truth)
  parser/index.ts         TypeScript AST parser
  graph/index.ts          Dependency graph builder (graphology)
  analyzer/index.ts       Metric computation engine
  mcp/index.ts            MCP stdio server (15 tools)
  impact/index.ts         Symbol-level impact analysis
  search/index.ts         BM25 search engine
  process/index.ts        Entry point + call chain tracing
  community/index.ts      Louvain clustering
  persistence/index.ts    Graph cache (.code-visualizer/)
15 changes: 10 additions & 5 deletions docs/architecture.md
@@ -18,8 +18,12 @@ Analyzer
| computes: churn, complexity, blast radius, dead exports, test coverage
| produces: ForceAnalysis (tension files, bridges, extraction candidates)
v
MCP (stdio)
| exposes: 15 tools, 2 prompts, 3 resources for LLM agents
Core (shared computation)
| result builders used by both MCP and CLI
v
MCP (stdio)               CLI (terminal/CI)
| 15 tools, 2 prompts,    | 15 commands with MCP parity,
| 3 resources for LLMs    | plus --json output
```

## Module Map
@@ -30,6 +34,7 @@ src/
parser/index.ts <- TS AST extraction + git churn + test detection
graph/index.ts <- graphology graph + circular dep detection
analyzer/index.ts <- All metric computation
core/index.ts <- Shared result computation (MCP + CLI)
mcp/index.ts <- 15 MCP tools for LLM integration
mcp/hints.ts <- Next-step hints for MCP tool responses
impact/index.ts <- Symbol-level impact analysis + rename planning
@@ -38,7 +43,7 @@ src/
community/index.ts <- Louvain clustering
persistence/index.ts <- Graph export/import to .code-visualizer/
server/graph-store.ts <- Global graph state (shared by CLI + MCP)
cli.ts <- Entry point, wires pipeline together
cli.ts <- Entry point, CLI commands + MCP fallback
```

## Data Flow
@@ -63,12 +68,12 @@ startMcpServer(codebaseGraph)

## Key Design Decisions

- **MCP-only**: No web UI or REST API. All interaction through MCP stdio for LLM agents.
- **Dual interface**: MCP stdio for LLM agents, CLI subcommands for humans/CI. Both consume `src/core/`.
- **graphology**: In-memory graph with O(1) neighbor lookup. PageRank and betweenness computed via graphology-metrics.
- **Batch git churn**: Single `git log --all --name-only` call, parsed for all files. Avoids O(n) subprocess spawning.
- **Dead export detection**: Cross-references parsed exports against edge symbol lists. May miss `import *` or re-exports (known limitation).
- **Graceful degradation**: Non-git dirs get churn=0, no-test codebases get coverage=false. Never crashes.
- **Graph persistence**: Optional `--index` flag caches parsed graph to `.code-visualizer/` for instant startup on unchanged HEAD.
- **Graph persistence**: CLI commands always cache the graph index to `.code-visualizer/`. MCP mode (`codebase-intelligence <path>`) requires `--index` to persist the cache.
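
The batch churn decision above can be illustrated with a minimal sketch. It is not the project's parser: it assumes the log was produced with `git log --all --name-only --format=` (the empty `--format=` is an assumption made here so that only file paths are printed), and `countChurn` and `churnOf` are hypothetical helper names.

```typescript
// Hypothetical helper, not the project's implementation: counts how often
// each path appears in `git log --all --name-only --format=` output, where
// every non-blank line is a file path and blank lines separate commits.
function countChurn(gitLogOutput: string): Map<string, number> {
  const churn = new Map<string, number>();
  for (const rawLine of gitLogOutput.split("\n")) {
    const path = rawLine.trim();
    if (path === "") continue; // blank separator between commits
    churn.set(path, (churn.get(path) ?? 0) + 1);
  }
  return churn;
}

// Sample output covering three commits.
const sampleLog = [
  "src/cli.ts",
  "src/core/index.ts",
  "",
  "src/cli.ts",
  "",
  "src/cli.ts",
  "src/parser/index.ts",
].join("\n");

const churnByFile = countChurn(sampleLog);

// Graceful degradation: a file that never appears in the log gets churn 0.
const churnOf = (file: string): number => churnByFile.get(file) ?? 0;
```

One pass over a single subprocess's output fills the whole map, which is the point of the batch approach: no per-file `git` invocation is needed.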

## Adding a New Metric

192 changes: 192 additions & 0 deletions docs/cli-reference.md
@@ -0,0 +1,192 @@
# CLI Reference

15 commands for terminal and CI use. Full parity with MCP tools. All commands auto-cache the index to `.code-visualizer/`.

## Commands

### overview

High-level codebase snapshot.

```bash
codebase-intelligence overview <path> [--json] [--force]
```

**Output:** file count, function count, dependency count, modules (path, files, LOC, coupling, cohesion), top 5 depended files, avg LOC, max depth, circular dep count.

### hotspots

Rank files by metric.

```bash
codebase-intelligence hotspots <path> [--metric <metric>] [--limit <n>] [--json] [--force]
```

**Metrics:** `coupling` (default), `pagerank`, `fan_in`, `fan_out`, `betweenness`, `tension`, `churn`, `complexity`, `blast_radius`, `coverage`, `escape_velocity`.

### file

Detailed file context.

```bash
codebase-intelligence file <path> <file> [--json] [--force]
```

`<file>` is relative to the codebase root (e.g., `parser/index.ts`).

**Output:** LOC, exports, imports, dependents, all FileMetrics. If the file is not found, the command prints the top 3 similar path suggestions.

### search

BM25 keyword search.

```bash
codebase-intelligence search <path> <query> [--limit <n>] [--json] [--force]
```

**Output:** Ranked results grouped by file, with symbol name, type, LOC, and relevance score.
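
For readers unfamiliar with BM25, the ranking idea can be sketched as below. This is the textbook Okapi BM25 formula with the common defaults `k1 = 1.2` and `b = 0.75`, not necessarily what `search/index.ts` does; the `Doc` shape, the pre-tokenized input, and the `bm25Rank` name are assumptions for illustration.

```typescript
// Hypothetical document shape: one entry per file, already tokenized.
interface Doc {
  id: string;
  tokens: string[];
}

interface Ranked {
  id: string;
  score: number;
}

// Textbook Okapi BM25 scoring; a sketch, not the project's search engine.
function bm25Rank(docs: Doc[], query: string[], k1 = 1.2, b = 0.75): Ranked[] {
  const total = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.tokens.length, 0) / total;

  const scored = docs.map((doc) => {
    let score = 0;
    for (const term of query) {
      const tf = doc.tokens.filter((t) => t === term).length;
      if (tf === 0) continue; // term absent: contributes nothing
      const df = docs.filter((d) => d.tokens.includes(term)).length;
      // Rare terms get a higher inverse document frequency.
      const idf = Math.log(1 + (total - df + 0.5) / (df + 0.5));
      // Length normalization: long documents are penalized via b.
      const norm = tf + k1 * (1 - b + b * (doc.tokens.length / avgLen));
      score += (idf * tf * (k1 + 1)) / norm;
    }
    return { id: doc.id, score };
  });

  // Drop non-matching documents and sort best match first.
  return scored.filter((r) => r.score > 0).sort((x, y) => y.score - x.score);
}

const corpus: Doc[] = [
  { id: "auth/login.ts", tokens: ["auth", "login", "user"] },
  { id: "auth/session.ts", tokens: ["auth", "session", "token", "refresh"] },
  { id: "graph/index.ts", tokens: ["graph", "node", "edge"] },
];

const ranked = bm25Rank(corpus, ["auth", "login"]);
```

In this sample, `auth/login.ts` matches both query terms and ranks first, `auth/session.ts` matches only `auth`, and `graph/index.ts` is dropped because it matches neither.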

### changes

Git diff analysis with risk metrics.

```bash
codebase-intelligence changes <path> [--scope <scope>] [--json] [--force]
```

**Scope:** `staged`, `unstaged`, `all` (default).

### dependents

File-level blast radius: direct + transitive dependents.

```bash
codebase-intelligence dependents <path> <file> [--depth <n>] [--json] [--force]
```

**Output:** direct dependents with symbols, transitive dependents with paths, total affected, risk level (LOW/MEDIUM/HIGH).
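
The direct versus transitive split can be sketched as a depth-limited breadth-first walk over a reverse dependency map. This is an illustration under assumptions, not the tool's implementation: the `ReverseDeps` shape and `findDependents` name are hypothetical, and it assumes depth 1 means direct dependents and that `--depth` caps the walk (default 2, per the Flags table).

```typescript
// Hypothetical reverse-dependency map: file -> files that import it.
type ReverseDeps = Map<string, string[]>;

interface DependentsResult {
  direct: string[];
  transitive: string[];
  totalAffected: number;
}

// Depth-limited BFS: depth 1 collects direct dependents, deeper levels
// collect transitive ones. A sketch, not the project's code.
function findDependents(reverse: ReverseDeps, file: string, maxDepth = 2): DependentsResult {
  const direct: string[] = [];
  const transitive: string[] = [];
  const seen = new Set<string>([file]);
  let frontier: string[] = [file];

  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const current of frontier) {
      for (const dependent of reverse.get(current) ?? []) {
        if (seen.has(dependent)) continue; // also guards against cycles
        seen.add(dependent);
        (depth === 1 ? direct : transitive).push(dependent);
        next.push(dependent);
      }
    }
    frontier = next;
  }

  return { direct, transitive, totalAffected: direct.length + transitive.length };
}

const reverseDeps: ReverseDeps = new Map([
  ["types.ts", ["parser.ts", "graph.ts"]],
  ["parser.ts", ["cli.ts"]],
  ["graph.ts", ["analyzer.ts"]],
  ["analyzer.ts", ["cli.ts", "mcp.ts"]],
]);

const blast = findDependents(reverseDeps, "types.ts");
const deeperBlast = findDependents(reverseDeps, "types.ts", 3);
```

With the default depth, `mcp.ts` is not reported because it sits three hops from `types.ts`; raising the depth to 3 picks it up. How the tool maps the total to LOW/MEDIUM/HIGH is not specified here, so the sketch leaves risk levels out.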

### modules

Module architecture with cross-module dependencies.

```bash
codebase-intelligence modules <path> [--json] [--force]
```

**Output:** modules with cohesion/escape velocity, cross-module deps, circular deps.

### forces

Architectural force analysis.

```bash
codebase-intelligence forces <path> [--cohesion <n>] [--tension <n>] [--escape <n>] [--json] [--force]
```

**Output:** module cohesion verdicts, tension files, bridge files, extraction candidates, summary.

### dead-exports

Find unused exports across the codebase.

```bash
codebase-intelligence dead-exports <path> [--module <module>] [--limit <n>] [--json] [--force]
```

**Output:** dead export count, files with unused exports, summary.

### groups

Top-level directory groups with aggregate metrics.

```bash
codebase-intelligence groups <path> [--json] [--force]
```

**Output:** groups ranked by importance with files, LOC, coupling.

### symbol

Function/class context with callers and callees.

```bash
codebase-intelligence symbol <path> <name> [--json] [--force]
```

**Output:** symbol metadata, fan-in/out, PageRank, betweenness, callers, callees.

### impact

Symbol-level blast radius with depth-grouped impact levels.

```bash
codebase-intelligence impact <path> <symbol> [--json] [--force]
```

**Output:** impact levels (WILL BREAK / LIKELY / MAY NEED TESTING), total affected.

### rename

Find all references for rename planning (read-only by default).

```bash
codebase-intelligence rename <path> <oldName> <newName> [--no-dry-run] [--json] [--force]
```

**Output:** references with file, symbol, and confidence level.

### processes

Entry point execution flows through the call graph.

```bash
codebase-intelligence processes <path> [--entry <name>] [--limit <n>] [--json] [--force]
```

**Output:** processes with entry point, steps, depth, modules touched.

### clusters

Community-detected file clusters (Louvain algorithm).

```bash
codebase-intelligence clusters <path> [--min-files <n>] [--json] [--force]
```

**Output:** clusters with files, file count, cohesion.

## Flags

| Flag | Available On | Description |
|------|-------------|-------------|
| `--json` | All commands | Output stable JSON to stdout |
| `--force` | All commands | Re-parse even if cached index matches HEAD |
| `--metric <m>` | hotspots | Metric to rank by (default: coupling) |
| `--limit <n>` | hotspots, search, dead-exports, processes | Max results |
| `--scope <s>` | changes | Git diff scope: staged, unstaged, all |
| `--depth <n>` | dependents | Max traversal depth (default: 2) |
| `--cohesion <n>` | forces | Min cohesion threshold (default: 0.6) |
| `--tension <n>` | forces | Min tension threshold (default: 0.3) |
| `--escape <n>` | forces | Min escape velocity threshold (default: 0.5) |
| `--module <m>` | dead-exports | Filter by module path |
| `--entry <name>` | processes | Filter by entry point name |
| `--min-files <n>` | clusters | Min files per cluster |
| `--no-dry-run` | rename | Actually perform the rename (default: dry run) |

## Behavior

**Auto-caching:** First CLI invocation parses the codebase and saves the index to `.code-visualizer/`. Subsequent commands use the cache if `git HEAD` hasn't changed. Add `.code-visualizer/` to `.gitignore`.
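
The cache decision described above amounts to a small predicate. The sketch below is illustrative only: the on-disk format of `.code-visualizer/` is not documented here, so the `CachedIndex` shape and the `shouldReparse` name are hypothetical.

```typescript
// Hypothetical shape of the cached index metadata; the real format may differ.
interface CachedIndex {
  head: string; // commit hash the index was built from
}

// Re-parse when forced, when no cache exists, or when HEAD has moved.
function shouldReparse(cached: CachedIndex | null, currentHead: string, force: boolean): boolean {
  if (force) return true; // --force always re-parses
  if (cached === null) return true; // first invocation: nothing cached yet
  return cached.head !== currentHead; // stale if HEAD changed
}

const firstRun = shouldReparse(null, "abc123", false);
const unchangedHead = shouldReparse({ head: "abc123" }, "abc123", false);
const newCommit = shouldReparse({ head: "abc123" }, "def456", false);
const forced = shouldReparse({ head: "abc123" }, "abc123", true);
```

Only the unchanged-HEAD case without `--force` reuses the cache; every other combination triggers a fresh parse.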

**stdout/stderr:** Results go to stdout. Progress messages go to stderr. Safe for piping (`| jq`, `> file.json`).

**Exit codes:**
- `0` — success
- `1` — runtime error (file not found, no TS files, git unavailable)
- `2` — bad args or usage error
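
A CI wrapper might branch on these exit codes along the following lines. The numeric codes come from the list above; the category labels and the `classifyExit` and `isPipelineMisconfigured` helpers are hypothetical names chosen for this sketch.

```typescript
// Category labels are illustrative; only the numeric codes are documented.
type ExitCategory = "success" | "runtime-error" | "usage-error" | "unknown";

function classifyExit(code: number): ExitCategory {
  switch (code) {
    case 0:
      return "success";
    case 1:
      return "runtime-error"; // e.g. file not found, no TS files, git unavailable
    case 2:
      return "usage-error"; // bad args or usage error
    default:
      return "unknown"; // not documented; treat conservatively
  }
}

// Hypothetical policy: a usage error points at the pipeline definition
// rather than at the codebase being analyzed.
function isPipelineMisconfigured(code: number): boolean {
  return classifyExit(code) === "usage-error";
}

const categories = [0, 1, 2, 137].map(classifyExit);
```

Separating usage errors from runtime errors lets a pipeline report "fix the workflow" and "fix the repository" as different failures.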

**MCP mode:** Running `codebase-intelligence <path>` without a subcommand starts the MCP stdio server (backward compatible). MCP-specific flags:
- `--index` — persist graph index to `.code-visualizer/` (CLI auto-caches, MCP requires this flag)
- `--status` — print index status and exit
- `--clean` — remove `.code-visualizer/` index and exit
- `--force` — re-index even if HEAD unchanged