Merged
54 changes: 51 additions & 3 deletions README.md
@@ -55,6 +55,9 @@ But Engram isn't just a handoff bus. It solves four fundamental problems with ho
| **Nobody forgets** | Store everything forever | **Ebbinghaus decay curve, ~45% less storage** |
| **Agents write with no oversight** | Store directly | **Staging + verification + trust scoring** |
| **No episodic memory** | Vector search only | **CAST scenes (time/place/topic)** |
| **No consolidation** | Store everything as-is | **CLS Distillation — replay-driven fact extraction** |
| **Single decay rate** | One exponential curve | **Multi-trace Benna-Fusi model (fast/mid/slow)** |
| **No intent routing** | Same search for all queries | **Episodic vs semantic query classification** |
| Multi-modal encoding | Single embedding | **5 retrieval paths (EchoMem)** |
| Cross-agent memory sharing | Per-agent silos | **Scoped retrieval with all-but-mask privacy** |
| Concurrent multi-agent access | Single-process locks | **sqlite-vec WAL mode — multiple agents, one DB** |
@@ -90,6 +93,9 @@ pip install "engram-memory[sqlite_vec]"
# OpenAI provider add-on
pip install "engram-memory[openai]"

# NVIDIA provider add-on (Llama 3.1, nv-embed-v1, etc.)
pip install "engram-memory[nvidia]"

# Ollama provider add-on
pip install "engram-memory[ollama]"
```
@@ -144,7 +150,7 @@ Engram has five opinions about how memory should work:

1. **Switching agents shouldn't mean starting over.** When an agent pauses — rate limit, crash, tool switch — it saves a session digest. The next agent loads it and continues. Zero re-explanation.
2. **Agents need shared real-time state.** Active Memory lets agents broadcast what they're doing right now — no polling, no coordination protocol. Agent A posts "editing auth.py"; Agent B sees it instantly.
3. **Memory has a lifecycle.** New memories start in short-term (SML), get promoted to long-term (LML) through repeated access, and fade away through Ebbinghaus decay if unused.
3. **Memory has a lifecycle.** New memories start in short-term (SML), get promoted to long-term (LML) through repeated access, and fade away through Ebbinghaus decay if unused. Sleep cycles distill episodic conversations into durable semantic facts (CLS consolidation), cascade strength traces from fast to slow, and prune redundant or contradictory memories.
4. **Agents are untrusted writers.** Every write is a proposal that lands in staging. Trusted agents can auto-merge; untrusted ones wait for approval.
5. **Scoping is mandatory.** Every memory is scoped by user. Agents see only what they're allowed to — everything else gets the "all but mask" treatment (structure visible, details redacted).
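
The lifecycle in opinion 3 can be sketched in a few lines. This is a minimal illustration, not Engram's actual API: the class name, promotion threshold, and decay constant are all assumptions.

```python
import math
import time

PROMOTE_AFTER_ACCESSES = 3   # assumed promotion threshold
DECAY_RATE = 0.1             # assumed Ebbinghaus decay constant (per day)

class MemoryRecord:
    """Hypothetical record moving through the SML -> LML lifecycle."""

    def __init__(self, content: str):
        self.content = content
        self.layer = "SML"       # new memories start short-term
        self.strength = 1.0
        self.access_count = 0
        self.last_access = time.time()

    def access(self) -> None:
        """Spaced repetition: each access boosts strength and may promote."""
        self.access_count += 1
        self.strength = min(1.0, self.strength + 0.2)
        self.last_access = time.time()
        if self.layer == "SML" and self.access_count >= PROMOTE_AFTER_ACCESSES:
            self.layer = "LML"   # promoted to long-term through repeated access

    def decay(self, days_elapsed: float) -> None:
        """Ebbinghaus-style exponential forgetting if the memory sits unused."""
        self.strength *= math.exp(-DECAY_RATE * days_elapsed)

rec = MemoryRecord("user prefers tabs over spaces")
for _ in range(3):
    rec.access()      # third access promotes SML -> LML
```

The key design point is that promotion is driven purely by access, so agents never manage the lifecycle explicitly.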

@@ -209,7 +215,7 @@ Engram has five opinions about how memory should work:

### The Memory Stack

Engram combines seven systems, each handling a different aspect of how memory should work:
Engram combines multiple systems, each handling a different aspect of how memory should work:

#### Active Memory — Real-Time Signal Bus

@@ -289,6 +295,48 @@ Scene: "Engram v2 architecture session"
Memories: [mem_1, mem_2] ← semantic facts extracted
```

#### CLS Distillation Memory — Bio-Inspired Consolidation (v1.4)

Inspired by Complementary Learning Systems (CLS) theory — how the hippocampus and neocortex work together in the brain. Engram v1.4 adds five mechanisms that make memory smarter over time:

**1. Episodic/Semantic Memory Types**
Conversations are stored as `episodic` memories. During sleep cycles, a replay-driven distiller extracts durable facts into `semantic` memories, much as the brain consolidates experiences into knowledge overnight.

**2. Replay-Driven Distillation**
The `ReplayDistiller` samples recent episodic memories, groups them by scene/time, and uses the LLM to extract reusable semantic facts. Every distilled fact links back to its source episodes (provenance tracking).
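
A rough sketch of the sample-group-extract flow, assuming a plain-dict memory shape; `ReplayDistiller`'s real interface may differ, and `llm_extract` stands in for the LLM call:

```python
from collections import defaultdict
from typing import Callable

def distill(episodes: list[dict],
            llm_extract: Callable[[str], list[str]]) -> list[dict]:
    # 1. Group recent episodic memories by scene.
    by_scene: dict[str, list[dict]] = defaultdict(list)
    for ep in episodes:
        by_scene[ep["scene_id"]].append(ep)

    # 2. Ask the LLM for durable facts per scene, keeping provenance links
    #    from every distilled fact back to its source episodes.
    semantic: list[dict] = []
    for scene_id, eps in by_scene.items():
        transcript = "\n".join(e["content"] for e in eps)
        for fact in llm_extract(transcript):
            semantic.append({
                "type": "semantic",
                "content": fact,
                "source_episode_ids": [e["id"] for e in eps],  # provenance
            })
    return semantic
```

Provenance is the piece that matters: a semantic fact can always be traced back to the conversations it came from.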

**3. Multi-Mechanism Forgetting**
Beyond simple exponential decay, Engram now has three advanced forgetting mechanisms:
- **Interference Pruning** — contradictory memories are detected and the weaker one is demoted
- **Redundancy Collapse** — near-duplicate memories are auto-fused
- **Homeostatic Normalization** — memory budgets per namespace prevent unbounded growth
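
The three mechanisms compose naturally into one pruning pass. The sketch below is illustrative only: the similarity threshold, demotion factor, and function names are assumptions, not Engram internals.

```python
from typing import Callable

def forget_pass(memories: list[dict],
                similarity: Callable[[dict, dict], float],
                contradicts: Callable[[dict, dict], bool],
                budget: int) -> list[dict]:
    kept: list[dict] = []
    # Walk memories strongest-first so fusions and demotions favor the stronger copy.
    for mem in sorted(memories, key=lambda m: m["strength"], reverse=True):
        # Redundancy collapse: fuse near-duplicates into the stronger memory.
        dup = next((k for k in kept if similarity(k, mem) > 0.95), None)
        if dup is not None:
            dup["source_ids"] = dup.get("source_ids", []) + [mem["id"]]
            continue
        # Interference pruning: demote the weaker of two contradictory memories.
        if any(contradicts(k, mem) for k in kept):
            mem["strength"] *= 0.5  # demoted, kept but weakened
        kept.append(mem)
    # Homeostatic normalization: enforce the per-namespace budget.
    return kept[:budget]
```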

**4. Multi-Timescale Strength Traces (Benna-Fusi Model)**
Each memory has three strength traces instead of one scalar:
```
s_fast (decay: 0.20/day) — recent access, volatile
s_mid (decay: 0.05/day) — medium-term consolidation
s_slow (decay: 0.005/day) — durable long-term knowledge
```
New memories start in `s_fast`. Sleep cycles cascade strength: `fast → mid → slow`. Important facts become nearly permanent.
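
The decay rates above plug directly into per-trace exponential decay plus a sleep-time cascade. In this sketch the rates match the table, but the cascade fraction (0.1) is an assumed parameter:

```python
import math

RATES = {"s_fast": 0.20, "s_mid": 0.05, "s_slow": 0.005}  # decay per day

def decay_traces(traces: dict, days: float) -> dict:
    """Each trace decays independently at its own rate."""
    return {k: v * math.exp(-RATES[k] * days) for k, v in traces.items()}

def cascade(traces: dict, fraction: float = 0.1) -> dict:
    """Sleep cycle: shift a fraction of strength fast -> mid -> slow."""
    t = dict(traces)
    moved = t["s_fast"] * fraction
    t["s_fast"] -= moved
    t["s_mid"] += moved
    moved = t["s_mid"] * fraction
    t["s_mid"] -= moved
    t["s_slow"] += moved
    return t

# A new memory lives entirely in the fast trace...
t = {"s_fast": 1.0, "s_mid": 0.0, "s_slow": 0.0}
# ...then one day of decay plus one sleep cycle starts consolidating it.
t = cascade(decay_traces(t, days=1.0))
```

Strength that reaches `s_slow` decays forty times slower than `s_fast`, which is what makes important facts nearly permanent.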

**5. Intent-Aware Retrieval Routing**
Queries are classified as episodic ("when did we discuss..."), semantic ("what is the deployment process?"), or mixed. Matching memory types get a retrieval boost — the right type of answer for the right type of question.
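
A toy keyword-based classifier conveys the routing idea; Engram's actual classifier may use the LLM or richer features, and the cue lists and boost factor here are invented for illustration:

```python
import re

# Hypothetical cue patterns; real classification is likely more robust.
EPISODIC_CUES = re.compile(r"\b(when|last time|yesterday|did we|that session)\b", re.I)
SEMANTIC_CUES = re.compile(r"\b(what is|how does|define|process|policy)\b", re.I)

def classify_intent(query: str) -> str:
    ep = bool(EPISODIC_CUES.search(query))
    se = bool(SEMANTIC_CUES.search(query))
    if ep and not se:
        return "episodic"
    if se and not ep:
        return "semantic"
    return "mixed"

def boosted_score(base: float, memory_type: str, intent: str,
                  boost: float = 1.25) -> float:
    """Memories whose type matches the query intent get a retrieval boost."""
    return base * boost if intent == memory_type else base
```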

```
┌──────────────────────────────────────────────────────────────┐
│ Sleep Cycle (v1.4) │
│ │
│ 1. Standard FadeMem decay (SML/LML) │
│ 2. Multi-trace decay (fast/mid/slow independently) │
│ 3. Interference pruning (contradict → demote weaker) │
│ 4. Redundancy collapse (near-dupes → fuse) │
│ 5. Homeostatic normalization (budget enforcement) │
│ 6. Replay distillation (episodic → semantic facts) │
│ 7. Trace cascade (fast → mid → slow consolidation) │
└──────────────────────────────────────────────────────────────┘
```

#### Handoff Bus — Cross-Agent Continuity

Engram now defaults to a zero-intervention continuity model: MCP adapters automatically request resume context before tool execution and auto-write checkpoints on lifecycle events (`tool_complete`, `agent_pause`, `agent_end`). The legacy tools (`save_session_digest`, `get_last_session`, `list_sessions`) remain available for compatibility.
@@ -785,7 +833,7 @@ Engram is based on:
| Multi-hop Reasoning | +12% accuracy |
| Retrieval Precision | +8% on LTI-Bench |

Biological inspirations: Ebbinghaus Forgetting Curve → exponential decay, Spaced Repetition → access boosts strength, Sleep Consolidation → SML → LML promotion, Working Memory → Active Memory signal bus, Conscious/Subconscious Split → Active vs Passive memory, Production Effect → echo encoding, Elaborative Encoding → deeper processing = stronger memory.
Biological inspirations: Ebbinghaus Forgetting Curve → exponential decay, Spaced Repetition → access boosts strength, Sleep Consolidation → SML → LML promotion + CLS replay distillation, Benna-Fusi Model → multi-timescale strength traces (fast/mid/slow), Complementary Learning Systems → episodic-to-semantic consolidation, Working Memory → Active Memory signal bus, Conscious/Subconscious Split → Active vs Passive memory, Production Effect → echo encoding, Elaborative Encoding → deeper processing = stronger memory.

---

32 changes: 25 additions & 7 deletions engram/api/app.py
@@ -85,22 +85,32 @@ class DecayResponse(BaseModel):
redoc_url="/redoc",
)

_cors_origins_raw = os.environ.get("ENGRAM_CORS_ORIGINS", "")
_cors_origins = (
[o.strip() for o in _cors_origins_raw.split(",") if o.strip()]
if _cors_origins_raw
else ["http://localhost:3000", "http://127.0.0.1:3000"]
)

app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_origins=_cors_origins,
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
add_metrics_routes(app)

_memory: Optional[Memory] = None
_memory_lock = threading.Lock()


def get_memory() -> Memory:
global _memory
if _memory is None:
_memory = Memory()
with _memory_lock:
if _memory is None:
_memory = Memory()
return _memory


@@ -403,7 +413,7 @@ async def search_memories(request: SearchRequestV2, http_request: Request):
raise require_session_error(exc)
except Exception as exc:
logger.exception("Error searching memories")
raise HTTPException(status_code=500, detail=str(exc))
raise HTTPException(status_code=500, detail="Internal server error")


@app.get("/v1/scenes")
@@ -494,7 +504,7 @@ async def add_memory(request: AddMemoryRequestV2, http_request: Request):
raise require_session_error(exc)
except Exception as exc:
logger.exception("Error creating proposal/direct memory")
raise HTTPException(status_code=500, detail=str(exc))
raise HTTPException(status_code=500, detail="Internal server error")


@app.get("/v1/staging/commits")
@@ -779,15 +789,19 @@ async def get_memory_by_id(memory_id: str):

@app.put("/v1/memories/{memory_id}", response_model=Dict[str, Any])
@app.put("/v1/memories/{memory_id}/", response_model=Dict[str, Any])
async def update_memory(memory_id: str, request: Dict[str, Any]):
async def update_memory(memory_id: str, request: Dict[str, Any], http_request: Request):
token = get_token_from_request(http_request)
require_token_for_untrusted_request(http_request, token)
memory = get_memory()
result = memory.update(memory_id, request)
return result


@app.delete("/v1/memories/{memory_id}")
@app.delete("/v1/memories/{memory_id}/")
async def delete_memory(memory_id: str):
async def delete_memory(memory_id: str, http_request: Request):
token = get_token_from_request(http_request)
require_token_for_untrusted_request(http_request, token)
memory = get_memory()
memory.delete(memory_id)
return {"status": "deleted", "id": memory_id}
@@ -796,14 +810,18 @@ async def delete_memory(memory_id: str):
@app.delete("/v1/memories", response_model=Dict[str, Any])
@app.delete("/v1/memories/", response_model=Dict[str, Any])
async def delete_memories(
http_request: Request,
user_id: Optional[str] = Query(default=None),
agent_id: Optional[str] = Query(default=None),
run_id: Optional[str] = Query(default=None),
app_id: Optional[str] = Query(default=None),
dry_run: bool = Query(default=False, description="Preview what would be deleted without actually deleting"),
):
token = get_token_from_request(http_request)
require_token_for_untrusted_request(http_request, token)
memory = get_memory()
try:
return memory.delete_all(user_id=user_id, agent_id=agent_id, run_id=run_id, app_id=app_id)
return memory.delete_all(user_id=user_id, agent_id=agent_id, run_id=run_id, app_id=app_id, dry_run=dry_run)
except FadeMemValidationError as exc:
raise HTTPException(status_code=400, detail=exc.message)

8 changes: 4 additions & 4 deletions engram/api/schemas.py
@@ -93,31 +93,31 @@ class HandoffSessionDigestRequest(BaseModel):


class SearchRequestV2(BaseModel):
query: str
query: str = Field(min_length=1, max_length=10000)
user_id: str = Field(default="default")
agent_id: Optional[str] = Field(default=None)
limit: int = Field(default=10, ge=1, le=100)
categories: Optional[List[str]] = Field(default=None)


class AddMemoryRequestV2(BaseModel):
content: Optional[str] = Field(default=None)
content: Optional[str] = Field(default=None, max_length=100000)
messages: Optional[Union[str, List[Dict[str, Any]]]] = Field(default=None)
user_id: str = Field(default="default")
agent_id: Optional[str] = Field(default=None)
metadata: Optional[Dict[str, Any]] = Field(default=None)
categories: Optional[List[str]] = Field(default=None)
scope: Optional[str] = Field(default="work")
namespace: Optional[str] = Field(default="default")
mode: str = Field(default="staging", description="staging|direct")
mode: Literal["staging", "direct"] = Field(default="staging", description="staging|direct")
infer: bool = Field(default=False)
source_app: Optional[str] = Field(default=None)
source_type: str = Field(default="rest")
source_event_id: Optional[str] = Field(default=None)


class SceneSearchRequest(BaseModel):
query: str
query: str = Field(min_length=1, max_length=10000)
user_id: str = Field(default="default")
agent_id: Optional[str] = Field(default=None)
limit: int = Field(default=10, ge=1, le=100)
29 changes: 22 additions & 7 deletions engram/configs/active.py
@@ -3,7 +3,7 @@
from enum import Enum
from typing import Dict

from pydantic import BaseModel, Field
from pydantic import BaseModel, Field, field_validator


class TTLTier(str, Enum):
@@ -25,6 +25,14 @@ class SignalScope(str, Enum):
NAMESPACE = "namespace" # Only agents in same namespace


class ConsolidationConfig(BaseModel):
"""Configuration for active → passive memory consolidation."""
promote_critical: bool = True
promote_high_read: bool = True
promote_read_threshold: int = 3
directive_to_passive: bool = True


class ActiveMemoryConfig(BaseModel):
"""Configuration for the Active Memory signal bus."""
enabled: bool = True
@@ -40,11 +48,18 @@ class ActiveMemoryConfig(BaseModel):
consolidation_enabled: bool = True
consolidation_min_age_seconds: int = 600
consolidation_min_reads: int = 3
consolidation: ConsolidationConfig = Field(default_factory=ConsolidationConfig)

@field_validator("default_ttl_tier")
@classmethod
def _valid_ttl_tier(cls, v: str) -> str:
allowed = {t.value for t in TTLTier}
v = str(v).strip().lower()
if v not in allowed:
return TTLTier.NOTABLE.value
return v

class ConsolidationConfig(BaseModel):
"""Configuration for active → passive memory consolidation."""
promote_critical: bool = True
promote_high_read: bool = True
promote_read_threshold: int = 3
directive_to_passive: bool = True
@field_validator("max_signals_per_response")
@classmethod
def _clamp_max_signals(cls, v: int) -> int:
return min(100, max(1, int(v)))