Context Poisoning
A hallucination or other error makes it into the context, where it is repeatedly referenced and compounds over subsequent turns.
Context Distraction
The context grows so long that the model over-focuses on the accumulated context, neglecting what it learned during training.
Context Confusion
Superfluous content in the context is used by the model to generate a low-quality response.
Context Quarantine
Isolating contexts in dedicated threads, often facilitated by subagents, to explore different aspects of a question in parallel and then condense the most important tokens for a lead agent.
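A minimal sketch of this pattern, assuming a hypothetical `call_model` stand-in for a real LLM client: each subagent runs in its own thread with a fresh, isolated context, and only the condensed findings reach the lead agent's context.

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; swap in your LLM client.
    return f"summary({prompt})"

def run_subagent(aspect: str, question: str) -> str:
    # Each subagent sees only its aspect and the question: a fresh context.
    prompt = f"Research the {aspect} aspect of: {question}. Reply with key findings only."
    return call_model(prompt)

def quarantined_research(question: str, aspects: list[str]) -> str:
    # Explore aspects in parallel, each in an isolated context.
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(lambda a: run_subagent(a, question), aspects))
    # Only the condensed findings enter the lead agent's context.
    lead_prompt = "Synthesize these findings:\n" + "\n".join(findings)
    return call_model(lead_prompt)
```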
Context Clash
New information or tools in the context directly conflict with other existing information, derailing the model’s reasoning.
Context Pruning
Removing irrelevant or otherwise unneeded information from the context before it is passed to the model.
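A toy sketch of pruning, using crude lexical overlap as the relevance signal (trained pruning models score far more accurately, but the shape of the technique is the same):

```python
def relevance(query: str, message: str) -> float:
    # Crude lexical-overlap score; real pruners use trained models or embeddings.
    q, m = set(query.lower().split()), set(message.lower().split())
    return len(q & m) / len(q) if q else 0.0

def prune_context(messages: list[str], query: str, threshold: float = 0.2,
                  keep_first: int = 1) -> list[str]:
    # Always keep the first message(s) (e.g. the system prompt), then drop
    # anything scoring below the relevance threshold for the current query.
    pinned, rest = messages[:keep_first], messages[keep_first:]
    return pinned + [m for m in rest if relevance(query, m) >= threshold]
```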
Context Summarization
Using a separate LLM to condense conversation history.
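A sketch of budget-triggered summarization, where `summarizer` stands in for a call to a separate (often cheaper) model and the 4-characters-per-token count is a rough heuristic:

```python
def approx_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic: ~4 characters per token

def compact_history(messages: list[str], summarizer, budget: int = 500,
                    keep_recent: int = 4) -> list[str]:
    # Only compress once the transcript exceeds the token budget.
    if approx_tokens("\n".join(messages)) <= budget or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarizer("Condense this conversation, keeping decisions "
                         "and open questions:\n" + "\n".join(old))
    return [f"[Summary of earlier conversation] {summary}"] + recent
```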
Context Offloading
Storing information outside the LLM’s active context via a tool, often referred to as a “scratchpad”. Particularly effective for detailed tool output analysis, policy-heavy environments, and sequential decision-making.
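A minimal scratchpad sketch (in practice this would be exposed to the model as a callable tool): the full tool output lives outside the context, and only a short pointer is kept in the prompt.

```python
class Scratchpad:
    """Stores information outside the model's active context, keyed by name."""
    def __init__(self):
        self._notes: dict[str, str] = {}

    def write(self, key: str, text: str) -> str:
        self._notes[key] = text
        # Return a short pointer for the context instead of the full text.
        return f"[saved {len(text)} chars to scratchpad as '{key}']"

    def read(self, key: str) -> str:
        return self._notes.get(key, f"[no note named '{key}']")

pad = Scratchpad()
big_tool_output = "row1,row2,row3..." * 200  # imagine a huge API response
pointer = pad.write("api_results", big_tool_output)  # only this enters the context
# Later, when the agent actually needs the details:
details = pad.read("api_results")
```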
What is Provence?
A trained context-pruning model for retrieval-augmented generation: it scores the sentences of retrieved passages against the query and drops the irrelevant ones before they enter the prompt.
Why should you use a map for conversation history?
Storing history as a structured map (rather than one flat string) lets you prune, update, or summarize individual entries by key, without reprocessing the entire transcript.
How many tokens does it take for a model to experience context distraction?
Beyond 100k tokens in some cases, or around 32k for smaller models like Llama 3.1 405B.
Rolling Window
Keeping only the most recent N messages (or tokens) in the active context and dropping the oldest as the conversation grows, usually while pinning the system prompt.
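A sketch of a rolling window, assuming the system prompt should stay pinned while older turns fall out:

```python
from collections import deque

def windowed_context(system_prompt: str, messages: list[str],
                     max_turns: int = 6) -> list[str]:
    # deque(maxlen=...) silently discards the oldest items once full.
    window = deque(messages, maxlen=max_turns)
    return [system_prompt] + list(window)
```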
RAG as a context management strategy
Rather than keeping all history in the active context, details are offloaded to an external database (like a vector database). When information is needed, the system retrieves only the necessary, relevant data based on the current query, ensuring the active context remains lean and focused.
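A toy sketch of the retrieve-then-prompt loop, using word overlap in place of a real vector database and embeddings:

```python
def overlap(query: str, doc: str) -> float:
    # Stand-in similarity score; real systems use embedding distance.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def retrieve(query: str, store: list[str], k: int = 2) -> list[str]:
    # A real system would embed the query and run a nearest-neighbour search.
    return sorted(store, key=lambda doc: overlap(query, doc), reverse=True)[:k]

def build_prompt(query: str, store: list[str]) -> str:
    # Only the retrieved snippets enter the active context, keeping it lean.
    context = "\n".join(retrieve(query, store))
    return f"Context:\n{context}\n\nQuestion: {query}"
```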
Hierarchical Memory Management
This approach uses a tiered system where critical information (e.g., system instructions, key decisions) is preserved, recent history is kept in detail, and older or less relevant content is summarized or removed entirely.
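A sketch of tiered context assembly, with `summarizer` standing in for a call to a summarization model:

```python
def tiered_context(critical: list[str], history: list[str], summarizer,
                   recent_n: int = 4) -> list[str]:
    # Tier 1: critical items (system instructions, key decisions) always kept.
    context = list(critical)
    old, recent = history[:-recent_n], history[-recent_n:]
    # Tier 3: older turns collapsed into a single summary line.
    if old:
        context.append("[Summary of older turns] " + summarizer("\n".join(old)))
    # Tier 2: recent turns kept verbatim.
    context.extend(recent)
    return context
```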