
Why context building matters

LLMs have finite context windows. A typical model accepts between 4,000 and 128,000 tokens, and every token you spend on context is a token you cannot spend on the model’s response. When building RAG applications, you often have more candidate content than fits in the window – system prompts, retrieved chunks, conversation history, and user instructions all compete for space.

Naive approaches (concatenate everything, hope it fits) fail in predictable ways: either the prompt is truncated silently, or the model ignores information buried in the middle of a long context. securecontext’s context_builder() solves this with a priority-based token budget that gives you explicit control over what gets included and clear reporting on what gets dropped.

How the context builder works

The builder follows a simple algorithm:

                    +------------------+
                    | Set token budget |
                    | (max_tokens)     |
                    +--------+---------+
                             |
                    +--------v---------+
                    | Sort items by    |
                    | priority (desc)  |
                    +--------+---------+
                             |
                +------------v------------+
                | For each item (highest  |
                | priority first):        |
                |                         |
                |  Estimate token count   |
                |         |               |
                |    Fits in budget?      |
                |    /           \        |
                |  YES           NO       |
                |   |             |       |
                | Include      Exclude    |
                | (deduct      (record    |
                |  tokens)      label)    |
                +------------+------------+
                             |
                    +--------v---------+
                    | Return:          |
                    | - context string |
                    | - included list  |
                    | - excluded list  |
                    | - total_tokens   |
                    +------------------+

Items with the highest priority numbers are included first. When the remaining budget cannot accommodate the next item, that item is excluded, along with all lower-priority items. The builder reports both lists, so you always know exactly what the LLM will see and what it will not.
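The packing loop is easy to sketch in base R. The sketch below is illustrative, not securecontext’s actual internals, and it assumes a crude heuristic of roughly four characters per token:

# Illustrative sketch of the packing algorithm (not the package's internals)
estimate_tokens <- function(text) ceiling(nchar(text) / 4)  # crude heuristic

pack_items <- function(items, max_tokens) {
  # items: a list of list(text = , priority = , label = )
  ord <- order(vapply(items, function(x) x$priority, numeric(1)),
               decreasing = TRUE)
  budget <- max_tokens
  parts <- character(0); included <- character(0); excluded <- character(0)
  stopped <- FALSE
  for (it in items[ord]) {
    cost <- estimate_tokens(it$text)
    if (!stopped && cost <= budget) {
      budget <- budget - cost
      parts <- c(parts, it$text)
      included <- c(included, it$label)
    } else {
      stopped <- TRUE  # first miss also drops all lower-priority items
      excluded <- c(excluded, it$label)
    }
  }
  list(context = paste(parts, collapse = "\n\n"),
       included = included, excluded = excluded,
       total_tokens = max_tokens - budget)
}

The real implementation may count tokens differently, but the shape – sort, deduct, record – is what the diagram above describes.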

Basic usage

Create a builder with a token budget, add content with priorities, and build the final context string.

library(securecontext)

cb <- context_builder(max_tokens = 100)

# Add content with different priorities (higher = included first)
cb <- cb_add(cb, "You are a helpful assistant.", priority = 10, label = "system")
cb <- cb_add(cb,
  "R is great for statistics and data visualization.",
  priority = 5, label = "retrieved_chunk_1"
)
cb <- cb_add(cb,
  "Python is popular for machine learning and web development.",
  priority = 4, label = "retrieved_chunk_2"
)
cb <- cb_add(cb,
  "Julia offers high performance for numerical computing workloads.",
  priority = 3, label = "retrieved_chunk_3"
)

result <- cb_build(cb)

cat("Assembled context:\n")
cat(result$context, "\n\n")
cat("Included:", paste(result$included, collapse = ", "), "\n")
cat("Excluded:", paste(result$excluded, collapse = ", "), "\n")
cat("Total tokens:", result$total_tokens, "\n")

With a 100-token budget, the system prompt (priority 10) is always included. Retrieved chunks are then packed in priority order until the budget is exhausted. The $excluded field tells you exactly which chunks were dropped, which is valuable for debugging and logging in production agents.
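In production you will usually want to act on that report rather than just print it. A minimal sketch (the warning text is illustrative):

# Surface truncation instead of letting it pass silently
if (length(result$excluded) > 0) {
  warning(
    "Context truncated at ", result$total_tokens, " tokens; dropped: ",
    paste(result$excluded, collapse = ", ")
  )
}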

Priority design patterns

Choosing priority values is a design decision that depends on your application. Here are common patterns:

Fixed tiers assign priorities by content type:

Priority  Content type          Rationale
--------  --------------------  --------------------------------------
      10  System prompt         Defines agent behavior, always needed
       7  User’s question       The query must be visible to the model
       5  Retrieved chunks      Supporting evidence, best-effort
       3  Conversation history  Helpful but expendable
       1  Disclaimers/footers   Include only if space permits
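A fixed-tier scheme is easy to encode as a named vector. PRIORITY and add_tiered below are hypothetical helpers, not part of securecontext:

# Hypothetical tier table and wrapper around cb_add()
PRIORITY <- c(system = 10, question = 7, chunk = 5, history = 3, footer = 1)

add_tiered <- function(cb, text, tier, label) {
  cb_add(cb, text, priority = PRIORITY[[tier]], label = label)
}

cb <- context_builder(max_tokens = 800)
cb <- add_tiered(cb, "You are a helpful assistant.", "system", "system")
cb <- add_tiered(cb, "Which model family fits count data?", "question", "user_q")

Centralizing the tiers in one table keeps priorities consistent across the codebase and makes them easy to tune later.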

Dynamic priorities use retrieval scores directly. When you retrieve chunks with cosine similarity scores, you can pass those scores as priorities so the most relevant chunks are included first:

cb <- context_builder(max_tokens = 500)

# System prompt gets highest priority -- always included
cb <- cb_add(cb, "You are an R expert.", priority = 10, label = "system")

# Retrieved chunks get decreasing priority by relevance score
hits <- retrieve(ret, "statistical models", k = 5)
for (i in seq_len(nrow(hits))) {
  chunk_text <- hits$id[i]  # or look up the original text
  cb <- cb_add(cb, chunk_text, priority = hits$score[i], label = hits$id[i])
}

result <- cb_build(cb)
cat("Included:", paste(result$included, collapse = ", "), "\n")
cat("Excluded:", paste(result$excluded, collapse = ", "), "\n")
cat("Total tokens:", result$total_tokens, "\n")
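One caveat with score-based priorities: if your retriever returns unbounded scores (e.g. raw inner products), a high-scoring chunk could outrank the system prompt. A hypothetical helper that rescales scores into a band below priority 10:

# Hypothetical helper: map arbitrary scores into [lo, hi], below the system tier
rescale_priority <- function(scores, lo = 1, hi = 9) {
  rng <- range(scores)
  if (diff(rng) == 0) return(rep(hi, length(scores)))
  lo + (scores - rng[1]) / diff(rng) * (hi - lo)
}

rescale_priority(c(0.2, 3.5, 1.1))  # largest score maps to 9, smallest to 1

Cosine similarities are already bounded by 1, so the pattern shown above is safe as-is; rescaling matters for other score types.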

Resetting between turns

In a multi-turn conversation, you typically rebuild the context for each turn – the retrieved chunks change, the conversation history grows, and the system prompt may be updated. Use cb_reset() to clear all items and reuse the same builder without re-specifying the token budget.

cb2 <- cb_reset(cb)
cb2 <- cb_add(cb2, "New system prompt.", priority = 10, label = "system_v2")
result2 <- cb_build(cb2)
cat("After reset -- included:", paste(result2$included, collapse = ", "), "\n")

This avoids the overhead of creating a new builder object each turn and makes the intent clear: each turn starts from a fresh context assembly.
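Putting the pieces together, a multi-turn loop might look like the sketch below. The turn texts and k = 3 are illustrative; ret is the retriever from the earlier example:

# Sketch of a multi-turn loop reusing one builder
turns <- c("What is a GLM?", "How do I fit one in R?")
for (q in turns) {
  cb <- cb_reset(cb)
  cb <- cb_add(cb, "You are an R expert.", priority = 10, label = "system")
  cb <- cb_add(cb, q, priority = 7, label = "user_question")
  hits <- retrieve(ret, q, k = 3)
  for (i in seq_len(nrow(hits))) {
    cb <- cb_add(cb, hits$id[i], priority = hits$score[i], label = hits$id[i])
  }
  out <- cb_build(cb)
  # out$context is now ready to send to the model for this turn
}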

The context_for_chat() shortcut

For the common case of “retrieve chunks and build a context string,” context_for_chat() combines both steps in a single call. It retrieves the top-k chunks from a retriever and packs them into a token-limited string.

result <- context_for_chat(ret, "statistics", max_tokens = 2000)
cat(result$context)

Under the hood, this creates a context builder, adds each retrieved chunk with its similarity score as the priority, and returns the built result. It is a convenience wrapper; when you need more control (e.g., adding a system prompt or conversation history alongside retrieved chunks), use the builder directly.
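As a rough sketch, the shortcut behaves like the following builder code; the exact k and other internals are assumptions:

# Approximate expansion of context_for_chat(ret, "statistics", max_tokens = 2000)
cb <- context_builder(max_tokens = 2000)
hits <- retrieve(ret, "statistics", k = 5)  # k is an assumption
for (i in seq_len(nrow(hits))) {
  cb <- cb_add(cb, hits$id[i], priority = hits$score[i], label = hits$id[i])
}
result <- cb_build(cb)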