Why local retrieval?
Retrieval-augmented generation (RAG) lets an LLM answer questions about your own documents (internal reports, package documentation, domain knowledge) by fetching relevant passages and injecting them into the prompt. Most RAG tooling requires sending your documents to an external embedding API, which introduces latency, cost, and privacy concerns.
securecontext takes a local-first approach. Its TF-IDF embedder runs entirely in your R process: no API keys, no network calls, no data leaving your machine. This makes it suitable for air-gapped environments, privacy-sensitive workloads, and rapid prototyping where you want results without configuring external services.
The sections below cover the complete retrieval pipeline step by step. Every operation (document creation, chunking, embedding, vector search, knowledge storage, and context assembly) runs locally.
The full RAG pipeline
The following diagram shows how data flows from raw documents to an LLM-ready context string:
Ingest                 Search                  Assemble
------                 ------                  --------
document()             retrieve()              context_builder()
    |                      |                       |
    v                      v                       v
chunk_text()           embed query             cb_add() per chunk
    |                  with TF-IDF                 |
    v                      |                       v
embed_tfidf()          cosine similarity       cb_build()
    |                  against store              |
    v                      |                       v
vector_store$add()     ranked results          token-limited
                                               context string
The left column (ingest) happens once when you load your corpus. The middle column (search) runs on every query. The right column (assemble) packs results into a token budget before sending to an LLM.
Step 1: Create documents
A document() wraps raw text with metadata and an
auto-generated identifier. Metadata travels with the document through
the pipeline, making it easy to trace which source a retrieved chunk
came from. Documents are S7 objects, so properties are accessed with
@.
doc_r <- document(
"R is a programming language for statistical computing and graphics.
It is widely used among statisticians and data scientists. R provides
a wide variety of statistical and graphical techniques, including
linear and nonlinear modelling, classical statistical tests, time-series
analysis, classification, and clustering.",
metadata = list(source = "intro", topic = "R")
)
doc_python <- document(
"Python is a high-level, general-purpose programming language.
Its design philosophy emphasizes code readability. Python supports
multiple programming paradigms, including structured, object-oriented,
and functional programming. It is often used for machine learning,
web development, and data analysis.",
metadata = list(source = "intro", topic = "Python")
)
doc_julia <- document(
"Julia is a high-level, high-performance programming language for
technical computing. It was designed for numerical analysis and
computational science. Julia features optional typing, multiple dispatch,
and good performance approaching that of statically-typed languages
like C and Fortran.",
metadata = list(source = "intro", topic = "Julia")
)
# S7 property access with @
doc_r@id
#> [1] "doc_20260310163658_c6be45ee"
doc_r@metadata
#> $source
#> [1] "intro"
#>
#> $topic
#> [1] "R"

Step 2: Chunk text
Chunking splits long text into smaller pieces suitable for embedding and retrieval. Smaller, focused chunks improve search precision because the embedder can match a query against a specific passage rather than an entire document. The tradeoff is that very small chunks may lose surrounding context.
securecontext offers four strategies:
| Strategy      | How it splits                              |
|---------------|--------------------------------------------|
| "sentence"    | On sentence boundaries (. + space)         |
| "paragraph"   | On double newlines                         |
| "fixed"       | Fixed character width with overlap         |
| "recursive"   | Hierarchical separators (LangChain-style)  |
Choosing a chunking strategy
The right strategy depends on your content and how users will query it:
Sentence chunking works best for narrative text (documentation, articles) where individual statements carry meaning. Queries like “What does dplyr do?” match well against single-sentence chunks.
Paragraph chunking preserves more context per chunk, which helps when meaning spans multiple sentences. It works well for structured documents with clear paragraph breaks.
Fixed-size chunking guarantees uniform chunk lengths, which is useful when downstream components expect consistent input sizes. The overlap parameter ensures that information near chunk boundaries is not lost.
Recursive chunking is the most robust general-purpose strategy. It tries larger separators first (double newlines, then single newlines, then spaces, then characters), producing natural-looking chunks that respect document structure. This is a good default when you are unsure.
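The first two rules from the table are simple enough to illustrate in base R. This is a rough sketch of the splitting logic, not the package's implementation:

```r
# Sentence-style split: break on ". " (a period followed by a space)
sentences <- strsplit("R is fun. It is free. Use it.", "\\. ")[[1]]
length(sentences)
#> [1] 3

# Paragraph-style split: break on double newlines
paragraphs <- strsplit("Para one.\n\nPara two.", "\n\n", fixed = TRUE)[[1]]
length(paragraphs)
#> [1] 2
```

chunk_text() applies these rules for you and, for the recursive strategy, falls back through progressively smaller separators.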
# Sentence-level chunking
sentences <- chunk_text(doc_r@text, strategy = "sentence")
cat("Sentence chunks:", length(sentences), "\n")
#> Sentence chunks: 3
sentences
#> [1] "R is a programming language for statistical computing and graphics."
#> [2] "It is widely used among statisticians and data scientists."
#> [3] "R provides\na wide variety of statistical and graphical techniques, including\nlinear and nonlinear modelling, classical statistical tests, time-series\nanalysis, classification, and clustering."
# Recursive chunking with a small max_size to demonstrate splitting
small_chunks <- chunk_text(doc_r@text, strategy = "recursive", max_size = 120)
cat("\nRecursive chunks (max 120 chars):", length(small_chunks), "\n")
#>
#> Recursive chunks (max 120 chars): 5
small_chunks
#> [1] "R is a programming language for statistical computing and graphics."
#> [2] "It is widely used among statisticians and data scientists. R provides"
#> [3] "a wide variety of statistical and graphical techniques, including"
#> [4] "linear and nonlinear modelling, classical statistical tests, time-series"
#> [5] "analysis, classification, and clustering."

Fixed-size chunking is useful when you need consistent chunk lengths,
for example when working with models that expect uniform input sizes.
The overlap parameter creates a sliding window, ensuring
that information near chunk boundaries appears in both adjacent
chunks:
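The sliding-window mechanics can be sketched in base R. This is a hypothetical helper for illustration, not the package's chunk_fixed():

```r
# Hypothetical sliding-window chunker: each chunk starts
# (size - overlap) characters after the previous one,
# so adjacent chunks share `overlap` characters
chunk_window <- function(text, size, overlap) {
  step <- size - overlap
  starts <- seq(1, nchar(text), by = step)
  vapply(
    starts,
    function(s) substr(text, s, min(s + size - 1, nchar(text))),
    character(1)
  )
}

chunk_window("abcdefghij", size = 4, overlap = 2)
#> [1] "abcd" "cdef" "efgh" "ghij" "ij"
```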
long_text <- paste(
"The tidyverse is a collection of R packages designed for data science.",
"It includes ggplot2 for visualization, dplyr for data manipulation,",
"tidyr for data tidying, readr for reading data, purrr for functional",
"programming, tibble for modern data frames, stringr for string",
"manipulation, and forcats for factor handling.",
"All tidyverse packages share an underlying design philosophy,",
"grammar, and data structures."
)
fixed_chunks <- chunk_fixed(long_text, size = 100, overlap = 20)
cat("Fixed chunks (size=100, overlap=20):", length(fixed_chunks), "\n\n")
#> Fixed chunks (size=100, overlap=20): 5
for (i in seq_along(fixed_chunks)) {
cat(sprintf("Chunk %d (%d chars): %s\n\n", i, nchar(fixed_chunks[i]), fixed_chunks[i]))
}
#> Chunk 1 (100 chars): The tidyverse is a collection of R packages designed for data science. It includes ggplot2 for visua
#>
#> Chunk 2 (100 chars): es ggplot2 for visualization, dplyr for data manipulation, tidyr for data tidying, readr for reading
#>
#> Chunk 3 (100 chars): g, readr for reading data, purrr for functional programming, tibble for modern data frames, stringr
#>
#> Chunk 4 (100 chars): ata frames, stringr for string manipulation, and forcats for factor handling. All tidyverse packages
#>
#> Chunk 5 (89 chars): l tidyverse packages share an underlying design philosophy, grammar, and data structures.

Step 3: Build a TF-IDF embedder
Embeddings are numerical representations of text that capture semantic similarity. Texts about similar topics produce vectors that are close together in the embedding space, enabling search by meaning rather than exact keywords.
embed_tfidf() builds a vocabulary from a corpus and
returns an embedder that can project new texts into that TF-IDF space.
TF-IDF (Term Frequency-Inverse Document Frequency) weighs words by how
important they are to a specific document relative to the corpus. Common
words like “the” get low weights; distinctive words like “regression”
get high weights.
Everything runs locally; no API keys required.
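The core of the weighting is easy to sketch in base R. This is a simplified illustration of the inverse-document-frequency part; embed_tfidf() additionally handles term frequencies, normalization, and vocabulary management:

```r
# Toy corpus: weight terms by inverse document frequency
corpus <- c("the cat sat", "the dog sat", "regression models")
tokens <- strsplit(corpus, " ", fixed = TRUE)
vocab  <- unique(unlist(tokens))

# df: number of documents containing each term
df <- vapply(vocab, function(w) {
  sum(vapply(tokens, function(t) w %in% t, logical(1)))
}, numeric(1))
idf <- log(length(corpus) / df)

# Common terms score low, distinctive terms score high
idf[["the"]]         # log(3/2) ~ 0.41
idf[["regression"]]  # log(3/1) ~ 1.10
```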
# Gather all document texts as the training corpus
corpus <- c(doc_r@text, doc_python@text, doc_julia@text)
embedder <- embed_tfidf(corpus)
# The embedder is an S7 object; dims equals the vocabulary size
cat("Embedding dimensions:", embedder@dims, "\n")
#> Embedding dimensions: 79
# Embed a new query
query_matrix <- embed_texts(embedder, "statistical analysis in R")
cat("Query embedding shape:", nrow(query_matrix), "x", ncol(query_matrix), "\n")
#> Query embedding shape: 1 x 79

Step 4: Vector store
The vector_store is an R6 class providing in-memory
cosine-similarity search with optional RDS persistence. It stores
embedding vectors keyed by ID and retrieves the closest matches to a
query vector. Since it is R6, use $ for method access.
Cosine similarity measures the angle between two vectors, ignoring magnitude. A score of 1.0 means identical direction (maximum similarity); 0.0 means orthogonal (no similarity).
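The formula is one line of base R. This sketch assumes the store uses the standard definition, cos(a, b) = (a . b) / (|a| |b|):

```r
# Cosine similarity: dot product divided by the product of magnitudes
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

cosine(c(1, 2, 3), c(1, 2, 3))  # identical direction
#> [1] 1
cosine(c(1, 0), c(0, 1))        # orthogonal
#> [1] 0
```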
vs <- vector_store$new(dims = embedder@dims)
# Embed and store each document manually
ids <- c("r", "python", "julia")
embeddings <- embed_texts(embedder, corpus)
vs$add(ids, embeddings)
cat("Store size:", vs$size(), "vectors\n")
#> Store size: 3 vectors
# Search for the closest match to a query
query_emb <- embed_texts(embedder, "data science and statistics")
results <- vs$search(query_emb, k = 3)
print(results)
#> id score
#> 1 python 0.07428944
#> 2 r 0.06359400
#> 3  julia 0.00000000

Persistence is straightforward with $save() and
$load(). This lets you build an embedding index once and
reuse it across R sessions without re-embedding your corpus:
tmp <- tempfile(fileext = ".rds")
vs$save(tmp)
vs2 <- vector_store$new(dims = embedder@dims)
vs2$load(tmp)
cat("Loaded store size:", vs2$size(), "\n")
#> Loaded store size: 3
# Clean up
unlink(tmp)

Step 5: Retriever, the high-level interface
The previous steps (chunk, embed, store, search) are the building
blocks. The retriever() combines a vector store and an
embedder into a single object that handles the full ingest-and-search
workflow. Use add_documents() to chunk, embed, and store
documents in one call, then retrieve() to search.
This is the recommended interface for most applications. It reduces boilerplate and ensures that chunking and embedding stay consistent between ingest and query time.
# Fresh store for the retriever
vs_ret <- vector_store$new(dims = embedder@dims)
ret <- retriever(vs_ret, embedder)
# add_documents handles chunking + embedding internally
docs <- list(doc_r, doc_python, doc_julia)
add_documents(ret, docs, chunk_strategy = "sentence")
cat("Chunks in store:", vs_ret$size(), "\n\n")
#> Chunks in store: 10
# Retrieve the top 3 chunks for a query
hits <- retrieve(ret, "machine learning", k = 3)
print(hits)
#> id score
#> 1 doc_20260310163658_e35844e7_chunk_4 0.3992843
#> 2 doc_20260310163658_c6be45ee_chunk_1 0.0000000
#> 3 doc_20260310163658_c6be45ee_chunk_2 0.0000000

The returned data frame contains chunk IDs and cosine similarity
scores. Higher scores indicate greater relevance. You can use these
scores directly as priorities in the context builder (see
vignette("context-building")).
Step 6: Knowledge store
The knowledge_store is an R6 class providing persistent
key-value storage backed by JSONL. While the vector store is optimized
for similarity search over embeddings, the knowledge store is designed
for structured data you look up by key: agent memory, user preferences,
session history, learned facts.
The JSONL format (one JSON object per line) makes the store append-friendly and human-readable. Data persists across R sessions automatically.
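The layout itself is easy to sketch with base R file operations. This illustrates the JSONL format, not the knowledge_store implementation:

```r
path <- tempfile(fileext = ".jsonl")

# One JSON object per line; appends never rewrite earlier lines
cat('{"key":"lang.r","value":{"type":"statistical","year":1993}}\n',
    file = path, append = TRUE)
cat('{"key":"user.preference","value":"R"}\n',
    file = path, append = TRUE)

# Each line is an independent record, so the file stays human-readable
length(readLines(path))
#> [1] 2

unlink(path)
```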
ks <- knowledge_store$new(path = tempfile(fileext = ".jsonl"))
# Store some facts
ks$set("lang.r", list(type = "statistical", year = 1993))
ks$set("lang.python", list(type = "general-purpose", year = 1991))
ks$set("lang.julia", list(type = "numerical", year = 2012))
ks$set("user.preference", "R")
cat("Total entries:", ks$size(), "\n")
#> Total entries: 4
# Retrieve a value
ks$get("lang.r")
#> $type
#> [1] "statistical"
#>
#> $year
#> [1] 1993
# Search keys by regex
ks$search("^lang")
#> [1] "lang.r" "lang.python" "lang.julia"
# List all keys
ks$list()
#> [1] "lang.r" "lang.python" "lang.julia" "user.preference"
# Clean up
unlink(ks$.__enclos_env__$private$.path)

Step 7: Context builder
The context_builder() assembles a token-limited context
string from multiple sources, prioritizing the most important content.
This is the final step before sending context to an LLM: it keeps you
within the model’s context window while including the most relevant
information.
For more on priority strategies, overflow behavior, and multi-turn
patterns, see vignette("context-building").
cb <- context_builder(max_tokens = 100)
# Add content with different priorities (higher = included first)
cb <- cb_add(cb, "You are a helpful assistant.", priority = 10, label = "system")
cb <- cb_add(cb,
"R is great for statistics and data visualization.",
priority = 5, label = "retrieved_chunk_1"
)
cb <- cb_add(cb,
"Python is popular for machine learning and web development.",
priority = 4, label = "retrieved_chunk_2"
)
cb <- cb_add(cb,
"Julia offers high performance for numerical computing workloads.",
priority = 3, label = "retrieved_chunk_3"
)
result <- cb_build(cb)
cat("Assembled context:\n")
#> Assembled context:
cat(result$context, "\n\n")
#> You are a helpful assistant.
#>
#> R is great for statistics and data visualization.
#>
#> Python is popular for machine learning and web development.
#>
#> Julia offers high performance for numerical computing workloads.
cat("Included:", paste(result$included, collapse = ", "), "\n")
#> Included: system, retrieved_chunk_1, retrieved_chunk_2, retrieved_chunk_3
cat("Excluded:", paste(result$excluded, collapse = ", "), "\n")
#> Excluded:
cat("Total tokens:", result$total_tokens, "\n")
#> Total tokens: 41

The builder processes items in priority order and stops adding when
the token budget is exhausted. Items that do not fit are reported in
$excluded, making it easy to log what was dropped.
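The greedy algorithm behind this can be sketched in a few lines of base R. This is a simplification; the real builder also labels and formats the assembled context:

```r
# Greedy packing: take items by descending priority until the budget runs out
pack <- function(items, tokens, priority, budget) {
  ord  <- order(priority, decreasing = TRUE)
  kept <- character(0)
  used <- 0
  for (i in ord) {
    if (used + tokens[i] <= budget) {
      kept <- c(kept, items[i])
      used <- used + tokens[i]
    }
  }
  list(included = kept, total_tokens = used)
}

res <- pack(c("system", "chunk_1", "chunk_2"),
            tokens = c(10, 50, 60), priority = c(10, 5, 4), budget = 100)
# keeps "system" and "chunk_1" (60 tokens); "chunk_2" would exceed the budget
```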
Use cb_reset() to clear all items and reuse the same builder for the next turn of conversation.
Full pipeline: retrieve and build context
context_for_chat() combines retrieval and context
building in a single call. Given a retriever and a query, it retrieves
the top-k chunks and packs them into a token-limited context string.
This is the typical integration point for an agent: retrieve relevant
information, assemble a context window, and pass it to an LLM
provider.
context_result <- context_for_chat(ret, "statistical computing", max_tokens = 2000, k = 5)
cat("Context for LLM:\n")
#> Context for LLM:
cat(context_result$context, "\n\n")
#> R is a programming language for statistical computing and graphics.
#>
#> R provides
#> a wide variety of statistical and graphical techniques, including
#> linear and nonlinear modelling, classical statistical tests, time-series
#> analysis, classification, and clustering.
#>
#> It is widely used among statisticians and data scientists.
#>
#> Python is a high-level, general-purpose programming language.
#>
#> Its design philosophy emphasizes code readability.
cat("Included chunks:", length(context_result$included), "\n")
#> Included chunks: 5
cat("Total tokens:", context_result$total_tokens, "\n")
#> Total tokens: 73

Summary
The securecontext retrieval pipeline follows these steps:
- document() – wrap text with metadata
- chunk_text() – split into retrieval units
- embed_tfidf() – build a local embedder from a corpus
- vector_store$new() – store and search embeddings
- retriever() + add_documents() – high-level ingest
- retrieve() – semantic search
- knowledge_store$new() – persistent key-value memory
- context_builder() + cb_add() + cb_build() – token-aware assembly
- context_for_chat() – one-call retrieve-and-build
For more on context building strategies, see
vignette("context-building"). To wire retrieval into
orchestr agent graphs, see
vignette("orchestr-integration").