
[!CAUTION] Alpha software. This package is part of a broader effort by Ian Flores Siaca to develop proper AI infrastructure for the R ecosystem. It is under active development and should not be used in production until an official release is published. APIs may change without notice.

Memory, knowledge persistence, RAG retrieval, and context management for R LLM agents.

Why securecontext?

Most RAG solutions for LLM agents require sending your documents to external embedding APIs. securecontext takes a different approach: it builds local TF-IDF embeddings entirely in R, with no external API calls and no data leaving your machine. The package provides token-aware chunking that respects LLM context windows, splitting documents by sentence, paragraph, or recursively so chunks fit within token budgets. A built-in knowledge store with JSONL persistence lets agents retrieve relevant context across sessions without relying on third-party services. Everything runs locally, making it suitable for sensitive data, air-gapped environments, and workflows where data privacy matters.
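To make the local-embedding idea concrete, here is a minimal base-R sketch of TF-IDF plus cosine similarity. It is purely illustrative and independent of securecontext's actual implementation (names like `cosine` are ad hoc here), but it shows that nothing beyond base R is required to embed and compare documents:

```r
# Minimal TF-IDF + cosine similarity in base R -- an illustration of the
# local-embedding idea, not securecontext's actual implementation.
docs <- c("r is great for statistics",
          "python is great for machine learning",
          "statistics needs careful thought")

# Tokenize and build the shared vocabulary
tokens <- strsplit(docs, " ")
vocab  <- sort(unique(unlist(tokens)))

# Term-frequency matrix: one row per document, one column per term
tf <- t(vapply(tokens, function(tk) {
  counts <- table(factor(tk, levels = vocab))
  as.numeric(counts) / length(tk)
}, numeric(length(vocab))))

# Inverse document frequency: rare terms get higher weight
df    <- colSums(tf > 0)
idf   <- log(length(docs) / df)
tfidf <- sweep(tf, 2, idf, `*`)

# Cosine similarity between document 1 and the other two
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
sims <- sapply(2:3, function(i) cosine(tfidf[1, ], tfidf[i, ]))
```

Because everything is plain matrix arithmetic, the similarity scores are computed entirely in-process: no tokens, text, or vectors ever leave the R session.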

Part of the secure-r-dev Ecosystem

securecontext is part of a 7-package ecosystem for building governed AI agents in R:

                    ┌──────────────┐
                    │   securer    │
                    └──────┬───────┘
           ┌───────────────┼────────────────────┐
           │               │                    │
    ┌──────▼──────┐  ┌─────▼──────┐  ┌──────────▼──────────┐
    │ securetools │  │secureguard │  │>>> securecontext <<<│
    └──────┬──────┘  └─────┬──────┘  └──────────┬──────────┘
           └───────────────┼────────────────────┘
                    ┌──────▼───────┐
                    │   orchestr   │
                    └──────┬───────┘
           ┌───────────────┼──────────────────┐
           │                                  │
    ┌──────▼──────┐                    ┌──────▼──────┐
    │ securetrace │                    │ securebench │
    └─────────────┘                    └─────────────┘

securecontext provides the memory and retrieval layer for agents. It sits alongside securetools and secureguard in the middle tier, giving agents the ability to chunk documents, build TF-IDF embeddings locally, and retrieve relevant context for LLM prompts.

Package        Role
─────────────  ───────────────────────────────────────────────────────
securer        Sandboxed R execution with tool-call IPC
securetools    Pre-built, security-hardened tool definitions
secureguard    Input/code/output guardrails (injection, PII, secrets)
orchestr       Graph-based agent orchestration
securecontext  Document chunking, embeddings, RAG retrieval
securetrace    Structured tracing, token/cost accounting, JSONL export
securebench    Guardrail benchmarking with precision/recall/F1 metrics

Installation

# install.packages("pak")
pak::pak("ian-flores/securecontext")

Features

  • Document chunking – fixed-size, sentence, paragraph, and recursive strategies
  • TF-IDF embeddings – local embeddings with no external API required
  • Vector store – in-memory cosine similarity search with RDS persistence
  • Knowledge store – persistent JSONL key-value storage
  • Semantic retrieval – query documents by meaning
  • Context builder – token-aware priority-based context assembly
  • Integration helpers – works with orchestr and ellmer

Document Chunking

Split text into manageable pieces using one of four strategies. chunk_text() dispatches to the appropriate strategy function:

library(securecontext)

text <- "First paragraph with several sentences.\n\nSecond paragraph here.\n\nThird."

# Sentence-level splitting
chunk_text(text, strategy = "sentence")
#> [1] "First paragraph with several sentences." "Second paragraph here."
#> [3] "Third."

# Paragraph-level splitting
chunk_text(text, strategy = "paragraph")
#> [1] "First paragraph with several sentences." "Second paragraph here."
#> [3] "Third."

# Fixed-size chunks with overlap
chunk_fixed(paste(rep("word", 200), collapse = " "), size = 100, overlap = 10)

# Recursive splitting (tries paragraph -> newline -> sentence -> space)
chunk_recursive(text, max_size = 80)
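The paragraph -> newline -> sentence -> space fallback can be sketched in a few lines of base R. This is an illustration of the recursive strategy, not securecontext's actual chunk_recursive() (which may differ in separator handling and edge cases); `split_recursive` is a name invented for this sketch:

```r
# Illustrative base-R sketch of recursive splitting: try coarse separators
# first, and only fall back to finer ones when a piece is still too large.
split_recursive <- function(text, max_size,
                            seps = c("\n\n", "\n", "(?<=[.!?]) ", " ")) {
  # Small enough (or no separators left): return the piece as-is
  if (nchar(text) <= max_size || length(seps) == 0) return(text)
  # Split on the coarsest remaining separator, then recurse on each piece
  pieces <- strsplit(text, seps[1], perl = TRUE)[[1]]
  unlist(lapply(pieces, split_recursive, max_size = max_size,
                seps = seps[-1]))
}

out <- split_recursive(
  "First paragraph with several sentences.\n\nSecond paragraph here.", 40)
```

The key design point is that coarse boundaries (paragraphs) are preferred, so chunks stay semantically coherent; character-level limits are only enforced when a paragraph or sentence alone exceeds the budget.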

Knowledge Store

A persistent JSONL key-value store for agent memory. Entries are keyed strings with optional metadata and timestamps:

# In-memory store
ks <- knowledge_store$new()

# Persistent store backed by a JSONL file
ks <- knowledge_store$new(path = "agent-memory.jsonl")

# Store and retrieve values
ks$set("user_preference", "dark mode", metadata = list(source = "onboarding"))
ks$get("user_preference")
#> [1] "dark mode"

# Search keys by regex
ks$search("user_")
#> [1] "user_preference"

# List all keys and check size
ks$list()
ks$size()
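For intuition about what JSONL persistence looks like on disk, here is a sketch of an append-only key-value log using the jsonlite package. The field names and layout are illustrative assumptions, not securecontext's documented file format, and `append_entry`/`load_entries` are hypothetical helpers:

```r
# Sketch of append-only JSONL persistence (illustrative; field names are
# assumptions, not securecontext's documented on-disk format).
library(jsonlite)

append_entry <- function(path, key, value, metadata = NULL) {
  entry <- list(key = key, value = value, metadata = metadata,
                timestamp = format(Sys.time(), "%Y-%m-%dT%H:%M:%S"))
  # One JSON object per line: appends never rewrite earlier entries
  cat(toJSON(entry, auto_unbox = TRUE, null = "null"),
      "\n", file = path, append = TRUE, sep = "")
}

load_entries <- function(path) {
  # Each line parses independently, so a partial file is still readable
  lapply(readLines(path), fromJSON)
}

path <- tempfile(fileext = ".jsonl")
append_entry(path, "user_preference", "dark mode",
             metadata = list(source = "onboarding"))
entries <- load_entries(path)
```

The one-object-per-line shape is what makes JSONL attractive for agent memory: writes are cheap appends, and recovery after a crash only loses the last partial line.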

Context Builder

Assemble token-aware context for LLM prompts. Higher-priority items are included first; lower-priority items are dropped when the token budget is exceeded:

cb <- context_builder(max_tokens = 200)
cb <- cb_add(cb, "System instructions go here.", priority = 10, label = "system")
cb <- cb_add(cb, "Relevant retrieved passage.", priority = 5, label = "rag")
cb <- cb_add(cb, "Nice-to-have background info.", priority = 1, label = "background")

result <- cb_build(cb)
result$context       # assembled text, highest priority first
result$included      # labels of items that fit
result$excluded      # labels of items that were dropped
result$total_tokens  # token count of the assembled context
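The priority-based budgeting itself is a simple greedy pass. Below is a base-R sketch of the algorithm (with a crude whitespace-based token proxy); `build_context` is a name invented for this illustration and is not securecontext's implementation:

```r
# Base-R sketch of priority-based context assembly: sort by priority,
# then greedily keep items while they fit in the token budget.
build_context <- function(items, max_tokens) {
  items <- items[order(vapply(items, `[[`, numeric(1), "priority"),
                       decreasing = TRUE)]
  # Crude token estimate: whitespace-separated words
  n_tokens <- function(x) length(strsplit(x, "\\s+")[[1]])
  used <- 0; kept <- character(); dropped <- character()
  for (it in items) {
    cost <- n_tokens(it$text)
    if (used + cost <= max_tokens) {
      kept <- c(kept, it$text)
      used <- used + cost
    } else {
      dropped <- c(dropped, it$label)
    }
  }
  list(context = paste(kept, collapse = "\n\n"),
       excluded = dropped, total_tokens = used)
}

items <- list(
  list(text = "System instructions go here.", priority = 10, label = "system"),
  list(text = paste(rep("word", 300), collapse = " "),
       priority = 1, label = "background")
)
res <- build_context(items, max_tokens = 50)
```

Greedy inclusion by priority means a low-priority item is never kept at the expense of a higher-priority one, which is the property you want when the system prompt must always survive truncation.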

Quick start

library(securecontext)

# Create documents
docs <- list(
  document("R is great for statistics.", metadata = list(topic = "R")),
  document("Python excels at machine learning.", metadata = list(topic = "Python"))
)

# Build embeddings and index documents
emb <- embed_tfidf(vapply(docs, `[[`, character(1), "text"))
vs <- vector_store$new(dims = emb$dims)
ret <- retriever(vs, emb)
add_documents(ret, docs)

# Retrieve relevant context
result <- context_for_chat(ret, "statistical computing", max_tokens = 2000)
cat(result$context)

Documentation

securecontext ships with three vignettes covering common workflows, and full reference documentation is available at the pkgdown site.

Contributing

Contributions are welcome! Please file issues on GitHub and submit pull requests.

License

MIT