> [!CAUTION]
> Alpha software. This package is part of a broader effort by Ian Flores Siaca to develop proper AI infrastructure for the R ecosystem. It is under active development and should not be used in production until an official release is published. APIs may change without notice.
Benchmarking framework for guardrail accuracy in R LLM agent workflows. Evaluate guardrails against labeled datasets, compute precision/recall/F1 metrics, generate confusion matrices, compare results across iterations, and export as vitals-compatible scorers.
## Why securebench?
When you build guardrails, you need to know if they actually work. securebench gives you precision, recall, and F1 metrics for any guardrail – so you can measure how well your prompt injection detector catches attacks without blocking legitimate queries, and compare different guardrail configurations side by side.
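Precision, recall, and F1 here are the standard binary-classification definitions. As a quick refresher (plain base R with made-up counts, not securebench internals), treating "flagged" as the positive class:

```r
# Standard metric definitions, illustrated with hypothetical counts:
tp <- 8  # attacks correctly flagged
fp <- 1  # benign inputs wrongly flagged
fn <- 2  # attacks that slipped through

precision <- tp / (tp + fp)  # of everything flagged, how much was an attack?
recall    <- tp / (tp + fn)  # of all attacks, how many were caught?
f1        <- 2 * precision * recall / (precision + recall)  # harmonic mean

round(c(precision = precision, recall = recall, f1 = f1), 3)
#> precision    recall        f1
#>     0.889     0.800     0.842
```

High precision with low recall means the guardrail rarely blocks legitimate queries but misses attacks; the reverse means it catches attacks at the cost of false alarms.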
## Features
| Function | Description |
|---|---|
| `guardrail_eval()` | Evaluate a guardrail against a labeled data frame |
| `guardrail_metrics()` | Compute precision, recall, F1, and accuracy |
| `guardrail_confusion()` | Generate a 2x2 confusion matrix |
| `guardrail_compare()` | Compare two guardrails with delta metrics and per-case diffs |
| `guardrail_report()` | Print a formatted report or return results as a data frame |
| `benchmark_guardrail()` | Quick-start: benchmark from positive/negative case vectors |
| `benchmark_pipeline()` | Evaluate a full secureguard pipeline end-to-end |
| `as_vitals_scorer()` | Convert any guardrail to a vitals-compatible scorer function |
## Part of the secure-r-dev Ecosystem
securebench is part of a 7-package ecosystem for building governed AI agents in R:
```
                ┌─────────────┐
                │   securer   │
                └──────┬──────┘
       ┌───────────────┼────────────────┐
       │               │                │
┌──────▼──────┐ ┌──────▼──────┐ ┌───────▼───────┐
│ securetools │ │ secureguard │ │ securecontext │
└──────┬──────┘ └──────┬──────┘ └───────┬───────┘
       └───────────────┼────────────────┘
                ┌──────▼──────┐
                │  orchestr   │
                └──────┬──────┘
       ┌───────────────┴────────────────┐
       │                                │
┌──────▼──────┐                 ┌───────▼─────────┐
│ securetrace │                 │ >>securebench<< │
└─────────────┘                 └─────────────────┘
```
securebench sits at the bottom of the stack alongside securetrace. It benchmarks guardrail accuracy by evaluating secureguard guardrails (or any boolean classifier) against labeled datasets, producing precision/recall/F1 metrics and confusion matrices.
| Package | Role |
|---|---|
| securer | Sandboxed R execution with tool-call IPC |
| securetools | Pre-built security-hardened tool definitions |
| secureguard | Input/code/output guardrails (injection, PII, secrets) |
| orchestr | Graph-based agent orchestration |
| securecontext | Document chunking, embeddings, RAG retrieval |
| securetrace | Structured tracing, token/cost accounting, JSONL export |
| securebench | Guardrail benchmarking with precision/recall/F1 metrics |
## Installation

```r
# install.packages("pak")
pak::pak("ian-flores/securebench")
```
## Quick Start

```r
library(securebench)

# Benchmark a guardrail with known positive/negative cases.
# The guardrail returns TRUE when the input passes (is allowed through).
my_guardrail <- function(text) !grepl("DROP TABLE", text, fixed = TRUE)

metrics <- benchmark_guardrail(
  my_guardrail,
  positive_cases = c("DROP TABLE users", "SELECT 1; DROP TABLE x"),
  negative_cases = c("SELECT * FROM users", "Hello world")
)

metrics$precision
metrics$recall
metrics$f1
```
## Data Frame API

```r
data <- data.frame(
  input = c("normal text", "DROP TABLE users"),
  expected = c(TRUE, FALSE),
  label = c("benign", "injection")
)

result <- guardrail_eval(my_guardrail, data)
m <- guardrail_metrics(result)
cm <- guardrail_confusion(result)
guardrail_report(result)
```
## Vitals Interop

```r
scorer <- as_vitals_scorer(my_guardrail)
scorer("safe query", TRUE)    # 1 (correct)
scorer("DROP TABLE x", FALSE) # 1 (correct)
```
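Conceptually, a scorer of this shape is just a closure that runs the guardrail and returns 1 when its verdict matches the expected label. The `make_scorer()` below is a hypothetical base-R stand-in to show the contract, not the actual `as_vitals_scorer()` implementation:

```r
# Hypothetical sketch of the scorer contract: wrap a guardrail in a
# closure that scores 1 when the guardrail's verdict matches expectation.
# (Illustration only -- the real as_vitals_scorer() may differ.)
make_scorer <- function(guardrail) {
  function(input, expected) as.integer(guardrail(input) == expected)
}

my_guardrail <- function(text) !grepl("DROP TABLE", text, fixed = TRUE)
scorer <- make_scorer(my_guardrail)
scorer("safe query", TRUE)    # 1 (guardrail passes it, as expected)
scorer("DROP TABLE x", FALSE) # 1 (guardrail blocks it, as expected)
```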
## Comparing Guardrails

```r
# Define test data
data <- data.frame(
  input = c("hello", "how are you?", "DROP TABLE users", "'; DELETE FROM accounts"),
  expected = c(TRUE, TRUE, FALSE, FALSE),
  label = c("benign", "benign", "injection", "injection")
)

# Two guardrail versions to compare
guard_v1 <- function(text) !grepl("DROP", text, fixed = TRUE)
guard_v2 <- function(text) !grepl("DROP|DELETE", text)

# Evaluate both against the same dataset
result_v1 <- guardrail_eval(guard_v1, data)
result_v2 <- guardrail_eval(guard_v2, data)

# Compare: see which improved, which regressed
diff <- guardrail_compare(result_v1, result_v2)
diff$delta_f1  # positive = v2 is better
diff$improved  # cases v2 got right that v1 missed
diff$regressed # cases v2 got wrong that v1 had right
```
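The per-case diff amounts to comparing correctness vectors between the two runs. A base-R sketch of the idea (illustrative only, not the `guardrail_compare()` internals), using the dataset above where v1 misses the `DELETE` injection and v2 catches it:

```r
# Per-case diff logic, sketched in base R: one logical vector per run
# recording whether each case was classified correctly.
inputs     <- c("hello", "how are you?", "DROP TABLE users", "'; DELETE FROM accounts")
correct_v1 <- c(TRUE, TRUE, TRUE, FALSE)  # v1 misses the DELETE injection
correct_v2 <- c(TRUE, TRUE, TRUE, TRUE)   # v2 catches it

improved  <- inputs[correct_v2 & !correct_v1]  # fixed by v2
regressed <- inputs[correct_v1 & !correct_v2]  # broken by v2

improved   # "'; DELETE FROM accounts"
regressed  # character(0)
```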
## Documentation

securebench ships with two vignettes:
- Getting Started with securebench – walkthrough of the core evaluation workflow
- Guardrail Testing Patterns – strategies for building labeled datasets and iterating on guardrail accuracy
Browse the full documentation at https://ian-flores.github.io/securebench/.
## Contributing
Contributions are welcome! Please file issues on GitHub and submit pull requests.