What This Vignette Covers
vignette("secureguard") introduces the defense layers
and built-in guardrails. This vignette covers building custom guardrails
for domain-specific threats, composing guardrails with pass/fail logic,
assembling pipelines, and wiring everything into securer for sandboxed
execution.
How a Guardrail Pipeline Works
The following diagram shows the flow of data through a
secure_pipeline(), the recommended way to wire guardrails
into an agent loop:
User prompt
|
v
+--------------------+
| check_input() | Input guardrails:
| - injection | prompt injection, topic scope, PII
| - topic scope |
| - input PII |
+--------------------+
|
Pass? ----No----> Return failure (stage: input)
|
Yes
|
v
LLM generates code
|
v
+--------------------+
| check_code() | Code guardrails:
| - AST analysis | blocked functions, complexity,
| - complexity | dependencies, data flow
| - dependencies |
| - data flow |
+--------------------+
|
Pass? ----No----> Return failure (stage: code)
|
Yes
|
v
Execute in sandbox
(securer)
|
v
+--------------------+
| check_output() | Output guardrails:
| - PII | PII blocking, secret redaction,
| - secrets | size limits
| - size |
+--------------------+
|
Pass? ----No----> Return failure (stage: output)
|
Yes
|
v
Return result to user
(possibly redacted)
Each stage short-circuits on failure: if input guardrails reject the prompt, the LLM never sees it. If code guardrails reject the generated code, it never executes. This minimizes both risk and wasted computation.
Creating Custom Guardrails
The built-in guardrails cover common threats: prompt injection, dangerous function calls, PII leakage, and secret exposure. But every application has domain-specific risks that generic guardrails will not catch. An agent that generates SQL needs SQL injection detection. A healthcare application needs HIPAA-specific PII patterns. A financial tool needs checks for account numbers and routing numbers.
Every guardrail, built-in or custom, is an S3 object of class
secureguard with four properties: name,
type, check_fn, and description.
The new_guardrail() constructor validates these and returns
a guardrail you can use with run_guardrail(),
compose_guardrails(), and secure_pipeline().
Custom guardrails work with built-in ones because they share the same
interface.
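As a concrete sketch of this interface, here is a hypothetical guardrail for the financial use case mentioned above: flagging candidate US bank routing numbers in user input. The guard itself is not part of secureguard; only `new_guardrail()` and `guardrail_result()` come from the package, while the ABA checksum logic (weights 3, 7, 1 repeating; sum divisible by 10) is plain base R:

```r
# Hypothetical custom guardrail (not built in): flag 9-digit sequences
# that pass the ABA routing number checksum.
guard_routing_number <- function() {
  check_fn <- function(x) {
    candidates <- regmatches(x, gregexpr("\\b\\d{9}\\b", x))[[1L]]
    is_aba <- vapply(candidates, function(num) {
      d <- as.integer(strsplit(num, "")[[1L]])
      # ABA checksum: 3*(d1+d4+d7) + 7*(d2+d5+d8) + (d3+d6+d9) mod 10 == 0
      sum(d * c(3, 7, 1, 3, 7, 1, 3, 7, 1)) %% 10 == 0
    }, logical(1))
    if (any(is_aba)) {
      guardrail_result(
        pass = FALSE,
        reason = "Possible bank routing number in input",
        details = list(n_matches = sum(is_aba))
      )
    } else {
      guardrail_result(pass = TRUE)
    }
  }
  new_guardrail(
    name = "routing_number",
    type = "input",
    check_fn = check_fn,
    description = "Flags candidate ABA routing numbers"
  )
}
```

The checksum step filters out most arbitrary 9-digit numbers (dates, IDs), which keeps the false-positive rate lower than a bare `\d{9}` match would.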
A SQL Injection Detector
When an LLM generates SQL queries, it may produce syntactically valid but semantically dangerous output, especially if the user’s prompt contains adversarial patterns. A guardrail can catch these patterns before any query reaches a database.
library(secureguard)
guard_sql_injection <- function() {
  sql_patterns <- c(
    "(?i)\\b(?:UNION\\s+SELECT|DROP\\s+TABLE|DELETE\\s+FROM)\\b",
    "(?i)\\b(?:INSERT\\s+INTO|UPDATE\\s+.+\\s+SET)\\b.*?;\\s*--",
    "(?i)'\\s*(?:OR|AND)\\s+['\"]?\\d['\"]?\\s*=\\s*['\"]?\\d",
    "(?i)(?:--|#|/\\*).*(?:SELECT|DROP|INSERT|UPDATE|DELETE)"
  )
  check_fn <- function(x) {
    hits <- vapply(sql_patterns, function(pat) {
      grepl(pat, x, perl = TRUE)
    }, logical(1))
    if (any(hits)) {
      guardrail_result(
        pass = FALSE,
        reason = "Potential SQL injection detected",
        details = list(matched_patterns = which(hits))
      )
    } else {
      guardrail_result(pass = TRUE)
    }
  }
  new_guardrail(
    name = "sql_injection",
    type = "input",
    check_fn = check_fn,
    description = "Detects common SQL injection patterns"
  )
}
Now use it like any built-in guardrail:
g <- guard_sql_injection()
g
#> <secureguard> sql_injection (input)
#> Detects common SQL injection patterns
# Safe query
run_guardrail(g, "SELECT name FROM users WHERE id = 42")
#> <guardrail_result> PASS
# Injection attempt
run_guardrail(g, "SELECT * FROM users WHERE id = 1; DROP TABLE users; --")
#> <guardrail_result> FAIL
#> Reason: Potential SQL injection detected
A Code Length Limiter
Custom guardrails of type "code" work exactly the same
way. Here is one that limits the number of lines in LLM-generated
code:
guard_code_length <- function(max_lines = 100L) {
  check_fn <- function(code) {
    n_lines <- length(strsplit(code, "\n", fixed = TRUE)[[1L]])
    if (n_lines > max_lines) {
      guardrail_result(
        pass = FALSE,
        reason = sprintf("Code has %d lines (max %d)", n_lines, max_lines),
        details = list(n_lines = n_lines, max_lines = max_lines)
      )
    } else {
      guardrail_result(pass = TRUE, details = list(n_lines = n_lines))
    }
  }
  new_guardrail(
    name = "code_length",
    type = "code",
    check_fn = check_fn,
    description = sprintf("Limits code to %d lines", max_lines)
  )
}
g_len <- guard_code_length(max_lines = 5)
run_guardrail(g_len, "x <- 1\ny <- 2\nz <- x + y")
#> <guardrail_result> PASS
long_code <- paste(sprintf("x%d <- %d", 1:10, 1:10), collapse = "\n")
run_guardrail(g_len, long_code)
#> <guardrail_result> FAIL
#> Reason: Code has 10 lines (max 5)
Anatomy of a check_fn
Every check_fn must:
- Accept a single argument (the text or object to check).
- Return a guardrail_result() with at minimum pass = TRUE or pass = FALSE.
- Optionally include reason (why it failed), warnings (advisory notes), and details (a named list of metadata).
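A minimal check_fn satisfying this contract might look like the following sketch (the length threshold and warning text are illustrative, not from the package):

```r
# Sketch: a check_fn that always passes but attaches an advisory
# warning and metadata when the input is unusually long.
check_fn <- function(x) {
  n <- nchar(x)
  if (n > 2000) {
    guardrail_result(
      pass = TRUE,
      warnings = "Input is unusually long; downstream checks may be slow",
      details = list(n_chars = n)
    )
  } else {
    guardrail_result(pass = TRUE, details = list(n_chars = n))
  }
}
```

Note that warnings do not fail the check; use pass = FALSE with a reason when the input should actually be rejected.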
The @ operator accesses properties on the result:
result <- run_guardrail(guard_code_analysis(), "system('ls')")
result@pass
#> [1] FALSE
result@reason
#> [1] "Blocked function(s) detected: system"
result@details
#> $blocked_calls
#> [1] "system"
Composing Guardrails
In practice, you almost always want to run multiple guardrails together: checking for dangerous functions and excessive complexity, or detecting both prompt injection and off-topic prompts. secureguard provides two ways to combine them.
compose_guardrails(): Same-Type Composition
compose_guardrails() merges multiple guardrails of the
same type into a single composite guardrail. The result
is itself a guardrail, so you can pass it to
run_guardrail(), nest it inside another composition, or use
it in a pipeline. Bundle all your code checks into a single “strict
code” guardrail to treat them as one unit.
# Compose three code guardrails -- ALL must pass (default)
strict_code <- compose_guardrails(
  guard_code_analysis(),
  guard_code_complexity(max_ast_depth = 10, max_calls = 50),
  guard_code_dependencies(allowed_packages = c("dplyr", "ggplot2"))
)
strict_code
#> <secureguard> composed(code_analysis + code_complexity + code_dependencies)
#> (code)
#> Composite guardrail (mode=all): code_analysis + code_complexity +
#> code_dependencies
# Clean code passes all three
run_guardrail(strict_code, "dplyr::filter(mtcars, cyl == 4)")
#> <guardrail_result> PASS
# system() fails code analysis
run_guardrail(strict_code, "system('whoami')")
#> <guardrail_result> FAIL
#> Reason: Blocked function(s) detected: system
# processx fails dependency check
run_guardrail(strict_code, "processx::run('ls')")
#> <guardrail_result> FAIL
#> Reason: Blocked function(s) detected: processx::run; Disallowed package(s):
#> processx
mode = "any": At Least One Must Pass
The default mode = "all" is the right choice for
security checks: all guards must pass. But sometimes you need the
opposite logic: an allowlist where the input is acceptable if it matches
any of several categories. With
mode = "any", the composite passes if at least one child
guardrail passes:
# Accept prompts about either statistics OR machine learning
topic_guard <- compose_guardrails(
  guard_topic_scope(allowed_topics = c("statistics", "regression", "t-test")),
  guard_topic_scope(allowed_topics = c("machine learning", "neural network")),
  mode = "any"
)
run_guardrail(topic_guard, "How do I run a t-test in R?")
#> <guardrail_result> PASS
run_guardrail(topic_guard, "Explain neural network backpropagation")
#> <guardrail_result> PASS
run_guardrail(topic_guard, "What is the weather today?")
#> <guardrail_result> FAIL
#> Reason: Input does not match any allowed topic.; Input does not match any
#> allowed topic.
check_all(): Run a List and Collect Results
Sometimes you need individual results from each guardrail rather than
a single composite result. check_all() runs a list of
guardrails and returns a summary:
guards <- list(
  guard_code_analysis(),
  guard_code_complexity(max_ast_depth = 10),
  guard_code_dataflow()
)
result <- check_all(guards, "x <- mean(1:10)")
result$pass
#> [1] TRUE
length(result$results) # one per guardrail
#> [1] 3
# Inspect individual results
vapply(result$results, function(r) r@pass, logical(1))
#> [1] TRUE TRUE TRUE
When a check fails, check_all() collects all failure
reasons:
result <- check_all(guards, "Sys.getenv('SECRET_KEY')")
result$pass
#> [1] FALSE
result$reasons
#> [1] "Data flow violation(s): Sys.getenv"
When to Use compose_guardrails() vs check_all()
Both functions combine multiple guardrails, but they serve different purposes and return different types:
Use compose_guardrails() when you want
a single guardrail object that you can pass to
run_guardrail(), nest inside another
compose_guardrails(), or use in a
secure_pipeline(). The composed guardrail behaves as one
unit: you get a single pass/fail result. This is the right choice when
you are building reusable guardrail configurations (e.g., a “strict
code” composite) that you want to treat as a single check.
Use check_all() when you need
diagnostic detail. It returns individual results for each guardrail in
the list, so you can report exactly which checks failed and why. This is
useful in logging, debugging, and user-facing error messages where “code
guardrail failed” is less helpful than “blocked function
system() detected by code_analysis; exceeded max AST depth
of 10 per code_complexity.”
In practice, many applications use both:
compose_guardrails() to build reusable guardrail groups,
and check_all() at the top level to get per-group
diagnostics.
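A sketch of that division of labor, reusing the code guards from earlier sections (outputs omitted since they depend on your guardrail configuration):

```r
code <- "system('ls')"

# Reusable group built once; a single pass/fail verdict when run as a unit
strict_code <- compose_guardrails(
  guard_code_analysis(),
  guard_code_complexity(max_ast_depth = 10)
)
run_guardrail(strict_code, code)@pass

# Per-guard detail when you need to report exactly which check failed
diag <- check_all(
  list(guard_code_analysis(), guard_code_complexity(max_ast_depth = 10)),
  code
)
diag$reasons
vapply(diag$results, function(r) r@pass, logical(1))
```

The composed object is what you would store in configuration and reuse; the check_all() call is what you would log.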
Building Pipelines with secure_pipeline()
Individual guardrails and compositions are useful for targeted checks, but a production agent needs all three defense layers working together. A pipeline bundles guardrails for input, code, and output into one object with methods for each stage. You define your security policy once and apply it to every agent turn.
Defining a Pipeline
pipeline <- secure_pipeline(
  input_guardrails = list(
    guard_prompt_injection(sensitivity = "high"),
    guard_input_pii(),
    guard_topic_scope(allowed_topics = c("statistics", "data analysis", "R"))
  ),
  code_guardrails = list(
    guard_code_analysis(),
    guard_code_complexity(max_ast_depth = 15, max_calls = 100),
    guard_code_dependencies(allowed_packages = c("dplyr", "ggplot2", "tidyr")),
    guard_code_dataflow(block_network = TRUE, block_file_write = TRUE)
  ),
  output_guardrails = list(
    guard_output_pii(),
    guard_output_secrets(action = "redact"),
    guard_output_size(max_chars = 10000, max_lines = 200)
  )
)
Running Each Stage
# Stage 1: validate user input
input_result <- pipeline$check_input("Calculate the mean and sd of mtcars$mpg")
input_result$pass
#> [1] TRUE
# Stage 2: validate LLM-generated code
code_result <- pipeline$check_code("
  library(dplyr)
  mtcars %>%
    summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))
")
code_result$pass
#> [1] TRUE
# Stage 3: filter execution output
output_result <- pipeline$check_output("mean_mpg = 20.09, sd_mpg = 6.03")
output_result$pass
#> [1] TRUE
output_result$result # possibly redacted text
#> [1] "mean_mpg = 20.09, sd_mpg = 6.03"
Pipeline in an Agent Loop
Call the three check_* methods in sequence inside your
agent loop. Each stage short-circuits on failure: if
check_input() rejects the prompt, you skip the LLM call
entirely. If check_code() rejects the generated code, you
skip execution. Here is the complete pattern:
process_turn <- function(pipeline, user_prompt, llm_fn, execute_fn) {
  # 1. Input guardrails
  input_check <- pipeline$check_input(user_prompt)
  if (!input_check$pass) {
    return(list(
      success = FALSE,
      stage = "input",
      reasons = input_check$reasons
    ))
  }

  # 2. LLM generates code
  code <- llm_fn(user_prompt)

  # 3. Code guardrails
  code_check <- pipeline$check_code(code)
  if (!code_check$pass) {
    return(list(
      success = FALSE,
      stage = "code",
      reasons = code_check$reasons
    ))
  }

  # 4. Execute in sandbox
  result <- execute_fn(code)

  # 5. Output guardrails
  output_check <- pipeline$check_output(result)
  if (!output_check$pass) {
    return(list(
      success = FALSE,
      stage = "output",
      reasons = output_check$reasons
    ))
  }

  list(success = TRUE, result = output_check$result)
}
Mixing Custom and Built-In Guardrails
Custom and built-in guardrails share the same interface. You can mix
them in compose_guardrails(), check_all(), and
secure_pipeline(). There is no registration step or plugin
system; any secureguard object works everywhere:
# The SQL injection guard from earlier alongside built-in input guards
input_guards <- compose_guardrails(
  guard_prompt_injection(),
  guard_input_pii(),
  guard_sql_injection()
)
run_guardrail(input_guards, "Please help me write a SELECT query")
#> <guardrail_result> PASS
run_guardrail(input_guards, "' OR 1=1 --")
#> <guardrail_result> FAIL
#> Reason: Potential SQL injection detected
Similarly for code guardrails:
# Custom length guard composed with built-in code guards
code_guards <- compose_guardrails(
  guard_code_analysis(),
  guard_code_complexity(max_ast_depth = 10),
  guard_code_length(max_lines = 50)
)
run_guardrail(code_guards, "x <- mean(1:10)")
#> <guardrail_result> PASS
Integration with securer
secureguard analyzes code and outputs to decide whether they are safe. securer provides OS-level sandboxing that limits what the code can do, regardless of what it tries. secureguard catches known-dangerous patterns before execution; securer contains unknown threats at the operating system level.
securer is a suggested dependency; all of the patterns above work without it. The integration adds two things: pre-execution hooks and output guarding after execution.
Pre-Execute Hooks
as_pre_execute_hook() converts code guardrails into a
function that securer calls before executing each code snippet. The hook
returns TRUE to allow execution or FALSE to block it.
library(securer)
library(secureguard)
hook <- as_pre_execute_hook(
  guard_code_analysis(),
  guard_code_complexity(max_ast_depth = 15),
  guard_code_dataflow()
)
sess <- SecureSession$new(pre_execute_hook = hook)
sess$execute("mean(1:10)") # allowed
sess$execute("system('whoami')") # blocked by code_analysis
sess$execute("Sys.getenv('KEY')") # blocked by dataflow
sess$close()
Post-Execute Output Guarding
guard_output() runs output guardrails on execution
results. Guardrails with action = "redact" transform the
output rather than blocking it:
result <- sess$execute("paste('My API key is', 'AKIAIOSFODNN7EXAMPLE')")
checked <- guard_output(
  result,
  guard_output_pii(),
  guard_output_secrets(action = "redact")
)

if (checked$pass) {
  # Return the (possibly redacted) result to the user
  checked$result
} else {
  paste("Blocked:", paste(checked$reasons, collapse = "; "))
}
Pipeline Hook
A pipeline can produce a pre-execute hook from its code guardrails:
pipeline <- secure_pipeline(
  input_guardrails = list(guard_prompt_injection()),
  code_guardrails = list(
    guard_code_analysis(),
    guard_code_dataflow()
  ),
  output_guardrails = list(
    guard_output_secrets(action = "redact")
  )
)

sess <- SecureSession$new(
  pre_execute_hook = pipeline$as_pre_execute_hook()
)
# The session now has code guardrails enforced automatically.
# Input and output guardrails are checked manually:
input_check <- pipeline$check_input(user_prompt)
# ... LLM generates code, session executes it ...
output_check <- pipeline$check_output(execution_result)
sess$close()
Advanced Composition Patterns
The patterns above apply the same guardrails to every request. In practice, you often need to vary strictness based on context: who the user is, where the request came from, and what level of trust is appropriate.
Layered Sensitivity
A public-facing chatbot is exposed to adversarial users and needs high-sensitivity injection detection and tight topic scoping. An internal analytics tool used by trusted data scientists can use lower sensitivity to avoid false positives on legitimate analytical prompts:
# Public-facing: high sensitivity, strict topic scoping
public_guards <- compose_guardrails(
  guard_prompt_injection(sensitivity = "high"),
  guard_input_pii(),
  guard_topic_scope(allowed_topics = c("data analysis", "statistics"))
)

# Internal tool: lower sensitivity, broader topics
internal_guards <- compose_guardrails(
  guard_prompt_injection(sensitivity = "low"),
  guard_input_pii()
)

run_guardrail(
  public_guards,
  "Continue from where we left off with the regression"
)
#> <guardrail_result> FAIL
#> Reason: Prompt injection detected: continuation_attack; Input does not match
#> any allowed topic.
run_guardrail(
  internal_guards,
  "Continue from where we left off with the regression"
)
#> <guardrail_result> PASS
Graduated Code Restrictions
You can do the same with code guardrails. A trusted internal user running vetted analysis scripts needs fewer restrictions than an untrusted external user whose prompts generate arbitrary code:
# Trusted context: only block the most dangerous operations
trusted_code <- compose_guardrails(
  guard_code_analysis(blocked_functions = c("system", "system2", "shell")),
  guard_code_dataflow(
    block_env_access = TRUE,
    block_network = FALSE,
    block_file_write = FALSE
  )
)

# Untrusted context: strict lockdown
untrusted_code <- compose_guardrails(
  guard_code_analysis(),
  guard_code_complexity(max_ast_depth = 10, max_calls = 30),
  guard_code_dependencies(allowed_packages = c("dplyr", "ggplot2")),
  guard_code_dataflow(
    block_env_access = TRUE,
    block_network = TRUE,
    block_file_write = TRUE,
    block_file_read = TRUE
  )
)
# The same code may pass in trusted but fail in untrusted
code <- "readLines('data.csv')"
run_guardrail(trusted_code, code)
#> <guardrail_result> PASS
run_guardrail(untrusted_code, code)
#> <guardrail_result> FAIL
#> Reason: Data flow violation(s): readLines
Redact vs Block Decision
PII like social security numbers or patient records should block the
entire response; partial disclosure is still a privacy violation. API
keys and tokens can often be redacted in place, keeping the useful parts
of the response while replacing the sensitive value. Output guardrails
support three actions ("block", "redact",
"warn") for this:
# PII blocks the output entirely
# Secrets get redacted so the response is still useful
pipeline <- secure_pipeline(
  output_guardrails = list(
    guard_output_pii(),                       # blocks on PII
    guard_output_secrets(action = "redact"),  # redacts secrets
    guard_output_size(max_chars = 5000)       # blocks oversized output
  )
)
# Secrets are redacted, not blocked
result <- pipeline$check_output("API key: AKIAIOSFODNN7EXAMPLE, data looks good")
result$pass
#> [1] TRUE
result$result
#> [1] "API key: [REDACTED_AWS_KEY], data looks good"
# PII causes a block
result <- pipeline$check_output("Patient SSN: 123-45-6789")
result$pass
#> [1] FALSE
result$reasons
#> [1] "PII detected in output: ssn"
Summary
| Pattern | Function | Use Case |
|---|---|---|
| Custom guardrail | new_guardrail() | Domain-specific checks |
| Same-type composition | compose_guardrails() | Merge guards into one reusable unit |
| Batch check | check_all() | Individual results per guard (diagnostics) |
| Full pipeline | secure_pipeline() | Three-layer defense for production |
| Pre-execute hook | as_pre_execute_hook() | securer integration |
| Output guard | guard_output() | Post-execution filtering |
Build small guards that each target one threat. Combine them with
compose_guardrails() or check_all(), and wire
them into pipelines that check every stage of an agent workflow. When
the built-in guardrails do not cover your domain, write your own with
new_guardrail(). Custom guards and built-in guards have the
same interface and compose the same way.