What This Vignette Covers
vignette("secureguard") introduces the defense layers
and built-in guardrails. This vignette covers building custom guardrails
for domain-specific threats, composing guardrails with pass/fail logic,
assembling pipelines, and wiring everything into securer for sandboxed
execution.
How a Guardrail Pipeline Works
The following diagram shows the flow of data through a
secure_pipeline(), the recommended way to wire guardrails
into an agent loop:
User prompt
|
v
+--------------------+
| check_input() | Input guardrails:
| - injection | prompt injection, topic scope, PII
| - topic scope |
| - input PII |
+--------------------+
|
Pass? ----No----> Return failure (stage: input)
|
Yes
|
v
LLM generates code
|
v
+--------------------+
| check_code() | Code guardrails:
| - AST analysis | blocked functions, complexity,
| - complexity | dependencies, data flow
| - dependencies |
| - data flow |
+--------------------+
|
Pass? ----No----> Return failure (stage: code)
|
Yes
|
v
Execute in sandbox
(securer)
|
v
+--------------------+
| check_output() | Output guardrails:
| - PII | PII blocking, secret redaction,
| - secrets | size limits
| - size |
+--------------------+
|
Pass? ----No----> Return failure (stage: output)
|
Yes
|
v
Return result to user
(possibly redacted)
Each stage short-circuits on failure: if input guardrails reject the prompt, the LLM never sees it. If code guardrails reject the generated code, it never executes. This minimizes both risk and wasted computation.
Creating Custom Guardrails
The built-in guardrails cover common threats: prompt injection, dangerous function calls, PII leakage, and secret exposure. But every application has domain-specific risks that generic guardrails will not catch. An agent that generates SQL needs SQL injection detection. A healthcare application needs HIPAA-specific PII patterns. A financial tool needs checks for account numbers and routing numbers.
Every guardrail, built-in or custom, is an S3 object of class
secureguard with four properties: name,
type, check_fn, and description.
The new_guardrail() constructor validates these and returns
a guardrail you can use with run_guardrail(),
compose_guardrails(), and secure_pipeline().
Custom guardrails work with built-in ones because they share the same
interface.
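As a concrete sketch of this interface, here is a hypothetical guardrail for the financial use case mentioned above: flagging candidate US bank routing numbers in user input. The guard itself is not part of secureguard; only `new_guardrail()` and `guardrail_result()` come from the package, while the ABA checksum logic (weights 3, 7, 1 repeating; sum divisible by 10) is plain base R:

```r
# Hypothetical custom guardrail (not built in): flag 9-digit sequences
# that pass the ABA routing number checksum.
guard_routing_number <- function() {
  check_fn <- function(x) {
    candidates <- regmatches(x, gregexpr("\\b\\d{9}\\b", x))[[1L]]
    is_aba <- vapply(candidates, function(num) {
      d <- as.integer(strsplit(num, "")[[1L]])
      # ABA checksum: 3*(d1+d4+d7) + 7*(d2+d5+d8) + (d3+d6+d9) mod 10 == 0
      sum(d * c(3, 7, 1, 3, 7, 1, 3, 7, 1)) %% 10 == 0
    }, logical(1))
    if (any(is_aba)) {
      guardrail_result(
        pass = FALSE,
        reason = "Possible bank routing number in input",
        details = list(n_matches = sum(is_aba))
      )
    } else {
      guardrail_result(pass = TRUE)
    }
  }
  new_guardrail(
    name = "routing_number",
    type = "input",
    check_fn = check_fn,
    description = "Flags candidate ABA routing numbers"
  )
}
```

The checksum step filters out most arbitrary 9-digit numbers (dates, IDs), which keeps the false-positive rate lower than a bare `\d{9}` match would.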
A SQL Injection Detector
When an LLM generates SQL queries, it may produce syntactically valid but semantically dangerous output, especially if the user’s prompt contains adversarial patterns. A guardrail can catch these patterns before any query reaches a database.
library(secureguard)
guard_sql_injection <- function() {
  sql_patterns <- c(
    "(?i)\\b(?:UNION\\s+SELECT|DROP\\s+TABLE|DELETE\\s+FROM)\\b",
    "(?i)\\b(?:INSERT\\s+INTO|UPDATE\\s+.+\\s+SET)\\b.*?;\\s*--",
    "(?i)'\\s*(?:OR|AND)\\s+['\"]?\\d['\"]?\\s*=\\s*['\"]?\\d",
    "(?i)(?:--|#|/\\*).*(?:SELECT|DROP|INSERT|UPDATE|DELETE)"
  )
  check_fn <- function(x) {
    hits <- vapply(sql_patterns, function(pat) {
      grepl(pat, x, perl = TRUE)
    }, logical(1))
    if (any(hits)) {
      guardrail_result(
        pass = FALSE,
        reason = "Potential SQL injection detected",
        details = list(matched_patterns = which(hits))
      )
    } else {
      guardrail_result(pass = TRUE)
    }
  }
  new_guardrail(
    name = "sql_injection",
    type = "input",
    check_fn = check_fn,
    description = "Detects common SQL injection patterns"
  )
}
Now use it like any built-in guardrail:
g <- guard_sql_injection()
g
#> <secureguard> sql_injection (input)
#> Detects common SQL injection patterns
# Safe query
run_guardrail(g, "SELECT name FROM users WHERE id = 42")
#> <guardrail_result> PASS
# Injection attempt
run_guardrail(g, "SELECT * FROM users WHERE id = 1; DROP TABLE users; --")
#> <guardrail_result> FAIL
#> Reason: Potential SQL injection detected
A Code Length Limiter
Custom guardrails of type "code" work exactly the same
way. Here is one that limits the number of lines in LLM-generated
code:
guard_code_length <- function(max_lines = 100L) {
  check_fn <- function(code) {
    n_lines <- length(strsplit(code, "\n", fixed = TRUE)[[1L]])
    if (n_lines > max_lines) {
      guardrail_result(
        pass = FALSE,
        reason = sprintf("Code has %d lines (max %d)", n_lines, max_lines),
        details = list(n_lines = n_lines, max_lines = max_lines)
      )
    } else {
      guardrail_result(pass = TRUE, details = list(n_lines = n_lines))
    }
  }
  new_guardrail(
    name = "code_length",
    type = "code",
    check_fn = check_fn,
    description = sprintf("Limits code to %d lines", max_lines)
  )
}
g_len <- guard_code_length(max_lines = 5)
run_guardrail(g_len, "x <- 1\ny <- 2\nz <- x + y")
#> <guardrail_result> PASS
long_code <- paste(sprintf("x%d <- %d", 1:10, 1:10), collapse = "\n")
run_guardrail(g_len, long_code)
#> <guardrail_result> FAIL
#> Reason: Code has 10 lines (max 5)
Anatomy of a check_fn
Every check_fn must:
- Accept a single argument (the text or object to check).
- Return a guardrail_result() with at minimum pass = TRUE or pass = FALSE.
- Optionally include reason (why it failed), warnings (advisory notes), and details (a named list of metadata).
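A minimal check_fn satisfying this contract might look like the following sketch (the length threshold and warning text are illustrative, not from the package):

```r
# Sketch: a check_fn that always passes but attaches an advisory
# warning and metadata when the input is unusually long.
check_fn <- function(x) {
  n <- nchar(x)
  if (n > 2000) {
    guardrail_result(
      pass = TRUE,
      warnings = "Input is unusually long; downstream checks may be slow",
      details = list(n_chars = n)
    )
  } else {
    guardrail_result(pass = TRUE, details = list(n_chars = n))
  }
}
```

Note that warnings do not fail the check; use pass = FALSE with a reason when the input should actually be rejected.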
The @ operator accesses properties on the result:
result <- run_guardrail(guard_code_analysis(), "system('ls')")
result@pass
#> [1] FALSE
result@reason
#> [1] "Blocked function(s) detected: system"
result@details
#> $blocked_calls
#> [1] "system"
Composing Guardrails
In practice, you almost always want to run multiple guardrails together: checking for dangerous functions and excessive complexity, or detecting both prompt injection and off-topic prompts. secureguard provides two ways to combine them.
compose_guardrails(): Same-Type Composition
compose_guardrails() merges multiple guardrails of the
same type into a single composite guardrail. The result
is itself a guardrail, so you can pass it to
run_guardrail(), nest it inside another composition, or use
it in a pipeline. Bundle all your code checks into a single “strict
code” guardrail to treat them as one unit.
# Compose three code guardrails -- ALL must pass (default)
strict_code <- compose_guardrails(
  guard_code_analysis(),
  guard_code_complexity(max_ast_depth = 10, max_calls = 50),
  guard_code_dependencies(allowed_packages = c("dplyr", "ggplot2"))
)
strict_code
#> <secureguard> composed(code_analysis + code_complexity + code_dependencies)
#> (code)
#> Composite guardrail (mode=all): code_analysis + code_complexity +
#> code_dependencies
# Clean code passes all three
run_guardrail(strict_code, "dplyr::filter(mtcars, cyl == 4)")
#> <guardrail_result> PASS
# system() fails code analysis
run_guardrail(strict_code, "system('whoami')")
#> <guardrail_result> FAIL
#> Reason: Blocked function(s) detected: system
# processx fails dependency check
run_guardrail(strict_code, "processx::run('ls')")
#> <guardrail_result> FAIL
#> Reason: Blocked function(s) detected: processx::run; Disallowed package(s):
#> processx
mode = "any": At Least One Must Pass
The default mode = "all" is the right choice for
security checks: all guards must pass. But sometimes you need the
opposite logic: an allowlist where the input is acceptable if it matches
any of several categories. With
mode = "any", the composite passes if at least one child
guardrail passes:
# Accept prompts about either statistics OR machine learning
topic_guard <- compose_guardrails(
  guard_topic_scope(allowed_topics = c("statistics", "regression", "t-test")),
  guard_topic_scope(allowed_topics = c("machine learning", "neural network")),
  mode = "any"
)
run_guardrail(topic_guard, "How do I run a t-test in R?")
#> <guardrail_result> PASS
run_guardrail(topic_guard, "Explain neural network backpropagation")
#> <guardrail_result> PASS
run_guardrail(topic_guard, "What is the weather today?")
#> <guardrail_result> FAIL
#> Reason: Input does not match any allowed topic.; Input does not match any
#> allowed topic.
check_all(): Run a List and Collect Results
Sometimes you need individual results from each guardrail rather than
a single composite result. check_all() runs a list of
guardrails and returns a summary:
guards <- list(
  guard_code_analysis(),
  guard_code_complexity(max_ast_depth = 10),
  guard_code_dataflow()
)
result <- check_all(guards, "x <- mean(1:10)")
result$pass
#> [1] TRUE
length(result$results) # one per guardrail
#> [1] 3
# Inspect individual results
vapply(result$results, function(r) r@pass, logical(1))
#> [1] TRUE TRUE TRUE
When a check fails, check_all() collects all failure
reasons:
result <- check_all(guards, "Sys.getenv('SECRET_KEY')")
result$pass
#> [1] FALSE
result$reasons
#> [1] "Data flow violation(s): Sys.getenv"
When to Use compose_guardrails() vs check_all()
Both functions combine multiple guardrails, but they serve different purposes and return different types:
Use compose_guardrails() when you want
a single guardrail object that you can pass to
run_guardrail(), nest inside another
compose_guardrails(), or use in a
secure_pipeline(). The composed guardrail behaves as one
unit: you get a single pass/fail result. This is the right choice when
you are building reusable guardrail configurations (e.g., a “strict
code” composite) that you want to treat as a single check.
Use check_all() when you need
diagnostic detail. It returns individual results for each guardrail in
the list, so you can report exactly which checks failed and why. This is
useful in logging, debugging, and user-facing error messages where “code
guardrail failed” is less helpful than “blocked function
system() detected by code_analysis; exceeded max AST depth
of 10 per code_complexity.”
In practice, many applications use both:
compose_guardrails() to build reusable guardrail groups,
and check_all() at the top level to get per-group
diagnostics.
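A sketch of that division of labor, reusing the code guards from earlier sections (outputs omitted since they depend on your guardrail configuration):

```r
code <- "system('ls')"

# Reusable group built once; a single pass/fail verdict when run as a unit
strict_code <- compose_guardrails(
  guard_code_analysis(),
  guard_code_complexity(max_ast_depth = 10)
)
run_guardrail(strict_code, code)@pass

# Per-guard detail when you need to report exactly which check failed
diag <- check_all(
  list(guard_code_analysis(), guard_code_complexity(max_ast_depth = 10)),
  code
)
diag$reasons
vapply(diag$results, function(r) r@pass, logical(1))
```

The composed object is what you would store in configuration and reuse; the check_all() call is what you would log.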
Building Pipelines with secure_pipeline()
Individual guardrails and compositions are useful for targeted checks, but a production agent needs all three defense layers working together. A pipeline bundles guardrails for input, code, and output into one object with methods for each stage. You define your security policy once and apply it to every agent turn.
Defining a Pipeline
pipeline <- secure_pipeline(
  input_guardrails = list(
    guard_prompt_injection(sensitivity = "high"),
    guard_input_pii(),
    guard_topic_scope(allowed_topics = c("statistics", "data analysis", "R"))
  ),
  code_guardrails = list(
    guard_code_analysis(),
    guard_code_complexity(max_ast_depth = 15, max_calls = 100),
    guard_code_dependencies(allowed_packages = c("dplyr", "ggplot2", "tidyr")),
    guard_code_dataflow(block_network = TRUE, block_file_write = TRUE)
  ),
  output_guardrails = list(
    guard_output_pii(),
    guard_output_secrets(action = "redact"),
    guard_output_size(max_chars = 10000, max_lines = 200)
  )
)
Running Each Stage
# Stage 1: validate user input
input_result <- pipeline$check_input("Calculate the mean and sd of mtcars$mpg")
input_result$pass
#> [1] TRUE
# Stage 2: validate LLM-generated code
code_result <- pipeline$check_code("
  library(dplyr)
  mtcars %>%
    summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))
")
code_result$pass
#> [1] TRUE
# Stage 3: filter execution output
output_result <- pipeline$check_output("mean_mpg = 20.09, sd_mpg = 6.03")
output_result$pass
#> [1] TRUE
output_result$result # possibly redacted text
#> [1] "mean_mpg = 20.09, sd_mpg = 6.03"
Pipeline in an Agent Loop
Call the three check_* methods in sequence inside your
agent loop. Each stage short-circuits on failure: if
check_input() rejects the prompt, you skip the LLM call
entirely. If check_code() rejects the generated code, you
skip execution. Here is the complete pattern:
process_turn <- function(pipeline, user_prompt, llm_fn, execute_fn) {
  # 1. Input guardrails
  input_check <- pipeline$check_input(user_prompt)
  if (!input_check$pass) {
    return(list(
      success = FALSE,
      stage = "input",
      reasons = input_check$reasons
    ))
  }

  # 2. LLM generates code
  code <- llm_fn(user_prompt)

  # 3. Code guardrails
  code_check <- pipeline$check_code(code)
  if (!code_check$pass) {
    return(list(
      success = FALSE,
      stage = "code",
      reasons = code_check$reasons
    ))
  }

  # 4. Execute in sandbox
  result <- execute_fn(code)

  # 5. Output guardrails
  output_check <- pipeline$check_output(result)
  if (!output_check$pass) {
    return(list(
      success = FALSE,
      stage = "output",
      reasons = output_check$reasons
    ))
  }

  list(success = TRUE, result = output_check$result)
}
Mixing Custom and Built-In Guardrails
Custom and built-in guardrails share the same interface. You can mix
them in compose_guardrails(), check_all(), and
secure_pipeline(). There is no registration step or plugin
system; any secureguard object works everywhere:
# The SQL injection guard from earlier alongside built-in input guards
input_guards <- compose_guardrails(
  guard_prompt_injection(),
  guard_input_pii(),
  guard_sql_injection()
)
run_guardrail(input_guards, "Please help me write a SELECT query")
#> <guardrail_result> PASS
run_guardrail(input_guards, "' OR 1=1 --")
#> <guardrail_result> FAIL
#> Reason: Potential SQL injection detected
Similarly for code guardrails:
# Custom length guard composed with built-in code guards
code_guards <- compose_guardrails(
  guard_code_analysis(),
  guard_code_complexity(max_ast_depth = 10),
  guard_code_length(max_lines = 50)
)
run_guardrail(code_guards, "x <- mean(1:10)")
#> <guardrail_result> PASS
Integration with securer
secureguard analyzes code and outputs to decide whether they are safe. securer provides OS-level sandboxing that limits what the code can do, regardless of what it tries. secureguard catches known-dangerous patterns before execution; securer contains unknown threats at the operating system level.
securer is a suggested dependency; all of the patterns above work without it. The integration adds two things: pre-execution hooks and output guarding after execution.
Pre-Execute Hooks
as_pre_execute_hook() converts code guardrails into a
function that securer calls before executing each code snippet. The hook
returns TRUE to allow execution or FALSE to block it.
library(securer)
library(secureguard)
hook <- as_pre_execute_hook(
  guard_code_analysis(),
  guard_code_complexity(max_ast_depth = 15),
  guard_code_dataflow()
)
sess <- SecureSession$new(pre_execute_hook = hook)
sess$execute("mean(1:10)") # allowed
sess$execute("system('whoami')") # blocked by code_analysis
sess$execute("Sys.getenv('KEY')") # blocked by dataflow
sess$close()
Post-Execute Output Guarding
guard_output() runs output guardrails on execution
results. Guardrails with action = "redact" transform the
output rather than blocking it:
result <- sess$execute("paste('My API key is', 'AKIAIOSFODNN7EXAMPLE')")
checked <- guard_output(
  result,
  guard_output_pii(),
  guard_output_secrets(action = "redact")
)

if (checked$pass) {
  # Return the (possibly redacted) result to the user
  checked$result
} else {
  paste("Blocked:", paste(checked$reasons, collapse = "; "))
}
Pipeline Hook
A pipeline can produce a pre-execute hook from its code guardrails:
pipeline <- secure_pipeline(
  input_guardrails = list(guard_prompt_injection()),
  code_guardrails = list(
    guard_code_analysis(),
    guard_code_dataflow()
  ),
  output_guardrails = list(
    guard_output_secrets(action = "redact")
  )
)

sess <- SecureSession$new(
  pre_execute_hook = pipeline$as_pre_execute_hook()
)
# The session now has code guardrails enforced automatically.
# Input and output guardrails are checked manually:
input_check <- pipeline$check_input(user_prompt)
# ... LLM generates code, session executes it ...
output_check <- pipeline$check_output(execution_result)
sess$close()
Advanced Composition Patterns
The patterns above apply the same guardrails to every request. In practice, you often need to vary strictness based on context: who the user is, where the request came from, and what level of trust is appropriate.
Layered Sensitivity
A public-facing chatbot is exposed to adversarial users and needs high-sensitivity injection detection and tight topic scoping. An internal analytics tool used by trusted data scientists can use lower sensitivity to avoid false positives on legitimate analytical prompts:
# Public-facing: high sensitivity, strict topic scoping
public_guards <- compose_guardrails(
  guard_prompt_injection(sensitivity = "high"),
  guard_input_pii(),
  guard_topic_scope(allowed_topics = c("data analysis", "statistics"))
)

# Internal tool: lower sensitivity, broader topics
internal_guards <- compose_guardrails(
  guard_prompt_injection(sensitivity = "low"),
  guard_input_pii()
)

run_guardrail(
  public_guards,
  "Continue from where we left off with the regression"
)
#> <guardrail_result> FAIL
#> Reason: Prompt injection detected: continuation_attack; Input does not match
#> any allowed topic.
run_guardrail(
  internal_guards,
  "Continue from where we left off with the regression"
)
#> <guardrail_result> PASS
Graduated Code Restrictions
You can do the same with code guardrails. A trusted internal user running vetted analysis scripts needs fewer restrictions than an untrusted external user whose prompts generate arbitrary code:
# Trusted context: only block the most dangerous operations
trusted_code <- compose_guardrails(
  guard_code_analysis(blocked_functions = c("system", "system2", "shell")),
  guard_code_dataflow(
    block_env_access = TRUE,
    block_network = FALSE,
    block_file_write = FALSE
  )
)

# Untrusted context: strict lockdown
untrusted_code <- compose_guardrails(
  guard_code_analysis(),
  guard_code_complexity(max_ast_depth = 10, max_calls = 30),
  guard_code_dependencies(allowed_packages = c("dplyr", "ggplot2")),
  guard_code_dataflow(
    block_env_access = TRUE,
    block_network = TRUE,
    block_file_write = TRUE,
    block_file_read = TRUE
  )
)
# The same code may pass in trusted but fail in untrusted
code <- "readLines('data.csv')"
run_guardrail(trusted_code, code)
#> <guardrail_result> PASS
run_guardrail(untrusted_code, code)
#> <guardrail_result> FAIL
#> Reason: Data flow violation(s): readLines
Redact vs Block Decision
PII like social security numbers or patient records should block the
entire response; partial disclosure is still a privacy violation. API
keys and tokens can often be redacted in place, keeping the useful parts
of the response while replacing the sensitive value. Output guardrails
support three actions ("block", "redact",
"warn") for this:
# PII blocks the output entirely
# Secrets get redacted so the response is still useful
pipeline <- secure_pipeline(
  output_guardrails = list(
    guard_output_pii(),                       # blocks on PII
    guard_output_secrets(action = "redact"),  # redacts secrets
    guard_output_size(max_chars = 5000)       # blocks oversized output
  )
)
# Secrets are redacted, not blocked
result <- pipeline$check_output("API key: AKIAIOSFODNN7EXAMPLE, data looks good")
result$pass
#> [1] TRUE
result$result
#> [1] "API key: [REDACTED_AWS_KEY], data looks good"
# PII causes a block
result <- pipeline$check_output("Patient SSN: 123-45-6789")
result$pass
#> [1] FALSE
result$reasons
#> [1] "PII detected in output: ssn"
Summary
| Pattern | Function | Use Case |
|---|---|---|
| Custom guardrail | new_guardrail() | Domain-specific checks |
| Same-type composition | compose_guardrails() | Merge guards into one reusable unit |
| Batch check | check_all() | Individual results per guard (diagnostics) |
| Full pipeline | secure_pipeline() | Three-layer defense for production |
| Pre-execute hook | as_pre_execute_hook() | securer integration |
| Output guard | guard_output() | Post-execution filtering |
Build small guards that each target one threat. Combine them with
compose_guardrails() or check_all(), and wire
them into pipelines that check every stage of an agent workflow. When
the built-in guardrails do not cover your domain, write your own with
new_guardrail(). Custom guards and built-in guards have the
same interface and compose the same way.