Configuration¶
mcpbr uses YAML configuration files to define your MCP server settings and evaluation parameters.
Generating a Config File¶
Create a starter configuration:
This creates mcpbr.yaml with sensible defaults.
Configuration Reference¶
Full Example¶
```yaml
# MCP Server Configuration
mcp_server:
  name: "mcpbr"  # Name for the MCP server (appears in tool names)
  command: "npx"
  args:
    - "-y"
    - "@modelcontextprotocol/server-filesystem"
    - "{workdir}"
  env: {}

# Provider and Harness
provider: "anthropic"
agent_harness: "claude-code"

# Custom Agent Prompt (optional)
agent_prompt: |
  Fix the following bug in this repository:

  {problem_statement}

  Make the minimal changes necessary to fix the issue.
  Focus on the root cause, not symptoms.

# Model Configuration (use alias or full name)
model: "sonnet"  # or "claude-sonnet-4-5-20250929"

# Dataset Configuration
dataset: "SWE-bench/SWE-bench_Lite"
sample_size: 10  # null for full dataset

# Execution Parameters
timeout_seconds: 300
max_concurrent: 4
max_iterations: 10

# Docker Configuration
use_prebuilt_images: true
```
MCP Server Section¶
The mcp_server section defines how to start your MCP server:
| Field | Type | Description |
|---|---|---|
| `name` | string | Name to register the MCP server as (default: `mcpbr`) |
| `command` | string | Executable to run (e.g., `npx`, `uvx`, `python`) |
| `args` | list | Command arguments. Use `{workdir}` as a placeholder |
| `env` | dict | Additional environment variables |
The {workdir} Placeholder¶
The {workdir} placeholder is replaced at runtime with the path to the task repository inside the Docker container (typically /workspace). This allows your MCP server to access the codebase.
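As a concrete illustration, here is the filesystem server from the full example before and after substitution (assuming the repository is mounted at `/workspace`, the typical path noted above):

```yaml
# As written in mcpbr.yaml:
mcp_server:
  command: "npx"
  args:
    - "-y"
    - "@modelcontextprotocol/server-filesystem"
    - "{workdir}"

# Effective command at runtime:
#   npx -y @modelcontextprotocol/server-filesystem /workspace
```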
Environment Variables¶
Reference environment variables using ${VAR_NAME} syntax:
```yaml
mcp_server:
  command: "npx"
  args: ["-y", "@supermodeltools/mcp-server"]
  env:
    SUPERMODEL_API_KEY: "${SUPERMODEL_API_KEY}"
```
Provider and Harness¶
| Field | Values | Description |
|---|---|---|
| `provider` | `anthropic` | LLM provider (currently only Anthropic is supported) |
| `agent_harness` | `claude-code` | Agent backend (currently only the Claude Code CLI is supported) |
Custom Agent Prompt¶
Customize the prompt sent to the agent:
```yaml
agent_prompt: |
  Fix the following bug in this repository:

  {problem_statement}

  Make the minimal changes necessary to fix the issue.
  Focus on the root cause, not symptoms.
```
Use {problem_statement} as a placeholder for the SWE-bench issue text.
CLI Override
Override the prompt at runtime with the `--prompt` flag.
Model Configuration¶
| Field | Default | Description |
|---|---|---|
| `model` | `sonnet` | Model alias or full Anthropic model ID |
You can use either aliases (sonnet, opus, haiku) or full model names (claude-sonnet-4-5-20250929). Aliases automatically resolve to the latest model version.
See Installation for the full list of supported models.
Benchmark Configuration¶
| Field | Default | Description |
|---|---|---|
| `benchmark` | `swe-bench` | Benchmark to run (`swe-bench` or `cybergym`) |
| `cybergym_level` | `1` | CyberGym difficulty level (0-3; ignored for other benchmarks) |
Benchmark Selection
- SWE-bench: Bug fixing in Python repositories, evaluated with test suites
- CyberGym: Security exploit generation in C/C++ projects, evaluated by crash detection
See the Benchmarks guide for detailed information.
CLI Override
Override the benchmark at runtime with the corresponding CLI flag.
Dataset Configuration¶
| Field | Default | Description |
|---|---|---|
| `dataset` | `null` | HuggingFace dataset (optional; the benchmark provides a default) |
| `sample_size` | `null` | Number of tasks (`null` = full dataset) |
The dataset field is optional. If not specified, each benchmark uses its default dataset:
- SWE-bench: `SWE-bench/SWE-bench_Lite`
- CyberGym: `sunblaze-ucb/cybergym`
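To pin a dataset explicitly and evaluate only a subset of tasks, set both fields (the `sample_size` value here is illustrative):

```yaml
benchmark: "swe-bench"
dataset: "SWE-bench/SWE-bench_Lite"
sample_size: 25  # evaluate 25 tasks instead of the full dataset
```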
Execution Parameters¶
| Field | Default | Description |
|---|---|---|
| `timeout_seconds` | `300` | Timeout per task in seconds |
| `max_concurrent` | `4` | Maximum parallel task evaluations |
| `max_iterations` | `10` | Maximum agent iterations (turns) per task |
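Written out as a config fragment, the defaults from the table look like this (annotated for clarity):

```yaml
timeout_seconds: 300  # each task is stopped after 5 minutes
max_concurrent: 4     # up to 4 tasks evaluated in parallel
max_iterations: 10    # the agent may take at most 10 turns per task
```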
Docker Configuration¶
| Field | Default | Description |
|---|---|---|
| `use_prebuilt_images` | `true` | Use pre-built SWE-bench Docker images when available |
Example Configurations¶
Anthropic Filesystem Server¶
Basic file system access:
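A minimal configuration using the filesystem server, as in the full example earlier on this page:

```yaml
mcp_server:
  name: "mcpbr"
  command: "npx"
  args:
    - "-y"
    - "@modelcontextprotocol/server-filesystem"
    - "{workdir}"
```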
Custom Python MCP Server¶
```yaml
mcp_server:
  command: "python"
  args: ["-m", "my_mcp_server", "--workspace", "{workdir}"]
  env:
    LOG_LEVEL: "debug"
```
Supermodel Codebase Analysis¶
```yaml
mcp_server:
  command: "npx"
  args: ["-y", "@supermodeltools/mcp-server"]
  env:
    SUPERMODEL_API_KEY: "${SUPERMODEL_API_KEY}"
```
Fast Iteration (Development)¶
Small sample size with single concurrency for debugging:
```yaml
mcp_server:
  command: "npx"
  args: ["-y", "@modelcontextprotocol/server-filesystem", "{workdir}"]

model: "haiku"  # Faster, cheaper
sample_size: 3
max_concurrent: 1
timeout_seconds: 180
max_iterations: 5
```
Full Benchmark Run¶
Comprehensive evaluation with maximum parallelism:
```yaml
mcp_server:
  command: "npx"
  args: ["-y", "@modelcontextprotocol/server-filesystem", "{workdir}"]

model: "sonnet"
sample_size: null  # Full dataset
max_concurrent: 8
timeout_seconds: 600
max_iterations: 30
```
Configuration Validation¶
mcpbr validates your configuration on startup:
- `provider` must be one of: `anthropic`
- `agent_harness` must be one of: `claude-code`
- `max_concurrent` must be at least 1
- `timeout_seconds` must be at least 30
Invalid configurations will produce clear error messages.
Next Steps¶
- CLI Reference - Command options that override config values
- MCP Integration - Tips for testing your MCP server
- Evaluation Results - Understanding output formats