CLI Reference

mcpbr provides a command-line interface for running evaluations and managing configurations.

Global Help

mcpbr --help
mcpbr run --help
mcpbr init --help

Commands Overview

| Command | Description |
|---|---|
| mcpbr run | Run benchmark evaluation with configured MCP server |
| mcpbr init | Generate an example configuration file |
| mcpbr models | List supported models for evaluation |
| mcpbr providers | List available model providers |
| mcpbr harnesses | List available agent harnesses |
| mcpbr benchmarks | List available benchmarks (SWE-bench, CyberGym) |
| mcpbr cleanup | Remove orphaned mcpbr Docker containers |

mcpbr run

Run a benchmark evaluation (SWE-bench or CyberGym) with the configured MCP server.

Usage

mcpbr run -c CONFIG [OPTIONS]

Options

| Option | Short | Type | Description |
|---|---|---|---|
| --config PATH | -c | Path (required) | Path to YAML configuration file |
| --model TEXT | -m | String | Override model from config |
| --provider TEXT | -p | Choice | Override provider from config |
| --harness TEXT | | Choice | Override agent harness from config |
| --benchmark TEXT | -b | Choice | Override benchmark from config (swe-bench or cybergym) |
| --level INTEGER | | Integer | Override CyberGym difficulty level (0-3) |
| --sample INTEGER | -n | Integer | Override sample size from config |
| --mcp-only | -M | Flag | Run only MCP evaluation (skip baseline) |
| --baseline-only | -B | Flag | Run only baseline evaluation (skip MCP) |
| --no-prebuilt | | Flag | Disable pre-built SWE-bench images |
| --output PATH | -o | Path | Path to save JSON results |
| --report PATH | -r | Path | Path to save Markdown report |
| --verbose | -v | Count | Verbose output (-v summary, -vv detailed) |
| --log-file PATH | -l | Path | Path to write raw JSON log output (single file) |
| --log-dir PATH | | Path | Directory to write per-instance JSON log files |
| --task TEXT | -t | String | Run specific task(s) by instance_id (repeatable) |
| --prompt TEXT | | String | Override agent prompt (use {problem_statement} placeholder) |
| --help | -h | Flag | Show help message |

Examples

Basic Evaluation

# Full evaluation (MCP + baseline)
mcpbr run -c config.yaml

# With verbose output
mcpbr run -c config.yaml -v

# Very verbose (detailed tool calls)
mcpbr run -c config.yaml -vv

Selective Runs

# Run only MCP evaluation
mcpbr run -c config.yaml -M

# Run only baseline evaluation
mcpbr run -c config.yaml -B

# Run specific tasks
mcpbr run -c config.yaml -t astropy__astropy-12907 -t django__django-11099
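
Because -t is repeatable, a longer task list can be kept in a file and expanded into flags. The sketch below is a hypothetical helper (run_tasks_from_file is not part of mcpbr) and assumes a plain text file with one instance_id per line; blank lines and lines starting with # are skipped.

```bash
# Hypothetical helper, not part of mcpbr: read instance IDs from a file
# and pass each one to `mcpbr run` as a separate -t flag.
run_tasks_from_file() {
    config="$1"
    task_file="$2"
    set --    # reuse the positional parameters as an argument list
    while IFS= read -r id || [ -n "$id" ]; do
        case "$id" in
            ''|'#'*) continue ;;    # skip blank lines and comments
        esac
        set -- "$@" -t "$id"
    done < "$task_file"
    mcpbr run -c "$config" "$@"
}
```

For example, run_tasks_from_file config.yaml tasks.txt, where tasks.txt is an assumed filename.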

Override Config Values

# Override model (use alias or full name)
mcpbr run -c config.yaml -m opus

# Override sample size
mcpbr run -c config.yaml -n 50

# Override benchmark
mcpbr run -c config.yaml --benchmark cybergym

# Run CyberGym with specific level
mcpbr run -c config.yaml --benchmark cybergym --level 3

# Override prompt
mcpbr run -c config.yaml --prompt "Fix this bug: {problem_statement}"

Save Results

# Save JSON results
mcpbr run -c config.yaml -o results.json

# Save Markdown report
mcpbr run -c config.yaml -r report.md

# Both
mcpbr run -c config.yaml -o results.json -r report.md

# Per-instance logs
mcpbr run -c config.yaml -v --log-dir logs/
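
When runs are repeated, it can help to keep the JSON results, Markdown report, and per-instance logs of each run together. The sketch below is a hypothetical helper (run_with_artifacts and the runs/ directory layout are assumptions, not mcpbr conventions) that combines the -o, -r, and --log-dir options under one timestamped directory. It creates the log directory up front because this page does not say whether mcpbr creates it.

```bash
# Hypothetical helper, not part of mcpbr: store all artifacts of one run
# in a timestamped directory under a base directory (default: runs/).
run_with_artifacts() {
    config="$1"
    out_dir="${2:-runs}/$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$out_dir/logs" || return 1
    mcpbr run -c "$config" \
        -o "$out_dir/results.json" \
        -r "$out_dir/report.md" \
        --log-dir "$out_dir/logs"
}
```

For example, run_with_artifacts config.yaml writes to a directory such as runs/<timestamp>/.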

mcpbr init

Generate an example configuration file.

Usage

mcpbr init [OPTIONS]

Options

| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| --output PATH | -o | Path | mcpbr.yaml | Path to write example config |
| --help | -h | Flag | | Show help message |

Examples

# Create default config
mcpbr init

# Custom filename
mcpbr init -o my-config.yaml
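
In setup scripts, the config can be generated only when it is missing. This page does not say whether mcpbr init overwrites an existing file, so the sketch below checks first. ensure_config is a hypothetical helper, not part of mcpbr.

```bash
# Hypothetical helper, not part of mcpbr: generate an example config only
# if the target file does not exist yet (default: mcpbr.yaml).
ensure_config() {
    config="${1:-mcpbr.yaml}"
    if [ ! -f "$config" ]; then
        mcpbr init -o "$config"
    fi
}
```

For example, ensure_config my-config.yaml leaves an existing my-config.yaml untouched.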

mcpbr models

List supported Anthropic models for evaluation.

Usage

mcpbr models

Output

                   Supported Anthropic Models
+----------------------------+------------------------+---------+
| Model ID                   | Display Name           | Context |
+----------------------------+------------------------+---------+
| claude-opus-4-5-20251101   | Claude Opus 4.5        | 200,000 |
| claude-sonnet-4-5-20250929 | Claude Sonnet 4.5      | 200,000 |
| claude-haiku-4-5-20251001  | Claude Haiku 4.5       | 200,000 |
| opus                       | Claude Opus (alias)    | 200,000 |
| sonnet                     | Claude Sonnet (alias)  | 200,000 |
| haiku                      | Claude Haiku (alias)   | 200,000 |
+----------------------------+------------------------+---------+
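
The aliases in this table can be passed to mcpbr run with -m, which makes it straightforward to compare models on the same config. The sketch below is a hypothetical helper (sweep_models and the results-MODEL.json naming are assumptions) that runs once per model and reports failure if any run fails.

```bash
# Hypothetical helper, not part of mcpbr: run the same config once per
# model and save each result to its own JSON file.
sweep_models() {
    config="$1"
    shift
    failed=0
    for model in "$@"; do
        # Keep going after a failed run, but remember that it happened.
        mcpbr run -c "$config" -m "$model" -o "results-$model.json" || failed=1
    done
    return "$failed"
}
```

For example, sweep_models config.yaml opus sonnet haiku.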

mcpbr providers

List available model providers.

Usage

mcpbr providers

Output

Available Model Providers

+-----------+-------------------+----------------------+
| Provider  | Env Variable      | Description          |
+-----------+-------------------+----------------------+
| anthropic | ANTHROPIC_API_KEY | Direct Anthropic API |
+-----------+-------------------+----------------------+

mcpbr harnesses

List available agent harnesses.

Usage

mcpbr harnesses

Output

Available Agent Harnesses

claude-code (default)
  Shells out to Claude Code CLI with MCP server support
  Requires: claude CLI installed

mcpbr benchmarks

List available benchmarks with their characteristics.

Usage

mcpbr benchmarks

Output

Available Benchmarks

┌────────────┬──────────────────────────────────────────────────────────┬─────────────────────────┐
│ Benchmark  │ Description                                              │ Output Type             │
├────────────┼──────────────────────────────────────────────────────────┼─────────────────────────┤
│ swe-bench  │ Software bug fixes in GitHub repositories                │ Patch (unified diff)    │
│ cybergym   │ Security vulnerability exploitation (PoC generation)     │ Exploit code            │
└────────────┴──────────────────────────────────────────────────────────┴─────────────────────────┘

Use --benchmark flag with 'run' command to select a benchmark
Example: mcpbr run -c config.yaml --benchmark cybergym --level 2

See the Benchmarks guide for detailed information about each benchmark.


mcpbr cleanup

Remove orphaned mcpbr Docker containers that were not properly cleaned up.

Usage

mcpbr cleanup [OPTIONS]

Options

| Option | Short | Type | Description |
|---|---|---|---|
| --dry-run | | Flag | Show containers that would be removed without removing them |
| --force | -f | Flag | Skip confirmation prompt |
| --help | -h | Flag | Show help message |

Examples

# Preview containers to remove
mcpbr cleanup --dry-run

# Remove with confirmation prompt
mcpbr cleanup

# Remove without confirmation
mcpbr cleanup -f
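
In unattended environments such as CI, cleanup can be chained after a run so that leftover containers are removed even when the run fails. The sketch below is a hypothetical wrapper (run_then_cleanup is not part of mcpbr). It assumes no other mcpbr run is active on the same host, since this page does not say how cleanup treats containers from concurrent runs.

```bash
# Hypothetical wrapper, not part of mcpbr: always attempt container
# cleanup after a run, and return the run's own exit code.
run_then_cleanup() {
    status=0
    mcpbr run -c "$1" || status=$?
    # -f skips the confirmation prompt so the script does not wait for input.
    mcpbr cleanup -f || echo "cleanup failed" >&2
    return "$status"
}
```

For example, run_then_cleanup config.yaml.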

Exit Codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (configuration, API, etc.) |
| 130 | Interrupted by user (Ctrl+C) |
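
Scripts that call mcpbr can branch on these codes. The sketch below is a hypothetical helper (run_and_report is not part of mcpbr) that assumes only the three codes listed above and reports anything else as unexpected.

```bash
# Hypothetical helper, not part of mcpbr: run an evaluation, print a short
# status word for the documented exit codes, and return the same code.
run_and_report() {
    config="$1"
    status=0
    mcpbr run -c "$config" || status=$?
    case "$status" in
        0)   echo "success" ;;
        1)   echo "error" ;;
        130) echo "interrupted" ;;
        *)   echo "unexpected ($status)" ;;
    esac
    return "$status"
}
```

For example, run_and_report config.yaml prints one status word after the run finishes.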

Environment Variables

| Variable | Required | Description |
|---|---|---|
| ANTHROPIC_API_KEY | Yes | Anthropic API key for Claude models |
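
Since the key is required, a script can check for it before starting a run. The sketch below is a hypothetical pre-flight check (require_api_key is not part of mcpbr); it only tests that the variable is non-empty and does not validate the key itself.

```bash
# Hypothetical pre-flight check, not part of mcpbr: fail early with a clear
# message when ANTHROPIC_API_KEY is unset or empty.
require_api_key() {
    if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
        echo "ANTHROPIC_API_KEY is not set" >&2
        return 1
    fi
}
```

For example, require_api_key && mcpbr run -c config.yaml.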

Next Steps