Skip to content

MCP Server Integration

This guide explains how to benchmark your MCP (Model Context Protocol) server with mcpbr.

What is MCP?

The Model Context Protocol is an open standard that allows AI models to access external tools and data sources. MCP servers expose tools that Claude can use during agent runs.

How mcpbr Uses MCP

mcpbr runs two parallel evaluations for each task:

  1. MCP Agent: Claude Code CLI with your MCP server registered
  2. Baseline Agent: Claude Code CLI without MCP tools

By comparing resolution rates, you can measure the effectiveness of your MCP server.

Configuring Your MCP Server

Basic Configuration

mcp_server:
  name: "mcpbr"
  command: "npx"
  args:
    - "-y"
    - "@modelcontextprotocol/server-filesystem"
    - "{workdir}"
  env: {}

Configuration Fields

Field Description
name Name to register the server as (tools appear as mcp__{name}__{tool})
command Executable to run your MCP server
args Command arguments (use {workdir} for repository path)
env Environment variables for the server

The {workdir} Placeholder

The {workdir} placeholder is replaced at runtime with /workspace - the path to the task repository inside the Docker container.

Example Configurations

Anthropic Filesystem Server

Basic file system access:

mcp_server:
  command: "npx"
  args: ["-y", "@modelcontextprotocol/server-filesystem", "{workdir}"]

This provides tools for reading, writing, and listing files.

Custom Python MCP Server

mcp_server:
  command: "python"
  args: ["-m", "my_mcp_server", "--workspace", "{workdir}"]
  env:
    LOG_LEVEL: "debug"
    CUSTOM_SETTING: "value"

Node.js MCP Server

mcp_server:
  command: "node"
  args: ["/path/to/server/index.js", "--root", "{workdir}"]

External API Server

mcp_server:
  command: "npx"
  args: ["-y", "@supermodeltools/mcp-server"]
  env:
    SUPERMODEL_API_KEY: "${SUPERMODEL_API_KEY}"

Testing Your Server

1. Verify Standalone

Before running mcpbr, test your MCP server independently:

# For the filesystem server
npx -y @modelcontextprotocol/server-filesystem /tmp/test

# For a custom server
python -m my_mcp_server --workspace /tmp/test

2. Quick Smoke Test

Run a single task with verbose output:

mcpbr run -c config.yaml -n 1 -v -M

This runs:

  • Only 1 task (-n 1)
  • With verbose output (-v)
  • MCP agent only (-M)

3. Check Tool Registration

In verbose output, you'll see tool calls like:

14:23:22 astropy-12907:mcp    > mcp__mcpbr__read_file
14:23:22 astropy-12907:mcp    < File content: ...

If your tools aren't appearing, check:

  • Server startup logs (stderr)
  • Environment variables are set correctly
  • The {workdir} path is valid

MCP Tools in Action

When your MCP server is registered, Claude can use its tools alongside built-in tools:

Built-in tools:
  Bash, Glob, Grep, Read, Write, Edit, TodoWrite, Task

MCP tools (with mcp__ prefix):
  mcp__mcpbr__read_file
  mcp__mcpbr__write_file
  mcp__mcpbr__list_directory
  mcp__mcpbr__search_files
  ...

Evaluation Strategy

Small-Scale Testing

Start with a small sample to verify your server works:

sample_size: 5
max_concurrent: 1
timeout_seconds: 180

Full Benchmark

For comprehensive results:

sample_size: null  # Full SWE-bench Lite (300 tasks)
max_concurrent: 4
timeout_seconds: 600
max_iterations: 30

Comparing Servers

To compare multiple MCP servers:

  1. Create separate config files
  2. Run evaluations with different configs
  3. Compare JSON results
mcpbr run -c server-a.yaml -o results-a.json
mcpbr run -c server-b.yaml -o results-b.json

Server Development Tips

Tools that help locate relevant code quickly tend to improve performance:

  • Semantic search
  • Symbol lookup
  • Reference finding

2. Provide Context

Tools that provide contextual information help Claude understand the codebase:

  • File summaries
  • Module structure
  • Dependency graphs

3. Minimize Latency

Each tool call adds time. Consider:

  • Caching results
  • Batch operations
  • Precomputing indexes

4. Handle Errors Gracefully

Return informative error messages that help Claude recover:

# Bad
raise Exception("Error")

# Good
raise Exception("File not found: {path}. Did you mean {suggestion}?")

Debugging

Logs Per Instance

Enable per-instance logs to debug specific tasks:

mcpbr run -c config.yaml -v --log-dir logs/

This creates JSON log files with full tool call traces.

Check Tool Usage

The results JSON includes tool usage statistics:

{
  "tool_usage": {
    "mcp__mcpbr__read_file": 15,
    "mcp__mcpbr__search_files": 8,
    "Bash": 27,
    "Read": 22
  }
}

Low MCP tool usage may indicate:

  • Tools not helpful for the task
  • Tool discovery issues
  • Better built-in alternatives

Common Issues

Server Not Starting

Warning: MCP server add failed (exit 1): ...

Check:

  • Command exists and is executable
  • Environment variables are set
  • No syntax errors in server code

Tools Not Appearing

If Claude isn't using your MCP tools:

  • Verify server registers tools correctly
  • Check tool descriptions are clear
  • Ensure {workdir} is resolved correctly

Next Steps