MCP Server Integration
This guide explains how to benchmark your MCP (Model Context Protocol) server with mcpbr.
What is MCP?
The Model Context Protocol is an open standard that allows AI models to access external tools and data sources. MCP servers expose tools that Claude can use during agent runs.
How mcpbr Uses MCP
mcpbr runs two parallel evaluations for each task:
- MCP Agent: Claude Code CLI with your MCP server registered
- Baseline Agent: Claude Code CLI without MCP tools
By comparing the two agents' resolution rates (the fraction of tasks each one resolves), you can measure how effective your MCP server is.
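As a rough illustration of that comparison, the sketch below computes each agent's resolution rate from per-task outcomes. It also counts the tasks that only one agent resolved. The input shape (a mapping from task ID to a pair of booleans) and the task IDs are hypothetical stand-ins, not mcpbr's actual results format.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    mcp_rate: float  # fraction of tasks the MCP agent resolved
    baseline_rate: float  # fraction of tasks the baseline agent resolved
    mcp_only: int  # tasks resolved by the MCP agent but not the baseline
    baseline_only: int  # tasks resolved by the baseline but not the MCP agent


def compare(outcomes: dict[str, tuple[bool, bool]]) -> Comparison:
    """Summarize per-task (mcp_resolved, baseline_resolved) outcomes."""
    if not outcomes:
        raise ValueError("no tasks to compare")
    n = len(outcomes)
    pairs = list(outcomes.values())
    return Comparison(
        mcp_rate=sum(m for m, _ in pairs) / n,
        baseline_rate=sum(b for _, b in pairs) / n,
        mcp_only=sum(m and not b for m, b in pairs),
        baseline_only=sum(b and not m for m, b in pairs),
    )


# Hypothetical outcomes: task ID -> (resolved with MCP, resolved by baseline)
outcomes = {
    "task-1": (True, True),
    "task-2": (True, False),
    "task-3": (False, False),
    "task-4": (True, False),
}
result = compare(outcomes)
print(f"MCP {result.mcp_rate:.0%} vs baseline {result.baseline_rate:.0%}")
# MCP 75% vs baseline 25%
```

The difference between the two rates equals `(mcp_only - baseline_only) / n`. The tasks that only one agent resolved are therefore what drive the comparison.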
Configuring Your MCP Server
Basic Configuration
```yaml
mcp_server:
  name: "mcpbr"
  command: "npx"
  args:
    - "-y"
    - "@modelcontextprotocol/server-filesystem"
    - "{workdir}"
  env: {}
```
Configuration Fields
| Field | Description |
|---|---|
| `name` | Name to register the server as (tools appear as `mcp__{name}__{tool}`) |
| `command` | Executable to run your MCP server |
| `args` | Command arguments (use `{workdir}` for the repository path) |
| `env` | Environment variables for the server |
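The naming rule for `name` can be written as a pair of small helpers, which are handy when filtering logs or usage statistics. This is an illustrative sketch of the `mcp__{name}__{tool}` pattern. The helper names are hypothetical and not part of mcpbr, and the code assumes server names never contain `__`.

```python
from typing import Optional, Tuple

PREFIX = "mcp__"


def mcp_tool_name(server: str, tool: str) -> str:
    """Build the name under which an MCP server's tool is registered."""
    return f"{PREFIX}{server}__{tool}"


def split_tool_name(registered: str) -> Optional[Tuple[str, str]]:
    """Return (server, tool) for a registered MCP tool, or None for a built-in tool.

    Assumes the server name itself contains no "__"; the tool name may.
    """
    if not registered.startswith(PREFIX):
        return None
    server, sep, tool = registered[len(PREFIX):].partition("__")
    return (server, tool) if sep else None


print(mcp_tool_name("mcpbr", "read_file"))  # mcp__mcpbr__read_file
print(split_tool_name("mcp__mcpbr__search_files"))  # ('mcpbr', 'search_files')
print(split_tool_name("Bash"))  # None
```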
The {workdir} Placeholder
The `{workdir}` placeholder is replaced at runtime with `/workspace`, the path to the task repository inside the Docker container.
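Conceptually, the substitution is a plain string replacement over the configured arguments. The sketch below assumes that only `{workdir}` is substituted and that only `args` are affected. Given that assumption, any other `{...}` token left over (such as a misspelled `{work_dir}`) is probably a config mistake. Both helpers are hypothetical and not mcpbr's code.

```python
import re

WORKDIR = "/workspace"  # repository path inside the task's Docker container


def resolve_args(args: list[str], workdir: str = WORKDIR) -> list[str]:
    """Substitute the {workdir} placeholder in each server argument."""
    # str.replace (not str.format) so any other braces in args are left alone
    return [arg.replace("{workdir}", workdir) for arg in args]


def unresolved_placeholders(args: list[str]) -> list[str]:
    """List any {name}-style tokens still present, e.g. a misspelled {work_dir}."""
    return [token for arg in args for token in re.findall(r"\{\w+\}", arg)]


args = ["-y", "@modelcontextprotocol/server-filesystem", "{workdir}"]
print(resolve_args(args))
# ['-y', '@modelcontextprotocol/server-filesystem', '/workspace']
print(unresolved_placeholders(resolve_args(["--root={work_dir}"])))  # ['{work_dir}']
```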
Example Configurations
Anthropic Filesystem Server
Basic file system access:
This provides tools for reading, writing, and listing files.
Custom Python MCP Server
```yaml
mcp_server:
  command: "python"
  args: ["-m", "my_mcp_server", "--workspace", "{workdir}"]
  env:
    LOG_LEVEL: "debug"
    CUSTOM_SETTING: "value"
```
Node.js MCP Server
External API Server
```yaml
mcp_server:
  command: "npx"
  args: ["-y", "@supermodeltools/mcp-server"]
  env:
    SUPERMODEL_API_KEY: "${SUPERMODEL_API_KEY}"
```
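The `${SUPERMODEL_API_KEY}` value suggests the key is read from the environment mcpbr runs in. If so, it must be exported before you start an evaluation. The pre-flight check below assumes `${VAR}`-style expansion. `os.path.expandvars` is a stand-in here, not necessarily what mcpbr uses, and `expand_env` and `DEMO_API_KEY` are hypothetical names for illustration.

```python
import os
import re


def expand_env(env: dict[str, str]) -> dict[str, str]:
    """Expand ${VAR} references in env values, failing loudly on unset variables."""
    expanded = {}
    for key, value in env.items():
        # os.path.expandvars silently leaves unknown variables as-is, so check first
        missing = [name for name in re.findall(r"\$\{(\w+)\}", value) if name not in os.environ]
        if missing:
            raise KeyError(f"{key} references unset variable(s): {', '.join(missing)}")
        expanded[key] = os.path.expandvars(value)
    return expanded


# Demo with a throwaway variable rather than a real API key
os.environ["DEMO_API_KEY"] = "demo-value"
print(expand_env({"API_KEY": "${DEMO_API_KEY}"}))  # {'API_KEY': 'demo-value'}
```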
Testing Your Server
1. Verify Standalone
Before running mcpbr, test your MCP server independently:
```bash
# For the filesystem server
npx -y @modelcontextprotocol/server-filesystem /tmp/test

# For a custom server
python -m my_mcp_server --workspace /tmp/test
```
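Running the command only shows that the server starts, not which tools it advertises. The sketch below goes one step further by speaking MCP's stdio transport directly. It sends an `initialize` request, the `notifications/initialized` notification, and a `tools/list` request as newline-delimited JSON-RPC, then prints the tool names. It assumes a stdio server that accepts protocol version `2024-11-05`. It has no read timeout, so a server that never answers will block it. `list_tools` is a hypothetical helper; the official MCP SDKs provide full clients if you need more than a smoke test.

```python
import json
import subprocess
import sys

PROTOCOL_VERSION = "2024-11-05"  # assumption: a protocol version your server accepts


def _send(proc: subprocess.Popen, message: dict) -> None:
    # stdio transport: one JSON-RPC message per line, with no embedded newlines
    proc.stdin.write(json.dumps(message) + "\n")
    proc.stdin.flush()


def _receive(proc: subprocess.Popen, msg_id: int) -> dict:
    # Skip notifications and server-initiated requests until our response arrives
    for line in proc.stdout:
        if not line.strip():
            continue
        message = json.loads(line)
        if message.get("id") != msg_id or "method" in message:
            continue
        if "error" in message:
            raise RuntimeError(f"request {msg_id} failed: {message['error']}")
        return message["result"]
    raise RuntimeError("server exited before responding")


def list_tools(cmd: list[str], shutdown_timeout: float = 5.0) -> list[str]:
    """Start an MCP server over stdio and return the names of the tools it lists.

    There is no read timeout: a server that never answers blocks this call.
    Only the first page of tools/list is read (nextCursor is ignored).
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, encoding="utf-8")
    try:
        _send(proc, {
            "jsonrpc": "2.0", "id": 1, "method": "initialize",
            "params": {
                "protocolVersion": PROTOCOL_VERSION,
                "capabilities": {},
                "clientInfo": {"name": "smoke-test", "version": "0.1"},
            },
        })
        _receive(proc, 1)
        _send(proc, {"jsonrpc": "2.0", "method": "notifications/initialized"})
        _send(proc, {"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
        return [tool["name"] for tool in _receive(proc, 2)["tools"]]
    finally:
        proc.stdin.close()  # EOF on stdin asks a stdio server to shut down
        try:
            proc.wait(timeout=shutdown_timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
        proc.stdout.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_tools.py npx -y @modelcontextprotocol/server-filesystem /tmp/test
    print(list_tools(sys.argv[1:]))
```

Server logs written to stderr pass straight through to your terminal. This keeps stdout clean for the protocol, as the stdio transport requires.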
2. Quick Smoke Test
Run a single task with verbose output:
This runs:
- Only 1 task (`-n 1`)
- With verbose output (`-v`)
- MCP agent only (`-M`)
3. Check Tool Registration
In verbose output, you'll see tool calls like:
If your tools aren't appearing, check:
- Server startup logs (stderr)
- Environment variables are set correctly
- The `{workdir}` path is valid
MCP Tools in Action
When your MCP server is registered, Claude can use its tools alongside built-in tools:
```text
Built-in tools:
  Bash, Glob, Grep, Read, Write, Edit, TodoWrite, Task

MCP tools (with mcp__ prefix):
  mcp__mcpbr__read_file
  mcp__mcpbr__write_file
  mcp__mcpbr__list_directory
  mcp__mcpbr__search_files
  ...
```
Evaluation Strategy
Small-Scale Testing
Start with a small sample to verify your server works:
Full Benchmark
For comprehensive results:
```yaml
sample_size: null  # Full SWE-bench Lite (300 tasks)
max_concurrent: 4
timeout_seconds: 600
max_iterations: 30
```
Comparing Servers
To compare multiple MCP servers:
- Create a separate config file for each server
- Run an evaluation with each config
- Compare the resulting JSON results files
Server Development Tips
1. Optimize for Code Search
Tools that help locate relevant code quickly tend to improve performance:
- Semantic search
- Symbol lookup
- Reference finding
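As an illustration of symbol lookup, the sketch below uses the standard `ast` module to build a name-to-location index for a Python repository. A real server might expose something like `find_symbol` as an MCP tool. That tool name, and the restriction to Python sources, are assumptions for this example.

```python
import ast
import tempfile
from collections import defaultdict
from pathlib import Path


def build_symbol_index(root: str) -> dict[str, list[tuple[str, int]]]:
    """Map each function and class name to the (file, line) pairs that define it."""
    index = defaultdict(list)
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                index[node.name].append((path.relative_to(root).as_posix(), node.lineno))
    return dict(index)


def find_symbol(index: dict[str, list[tuple[str, int]]], name: str) -> list[tuple[str, int]]:
    """What a hypothetical find_symbol MCP tool might return for one query."""
    return sorted(index.get(name, []))


with tempfile.TemporaryDirectory() as repo:
    Path(repo, "models.py").write_text("class User:\n    def save(self):\n        pass\n", encoding="utf-8")
    index = build_symbol_index(repo)

print(find_symbol(index, "save"))  # [('models.py', 2)]
print(find_symbol(index, "User"))  # [('models.py', 1)]
```

Building the index once and answering lookups from memory also speaks to the latency tips further down, since each query avoids a full repository scan.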
2. Provide Context
Tools that provide contextual information help Claude understand the codebase:
- File summaries
- Module structure
- Dependency graphs
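For dependency graphs, one lightweight option for Python code is to collect each module's import statements. This sketch records imported module names exactly as written. Relative imports keep their leading dots and dotted names are not split, and dynamic imports are missed entirely. It is one possible approach, not a complete dependency analysis.

```python
import ast


def module_imports(source: str) -> set[str]:
    """Return the module names a piece of Python source imports, as written."""
    imported = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            imported.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.level > 0 marks a relative import such as "from . import x"
            imported.add("." * node.level + (node.module or ""))
    return imported


def dependency_graph(modules: dict[str, str]) -> dict[str, set[str]]:
    """Map each project module to the other project modules it imports."""
    project = set(modules)
    return {name: module_imports(src) & project for name, src in modules.items()}


# Hypothetical three-module project: module name -> source code
sources = {
    "app": "import os\nfrom utils import helper\nimport models\n",
    "models": "import utils\n",
    "utils": "import json\n",
}
graph = dependency_graph(sources)
for module, deps in sorted(graph.items()):
    print(module, "->", sorted(deps))
# app -> ['models', 'utils']
# models -> ['utils']
# utils -> []
```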
3. Minimize Latency
Each tool call adds time. Consider:
- Caching results
- Batch operations
- Precomputing indexes
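4. Handle Errors Gracefully
Return informative error messages that help Claude recover:
```python
# Bad: a generic message gives Claude nothing to act on
raise Exception("Error")

# Good: an f-string fills in the failing path and a likely alternative
raise Exception(f"File not found: {path}. Did you mean {suggestion}?")
```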
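One way to produce the suggestion in the good example is to run `difflib.get_close_matches` over the files that do exist. This is a sketch of a hypothetical read-file tool handler. The cutoff is difflib's default of 0.6, not a tuned value.

```python
import difflib
import tempfile
from pathlib import Path


def read_file(root: str, relative_path: str) -> str:
    """Hypothetical read-file tool that suggests a near-miss path on failure."""
    target = Path(root, relative_path)
    if target.is_file():
        return target.read_text(encoding="utf-8")
    known = [p.relative_to(root).as_posix() for p in Path(root).rglob("*") if p.is_file()]
    matches = difflib.get_close_matches(relative_path, known, n=1)  # default cutoff 0.6
    hint = f" Did you mean {matches[0]}?" if matches else ""
    raise FileNotFoundError(f"File not found: {relative_path}.{hint}")


with tempfile.TemporaryDirectory() as repo:
    Path(repo, "src").mkdir()
    Path(repo, "src", "utils.py").write_text("pass\n", encoding="utf-8")
    try:
        read_file(repo, "src/util.py")
    except FileNotFoundError as err:
        message = str(err)

print(message)  # File not found: src/util.py. Did you mean src/utils.py?
```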
Debugging
Logs Per Instance
Enable per-instance logs to debug specific tasks:
This creates JSON log files with full tool call traces.
Check Tool Usage
The results JSON includes tool usage statistics:
```json
{
  "tool_usage": {
    "mcp__mcpbr__read_file": 15,
    "mcp__mcpbr__search_files": 8,
    "Bash": 27,
    "Read": 22
  }
}
```
Low MCP tool usage may indicate:
- Tools not helpful for the task
- Tool discovery issues
- Better built-in alternatives
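A quick way to quantify low MCP tool usage is to compute the share of all tool calls that went to tools with the `mcp__` prefix. The sketch below does this for the `tool_usage` object shown above. In a real results file the key may sit at a different depth, and any threshold you compare the share against is your own judgment call.

```python
import json


def mcp_usage_share(tool_usage: dict[str, int]) -> float:
    """Fraction of tool calls that went to MCP tools (names starting with mcp__)."""
    total = sum(tool_usage.values())
    mcp = sum(count for name, count in tool_usage.items() if name.startswith("mcp__"))
    return mcp / total if total else 0.0


results = json.loads("""
{
  "tool_usage": {
    "mcp__mcpbr__read_file": 15,
    "mcp__mcpbr__search_files": 8,
    "Bash": 27,
    "Read": 22
  }
}
""")
share = mcp_usage_share(results["tool_usage"])
print(f"MCP share of tool calls: {share:.1%}")  # MCP share of tool calls: 31.9%
```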
Common Issues
Server Not Starting
Check:
- Command exists and is executable
- Environment variables are set
- No syntax errors in server code
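The first two checks can be automated before a run. The hypothetical `preflight` helper below uses `shutil.which` to confirm that the command resolves on `PATH`, and it reports required environment variables that are unset. It inspects only the machine it runs on, so if your server is launched somewhere else, run the check there. It cannot catch syntax errors in the server itself.

```python
import os
import shutil


def preflight(command: str, required_env: list[str]) -> list[str]:
    """Return a list of problems that would stop the MCP server from starting."""
    problems = []
    if shutil.which(command) is None:
        problems.append(f"command not found on PATH: {command}")
    for name in required_env:
        if not os.environ.get(name):
            problems.append(f"environment variable not set: {name}")
    return problems


# Example: the External API Server configuration above
for problem in preflight("npx", ["SUPERMODEL_API_KEY"]):
    print(problem)
```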
Tools Not Appearing
If Claude isn't using your MCP tools:
- Verify server registers tools correctly
- Check tool descriptions are clear
- Ensure `{workdir}` is resolved correctly
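Claude reads tool descriptions when deciding whether to call a tool. A quick lint over your server's `tools/list` result can therefore catch empty or terse ones. The `name`, `description`, and `inputSchema` fields follow the MCP tool definition. The 20-character minimum is an arbitrary assumption, and `lint_tools` is a hypothetical helper.

```python
def lint_tools(tools: list[dict], min_description: int = 20) -> list[str]:
    """Flag tools from a tools/list result whose metadata may confuse the model."""
    warnings = []
    for tool in tools:
        name = tool.get("name", "<unnamed>")
        description = (tool.get("description") or "").strip()
        if len(description) < min_description:
            warnings.append(f"{name}: description missing or very short")
        if "inputSchema" not in tool:
            warnings.append(f"{name}: no inputSchema")
    return warnings


tools = [
    {"name": "search_files", "description": "Search", "inputSchema": {"type": "object"}},
    {"name": "read_file",
     "description": "Read the complete contents of a file from the repository.",
     "inputSchema": {"type": "object"}},
]
print(lint_tools(tools))  # ['search_files: description missing or very short']
```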
Next Steps
- Evaluation Results: understanding output formats
- Architecture: how mcpbr works internally
- Troubleshooting: common issues and solutions