# Run vMCP locally with the CLI
Most of the Virtual MCP Server (vMCP) guides deploy vMCP to Kubernetes using the operator. The ToolHive CLI also ships a local mode for the same vMCP runtime, so you can aggregate MCP servers from a ToolHive group on your workstation without a cluster.
Local mode is useful for:
- Prototyping aggregation against a handful of MCP servers before committing to a `VirtualMCPServer` resource in Kubernetes.
- Local development against the same optimizer, conflict resolution, and composite-tool logic that the operator uses.
- Demos where you want a single endpoint fronting several MCP servers without the operator overhead.
The CLI command is `thv vmcp` with three subcommands: `serve`, `init`, and `validate`. They share the `pkg/vmcp/` runtime with the Kubernetes operator, so configuration, tool aggregation, and the optimizer behave the same way in both environments.
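Each subcommand documents its own flags, so the quickest way to discover options beyond those covered in this guide is the built-in help:

```bash
thv vmcp serve --help
thv vmcp init --help
thv vmcp validate --help
```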
## Prerequisites
- ToolHive CLI `v0.24.0` or later. See Install the ToolHive CLI.
- One or more MCP servers running in a ToolHive group. See Run MCP servers and Group management to create a group and add servers to it.
- Docker or Podman if you plan to enable the Tier 2 semantic optimizer, which starts a Text Embeddings Inference (TEI) container.
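Before starting vMCP, it's worth confirming the group's servers are actually up. Listing running workloads (per the Run MCP servers guide referenced above) should show each server you expect vMCP to aggregate:

```bash
# The servers in your group should appear as running workloads
thv list
```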
## Quick mode: aggregate a group in one command
When you already have a ToolHive group with running MCP servers, the fastest way to start vMCP is quick mode. Pass `--group` and vMCP generates a minimal in-memory configuration from the group at startup:
```bash
thv vmcp serve --group demo-tools
```
This is equivalent to creating a config file with:

- `groupRef: demo-tools`
- `incomingAuth.type: anonymous`
- `outgoingAuth.source: inline`
- `aggregation.conflictResolution: prefix` with prefix format `{workload}_`
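Spelled out as YAML, that implicit configuration corresponds roughly to the following (the same fields appear in the `thv vmcp init` output shown later in this guide):

```yaml
groupRef: demo-tools
incomingAuth:
  type: anonymous
outgoingAuth:
  source: inline
aggregation:
  conflictResolution: prefix
  conflictResolutionConfig:
    prefixFormat: "{workload}_"
```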
vMCP binds to `127.0.0.1:4483` and aggregates every accessible MCP server in the group behind that single endpoint.
Because quick mode uses anonymous authentication, `thv vmcp serve --group` rejects non-loopback bind addresses. Valid values for `--host` are `127.0.0.1`, `localhost`, or any other loopback address. To expose vMCP on another interface or add real authentication, switch to a config file (see below).
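You can still pick a different loopback port in quick mode; `--host` and `--port` are the same flags described under config-file mode below:

```bash
# Still anonymous and loopback-only, just on another port
thv vmcp serve --group demo-tools --host 127.0.0.1 --port 5483
```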
### Connect a client
In another terminal, register the vMCP endpoint as a remote MCP server so your configured AI clients can discover it:
```bash
thv run http://localhost:4483/mcp --name local-vmcp
```
If you haven't registered any AI clients yet, run `thv client setup` first. See Client configuration for details.
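If you want to smoke-test the endpoint before wiring up a client, a plain MCP `initialize` request over streamable HTTP works. This is standard MCP protocol traffic rather than a ToolHive-specific API, so adjust the protocol version to whatever your clients use:

```bash
curl -s -X POST http://localhost:4483/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2025-03-26",
      "capabilities": {},
      "clientInfo": { "name": "smoke-test", "version": "0.0.1" }
    }
  }'
```

A successful response includes the server's capabilities and confirms vMCP is listening.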
## Config-file mode: generate, review, serve
When you need incoming OIDC auth, a non-loopback bind, per-backend outgoing auth, or composite tools, use a configuration file. The `init` subcommand scaffolds one from an existing group, `validate` checks it, and `serve` runs it.
### Step 1: generate a starter config
`thv vmcp init` enumerates the workloads in a group and emits a YAML config with one backend entry per accessible server:
```bash
thv vmcp init --group demo-tools --output vmcp.yaml
```
The `--group` flag is required. If you omit `--output`, the generated YAML is written to stdout so you can pipe it through `less` or redirect it yourself.
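For example, to review the config before committing it to a file:

```bash
# Page through the generated config without writing a file
thv vmcp init --group demo-tools | less

# ...or redirect it once you're happy with it
thv vmcp init --group demo-tools > vmcp.yaml
```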
A generated file looks similar to this (comments trimmed):
```yaml
name: demo-tools-vmcp
groupRef: demo-tools
incomingAuth:
  type: anonymous
outgoingAuth:
  source: inline
aggregation:
  conflictResolution: prefix
  conflictResolutionConfig:
    prefixFormat: "{workload}_"
backends:
  - name: fetch
    url: http://localhost:24162/mcp
    transport: streamable-http
  - name: osv
    url: http://localhost:24163/mcp
    transport: streamable-http
```
Edit the file to switch to OIDC, change the conflict-resolution strategy, add composite tool definitions, or enable the optimizer. See Configure vMCP servers for the complete schema.
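As a rough sketch, switching incoming auth to OIDC might look something like the following. The field names under `oidc` here are illustrative assumptions, so confirm the exact keys against the schema in Configure vMCP servers:

```yaml
incomingAuth:
  type: oidc
  oidc:
    issuer: https://auth.example.com/realms/dev # hypothetical issuer URL
    audience: vmcp                              # hypothetical audience value
```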
### Step 2: validate the config
Before starting the server, run the validator to catch syntax errors and schema violations:
```bash
thv vmcp validate --config vmcp.yaml
```
A valid config exits with status 0 and no output. An invalid config exits
non-zero with a descriptive error, for example:
```text
Error: validation failed: aggregation.conflictResolution: unknown strategy "preffix"
```
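Because the result is reported through the exit code, validation composes cleanly with shell scripts and CI pipelines:

```bash
# Only start the server if the config passes validation
thv vmcp validate --config vmcp.yaml && thv vmcp serve --config vmcp.yaml
```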
### Step 3: start the server
Once the config validates, start vMCP:
```bash
thv vmcp serve --config vmcp.yaml
```
When `--config` is set, `--group` is ignored. Pass `--host` and `--port` to override the default `127.0.0.1:4483` bind address (non-loopback addresses are allowed in config-file mode).
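For example, to expose vMCP on all interfaces and a custom port (configure real incoming authentication before binding beyond loopback):

```bash
thv vmcp serve --config vmcp.yaml --host 0.0.0.0 --port 8080
```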
## Enable the optimizer
vMCP supports the same tool optimizer in local mode as in Kubernetes. The CLI exposes three tiers through flags on `thv vmcp serve`.
| Tier | Flag | What it does | External service |
|---|---|---|---|
| 0 | (none) | Pass-through: backends' tools are exposed as-is. | None |
| 1 | `--optimizer` | FTS5 keyword optimizer: clients see only `find_tool` and `call_tool`, backed by an in-process SQLite FTS5 index. | None |
| 2 | `--optimizer-embedding` | Tier 1 plus TEI semantic search. Implies `--optimizer`. | Managed TEI container |
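Tier 1 needs nothing beyond the flag; clients connected to the endpoint then see only the `find_tool` and `call_tool` entry points:

```bash
# Tier 1: in-process SQLite FTS5 keyword index, no extra containers
thv vmcp serve --group demo-tools --optimizer
```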
Tier 2 starts a managed TEI container on first run and stops it when the server exits. Customize the model and image with `--embedding-model` and `--embedding-image`:
```bash
thv vmcp serve --group demo-tools \
  --optimizer-embedding \
  --embedding-model BAAI/bge-small-en-v1.5 \
  --embedding-image ghcr.io/huggingface/text-embeddings-inference:cpu-latest
```
The defaults use the upstream CPU image; switch to a GPU image if you have one available. First-start container pulls can take 30-60 seconds.
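While the server is running, you can confirm the managed TEI container is up with your container runtime. The container name is managed by ToolHive, so filtering by image is simplest (swap `docker` for `podman` if that's your runtime):

```bash
docker ps --filter ancestor=ghcr.io/huggingface/text-embeddings-inference:cpu-latest
```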
## Enable audit logging
Pass `--enable-audit` to log each incoming request with the default audit configuration. If your config file already defines an `audit` section, the flag has no effect and the file's configuration wins.
```bash
thv vmcp serve --config vmcp.yaml --enable-audit
```
See Audit logging for the event format and how to customize it from a config file.
## Local CLI vs Kubernetes deployment
| Aspect | `thv vmcp serve` | `VirtualMCPServer` CRD |
|---|---|---|
| Where it runs | Your workstation | Kubernetes cluster |
| Backend discovery | Local ToolHive group | `MCPGroup` with `MCPServer`/`MCPRemoteProxy` |
| Default bind | `127.0.0.1:4483` | Service in the cluster |
| Authentication | Anonymous (quick mode) or as configured | Full OIDC, token exchange, embedded auth |
| Lifecycle | Foreground process; Ctrl-C stops it | Operator-managed Deployment |
| Optimizer | Flag-driven (Tier 0/1/2) | `embeddingServerRef` on the CRD |
Both paths load the same vMCP runtime from `pkg/vmcp/`, so a config file that validates locally will behave the same way when the operator loads it as a ConfigMap. This makes local mode useful for iterating on aggregation, conflict resolution, and composite-tool definitions before moving to Kubernetes.
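A convenient consequence is that the file you validated locally can seed the cluster configuration directly. For example (the ConfigMap name and key your `VirtualMCPServer` expects depend on how you wire up the CRD; see the Kubernetes guide):

```bash
kubectl create configmap demo-tools-vmcp-config --from-file=config.yaml=vmcp.yaml
```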
## Next steps
- Configure vMCP servers for the full config schema.
- Optimize tool discovery for details on Tier 1 and Tier 2.
- Run vMCP in Kubernetes when you're ready to move from a local process to the operator.