Retro terminal CLI for testing and using Mistral Small 4 locally against llama.cpp, built on the official mistralai Python SDK with MCP tools.
The repository is intentionally focused on one workflow:
- Built-in tools: `shell`, `read_file`, `write_file`, `list_dir`, `search_text`
- Optional FireCrawl MCP server via `mcp.json`, using `FIRECRAWL_API_KEY` from your environment
- `/image` and `/doc` attachment commands
- `make sync` to set up the environment
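The `mcp.json` entry expands `FIRECRAWL_API_KEY` from the environment at runtime rather than storing the key in the file. A minimal sketch of that expansion idea using only the standard library; the config shape shown here is hypothetical and may differ from the repo's actual `mcp.json` schema:

```python
import json
import os

def expand_env_in_config(raw: str) -> dict:
    """Expand $VAR / ${VAR} references in a JSON config string before parsing.

    Sketch of the runtime-expansion idea; the real CLI may do this differently.
    """
    return json.loads(os.path.expandvars(raw))

# Hypothetical FireCrawl MCP entry for illustration only.
raw = '{"mcpServers": {"firecrawl": {"env": {"FIRECRAWL_API_KEY": "${FIRECRAWL_API_KEY}"}}}}'
os.environ["FIRECRAWL_API_KEY"] = "fc-demo-key"
cfg = expand_env_in_config(raw)
print(cfg["mcpServers"]["firecrawl"]["env"]["FIRECRAWL_API_KEY"])
```

Keeping the key out of the checked-in config and expanding it at load time means the same `mcp.json` works across machines without edits.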
```
uv run python -m mistral4cli
```

Useful one-shot smoke test:
```
uv run python -m mistral4cli --once "Return only the word ok." --no-stream
```

Inside the REPL:
- `/help` for actionable usage
- `/defaults` to inspect runtime parameters
- `/tools` to inspect loaded tools
- `/run -- ...` to execute a shell command
- `/ls [PATH]` to inspect the tree
- `/find -- ...` to search text in the workspace
- `/edit PATH -- ...` to write text files
- `/image` to pick and analyze images
- `/doc` to pick and analyze documents
- `/reset`, `/system ...`, `/exit`

The local model is expected to be running outside this repo with llama.cpp.
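A slash-command REPL like the one above is typically a small dispatch table mapping command names to handlers. A hypothetical sketch (command names taken from the list above; the handlers and dispatcher are invented for illustration, not the repo's implementation):

```python
from typing import Callable

# Hypothetical command registry; the CLI's real implementation differs.
COMMANDS: dict[str, Callable[[str], str]] = {}

def command(name: str):
    """Decorator registering a handler under a slash-command name."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        COMMANDS[name] = fn
        return fn
    return register

@command("/help")
def cmd_help(args: str) -> str:
    return "commands: " + ", ".join(sorted(COMMANDS))

@command("/reset")
def cmd_reset(args: str) -> str:
    return "conversation cleared"

def dispatch(line: str) -> str:
    """Split '/cmd rest-of-line' and route to the registered handler."""
    name, _, args = line.strip().partition(" ")
    handler = COMMANDS.get(name)
    return handler(args) if handler else f"unknown command: {name}"

print(dispatch("/reset"))
```

The table-driven shape makes `/help` trivial to keep accurate, since it is generated from the registry itself.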
The documented launch profile is:
```
llama-server \
  -hf unsloth/Mistral-Small-4-119B-2603-GGUF:UD-Q5_K_XL \
  --host 0.0.0.0 --port 8080 \
  --jinja --flash-attn off --no-mmap \
  --chat-template-file ./mistral-small-4-reasoning.jinja \
  --ctx-size 262144 \
  -ngl 99 \
  --temp 0.7 --top-p 0.95 --top-k 40 --min-p 0.0 \
  --parallel 1 --ctx-checkpoints 32 --cache-prompt \
  --threads "$(nproc)"
```

Recommended runtime defaults used by the CLI:
- `temperature=0.7`
- `top_p=0.95`
- `prompt_mode=reasoning`
- `max_tokens` unset unless you override it

The repository now includes the exact reasoning template at `mistral-small-4-reasoning.jinja`. In this local setup it is effectively required if you want reasoning enabled by default, because it sets `reasoning_effort=high` in the llama.cpp chat template.
For the detailed local runbook, see
docs/local-mistral-small-4.md.
```
make check
make test
make docs
```

- `make check` runs formatting, lint and type checks.
- `make test` runs the full pytest suite, including local integration tests that require the llama.cpp server.
- `make docs` regenerates the checked-in API reference from public docstrings.
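Regenerating a reference from public docstrings amounts to walking a module's public names and collecting their documentation. A toy sketch of the idea (not the repo's actual generator; the stdlib `json` module stands in as a demo target):

```python
import inspect
import json as example_module  # any importable module works as a demo target

def collect_docs(module) -> dict[str, str]:
    """Map each public function/class name to the first line of its docstring."""
    docs = {}
    for name, obj in inspect.getmembers(module):
        if name.startswith("_"):
            continue  # skip private names
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue  # skip submodules, constants, etc.
        doc = inspect.getdoc(obj) or ""
        docs[name] = doc.splitlines()[0] if doc else ""
    return docs

reference = collect_docs(example_module)
print(sorted(reference))
```

Driving the reference from docstrings keeps `docs/reference.md` from drifting out of sync with the code, since `make docs` can simply be re-run after any API change.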
- `src/mistral4cli/` - CLI, session, tools and attachment handling
- `tests/` - unit and integration tests
- `docs/local-mistral-small-4.md` - detailed local deployment notes
- `docs/reference.md` - generated API reference from public docstrings
- `mistral-small-4-reasoning.jinja` - versioned llama.cpp reasoning template
- `mcp.json` - optional FireCrawl MCP config that expands `FIRECRAWL_API_KEY` at runtime

MIT. See LICENSE.
ibitato/Mistral4SmallClient
April 13, 2026