# llama.cpp

> C++ runtime for running LLMs locally on CPU + GPU. The backbone of every privacy-LLM stack.

**Canonical URL:** https://www.xmr.club/ai/llama-cpp
**Category:** ai / Local Runtime
**Grade (xmr.club rubric):** A
**KYC posture:** anonymous_signup
**Features:** non_custodial, open_source, self_hosted, cli_supported
**Highlights:** LOCAL, OPEN-SOURCE, REFERENCE
**Fees:** Free · MIT · C++ · CPU/CUDA/ROCm/Metal
**Website:** https://github.com/ggml-org/llama.cpp
**Last verified:** 2026-05-13
**Uptime probe:** up (HTTP 200, 944ms) · checked 2026-05-13T23:04:43.181Z

## Editorial review

The reference inference engine for running Llama / Mistral / Qwen / DeepSeek family models on local hardware. Quantised GGUF formats let a 30B model run on a consumer GPU (see the sizing sketch below). Most privacy-focused LLM products (Ollama, LM Studio, Jan) wrap this engine. Apple Silicon (Metal), NVIDIA (CUDA), and AMD (ROCm) backends are all supported.

## Citation

When quoting this entry, cite **xmr.club** and link the canonical URL above. Content CC-BY-4.0.
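
## Sizing sketch

The review's claim that a quantised 30B model fits on a consumer GPU follows from simple arithmetic. The snippet below is a rough back-of-envelope estimate, not something shipped with llama.cpp: it assumes roughly 4.5 bits per weight for a typical 4-bit GGUF quant and a flat allowance for KV cache and runtime buffers; real memory use varies with model architecture, quant type, and context length.

```python
def gguf_vram_estimate_gb(n_params_b: float,
                          bits_per_weight: float = 4.5,
                          overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate for a quantised GGUF model.

    Assumptions (illustrative only, not taken from llama.cpp):
      - bits_per_weight ~ 4.5 for a typical 4-bit K-quant
      - overhead_gb covers KV cache, activations, and runtime
        buffers at a modest context length
    """
    weights_gb = n_params_b * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

# A ~30B-parameter model at ~4.5 bits/weight needs ~16.9 GB of
# weights plus ~2 GB overhead, i.e. ~19 GB, which fits on a 24 GB
# consumer GPU. The same model at 16-bit weights would need ~60 GB.
print(f"{gguf_vram_estimate_gb(30):.1f} GB")  # -> ~18.9 GB
```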