# llama.cpp

> C++ runtime for running LLMs locally on CPU + GPU. The backbone of every privacy-LLM stack.

**Canonical URL:** https://www.xmr.club/ai/llama-cpp
**Category:** ai / Local Runtime
**Grade (xmr.club rubric):** A
**KYC posture:** anonymous_signup
**Features:** non_custodial, open_source, self_hosted, cli_supported
**Highlights:** LOCAL, OPEN-SOURCE, REFERENCE
**Fees:** Free · MIT · C++ · CPU/CUDA/ROCm/Metal
**Website:** https://github.com/ggml-org/llama.cpp
**Last verified:** 2026-05-13
**Uptime probe:** up (HTTP 200, 944ms) · checked 2026-05-13T23:04:43.181Z

## Editorial review

The reference inference engine for running Llama / Mistral / Qwen / DeepSeek family models on local hardware. Quantised GGUF formats let a 30B model run on a consumer GPU (see the sizing sketch below). Most privacy-focused LLM products (Ollama, LM Studio, Jan) wrap this engine. Apple Silicon (Metal), NVIDIA (CUDA), and AMD (ROCm) backends are all supported.

## Citation

When quoting this entry, cite **xmr.club** and link the canonical URL above. Content CC-BY-4.0.
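
## Sizing sketch

The review's claim that a quantised 30B model fits on a consumer GPU follows from simple arithmetic. The snippet below is a rough back-of-envelope estimate, not something shipped with llama.cpp: it assumes roughly 4.5 bits per weight for a typical 4-bit GGUF quant and a flat allowance for KV cache and runtime buffers; real memory use varies with model architecture, quant type, and context length.

```python
def gguf_vram_estimate_gb(n_params_b: float,
                          bits_per_weight: float = 4.5,
                          overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate for a quantised GGUF model.

    Assumptions (illustrative only, not taken from llama.cpp):
      - bits_per_weight ~ 4.5 for a typical 4-bit K-quant
      - overhead_gb covers KV cache, activations, and runtime
        buffers at a modest context length
    """
    weights_gb = n_params_b * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

# A ~30B-parameter model at ~4.5 bits/weight needs ~16.9 GB of
# weights plus ~2 GB overhead, i.e. ~19 GB, which fits on a 24 GB
# consumer GPU. The same model at 16-bit weights would need ~60 GB.
print(f"{gguf_vram_estimate_gb(30):.1f} GB")  # -> ~18.9 GB
```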