How to choose a model

  • You can choose quantized models from https://huggingface.co/TheBloke
  • There is an ongoing transition from the GGML format to the newer GGUF format. GGUF is recommended for new setups, since bugs affecting GGML will no longer be prioritized for support.
  • Check that the model's required RAM fits your hardware. When you run the llama.cpp server, pay attention to the following line in its startup log:
ggml_metal_init: recommendedMaxWorkingSetSize  = 21845.34 MB

If you try to run a model that requires more RAM than this, llama.cpp will likely crash. Models shared by TheBloke list their required RAM, so choose the largest model your setup supports.
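The check above can be sketched in a few lines. A GGUF model needs roughly its file size in RAM, plus extra room for the KV cache and scratch buffers. This is a rough heuristic, not a llama.cpp API; the 512 MB overhead figure is an illustrative assumption, and real usage varies with context size.

```python
import os

def fits_in_working_set(model_path: str, max_working_set_mb: float,
                        overhead_mb: float = 512.0) -> bool:
    """Rough check that a model fits the reported working-set limit.

    Heuristic: a GGUF model needs approximately its file size in RAM,
    plus overhead for the KV cache and scratch buffers. The 512 MB
    overhead is an illustrative guess, not a figure from llama.cpp.
    """
    model_mb = os.path.getsize(model_path) / (1024 * 1024)
    return model_mb + overhead_mb <= max_working_set_mb

# Compare a downloaded model against the limit from the log line above:
# fits_in_working_set("llama-2-13b.Q4_K_M.gguf", 21845.34)
```

If the function returns False, pick a smaller model or a more aggressive quantization level before starting the server.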