How to choose a model

  • You can choose quantized models from https://huggingface.co/TheBloke
  • There is an ongoing transition from the GGML format to the newer GGUF format. GGUF is recommended for new setups, since bugs affecting GGML will no longer be prioritized for support.
  • Check that the model's required RAM fits your hardware. When you run the llama.cpp server, pay attention to the following line in its startup log:
ggml_metal_init: recommendedMaxWorkingSetSize  = 21845.34 MB

If you try to run a model that requires more RAM than this, llama.cpp will likely crash. Models shared by TheBloke list their required RAM, so choose the largest model your setup supports.
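The check above can be sketched in a few lines. A GGUF model needs roughly its file size in RAM, plus extra room for the KV cache and scratch buffers. This is a rough heuristic, not a llama.cpp API; the 512 MB overhead figure is an illustrative assumption, and real usage varies with context size.

```python
import os

def fits_in_working_set(model_path: str, max_working_set_mb: float,
                        overhead_mb: float = 512.0) -> bool:
    """Rough check that a model fits the reported working-set limit.

    Heuristic: a GGUF model needs approximately its file size in RAM,
    plus overhead for the KV cache and scratch buffers. The 512 MB
    overhead is an illustrative guess, not a figure from llama.cpp.
    """
    model_mb = os.path.getsize(model_path) / (1024 * 1024)
    return model_mb + overhead_mb <= max_working_set_mb

# Compare a downloaded model against the limit from the log line above:
# fits_in_working_set("llama-2-13b.Q4_K_M.gguf", 21845.34)
```

If the function returns False, pick a smaller model or a more aggressive quantization level before starting the server.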