The easiest way to run an LLM on macOS
Install ollama
brew install ollama
Start the ollama server
ollama serve
Keep this window open to keep the ollama server running.
Alternatively, if you want to run ollama as a background service, you can run
brew services start ollama
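Either way, you can confirm the server is up by hitting its HTTP endpoint; ollama listens on port 11434 by default.

```shell
# Prints a short status message ("Ollama is running")
# when the server is listening on the default port.
curl http://localhost:11434
```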
Then run the following command to download a model
ollama pull codellama:latest
This will download the 7B CodeLlama model.
You can also try other models listed in the ollama model library.
You can check which models you have downloaded by running
ollama list
Once the download completes, you can run the following
ollama run codellama
This will run the LLM in interactive mode.
If you want to write a multi-line prompt, you can save it to a file and then use it by running the following command
ollama run codellama "$(cat prompt.txt)"
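The `"$(cat prompt.txt)"` part is ordinary shell command substitution: the whole file, newlines included, is passed as a single argument. A minimal sketch with `echo` standing in for `ollama run`, and `prompt.txt` holding an example prompt:

```shell
# Write a multi-line prompt to a file.
printf 'Explain this code\nline by line\n' > prompt.txt

# Double-quoted command substitution expands the file contents
# into one argument, preserving the embedded newlines.
result="$(cat prompt.txt)"
echo "$result"
```

The double quotes matter: without them the shell would split the prompt on whitespace into separate arguments.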
You can get more info about the run, such as the tokens per second,
by running ollama with the --verbose flag
ollama run codellama --verbose
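Beyond the CLI, the running server also exposes an HTTP API, which is handy for scripting. A minimal sketch, assuming the server is running locally on the default port 11434 and codellama has been pulled:

```shell
# Ask the model a question via the REST API; "stream": false
# returns the whole response as a single JSON object.
curl http://localhost:11434/api/generate -d '{
  "model": "codellama",
  "prompt": "Write a one-line Python hello world.",
  "stream": false
}'
```

The JSON response also includes timing fields (such as eval_count and eval_duration), which is the same information the --verbose flag summarizes.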