The easiest way to run an LLM on macOS
Install ollama
brew install ollama
Start the ollama server
ollama serve
Keep this window open to keep the ollama server running.
Alternatively, if you want to run ollama as a background service, you can run
brew services start ollama
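Either way, you can confirm the server is up by hitting its HTTP endpoint; ollama listens on port 11434 by default.

```shell
# Prints a short status message ("Ollama is running")
# when the server is listening on the default port.
curl http://localhost:11434
```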
Then run the following command to download a model
ollama pull codellama:latest
This will download the 7B CodeLlama model.
You can also try other models listed in the ollama model library.
You can check which models you have downloaded by running
ollama list
Once the download completes, you can run the following
ollama run codellama
This will run the LLM in interactive mode.
If you want to write a multi-line prompt, you can save it to a file and then use it by running the following command
ollama run codellama "$(cat prompt.txt)"
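The `"$(cat prompt.txt)"` part is ordinary shell command substitution: the whole file, newlines included, is passed as a single argument. A minimal sketch with `echo` standing in for `ollama run`, and `prompt.txt` holding an example prompt:

```shell
# Write a multi-line prompt to a file.
printf 'Explain this code\nline by line\n' > prompt.txt

# Double-quoted command substitution expands the file contents
# into one argument, preserving the embedded newlines.
result="$(cat prompt.txt)"
echo "$result"
```

The double quotes matter: without them the shell would split the prompt on whitespace into separate arguments.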
You can get more info about the run, such as the tokens per second,
by running ollama with the --verbose flag
ollama run codellama --verbose
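Beyond the CLI, the running server also exposes an HTTP API, which is handy for scripting. A minimal sketch, assuming the server is running locally on the default port 11434 and codellama has been pulled:

```shell
# Ask the model a question via the REST API; "stream": false
# returns the whole response as a single JSON object.
curl http://localhost:11434/api/generate -d '{
  "model": "codellama",
  "prompt": "Write a one-line Python hello world.",
  "stream": false
}'
```

The JSON response also includes timing fields (such as eval_count and eval_duration), which is the same information the --verbose flag summarizes.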