About perplexity
Perplexity measures how well a language model predicts a piece of text. Formally, it is the exponentiated average negative log-likelihood the model assigns to each actual next token. Intuitively, high perplexity means the model is often "confused" when picking the next token; low perplexity means the model is good at narrowing down the prediction for the next token to only a few options.
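A minimal sketch of the definition, assuming we already have the probability the model assigned to each true next token (the function name and example probabilities are illustrative, not from any particular library):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability
    the model assigned to each actual next token."""
    n = len(token_probs)
    avg_nll = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_nll)

# A confident model assigns high probability to each true token:
print(perplexity([0.9, 0.8, 0.95]))   # close to 1 (low perplexity)
# A "confused" model spreads probability thinly:
print(perplexity([0.05, 0.1, 0.02]))  # much higher perplexity
```

A model that always assigns probability 1/k to the correct token has perplexity exactly k, which is why perplexity is often read as "the model is as uncertain as if it were choosing uniformly among k options."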
Links
- Blog post from Hugging Face about perplexity - https://huggingface.co/docs/transformers/perplexity
- Pull request implementing new quantization methods in llama.cpp and comparing perplexity across quantization levels - https://github.com/ggerganov/llama.cpp/pull/1684