README: updated introduction

JohannesGaessler 2024-02-05 14:18:09 +01:00
parent c6b395535a
commit 05c9cd81a9


@@ -6,7 +6,7 @@
[Roadmap](https://github.com/users/ggerganov/projects/7) / [Project status](https://github.com/ggerganov/llama.cpp/discussions/3471) / [Manifesto](https://github.com/ggerganov/llama.cpp/discussions/205) / [ggml](https://github.com/ggerganov/ggml)
Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others) in pure C/C++.
### Hot topics
@@ -58,18 +58,23 @@ Inference of [LLaMA](https://arxiv.org/abs/2302.13971) model in pure C/C++
## Description
The goal of `llama.cpp` is to run large language models such as Meta's LLaMA model
with minimal setup and state-of-the-art performance on a wide variety of hardware.
Its selling points are:
- Plain C/C++ implementation without any dependencies. AVX, AVX2, and AVX512 support on x86 architectures.
- First-class Apple silicon support - optimized via ARM NEON, Accelerate and Metal frameworks.
- Custom CUDA kernels for running LLMs on NVIDIA GPUs. Can be run on AMD GPUs via HIP.
- Support for Vulkan, SYCL, and OpenCL.
- 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization support for faster inference and reduced memory use.
- CPU+GPU hybrid inference to partially accelerate models larger than total VRAM capacity.
- Fast C++ implementations for a variety of samplers: top-k, top-p, tail-free sampling, locally typical sampling, min-p, Mirostat, and temperature.
- Can be used as a library, from the command line via one of the examples, or via an HTTP web server (see the sketch after this list).
- Analysis tools that provide metrics such as perplexity or [KL divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) to judge the precision loss from quantization.
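
As a rough sketch of how these pieces fit together, the commands below use the `main`, `server`, and `perplexity` example programs built from this repository; the model path and sampling values are placeholders, and exact binary names and flags can differ between versions:

```bash
# Run inference from the command line, offloading 32 layers to the GPU (-ngl)
# and setting a few of the sampling parameters mentioned above.
./main -m ./models/7B/ggml-model-q4_k_m.gguf \
       -p "Building a website can be done in 10 simple steps:" \
       -n 256 -ngl 32 --top-k 40 --top-p 0.9 --temp 0.7

# Serve the same model over HTTP (in one terminal) ...
./server -m ./models/7B/ggml-model-q4_k_m.gguf -c 2048 --port 8080

# ... and query it from another terminal.
curl --request POST --url http://localhost:8080/completion \
     --header "Content-Type: application/json" \
     --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'

# Judge quantization quality by measuring perplexity on a text file.
./perplexity -m ./models/7B/ggml-model-q4_k_m.gguf -f wiki.test.raw
```
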
The original implementation of `llama.cpp` was [hacked in an evening](https://github.com/ggerganov/llama.cpp/issues/33#issuecomment-1465108022).
Since then, the project has improved significantly thanks to many contributions.
This project serves as the main playground for the more general [ggml](https://github.com/ggerganov/ggml) machine learning library.
**Supported platforms:**
@@ -77,11 +82,14 @@ as the main playground for developing new features for the [ggml](https://github
- [X] Linux
- [X] Windows (via CMake)
- [X] Docker
- [X] FreeBSD
**Supported models:**
- [X] LLaMA 🦙
- [x] LLaMA 2 🦙🦙
- [X] [Mistral AI v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- [x] [Mixtral MoE](https://huggingface.co/models?search=mistral-ai/Mixtral)
- [X] Falcon
- [X] [Alpaca](https://github.com/ggerganov/llama.cpp#instruction-mode-with-alpaca)
- [X] [GPT4All](https://github.com/ggerganov/llama.cpp#using-gpt4all)
@@ -95,7 +103,6 @@ as the main playground for developing new features for the [ggml](https://github
- [X] [Baichuan 1 & 2](https://huggingface.co/models?search=baichuan-inc/Baichuan) + [derivations](https://huggingface.co/hiyouga/baichuan-7b-sft)
- [X] [Aquila 1 & 2](https://huggingface.co/models?search=BAAI/Aquila)
- [X] [Starcoder models](https://github.com/ggerganov/llama.cpp/pull/3187)
- [X] [Refact](https://huggingface.co/smallcloudai/Refact-1_6B-fim)
- [X] [Persimmon 8B](https://github.com/ggerganov/llama.cpp/pull/3410)
- [X] [MPT](https://github.com/ggerganov/llama.cpp/pull/3417)
@@ -104,15 +111,14 @@ as the main playground for developing new features for the [ggml](https://github
- [X] [StableLM-3b-4e1t](https://github.com/ggerganov/llama.cpp/pull/3586)
- [x] [Deepseek models](https://huggingface.co/models?search=deepseek-ai/deepseek)
- [x] [Qwen models](https://huggingface.co/models?search=Qwen/Qwen)
- [x] [PLaMo-13B](https://github.com/ggerganov/llama.cpp/pull/3557)
- [x] [GPT-2](https://huggingface.co/gpt2)
- [x] [CodeShell](https://github.com/WisdomShell/codeshell)
**Multimodal models:**
- [x] [LLaVA 1.5 models](https://huggingface.co/collections/liuhaotian/llava-15-653aac15d994e992e2677a7e)
- [x] [BakLLaVA](https://huggingface.co/models?search=SkunkworksAI/Bakllava)
- [x] [Obsidian](https://huggingface.co/NousResearch/Obsidian-3B-V0.5)
- [x] [ShareGPT4V](https://huggingface.co/models?search=Lin-Chen/ShareGPT4V)
- [x] [MobileVLM 1.7B/3B models](https://huggingface.co/models?search=mobileVLM)
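
llama.cpp consumes these models as GGUF files. A minimal sketch of preparing one from a locally downloaded Hugging Face checkpoint, assuming the repository's `convert-hf-to-gguf.py` script and `quantize` tool, with placeholder directory and file names:

```bash
# Install the Python dependencies used by the conversion scripts.
pip install -r requirements.txt

# Convert a downloaded Hugging Face checkpoint to a 16-bit GGUF file.
python3 convert-hf-to-gguf.py ./models/mymodel/ \
    --outtype f16 --outfile ./models/mymodel/model-f16.gguf

# Optionally quantize it, e.g. to 4-bit Q4_K_M, for lower memory use.
./quantize ./models/mymodel/model-f16.gguf ./models/mymodel/model-q4_k_m.gguf q4_k_m
```
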
@@ -137,14 +143,22 @@ as the main playground for developing new features for the [ggml](https://github
**UI:**
Unless otherwise noted, these projects are open-source with permissive licensing:
- [iohub/collama](https://github.com/iohub/coLLaMA)
- [janhq/jan](https://github.com/janhq/jan) (AGPL)
- [nat/openplayground](https://github.com/nat/openplayground)
- [LMStudio](https://lmstudio.ai/) (proprietary)
- [LostRuins/koboldcpp](https://github.com/LostRuins/koboldcpp) (AGPL)
- [Mozilla-Ocho/llamafile](https://github.com/Mozilla-Ocho/llamafile)
- [nomic-ai/gpt4all](https://github.com/nomic-ai/gpt4all)
- [ollama/ollama](https://github.com/ollama/ollama)
- [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) (AGPL)
- [psugihara/FreeChat](https://github.com/psugihara/FreeChat)
- [ptsochantaris/emeltal](https://github.com/ptsochantaris/emeltal)
- [pythops/tenere](https://github.com/pythops/tenere) (AGPL)
- [semperai/amica](https://github.com/semperai/amica)
- [withcatai/catai](https://github.com/withcatai/catai)
---