README: updated introduction

JohannesGaessler 2024-02-05 14:18:09 +01:00
parent c6b395535a
commit 05c9cd81a9


@@ -6,7 +6,7 @@
[Roadmap](https://github.com/users/ggerganov/projects/7) / [Project status](https://github.com/ggerganov/llama.cpp/discussions/3471) / [Manifesto](https://github.com/ggerganov/llama.cpp/discussions/205) / [ggml](https://github.com/ggerganov/ggml)
Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others) in pure C/C++.
### Hot topics
@@ -58,18 +58,23 @@ Inference of [LLaMA](https://arxiv.org/abs/2302.13971) model in pure C/C++
## Description
The goal of `llama.cpp` is to run large language models such as Meta's LLaMA model
with minimal setup and state-of-the-art performance on a wide variety of hardware.
Its selling points are:
- Plain C/C++ implementation without any dependencies. AVX, AVX2, and AVX512 support on x86 architectures.
- First-class Apple silicon support - optimized via ARM NEON, Accelerate and Metal frameworks.
- Custom CUDA kernels for running LLMs on NVIDIA GPUs. Can be run on AMD GPUs via HIP.
- Support for Vulkan, SYCL, and OpenCL.
- 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization support for faster inference and reduced memory use.
- CPU+GPU hybrid inference to partially accelerate models larger than total VRAM capacity.
- Fast C++ implementations for a variety of samplers: top-k, top-p, tail-free sampling, locally typical sampling, min-p, Mirostat, and temperature.
- Can be used as a library, from the command line via one of the examples, or via an HTTP web server (see the sketch after this list).
- Analysis tools that provide metrics such as perplexity or [KL divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) to judge the precision loss from quantization.
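
As a rough sketch of how these pieces fit together, the commands below use the `main`, `server`, and `perplexity` example programs built from this repository; the model path and sampling values are placeholders, and exact binary names and flags can differ between versions:

```bash
# Run inference from the command line, offloading 32 layers to the GPU (-ngl)
# and setting a few of the sampling parameters mentioned above.
./main -m ./models/7B/ggml-model-q4_k_m.gguf \
       -p "Building a website can be done in 10 simple steps:" \
       -n 256 -ngl 32 --top-k 40 --top-p 0.9 --temp 0.7

# Serve the same model over HTTP (in one terminal) ...
./server -m ./models/7B/ggml-model-q4_k_m.gguf -c 2048 --port 8080

# ... and query it from another terminal.
curl --request POST --url http://localhost:8080/completion \
     --header "Content-Type: application/json" \
     --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'

# Judge quantization quality by measuring perplexity on a text file.
./perplexity -m ./models/7B/ggml-model-q4_k_m.gguf -f wiki.test.raw
```
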
The original implementation of `llama.cpp` was [hacked in an evening](https://github.com/ggerganov/llama.cpp/issues/33#issuecomment-1465108022).
Since then, the project has improved significantly thanks to many contributions.
This project serves as the main playground for the more general [ggml](https://github.com/ggerganov/ggml) machine learning library.
**Supported platforms:**
@@ -77,11 +82,14 @@ as the main playground for developing new features for the [ggml](https://github
- [X] Linux
- [X] Windows (via CMake)
- [X] Docker
- [X] FreeBSD
**Supported models:**
- [X] LLaMA 🦙
- [x] LLaMA 2 🦙🦙
- [X] [Mistral AI v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- [x] [Mixtral MoE](https://huggingface.co/models?search=mistral-ai/Mixtral)
- [X] Falcon
- [X] [Alpaca](https://github.com/ggerganov/llama.cpp#instruction-mode-with-alpaca)
- [X] [GPT4All](https://github.com/ggerganov/llama.cpp#using-gpt4all)
@@ -95,7 +103,6 @@ as the main playground for developing new features for the [ggml](https://github
- [X] [Baichuan 1 & 2](https://huggingface.co/models?search=baichuan-inc/Baichuan) + [derivations](https://huggingface.co/hiyouga/baichuan-7b-sft)
- [X] [Aquila 1 & 2](https://huggingface.co/models?search=BAAI/Aquila)
- [X] [Starcoder models](https://github.com/ggerganov/llama.cpp/pull/3187)
- [X] [Refact](https://huggingface.co/smallcloudai/Refact-1_6B-fim)
- [X] [Persimmon 8B](https://github.com/ggerganov/llama.cpp/pull/3410)
- [X] [MPT](https://github.com/ggerganov/llama.cpp/pull/3417)
@@ -104,15 +111,14 @@ as the main playground for developing new features for the [ggml](https://github
- [X] [StableLM-3b-4e1t](https://github.com/ggerganov/llama.cpp/pull/3586)
- [x] [Deepseek models](https://huggingface.co/models?search=deepseek-ai/deepseek)
- [x] [Qwen models](https://huggingface.co/models?search=Qwen/Qwen)
- [x] [PLaMo-13B](https://github.com/ggerganov/llama.cpp/pull/3557)
- [x] [GPT-2](https://huggingface.co/gpt2)
- [x] [CodeShell](https://github.com/WisdomShell/codeshell)
**Multimodal models:**
- [x] [LLaVA 1.5 models](https://huggingface.co/collections/liuhaotian/llava-15-653aac15d994e992e2677a7e)
- [x] [BakLLaVA](https://huggingface.co/models?search=SkunkworksAI/Bakllava)
- [x] [Obsidian](https://huggingface.co/NousResearch/Obsidian-3B-V0.5)
- [x] [ShareGPT4V](https://huggingface.co/models?search=Lin-Chen/ShareGPT4V)
- [x] [MobileVLM 1.7B/3B models](https://huggingface.co/models?search=mobileVLM)
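
llama.cpp consumes these models as GGUF files. A minimal sketch of preparing one from a locally downloaded Hugging Face checkpoint, assuming the repository's `convert-hf-to-gguf.py` script and `quantize` tool, with placeholder directory and file names:

```bash
# Install the Python dependencies used by the conversion scripts.
pip install -r requirements.txt

# Convert a downloaded Hugging Face checkpoint to a 16-bit GGUF file.
python3 convert-hf-to-gguf.py ./models/mymodel/ \
    --outtype f16 --outfile ./models/mymodel/model-f16.gguf

# Optionally quantize it, e.g. to 4-bit Q4_K_M, for lower memory use.
./quantize ./models/mymodel/model-f16.gguf ./models/mymodel/model-q4_k_m.gguf q4_k_m
```
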
@@ -137,14 +143,22 @@ as the main playground for developing new features for the [ggml](https://github
**UI:**
Unless otherwise noted, these projects are open-source with permissive licensing:
- [iohub/collama](https://github.com/iohub/coLLaMA)
- [janhq/jan](https://github.com/janhq/jan) (AGPL)
- [nat/openplayground](https://github.com/nat/openplayground)
- [LMStudio](https://lmstudio.ai/) (proprietary)
- [LostRuins/koboldcpp](https://github.com/LostRuins/koboldcpp) (AGPL)
- [Mozilla-Ocho/llamafile](https://github.com/Mozilla-Ocho/llamafile)
- [nomic-ai/gpt4all](https://github.com/nomic-ai/gpt4all)
- [ollama/ollama](https://github.com/ollama/ollama)
- [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) (AGPL)
- [psugihara/FreeChat](https://github.com/psugihara/FreeChat)
- [ptsochantaris/emeltal](https://github.com/ptsochantaris/emeltal)
- [pythops/tenere](https://github.com/pythops/tenere) (AGPL)
- [semperai/amica](https://github.com/semperai/amica)
- [withcatai/catai](https://github.com/withcatai/catai)
---