From b46e8ec78cfa2493cc4c421c6ebceec157b6a3a4 Mon Sep 17 00:00:00 2001
From: Benson Wong
Date: Wed, 1 Jan 2025 13:03:57 -0800
Subject: [PATCH] readme: add llama-swap to Infrastructure

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 6cd24d17a..0126da89c 100644
--- a/README.md
+++ b/README.md
@@ -192,7 +192,6 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 - [crashr/gppm](https://github.com/crashr/gppm) – launch llama.cpp instances utilizing NVIDIA Tesla P40 or P100 GPUs with reduced idle power consumption
 - [gpustack/gguf-parser](https://github.com/gpustack/gguf-parser-go/tree/main/cmd/gguf-parser) - review/check the GGUF file and estimate the memory usage
 - [Styled Lines](https://marketplace.unity.com/packages/tools/generative-ai/styled-lines-llama-cpp-model-292902) (proprietary licensed, async wrapper of inference part for game development in Unity3d with pre-built Mobile and Web platform wrappers and a model example)
-- [llama-swap](https://github.com/mostlygeek/llama-swap) - transparent proxy for automatic model switching with llama-server
@@ -202,6 +201,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 - [Paddler](https://github.com/distantmagic/paddler) - Stateful load balancer custom-tailored for llama.cpp
 - [GPUStack](https://github.com/gpustack/gpustack) - Manage GPU clusters for running LLMs
 - [llama_cpp_canister](https://github.com/onicai/llama_cpp_canister) - llama.cpp as a smart contract on the Internet Computer, using WebAssembly
+- [llama-swap](https://github.com/mostlygeek/llama-swap) - transparent proxy that adds automatic model switching with llama-server