Commit graph

2130 commits

Author SHA1 Message Date
root
6d34ad7f3c Merge branch 'master' of https://github.com/bmtwl/llama.cpp 2024-02-08 22:21:33 +00:00
bmwl
99a203d02f
Update ggml.h
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-02-08 14:21:16 -08:00
root
16b91d138e Merge branch 'master' of https://github.com/bmtwl/llama.cpp 2024-02-08 22:00:47 +00:00
root
e107c4cd54 fixed ggml_init_numa variable 2024-02-08 22:00:35 +00:00
bmwl
fecd66ac06
Merge branch 'ggerganov:master' into master 2024-02-08 13:42:06 -08:00
root
c2c31660a5 add missing enum ggml_numa_strategies declaration 2024-02-08 21:41:36 +00:00
Johannes Gäßler
8e6a9d2de0
CUDA: more warps for mmvq on NVIDIA (#5394) 2024-02-08 21:56:40 +01:00
slaren
41f308f58e
llama : do not print "offloading layers" message in CPU-only builds (#5416) 2024-02-08 21:33:03 +01:00
root
314174ddc5 add missing enum ggml_numa_strategies declaration and revert sync problem with master 2024-02-08 19:55:47 +00:00
root
7bbe511b8e Revert bad merge with dynatemp flags 2024-02-08 19:04:02 +00:00
root
b65c863947 Remove enum llama_numa_strategies 2024-02-08 18:07:40 +00:00
bmwl
90668fb596
Merge branch 'ggerganov:master' into master 2024-02-08 09:17:23 -08:00
Abhilash Majumder
6e99f2a04f
Fix f16_sycl cpy call from Arc (#5411)
* fix f16_sycl cpy call

* rm old logic

* add fp16 build CI

* use macro

* format fix
2024-02-08 22:39:10 +05:30
bmwl
18fb9a5382
Merge branch 'ggerganov:master' into master 2024-02-08 08:39:54 -08:00
root
12c23b60c6 Fixed lingering init_llama_backend() bool calls in tests and examples 2024-02-08 16:28:49 +00:00
Daniel Bevenius
ff4ff05c5f
llava : add missing .py, and fix paths in README.md (#5414)
This commit adds the missing .py extension to the convert-image-encoder-to-gguf
script. It also fixes the paths for the `model` and `mmproj` options in the
example llava-cli command.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-02-08 16:20:03 +02:00
Johannes Gäßler
b7b74cef36
fix trailing whitespace (#5407) 2024-02-08 11:36:54 +01:00
runfuture
4aa43fab56
llama : fix MiniCPM (#5392)
* fix bug for norm_rms_eps missing

* to align with the same order as convert.py for model write

* fix: undo HF models permute tensor

* update for flake8 lint
2024-02-08 12:36:19 +02:00
Daniel Bevenius
a6e514a85f
llava: fix typo/formatting in README.md (#5405)
This commit fixes a typo in the README.md file for the llava example
which is causing the formatting to look a little off:

Clone llava-v15-7b`` and clip-vit-large-patch14-336`` locally

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-02-08 09:58:19 +01:00
Johannes Gäßler
26d4efd11e
sampling: fix top_k <= 0 (#5388)
* sampling: fix top_k <= 0

* Update llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-08 09:46:30 +01:00
Georgi Gerganov
8504d2d0da
tests : .gitignore obj files 2024-02-08 09:46:47 +02:00
bmwl
f156112f56
Merge branch 'ggerganov:master' into master 2024-02-07 17:38:53 -08:00
root
783b7ca02d Removing unneeded branch in server.cpp example and moving get_numa_affinity and making it static 2024-02-07 22:28:29 +00:00
root
d47f232fc1 Removing last bit of MIRROR_MODE code for this PR 2024-02-07 22:02:21 +00:00
root
61c37ba93c Removing MIRROR_MODE code for this PR 2024-02-07 21:46:19 +00:00
Michael Podvitskiy
c4fbb6717c
CMAKE_OSX_ARCHITECTURES for MacOS cross compilation (#5393)
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2024-02-07 16:39:23 -05:00
root
3eccea1b63 Syncing to pr 2024-02-07 21:36:39 +00:00
Ebey Abraham
8c933b70c2
fix typo in readme (#5399)
Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>
2024-02-07 22:11:30 +01:00
root
c43808c625 Fixed a number of issues with the move from BOOL to ggml_numa_strategies. Added a note about mirror mode not being implemented yet 2024-02-07 19:49:07 +00:00
Kamil Tomšík
b906596bb7
Add Ava in the list of llama.cpp UIs (#4362) 2024-02-07 13:44:52 -05:00
Johannes Gäßler
aa7ab99be2
CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (#5386) 2024-02-07 12:40:26 +01:00
Neo Zhang Jianyu
10afa6f1d1
[SYCL] update install make by w64devkit (#5297) 2024-02-07 18:16:55 +08:00
Xiao-Yong Jin
0ef46da632
llava-cli : always tokenize special tokens (#5382)
* llava-cli: tokenize special tokens in prompt

* llava-cli: use the escape CLI argument, remove incomplete separate escaping process
2024-02-07 10:17:25 +02:00
0cc4m
ee1628bdfe
Basic Vulkan Multi-GPU implementation (#5321)
* Initial Vulkan multi-gpu implementation

Move most global variables into backend context

* Add names to backend device functions

* Add further missing cleanup code

* Reduce code duplication in tensor split layer assignment

* generalize LLAMA_SPLIT_LAYER for all backends, do not expose device count and memory in llama.h

* Only do device info print in the beginning and initialize one backend for cpu assist

Add missing cleanup code

* Rework backend memory management to make sure devices and buffers get properly allocated and freed

* Rename cpu assist free function

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-02-07 07:54:50 +01:00
Eve
ed0bf32290
readme : modernize (#5379)
* first cleanup, update everything to Llama 2 and remove outdated content

* Delete SHA256SUMS

* make build instructions generic

* recommend Q4_K_M quantization method

* Update README.md
2024-02-07 08:21:30 +02:00
Ben Williams
9a697d842b
readme : update ui list (#5354) 2024-02-07 08:16:48 +02:00
runfuture
316c7faf77
llama : add MiniCPM support (#5346)
* support minicpm arch.

* fix tab/space typo.

* convert minicpm model via convert-hf-gguf.py

* try to make tokenizer work

* fix bug for quantize minicpm

* fix for flake8 lint

* remove convert-minicpm.py

* fix for editorconfig

* correct minicpm model type (size)

* constants expanded for minicpm

* Minor change of the constant names for minicpm
2024-02-07 08:15:56 +02:00
Justin Parker
f3e2b4fa3f
server : update /props with "total_slots" value (#5373)
* include total "num_slots" in default_generation_settings_for_props

* cleanup total_slots return value in /props endpoint

* update /props endpoint docs with total_slots

* remove num_slots from default_generation_settings_for_props

* update /props endpoint section
2024-02-07 08:15:19 +02:00
Sang-Kil Park
f68664ac24
convert : fix TypeError on GPT-2 vocab.json (#5288) 2024-02-06 23:28:00 -05:00
root
12789eb308 Reverting Makefile 2024-02-06 22:45:21 +00:00
root
7aa974de5e Added numa options to allow finer grained control as well as plumbing for a new mirror mode that will require numa.h 2024-02-06 22:43:13 +00:00
root
60b80b0e8a removed trailing whitespace 2024-02-06 22:27:38 +00:00
root
a69d6e2b91 Removed sched.h from ggml.h, moved ggml_get_numa_affinity into ggml.c, removed trailing whitespace and fixed up a few inconsistent variables 2024-02-06 22:23:34 +00:00
Alexey Parfenov
213d1439fa
server : remove model.json endpoint (#5371) 2024-02-06 20:08:38 +02:00
Johannes Gäßler
17c97fb062
CUDA: mul_mat_vec_q max. batch size 8 -> 4 (#5370) 2024-02-06 19:43:06 +02:00
Kawrakow
b08f22c882
Update README.md (#5366)
Add some links to quantization related PRs
2024-02-06 19:00:16 +02:00
Kawrakow
f57fadc009
Slight quantization improvement for Q4_K and Q5_K (#5361)
* Q4_K: slightly better quantization

* Q5_K: slightly better quantization

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-02-06 17:28:02 +02:00
BarfingLemurs
2e9c0bd6b3
readme : add phi, orion 14b, internlm2, and yi-VL to readme (#5362) 2024-02-06 16:06:48 +02:00
Johannes Gäßler
2c516611f1
CUDA: mul_mat_vec_q for batch sizes > 1 (#5351) 2024-02-06 14:44:06 +01:00
Justin Parker
8a79c591de
server : include total "num_slots" in props endpoint (#5349) 2024-02-06 11:20:59 +02:00