Marko Tasic
e4124c2477
readme : add JavaScript/Wasm repo ( #5415 )
2024-02-09 12:17:00 +02:00
Michael Podvitskiy
b2f87cb64d
ggml : fix error C2078: too many initializers for MSVC ARM64 ( #5404 )
2024-02-09 11:56:43 +02:00
0cc4m
44fbe34360
Fix Vulkan crash on APUs with very little device memory ( #5424 )
...
* Fix Vulkan crash on APUs with very little device memory
* Fix debug output function names
2024-02-09 06:52:33 +01:00
root
6d34ad7f3c
Merge branch 'master' of https://github.com/bmtwl/llama.cpp
2024-02-08 22:21:33 +00:00
bmwl
99a203d02f
Update ggml.h
...
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-02-08 14:21:16 -08:00
root
16b91d138e
Merge branch 'master' of https://github.com/bmtwl/llama.cpp
2024-02-08 22:00:47 +00:00
root
e107c4cd54
fixed ggml_init_numa variable
2024-02-08 22:00:35 +00:00
bmwl
fecd66ac06
Merge branch 'ggerganov:master' into master
2024-02-08 13:42:06 -08:00
root
c2c31660a5
add missing enum ggml_numa_strategies declaration
2024-02-08 21:41:36 +00:00
Johannes Gäßler
8e6a9d2de0
CUDA: more warps for mmvq on NVIDIA ( #5394 )
2024-02-08 21:56:40 +01:00
slaren
41f308f58e
llama : do not print "offloading layers" message in CPU-only builds ( #5416 )
2024-02-08 21:33:03 +01:00
root
314174ddc5
add missing enum ggml_numa_strategies declaration and revert sync problem with master
2024-02-08 19:55:47 +00:00
root
7bbe511b8e
Revert bad merge with dynatemp flags
2024-02-08 19:04:02 +00:00
root
b65c863947
Remove enum llama_numa_strategies
2024-02-08 18:07:40 +00:00
bmwl
90668fb596
Merge branch 'ggerganov:master' into master
2024-02-08 09:17:23 -08:00
Abhilash Majumder
6e99f2a04f
Fix f16_sycl cpy call from Arc ( #5411 )
...
* fix f16_sycl cpy call
* rm old logic
* add fp16 build CI
* use macro
* format fix
2024-02-08 22:39:10 +05:30
bmwl
18fb9a5382
Merge branch 'ggerganov:master' into master
2024-02-08 08:39:54 -08:00
root
12c23b60c6
Fixed lingering init_llama_backend() bool calls in tests and examples
2024-02-08 16:28:49 +00:00
Daniel Bevenius
ff4ff05c5f
llava : add missing .py, and fix paths in README.md ( #5414 )
...
This commit adds the missing .py extension to the convert-image-encoder-to-gguf
script. It also fixes the paths for the `model` and `mmproj` options in the
example llava-cli command.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-02-08 16:20:03 +02:00
Johannes Gäßler
b7b74cef36
fix trailing whitespace ( #5407 )
2024-02-08 11:36:54 +01:00
runfuture
4aa43fab56
llama : fix MiniCPM ( #5392 )
...
* fix bug for norm_rms_eps missing
* to align with the same order as convert.py for model write
* fix: undo HF models permute tensor
* update for flake8 lint
2024-02-08 12:36:19 +02:00
Daniel Bevenius
a6e514a85f
llava: fix typo/formatting in README.md ( #5405 )
...
This commit fixes a typo in the README.md file for the llava example
which is causing the formatting to look a little off:
Clone llava-v15-7b`` and clip-vit-large-patch14-336`` locally
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-02-08 09:58:19 +01:00
Johannes Gäßler
26d4efd11e
sampling: fix top_k <= 0 ( #5388 )
...
* sampling: fix top_k <= 0
* Update llama.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-08 09:46:30 +01:00
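The "sampling: fix top_k <= 0" entry above addresses non-positive `top_k` values; the conventional semantics is that `top_k <= 0` disables the filter and keeps the full candidate list. A minimal sketch of that behavior (hypothetical helper, not llama.cpp's actual code):

```python
def top_k_filter(logits, k):
    # A sketch of top-k filtering: k <= 0 is treated as "disabled",
    # falling back to the full candidate count instead of an empty slice.
    n = len(logits)
    k = n if k <= 0 else min(k, n)
    # Keep the k highest-scoring candidates.
    return sorted(logits, reverse=True)[:k]
```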
Georgi Gerganov
8504d2d0da
tests : .gitignore obj files
2024-02-08 09:46:47 +02:00
bmwl
f156112f56
Merge branch 'ggerganov:master' into master
2024-02-07 17:38:53 -08:00
root
783b7ca02d
Removing unneeded branch in server.cpp example and moving get_numa_affinity and making it static
2024-02-07 22:28:29 +00:00
root
d47f232fc1
Removing last bit of MIRROR_MODE code for this PR
2024-02-07 22:02:21 +00:00
root
61c37ba93c
Removing MIRROR_MODE code for this PR
2024-02-07 21:46:19 +00:00
Michael Podvitskiy
c4fbb6717c
CMAKE_OSX_ARCHITECTURES for MacOS cross compilation ( #5393 )
...
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2024-02-07 16:39:23 -05:00
root
3eccea1b63
Syncing to PR
2024-02-07 21:36:39 +00:00
Ebey Abraham
8c933b70c2
fix typo in readme ( #5399 )
...
Co-authored-by: Ebey Abraham <ebeyabraham@microsoft.com>
2024-02-07 22:11:30 +01:00
root
c43808c625
Fixed a number of issues with the move from BOOL to ggml_numa_strategies. Added a note about mirror mode not being implemented yet
2024-02-07 19:49:07 +00:00
Kamil Tomšík
b906596bb7
Add Ava to the list of llama.cpp UIs ( #4362 )
2024-02-07 13:44:52 -05:00
Johannes Gäßler
aa7ab99be2
CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row ( #5386 )
2024-02-07 12:40:26 +01:00
Neo Zhang Jianyu
10afa6f1d1
[SYCL] update install make by w64devkit ( #5297 )
2024-02-07 18:16:55 +08:00
Xiao-Yong Jin
0ef46da632
llava-cli : always tokenize special tokens ( #5382 )
...
* llava-cli: tokenize special tokens in prompt
* llava-cli: use the escape CLI argument, remove incomplete separate escaping process
2024-02-07 10:17:25 +02:00
0cc4m
ee1628bdfe
Basic Vulkan Multi-GPU implementation ( #5321 )
...
* Initial Vulkan multi-gpu implementation
Move most global variables into backend context
* Add names to backend device functions
* Add further missing cleanup code
* Reduce code duplication in tensor split layer assignment
* generalize LLAMA_SPLIT_LAYER for all backends, do not expose device count and memory in llama.h
* Only do device info print in the beginning and initialize one backend for cpu assist
Add missing cleanup code
* Rework backend memory management to make sure devices and buffers get properly allocated and freed
* Rename cpu assist free function
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-02-07 07:54:50 +01:00
Eve
ed0bf32290
readme : modernize ( #5379 )
...
* first cleanup, update everything to Llama 2 and remove outdated content
* Delete SHA256SUMS
* make build instructions generic
* recommend Q4_K_M quantization method
* Update README.md
2024-02-07 08:21:30 +02:00
Ben Williams
9a697d842b
readme : update ui list ( #5354 )
2024-02-07 08:16:48 +02:00
runfuture
316c7faf77
llama : add MiniCPM support ( #5346 )
...
* support minicpm arch.
* fix tab/space typo.
* convert minicpm model via convert-hf-gguf.py
* try to make tokenizer work
* fix bug for quantize minicpm
* fix for flake8 lint
* remove convert-minicpm.py
* fix for editorconfig
* correct minicpm model type (size)
* constants expanded for minicpm
* Minor change of the constant names for minicpm
2024-02-07 08:15:56 +02:00
Justin Parker
f3e2b4fa3f
server : update /props with "total_slots" value ( #5373 )
...
* include total "num_slots" in default_generation_settings_for_props
* cleanup total_slots return value in /props endpoint
* update /props endpoint docs with total_slots
* remove num_slots from default_generation_settings_for_props
* update /props endpoint section
2024-02-07 08:15:19 +02:00
Sang-Kil Park
f68664ac24
convert : fix TypeError on GPT-2 vocab.json ( #5288 )
2024-02-06 23:28:00 -05:00
root
12789eb308
Reverting Makefile
2024-02-06 22:45:21 +00:00
root
7aa974de5e
Added numa options to allow finer grained control as well as plumbing for a new mirror mode that will require numa.h
2024-02-06 22:43:13 +00:00
root
60b80b0e8a
removed trailing whitespace
2024-02-06 22:27:38 +00:00
root
a69d6e2b91
Removed sched.h from ggml.h, moved ggml_get_numa_affinity into ggml.c, removed trailing whitespace and fixed up a few inconsistent variables
2024-02-06 22:23:34 +00:00
Alexey Parfenov
213d1439fa
server : remove model.json endpoint ( #5371 )
2024-02-06 20:08:38 +02:00
Johannes Gäßler
17c97fb062
CUDA: mul_mat_vec_q max. batch size 8 -> 4 ( #5370 )
2024-02-06 19:43:06 +02:00
Kawrakow
b08f22c882
Update README.md ( #5366 )
...
Add some links to quantization related PRs
2024-02-06 19:00:16 +02:00
Kawrakow
f57fadc009
Slight quantization improvement for Q4_K and Q5_K ( #5361 )
...
* Q4_K: slightly better quantization
* Q5_K: slightly better quantization
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-02-06 17:28:02 +02:00