
2604 commits

Author SHA1 Message Date
kaizau
021c6f50e1
Merge branch 'ggerganov:master' into master 2024-04-03 21:57:00 +08:00
Kai Zau
48850cff19 Remove BOS token from templates, unprefix openchat 2024-04-03 21:45:48 +08:00
Georgi Gerganov
076b08649e
readme : update hot topics 2024-04-03 16:11:15 +03:00
slaren
08a0c02060
ggml : mul_mat_id use the same tensor for all the experts (#6387)
* ggml : update mul_mat_id to use the same tensor for all the experts

* update cuda

* minor

* update metal

* update test-backend-ops

* fix cuda

* Update ggml-metal.m

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* update convert.py

* update convert-hf-to-gguf.py

* update convert.py for mixtral hf models

* Update convert-hf-to-gguf.py

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* cuda : support non-pow-2 number of experts

* allow quantize to work for split and merged experts models in the same way

* cleanup + disable mmap automatically with split tensors models

* update imatrix

* test-backend-ops : test qwen argsort

* update grok model loading

* llama : add merged experts tensors to the grok tensor map

* minor

* gguf : bump version

* fix quantizing of merged experts

* convert-hf-to-gguf.py : update grok (untested)

* make linter happy

* cuda/argsort : use shared memory instead of pool memory

* convert : fix grok tensor names

* metal : add support for non-pow-2 argsort

* llama : more loader cleanup, better error checking

* cuda : fix warning

* llama : still use mmap for loading old models, but copy the data to a host buffer

* add review note

* llama : remove ffn tensor counting + add sanity check

ggml-ci

* convert : fix handling of n_experts == None

ggml-ci

* imatrix : fix ncall counters

* llama : produce error if imatrix size does not match

* quantize : terminate on errors + trace logs

ggml-ci

* metal : pad shared memory to 16 bytes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-03 16:07:05 +03:00
Meng, Hengyu
52604860f9
[SYCL] Disable iqx on windows as WA (#6435)
* disable iqx on windows as WA

* array instead of global_memory
2024-04-03 10:34:40 +08:00
Georgi Gerganov
f87f7b8986
flake.lock: Update (#6402)
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/44d0940ea560dee511026a53f0e2e2cde489b4d4' (2024-03-23)
  → 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-04-01 09:05:57 -07:00
Kai Zau
cfcbc7adf3 Match openchat template with jinja output 2024-04-01 19:35:20 +08:00
Kai Zau
1eebfc9f0f Separate deepseek bos from system message 2024-04-01 19:34:31 +08:00
Kai Zau
9165380c52 Regenerate chat template test with add_generation_prompt 2024-04-01 19:33:19 +08:00
Johannes Gäßler
33a5244806
compare-llama-bench.py: fix long hexsha args (#6424) 2024-04-01 13:30:43 +02:00
Pierrick Hymbert
226e819371
ci: server: verify deps are coherent with the commit (#6409)
* ci: server: verify deps are coherent with the commit

* ci: server: change the ref to build as now it's a pull event target
2024-04-01 12:36:40 +02:00
Kai Zau
d297225e98 Remove alpaca, match deepseek with jinja output 2024-04-01 16:40:07 +09:00
Georgi Gerganov
c50a82ce0f
readme : update hot topics 2024-03-31 11:56:30 +03:00
Pierrick Hymbert
37e7854c10
ci: bench: fix Resource not accessible by integration on PR event (#6393) 2024-03-30 12:36:07 +02:00
Kai Zau
a4986dd52e Add separate template name for vicuna-orca 2024-03-30 19:29:27 +09:00
Kai Zau
f1a3b12ced Add chat template for alpaca 2024-03-30 19:04:49 +09:00
kaizau
ce48a6e4de
Merge branch 'ggerganov:master' into master 2024-03-30 16:56:14 +08:00
Kai Zau
c708544cd6 Add tests for openchat and vicuna chat templates 2024-03-30 17:48:15 +09:00
Kai Zau
5305d6822a Combine vicuna chat templates 2024-03-30 17:47:37 +09:00
Kai Zau
e423aa1adf Add EOS for vicuna templates 2024-03-30 14:54:12 +09:00
Kai Zau
e0f9d9d732 Add chat template for orca-vicuna 2024-03-30 14:41:43 +09:00
Kai Zau
f6104b9b77 Add chat template for vicuna 2024-03-30 11:23:18 +09:00
Kai Zau
0d24c6af89 Add chat template test for openchat 2024-03-30 10:52:55 +09:00
Kai Zau
d19df2c5b9 Add openchat chat template 2024-03-30 10:52:31 +09:00
Mohammadreza Hendiani
c342d070c6
Fedora build update (#6388)
* fixed deprecated address

* fixed deprecated address

* fixed deprecated address

* Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions

* Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions

* Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions

* reverted back to only the MIT license
2024-03-29 22:59:56 +01:00
Xuan Son Nguyen
f7fc5f6c6f
split: allow --split-max-size option (#6343)
* split by max size

* clean up arg parse

* split: ok

* add dry run option

* error on 0 tensors

* be positive

* remove next_metadata_size
2024-03-29 22:34:44 +01:00
0cc4m
ba0c7c70ab
Vulkan k-quant mmq and ggml-backend offload functionality (#6155)
* Fix Vulkan no kv offload incoherence

* Add k-quant mul mat mat shaders

* Rework working buffer allocation, reduces vram use noticeably

Clean up cpu assist code, replaced with ggml-backend offload function

* Default to all dedicated GPUs

* Add fallback for integrated GPUs if no dedicated GPUs are found

* Add debug info showing which device is allocating memory

* Fix Intel dequant issue

Fix validation issue

* Fix Vulkan GGML_OP_GET_ROWS implementation

* Clean up merge artifacts

* Remove Vulkan warning
2024-03-29 17:29:21 +01:00
Georgi Gerganov
d48ccf3ad4
sync : ggml (#6351)
* sync : ggml

ggml-ci

* cuda : move GGML_CUDA_DMMV constants to dmmv.cuh

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-03-29 17:45:46 +02:00
hxer7963
069574775c
[Model] Add support for xverse (#6301)
* Support xverse model convert to gguf format.

* 1. Convert xverse models to gguf;
2. Add LLM_ARCH_XVERSE inference in llama.cpp;
3. Add xverse item in Supported models in README.md;

* * gguf-py: remove redundant logs
* llama: remove the init_mapping_prefetch custom parameter

* llama.cpp: Include the changes from #6122 to exclude the unused outputs of the last layers.

* - Fix format issues
- Remove duplicate set kqv_out to llm_build_kv

* Update llama.cpp

---------

Co-authored-by: willhe <willhe@xverse.cn>
Co-authored-by: willhe <hexin@xverse.cn>
2024-03-29 14:37:03 +01:00
Georgi Gerganov
cfde806eb9
ci : fix BGE wget (#6383)
ggml-ci
2024-03-29 14:34:28 +02:00
zhouwg
b910287954
readme : add project (#6356)
* readme: add Android UI binding

* Update README.md
2024-03-29 09:33:46 +02:00
Matt Clayton
8093987090
cmake : add explicit metal version options (#6370)
* cmake: add explicit metal version options

* Update CMakeLists.txt

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-29 09:27:42 +02:00
Daniel Bevenius
057400a3fd
llama : remove redundant reshape in build_kv_store (#6369)
* llama: remove redundant reshape in build_kv_store

This commit removes the reshape of the V matrix in build_kv_store.

The motivation for this is that the V matrix has the shape:
```console
(gdb) p *v_cur
$46 = {type = GGML_TYPE_F32, backend = GGML_BACKEND_TYPE_CPU,
       buffer = 0x0, ne = {4096, 512, 1, 1}, nb = {4, 16384, 8388608,
       8388608}, op = GGML_OP_MUL_MAT, op_params = {
       0 <repeats 16 times>}, flags = 0, grad = 0x0,
       src = {0xb496b0, 0x7ffef1c40950, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
       0x0, 0x0}, perf_runs = 0, perf_cycles = 0, perf_time_us = 0,
       view_src = 0x0, view_offs = 0, data = 0x0,
       name = "Vcur-0", '\000' <repeats 57 times>, extra = 0x0,
       padding = "\000\000\000\000\000\000\000"}
```
And after reshaping this tensor we get:
```console
(gdb) p *ggml_reshape_2d(ctx, v_cur, n_embd_v_gqa, n_tokens)
$44 = {type = GGML_TYPE_F32, backend = GGML_BACKEND_TYPE_CPU,
       buffer = 0x0, ne = {4096, 512, 1, 1}, nb = {4, 16384, 8388608,
       8388608}, op = GGML_OP_RESHAPE, op_params = {
       0 <repeats 16 times>}, flags = 0, grad = 0x0,
       src = {0x7ffef1c40e00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
       0x0}, perf_runs = 0, perf_cycles = 0, perf_time_us = 0,
       view_src = 0x7ffef1c40e00, view_offs = 0, data = 0x0,
       name = "Vcur-0 (reshaped)", '\000' <repeats 46 times>, extra = 0x0,
       padding = "\000\000\000\000\000\000\000"}
```
I noticed that the `src` and `view_src` fields differ but the dimensions
are the same. Judging from the code comment, the reshape call does not seem
to be needed, and the output above may justify removing it.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* llama : add assert

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-29 09:23:22 +02:00
Pedro Cuenca
b75c38166c
convert : allow conversion of Mistral HF models (#6144)
* Allow conversion of Mistral HF models

* Homogenize Llama, Mistral, Mixtral under the same entry.

* Fix tokenizer, permute tensors

* Use sentencepiece tokenizer, or fall back to hfft.

* convert-hf : small fix for mypy

* convert-hf : fix duplicated block_count

* convert-hf : add vocab size to metadata

---------

Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2024-03-29 09:15:00 +02:00
Georgi Gerganov
bfe7dafc9c
readme : add notice for UI list 2024-03-28 22:56:03 +02:00
Ouadie EL FAROUKI
5106ef482c
[SYCL] Revisited & updated SYCL build documentation (#6141)
* Revisited & updated SYCL build documentation

* removed outdated comment

* Addressed PR comments

* Trimmed whitespace

* added trailing newline
2024-03-28 16:01:47 +00:00
Jared Van Bortel
be55134a53
convert : refactor vocab selection logic (#6355) 2024-03-28 11:44:36 -04:00
Ziang Wu
66ba560256
llava : fix MobileVLM (#6364)
* fix empty bug

* Update MobileVLM-README.md

added more results on devices

* Update MobileVLM-README.md

* Update MobileVLM-README.md

* Update MobileVLM-README.md

* Update MobileVLM-README.md

* Update MobileVLM-README.md

* Update MobileVLM-README.md

* Update examples/llava/MobileVLM-README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update MobileVLM-README.md

remove gguf links

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-28 16:33:10 +02:00
compilade
0308f5e3d7
llama : fix command-r inference when omitting outputs (#6367) 2024-03-28 14:05:54 +02:00
Pierrick Hymbert
28cb9a09c4
ci: bench: fix master not being scheduled, fix commit status failing on external repo (#6365) 2024-03-28 11:27:56 +01:00
Ting Sun
cfc4d75df6
doc: fix outdated default value of batch size (#6336)
* doc: fix outdated default value of batch size

* doc: add doc for ubatch-size
2024-03-28 09:51:06 +01:00
Eric Zhang
6902cb7f2e
server : stop gracefully on SIGTERM (#6348) 2024-03-28 09:50:48 +01:00
hutli
d2d8f38996 nix: removed unnecessary indentation 2024-03-28 07:48:27 +00:00
hutli
d39b308eaf nix: moved blas availability check to package inputs so it is still overridable 2024-03-28 07:48:27 +00:00
hutli
c873976649 using blas.meta.available to check host platform 2024-03-28 07:48:27 +00:00
hutli
dbb03e2b9c only using explicit blas if hostPlatform is allowed 2024-03-28 07:48:27 +00:00
Someone Serge
e9f17dc3bf nix: .#windows: proper cross-compilation set-up
Take all dependencies from the cross stage, rather than only stdenv
2024-03-28 07:48:27 +00:00
Someone Serge
22a462cc1f nix: package: don't introduce the dependency on python
- The generic /usr/bin/env shebangs are good enough
- Python deps are provisioned in the devShells
- We need to be able to leave python out at least on windows (currently breaks eval)
2024-03-28 07:48:27 +00:00
hutli
f6a0f5c642 nix: .#windows: init
initial nix build for windows using zig

mingwW64 build

removes nix zig windows build

removes nix zig windows build

removed unnecessary glibc.static

removed unnecessary import of pkgs in nix

fixed missing trailing newline on non-windows nix builds

overriding stdenv when cross-compiling to windows in nix

better variables when cross-compiling for windows in nix

cross compile windows on macos

removed trailing whitespace

remove unnecessary overwrite of "CMAKE_SYSTEM_NAME" in nix windows build

nix: keep file extension when copying result files during cross compile for windows

nix: better checking for file extensions when using MinGW

nix: using hostPlatform instead of targetPlatform when cross compiling for Windows

using hostPlatform.extensions.executable to extract executable format
2024-03-28 07:48:27 +00:00
Ziang Wu
d0e2f6416b
doc: fix typo in MobileVLM-README.md (#6181) 2024-03-28 13:03:30 +09:00