Commit graph

3873 commits

Author SHA1 Message Date
slaren
5fb5e24811
llama : minor sampling refactor (2) (#9386) 2024-09-09 17:10:46 +02:00
Georgi Gerganov
38ca6f644b
readme : update hot topics 2024-09-09 15:51:37 +03:00
Johannes Gäßler
8e6e2fbe14
CUDA: fix variable name conflict for Windows build (#9382) 2024-09-09 14:22:53 +02:00
Antonis Makropoulos
5ed087573e
readme : add LLMUnity to UI projects (#9381)
* add LLMUnity to UI projects

* add newline to examples/rpc/README.md to fix editorconfig-checker unit test
2024-09-09 14:21:38 +03:00
Radoslav Gerganov
54f376d0b9
rpc : update README [no ci] (#9320)
Update README with instructions how to offload model layers to both
local and remote devices
2024-09-09 11:04:39 +03:00
Dan Johansson
b2e89a3274
Arm AArch64: Documentation updates (#9321)
* Arm AArch64: Documentation updates

* Update docs/build.md to include information on how to enable the Arm optimized gemm/gemv kernels

* Update examples/quantize/README.md with information on the Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8 formats

* Add newline to the end of docs/build.md
2024-09-09 10:02:45 +03:00
Markus Tavenrath
daa9623ab0
Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (#9118)
* Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early.

* fix compile issues

* Fix issues where the last submit wasn't executed or handled properly.

* remove trailing whitespace

* Repair GGML_VULKAN_CHECK_RESULTS

* Increase submit counter only if actual work has been submitted and increase submit count to 100.

* Fix some nodes are not checked with GGML_VULKAN_CHECK_RESULTS enabled.
2024-09-08 21:43:48 +02:00
Georgi Gerganov
e079bffb66
cuda : fix FA Q src index (1 -> 0) (#9374) 2024-09-08 22:01:02 +03:00
Xuan Son Nguyen
3f7ccfd649
common : bring back missing args, add env var duplication check (#9375)
* common : bring back missing args

* move duplication check to test-arg-parser

* add check for duplicated env var

* correct default values
2024-09-08 18:08:55 +02:00
slaren
a249843d89
common : restore --n-gpu-layers (#9371) 2024-09-08 16:44:42 +02:00
slaren
19f4a7b296
llama : refactor samplers internal implementation (#9370) 2024-09-08 15:52:07 +02:00
Neo Zhang Jianyu
2a358fb0c4
[SYCL] add check malloc result on device (#9346)
* add check malloc result on device

* update for review comments, check all malloc_device() result

---------

Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-09-08 19:05:29 +08:00
slaren
eae597182c
llama : sanitize tokens in the upper bound (#9359) 2024-09-08 12:41:51 +02:00
Xuan Son Nguyen
00b02bb249
imatrix : fix arg parser for imatrix (#9366)
* imatrix : fix arg parser

* beautify printing first arg
2024-09-08 12:12:17 +02:00
Georgi Gerganov
a876861455 metal : update support condition for im2col + fix warning (#0) 2024-09-08 11:05:55 +03:00
Georgi Gerganov
385decbd63 sync : ggml 2024-09-08 11:05:55 +03:00
Georgi Gerganov
60a3107ccd scripts : option to increase git patch context 2024-09-08 11:05:55 +03:00
Salvatore Mesoraca
406c1a32a1 vulkan: add dryrun support to sin and cos ops (ggml/947)
sin and cos failed test-backend-ops because they
tried to dereference a context pointer that is null
on dry runs.

This commit prevents that segfault.

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-09-08 11:05:55 +03:00
Salvatore Mesoraca
9cb9260861 vulkan: correctly report support for OP_CONT (ggml/946)
test-backend-ops fails because ggml_cont aborts
when invoked passing an unsupported type.

This commit makes ggml_cont tests pass

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-09-08 11:05:55 +03:00
Johannes Gäßler
202084d31d tests: add gradient tests for all backends (ggml/932)
* tests: add gradient checking to test-backend-ops

* remove old comment

* reorder includes

* adjust SIN/COS parameters

* add documentation, use supports_op if possible
2024-09-08 11:05:55 +03:00
Johannes Gäßler
dbbebcab33 ggml: fix ggml_graph_cpy undefined behavior (ggml/943) 2024-09-08 11:05:55 +03:00
Georgi Gerganov
ba1cf846ed cann : fix doxy (ggml/0) 2024-09-08 11:05:55 +03:00
Mengqing Cao
d2d3200b38 cann : add Ascend NPU support (whisper/2336)
* enable Ascend NPU in src/whisper.cpp
  * sync test-backend-ops with llama.cpp
2024-09-08 11:05:55 +03:00
Georgi Gerganov
51d964a4ef cuda : mark BF16 CONT as unsupported 2024-09-08 11:05:55 +03:00
Salvatore Mesoraca
efe6a83e30 ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934)
* ggml_cont: fix issue with transposed tensors when one dimension is 1

when using multiple threads, it is not enough
to check for the tensors to be contiguous for
ggml_compute_forward_dup_same_cont to work correctly.
The tensors strides also need to match.

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

* Add ggml_cont tests

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

* Remove dead code

it isn't possible to reach this code because
all these functions are invoked by ggml_compute_forward_dup
if and only if src0->type != dst->type

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

* Make ggml_compute_forward_dup_same_cont work with contiguous tensors

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>

---------

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-08 11:05:55 +03:00
Kevin Gibbons
fbb7fcffbc
llama : set attrs of mislabelled EOT/EOM tokens (#9348) 2024-09-08 08:51:00 +03:00
Georgi Gerganov
a5b5d9a101
llama.android : fix build (#9350) 2024-09-08 00:33:50 +03:00
Georgi Gerganov
f12295b8a9
llama : fix empty ring buffer push (#9358) 2024-09-08 00:33:33 +03:00
Georgi Gerganov
faf69d4237
llama : sanitize invalid tokens (#9357)
* common : do not add null tokens during warmup

ggml-ci

* llama : check that the input tokens are valid

ggml-ci

* tests : fix batch size of bert model

ggml-ci
2024-09-08 00:33:13 +03:00
Eve
e536426ded
llamafile : disable sgemm for batch-size 1 (#9330) 2024-09-07 22:02:26 +03:00
Xuan Son Nguyen
1b9ae5189c
common : refactor arg parser (#9308)
* (wip) argparser v3

* migrated

* add test

* handle env

* fix linux build

* add export-docs example

* fix build (2)

* skip build test-arg-parser on windows

* update server docs

* bring back missing --alias

* bring back --n-predict

* clarify test-arg-parser

* small correction

* add comments

* fix args with 2 values

* refine example-specific args

* no more lamba capture

Co-authored-by: slaren@users.noreply.github.com

* params.sparams

* optimize more

* export-docs --> gen-docs
2024-09-07 20:43:51 +02:00
slaren
e32d0816ed
ggml : always check bounds on get_rows operations (#9354) 2024-09-07 20:23:07 +02:00
Georgi Gerganov
df270ef745
llama : refactor sampling v2 (#9294)
- Add `struct llama_sampler` and `struct llama_sampler_i`
- Add `llama_sampler_` API
- Add `llama_sampler_chain_` API for chaining multiple samplers
- Remove `LLAMA_API_INTERNAL`
- Add `llama_perf_` API and remove old `llama_print_timings` and `llama_reset_timings`
2024-09-07 15:16:19 +03:00
Xuan Son Nguyen
947538acb8
ggml : fix missing cpu_set_t on emscripten (#9336)
* ggml : fix missing cpu_set_t on emscripten

* better version

* bring back android part
2024-09-07 12:01:34 +02:00
slaren
6c89eb0b47
ci : disable rocm image creation (#9340) 2024-09-07 10:48:54 +03:00
Xuan Son Nguyen
9b2c24c099
server : simplify state machine for slot (#9283)
* server : simplify state machine for slot

* add SLOT_STATE_DONE_PROMPT

* pop_deferred_task

* add missing notify_one

* fix passkey test

* metrics : add n_busy_slots_per_decode

* fix test step

* add test

* maybe fix AddressSanitizer?

* fix deque ?

* missing lock

* pop_deferred_task: also notify

* Update examples/server/server.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-06 23:21:29 +02:00
Aarni Koskela
134bc38ecf
llama-bench : log benchmark progress (#9287)
* llama-bench : add optional progress messages
2024-09-06 23:03:01 +02:00
Aarni Koskela
815b1fb20a
batched-bench : add --output-format jsonl option (#9293)
`--output-format` is modeled after `llama-bench`'s options
2024-09-06 17:59:58 +02:00
Changyeon Kim
409dc4f8bb
ggml : fix build break for the vulkan-debug (#9265)
- windows build : Ok.
- linux build : Ok.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
2024-09-06 15:54:50 +03:00
Xuan Son Nguyen
4a1411b4f1
server : fix missing lock (#9334) 2024-09-06 14:06:04 +02:00
Feng Jiang
424e3a52fe llama/kompute: Add multi-GPU support
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
2024-09-06 16:27:27 +08:00
Feng Jiang
56c5f988eb ggml/kompute: Introduce ggml_backend_kompute_get_device_memory()
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
2024-09-06 16:27:27 +08:00
Feng Jiang
97efd5047a ggml/kompute: Introduce ggml_backend_kompute_get_device_count()
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
2024-09-06 16:27:27 +08:00
Feng Jiang
cc9514f941 ggml/kompute: Remove unused ggml_backend_kompute_device_{ref, unref}()
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
2024-09-06 16:27:27 +08:00
Cong Liu
f57f8cb3da ggml/kompute: Reimplement kompute_manager
Signed-off-by: Cong Liu <liucong@kylinos.cn>
2024-09-06 16:27:12 +08:00
Markus Tavenrath
8ebe8ddebd
Improve Vulkan shader build system (#9239)
* Improve Vulkan shader builds system

- Add dependency to vulkan-shaders-gen to rebuild shaders when changing the shader compilation utility.
- Add option to generate debug info for Vulkan shaders to provide shader source to Vulkan shader profiling tools

* remove not required self dependency
2024-09-06 08:56:17 +02:00
Cong Liu
3676778e82 ggml/kompute: Implement ggml_backend_i.offload_op interface
Signed-off-by: Cong Liu <liucong@kylinos.cn>
2024-09-06 10:57:00 +08:00
Weishi Li
d94ad56f87 ggml/kompute: Use the kp::Manager in ggml_backend_kompute_context instead of global
Signed-off-by: Weishi Li <liweishi@kylinos.cn>
2024-09-06 10:57:00 +08:00
Weishi Li
74ba8516ce ggml/kompute: Move butf into struct ggml_backend_kompute_context
Signed-off-by: Weishi Li <liweishi@kylinos.cn>
2024-09-06 10:57:00 +08:00
Ming Xie
e914ac7c68 ggml/kompute: Introducing struct ggml_backend_kompute_buffer_context
Signed-off-by: Ming Xie <xieming@kylinos.cn>
2024-09-06 10:57:00 +08:00