Commit graph

2124 commits

Author SHA1 Message Date
Jared Van Bortel
57cecad175 main : remove ggml-kompute.h #include 2024-01-26 16:37:33 -05:00
Jared Van Bortel
91324851a3 ci : initial attempt at testing Kompute backend 2024-01-26 16:36:31 -05:00
Jared Van Bortel
297fde5f58 editorconfig-checker : exclude .gitmodules 2024-01-26 15:48:35 -05:00
Jared Van Bortel
454baebacc op_mul_mat_mat_f32.comp : fix missing final newline 2024-01-26 15:44:13 -05:00
Jared Van Bortel
cdab4043b3 kompute : fix #includes 2024-01-26 15:10:54 -05:00
Jared Van Bortel
2ff2d16131 ggml-kompute.h : remove anything that doesn't need to be public
The remaining functions are either used by llama.cpp or GPT4All.
2024-01-26 15:08:46 -05:00
Jared Van Bortel
6af02b19d1 kompute : init device automatically and remove an unnecessary free 2024-01-26 14:45:52 -05:00
slaren
8ca33dec7d test-backend-ops : check all the ops in the test for support in the backends 2024-01-26 20:01:36 +01:00
Jared Van Bortel
2512799cfe test-backend-ops : comment out Llama and Falcon tests 2024-01-26 13:55:10 -05:00
Jared Van Bortel
aea84989f7 Merge branch 'master' of https://github.com/ggerganov/llama.cpp into ceb/nomic-vulkan 2024-01-26 13:46:49 -05:00
Jared Van Bortel
e6ce5f21a1 llama : revert unintended whitespace change 2024-01-26 13:10:49 -05:00
slaren
62fead3ea0 cuda : fix tensor size calculation for non-split buffer (#5145) 2024-01-26 18:59:43 +01:00
Jared Van Bortel
61a5cf88dc kompute : remove unnecessary use_mmap=false 2024-01-26 12:58:50 -05:00
slaren
15b4538ff2 ggml-alloc : add 10% margin to the buffer sizes (#5149) 2024-01-26 19:18:26 +02:00
snadampal
7032f4f634 ggml : update softmax n_task calculation (#5126)
updated the n_task calculation to use the maximum number of
threads possible. This improved prompt eval performance by
around 5% for DOT kernels and around 10% for MMLA kernels
on AWS Graviton3.
2024-01-26 19:17:59 +02:00
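The gist of an n_task change like this can be sketched as follows: when an op's task count equals the full thread count, every thread takes a slice of the rows instead of some threads sitting idle. This is an illustrative sketch of the partitioning idea only, not ggml's actual scheduling code, and `rows_for_thread` is a hypothetical helper name.

```python
def rows_for_thread(n_rows, n_threads, ith):
    """Illustrative sketch: with n_tasks == n_threads, thread ith
    processes the half-open row range [start, end)."""
    per_thread = (n_rows + n_threads - 1) // n_threads  # ceiling division
    start = ith * per_thread
    end = min(start + per_thread, n_rows)
    return start, end
```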
Georgi Gerganov
5f1925a8ce scripts : move run-with-preset.py from root to scripts folder 2024-01-26 17:09:44 +02:00
Georgi Gerganov
3b7c914de2 tests : gitignore test-c.o 2024-01-26 14:48:15 +02:00
Xuan Son Nguyen
48c857aa10 server : refactored the task processing logic (#5065)
* server: add llama_server_queue struct

* server: add llama_server_response_event

* server: add comments

* server: move all mutexes away from server.cpp

* server: correct multitask response

* server: only add back deferred tasks when one slot is available

* server: fix a race condition caused by "request_completion"
2024-01-26 14:42:20 +02:00
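The queue-plus-deferral pattern described in those bullets can be sketched as follows: tasks are posted and popped under a single lock, and deferred tasks are moved back onto the queue only when a slot frees up. This is a minimal illustration of the idea, not the actual `llama_server_queue` code in server.cpp; all names are hypothetical.

```python
import threading
from collections import deque

class ServerQueue:
    """Minimal sketch of a server task queue with deferral:
    one lock guards both queues, and deferred tasks are re-queued
    only when a processing slot becomes available."""
    def __init__(self):
        self._cv = threading.Condition()
        self._tasks = deque()
        self._deferred = deque()

    def post(self, task):
        with self._cv:
            self._tasks.append(task)
            self._cv.notify()

    def defer(self, task):
        # park a task that no slot can serve right now
        with self._cv:
            self._deferred.append(task)

    def on_slot_freed(self):
        # move deferred tasks back only once a slot is available
        with self._cv:
            while self._deferred:
                self._tasks.append(self._deferred.popleft())
            self._cv.notify()

    def pop(self):
        # block until a task is available, then return it
        with self._cv:
            while not self._tasks:
                self._cv.wait()
            return self._tasks.popleft()
```

Keeping every queue mutation behind one condition variable is what lets the mutexes move out of the request handlers, which is the race-condition fix the bullets above describe.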
crasm
413e7b0559 ci : add model tests + script wrapper (#4586)
* scripts : add lib.sh and lib_test.sh

* scripts : stub out new ci-run.sh script

* scripts : switch to PascalCase for functions

This looks a little odd at first, but I find it a very useful
convention for knowing whether a command is part of our code or a builtin.

* scripts : add some fancy conversion from snake_case to PascalCase

* Add venv to ci/run.sh

* Revert scripts work

* scripts : add wrapper script for local use of ci/run.sh

* Simplify .gitignore for tests, clang-tidy fixes

* Label all ctest tests

* ci : ctest uses -L main

* Attempt at writing ctest_with_model

* Update test-model-load-cancel

* ci : add ctest_with_model for debug and release

ggml-ci

* Fix gg_get_model function

ggml-ci

* got stuck on CMake

* Add get_model.cpp to tests/CMakeLists.txt

ggml-ci

* Fix README.md output for ctest_with_model

ggml-ci

* workflows : use `-L main` for all ctest

ggml-ci

* Fixes

* GG_RUN_CTEST_MODELFILE => LLAMACPP_TESTMODELFILE
* Always show a warning rather than failing if the model file variable is not set

* scripts : update usage text for ci-run.sh
2024-01-26 14:18:00 +02:00
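The snake_case-to-PascalCase conversion mentioned above can be sketched in a couple of lines; this is a generic Python illustration of the transformation, not the actual shell implementation in the CI scripts, and `to_pascal_case` is a hypothetical name.

```python
def to_pascal_case(name):
    # "gg_get_model" -> "GgGetModel": capitalize each underscore-separated part
    return "".join(part.capitalize() for part in name.split("_") if part)
```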
Paul Tsochantaris
6dd3c28c9c metal : remove unused n_buffers and buffers (#5129) 2024-01-26 14:16:07 +02:00
Riceball LEE
38b431de23 gguf : fix "general.alignment" type in gguf_reader.py (#5136) 2024-01-26 11:10:28 +02:00
Georgi Gerganov
aad0b01d73 readme : update hot topics 2024-01-26 10:52:33 +02:00
Kawrakow
1182cf4d4f Another bucket sort (#5109)
* Initial bucket sort

* Bucket sort: slightly better version

* Bucket sort: another minor improvement

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-26 09:14:39 +02:00
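Bucket sort itself can be sketched in a few lines: values are binned by linear interpolation between the minimum and maximum, then each small bucket is sorted and the buckets are concatenated. This is a generic illustration of the technique, not the PR's actual implementation; `bucket_sort_desc` is a hypothetical name.

```python
def bucket_sort_desc(values, n_buckets=64):
    """Sketch of bucket sort for floats, descending order."""
    if len(values) <= 1:
        return list(values)
    lo, hi = min(values), max(values)
    if lo == hi:
        return list(values)  # all equal, nothing to sort
    buckets = [[] for _ in range(n_buckets)]
    scale = (n_buckets - 1) / (hi - lo)
    for v in values:
        buckets[int((v - lo) * scale)].append(v)
    out = []
    for b in reversed(buckets):  # largest bucket values first
        out.extend(sorted(b, reverse=True))
    return out
```

For near-uniformly distributed inputs each bucket stays small, so the total cost is close to linear, which is why this beats a full comparison sort for large candidate lists.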
Jared Van Bortel
91654ff042 kompute : fix a -Wstrict-aliasing warning 2024-01-25 17:03:06 -05:00
Jared Van Bortel
bc287047fb kompute : remove unused immintrin.h #include 2024-01-25 16:07:46 -05:00
Jared Van Bortel
3915194232 test-backend-ops : make Falcon test faster with a smaller model 2024-01-25 15:56:42 -05:00
Jared Van Bortel
3fbf0529ef kompute : mark last few failing ops as unsupported 2024-01-25 15:47:43 -05:00
Jared Van Bortel
445a3734b7 kompute : fix basic Q6_K get_rows, 26 -> 24 failures 2024-01-25 15:38:39 -05:00
Jared Van Bortel
de9fba0d39 kompute : fix basic f16 get_rows, 28 -> 26 failures 2024-01-25 15:22:26 -05:00
XiaotaoChen
fe54033b69 readme : add MobileVLM 1.7B/3B to the supported models list (#5107)
Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>
2024-01-25 22:14:32 +02:00
l3utterfly
5eaf9964fc llama : dynamic temperature sampling (#4972)
* implemented dynamic temperature sampling from koboldcpp

* removed trailing whitespace

* removed unused temp parameter in llama_sample_entropy

* exposed exponent_val in dynamic temp sampler

* added debug check for printf statements

* use nullptr in llama_sample_softmax call during llama_sample_entropy

this avoids counting the time taken stats twice

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* return earlier if there is only 1 candidate (i.e. max_entropy == 0)

* reformat 't' case in llama_sample_queue

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* check for one or zero candidates case in llama_sample_entropy

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-01-25 22:06:22 +02:00
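The entropy-based idea behind this PR can be sketched as follows: the temperature is interpolated between a minimum and a maximum according to the normalized entropy of the candidate distribution, raised to an exponent (the `exponent_val` the bullets mention). This is a sketch of the concept with hypothetical parameter names, not the actual `llama_sample_entropy` implementation; note the early return for one or zero candidates, mirroring the edge cases handled above.

```python
import math

def dynamic_temperature(probs, min_temp, max_temp, exponent_val=1.0):
    """Sketch of entropy-based dynamic temperature: a peaked
    distribution (low entropy) gets a low temperature, a flat one
    (high entropy) gets a high temperature."""
    # one or zero candidates: entropy is zero, nothing to scale
    if len(probs) <= 1:
        return min_temp
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    max_entropy = math.log(len(probs))  # entropy of the uniform distribution
    norm = entropy / max_entropy
    return min_temp + (max_temp - min_temp) * norm ** exponent_val
```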
Jared Van Bortel
11b305082b test-backend-ops : restore softmax tests 2024-01-25 15:05:55 -05:00
Jared Van Bortel
38d1f0c7a0 kompute : fix op_gelu -> Falcon is working on AMDVLK 2024-01-25 15:01:46 -05:00
Jared Van Bortel
6fc99a6e66 test-backend-ops : test larger GELU range 2024-01-25 15:01:46 -05:00
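For context on the op_gelu fixes above: GELU is commonly computed via a tanh approximation (the variant ggml is known to use, evaluated from a lookup table for f16). A plain-float sketch with the standard coefficients:

```python
import math

def gelu(x):
    # tanh approximation of GELU with the usual 0.044715 coefficient
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))
```

Testing a larger input range matters here because the approximation saturates: for large |x| the tanh term pins the output to x (positive side) or 0 (negative side), and backend bugs can hide in those tails.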
Jared Van Bortel
d292f4f204 examples : make pydantic scripts pass mypy and support py3.8 (#5099) 2024-01-25 14:51:24 -05:00
Jared Van Bortel
1849b85473 test-backend-ops : add Falcon test 2024-01-25 13:55:49 -05:00
Valentin Konovalov
256d1bb0dd android : use release cmake build type by default (#5123) 2024-01-25 19:05:51 +02:00
Jared Van Bortel
f5ac635473 kompute : fix q8_0 mmv, 41 -> 28 failures 2024-01-25 11:27:11 -05:00
Jared Van Bortel
987335ea0a kompute : fix algorithm names 2024-01-25 11:09:18 -05:00
Kawrakow
faa3526a1e Fix Q3_K_XS for MoE models (#5113)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-25 17:58:53 +02:00
Georgi Gerganov
ddc5a5033f metal : show compile log messages 2024-01-25 11:26:17 +02:00
Jared Van Bortel
ec68a9657f test-backend-ops : increase max_nmse_err so Llama passes 2024-01-24 17:31:34 -05:00
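The threshold being raised here, max_nmse_err, is a normalized mean squared error between a backend's output and the reference output. A minimal sketch of such a metric (illustrative, not test-backend-ops' exact code):

```python
def nmse(reference, actual):
    # sum of squared differences, normalized by the reference's energy
    num = sum((r - a) ** 2 for r, a in zip(reference, actual))
    den = sum(r ** 2 for r in reference)
    return num / den
```

Normalizing by the reference magnitude makes one tolerance usable across tensors of very different scales, which is why a single per-op threshold can cover a full Llama eval.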
Engininja2
cd4fddb29f cuda : fix 2-bit quants on amd hip (#5105)
* cuda : fix 2-bit quants on amd hip

* use __low2float intrinsic function for new quants
2024-01-24 23:18:15 +01:00
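`__low2float` converts the low half of a packed `half2` to a float. The bit-level effect can be sketched in Python (illustrative only; the real thing is a CUDA/HIP device intrinsic, and this sketch ignores the high 16 bits just as the intrinsic does):

```python
import struct

def low2float(packed):
    # decode the low 16 bits of a 32-bit packed half2 as an fp16 value
    half_bits = packed & 0xFFFF
    return struct.unpack("<e", struct.pack("<H", half_bits))[0]
```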
Jared Van Bortel
ebb5f7e968 test-backend-ops : test llama with different batch sizes 2024-01-24 16:55:44 -05:00
Jared Van Bortel
df687b10ab kompute : support mask parameter of softmax 2024-01-24 16:51:27 -05:00
Jared Van Bortel
8bd38fe32d test-backend-ops : test mask parameter of ggml_soft_max_ext 2024-01-24 16:28:41 -05:00
Jared Van Bortel
308f279622 kompute : support scale parameter of softmax 2024-01-24 16:17:37 -05:00
Jared Van Bortel
1450966071 test-backend-ops : test scale parameter of ggml_soft_max_ext 2024-01-24 16:17:37 -05:00
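The extended softmax being tested in these commits applies a scale and an additive mask before the exponentiation, i.e. roughly softmax(x * scale + mask). A simplified sketch of that semantics (names and details are illustrative, not ggml's exact `ggml_soft_max_ext` code):

```python
import math

def soft_max_ext(x, mask=None, scale=1.0):
    """Sketch: softmax over (x * scale + mask), one row at a time."""
    z = [xi * scale + (mask[i] if mask is not None else 0.0)
         for i, xi in enumerate(x)]
    m = max(z)                       # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]
```

Folding the scale and mask into the softmax is what lets attention use one fused op, with -inf mask entries zeroing out masked positions.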
Jared Van Bortel
2852902eda test-backend-ops : add llama test 2024-01-24 16:17:29 -05:00
Jared Van Bortel
2b0f642fec fix f16 mmv, 49 -> 41 failures 2024-01-24 13:43:49 -05:00