Paul Tsochantaris
6dd3c28c9c
metal : remove unused n_buffers and buffers ( #5129 )
2024-01-26 14:16:07 +02:00
Riceball LEE
38b431de23
gguf : fix "general.alignment" type in gguf_reader.py ( #5136 )
2024-01-26 11:10:28 +02:00
Georgi Gerganov
aad0b01d73
readme : update hot topics
2024-01-26 10:52:33 +02:00
Kawrakow
1182cf4d4f
Another bucket sort ( #5109 )
...
* Initial bucket sort
* Bucket sort: slightly better version
* Bucket sort: another minor improvement
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-26 09:14:39 +02:00
Jared Van Bortel
91654ff042
kompute : fix a -Wstrict-aliasing warning
2024-01-25 17:03:06 -05:00
Jared Van Bortel
bc287047fb
kompute : remove unused immintrin.h #include
2024-01-25 16:07:46 -05:00
Jared Van Bortel
3915194232
test-backend-ops : make Falcon test faster with a smaller model
2024-01-25 15:56:42 -05:00
Jared Van Bortel
3fbf0529ef
kompute : mark last few failing ops as unsupported
2024-01-25 15:47:43 -05:00
Jared Van Bortel
445a3734b7
kompute : fix basic Q6_K get_rows, 26 -> 24 failures
2024-01-25 15:38:39 -05:00
Jared Van Bortel
de9fba0d39
kompute : fix basic f16 get_rows, 28 -> 26 failures
2024-01-25 15:22:26 -05:00
XiaotaoChen
fe54033b69
readme : add MobileVLM 1.7B/3B to the supported models list ( #5107 )
...
Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>
2024-01-25 22:14:32 +02:00
l3utterfly
5eaf9964fc
llama : dynamic temperature sampling ( #4972 )
...
* implemented dynamic temperature sampling from koboldcpp
* removed trailing whitespace
* removed unused temp parameter in llama_sample_entropy
* exposed exponent_val in dynamic temp sampler
* added debug check for printf statements
* use nullptr in llama_sample_softmax call during llama_sample_entropy
this avoids counting the time taken stats twice
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* return earlier if there is only 1 candidate (i.e. max_entropy == 0)
* reformat 't' case in llama_sample_queue
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* check for one or zero candidates case in llama_sample_entropy
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-01-25 22:06:22 +02:00
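The dynamic temperature commit above scales the sampling temperature by the entropy of the candidate distribution. The following is a minimal sketch of that idea, not the code from the commit: the function name `dynamic_temperature`, the parameters `min_temp` and `max_temp`, and the exact interpolation formula are assumptions made for illustration. Only `exponent_val` and the early return when `max_entropy == 0` are taken from the commit message.

```python
import math


def dynamic_temperature(logits, min_temp, max_temp, exponent_val=1.0):
    """Pick a temperature between min_temp and max_temp from the entropy of softmax(logits).

    Hypothetical helper for illustration only; it is not the llama.cpp API.
    """
    n = len(logits)
    if n <= 1:
        # With one or zero candidates max_entropy would be 0, so return early
        # instead of dividing by zero.
        return min_temp

    # Numerically stable softmax over the candidate logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Shannon entropy of the distribution, and its maximum for n candidates.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    max_entropy = math.log(n)

    # A normalized entropy near 0 means a confident distribution, near 1 a flat one.
    normalized = entropy / max_entropy

    # Assumed mapping: interpolate between the bounds, shaped by exponent_val.
    return min_temp + (max_temp - min_temp) * normalized ** exponent_val
```

Under these assumptions, a flat distribution is sampled at close to `max_temp` and a sharply peaked one at close to `min_temp`, while `exponent_val` controls how quickly the temperature rises with entropy.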
Jared Van Bortel
11b305082b
test-backend-ops : restore softmax tests
2024-01-25 15:05:55 -05:00
Jared Van Bortel
38d1f0c7a0
kompute : fix op_gelu -> Falcon is working on AMDVLK
2024-01-25 15:01:46 -05:00
Jared Van Bortel
6fc99a6e66
test-backend-ops : test larger GELU range
2024-01-25 15:01:46 -05:00
Jared Van Bortel
d292f4f204
examples : make pydantic scripts pass mypy and support py3.8 ( #5099 )
2024-01-25 14:51:24 -05:00
Jared Van Bortel
1849b85473
test-backend-ops : add Falcon test
2024-01-25 13:55:49 -05:00
Valentin Konovalov
256d1bb0dd
android : use release cmake build type by default ( #5123 )
2024-01-25 19:05:51 +02:00
Jared Van Bortel
f5ac635473
kompute : fix q8_0 mmv, 41 -> 28 failures
2024-01-25 11:27:11 -05:00
Jared Van Bortel
987335ea0a
kompute : fix algorithm names
2024-01-25 11:09:18 -05:00
Kawrakow
faa3526a1e
Fix Q3_K_XS for MoE models ( #5113 )
...
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-25 17:58:53 +02:00
Georgi Gerganov
ddc5a5033f
metal : show compile log messages
2024-01-25 11:26:17 +02:00
Jared Van Bortel
ec68a9657f
test-backend-ops : increase max_nmse_err so Llama passes
2024-01-24 17:31:34 -05:00
Engininja2
cd4fddb29f
cuda : fix 2-bit quants on amd hip ( #5105 )
...
* cuda : fix 2-bit quants on amd hip
* use __low2float intrinsic function for new quants
2024-01-24 23:18:15 +01:00
Jared Van Bortel
ebb5f7e968
test-backend-ops : test llama with different batch sizes
2024-01-24 16:55:44 -05:00
Jared Van Bortel
df687b10ab
kompute : support mask parameter of softmax
2024-01-24 16:51:27 -05:00
Jared Van Bortel
8bd38fe32d
test-backend-ops : test mask parameter of ggml_soft_max_ext
2024-01-24 16:28:41 -05:00
Jared Van Bortel
308f279622
kompute : support scale parameter of softmax
2024-01-24 16:17:37 -05:00
Jared Van Bortel
1450966071
test-backend-ops : test scale parameter of ggml_soft_max_ext
2024-01-24 16:17:37 -05:00
Jared Van Bortel
2852902eda
test-backend-ops : add llama test
2024-01-24 16:17:29 -05:00
Jared Van Bortel
2b0f642fec
fix f16 mmv, 49 -> 41 failures
2024-01-24 13:43:49 -05:00
Jared Van Bortel
1a14099c43
fix q4_0/q4_1 mmv, 65 -> 49 failures
2024-01-24 13:43:48 -05:00
Jared Van Bortel
0787b80db8
kompute : remove broken mulrow kernel -> 1 less test failure
2024-01-24 13:43:48 -05:00
Jared Van Bortel
2755ae3d10
kompute : fix more dispatch ambiguity -> 12 less failures
2024-01-24 13:43:47 -05:00
Jared Van Bortel
08e23fd78c
kompute : fix op_mul kernel -> 13 less test failures
2024-01-24 13:43:47 -05:00
Jared Van Bortel
0899adf86e
kompute : fix get_rows dispatch -> 4 less failures
2024-01-24 13:43:47 -05:00
Jared Van Bortel
cb9ceff966
minor cleanup
2024-01-24 13:43:46 -05:00
Georgi Gerganov
33e8d6abe1
kompute : fix ggml_add kernel ( #5027 )
2024-01-24 13:43:46 -05:00
Jared Van Bortel
2f6a279e29
fix supported ops for kompute backend
2024-01-24 13:43:45 -05:00
Jared Van Bortel
07530731ba
never try to evaluate an empty command buffer
...
This fixes the immediate crashes with test-backend-ops - when
evaluating individual no-ops like OP_VIEW, it tries to submit an empty
command buffer, which crashes RADV and hangs AMDVLK.
2024-01-24 13:43:45 -05:00
Jared Van Bortel
729e1a4cc1
sync op_rope_f16 with recent op_rope_f32 changes
2024-01-24 13:43:45 -05:00
Jared Van Bortel
e9d5223da3
actually fix this assertion
2024-01-24 13:43:44 -05:00
Jared Van Bortel
9431026a84
clean up old backend code
2024-01-24 13:43:44 -05:00
Georgi Gerganov
d6bd471693
kompute : fix rope_f32 and scale ops ( #5008 )
2024-01-24 13:43:44 -05:00
Jared Van Bortel
76474a7c0d
kompute : ignore exceptions in ggml_vk_available_devices ( #12 )
...
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-24 13:43:43 -05:00
Jared Van Bortel
cad72e1252
add sanity check and fix kompute teardown order
2024-01-24 13:43:43 -05:00
Jared Van Bortel
070919dbf7
attempt to get test-backend-ops working
2024-01-24 13:43:43 -05:00
Jared Van Bortel
5f660dada8
fix assertion failure
2024-01-24 13:43:42 -05:00
Jared Van Bortel
298d6eec09
kompute : initial attempt at ggml-backend v2 support
2024-01-24 13:43:40 -05:00
Jared Van Bortel
7c527eb568
Merge commit 'e7e4df031b' into HEAD
2024-01-24 13:39:17 -05:00