Commit graph

4045 commits

Author SHA1 Message Date
Xuan Son Nguyen
b97259e69b small fixes 2024-11-06 16:24:54 +01:00
Xuan Son Nguyen
6a21c4d020 better auto scroll 2024-11-06 15:14:58 +01:00
Xuan Son Nguyen
e25be408a6 clean up a bit 2024-11-06 14:58:08 +01:00
Xuan Son Nguyen
34c01b9c51 fix tests 2024-11-06 10:34:47 +01:00
Xuan Son Nguyen
30c6abb73f add GET method for CORS 2024-11-05 23:51:32 +01:00
Xuan Son Nguyen
29f6d82b19 make CORS preflight more explicit 2024-11-05 23:50:16 +01:00
Xuan Son Nguyen
58d0588fbd better error handling 2024-11-05 23:35:47 +01:00
Xuan Son Nguyen
9096e5ed5e docs: how to use legacy ui 2024-11-05 22:21:47 +01:00
Xuan Son Nguyen
712ee17aa0 small fixes 2024-11-05 21:45:22 +01:00
Xuan Son Nguyen
6ea3315334 regenerate, edit, copy buttons 2024-11-05 19:52:00 +01:00
Xuan Son Nguyen
654ec7ce0d fix tests 2024-11-05 17:10:22 +01:00
Xuan Son Nguyen
255a3205c0 save theme preferences 2024-11-05 17:09:57 +01:00
Xuan Son Nguyen
521be4c31a fix bg-base classes 2024-11-05 16:57:48 +01:00
Xuan Son Nguyen
9719450232 add conversation history, save to localStorage 2024-11-05 16:39:24 +01:00
Xuan Son Nguyen
7f3daf09f3 basic markdown support 2024-11-05 14:24:25 +01:00
Xuan Son Nguyen
191887b771 embed deps into binary 2024-11-05 13:19:16 +01:00
Xuan Son Nguyen
fdf0c07df2 move old files to legacy folder 2024-11-04 23:38:19 +01:00
Xuan Son Nguyen
120d05b7de server : simple chat UI with vuejs and daisyui 2024-11-04 23:09:57 +01:00
Diego Devesa
ea02c753eb
cuda : clear error after changing peer access (#10153) 2024-11-04 13:10:23 +01:00
Georgi Gerganov
05697f670b
metal : simplify f16 and f32 dequant kernels (#0) 2024-11-04 13:49:34 +02:00
Georgi Gerganov
f8e58135cf
metal : move dequantize templates to beginning of MSL source (#0) 2024-11-04 13:44:06 +02:00
leo-pony
329ed914c9
CANN: adjust backend registry refactor. (#10158)
remove buffer->iface.get_name that used in cann as it was removed in backend registry refactor PR.
2024-11-04 19:08:22 +08:00
Georgi Gerganov
ce027adfb3
sync : ggml 2024-11-04 10:33:37 +02:00
Yuri Khrustalev
284e5b0275
cmake : make it possible linking ggml as external lib (ggml/1003) 2024-11-04 10:33:11 +02:00
Plamen Minev
e2292aaa17
metal : fix minor string leaks (ggml/1004) 2024-11-04 10:33:10 +02:00
Diego Devesa
9f40989351
ggml : move CPU backend to a separate file (#10144) 2024-11-03 19:34:08 +01:00
Georgi Gerganov
08828a6d7d
metal : minor fixup in FA kernel (#10143)
* metal : minor fixup in FA kernel

ggml-ci

* metal : use the unrolled loop variable

* metal : remove unused var
2024-11-03 15:18:40 +02:00
Georgi Gerganov
1839f69130
flake.lock: Update (#10146) 2024-11-03 05:14:15 -08:00
Christian Köhnenkamp
9830b6923b
Add apple arm to presets (#10134)
* Add apple arm to presets

* Add final new line
2024-11-02 15:35:31 -07:00
sasha0552
42cadc74bd
server : fix slot selection by lru (#10126)
* server : fix slot selection by lru, migrate lcs to `size_t`

* minor debug log fix
2024-11-02 18:34:56 +02:00
Georgi Gerganov
45950415ed
server : fix endpoint checks (#10135)
ggml-ci
2024-11-02 18:34:00 +02:00
Georgi Gerganov
1926d6e39d
llama : adjust default context size + print warnings (#10136)
* llama : adjust default context size + print warnings

ggml-ci

* ggml-ci : add missing gpu-layers + adjust context sizes
2024-11-02 15:18:56 +02:00
Diego Devesa
b634f8a26f
simple-chat : only add bos on first prompt (#10129) 2024-11-02 13:08:53 +01:00
Xuan Son Nguyen
7554aa4655
convert-lora : make --base optional (#10110)
* convert-lora : make `--base` optional

* lint

* handle case where base_model_name_or_path is invalid

* do not include metadata from base model

* clarify unspecified --base

* add small comment [no ci]

* trigger ci
2024-11-02 12:53:17 +01:00
Diego Devesa
a6744e43e8
llama : add simple-chat example (#10124)
* llama : add simple-chat example

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-11-01 23:50:59 +01:00
Diego Devesa
e991e3127f
llama : use smart pointers for ggml resources (#10117) 2024-11-01 23:48:26 +01:00
Shupei Fan
418f5eef26
vulkan : improve ggml_vk_create_buffer error handling (#9898) 2024-11-01 19:33:14 +01:00
Georgi Gerganov
ba6f62eb79
readme : update hot topics 2024-11-01 17:31:51 +02:00
sasha0552
d865d1478c
server : fix smart selection of available slot (#10120)
* Fix smart selection of available slot

* minor fix

* replace vectors of tokens with shorthands
2024-11-01 14:33:14 +01:00
Georgi Gerganov
1804adb0cf
ggml : remove ggml_scratch (#10121)
ggml-ci
2024-11-01 12:58:45 +02:00
Georgi Gerganov
815fe72adc
sync : ggml 2024-11-01 10:28:24 +02:00
Georgi Gerganov
f221d56220
ggml : alloc ggml_contexts on the heap (whisper/2525) 2024-11-01 10:24:50 +02:00
Zhenwei Jin
e597e50794
build: fix build error in Windows env with OneAPI setup (#10107) 2024-11-01 11:09:59 +08:00
Diego Devesa
85679d37f3
llama : improve output buffer type selection (#10098) 2024-11-01 00:49:53 +01:00
Diego Devesa
1e9f94994e
quantize : fix --keep-split (#10114) 2024-11-01 00:45:34 +01:00
Diego Devesa
c02e5ab2a6
llama : fix buffer checks for mamba and rwk (#10111)
* llama : fix buffer checks for mamba and rwk

* llama : fix missing worst case flag during reserve

* cuda : fix supports_op for norm

* disable sched SET_CAUSE
2024-10-31 22:54:23 +01:00
Zhenwei Jin
ab3d71f97f
loader: refactor tensor weights storage (#9935)
* loader: refactor tensor weights storage

* use sorted map, sort weights by layer

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-10-31 19:50:39 +01:00
Kevin Gibbons
0a683e8088
server : include scheme when printing URL (#10106) 2024-10-31 14:02:35 +01:00
Diego Devesa
dea5e86051
ggml : check tensor name lengths in gguf files (#10100) 2024-10-31 11:40:59 +01:00
Sergio López
1329c0a75e
kompute: add mul_mat_q4_k shader (#10097)
This is a more or less direct translation from the Metal implementation
to GLSL.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2024-10-31 11:09:52 +02:00