Commit graph

4479 commits

Author SHA1 Message Date
Georgi Gerganov
b48d763583
vocab : more pimpl (#11165)
ggml-ci
2025-01-10 11:02:57 +02:00
Georgi Gerganov
446fec5023
hparams : move vocab params to llama_vocab (#11159)
ggml-ci
2025-01-10 11:02:57 +02:00
Georgi Gerganov
df1c467a72
llama : add struct llama_vocab to the API (#11156)
ggml-ci
2025-01-10 11:02:57 +02:00
Georgi Gerganov
bfe781a42d
vocab : fix bug (eos -> bos)
ggml-ci
2025-01-10 10:22:48 +02:00
Georgi Gerganov
dfd319c890
model : fix Phi MoE conflicts
ggml-ci
2025-01-10 10:22:48 +02:00
Georgi Gerganov
cee3648ee3
llama : vocab cleanup
ggml-ci
2025-01-10 10:22:48 +02:00
Georgi Gerganov
9dd71e078f
llama : vocab pimpl cont
ggml-ci
2025-01-10 10:22:47 +02:00
Georgi Gerganov
615bea8629
llama : vocabl private charsmap
ggml-ci
2025-01-10 10:22:47 +02:00
Georgi Gerganov
885495ccd1
llama : vocab load 2025-01-10 10:22:47 +02:00
Georgi Gerganov
695a0037db
llama : vocab fix names 2025-01-10 10:22:47 +02:00
Georgi Gerganov
2c9f20d4bb
llama : vocab pimpl
ggml-ci
2025-01-10 10:22:47 +02:00
Georgi Gerganov
f4b6969b1d
llama : vocab
ggml-ci
2025-01-10 10:22:47 +02:00
Georgi Gerganov
7cf1ae4afb
llama : remove unicode.h from llama-model.cpp
ggml-ci
2025-01-10 10:22:46 +02:00
slaren
c1d6ae9bd8
Revert "ninja multi-config -> ninja"
This reverts commit b2fbe63862199b73312fd179340ff2124b4d9957.
2025-01-10 10:22:46 +02:00
slaren
d3cbd43cc6
test 2025-01-10 10:22:46 +02:00
slaren
dfdc4d786a
ninja multi-config -> ninja 2025-01-10 10:22:46 +02:00
Georgi Gerganov
6860c4bef3
test 2025-01-10 10:22:46 +02:00
Georgi Gerganov
5c8d759a3f
llama : fix llm_type enum names
ggml-ci
2025-01-10 10:22:44 +02:00
Georgi Gerganov
a15e22537f
llama : pimpl llama_model
ggml-ci
2025-01-10 10:21:45 +02:00
Georgi Gerganov
a16daa9552
llama : move load tensors to llama_model
ggml-ci
2025-01-10 10:21:00 +02:00
Georgi Gerganov
662dd05016
llama : add llama_model methods
ggml-ci
2025-01-10 10:15:16 +02:00
0cc4m
c3f9d25706
Vulkan: Fix float16 use on devices without float16 support + fix subgroup_size_control validation error (#11161)
* Vulkan: Remove float16 use in shaders

* Fix validation error about subgroup_size_control extension
2025-01-10 06:39:33 +01:00
Molly Sophia
ee7136c6d1
llama: add support for QRWKV6 model architecture (#11001)
llama: add support for QRWKV6 model architecture (#11001)

* WIP: Add support for RWKV6Qwen2

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV: Some graph simplification

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Add support for RWKV6Qwen2 with cpu and cuda GLA

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix some typos

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* code format changes

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix wkv test & add gla test

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix cuda warning

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update README.md

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update ggml/src/ggml-cuda/gla.cu

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Fix fused lerp weights loading with RWKV6

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* better sanity check skipping for QRWKV6 in llama-quant

thanks @compilade

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: compilade <git@compilade.net>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2025-01-10 09:58:08 +08:00
Akarshan Biswas
c6860cc734
SYCL: Refactor ggml_sycl_compute_forward (#11121)
* SYCL: refactor ggml_sycl_compute_forward

* SYCL: add back GGML_USED(dst) to ggml_sycl_cpy

* SYCL: add function name to noop debug

* SYCL: Some device info print refactoring and add details of XMX availability
2025-01-10 08:13:03 +08:00
Tei Home
1204f97270
doc: add cuda guide for fedora (#11135)
Since NVIDIA does not release CUDA for in-maintenance versions of Fedora, the process of setting up the CUDA toolkit on Fedora has become quite involved. This guide should help mere mortals install CUDA for development in a Fedora 39 toolbox environment, without affecting the host system.
2025-01-09 11:32:06 +00:00
Daniel Bevenius
8eceb888d7
server : add tooltips to settings and themes btn (#11154)
* server : add tooltips to settings and themes btn

This commit adds tooltips to the settings and themes buttons in the
webui. The tooltip will be displayed below the actual buttons when
hovered over.

The motivation for this change is to clarify the purpose of the themes
button.

* squash! server : add tooltips to settings and themes btn

This commit adds a tooltip to the '...' button when a chat has been
started. The tooltip is "Chat options" which think could be a good
description as the dropdown contains options to delete or download the
current chat.

* rm tooltip for 3 dots button

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-09 11:28:29 +01:00
Pierrick Hymbert
f8feb4b01a
model: Add support for PhiMoE arch (#11003)
* model: support phimoe

* python linter

* doc: minor

Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>

* doc: minor

Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>

* doc: add phimoe as supported model

ggml-ci

---------

Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
2025-01-09 11:21:41 +01:00
Georgi Gerganov
be0e950c91
media : remove old img [no ci] 2025-01-09 11:15:15 +02:00
Xuan Son Nguyen
d9feae1c06
llama-chat : add phi 4 template (#11148) 2025-01-09 10:07:33 +01:00
hydai
8d59d91171
fix: add missing msg in static_assert (#11143)
Signed-off-by: hydai <z54981220@gmail.com>
2025-01-08 20:03:28 +00:00
Vinesh Janarthanan
8a1d9c25fa
gguf-py : move scripts directory (#11116)
* Moved scripts dir and fixed pyproject.toml

* updated readme

* fixed README urls

* bump pypi gguf to v0.14.0

* retrigger ci

* empty commit - trigger ci
2025-01-08 20:54:58 +02:00
Eric Curtin
1bf839b1e8
Enhance user input handling for llama-run (#11138)
The main motivation for this change is it was not handing
ctrl-c/ctrl-d correctly. Modify `read_user_input` to handle EOF,
"/bye" command, and empty input cases. Introduce `get_user_input`
function to manage user input loop and handle different return
cases.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-08 18:47:05 +00:00
Xuan Son Nguyen
f7cd13301c
ci : use actions from ggml-org (#11140) 2025-01-08 16:09:20 +01:00
Xuan Son Nguyen
4d2b3d8804
lora : improve compat with mergekit-extract-lora (#11131)
* (wip) support mergekit-extracted lora

* support mergekit-extract-lora

* use lora->get_scale

* correct comment

* correct norm name & condition

* add some hints
2025-01-08 15:59:53 +01:00
Georgi Gerganov
c07d437bbd
llama : avoid hardcoded QK_K (#11061)
ggml-ci
2025-01-08 16:19:36 +02:00
Georgi Gerganov
99a3755a3c
sync : ggml 2025-01-08 13:40:30 +02:00
Radoslav Gerganov
c792dcf488
ggml : allow loading backend with env variable (ggml/1059)
ref: #1058
2025-01-08 13:40:18 +02:00
Xuan Son Nguyen
80ccf5d725
ci : pin dependency to specific version (#11137)
* ci : pin dependency to specific version

* will this fix ec?
2025-01-08 12:07:20 +01:00
Georgi Gerganov
a3c1232c3f
arg : option to exclude arguments from specific examples (#11136)
* arg : option to exclude arguments from specific examples

ggml-ci

* readme : remove old args [no ci]
2025-01-08 12:55:36 +02:00
amritahs-ibm
8cef75c743
llamafile : ppc64le MMA INT8 implementation (#10912)
This change upstreams llamafile's cpu matrix
multiplication kernels for ppc64le using MMA
builtins for quantised int8 datatype.

This change results in 10% - 70% improvement
in total speed(ie all tokens/total time), across
various batch sizes.

The patch is tested with Meta-Lllama-3-8B,
Mistral-7B, Llama-2-7B-chat-hf models on a
IBM POWER10 machine.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2025-01-08 12:54:19 +02:00
Georgi Gerganov
0d52a69e4b
ci : fix cmake option (#11125) 2025-01-08 11:29:34 +02:00
Mathieu Baudier
02f0430141
Disable GL_KHR_cooperative_matrix Vulkan extension if not available. (#11117)
* Disable GL_KHR_cooperative_matrix Vulkan extension if not available.

* Perform Vulkan extensions checks in a more sensible order

* Remove unnecessary #ifdef directive
2025-01-08 09:18:13 +01:00
ag2s20150909
bec2183f2c
fix: Vulkan shader gen binary path when Cross-compiling (#11096)
* fix: Vulkan shader gen binary path when cross compiling
2025-01-08 09:17:29 +01:00
Johannes Gäßler
53ff6b9b9f
GGUF: C++ refactor, backend support, misc fixes (#11030)
* GGUF: C++ refactor, backend support, misc fixes

remove ggml_tensor.backend

update CODEOWNERS [no ci]

remove gguf_get_data from API

revise GGUF API data types
2025-01-07 18:01:58 +01:00
Diego Devesa
017cc5f446
ggml-backend : only offload from host buffers (fix) (#11124) 2025-01-07 16:11:57 +01:00
Diego Devesa
a3d50bc022
ggml-backend : only offload from host buffers (#11120) 2025-01-07 12:38:05 +01:00
Radoslav Gerganov
a4dd490069
rpc : code cleanup (#11107)
Remove duplicated macros, use GGML_LOG_ERROR for errors
2025-01-07 08:37:02 +02:00
Akarshan Biswas
c0d6f790d0
SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#11087)
* SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6

* Revert "SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6"

This reverts commit f62dc45f31.

* Reland: Use get_multi_ptr instead of deprecated get_pointer in wkv6
2025-01-07 14:26:07 +08:00
Eric Curtin
dc7cef9f37
llama-run : fix context size (#11094)
Set `n_ctx` equal to `n_batch` in `Opt` class. Now context size is
a more reasonable 2048.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-06 23:45:28 +01:00
Georgi Gerganov
ecebbd292d
llama : remove unused headers (#11109)
ggml-ci
2025-01-06 17:52:35 +02:00