Akarshan Biswas
6e84b0ab8e
SYCL : SOFTMAX F16 mask support and other fixes (#11261)
...
Implemented ggml_sycl_op_soft_max() F16 src1 (mask) support, for which a pragma deprecation warning was added in #5021.
To do this, it had to be decoupled from ggml_sycl_op_flatten, which always treated src1 as fp32 (many OP functions depend on that assumption); a sketch of the resulting dtype dispatch follows the list below.
* SYCL: SOFTMAX F16 mask support and other fixes
* test-backend-ops: Add F16 mask test cases
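A minimal sketch of the dispatch idea (hypothetical names, not the actual SYCL kernel): templating over the mask element type lets a single implementation serve both F32 and F16 masks once the fp32-only assumption is gone.

```cpp
// Sketch only: in the real backend the half type would be sycl::half;
// masked_logit is a hypothetical helper, not a function in ggml-sycl.
template <typename T_mask>
static float masked_logit(float logit, const T_mask * mask, int i) {
    // promote the mask element to float before adding it to the logit
    return mask ? logit + static_cast<float>(mask[i]) : logit;
}

// a caller would pick the instantiation from the tensor's dtype, e.g.:
//   mask->type == GGML_TYPE_F16 ? kernel<half_t> : kernel<float>
```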
2025-01-28 09:56:58 +00:00
Michael Engel
2b8525d5c8
Handle missing model in CLI parameters for llama-run (#11399)
...
The HTTP client in llama-run previously only printed an error when the download of
a resource failed. If the model name was missing from the CLI parameter list,
this caused the application to crash.
To prevent this, a check for the required model parameter has been added,
and resource-download errors are now propagated to the caller (see the sketch below).
Signed-off-by: Michael Engel <mengel@redhat.com>
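A minimal sketch of the described check (hypothetical names; the real llama-run code is more involved):

```cpp
#include <cstdio>
#include <string>

// Fail early when no model was given, and return download errors to the
// caller instead of only printing them.
static int run_model(const std::string & model) {
    if (model.empty()) {
        fprintf(stderr, "error: no model specified\n");
        return 1; // propagate the failure instead of crashing later
    }
    // ... download/open the model; return a non-zero code on failure ...
    return 0;
}
```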
2025-01-28 08:32:40 +00:00
Eric Curtin
a4417ddda9
Add new hf protocol for ollama (#11449)
...
https://huggingface.co/docs/hub/en/ollama
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-27 19:36:10 +01:00
Haus1
d6d24cd9ed
AMD: parse the architecture as supplied by gcnArchName (#11244)
...
The value provided by minor doesn't include the stepping for AMD; parse the value returned by gcnArchName instead to retrieve an accurate ID (see the sketch below).
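A minimal sketch of the parsing idea, assuming gcnArchName holds a string such as "gfx90a:sramecc+:xnack-" (the helper name is illustrative):

```cpp
#include <string>

// Keep the architecture token before the first ':' (e.g. "gfx90a"),
// which includes the stepping, unlike the minor version field.
static std::string parse_gcn_arch(const std::string & gcn_arch_name) {
    return gcn_arch_name.substr(0, gcn_arch_name.find(':'));
}
```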
2025-01-27 14:58:17 +01:00
lexasub
a5203b4465
llama : minor fixes to speed up llama model loading (#11448)
...
* impl::load: change the bpe_ranks map to an unordered map, reducing the time spent in impl::load by about 30% (see the sketch after this list)
* llama_model_loader::init_mapping: replace new llama_mmap with std::make_unique<llama_mmap> for cleaner code; this roughly halves the time of running init_mappings
* Update src/llama-vocab.cpp
---------
Co-authored-by: lexasub <empty@empty.ru>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
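A minimal sketch of the map swap from the first item (the pair key and hash are illustrative; the actual key type in llama-vocab may differ):

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// bpe_ranks only needs exact lookups, so an O(1)-average hash map is a
// better fit than an ordered std::map with O(log n) lookups.
struct pair_hash {
    std::size_t operator()(const std::pair<std::string, std::string> & p) const {
        return std::hash<std::string>{}(p.first) ^ (std::hash<std::string>{}(p.second) << 1);
    }
};

// before: std::map<std::pair<std::string, std::string>, int> bpe_ranks;
using bpe_ranks_t = std::unordered_map<std::pair<std::string, std::string>, int, pair_hash>;
```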
2025-01-27 14:42:09 +01:00
Georgi Gerganov
e665b57fa2
Merge branch 'master' into gg/llama-kv-cache
...
ggml-ci
2025-01-27 14:09:22 +02:00
Johannes Gäßler
df984e0147
llama: refactor llama_decode_impl (#11381)
2025-01-27 12:07:12 +01:00
Ihar Hrachyshka
acd38efee3
metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441)
...
This fixes a segmentation fault when running tests and no Metal
devices are available (for example, when not linked with the Core Graphics
framework).
2025-01-27 09:41:59 +02:00
Xuan Son Nguyen
caf773f249
docker : fix ARM build and Vulkan build (#11434)
...
* ci : do not fail-fast for docker
* build arm64/amd64 separately
* fix pip
* no fast fail
* vulkan: try jammy
2025-01-26 22:45:32 +01:00
Georgi Gerganov
a0c500b4dc
context : prepare for abstraction
...
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov
99422dfa3f
context : introduce llama_batch_manager
...
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov
cb8f2095c6
wip
2025-01-26 20:16:22 +02:00
Georgi Gerganov
133ad6a723
context : initial need_reserve logic
...
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov
c75ba6851e
context : move adapter code in the implementation [no ci]
2025-01-26 20:16:22 +02:00
Georgi Gerganov
f0713498fd
context : add get_ctx_padding()
...
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov
b4ec1d4429
cont : move kv_self update to llama_context
...
ggml-ci
2025-01-26 20:16:21 +02:00
Georgi Gerganov
f2524c0e41
llama : remove references to llama_kv_cache (wip)
...
Intermediate step necessary to abstract the `llama_context` and
`llama_kv_cache`.
ggml-ci
2025-01-26 20:16:21 +02:00
Georgi Gerganov
ae274f9747
llama : fix names [no ci]
2025-01-26 20:16:21 +02:00
Georgi Gerganov
a19f671fe0
context : minor
...
ggml-ci
2025-01-26 20:16:21 +02:00
Georgi Gerganov
17b363afd3
llama : update llama_kv_self API
...
ggml-ci
2025-01-26 20:16:20 +02:00
Georgi Gerganov
fd05ab87aa
kv_cache : move state read/write to llama_kv_cache
...
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov
4cd1b6fa4c
context : prepare kv_cache_read/write to be moved to kv_cache
...
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov
73a14eccc9
kv_cache : minor
2025-01-26 20:14:36 +02:00
Georgi Gerganov
fef90cb3d7
kv_cache : fix
...
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov
4d7bd03e65
kv_cache : functions -> members
...
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov
e4550fbafc
llama : cont
...
ggml-ci
2025-01-26 20:14:35 +02:00
Georgi Gerganov
f78b396ee7
llama : add struct llama_kv_cache (wip) [no ci]
2025-01-26 20:12:06 +02:00
Georgi Gerganov
178a7eb952
metal : use residency sets (#11427)
...
* metal : use residency sets
ggml-ci
* metal : restore commandBufferWithUnretainedReferences calls [no ci]
* metal : release descriptors
ggml-ci
* metal : check env GGML_METAL_NO_RESIDENCY (see the sketch below)
ggml-ci
* metal : fix build + clean-up
ggml-ci
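A minimal sketch of the env-var opt-out from the bullets above, assuming any value of GGML_METAL_NO_RESIDENCY disables the residency-set path (the helper name is hypothetical):

```cpp
#include <cstdlib>

// Residency sets stay enabled unless the user opts out via the environment.
static bool use_residency_sets(void) {
    return std::getenv("GGML_METAL_NO_RESIDENCY") == nullptr;
}
```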
2025-01-26 20:06:16 +02:00
Nuno
6f53d8a6b4
docker: add missing Vulkan library to base layer and update to 24.04 (#11422)
...
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2025-01-26 18:22:43 +01:00
bandoti
19f65187cb
cmake: add ggml find package (#11369)
...
* Add initial ggml cmake package
* Add build numbers to ggml find-package
* Expand variables with GGML_ prefix
* Guard against adding to cache variable twice
* Add git to msys2 workflow
* Handle ggml-cpu-* variants
* Link ggml/ggml-base libraries to their targets
* Replace main-cmake-pkg with simple-cmake-pkg
* Interface features require c_std_90
* Fix typo
* Removed unnecessary bracket from status message
* Update examples/simple-cmake-pkg/README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/simple-cmake-pkg/README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-26 12:07:48 -04:00
Frank Mai
1d8ee06000
rpc: fix register position (#11424)
...
Signed-off-by: thxCode <thxcode0824@gmail.com>
2025-01-26 16:20:34 +01:00
Georgi Gerganov
2cc9b8c32c
readme : update hot topics
2025-01-26 14:30:15 +02:00
Jeff Bolz
f35726c2fb
build: apply MSVC /bigobj option to c/cpp files only (#11423)
2025-01-26 03:10:03 +01:00
Jeff Bolz
4a75d19376
vulkan: compile shaders on-demand (#11406)
...
Reduce first-run startup time and memory consumption (see the sketch below).
Should fix #11339.
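A minimal sketch of the on-demand approach (types and names are hypothetical): compile a pipeline the first time a shader is needed and cache it, instead of building everything at startup.

```cpp
#include <memory>
#include <mutex>
#include <unordered_map>

struct pipeline { /* VkPipeline handle, layout, ... */ };

// placeholder for the expensive SPIR-V -> pipeline compilation step
static std::unique_ptr<pipeline> compile_pipeline(int /*shader_id*/) {
    return std::make_unique<pipeline>();
}

// First use of a shader pays the compile cost; later uses hit the cache.
static pipeline * get_pipeline(int shader_id) {
    static std::unordered_map<int, std::unique_ptr<pipeline>> cache;
    static std::mutex mtx;
    std::lock_guard<std::mutex> lock(mtx);
    auto it = cache.find(shader_id);
    if (it == cache.end()) {
        it = cache.emplace(shader_id, compile_pipeline(shader_id)).first;
    }
    return it->second.get();
}
```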
2025-01-25 22:29:57 +01:00
uvos
26771a1491
HIP: disable VMM on HIP as it seems that it doesn't work in some configurations (#11420)
2025-01-25 21:01:12 +01:00
Jeff Bolz
ca6baf76c1
build: add /bigobj to MSVC build (#11407)
2025-01-25 11:26:37 -06:00
Diego Devesa
6e264a905b
docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for (#11419)
2025-01-25 17:22:41 +01:00
Xuan Son Nguyen
49b0e3cec4
server : fix cleaning up stream task (#11418)
...
* server : fix cleaning up stream task
* one more spot
2025-01-25 16:36:44 +01:00
Diego Devesa
20a758155b
docker : fix CPU ARM build (#11403)
...
* docker : fix CPU ARM build
* add CURL to other builds
2025-01-25 15:22:29 +01:00
Georgi Gerganov
00c24acb2a
ci : fix line breaks on windows builds (#11409)
...
* ci : fix line breaks on windows builds
* cont : another try
* ci : fix powershell line breaks
2025-01-25 13:36:48 +02:00
jiahao su
466ea66f33
CANN: Add Ascend CANN build CI (#10217)
...
* CANN: Add Ascend CANN build ci
* Update build.yml
* Modify cann image version
* Update build.yml
* Change to run on x86 system
* Update build.yml
* Update build.yml
* Fix formatting error
* Update build.yml
* Add 'Ascend NPU' label restrictions
* Exclude non-PR events
Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>
* Update build.yml
---------
Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>
2025-01-25 00:26:01 +01:00
uvos
5f0db9522f
hip : Add hipGraph and VMM support to ROCm (#11362)
...
* Add hipGraph support
* Enable VMM on ROCm
2025-01-25 00:02:23 +01:00
Johannes Gäßler
c5d9effb49
CUDA: fix FP16 cuBLAS GEMM (#11396)
2025-01-24 21:02:43 +01:00
uvos
9fbadaef4f
rocBLAS: Avoid fp32->fp16->fp32 conversion on CDNA (#11356)
2025-01-24 17:50:49 +01:00
Georgi Gerganov
9755129c27
release : pack /lib in the packages (#11392)
...
* release : pack /lib and /include in the packages
* cmake : put libs in /bin
* TMP : push artifacts
* Revert "TMP : push artifacts"
This reverts commit 4decf2c4df.
* ci : fix HIP cmake compiler options to be on first line
* ci : restore the original HIP commands
* ci : change ubuntu build from latest to 20.04
* ci : try to fix macos build rpaths
* ci : remove obsolete MacOS build
* TMP : push artifacts
* ci : change back to ubuntu latest
* ci : macos set build rpath to "@loader_path"
* ci : fix typo
* ci : change ubuntu package to 22.04
* Revert "TMP : push artifacts"
This reverts commit 537b09e70f.
2025-01-24 18:41:30 +02:00
Jafar Uruç
a07c2c8a52
docs : update readme with the build targets for local Docker builds (#11368)
2025-01-24 14:30:13 +01:00
Johannes Gäßler
8137b4bb2b
CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380)
2025-01-24 12:38:31 +01:00
Bernhard M. Wiedemann
1af6945eb0
cmake : avoid -march=native when reproducible build is wanted (#11366)
...
See https://reproducible-builds.org/ for why this is good,
and https://reproducible-builds.org/specs/source-date-epoch/
for the definition of the SOURCE_DATE_EPOCH variable.
Without this patch, compiling on different machines produced different binaries, which made verification of the results difficult.
Fixes: #11317
This patch was done while working on reproducible builds for openSUSE.
2025-01-24 13:21:35 +02:00
Eric Curtin
01f37edf1a
Update llama-run README.md (#11386)
...
For consistency
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-24 09:39:24 +00:00
stduhpf
c07e87f38b
server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364)
...
* webui : put DeepSeek R1 CoT in a collapsible <details> element
* webui: refactor split
* webui: don't use regex to split CoT and response
* webui: format+qol
* webui: no loading icon if the model isn't generating
* ui fix, add configs
* add jsdoc types
* only filter </think> for assistant msg
* build
* update build
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-24 09:02:38 +01:00