Commit graph

4071 commits

Author SHA1 Message Date
ochafik
4394e1cd5e Update tool-call.cpp 2024-10-23 21:21:39 +01:00
ochafik
2b49440011 tool-call: fix previous commit's parallel arg 2024-10-23 02:35:21 +01:00
ochafik
3e12b9b38e tool-calls: basic Nemo support, default parallel to true if template mentions tool_call_id 2024-10-23 02:30:31 +01:00
ochafik
fc80ad20ce tool-call: Log tool call style name, ensure returned content not null 2024-10-22 23:41:47 +01:00
ochafik
a4f12a4594 minja: fix string subscripts, add string pipe to support Mistral-Nemo template 2024-10-22 23:39:46 +01:00
Olivier Chafik
351aecbe3f Update llama-sampling.cpp 2024-10-22 14:37:43 +01:00
Olivier Chafik
db4bf93812 Merge remote-tracking branch 'origin/master' into tool-call 2024-10-22 14:37:30 +01:00
Olivier Chafik
7f2429e6b0 tool-calls: fix grammar regression 2024-10-22 11:49:50 +01:00
Molly Sophia
4ff7fe1fb3
llama : add chat template for RWKV-World + fix EOT (#9968)
* Add chat template for RWKV-World

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV: Fix the chat template not being used

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV v6: Set EOT token to ``\n\n``

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* readme: add rwkv into supported model list

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-10-22 13:33:37 +03:00
ochafik
b53362a148 Update test-tool-call.cpp 2024-10-22 10:54:48 +01:00
ochafik
9f5ab97756 tool-calls: add generic tool call style as default 2024-10-22 10:53:21 +01:00
ochafik
fa8462ffd3 fix root 2024-10-22 10:53:01 +01:00
ochafik
75764871e6 tool-call: fix grammar roots 2024-10-22 10:50:52 +01:00
leo-pony
6b8447352d
[CANN] Adapt to dynamically loadable backends mechanism (#9970)
* [CANN] Adapt to dynamically loadable backends mechanism

* Fix the Bug: inference running result is garbled in debug running model for LM models who's type is Q4_0 class

* Handle the review comments of this pull request
2024-10-22 16:16:01 +08:00
Daniel Bevenius
674804a996
arg : fix typo in embeddings argument help [no ci] (#9994)
This commit fixes two typos in the help text for the `--embd-normalize`
and `--embd-separator` arguments. It also updates common.h which contain
the same typo in two comments.
2024-10-22 10:40:02 +03:00
Georgi Gerganov
e94a138d64
llama.vim : fix info text display [no ci] (#9787) 2024-10-22 00:37:55 +03:00
Georgi Gerganov
e01c67affe
llama.vim : move info to the right of screen [no ci] (#9787)
'eol' messes up the rendering with nvim v0.10.2 for some reason
2024-10-21 22:53:18 +03:00
Asghar Ghorbani
994cfb1acb
readme : update UI list (#9972)
add PocketPal AI app
2024-10-21 21:20:59 +03:00
Daniel Bevenius
94008cc760
arg : fix attention non-causal arg value hint (#9985)
This commit updates the argument value hint for the `--attention`
argument to `non-causal`.

The motivation for this change is that the only values for this argument
are `causal` and `non-causal`.
2024-10-21 21:12:52 +03:00
Georgi Gerganov
dbd5f2f573
llama.vim : plugin for Neovim (#9787) 2024-10-21 20:25:02 +03:00
Georgi Gerganov
f594bc80ba
ggml : add asserts for type conversion in fattn kernels (#9971)
ggml-ci
2024-10-21 16:20:46 +03:00
Radoslav Gerganov
d5ebd79c76
rpc : pack only RPC structs (#9959) 2024-10-21 13:35:40 +03:00
Georgi Gerganov
55e47786e3
llama : default sampling changes + greedy update (#9897)
* llama : deprecate softmax sampler + fix dist sampler

ggml-ci

* tests : replace macros with functions

ggml-ci

* sampling : change temperature sampler logic

For t <= 0.0f, keep the max logit intact and set the rest to -inf

* cont : no need for special "greedy" logic

top-k == 1 is the same

* tests : init prob correctly

* llama : handle temp <= 0.0 in the temp_ext sampler too

ggml-ci

* cont : avoid extra loop in temperature sampler for sub-zero temp

ggml-ci
2024-10-21 09:46:40 +03:00
Georgi Gerganov
bc21975084
speculative : fix handling of some input params (#9963)
* speculative : fix batch sizes at initialization

ggml-ci

* speculative : handle params.n_predict == -1

* speculative : limit batch size to llama_n_batch
2024-10-21 09:37:12 +03:00
Neo Zhang Jianyu
1db8c84fc6
fix mul_mat_vec_q and *_vec_q error (#9939)
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-10-21 14:26:09 +08:00
Loïc Carrère
45f097645e
readme : update bindings list (#9951)
Update the binding list by adding LM-Kit.NET (C# & VB.NET)
2024-10-20 19:25:41 +03:00
icppWorld
7cab2083c7
readme : update infra list (#9942)
llama_cpp_canister allows you to run llama.cpp as a Smart Contract on the Internet Computer. The smart contract runs as WebAssembly in a so-called 'canister'.
2024-10-20 19:01:34 +03:00
Xuan Son Nguyen
cda0e4b648
llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745)
* refactor llama_batch_get_one

* adapt all examples

* fix simple.cpp

* fix llama_bench

* fix

* fix context shifting

* free batch before return

* use common_batch_add, reuse llama_batch in loop

* null terminated seq_id list

* fix save-load-state example

* fix perplexity

* correct token pos in llama_batch_allocr
2024-10-18 23:18:01 +02:00
Radoslav Gerganov
afd9909a64
rpc : backend refactoring (#9912)
* rpc : refactor backend

Use structs for RPC request/response messages

* rpc : refactor server
2024-10-18 14:33:58 +03:00
Ouadie EL FAROUKI
87421a23e8
[SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705)
* implemented missing SYCL event APIs

* sycl : Added device and backend reg interfaces

* Restructured ggml-sycl.cpp
2024-10-18 06:46:16 +01:00
Ma Mingfei
60ce97c9d8
add amx kernel for gemm (#8998)
add intel amx isa detection

add vnni kernel for gemv cases

add vnni and amx kernel support for block_q8_0

code cleanup

fix packing B issue

enable openmp

fine tune amx kernel

switch to aten parallel pattern

add error message for nested parallelism

code cleanup

add f16 support in ggml-amx

add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS

update CMakeList

update README

fix some compilation warning

fix compiler warning when amx is not enabled

minor change

ggml-ci

move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp

ggml-ci

update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16

ggml-ci

add amx as an ggml-backend

update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h

minor change

update CMakeLists.txt

minor change

apply weight prepacking in set_tensor method in ggml-backend

fix compile error

ggml-ci

minor change

ggml-ci

update CMakeLists.txt

ggml-ci

add march dependency

minor change

ggml-ci

change ggml_backend_buffer_is_host to return false for amx backend

ggml-ci

fix supports_op

use device reg for AMX backend

ggml-ci

minor change

ggml-ci

minor change

fix rebase

set .buffer_from_host_ptr to be false for AMX backend
2024-10-18 13:34:36 +08:00
Georgi Gerganov
8901755ba3
server : add n_indent parameter for line indentation requirement (#9929)
ggml-ci
2024-10-18 07:32:19 +03:00
Daniel Bevenius
6f55bccbb8
llama : rename batch_all to batch (#8881)
This commit addresses the TODO in the code to rename the `batch_all`
parameter to `batch` in `llama_decode_internal`.
2024-10-18 01:41:51 +02:00
Georgi Gerganov
17bb928080
readme : remove --memory-f32 references (#9925) 2024-10-17 23:43:05 +03:00
Georgi Gerganov
9f45fc1e99
llama : change warning to debug log 2024-10-17 23:27:42 +03:00
Georgi Gerganov
99bd4ac28c
llama : infill sampling handle very long tokens (#9924)
* llama : infill sampling handle very long tokens

ggml-ci

* cont : better indices

ggml-ci
2024-10-17 22:32:47 +03:00
Tim Wang
3752217ed5
readme : update bindings list (#9918)
Co-authored-by: Tim Wang <tim.wang@ing.com>
2024-10-17 09:57:14 +03:00
Diego Devesa
f010b77a37
vulkan : add backend registry / device interfaces (#9721)
* vulkan : add backend registry / device interfaces

* llama : print devices used on model load
2024-10-17 02:46:58 +02:00
Gilad S.
2194200278
fix: allocating CPU buffer with size 0 (#9917) 2024-10-17 01:34:22 +02:00
Gilad S.
73afe681aa
fix: use vm_allocate to allocate CPU backend buffer on macOS (#9875)
* fix: use `vm_allocate` to allocate CPU backend buffer on macOS

* fix: switch to `posix_memalign` to keep existing `free()` usages work

* feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS

* style: formatting

* fix: move const outside of `#ifndef`

* style: formatting

* fix: unused var

* fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h`

* fix: unused var

* fix: page align to `GGUF_DEFAULT_ALIGNMENT`

* fix: page align to `TENSOR_ALIGNMENT`

* fix: convert `TENSOR_ALIGNMENT` to a macro

* fix: increase page size to `32` on iOS

* fix: iOS page size

* fix: `hbw_posix_memalign` alignment
2024-10-17 00:36:51 +02:00
Daniel Bevenius
9e04102448
llama : suppress conversion from 'size_t' to 'int' (#9046)
* llama : suppress conversion from 'size_t' to 'int'

This commit updates llm_tokenizer_spm.tokenize to suppress/remove the
following warnings that are generated on Windows when using MSVC:

```console
src\llama-vocab.cpp(211,1): warning C4267: 'argument':
    conversion from 'size_t' to 'int', possible loss of data
src\llama-vocab.cpp(517,1): warning C4267: 'argument':
    conversion from 'size_t' to 'int', possible loss of data
```

This is done by adding a cast for the size_t returned from
symbols.size(). I believe this is safe as it seems unlikely that
symbols, which stores an entry for each UTF8 character, would become
larger than INT_MAX.

The motivation for this change is to reduce the number of warnings that
are currently generated when building on Windows.

* squash! llama : suppress conversion from 'size_t' to 'int'

Move cast into for loop.
2024-10-16 20:34:28 +03:00
Daniel Bevenius
dbf18e4de9
llava : fix typo in error message [no ci] (#9884) 2024-10-16 20:24:05 +03:00
Joe Eli McIlvain
66c2c93082
grammar : fix JSON Schema for string regex with top-level alt. (#9903)
Prior to this commit, using a JSON Schema containing a string
with `pattern` regular expression that uses top-level alternation
(e.g. `"pattern": "^A|B|C|D$"`) would result in invalid JSON
output from the constrained sampling grammar, because it
ended up creating a grammar rule like this for the string:

```
thing ::= "\"" "A" | "B" | "C" | "D" "\"" space
```

Note that this rule will only match a starting quote for the "A" case,
and will only match an ending quote for the "D" case,
so this rule will always produce invalid JSON when used for sampling
(that is, the JSON will always be lacking the starting quote,
the ending quote, or both).

This was fixed in a simple way by adding parentheses to the
generated rule (for all string pattern rules, to keep it simple),
such that the new generated rule looks like this (correct):

```
thing ::= "\"" ("A" | "B" | "C" | "D") "\"" space
```
2024-10-16 19:03:24 +03:00
Molly Sophia
10433e8b45
llama : add tensor name for "result_norm" (#9907)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-10-16 13:10:21 +03:00
Alexey Parfenov
1f66b699c4
server : fix the disappearance of the end of the text (#9867)
* server: fix the disappearance of the end of the text when streaming with stop strings

* simplify "send text" checks
2024-10-16 11:35:53 +03:00
Georgi Gerganov
0e41b300ed
sync : ggml 2024-10-16 11:28:14 +03:00
Daniel Bevenius
cd60b88bf7
ggml-alloc : remove buffer_id from leaf_alloc (ggml/987)
This commit removes the buffer_id field from the leaf_alloc struct.

The motivation for is that this field is only written to and never
read/used as far as I can tell. Each tensor_alloc has a buffer_id field
and this is what caused me to look into this more closely, to
understand what the buffer_id in leaf_alloc was used for.
2024-10-16 11:28:01 +03:00
leo-pony
becfd387f6
[CANN] Fix cann compilation error (#9891)
Fix cann compilation error after merging llama.cpp supports dynamically loadable backends.
2024-10-16 08:51:46 +08:00
Georgi Gerganov
755a9b2bf0
llama : add infill sampler (#9896)
ggml-ci
2024-10-15 16:35:33 +03:00
Georgi Gerganov
223c25a72f
server : improve infill context reuse (#9894)
ggml-ci
2024-10-15 16:28:55 +03:00