Commit graph

3978 commits

Author SHA1 Message Date
Georgi Gerganov
e4be74b4b7
llama.vim : add top_p + improve responsiveness + fix edge cases 2024-10-21 11:00:20 +03:00
Georgi Gerganov
25ecb35c4f
llama.vim : simplify job logic + improve robustness and responsiveness 2024-10-21 11:00:20 +03:00
Georgi Gerganov
9f8fa900f6
llama.vim : fix repetitions [no ci] 2024-10-21 11:00:20 +03:00
Georgi Gerganov
ae76a092b8
llama.vim : pass filenames for each chunk
ggml-ci
2024-10-21 11:00:20 +03:00
Georgi Gerganov
916c2ee3fd
llama : simplify infill sampler 2024-10-21 11:00:19 +03:00
Georgi Gerganov
bc2857b88c
llama.vim : async context processing
ggml-ci
2024-10-21 11:00:19 +03:00
Georgi Gerganov
2960510153
llama.vim : do not auto-fim when far from the end of the line [no ci] 2024-10-21 11:00:19 +03:00
Georgi Gerganov
d81a0ac185
llama.vim : do not evict certain chunks [no ci] 2024-10-21 11:00:19 +03:00
Georgi Gerganov
27d53cb4ee
llama.vim : logic to evict old chunks that are similar to new one 2024-10-21 11:00:19 +03:00
Georgi Gerganov
f794549bae
llama.vim : gather chunk on leaving buffer [no ci] 2024-10-21 11:00:18 +03:00
Georgi Gerganov
27bc11da0f
llama.vim : update server command [no ci] 2024-10-21 11:00:18 +03:00
Georgi Gerganov
b8890229b6
llama.vim : add ring context from opened files and yanked text 2024-10-21 11:00:18 +03:00
Georgi Gerganov
4f46e29b09
llama : print more info about control tokens 2024-10-21 11:00:18 +03:00
Georgi Gerganov
491f211b4c
llama : improve infill sampler
ggml-ci
2024-10-21 11:00:18 +03:00
Georgi Gerganov
5624e919df
llama.vim : fix docs [no ci] 2024-10-21 11:00:17 +03:00
Georgi Gerganov
c9a46f4bd7
llama.vim : minor [no ci] 2024-10-21 11:00:17 +03:00
Georgi Gerganov
865d9bc48a
llama : clean-up
ggml-ci
2024-10-21 11:00:17 +03:00
Georgi Gerganov
4b1bd81661
llama : simplify infill sampler 2024-10-21 11:00:17 +03:00
Georgi Gerganov
2e8c350a5f
llama.vim : fix edge cases 2024-10-21 11:00:16 +03:00
Georgi Gerganov
6669b550db
llama.vim : set time limit for the generation phase 2024-10-21 11:00:16 +03:00
Georgi Gerganov
c507a65af5
llama.vim : async 2024-10-21 11:00:16 +03:00
Georgi Gerganov
41053f92d3
llama.vim : simplify init and cancel + auto-fim 2024-10-21 11:00:16 +03:00
Georgi Gerganov
7e0b5062af
llama.vim : reduce scope of ids to local [no ci] 2024-10-21 11:00:16 +03:00
Georgi Gerganov
26a0c61e8a
llama.vim : allow repeated suggestions [no ci] 2024-10-21 11:00:15 +03:00
Georgi Gerganov
6e82a03b9d
llama.vim : display realtime [no ci] 2024-10-21 11:00:15 +03:00
Georgi Gerganov
9d13e87b1b
llama.vim : add processing info overlay 2024-10-21 11:00:15 +03:00
Georgi Gerganov
07e7dd47f2
llama.vim : handle space 2024-10-21 11:00:15 +03:00
Georgi Gerganov
0c649c8967
llama.vim : fix suffix construction + fix virt text offset 2024-10-21 11:00:15 +03:00
Georgi Gerganov
0566c69531
llama.vim : neovim plugin 2024-10-21 11:00:14 +03:00
Georgi Gerganov
5aaf24766a
llama : add infill sampler 2024-10-21 11:00:14 +03:00
Georgi Gerganov
55e47786e3
llama : default sampling changes + greedy update (#9897)
* llama : deprecate softmax sampler + fix dist sampler

ggml-ci

* tests : replace macros with functions

ggml-ci

* sampling : change temperature sampler logic

For t <= 0.0f, keep the max logit intact and set the rest to -inf

* cont : no need for special "greedy" logic

top-k == 1 is the same

* tests : init prob correctly

* llama : handle temp <= 0.0 in the temp_ext sampler too

ggml-ci

* cont : avoid extra loop in temperature sampler for sub-zero temp

ggml-ci
2024-10-21 09:46:40 +03:00
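
For reference, a minimal sketch of the t <= 0.0f behaviour described in the commit above, assuming a plain vector of logits (an illustration only, not the actual llama.cpp sampler code):

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Sketch: for t <= 0.0f keep the maximum logit intact and set every other
// logit to -inf; for t > 0.0f apply the usual temperature scaling.
static void apply_temperature(std::vector<float> & logits, float temp) {
    if (temp <= 0.0f) {
        const size_t i_max = (size_t) std::distance(
            logits.begin(), std::max_element(logits.begin(), logits.end()));
        for (size_t i = 0; i < logits.size(); ++i) {
            if (i != i_max) {
                logits[i] = -std::numeric_limits<float>::infinity();
            }
        }
        return;
    }
    for (float & l : logits) {
        l /= temp; // standard temperature scaling
    }
}
```

Because the argmax is then the only finite logit left, a subsequent softmax/dist step picks it deterministically, which is why the commit notes that no special "greedy" path is needed (top-k == 1 behaves the same way).
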
Georgi Gerganov
bc21975084
speculative : fix handling of some input params (#9963)
* speculative : fix batch sizes at initialization

ggml-ci

* speculative : handle params.n_predict == -1

* speculative : limit batch size to llama_n_batch
2024-10-21 09:37:12 +03:00
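
A rough sketch of the parameter handling described above, with assumed variable and function names rather than the actual speculative example code:

```cpp
#include <algorithm>
#include <climits>
#include <cstdint>

// n_predict == -1 conventionally means "no limit"; map it to a large bound so
// the generation loop does not terminate immediately (assumed convention).
static int32_t effective_n_predict(int32_t n_predict) {
    return n_predict < 0 ? INT_MAX : n_predict;
}

// Never draft more tokens per step than the batch size the context was created
// with; llama_n_batch(ctx) reports that limit.
static uint32_t clamp_draft_size(uint32_t n_draft, uint32_t n_batch) {
    return std::min(n_draft, n_batch);
}
```
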
Neo Zhang Jianyu
1db8c84fc6
fix mul_mat_vec_q and *_vec_q error (#9939)
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-10-21 14:26:09 +08:00
Loïc Carrère
45f097645e
readme : update bindings list (#9951)
Update the binding list by adding LM-Kit.NET (C# & VB.NET)
2024-10-20 19:25:41 +03:00
icppWorld
7cab2083c7
readme : update infra list (#9942)
llama_cpp_canister allows you to run llama.cpp as a Smart Contract on the Internet Computer. The smart contract runs as WebAssembly in a so-called 'canister'.
2024-10-20 19:01:34 +03:00
Xuan Son Nguyen
cda0e4b648
llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745)
* refactor llama_batch_get_one

* adapt all examples

* fix simple.cpp

* fix llama_bench

* fix

* fix context shifting

* free batch before return

* use common_batch_add, reuse llama_batch in loop

* null terminated seq_id list

* fix save-load-state example

* fix perplexity

* correct token pos in llama_batch_allocr
2024-10-18 23:18:01 +02:00
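
To illustrate the "use common_batch_add, reuse llama_batch in loop" item, a hedged sketch of how an example might build batches after this refactor (helper signatures assumed from common.h; not code taken from the PR itself):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>
#include "llama.h"
#include "common.h" // common_batch_clear / common_batch_add helpers

// Decode a token list in chunks, reusing a single llama_batch across
// iterations instead of calling llama_batch_get_one() on temporary arrays.
static void decode_tokens(llama_context * ctx, const std::vector<llama_token> & tokens, int32_t n_batch) {
    llama_batch batch = llama_batch_init(n_batch, /*embd=*/0, /*n_seq_max=*/1);

    for (size_t i = 0; i < tokens.size(); i += (size_t) n_batch) {
        common_batch_clear(batch);
        const size_t n = std::min<size_t>((size_t) n_batch, tokens.size() - i);
        for (size_t j = 0; j < n; ++j) {
            // token, position, sequence ids, whether logits are needed for this token
            common_batch_add(batch, tokens[i + j], (llama_pos) (i + j), { 0 }, j == n - 1);
        }
        if (llama_decode(ctx, batch) != 0) {
            break; // decode failed
        }
    }

    llama_batch_free(batch);
}
```
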
Radoslav Gerganov
afd9909a64
rpc : backend refactoring (#9912)
* rpc : refactor backend

Use structs for RPC request/response messages

* rpc : refactor server
2024-10-18 14:33:58 +03:00
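
A small illustration of the "structs for RPC request/response messages" idea; the message and field names below are assumptions made for the sketch, not the actual rpc backend definitions:

```cpp
#include <cstdint>

// Each RPC command gets a fixed-layout request/response pair that is written
// to and read from the socket as one unit, instead of ad-hoc field packing.
struct rpc_msg_alloc_buffer_req {
    uint64_t size;        // requested buffer size in bytes
};

struct rpc_msg_alloc_buffer_rsp {
    uint64_t remote_ptr;  // handle of the buffer allocated on the server
    uint64_t remote_size; // actual size granted
};
```
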
Ouadie EL FAROUKI
87421a23e8
[SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705)
* implemented missing SYCL event APIs

* sycl : Added device and backend reg interfaces

* Restructured ggml-sycl.cpp
2024-10-18 06:46:16 +01:00
Ma Mingfei
60ce97c9d8
add amx kernel for gemm (#8998)
add intel amx isa detection

add vnni kernel for gemv cases

add vnni and amx kernel support for block_q8_0

code cleanup

fix packing B issue

enable openmp

fine tune amx kernel

switch to aten parallel pattern

add error message for nested parallelism

code cleanup

add f16 support in ggml-amx

add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS

update CMakeList

update README

fix some compilation warnings

fix compiler warning when amx is not enabled

minor change

ggml-ci

move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp

ggml-ci

update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16

ggml-ci

add amx as a ggml-backend

update header file; the old path for immintrin.h has changed to ggml-cpu-impl.h

minor change

update CMakeLists.txt

minor change

apply weight prepacking in set_tensor method in ggml-backend

fix compile error

ggml-ci

minor change

ggml-ci

update CMakeLists.txt

ggml-ci

add march dependency

minor change

ggml-ci

change ggml_backend_buffer_is_host to return false for amx backend

ggml-ci

fix supports_op

use device reg for AMX backend

ggml-ci

minor change

ggml-ci

minor change

fix rebase

set .buffer_from_host_ptr to be false for AMX backend
2024-10-18 13:34:36 +08:00
Georgi Gerganov
8901755ba3
server : add n_indent parameter for line indentation requirement (#9929)
ggml-ci
2024-10-18 07:32:19 +03:00
Daniel Bevenius
6f55bccbb8
llama : rename batch_all to batch (#8881)
This commit addresses the TODO in the code to rename the `batch_all`
parameter to `batch` in `llama_decode_internal`.
2024-10-18 01:41:51 +02:00
Georgi Gerganov
17bb928080
readme : remove --memory-f32 references (#9925) 2024-10-17 23:43:05 +03:00
Georgi Gerganov
9f45fc1e99
llama : change warning to debug log 2024-10-17 23:27:42 +03:00
Georgi Gerganov
99bd4ac28c
llama : infill sampling handle very long tokens (#9924)
* llama : infill sampling handle very long tokens

ggml-ci

* cont : better indices

ggml-ci
2024-10-17 22:32:47 +03:00
Tim Wang
3752217ed5
readme : update bindings list (#9918)
Co-authored-by: Tim Wang <tim.wang@ing.com>
2024-10-17 09:57:14 +03:00
Diego Devesa
f010b77a37
vulkan : add backend registry / device interfaces (#9721)
* vulkan : add backend registry / device interfaces

* llama : print devices used on model load
2024-10-17 02:46:58 +02:00
Gilad S.
2194200278
fix: allocating CPU buffer with size 0 (#9917) 2024-10-17 01:34:22 +02:00
Gilad S.
73afe681aa
fix: use vm_allocate to allocate CPU backend buffer on macOS (#9875)
* fix: use `vm_allocate` to allocate CPU backend buffer on macOS

* fix: switch to `posix_memalign` to keep existing `free()` usages working

* feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS

* style: formatting

* fix: move const outside of `#ifndef`

* style: formatting

* fix: unused var

* fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h`

* fix: unused var

* fix: page align to `GGUF_DEFAULT_ALIGNMENT`

* fix: page align to `TENSOR_ALIGNMENT`

* fix: convert `TENSOR_ALIGNMENT` to a macro

* fix: increase page size to `32` on iOS

* fix: iOS page size

* fix: `hbw_posix_memalign` alignment
2024-10-17 00:36:51 +02:00
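
A hedged sketch of the two allocation paths this series moves between; the alignment value and function name are placeholders, not the ggml implementation:

```cpp
#include <cstddef>
#include <cstdlib>
#ifdef __APPLE__
#include <mach/mach.h>
#endif

// Placeholder value; the commits align to GGUF_DEFAULT_ALIGNMENT and later TENSOR_ALIGNMENT.
#define TENSOR_ALIGNMENT 32

static void * aligned_alloc_sketch(size_t size) {
#ifdef __APPLE__
    // vm_allocate returns page-aligned, zero-filled memory, but it must be
    // released with vm_deallocate rather than free() - which is why the series
    // also keeps a posix_memalign path for existing free() call sites.
    vm_address_t addr = 0;
    if (vm_allocate(mach_task_self(), &addr, size, VM_FLAGS_ANYWHERE) != KERN_SUCCESS) {
        return nullptr;
    }
    return (void *) addr;
#else
    void * ptr = nullptr;
    if (posix_memalign(&ptr, TENSOR_ALIGNMENT, size) != 0) {
        return nullptr;
    }
    return ptr; // release with free()
#endif
}
```
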
Daniel Bevenius
9e04102448
llama : suppress conversion from 'size_t' to 'int' (#9046)
* llama : suppress conversion from 'size_t' to 'int'

This commit updates llm_tokenizer_spm.tokenize to suppress/remove the
following warnings that are generated on Windows when using MSVC:

```console
src\llama-vocab.cpp(211,1): warning C4267: 'argument':
    conversion from 'size_t' to 'int', possible loss of data
src\llama-vocab.cpp(517,1): warning C4267: 'argument':
    conversion from 'size_t' to 'int', possible loss of data
```

This is done by adding a cast for the size_t returned from
symbols.size(). I believe this is safe as it seems unlikely that
symbols, which stores an entry for each UTF8 character, would become
larger than INT_MAX.

The motivation for this change is to reduce the number of warnings that
are currently generated when building on Windows.

* squash! llama : suppress conversion from 'size_t' to 'int'

Move cast into for loop.
2024-10-16 20:34:28 +03:00
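
A small sketch of the cast-in-the-loop pattern the commit describes (illustrative names, not the actual llm_tokenizer_spm code):

```cpp
#include <string>
#include <vector>

// Casting symbols.size() to int in the loop condition silences MSVC warning
// C4267 (size_t -> int conversion). This is safe only because the number of
// symbols is expected to stay far below INT_MAX.
static void process_symbols(const std::vector<std::string> & symbols) {
    for (int i = 0; i < (int) symbols.size(); ++i) {
        // ... tokenizer work on symbols[i] ...
        (void) symbols[i];
    }
}
```
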
Daniel Bevenius
dbf18e4de9
llava : fix typo in error message [no ci] (#9884) 2024-10-16 20:24:05 +03:00