Commit graph

3988 commits

Author SHA1 Message Date
Georgi Gerganov
b8efb0725d
llama.vim : minor [no ci] 2024-10-21 11:00:22 +03:00
Georgi Gerganov
fe78c39399
llama.vim : fix large chunk accept + comments [no ci] 2024-10-21 11:00:22 +03:00
Georgi Gerganov
6bb6e6dd80
llama.vim : display ring capacity [no ci] 2024-10-21 11:00:22 +03:00
Georgi Gerganov
1600d846b6
llama.vim : complete only within the local scope [no ci] 2024-10-21 11:00:22 +03:00
Georgi Gerganov
d1b8b215d5
llama.vim : fix repetitions of existing text 2024-10-21 11:00:21 +03:00
Georgi Gerganov
4583aef12b
llama.vim : final touches
ggml-ci
2024-10-21 11:00:21 +03:00
Georgi Gerganov
847c8c023e
llama.vim : update infill API params [no ci] 2024-10-21 11:00:21 +03:00
Georgi Gerganov
060573f7e8
llama.vim : add comments [no ci] 2024-10-21 11:00:21 +03:00
Georgi Gerganov
42a9008b31
llama.vim : process extra chunks in the background [no ci] 2024-10-21 11:00:21 +03:00
Georgi Gerganov
0c1f51b73e
llama : improve infill sampler
ggml-ci
2024-10-21 11:00:20 +03:00
Georgi Gerganov
e4be74b4b7
llama.vim : add top_p + improve responsiveness + fix edge cases 2024-10-21 11:00:20 +03:00
Georgi Gerganov
25ecb35c4f
llama.vim : simplify job logic + improve robustness and responsiveness 2024-10-21 11:00:20 +03:00
Georgi Gerganov
9f8fa900f6
llama.vim : fix repetitions [no ci] 2024-10-21 11:00:20 +03:00
Georgi Gerganov
ae76a092b8
llama.vim : pass filenames for each chunk
ggml-ci
2024-10-21 11:00:20 +03:00
Georgi Gerganov
916c2ee3fd
llama : simplify infill sampler 2024-10-21 11:00:19 +03:00
Georgi Gerganov
bc2857b88c
llama.vim : async context processing
ggml-ci
2024-10-21 11:00:19 +03:00
Georgi Gerganov
2960510153
llama.vim : do not auto-fim when far from the end of the line [no ci] 2024-10-21 11:00:19 +03:00
Georgi Gerganov
d81a0ac185
llama.vim : do not evict certain chunks [no ci] 2024-10-21 11:00:19 +03:00
Georgi Gerganov
27d53cb4ee
llama.vim : logic to evict old chunks that are similar to the new one 2024-10-21 11:00:19 +03:00
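The eviction logic this commit describes (drop old ring-context chunks that overlap heavily with a newly gathered one) might look roughly like the sketch below. The line-overlap similarity measure, the 0.5 threshold, and all names are assumptions for illustration, not the plugin's actual implementation (which lives in Vimscript):

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// A context chunk is treated here as a list of lines. If an old chunk shares
// more than a threshold fraction of its lines with the incoming chunk, it is
// evicted from the ring; measure and threshold are illustrative assumptions.
using chunk = std::vector<std::string>;

static double similarity(const chunk & a, const chunk & b) {
    if (a.empty()) {
        return 0.0;
    }
    std::unordered_set<std::string> lines_b(b.begin(), b.end());
    size_t shared = 0;
    for (const auto & line : a) {
        if (lines_b.count(line)) {
            shared++;
        }
    }
    return double(shared) / double(a.size());
}

static void evict_similar(std::vector<chunk> & ring, const chunk & incoming, double thresh = 0.5) {
    // erase-remove idiom: drop every old chunk too similar to the new one
    ring.erase(std::remove_if(ring.begin(), ring.end(),
        [&](const chunk & c) { return similarity(c, incoming) > thresh; }),
        ring.end());
}
```

The next entry ("do not evict certain chunks") suggests some chunks are additionally pinned and exempt from this check.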
Georgi Gerganov
f794549bae
llama.vim : gather chunk on leaving buffer [no ci] 2024-10-21 11:00:18 +03:00
Georgi Gerganov
27bc11da0f
llama.vim : update server command [no ci] 2024-10-21 11:00:18 +03:00
Georgi Gerganov
b8890229b6
llama.vim : add ring context from opened files and yanked text 2024-10-21 11:00:18 +03:00
Georgi Gerganov
4f46e29b09
llama : print more info about control tokens 2024-10-21 11:00:18 +03:00
Georgi Gerganov
491f211b4c
llama : improve infill sampler
ggml-ci
2024-10-21 11:00:18 +03:00
Georgi Gerganov
5624e919df
llama.vim : fix docs [no ci] 2024-10-21 11:00:17 +03:00
Georgi Gerganov
c9a46f4bd7
llama.vim : minor [no ci] 2024-10-21 11:00:17 +03:00
Georgi Gerganov
865d9bc48a
llama : clean-up
ggml-ci
2024-10-21 11:00:17 +03:00
Georgi Gerganov
4b1bd81661
llama : simplify infill sampler 2024-10-21 11:00:17 +03:00
Georgi Gerganov
2e8c350a5f
llama.vim : fix edge cases 2024-10-21 11:00:16 +03:00
Georgi Gerganov
6669b550db
llama.vim : set time limit for the generation phase 2024-10-21 11:00:16 +03:00
Georgi Gerganov
c507a65af5
llama.vim : async 2024-10-21 11:00:16 +03:00
Georgi Gerganov
41053f92d3
llama.vim : simplify init and cancel + auto-fim 2024-10-21 11:00:16 +03:00
Georgi Gerganov
7e0b5062af
llama.vim : reduce scope of ids to local [no ci] 2024-10-21 11:00:16 +03:00
Georgi Gerganov
26a0c61e8a
llama.vim : allow repeated suggestions [no ci] 2024-10-21 11:00:15 +03:00
Georgi Gerganov
6e82a03b9d
llama.vim : display realtime [no ci] 2024-10-21 11:00:15 +03:00
Georgi Gerganov
9d13e87b1b
llama.vim : add processing info overlay 2024-10-21 11:00:15 +03:00
Georgi Gerganov
07e7dd47f2
llama.vim : handle space 2024-10-21 11:00:15 +03:00
Georgi Gerganov
0c649c8967
llama.vim : fix suffix construction + fix virt text offset 2024-10-21 11:00:15 +03:00
Georgi Gerganov
0566c69531
llama.vim : neovim plugin 2024-10-21 11:00:14 +03:00
Georgi Gerganov
5aaf24766a
llama : add infill sampler 2024-10-21 11:00:14 +03:00
Georgi Gerganov
55e47786e3
llama : default sampling changes + greedy update (#9897)
* llama : deprecate softmax sampler + fix dist sampler

ggml-ci

* tests : replace macros with functions

ggml-ci

* sampling : change temperature sampler logic

For t <= 0.0f, keep the max logit intact and set the rest to -inf

* cont : no need for special "greedy" logic

top-k == 1 is the same

* tests : init prob correctly

* llama : handle temp <= 0.0 in the temp_ext sampler too

ggml-ci

* cont : avoid extra loop in temperature sampler for sub-zero temp

ggml-ci
2024-10-21 09:46:40 +03:00
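The temperature rule described in this commit (for t <= 0.0f, keep the max logit intact and set the rest to -inf, so no separate "greedy" path is needed because it matches top-k == 1) can be sketched as follows. The function name and flat logit vector are illustrative, not the actual llama.cpp sampler API:

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Sketch of the commit's temperature logic: for t <= 0.0f the maximum logit
// is kept intact and every other logit is set to -inf, which makes sampling
// from the result equivalent to greedy (top-k == 1) selection.
static void apply_temp(std::vector<float> & logits, float t) {
    if (t <= 0.0f) {
        const auto it_max = std::max_element(logits.begin(), logits.end());
        for (auto & l : logits) {
            if (&l != &*it_max) {
                l = -std::numeric_limits<float>::infinity();
            }
        }
        return; // no extra loop needed for sub-zero temperatures
    }
    for (auto & l : logits) {
        l /= t; // regular temperature scaling for t > 0
    }
}
```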
Georgi Gerganov
bc21975084
speculative : fix handling of some input params (#9963)
* speculative : fix batch sizes at initialization

ggml-ci

* speculative : handle params.n_predict == -1

* speculative : limit batch size to llama_n_batch
2024-10-21 09:37:12 +03:00
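The two parameter fixes listed above can be sketched roughly as below. The struct and helper names are hypothetical; the only facts taken from the commit message are that n_predict == -1 must be handled (conventionally "no limit" in the llama.cpp examples) and that the batch size is clamped to llama_n_batch:

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative sketch, not the actual speculative example code.
struct spec_params {
    int32_t n_predict; // -1 is treated as "no limit"
    int32_t n_draft;   // requested draft batch size
};

// Resolve n_predict against the caller's own remaining budget.
static int32_t effective_n_predict(const spec_params & p, int32_t n_remaining) {
    return p.n_predict < 0 ? n_remaining : std::min(p.n_predict, n_remaining);
}

// Never submit a batch larger than what the context was created with.
static int32_t effective_batch(const spec_params & p, int32_t n_batch) {
    return std::min(p.n_draft, n_batch);
}
```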
Neo Zhang Jianyu
1db8c84fc6
fix mul_mat_vec_q and *_vec_q error (#9939)
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-10-21 14:26:09 +08:00
Loïc Carrère
45f097645e
readme : update bindings list (#9951)
Update the binding list by adding LM-Kit.NET (C# & VB.NET)
2024-10-20 19:25:41 +03:00
icppWorld
7cab2083c7
readme : update infra list (#9942)
llama_cpp_canister allows you to run llama.cpp as a Smart Contract on the Internet Computer. The smart contract runs as WebAssembly in a so-called 'canister'.
2024-10-20 19:01:34 +03:00
Xuan Son Nguyen
cda0e4b648
llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745)
* refactor llama_batch_get_one

* adapt all examples

* fix simple.cpp

* fix llama_bench

* fix

* fix context shifting

* free batch before return

* use common_batch_add, reuse llama_batch in loop

* null terminated seq_id list

* fix save-load-state example

* fix perplexity

* correct token pos in llama_batch_allocr
2024-10-18 23:18:01 +02:00
Radoslav Gerganov
afd9909a64
rpc : backend refactoring (#9912)
* rpc : refactor backend

Use structs for RPC request/response messages

* rpc : refactor server
2024-10-18 14:33:58 +03:00
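"Use structs for RPC request/response messages" refers to replacing ad-hoc byte packing with fixed-layout message types; a minimal sketch of that pattern is below. The message and field names are assumptions for illustration, not the actual rpc backend types:

```cpp
#include <cstdint>

// Illustrative only: a request/response pair with a fixed wire layout, so
// both client and server can serialize by copying the struct directly
// instead of hand-packing individual fields into a byte buffer.
struct rpc_msg_alloc_buffer_req {
    uint64_t size;        // requested buffer size in bytes
};

struct rpc_msg_alloc_buffer_rsp {
    uint64_t remote_ptr;  // handle to the buffer on the server
    uint64_t remote_size; // actual allocated size
};
```

Fixed-width integer fields keep the layout predictable across client and server builds.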
Ouadie EL FAROUKI
87421a23e8
[SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705)
* implemented missing SYCL event APIs

* sycl : Added device and backend reg interfaces

* Restructured ggml-sycl.cpp
2024-10-18 06:46:16 +01:00
Ma Mingfei
60ce97c9d8
add amx kernel for gemm (#8998)
add intel amx isa detection

add vnni kernel for gemv cases

add vnni and amx kernel support for block_q8_0

code cleanup

fix packing B issue

enable openmp

fine tune amx kernel

switch to aten parallel pattern

add error message for nested parallelism

code cleanup

add f16 support in ggml-amx

add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS

update CMakeList

update README

fix some compilation warning

fix compiler warning when amx is not enabled

minor change

ggml-ci

move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp

ggml-ci

update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16

ggml-ci

add amx as a ggml-backend

update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h

minor change

update CMakeLists.txt

minor change

apply weight prepacking in set_tensor method in ggml-backend

fix compile error

ggml-ci

minor change

ggml-ci

update CMakeLists.txt

ggml-ci

add march dependency

minor change

ggml-ci

change ggml_backend_buffer_is_host to return false for amx backend

ggml-ci

fix supports_op

use device reg for AMX backend

ggml-ci

minor change

ggml-ci

minor change

fix rebase

set .buffer_from_host_ptr to be false for AMX backend
2024-10-18 13:34:36 +08:00
Georgi Gerganov
8901755ba3
server : add n_indent parameter for line indentation requirement (#9929)
ggml-ci
2024-10-18 07:32:19 +03:00