Commit graph

3978 commits

Author SHA1 Message Date
Georgi Gerganov
e4be74b4b7
llama.vim : add top_p + improve responsiveness + fix edge cases 2024-10-21 11:00:20 +03:00
Georgi Gerganov
25ecb35c4f
llama.vim : simplify job logic + improve robustness and responsiveness 2024-10-21 11:00:20 +03:00
Georgi Gerganov
9f8fa900f6
llama.vim : fix repetitions [no ci] 2024-10-21 11:00:20 +03:00
Georgi Gerganov
ae76a092b8
llama.vim : pass filenames for each chunk
ggml-ci
2024-10-21 11:00:20 +03:00
Georgi Gerganov
916c2ee3fd
llama : simplify infill sampler 2024-10-21 11:00:19 +03:00
Georgi Gerganov
bc2857b88c
llama.vim : async context processing
ggml-ci
2024-10-21 11:00:19 +03:00
Georgi Gerganov
2960510153
llama.vim : do not auto-fim when far from the end of the line [no ci] 2024-10-21 11:00:19 +03:00
Georgi Gerganov
d81a0ac185
llama.vim : do not evict certain chunks [no ci] 2024-10-21 11:00:19 +03:00
Georgi Gerganov
27d53cb4ee
llama.vim : logic to evict old chunks that are similar to new one 2024-10-21 11:00:19 +03:00
Georgi Gerganov
f794549bae
llama.vim : gather chunk on leaving buffer [no ci] 2024-10-21 11:00:18 +03:00
Georgi Gerganov
27bc11da0f
llama.vim : update server command [no ci] 2024-10-21 11:00:18 +03:00
Georgi Gerganov
b8890229b6
llama.vim : add ring context from opened files and yanked text 2024-10-21 11:00:18 +03:00
Georgi Gerganov
4f46e29b09
llama : print more info about control tokens 2024-10-21 11:00:18 +03:00
Georgi Gerganov
491f211b4c
llama : improve infill sampler
ggml-ci
2024-10-21 11:00:18 +03:00
Georgi Gerganov
5624e919df
llama.vim : fix docs [no ci] 2024-10-21 11:00:17 +03:00
Georgi Gerganov
c9a46f4bd7
llama.vim : minor [no ci] 2024-10-21 11:00:17 +03:00
Georgi Gerganov
865d9bc48a
llama : clean-up
ggml-ci
2024-10-21 11:00:17 +03:00
Georgi Gerganov
4b1bd81661
llama : simplify infill sampler 2024-10-21 11:00:17 +03:00
Georgi Gerganov
2e8c350a5f
llama.vim : fix edge cases 2024-10-21 11:00:16 +03:00
Georgi Gerganov
6669b550db
llama.vim : set time limit for the generation phase 2024-10-21 11:00:16 +03:00
Georgi Gerganov
c507a65af5
llama.vim : async 2024-10-21 11:00:16 +03:00
Georgi Gerganov
41053f92d3
llama.vim : simplify init and cancel + auto-fim 2024-10-21 11:00:16 +03:00
Georgi Gerganov
7e0b5062af
llama.vim : reduce scope of ids to local [no ci] 2024-10-21 11:00:16 +03:00
Georgi Gerganov
26a0c61e8a
llama.vim : allow repeated suggestions [no ci] 2024-10-21 11:00:15 +03:00
Georgi Gerganov
6e82a03b9d
llama.vim : display realtime [no ci] 2024-10-21 11:00:15 +03:00
Georgi Gerganov
9d13e87b1b
llama.vim : add processing info overlay 2024-10-21 11:00:15 +03:00
Georgi Gerganov
07e7dd47f2
llama.vim : handle space 2024-10-21 11:00:15 +03:00
Georgi Gerganov
0c649c8967
llama.vim : fix suffix construction + fix virt text offset 2024-10-21 11:00:15 +03:00
Georgi Gerganov
0566c69531
llama.vim : neovim plugin 2024-10-21 11:00:14 +03:00
Georgi Gerganov
5aaf24766a
llama : add infill sampler 2024-10-21 11:00:14 +03:00
Georgi Gerganov
55e47786e3
llama : default sampling changes + greedy update (#9897)
* llama : deprecate softmax sampler + fix dist sampler

ggml-ci

* tests : replace macros with functions

ggml-ci

* sampling : change temperature sampler logic

For t <= 0.0f, keep the max logit intact and set the rest to -inf

* cont : no need for special "greedy" logic

top-k == 1 is the same

* tests : init prob correctly

* llama : handle temp <= 0.0 in the temp_ext sampler too

ggml-ci

* cont : avoid extra loop in temperature sampler for sub-zero temp

ggml-ci
2024-10-21 09:46:40 +03:00
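
For reference, a minimal sketch of the t <= 0.0f behaviour described in the commit above, assuming a plain vector of logits (an illustration only, not the actual llama.cpp sampler code):

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Sketch: for t <= 0.0f keep the maximum logit intact and set every other
// logit to -inf; for t > 0.0f apply the usual temperature scaling.
static void apply_temperature(std::vector<float> & logits, float temp) {
    if (temp <= 0.0f) {
        const size_t i_max = (size_t) std::distance(
            logits.begin(), std::max_element(logits.begin(), logits.end()));
        for (size_t i = 0; i < logits.size(); ++i) {
            if (i != i_max) {
                logits[i] = -std::numeric_limits<float>::infinity();
            }
        }
        return;
    }
    for (float & l : logits) {
        l /= temp; // standard temperature scaling
    }
}
```

Because the argmax is then the only finite logit left, a subsequent softmax/dist step picks it deterministically, which is why the commit notes that no special "greedy" path is needed (top-k == 1 behaves the same way).
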
Georgi Gerganov
bc21975084
speculative : fix handling of some input params (#9963)
* speculative : fix batch sizes at initialization

ggml-ci

* speculative : handle params.n_predict == -1

* speculative : limit batch size to llama_n_batch
2024-10-21 09:37:12 +03:00
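
A rough sketch of the parameter handling described above, with assumed variable and function names rather than the actual speculative example code:

```cpp
#include <algorithm>
#include <climits>
#include <cstdint>

// n_predict == -1 conventionally means "no limit"; map it to a large bound so
// the generation loop does not terminate immediately (assumed convention).
static int32_t effective_n_predict(int32_t n_predict) {
    return n_predict < 0 ? INT_MAX : n_predict;
}

// Never draft more tokens per step than the batch size the context was created
// with; llama_n_batch(ctx) reports that limit.
static uint32_t clamp_draft_size(uint32_t n_draft, uint32_t n_batch) {
    return std::min(n_draft, n_batch);
}
```
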
Neo Zhang Jianyu
1db8c84fc6
fix mul_mat_vec_q and *_vec_q error (#9939)
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-10-21 14:26:09 +08:00
Loïc Carrère
45f097645e
readme : update bindings list (#9951)
Update the binding list by adding LM-Kit.NET (C# & VB.NET)
2024-10-20 19:25:41 +03:00
icppWorld
7cab2083c7
readme : update infra list (#9942)
llama_cpp_canister allows you to run llama.cpp as a Smart Contract on the Internet Computer. The smart contract runs as WebAssembly in a so-called 'canister'.
2024-10-20 19:01:34 +03:00
Xuan Son Nguyen
cda0e4b648
llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745)
* refactor llama_batch_get_one

* adapt all examples

* fix simple.cpp

* fix llama_bench

* fix

* fix context shifting

* free batch before return

* use common_batch_add, reuse llama_batch in loop

* null terminated seq_id list

* fix save-load-state example

* fix perplexity

* correct token pos in llama_batch_allocr
2024-10-18 23:18:01 +02:00
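
To illustrate the "use common_batch_add, reuse llama_batch in loop" item, a hedged sketch of how an example might build batches after this refactor (helper signatures assumed from common.h; not code taken from the PR itself):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>
#include "llama.h"
#include "common.h" // common_batch_clear / common_batch_add helpers

// Decode a token list in chunks, reusing a single llama_batch across
// iterations instead of calling llama_batch_get_one() on temporary arrays.
static void decode_tokens(llama_context * ctx, const std::vector<llama_token> & tokens, int32_t n_batch) {
    llama_batch batch = llama_batch_init(n_batch, /*embd=*/0, /*n_seq_max=*/1);

    for (size_t i = 0; i < tokens.size(); i += (size_t) n_batch) {
        common_batch_clear(batch);
        const size_t n = std::min<size_t>((size_t) n_batch, tokens.size() - i);
        for (size_t j = 0; j < n; ++j) {
            // token, position, sequence ids, whether logits are needed for this token
            common_batch_add(batch, tokens[i + j], (llama_pos) (i + j), { 0 }, j == n - 1);
        }
        if (llama_decode(ctx, batch) != 0) {
            break; // decode failed
        }
    }

    llama_batch_free(batch);
}
```
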
Radoslav Gerganov
afd9909a64
rpc : backend refactoring (#9912)
* rpc : refactor backend

Use structs for RPC request/response messages

* rpc : refactor server
2024-10-18 14:33:58 +03:00
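
A small illustration of the "structs for RPC request/response messages" idea; the message and field names below are assumptions made for the sketch, not the actual rpc backend definitions:

```cpp
#include <cstdint>

// Each RPC command gets a fixed-layout request/response pair that is written
// to and read from the socket as one unit, instead of ad-hoc field packing.
struct rpc_msg_alloc_buffer_req {
    uint64_t size;        // requested buffer size in bytes
};

struct rpc_msg_alloc_buffer_rsp {
    uint64_t remote_ptr;  // handle of the buffer allocated on the server
    uint64_t remote_size; // actual size granted
};
```
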
Ouadie EL FAROUKI
87421a23e8
[SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705)
* implemented missing SYCL event APIs

* sycl : Added device and backend reg interfaces

* Restructured ggml-sycl.cpp
2024-10-18 06:46:16 +01:00
Ma Mingfei
60ce97c9d8
add amx kernel for gemm (#8998)
add intel amx isa detection

add vnni kernel for gemv cases

add vnni and amx kernel support for block_q8_0

code cleanup

fix packing B issue

enable openmp

fine tune amx kernel

switch to aten parallel pattern

add error message for nested parallelism

code cleanup

add f16 support in ggml-amx

add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS

update CMakeList

update README

fix some compilation warnings

fix compiler warning when amx is not enabled

minor change

ggml-ci

move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp

ggml-ci

update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16

ggml-ci

add amx as a ggml-backend

update header file; the old path for immintrin.h has changed to ggml-cpu-impl.h

minor change

update CMakeLists.txt

minor change

apply weight prepacking in set_tensor method in ggml-backend

fix compile error

ggml-ci

minor change

ggml-ci

update CMakeLists.txt

ggml-ci

add march dependency

minor change

ggml-ci

change ggml_backend_buffer_is_host to return false for amx backend

ggml-ci

fix supports_op

use device reg for AMX backend

ggml-ci

minor change

ggml-ci

minor change

fix rebase

set .buffer_from_host_ptr to be false for AMX backend
2024-10-18 13:34:36 +08:00
Georgi Gerganov
8901755ba3
server : add n_indent parameter for line indentation requirement (#9929)
ggml-ci
2024-10-18 07:32:19 +03:00
Daniel Bevenius
6f55bccbb8
llama : rename batch_all to batch (#8881)
This commit addresses the TODO in the code to rename the `batch_all`
parameter to `batch` in `llama_decode_internal`.
2024-10-18 01:41:51 +02:00
Georgi Gerganov
17bb928080
readme : remove --memory-f32 references (#9925) 2024-10-17 23:43:05 +03:00
Georgi Gerganov
9f45fc1e99
llama : change warning to debug log 2024-10-17 23:27:42 +03:00
Georgi Gerganov
99bd4ac28c
llama : infill sampling handle very long tokens (#9924)
* llama : infill sampling handle very long tokens

ggml-ci

* cont : better indices

ggml-ci
2024-10-17 22:32:47 +03:00
Tim Wang
3752217ed5
readme : update bindings list (#9918)
Co-authored-by: Tim Wang <tim.wang@ing.com>
2024-10-17 09:57:14 +03:00
Diego Devesa
f010b77a37
vulkan : add backend registry / device interfaces (#9721)
* vulkan : add backend registry / device interfaces

* llama : print devices used on model load
2024-10-17 02:46:58 +02:00
Gilad S.
2194200278
fix: allocating CPU buffer with size 0 (#9917) 2024-10-17 01:34:22 +02:00
Gilad S.
73afe681aa
fix: use vm_allocate to allocate CPU backend buffer on macOS (#9875)
* fix: use `vm_allocate` to allocate CPU backend buffer on macOS

* fix: switch to `posix_memalign` to keep existing `free()` usages working

* feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS

* style: formatting

* fix: move const outside of `#ifndef`

* style: formatting

* fix: unused var

* fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h`

* fix: unused var

* fix: page align to `GGUF_DEFAULT_ALIGNMENT`

* fix: page align to `TENSOR_ALIGNMENT`

* fix: convert `TENSOR_ALIGNMENT` to a macro

* fix: increase page size to `32` on iOS

* fix: iOS page size

* fix: `hbw_posix_memalign` alignment
2024-10-17 00:36:51 +02:00
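
A hedged sketch of the two allocation paths this series moves between; the alignment value and function name are placeholders, not the ggml implementation:

```cpp
#include <cstddef>
#include <cstdlib>
#ifdef __APPLE__
#include <mach/mach.h>
#endif

// Placeholder value; the commits align to GGUF_DEFAULT_ALIGNMENT and later TENSOR_ALIGNMENT.
#define TENSOR_ALIGNMENT 32

static void * aligned_alloc_sketch(size_t size) {
#ifdef __APPLE__
    // vm_allocate returns page-aligned, zero-filled memory, but it must be
    // released with vm_deallocate rather than free() - which is why the series
    // also keeps a posix_memalign path for existing free() call sites.
    vm_address_t addr = 0;
    if (vm_allocate(mach_task_self(), &addr, size, VM_FLAGS_ANYWHERE) != KERN_SUCCESS) {
        return nullptr;
    }
    return (void *) addr;
#else
    void * ptr = nullptr;
    if (posix_memalign(&ptr, TENSOR_ALIGNMENT, size) != 0) {
        return nullptr;
    }
    return ptr; // release with free()
#endif
}
```
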
Daniel Bevenius
9e04102448
llama : suppress conversion from 'size_t' to 'int' (#9046)
* llama : suppress conversion from 'size_t' to 'int'

This commit updates llm_tokenizer_spm.tokenize to suppress/remove the
following warnings that are generated on Windows when using MSVC:

```console
src\llama-vocab.cpp(211,1): warning C4267: 'argument':
    conversion from 'size_t' to 'int', possible loss of data
src\llama-vocab.cpp(517,1): warning C4267: 'argument':
    conversion from 'size_t' to 'int', possible loss of data
```

This is done by adding a cast for the size_t returned from
symbols.size(). I believe this is safe as it seems unlikely that
symbols, which stores an entry for each UTF8 character, would become
larger than INT_MAX.

The motivation for this change is to reduce the number of warnings that
are currently generated when building on Windows.

* squash! llama : suppress conversion from 'size_t' to 'int'

Move cast into for loop.
2024-10-16 20:34:28 +03:00
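
A small sketch of the cast-in-the-loop pattern the commit describes (illustrative names, not the actual llm_tokenizer_spm code):

```cpp
#include <string>
#include <vector>

// Casting symbols.size() to int in the loop condition silences MSVC warning
// C4267 (size_t -> int conversion). This is safe only because the number of
// symbols is expected to stay far below INT_MAX.
static void process_symbols(const std::vector<std::string> & symbols) {
    for (int i = 0; i < (int) symbols.size(); ++i) {
        // ... tokenizer work on symbols[i] ...
        (void) symbols[i];
    }
}
```
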
Daniel Bevenius
dbf18e4de9
llava : fix typo in error message [no ci] (#9884) 2024-10-16 20:24:05 +03:00