llama.cpp

Author	SHA1	Message	Date
Victor Oluwadare	c2c2626ec6	Added support for SFTTrainer checkpoint models and adapter models containing one or more non-LoRA weights My initial commit was more like a brute force. The edits suggested by @FirstTimeEZ reduces the complexity.	2024-10-08 20:31:43 +01:00
Victor Oluwadare	c6396aa4bb	Added support for SFTTrainer checkpoint models and adapter models containing some non-LoRA weights The previous code triggers an unexpected name error and calls sys.exit(1) (lines 350-351 current version) even if a single weight in the lora_model is not a lora_A, lora_B, or base layer weight. This edit collects the names of all LoRA weights in the model before the for loop in line 341 (current version). And in line 350 (edit version), the subsequent operations are performed only on the LoRA and base layer weights, ignoring any non-LoRA weights in the lora_model. Hopefully, this helps by allowing the script to extract LoRA weights and convert LoRA to GGUF for adapters containing one or more non-LoRA weights.	2024-10-08 02:35:08 +01:00
Diego Devesa	6374743747	ggml : add backend registry / device interfaces to BLAS backend (#9752) * ggml : add backend registry / device interfaces to BLAS backend * fix mmap usage when using host buffers	2024-10-07 21:55:08 +02:00
Andrew Minh Nguyen	f1af42fa8c	Update building for Android (#9672) * docs : clarify building Android on Termux * docs : update building Android on Termux * docs : add cross-compiling for Android * cmake : link dl explicitly for Android	2024-10-07 09:37:31 -07:00
Georgi Gerganov	6279dac039	flake.lock: Update (#9753) Flake lock file updates: • Updated input 'flake-parts': 'github:hercules-ci/flake-parts/bcef6817a8b2aa20a5a6dbb19b43e63c5bf8619a?narHash=sha256-HO4zgY0ekfwO5bX0QH/3kJ/h4KvUDFZg8YpkNwIbg1U%3D' (2024-09-12) → 'github:hercules-ci/flake-parts/3d04084d54bedc3d6b8b736c70ef449225c361b1?narHash=sha256-K5ZLCyfO/Zj9mPFldf3iwS6oZStJcU4tSpiXTMYaaL0%3D' (2024-10-01) • Updated input 'flake-parts/nixpkgs-lib': 'https://github.com/NixOS/nixpkgs/archive/356624c12086a18f2ea2825fed34523d60ccc4e3.tar.gz?narHash=sha256-Ss8QWLXdr2JCBPcYChJhz4xJm%2Bh/xjl4G0c0XlP6a74%3D' (2024-09-01) → 'https://github.com/NixOS/nixpkgs/archive/fb192fec7cc7a4c26d51779e9bab07ce6fa5597a.tar.gz?narHash=sha256-0xHYkMkeLVQAMa7gvkddbPqpxph%2BhDzdu1XdGPJR%2BOs%3D' (2024-10-01) • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/1925c603f17fc89f4c8f6bf6f631a802ad85d784?narHash=sha256-J%2BPeFKSDV%2BpHL7ukkfpVzCOO7mBSrrpJ3svwBFABbhI%3D' (2024-09-26) → 'github:NixOS/nixpkgs/bc947f541ae55e999ffdb4013441347d83b00feb?narHash=sha256-NOiTvBbRLIOe5F6RbHaAh6%2B%2BBNjsb149fGZd1T4%2BKBg%3D' (2024-10-04) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-10-07 09:35:42 -07:00
Georgi Gerganov	d5ac8cf2f2	ggml : add metal backend registry / device (#9713) * ggml : add metal backend registry / device ggml-ci * metal : fix names [no ci] * metal : global registry and device instances ggml-ci * cont : alternative initialization of global objects ggml-ci * llama : adapt to backend changes ggml-ci * fixes * metal : fix indent * metal : fix build when MTLGPUFamilyApple3 is not available ggml-ci * fix merge * metal : avoid unnecessary singleton accesses ggml-ci * metal : minor fix [no ci] * metal : g_state -> g_ggml_ctx_dev_main [no ci] * metal : avoid reference of device context in the backend context ggml-ci * metal : minor [no ci] * metal : fix maxTransferRate check * metal : remove transfer rate stuff --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-10-07 18:27:51 +03:00
Paul Tsochantaris	96b6912103	metal : single allocation of encode_async block (#9747) * Single allocation of encode_async block with non-ARC capture in ggml-metal.m * Moving Block_release to the deallocation code * Release encode block when re-setting encoding buffer count if needed * Update ggml/src/ggml-metal.m --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-10-07 15:26:31 +03:00
Georgi Gerganov	d5cb86844f	contrib : simplify + minor edits [no ci]	2024-10-06 14:15:27 +03:00
Georgi Gerganov	f4b2dcdf49	readme : fix typo [no ci]	2024-10-06 13:49:41 +03:00
Georgi Gerganov	b6d6c5289f	sync : llama.cpp	2024-10-06 12:53:28 +03:00
SRHMorris	b0915d5b51	vulkan : retry allocation with fallback flags (whisper/2451) Co-authored-by: Samuel Morris <samuel.morris@artlist.io>	2024-10-06 12:52:11 +03:00
Georgi Gerganov	8c475b97b8	rerank : use [SEP] token instead of [BOS] (#9737) * rerank : use [SEP] token instead of [BOS] ggml-ci * common : sanity check for non-NULL tokens ggml-ci * ci : adjust rank score interval ggml-ci * ci : add shebang to run.sh ggml-ci	2024-10-05 15:55:04 +03:00
Georgi Gerganov	58b16695e1	sync : ggml	2024-10-05 15:53:49 +03:00
Georgi Gerganov	905f5485b2	metal : zero-init buffer contexts (whisper/0)	2024-10-05 15:53:00 +03:00
Viet-Anh NGUYEN (Andrew)	71967c2a6d	Add Llama Assistant (#9744)	2024-10-04 20:29:35 +02:00
Georgi Gerganov	17880771ad	sync : ggml	2024-10-04 18:50:25 +03:00
Daniel Bevenius	55951c018d	ggml : fix typo in example usage ggml_gallocr_new (ggml/984)	2024-10-04 18:50:05 +03:00
Diego Devesa	ff565769f2	ggml : fixes after sync (ggml/983) ggml : remove test-backend-buffer ggml : fix CUDA build warnings	2024-10-04 18:50:04 +03:00
Xuan Son Nguyen	f3fdcfaa79	ci : fine-grant permission (#9710)	2024-10-04 11:47:19 +02:00
Daniel Kleine	133c7b46b3	Fixed RNG seed docs (#9723) * Update README.md fixed RNG seed info * changed print format to unsigned	2024-10-04 10:54:44 +02:00
Georgi Gerganov	d5ed2b929d	metal : remove abort (skip) (ggml/0)	2024-10-03 21:18:19 +03:00
Georgi Gerganov	1bb8a64ebf	sync : ggml	2024-10-03 21:17:49 +03:00
Johannes Gäßler	fabdc3bda3	ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980)	2024-10-03 21:17:26 +03:00
Johannes Gäßler	eee39bdc96	ggml: refactor cross entropy loss CPU impl. (ggml/976)	2024-10-03 21:17:26 +03:00
Jack Mousseau	5d5ab1e5cc	metal : fix compute pass descriptor autorelease crash (#9718)	2024-10-03 21:01:46 +03:00
Diego Devesa	a7ad553513	ggml-backend : add device description to CPU backend (#9720)	2024-10-03 17:39:18 +02:00
bandoti	d6fe7abf04	ggml: unify backend logging mechanism (#9709) * Add scaffolding for ggml logging macros * Metal backend now uses GGML logging * Cuda backend now uses GGML logging * Cann backend now uses GGML logging * Add enum tag to parameters * Use C memory allocation funcs * Fix compile error * Use GGML_LOG instead of GGML_PRINT * Rename llama_state to llama_logger_state * Prevent null format string * Fix whitespace * Remove log callbacks from ggml backends * Remove cuda log statement	2024-10-03 17:39:03 +02:00
compilade	e3c355ba65	convert : handle tokenizer merges format from transformers 4.45 (#9696)	2024-10-03 17:22:15 +03:00
Radoslav Gerganov	841713e1e4	rpc : enable vulkan (#9714) closes #8536	2024-10-03 13:00:52 +03:00
Ouadie EL FAROUKI	5639971466	Fixed dequant precision issues in Q4_1 and Q5_1 (#9711)	2024-10-03 07:50:44 +01:00
Diego Devesa	c83ad6d01e	ggml-backend : add device and backend reg interfaces (#9707) Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2024-10-03 01:49:47 +02:00
Xuan Son Nguyen	a39ab216aa	llama : reduce compile time and binary size (#9712) * llama : speed up compile time * fix build * fix build (2)	2024-10-02 15:49:55 +02:00
Alberto Cabrera Pérez	f536f4c439	[SYCL] Initial cmake support of SYCL for AMD GPUs (#9658) sycl: initial cmake support of SYCL for AMD GPUs	2024-10-02 13:57:18 +01:00
Radoslav Gerganov	00b7317e63	vulkan : do not use tensor->extra (#9407) * vulkan : do not use tensor->extra This patch allows using the Vulkan backend with the RPC backend as tensor->extra is no longer used. Ref: #8536 * Adapt GGML_VULKAN_CHECK_RESULTS to extra removal (#2) --------- Co-authored-by: 0cc4m <picard12@live.de>	2024-10-02 13:49:16 +03:00
Zhenwei Jin	76b37d1541	gguf-split : improve --split and --merge logic (#9619) * make sure params --split and --merge are not specified at same time * update gguf-split params parse logic * Update examples/gguf-split/gguf-split.cpp Co-authored-by: slaren <slarengh@gmail.com> --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: slaren <slarengh@gmail.com>	2024-10-02 10:21:57 +03:00
Georgi Gerganov	148844fe97	examples : remove benchmark (#9704) ggml-ci	2024-10-02 10:14:44 +03:00
Paweł Wodnicki	3f1ae2e32c	Update README.md (#9591) Add Bielik model.	2024-10-01 19:18:46 +02:00
Georgi Gerganov	f1b8c42711	sync : ggml	2024-10-01 16:09:42 +03:00
Johannes Gäßler	e98c1c188e	test: fix OPT_STEP_ADAMW for test-backend-ops (ggml/974)	2024-10-01 16:07:40 +03:00
Salvatore Mesoraca	cb00020504	vulkan : mul_mat: fix UB with small warps (ggml/952) When the device's warp size is less than 16, it is possible for loadstride_a (mul_mm.comp:114) and loadstride_b (mul_mm.comp:115) to be set to 0. Because they are calculated as: the workgroup size, multiplied by LOAD_VEC_* (which can be 1) and divided by 16. And the workgroup size is set to be the same as the warp/subgroup size. The loadstride_* variables are used as increments in the loops that populate the buffers used for the multiplication. When they are 0 they cause an infinite loop. But infinite loops without side-effects are UB and the values of loadstride_* are known at compile time. So, the compiler quietly optimizes all the loops away. As a consequence, the buffers are not populated and the multiplication result is just a matrix with all elements set to 0. We prevent the UB by making sure that the workgroup size will never be less than 16, even if our device has a smaller warp size (e.g. 8). Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>	2024-10-01 16:07:39 +03:00
Borislav Stanimirov	6c5322481a	ggml : fix ggml_cast (ggml/973)	2024-10-01 16:07:39 +03:00
Johannes Gäßler	7254cdf7e8	ggml: fix gradient allocation logic (ggml/966) * ggml: fix gradient allocation logic * gradient allocation in ggml_build_backward_expand * fixup * fix test-backend-ops grad * suggestions by slaren * fix test1.c * fix legacy opt API * fix test-grad0 * remove keep arg	2024-10-01 16:07:38 +03:00
Georgi Gerganov	cad341d889	metal : reduce command encoding overhead (#9698) * metal : reduce command encoding overhead ggml-ci * metal : add comments	2024-10-01 16:00:25 +03:00
Georgi Gerganov	a90484c6d9	llama : print correct model type for Llama 3.2 1B and 3B	2024-10-01 11:42:01 +03:00
compilade	1927378bcc	convert : refactor rope_freqs generation (#9396) * convert : refactor rope_freqs generation This should also fix vocab-only conversion for Phi-3. * convert : adapt MiniCPM3 to separate rope_freqs insertion MiniCPM3's tokenizer is treated as a SentencePiece tokenizer to avoid having to run its custom Python code which mixes tokenization in the same file as tool calls. gguf-py : add long and short RoPE factors to tensor mappings Empty, but the key names are used to populate the mappings.	2024-10-01 09:31:36 +03:00
serhii-nakon	6f1d9d71f4	Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS (#9641) * Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS * Set ROCM_DOCKER_ARCH as string due it incorrectly build and cause OOM exit code	2024-09-30 20:57:12 +02:00
compilade	511636df0c	ci : reduce severity of unused Pyright ignore comments (#9697)	2024-09-30 14:13:16 -04:00
vb	08a43d05b6	py : update transfomers version (#9694) * update transfomers version. * update hfh version.	2024-09-30 18:03:47 +03:00
Georgi Gerganov	ace4f4be37	flake.lock: Update (#9680) Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/c04d5652cfa9742b1d519688f65d1bbccea9eb7e?narHash=sha256-PmUr/2GQGvFTIJ6/Tvsins7Q43KTMvMFhvG6oaYK%2BWk%3D' (2024-09-19) → 'github:NixOS/nixpkgs/1925c603f17fc89f4c8f6bf6f631a802ad85d784?narHash=sha256-J%2BPeFKSDV%2BpHL7ukkfpVzCOO7mBSrrpJ3svwBFABbhI%3D' (2024-09-26) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-09-30 07:48:49 -07:00
Ruchira Hasaranga	8277a817f1	console : utf-8 fix for windows stdin (#9690) * utf-8 fix for windows stdin * Update common/console.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-09-30 11:23:42 +03:00
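
Note on c6396aa4bb: the message describes keeping only the LoRA and base layer weights and ignoring everything else, rather than exiting on the first unexpected name. A minimal sketch of that filtering idea follows. It is not the code from the conversion script: the function name, the `base_layer` marker, and the example tensor names are all assumptions made for illustration.

```python
# Illustrative sketch only; not the code from the conversion script.
LORA_MARKERS = ("lora_A", "lora_B")  # weight kinds named in the commit message
BASE_MARKER = "base_layer"           # assumed naming for base layer weights


def partition_weights(names):
    """Split tensor names into those to convert and those to ignore."""
    kept, ignored = [], []
    for name in names:
        if any(m in name for m in LORA_MARKERS) or BASE_MARKER in name:
            kept.append(name)
        else:
            # Per the commit message, this case used to end in sys.exit(1).
            ignored.append(name)
    return kept, ignored


if __name__ == "__main__":
    # Hypothetical tensor names, chosen only to exercise both branches.
    example = [
        "layers.0.q_proj.lora_A.weight",
        "layers.0.q_proj.lora_B.weight",
        "layers.0.q_proj.base_layer.weight",
        "embed_tokens.weight",
    ]
    kept, ignored = partition_weights(example)
    print("convert:", kept)
    print("ignore:", ignored)
```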


3898 commits
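
Note on cb00020504: the message states that loadstride_a and loadstride_b are computed as the workgroup size multiplied by LOAD_VEC_* and divided by 16, and that the fix keeps the workgroup size from dropping below 16. The arithmetic can be checked with a small Python model. This is a sketch under the assumption that the division is an integer division; the function and constant names are hypothetical and do not come from mul_mm.comp.

```python
# Hypothetical model of the stride arithmetic described in the commit message.
MIN_WORKGROUP_SIZE = 16


def load_stride(workgroup_size: int, load_vec: int) -> int:
    """Workgroup size times LOAD_VEC_*, divided by 16 (integer division assumed)."""
    return workgroup_size * load_vec // 16


def clamped_workgroup_size(subgroup_size: int) -> int:
    """Keep the workgroup size at 16 or more, even on small-warp devices."""
    return max(subgroup_size, MIN_WORKGROUP_SIZE)


if __name__ == "__main__":
    for warp in (8, 16, 32):
        before = load_stride(warp, 1)
        after = load_stride(clamped_workgroup_size(warp), 1)
        print(f"warp={warp}: stride before={before}, after={after}")
```

With a warp size of 8 and LOAD_VEC_* equal to 1, the unclamped stride is 0, which is the zero loop increment the message describes. After clamping it is 1.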