llama.cpp

Author	SHA1	Message	Date
Concedo	c1ca1de2ac	fixed support for old falcon models	2023-10-18 17:20:44 +08:00
Concedo	700951dbd4	Merge branch 'master' into concedo_experimental # Conflicts: # README.md	2023-10-18 16:33:09 +08:00
Concedo	53b7cdf8a3	Merge branch 'concedo' into concedo_experimental	2023-10-18 13:51:13 +08:00
slaren	cb33f43a2a	fix embeddings when using CUDA (#3657 )	2023-10-17 22:24:50 +02:00
Georgi Gerganov	e1675d133c	llama : avoid fprintf in favor of LLAMA_LOG (#3538 )	2023-10-17 22:34:26 +03:00
BarfingLemurs	8402566a7c	readme : update hot-topics & models, detail windows release in usage (#3615 ) * Update README.md * Update README.md * Update README.md * move "Running on Windows" section below "Prepare data and run" --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-17 21:13:21 +03:00
LostRuins	6e34d31c44	Update README.md (#479 )	2023-10-18 01:24:14 +08:00
shibe2	40e5ce054f	CLBlast: Fix temporary buffer size for f16 conversion (wsize) Fix buffer overflow. Reduce the size to fit just one 2D slice. Assert sufficient size.	2023-10-17 21:02:30 +04:00
slaren	a5e8c1d8c7	train-text-from-scratch : fix assert failure in ggml-alloc (#3618 )	2023-10-17 20:00:58 +03:00
Georgi Gerganov	e74c705e15	editorconfig : remove trailing spaces	2023-10-17 19:52:53 +03:00
coezbek	3ad1e3f1a1	server : documentation of JSON return value of /completion endpoint (#3632 ) * Added documentation of JSON return value of /completion endpoint * Update examples/server/README.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-17 19:51:02 +03:00
Georgi Gerganov	1142013da4	save-load-state : fix example + add ci test (#3655 ) * save-load-state : fix example (close #3606) * ci : add test for save-load-state example ggml-ci	2023-10-17 19:12:46 +03:00
ldwang	5fe268a4d9	readme : add Aquila2 links (#3610 ) Signed-off-by: ldwang <ftgreat@gmail.com> Co-authored-by: ldwang <ftgreat@gmail.com>	2023-10-17 18:52:33 +03:00
staviq	1a159553f9	tokenizer : special token handling (#3538 ) * Rewrite special token handling from #1931 * shorten param name, add st verification by type * use offsets instead of copy by substr * formatting, remove copying iterator on delete * llama : normalize code-style * swift fix * print pfx/sfx if verb, main: split pfx input sfx * dont add space when using special tokens * minor : comment + spacing --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-17 18:11:01 +03:00
Concedo	6f8fe88f10	fix for lite (+5 squashed commit) Squashed commit: [f9ce9855] catch more exceptions [8cdaf149] tweaked horde worker timeouts, updated lite [619ebef4] fixed abort no response if failed [a54a66a2] fixed time overflow [9affdc3e] updated lite	2023-10-17 23:04:32 +08:00
Georgi Gerganov	281ef73c25	k-quants : fix quantization ranges (#3646 )	2023-10-17 09:19:28 +03:00
Georgi Gerganov	940efa95fe	llava : fix tokenization to not add bos between image embeddings and user prompt (#3645 ) * llava : fix tokenization to not add bos after system prompt * set seed --------- Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>	2023-10-16 23:58:00 +03:00
Concedo	ee0681f0d9	convert some asserts into non-terminating since they are ovezealous	2023-10-15 16:12:20 +08:00
Concedo	5cfabaee25	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # README.md # docs/BLIS.md	2023-10-15 15:50:20 +08:00
cebtenzzre	11bff29045	MPT : support GQA for replit-code-v1.5 (#3627 )	2023-10-15 09:32:06 +03:00
M. Yusuf Sarıgöz	11dc1091f6	Honor -ngl option for Cuda offloading in llava (#3621 )	2023-10-14 04:52:44 -06:00
Daniel Bevenius	2a4bcbacea	llama : remove n_threads from llama_decode_internal (#3614 ) This commit removes `n_threads` from the `llama_decode_internal` functions doc comment as it does not exist anymore. It looks like this parameter was removed in Commit `16bc66d947` ("llama.cpp : split llama_context_params into model and context params"). Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2023-10-13 13:33:16 +03:00
slaren	424b6381c4	ggml : add context enumeration functions (#3605 ) finetune : fix assert failure in ggml-alloc	2023-10-13 12:23:10 +02:00
Concedo	643902fbbb	fixed tensor split save and load	2023-10-13 10:07:22 +08:00
shibe2	1e0e873c37	CLBlast: Fix matrix-vector multiplication (#3544 )	2023-10-12 21:59:47 +02:00
M. Yusuf Sarıgöz	370359e5ba	examples: support LLaVA v1.5 (multimodal model) (#3436 ) * WIP: start implementing LLaVA * rm scratch buf for now, will revert after cleanup * LLaVA image encoder is working. will combine with llama * Add llava inference code, but it's buggy. debugging * LLaVA is working e2e, needs to optimize memory allocation + cleanup * Use ggml_allocr + rm unnecessary code * fix: crlf -> lf * fix: new line at EoF * fix: trailing whitespace * Add readme * Update readme * Some cleanup * Are you happy editorconfig? * rm unused batch image preprocessing * rm unused import * fix: rm designated initializers * introduce pad-to-square mode for non-square images * are you happy editorconfig? * gitignore /llava * Handle cases where image file does not exist * add llava target to Makefile * add support for 13b model variant * Maybe seed is unlucky? * Check if apples are compared to apples * are you happy editorconfig? * Use temperature = 0.1 by default * command line: use gpt_params_parse() * minor * handle default n_predict * fix typo * llava : code formatting, rename files, fix compile warnings * do not use Wno-cast-qual for MSVC --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-12 18:23:18 +03:00
uint256_t	9e24cc6e2e	docs : fix typo GOMP_CPU_AFFINITY (#3597 )	2023-10-12 16:36:16 +03:00
Georgi Gerganov	d28e572c02	cmake : fix add_compile_options on macOS	2023-10-12 14:31:05 +03:00
Ian Scrivener	f3040beaab	typo : it is `--n-gpu-layers` not `--gpu-layers` (#3592 ) fixed a typo in the MacOS Metal run doco	2023-10-12 14:10:50 +03:00
Georgi Gerganov	1a8c8795d6	ci : check if there is enough VRAM (#3596 ) ggml-ci	2023-10-12 13:44:56 +03:00
Concedo	7e2f714c9c	tensor split only for cuda	2023-10-12 17:01:52 +08:00
Alexander Abushady	11b8f97c1e	Tensor split UI (#471 ) * update .gitignore Remove .idea folder created by Jet Brains products. * Front end, and partial backe-end Tensor Split pulled in, shows in console, then not respected on model load. * UI Tweak + Tensor Split Fix Made Tensor Flow input match similar boxes around it. Also, fixed Tensor Split to populate the correct argument. * Changed int to float for tensor split Accidentally set int, needed to be float when setting tensor split args	2023-10-12 16:50:21 +08:00
Concedo	601be78a3f	kcpp does sampling ourselves, we can do whatever we want	2023-10-12 16:47:56 +08:00
Concedo	a6c3dbc351	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # Makefile # README.md # build.zig	2023-10-12 16:32:00 +08:00
Concedo	8be043ee38	more horde optimizations	2023-10-12 16:20:52 +08:00
Concedo	8d1cd512e2	missed a flag	2023-10-12 15:00:51 +08:00
Concedo	c6fe820357	improve cors and header handling	2023-10-12 14:53:39 +08:00
Aarni Koskela	b016596d90	server : add completion mode (no chat) (#3582 )	2023-10-12 09:51:53 +03:00
Georgi Gerganov	6b3ae4da92	prompts : add mnemonics.txt	2023-10-12 09:35:30 +03:00
Georgi Gerganov	57dd55e2c7	server : fix kv cache management (#3588 )	2023-10-12 09:29:04 +03:00
Concedo	f604cffdce	multiuser racer bugfix	2023-10-12 13:39:12 +08:00
Georgi Gerganov	b8fe4b5cc9	main : fix session loading bug (#3400 )	2023-10-11 23:55:41 +03:00
Michael Coppola	a8bdd65525	server : add parameter -tb N, --threads-batch N (#3584 ) Co-authored-by: Michael Coppola <info@michaeljcoppola.com>	2023-10-11 22:42:22 +03:00
Kerfuffle	70c29da118	common : fix mirostat state when using multiple sequences (#3543 ) * Fix mirostat state when using multiple sequences * Fix mirostat by completely refactoring sampling! * Try to fix zig build. * Export function to fetch/create default sampler states Code formatting cleanups and add some comments Silence a warning about id not being used when logging is disabled * Apply some renaming suggestions. Fix comments that were out of sync with the pull. * Use more consistant naming convention for sampling contexts	2023-10-11 22:35:46 +03:00
Georgi Gerganov	8c70a5ff25	batched : add bench tool (#3545 ) * batched : add bench tool * batched : minor fix table * batched-bench : add readme + n_kv_max is now configurable * batched-bench : init warm-up batch * batched-bench : pass custom set of PP, TG and PL * batched-bench : add mmq CLI arg	2023-10-11 21:25:33 +03:00
Concedo	a003e3c348	horde auto recovery	2023-10-12 00:57:32 +08:00
Zane Shannon	24ba3d829e	examples : add batched.swift + improve CI for swift (#3562 )	2023-10-11 06:14:05 -05:00
Galunid	9f6ede19f3	Add MPT model to supported models in README.md (#3574 )	2023-10-10 19:02:49 -04:00
goerch	233fc1c69f	Minor improvements in GPT2 tokenizer (#3567 ) * Fixing minor bugs in bpe_gpt2_preprocess * Don't add bos token in test	2023-10-10 18:59:52 +02:00
Xingchen Song(宋星辰)	c5b49360d0	readme : add bloom (#3570 )	2023-10-10 19:28:50 +03:00

2371 commits