Commit graph

2324 commits

Author SHA1 Message Date
Concedo
8be043ee38 more horde optimizations 2023-10-12 16:20:52 +08:00
Concedo
8d1cd512e2 missed a flag 2023-10-12 15:00:51 +08:00
Concedo
c6fe820357 improve cors and header handling 2023-10-12 14:53:39 +08:00
Concedo
f604cffdce multiuser racer bugfix 2023-10-12 13:39:12 +08:00
Concedo
a003e3c348 horde auto recovery 2023-10-12 00:57:32 +08:00
Concedo
d74eab0e63 actually for this round, do not include deprecated params. I don't want to have to deal with them (+2 squashed commits)
Squashed commit:

[df2691c2] show context limit

[7c74f52a] prevent old scripts from crashing
2023-10-10 19:20:33 +08:00
Concedo
a723466d50 Merge branch 'master' into concedo_experimental 2023-10-10 17:21:42 +08:00
YellowRoseCx
1b25b21655
Merge pull request #27 from one-lithe-rune/allow-sdk-dll-loading - Allow use of hip SDK (if installed) dlls on windows (#470)
* If the rocm/hip SDK is installed on Windows, include the SDK
as a potential location to load the hipBlas/rocBlas .dlls from. This
allows running koboldcpp.py directly with Python on Windows after
building, without having to build the .exe and run that or
copy .dlls around.

Co-authored-by: one-lithe-rune <skapusniak@lithe-runes.com>
2023-10-10 17:16:33 +08:00
Jan Ploski
f5f9121de1
llm : add MPT support (#3417)
* CUDA: added support for ggml_clamp (see also: https://github.com/ggerganov/ggml/issues/545)

* mpt : added an implementation based (mostly) on falcon integration, modified with deltas from ggml/examples/mpt

* mpt : protect against "clip_qkv": null in mpt-7b

* mpt : quick fix to avoid "Strange model" warning when quantizing MPT models

* mpt : addendum to changeset:84e30e8 - leave parameter clamp_kqv out from metadata rather than use 0.0 to indicate "no clamping" (more compliant with the current GGUF spec?)

* mpt : standardized all tensor names to follow GGUF spec

* mpt : addendum to changeset:1be89c40 - use "req" parameter of GGUF_GET_KEY macro instead of duplicate code

* mpt : fixed comment s/gptneox/mpt/

* mpt : remove tabs, trailing whitespace

* mpt : removed ne01 + n_past == ne00 assertion from alibi (cuda/f32) and rope_shift from build_mpt

* mpt : updated convert-mpt-hf-to-gguf.py to reflect changes made to convert-gptneox-hf-to-gguf.py in pr:3252

* comment out n_past instead of marking it unused

* mpt : removed hardcoded +178 from convert script in favor of utilizing hparams["vocab_size"]

* mpt : remove unused tokenizer_json in convert script

* ggml : remove obsolete n_past assert in ggml_alibi

* llama : print clamp_kqv and max_alibi_bias hparams

---------

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-10 10:50:23 +03:00
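One bullet in the MPT changeset above replaces a hardcoded +178 offset in convert-mpt-hf-to-gguf.py with the value from hparams["vocab_size"]. The idea can be sketched as below; this is an illustrative toy, not the actual convert script, and the fallback to the tokenizer's count is an assumption:

```python
import json

# Illustrative sketch of the convert-script change described above:
# take the vocabulary size from the model's hparams (config) instead of
# hardcoding an offset over the tokenizer's token count. The function
# name and fallback behavior are hypothetical, not llama.cpp code.

def get_vocab_size(config_json, tokenizer_vocab_len):
    hparams = json.loads(config_json)
    # Prefer the authoritative value from the model config...
    if "vocab_size" in hparams:
        return hparams["vocab_size"]
    # ...falling back to the tokenizer's own count otherwise (assumption).
    return tokenizer_vocab_len

print(get_vocab_size('{"vocab_size": 50432}', 50254))  # -> 50432
```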
vvhg1
11ea5c7d96
infill. : fix tokenization (#3508)
* infill tokens correction

* server infill tokens correction

* removing any leading whitespace from infill suffix and removing leading space token from suffix when params.escape

* removing any leading whitespace from infill suffix and removing leading space token from suffix when params.escape

* only rm when params.escape, rm space if possible which is added back or rm added space token

* only rm when params.escape, rm space if possible which is added back or rm added space token

* Revert "only rm when params.escape, rm space if possible which is added back or rm added space token"

This reverts commit 63ba0b621f.

* fix interactive prompt escaping and fix server infill leading space handling

* rm unnecessary bool check
2023-10-10 10:31:21 +03:00
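The infill fix above strips leading whitespace from the suffix and drops a tokenizer-added leading-space token when params.escape is set. A minimal sketch of that idea, with a stand-in tokenizer (the token id, function names, and tokenizer behavior here are all hypothetical, not the llama.cpp API):

```python
# Toy illustration of the infill-suffix cleanup described above: when
# escape processing is enabled, strip leading whitespace from the suffix
# before tokenization and drop the tokenizer-added leading-space token.

SPACE_TOKEN = 29871  # hypothetical id a SentencePiece-style tokenizer prepends

def toy_tokenize(text):
    # Stand-in tokenizer: prefixes a space token, then one id per word.
    return [SPACE_TOKEN] + [hash(w) % 1000 for w in text.split()]

def tokenize_infill_suffix(suffix, escape):
    if escape:
        suffix = suffix.lstrip()          # remove leading whitespace
    tokens = toy_tokenize(suffix)
    if escape and tokens and tokens[0] == SPACE_TOKEN:
        tokens = tokens[1:]               # drop the added leading-space token
    return tokens
```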
Concedo
f288c6b5e3 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	build.zig
#	scripts/sync-ggml.sh
2023-10-10 00:09:46 +08:00
Matěj Štágl
96e9539f05
OpenAI compat API adapter (#466)
* feat: oai-adapter

* simplify optional adapter for instruct start and end tags

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-10-09 23:24:48 +08:00
slaren
95bd60a0a6
ggml-alloc : fix assert in debug builds (#3555) 2023-10-09 15:44:58 +03:00
Georgi Gerganov
fcca0a7004
refact : fix convert script + zero out KV cache to avoid nans (#3523)
* refact : fix convert script + zero out KV cache to avoid nans

* ggml : silu(-inf) should never happen

* metal : assert various kernel requirements
2023-10-09 14:32:17 +03:00
Georgi Gerganov
dcc09d2596
metal : do not use mul_mm kernels when ne00 < 64 (#3542) 2023-10-09 14:28:27 +03:00
Georgi Gerganov
db3abcc114
sync : ggml (ggml-backend) (#3548)
* sync : ggml (ggml-backend)

ggml-ci

* zig : add ggml-backend to the build
2023-10-08 20:19:14 +03:00
Concedo
80e53af236 fixed a bug in lite 2023-10-09 00:18:03 +08:00
Concedo
4e5b6293ab adjust streaming timings 2023-10-08 23:12:45 +08:00
Concedo
e967717385 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	build.zig
2023-10-08 22:55:44 +08:00
Concedo
840b244c17 update lite 2023-10-08 22:55:05 +08:00
Matheus C. França
eee42c670e
ci : add Zig CI/CD and fix build (#2996)
* zig CI/CD and fix build

Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com>

* fix build_compiler

* ci : remove trailing whitespace

---------

Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-08 16:59:20 +03:00
Ryder Wishart
8e6716a102
api_like_OAI.py : compat with Microsoft Guidance (#2746)
Check for None in addition to empty string check in all request params

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-08 13:55:58 +03:00
arcrank
9c38d181d4
api_like_OAI.py : simplify function (#2796)
Simplify function
2023-10-08 13:52:57 +03:00
Johannes Rudolph
a1202a31ed
k-quants : fix comments about block sizing (#3499) 2023-10-08 13:21:19 +03:00
Georgi Gerganov
94e502dfb7
ci : enable on obj-c changes + fix metal build (#3540) 2023-10-08 11:24:50 +03:00
Luo Tian
7d8b24932f
zig : fix build by introducing train.cpp (#3539) 2023-10-08 11:24:01 +03:00
Concedo
d8fa5ca230 Merge branch 'master' into concedo_experimental 2023-10-08 15:51:42 +08:00
Concedo
80dfe2ba49 blasthreads should apply for any thread count below 32 2023-10-08 15:51:18 +08:00
Concedo
a2b8473354 force flush sse 2023-10-08 15:12:07 +08:00
Georgi Gerganov
b0ec5218c3
metal : support MTLGPUFamily < Apple7, formatting, style (#3524)
* metal : improve decoding speed for batches of 2-16

* metal : rename kernels mul_mat_ to mul_mv_

* metal : indentations

* minor

* metal : print more GPU info + disable mul_mm for MTLGPUFamily < Apple7
2023-10-08 10:01:53 +03:00
Kerfuffle
63d3b06a43
llama : fix missing break in Persimmon arch case statements (#3535) 2023-10-08 08:22:17 +03:00
Concedo
133897a558 updated lite (+1 squashed commit)
Squashed commits:

[4d1411df] update lite
2023-10-08 12:17:47 +08:00
Concedo
f797cba377 Merge branch 'master' into concedo_experimental 2023-10-08 10:43:34 +08:00
Kerfuffle
a16e89cec8
Fix trying to strip newline from empty prompt and cfg prompt file content (#3534) 2023-10-07 15:31:41 -06:00
M. Yusuf Sarıgöz
4d03833211
gguf.py : fix CI for publishing GGUF package (#3532)
* Fix CI for publishing GGUF package

* Bump version

* fix

* bump version

* bump version

* bump version
2023-10-07 22:14:10 +03:00
Concedo
e46708eedc updated lite 2023-10-07 23:33:54 +08:00
Concedo
678f31f2fd Merge branch 'master' into concedo_experimental
# Conflicts:
#	.gitignore
#	llama.cpp
2023-10-07 22:00:09 +08:00
Concedo
ca4a8c5dc8 updated lite 2023-10-07 21:50:24 +08:00
Tom C
c47066d833
py : change version of numpy requirement to 1.24.4 (#3515)
Co-authored-by: Lyjia <me@lyjia.us>
2023-10-07 12:56:15 +03:00
cebtenzzre
f1782c68de
quantize : fail fast on write errors (#3521) 2023-10-07 11:41:52 +03:00
Jhen-Jie Hong
c26765a0a1
metal : support default.metallib load & reuse code for swift package (#3522)
* metal : support load default.metallib & reuse code for swift package

* metal : use SWIFT_PACKAGE def instead of define GGML_SWIFT
2023-10-07 11:40:27 +03:00
Phillip Kravtsov
0e797c2fc5
llm : support Adept Persimmon 8B (#3410)
* Produces garbage output

* wip: correct tensors up to RoPE

* correct tensors thru RoPE

* Correct outputs through masked & softmax'd KQ

* fp32 works

* Rename adept->persimmon

* Produces correct outputs

* clean up convert scripts

* remove printing logic from ggml.c

* remove prints from llama.cpp & fix merge

* trivial cleanups

* Add offload funcs

* update conversion script to directly take adept artifacts rather than .safetensors file

* Fix norm eps bug

* Support sqr and concat on metal, persimmon-8b-q4 runs correctly

* Small changes from review

* Formatting changes

* Minor changes to conversion script

* Remove old script

* Fix editorconfig formatting

* Fix build

* add overlooked offload code ggml-ci
2023-10-07 10:12:43 +03:00
goerch
3a716b4dae
Fix for #3454 (#3455)
Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion
2023-10-07 06:57:01 +02:00
Concedo
6b282271b1 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
2023-10-07 10:24:34 +08:00
Concedo
07a114de63 force debugmode to be indicated on horde, allow 64k context for gguf 2023-10-07 10:23:33 +08:00
BarfingLemurs
1faaae8c2b
readme : update models, cuda + ppl instructions (#3510) 2023-10-06 22:13:36 +03:00
Mihai
cb13d73a72
server : docs fix default values and add n_probs (#3506) 2023-10-06 21:39:33 +03:00
Concedo
d8f7a7077a Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
2023-10-07 01:36:14 +08:00
Concedo
120695ddf7 add update link 2023-10-07 01:33:18 +08:00
Kerfuffle
9ca79d5cbb
kv cache slot search improvements (#3493)
* kv cache slot search improvements

* Use n_ctx in kv find slot for consistency

* Ensure kv cache head points to a valid slot in llama_decode internal

* Add some comments to prevent dumb people (like me) from getting confused.
2023-10-06 10:10:13 -06:00