llama.cpp

Author	SHA1	Message	Date
Georgi Gerganov	827f5eda91	readme : update hot topics	2023-06-04 23:38:19 +03:00
Georgi Gerganov	ecb217db4f	llama : Metal inference (#1642) * mtl : export the LLaMA computation graph * ci : disable temporary * mtl : adapt the MNIST example as starter * mtl : no need for mtl-export tool, add cli arg for main instead * mtl : export just a small part of the graph for now to make it easier * mtl : move MSL code into separate file for easy editing * mtl : initial get_rows_q4_0 kernel * mtl : confirmed get_rows_q4_0 is working correctly * mtl : add rms_norm kernel + confirm working * mtl : add mul kernel + confirm working * mtl : initial mul_mat Q4 kernel (wrong results) * mtl : mul_mat fixes (still wrong) * mtl : another mul_mat Q4 (still does not work) * mtl : working mul_mat q4 * ggml : fix handling of "view" ops in ggml_graph_import() * mtl : add rope kernel * mtl : add reshape and transpose handling * ggml : store offset as opt arg for ggml_view_xd() operators * mtl : add cpy kernel + handle view ops * mtl : confirm f16 x f32 attention mul mat * mtl : add scale kernel * mtl : add diag_mask_inf kernel * mtl : fix soft_max kernel * ggml : update ggml_nbytes() to handle non-contiguous tensors * mtl : verify V tensor contents * mtl : add f32 -> f32 cpy kernel * mtl : add silu kernel * mtl : add non-broadcast mul kernel * mtl : full GPU inference of the computation graph * mtl : optimize rms_norm and soft_max kernels * mtl : add f16 mat x f32 vec multiplication kernel * mtl : fix bug in f16 x f32 mul mat + speed-up computation * mtl : faster mul_mat_q4_0_f32 kernel * mtl : fix kernel signature + roll inner loop * mtl : more threads for rms_norm + better timing * mtl : remove printfs from inner loop * mtl : simplify implementation * mtl : add save/load vocab to ggml file * mtl : plug Metal inference into llama.cpp (very quick-n-dirty) * mtl : make it work with main example Lots of hacks but at least now it generates text * mtl : preparing for merge * mtl : clean-up ggml mtl interface + suport scratch / inplace * mtl : remove temp / debug code * metal : final refactoring and simplification * Revert "ci : disable temporary" This reverts commit `98c267fc77`. * metal : add comments * metal : clean-up stuff, fix typos * readme : add Metal instructions * readme : add example for main	2023-06-04 23:34:30 +03:00
0cc4m	dcb2ed4826	OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653) * Use events instead of clFinish, where possible * OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel * Reduce queueing overhead for contiguous tensors by using single mul kernel call * Adapt to #1612 cl_mem malloc changes * Reduce code duplication between cuda and opencl branches * Improve implementation	2023-06-04 08:12:05 +02:00
Henri Vasserman	d8bd0013e8	Add info about CUDA_VISIBLE_DEVICES (#1682)	2023-06-03 16:35:20 +03:00
Jiří Podivín	b5c85468a3	Docker: change to calling convert.py (#1641) Deprecation disclaimer was added to convert-pth-to-ggml.py	2023-06-03 15:11:53 +03:00
Evan Jones	136476e898	Fix prompt cache saving and chat-persistent rollover (#1678) * Fix prompt cache saving and chat-persistent rollover (fixes #1670) * clang-tidy Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2023-06-03 07:28:45 -04:00
Randall Fitzgerald	df2ecc942a	Merge pull request #18 from anon998/update-readme Update readme + parse --mlock and --no-mmap	2023-06-02 17:04:25 -04:00
anon	98ae2de017	parse --mlock and --no-mmap + format	2023-06-02 17:54:46 -03:00
anon	05a5a485b8	make help text load faster	2023-06-02 17:52:04 -03:00
anon	a6ed390cc6	update readme	2023-06-02 17:48:29 -03:00
anon	e1e2be2146	remove --keep from help text	2023-06-02 17:47:42 -03:00
Randall Fitzgerald	5758e9f09b	Removed embedding from flags.	2023-06-02 08:31:12 -07:00
Randall Fitzgerald	310bf61496	Merge pull request #17 from SlyEcho/server_refactor improve docs and example	2023-06-02 11:25:01 -04:00
Randall Fitzgerald	de6df486e9	Removed embedding from README	2023-06-02 08:24:46 -07:00
Henri Vasserman	bcd616700e	improve docs and example	2023-06-02 18:06:02 +03:00
digiwombat	7cebe2eaf8	Merge branch 'master' of https://github.com/digiwombat/llama.cpp	2023-06-02 10:06:04 -04:00
digiwombat	16e1c9813a	Removed the embedding api endpoint and associated code.	2023-06-02 10:05:52 -04:00
Randall Fitzgerald	4dd72fc6e4	Merge pull request #16 from anon998/fix-log-json Replace invalid characters instead of crashing.	2023-06-02 09:43:29 -04:00
anon	41bb71bde7	replace invalid characters instead of crashing While logging the requests.	2023-06-02 10:37:13 -03:00
digiwombat	3ff27d30e3	Fixed up a few things in embedding mode.	2023-06-02 09:20:53 -04:00
Randall Fitzgerald	28cc0cdc50	Merge pull request #15 from SlyEcho/server_refactor Improve long input truncation and add more verbose logging	2023-06-02 08:47:54 -04:00
Henri Vasserman	3df0192804	improve long input truncation and add more verbose logging	2023-06-02 15:19:05 +03:00
Randall Fitzgerald	1bd52c8627	Merge branch 'ggerganov:master' into master	2023-06-02 07:31:55 -04:00
Randall Fitzgerald	f5d5e7020d	Merge pull request #14 from anon998/do-completion-update Trim partial stopping strings when not streaming and move multibyte check.	2023-06-02 07:30:53 -04:00
anon	f820740dad	move multibyte check to doCompletion	2023-06-02 08:27:23 -03:00
anon	8f9e546b51	trim partial stopping strings when not streaming	2023-06-02 08:25:31 -03:00
Randall Fitzgerald	bebea657cb	Merge pull request #13 from anon998/small-fixes Small fixes.	2023-06-02 06:53:10 -04:00
anon998	abb7782745	Merge branch 'master' into small-fixes	2023-06-02 10:35:06 +00:00
Henri Vasserman	88cc7bb6f7	Stuff with logits	2023-06-02 13:29:57 +03:00
anon	47efbb5cf3	use std::isinf to check if ignore_eos is active	2023-06-02 07:19:21 -03:00
anon	2932db15a3	avoid creating element in logit_bias accidentally	2023-06-02 06:59:11 -03:00
anon	a8a9f19689	small fixes	2023-06-02 06:01:10 -03:00
anon	49dce94885	make types match gpt_params exactly	2023-06-02 06:01:10 -03:00
anon	1488a0f528	make functions that never return false void	2023-06-02 06:00:48 -03:00
anon	ebfead6e5a	remove unused variables	2023-06-02 05:45:57 -03:00
anon	731ecc0d1b	fix typo	2023-06-02 05:45:16 -03:00
Henri Vasserman	0bc047730f	Apply suggestions from code review Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2023-06-02 10:29:09 +03:00
Randall Fitzgerald	d29b6d5f55	Merge pull request #12 from anon998/clear-logit-bias Clear logit bias between requests.	2023-06-01 08:58:35 -04:00
anon	8cbc4be6c2	clear logit_bias between requests + print	2023-06-01 09:49:50 -03:00
anon	6025476e39	default penalize_nl back to true	2023-06-01 09:49:16 -03:00
anon	49a18bdd14	remove unused parameter warning	2023-06-01 09:41:35 -03:00
Randall Fitzgerald	af711263ae	Merge pull request #11 from SlyEcho/server_refactor Server refactor	2023-06-01 08:10:55 -04:00
Randall Fitzgerald	797155a0d1	Merge pull request #10 from cirk2/master Add Options enpoints and Access-Control-Allow-Headers to satisfy CORS	2023-06-01 08:10:26 -04:00
Henri Vasserman	9531ae60db	Add logit bias support	2023-06-01 13:57:47 +03:00
Henri Vasserman	8c6a5fc92b	last tokens fixes	2023-06-01 13:18:12 +03:00
Felix Hellmann	5bbc030338	Add Options enpoints and Access-Control-Allow-Headers to satisfy CORS rules	2023-06-01 10:47:53 +02:00
digiwombat	f7882e2d69	Fixed a crash caused by erasing from empty last_n_tokens	2023-05-31 20:35:28 -04:00
Randall Fitzgerald	5f6e16da36	Merge pull request #9 from anon998/stopping-strings Fix stopping strings.	2023-05-31 20:05:18 -04:00
anon	e9b1f0bf5c	fix stopping strings	2023-05-31 21:00:21 -03:00
digiwombat	342604bb81	Added a super simple CORS header as default for all endpoints.	2023-05-31 19:54:05 -04:00

827 commits