llama.cpp

Author	SHA1	Message	Date
Concedo	ba2040d1df	compile fix for ARM NEON	2023-08-03 12:52:06 +08:00
Concedo	3fa6befdaf	increase max free blocks	2023-08-03 10:50:16 +08:00
Concedo	34e60be41a	compile fix	2023-08-03 10:36:14 +08:00
Concedo	b2eaec4261	updated lite	2023-08-02 22:54:17 +08:00
Johannes Gäßler	4f6b60c776	CUDA: Fix models with output size != 32000 (#2480)	2023-08-02 16:48:10 +02:00
Concedo	4c90fdc5cd	Merge remote-tracking branch 'johannes/cuda-fix-output-size' into concedo_experimental # Conflicts: # CMakeLists.txt	2023-08-02 22:37:41 +08:00
Concedo	6fe92318f8	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile # README.md # scripts/sync-ggml.sh # tests/CMakeLists.txt # tests/test-double-float.cpp # tests/test-grad0.cpp # tests/test-opt.cpp	2023-08-02 22:36:00 +08:00
JohannesGaessler	1e64d511d5	CUDA: Fix models with output size != 32000	2023-08-02 10:26:53 +02:00
ldwang	220d931864	readme : add Aquila-7B model series to supported models (#2487) * support bpe tokenizer in convert Signed-off-by: ldwang <ftgreat@gmail.com> * support bpe tokenizer in convert Signed-off-by: ldwang <ftgreat@gmail.com> * support bpe tokenizer in convert, fix Signed-off-by: ldwang <ftgreat@gmail.com> * Add Aquila-7B models in README.md Signed-off-by: ldwang <ftgreat@gmail.com> * Up Aquila-7B models in README.md Signed-off-by: ldwang <ftgreat@gmail.com> --------- Signed-off-by: ldwang <ftgreat@gmail.com> Co-authored-by: ldwang <ftgreat@gmail.com>	2023-08-02 11:21:11 +03:00
Eve	81844fbcfd	tests : Fix compilation warnings (Linux/GCC) (#2451) * fix hellaswag print format, cast away warning in test-double-float * c++11 cannot use designated initializers * add static to test-grad0.c internal functions * use memcpy in test-double-float.c * port c tests to c++ * use initializer list for ggml_init_params	2023-08-02 11:06:19 +03:00
Yiming Cui	a312193e18	readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models (#2475) * add support for chinese llama-2 / alpaca-2 * remove white spaces	2023-08-02 09:18:31 +03:00
Bono Lv	c574bddb36	fix a typo in examples/server/README.md (#2478)	2023-08-01 14:54:28 +02:00
Concedo	c58ffc92e5	fixed compile error	2023-08-01 18:28:49 +08:00
Concedo	84b28c4282	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile	2023-08-01 18:13:27 +08:00
Concedo	46682e5cb3	added mmq launch flag	2023-08-01 17:57:13 +08:00
ebraminio	86aeb27734	server : Support dark mode (#2414) * server : Support dark mode So it respects user system light / dark settings. * Update index.html.hpp by running ./deps.sh	2023-08-01 10:56:23 +02:00
Matteo Boschini	1873ff586b	metal : add gqa8 kernel to allow llama-2-70B on metal (#2459) * Added gqa8 kernel to allow llama-2-70B on metal * Update ggml-metal.m Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com> * Extend kernel_mul_mat_f16_f32 to handle gqa broadcast * Added ne03==ne13 assertion --------- Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>	2023-08-01 10:43:12 +03:00
Johannes Gäßler	49e7cb5bb1	CUDA: fixed LLAMA_FAST compilation option (#2473)	2023-07-31 21:02:19 +02:00
Johannes Gäßler	b772bba42e	CUDA: fixed cmake F16 option (#2471)	2023-07-31 19:52:22 +02:00
Concedo	e221843147	trying out mmq Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # README.md	2023-07-31 22:51:15 +08:00
Concedo	3e370f83ef	Warning: Very experimental merge, do not use until confirmed stable.	2023-07-31 22:33:43 +08:00
Johannes Gäßler	0728c5a8b9	CUDA: mmq CLI option, fixed mmq build issues (#2453)	2023-07-31 15:44:35 +02:00
Johannes Gäßler	1215ed7d5c	CUDA: Implemented row flattening for non-glm RoPE (#2468)	2023-07-31 14:32:30 +02:00
Johannes Gäßler	2dbf518911	CUDA: fewer memory bank conflicts for mul_mat_q (#2458)	2023-07-31 13:18:51 +02:00
Concedo	84ce184c4f	layout	2023-07-31 17:33:31 +08:00
slaren	9d2382b3e4	Fix Metal backend broken from the allocator changes (#2455) * fix Metal backend broken from the allocator changes	2023-07-31 11:02:53 +02:00
YellowRoseCx	f27972777f	correct semantic error in import_vars (#355) * Hide unavailable backends & Add tooltip over backend count Hides unavailable backends from the user and if the program is launched without any backends made, it shows an error message to them stating no backends were found and to make them using the 'make' command Add tooltip when hovering over backend count label hovering over the new label that shows the backend count will explain what the numbers are, and show the users which backends are not available or built * add some code comments * hide "missing" if all are built move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available " if len(runopts)==6 else + " * small typo fix * remove wrongly added leftover device choosing code * fix labels * move tooltip to function * import vars logic fix --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2023-07-31 15:51:35 +08:00
Concedo	5ea5d19d6a	SSE emoji fix	2023-07-30 22:31:20 +08:00
slaren	a113689571	ggml : add graph tensor allocator (#2411) * ggml : add graph tensor allocator * ggml : don't calculate data pointer of unallocated tensors when creating a view with an offset * ggml : refactor ggml_view_Nd into ggml_view_tensor_offset	2023-07-30 15:58:01 +02:00
Concedo	82d0695f0f	Merge commit '9baf9ef304' into concedo_experimental	2023-07-30 18:18:23 +08:00
Concedo	90a37d63d5	up ver, added warning for max context	2023-07-30 18:07:14 +08:00
YellowRoseCx	c8af65760f	Hide unavailable backends & Add tooltip over backend count (#352) * Hide unavailable backends & Add tooltip over backend count Hides unavailable backends from the user and if the program is launched without any backends made, it shows an error message to them stating no backends were found and to make them using the 'make' command Add tooltip when hovering over backend count label hovering over the new label that shows the backend count will explain what the numbers are, and show the users which backends are not available or built * add some code comments * hide "missing" if all are built move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available " if len(runopts)==6 else + " * small typo fix * remove wrongly added leftover device choosing code * fix labels * move tooltip to function --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2023-07-30 17:50:55 +08:00
Concedo	45456fa6ca	switch noavx2 to not use openblas, as it has incompatible instructions	2023-07-30 16:47:33 +08:00
Concedo	23825abee1	fix wrong key	2023-07-30 14:30:46 +08:00
Johannes Gäßler	11f3ca06b8	CUDA: Quantized matrix matrix multiplication (#2160) * mmq implementation for non k-quants * q6_K * q2_K * q3_k * q4_K * vdr * q5_K * faster q8_1 loading * loop unrolling * add __restrict__ * q2_K sc_high * GGML_CUDA_MMQ_Y * Updated Makefile * Update Makefile * DMMV_F16 -> F16 * Updated README, CMakeLists * Fix CMakeLists.txt * Fix CMakeLists.txt * Fix multi GPU out-of-bounds	2023-07-29 23:04:44 +02:00
Johannes Gäßler	9baf9ef304	CUDA: faster multi GPU synchronization (#2448)	2023-07-29 23:04:10 +02:00
Concedo	cde3760e52	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile # README.md # ggml.h # llama.cpp	2023-07-29 17:47:00 +08:00
Concedo	9589d52079	added help link	2023-07-29 17:33:15 +08:00
Concedo	e4b42e5b15	fixed gui bugs	2023-07-29 11:15:57 +08:00
klosax	8a88e5855c	perplexity : add Hellaswag calculation (#2389) * common.h : add hellaswag / remove perplexity-lines * common.cpp : add hellaswag / remove perplexity-lines * perplexity.cpp : add hellswag scores / remove perplexity-lines * perplexity.cpp : clean up * common.h : change default param value * common.cpp : Change default param * perplexity.cpp : alter wording * common.h : alter wording * common.cpp : alter wording	2023-07-28 21:25:36 +03:00
Lee	a9559bf77b	ggml : workaround for missing _mm256_setr_m128i in GCC < 8 in k_quants.c (#2405)	2023-07-28 21:17:45 +03:00
eric8607242	ee1b497c98	llama : support more diverse tokenizers? (#2420) * supporting more diverse tokenizers * Update llama.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-07-28 21:10:05 +03:00
Georgi Gerganov	d73b8d48b4	examples : fix whitespace	2023-07-28 21:05:08 +03:00
nhamanasu	34ae1caf7f	examples : server chat mode with llama2 (#2400) * add: server chat mode with llama2 * fix: remove the unnecessary last \n	2023-07-28 21:02:10 +03:00
Weird Constructor	d91f3f0c55	readme : fix the description of the Tail free sampling (TFS) method (#2431)	2023-07-28 11:44:43 +03:00
Rand Xie	65cdf34bdc	llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433)	2023-07-28 11:42:53 +03:00
Concedo	b40550cf1a	change wiki link	2023-07-28 13:01:12 +08:00
Concedo	31486ebc8d	updated readme	2023-07-28 11:32:55 +08:00
niansa/tuxifan	edcc7ae7d2	Obtaining LLaMA 2 instructions (#2308) * Obtaining LLaMA 2 instructions * Removed sharing warning for LLaMA 2 * Linked TheBloke's GGML repos * Add LLaMA 2 to list of supported models * Added LLaMA 2 usage instructions * Added links to LLaMA 2 70B models	2023-07-28 03:14:11 +02:00
mj-shifu	7c529cede6	convert.py : Update to support 70B HF format model files (#2427) * convert.py : fix llama 2 70b conversion from Huggingface	2023-07-27 14:39:17 -06:00

1697 commits