llama.cpp

Author	SHA1	Message	Date
Jared Van Bortel	2b0f642fec	fix f16 mmv, 49 -> 41 failures	2024-01-24 13:43:49 -05:00
Jared Van Bortel	1a14099c43	fix q4_0/q4_1 mmv, 65 -> 49 failures	2024-01-24 13:43:48 -05:00
Jared Van Bortel	0787b80db8	kompute : remove broken mulrow kernel -> 1 less test failure	2024-01-24 13:43:48 -05:00
Jared Van Bortel	2755ae3d10	kompute : fix more dispatch ambiguity -> 12 less failures	2024-01-24 13:43:47 -05:00
Jared Van Bortel	08e23fd78c	kompute : fix op_mul kernel -> 13 less test failures	2024-01-24 13:43:47 -05:00
Jared Van Bortel	0899adf86e	kompute : fix get_rows dispatch -> 4 less failures	2024-01-24 13:43:47 -05:00
Jared Van Bortel	cb9ceff966	minor cleanup	2024-01-24 13:43:46 -05:00
Georgi Gerganov	33e8d6abe1	kompute : fix ggml_add kernel (#5027 )	2024-01-24 13:43:46 -05:00
Jared Van Bortel	2f6a279e29	fix supported ops for kompute backend	2024-01-24 13:43:45 -05:00
Jared Van Bortel	07530731ba	never try to evaluate an empty command buffer This fixes the immediate crashes with test-backend-ops - when evaluating individual no-ops like OP_VIEW, it tries to submit an empty command buffer, which crashes RADV and hangs AMDVLK.	2024-01-24 13:43:45 -05:00
Jared Van Bortel	729e1a4cc1	sync op_rope_f16 with recent op_rope_f32 changes	2024-01-24 13:43:45 -05:00
Jared Van Bortel	e9d5223da3	actually fix this assertion	2024-01-24 13:43:44 -05:00
Jared Van Bortel	9431026a84	clean up old backend code	2024-01-24 13:43:44 -05:00
Georgi Gerganov	d6bd471693	kompute : fix rope_f32 and scale ops (#5008 )	2024-01-24 13:43:44 -05:00
Jared Van Bortel	76474a7c0d	kompute : ignore exceptions in ggml_vk_available_devices (#12 ) Signed-off-by: Jared Van Bortel <jared@nomic.ai>	2024-01-24 13:43:43 -05:00
Jared Van Bortel	cad72e1252	add sanity check and fix kompute teardown order	2024-01-24 13:43:43 -05:00
Jared Van Bortel	070919dbf7	attempt to get test-backend-ops working	2024-01-24 13:43:43 -05:00
Jared Van Bortel	5f660dada8	fix assertion failure	2024-01-24 13:43:42 -05:00
Jared Van Bortel	298d6eec09	kompute : initial attempt at ggml-backend v2 support	2024-01-24 13:43:40 -05:00
Jared Van Bortel	7c527eb568	Merge commit '`e7e4df031b`' into HEAD	2024-01-24 13:39:17 -05:00
Michael Hueschen	c9b316c78f	nix-shell: use addToSearchPath thx to @SomeoneSerge for the suggestion!	2024-01-24 12:39:29 +00:00
Michael Hueschen	bf63d695b8	nix: add cc to devShell LD_LIBRARY_PATH this fixes the error I encountered when trying to run the convert.py script in a venv: ``` $ nix develop [...]$ source .venv/bin/activate (.venv) [...]$ pip3 install -r requirements.txt <... clipped ...> [...]$ python3 ./convert.py Traceback (most recent call last): File "/home/mhueschen/projects-reference/llama.cpp/./convert.py", line 40, in <module> from sentencepiece import SentencePieceProcessor File "/home/mhueschen/projects-reference/llama.cpp/.venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 13, in <module> from . import _sentencepiece ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory ``` however, I am not sure this is the cleanest way to address this linker issue...	2024-01-24 12:39:29 +00:00
slaren	1387ea2117	llama : pre-allocate input tensors in a separate buffer (#5100 )	2024-01-24 12:48:14 +01:00
Georgi Gerganov	26d607608d	metal : disable support for MUL_MAT F32 x F16	2024-01-23 15:50:56 +02:00
Kawrakow	44879ee885	Additional KL-divergence statistics (#5081 ) * perplexity: add top-token probability * perplexity: add additional KL-divergence statistics * perplexity: a better organized KL-divergence statistics output --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-23 15:17:20 +02:00
Johannes Gäßler	9ecdd12e95	CUDA: more info when no device code (#5088 )	2024-01-23 13:31:56 +01:00
Georgi Gerganov	89758723c7	minor : clean-up some warnings and style (#5094 ) * minor : clean-up some warnings and style ggml-ci * ggml : add comment	2024-01-23 14:12:57 +02:00
Xuan Son Nguyen	2bed4aa3f3	devops : add intel oneapi dockerfile (#5068 ) Co-authored-by: Xuan Son Nguyen <xuanson.nguyen@snowpack.eu>	2024-01-23 09:11:39 +02:00
Michael Coppola	125d03a503	llama.vim : added api key support (#5090 ) Co-authored-by: Michael Coppola <info@michaeljcoppola.com>	2024-01-23 08:51:27 +02:00
slaren	011e8ec577	llama : fix not enough space in buffer with Qwen (#5086 )	2024-01-22 23:42:41 +01:00
Kawrakow	6f9939d119	KL-divergence (#5076 ) * kl-divergence: be able to save all logits to a file * Add ability to compute KL-divergence --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-22 16:10:14 +02:00
Reinforce-II	780e24a22e	ggml : parallelize FP32 conversion when using BLAS (#5045 ) * make GGML_TASK_INIT phase can be run in multithread * multithreaded dequantize in mul_mat when using blas library * minor fixes * update outdated comment * fix coding style * simplify code Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-22 15:15:08 +02:00
XiaotaoChen	3ce7e8f8e7	llava : MobileVLM support (#4954 ) * MobileVLM native implementation * delete depthwise_conv_2d and permute_cpy relative code, replace the two by the existed functions, and opt ldp definition, support LLAMA_PERF option for CMake * move android script to example/llava directory * Fix the editor config checks --------- Co-authored-by: Chenxiaotao03 <chenxiaotao03@meituan.com>	2024-01-22 15:09:35 +02:00
Someone Serge	b2d80e105a	flake.nix: add a comment about flakes vs nix	2024-01-22 12:19:30 +00:00
Someone Serge	28603cd283	nix: add a comment on the many nixpkgs-with-cuda instances	2024-01-22 12:19:30 +00:00
Someone Serge	5e97ec91ae	nix: add a comment about makeScope	2024-01-22 12:19:30 +00:00
Someone Serge	7251870780	nix: refactor the cleanSource rules	2024-01-22 12:19:30 +00:00
Someone Serge	fe8b3c0d4b	workflows: nix-ci: drop the redundant "paths" filter	2024-01-22 12:19:30 +00:00
Someone Serge	f4dd059259	workflows: nix-build-aarch64: rate limit	2024-01-22 12:19:30 +00:00
Someone Serge	f7276f7500	workflows: nix-ci: rebuild on flake.lock updates	2024-01-22 12:19:30 +00:00
Kawrakow	15bceec2d7	imatrix : keep intermediate imatrix results (#5077 ) Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-22 14:18:43 +02:00
compilade	d6bd4d46dd	llama : support StableLM 2 1.6B (#5052 ) * llama : support StableLM 2 1.6B * convert : fix Qwen's set_vocab wrongly naming all special tokens [PAD{id}] * convert : refactor Qwen's set_vocab to use it for StableLM 2 too * nix : add tiktoken to llama-python-extra * convert : use presence of tokenizer.json to determine StableLM tokenizer loader It's a less arbitrary heuristic than the vocab size.	2024-01-22 13:21:52 +02:00
Daniel Bevenius	152d9d05e0	finetune : print sample-start/include-sample-start (#5072 ) This commit adds `--sample-start` and `--include-sample-start` to the output from the main function in finetune.cpp. The motivation for this is that even though these are set explicitly by the user via the command line, if one forgets to set them then it is useful to have their values printed out. Otherwise it is possible to go through the whole training process before realizing that the values are not what one expected. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-01-22 13:11:01 +02:00
Kawrakow	66d575c45c	llama : add Q3_K_XS (#5060 ) * Add Q3_K_XS - intermediate size between Q2_K and Q3_K_S * Q3_K_XS: quantize first 1/8 of ffn_down layers with Q4_K Together with an importance matrix, this brings perplexity for LLaMA-v2-70B below the perplexity of the former Q2_K with an 800 MB smaller quantized model size. --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-22 12:43:33 +02:00
bobqianic	57744932c6	ci : fix Windows CI by updating Intel SDE version (#5053 )	2024-01-22 10:55:05 +02:00
Shijie	3466c6ebcf	llama : add more qwen2 models (#5071 )	2024-01-22 09:33:19 +02:00
iSma	504dc37be8	Revert LLAMA_NATIVE to OFF in flake.nix (#5066 )	2024-01-21 21:37:13 +00:00
kuronekosaiko	05490fad7f	add safetensors support to convert-lora-to-ggml.py (#5062 ) * add safetensors support to convert-lora-to-ggml.py * Update convert-lora-to-ggml.py Remove white space in line 69.	2024-01-21 17:28:14 +01:00
bobqianic	6c5629d4d2	add `#include <string>` to unicode.h (#5051 ) Co-authored-by: Jared Van Bortel <jared@nomic.ai>	2024-01-21 10:17:35 -05:00
Kawrakow	7dcbe39d36	Add ability to evaluate multiple choice tasks (#5047 ) * TruthfulQA: 1st attempt, does not look like it is working The same implementation can be used for HellaSwag as well, so I converted a HellaSwag validation dataset to the binary format used here and tested with that. The score is only around 50, so something is not quite right. * TruthfulQA: works but the result is bad I know it works because if I convert the HellaSwag validation data to the binary format used in the truthful_qa_score() function I get the exact same result as from the hellaswag_score() function. But I guess, the questions are tricky and the way I have done the combination of question + answer is very likely not the best. The TruthfulQA validation dataset contains 817 questions, with random chance result around 19%. With this version I get 29.1% for Mistral-7B and 55.2% for Mistral-7B-Instruct-v0.2. The HF leader board results for these two models are 42.2% and 68.3%, respectively. * TruthfulQA: fix random sample * TruthfulQA: prepare tasks in parallel for large test datasets * Rename truthful_qa to multiple_choice * Make MSVC happy I had forgotten that MSVC does not make constexpr's available inside a lambda. --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-01-21 14:42:44 +02:00


2125 commits