llama.cpp

Author	SHA1	Message	Date
Georgi Gerganov	20fc3804bf	convert : fix gemma v1 tokenizer convert (#8248 ) ggml-ci	2024-07-04 10:41:03 +03:00
AidanBeltonS	f619024764	[SYCL] Remove unneeded semicolons (#8280 )	2024-07-04 09:07:19 +08:00
Daniele	d23287f122	Define and optimize RDNA1 (#8085 )	2024-07-04 01:02:58 +02:00
slaren	5f2d4e60e2	ppl : fix n_seq_max for perplexity (#8277 ) * ppl : fix n_seq_max for perplexity * use 1 seq for kl_divergence	2024-07-03 20:33:31 +03:00
Mason M	613a3c6a53	Remove trailing whitespace	2024-07-03 12:11:38 -03:00
bandoti	4226103400	Update README.md	2024-07-03 12:02:21 -03:00
Xuan Son Nguyen	916248af1f	fix phi 3 conversion (#8262 )	2024-07-03 16:01:54 +02:00
Judd	f8d6a23804	fix typo (#8267 ) Co-authored-by: Judd <foldl@boxvest.com>	2024-07-03 14:40:16 +02:00
AidanBeltonS	fadde67135	Dequant improvements rebase (#8255 ) * Single load for half2 * Store scales in local mem * Vec load quantized values	2024-07-03 09:55:34 +08:00
Mason M	dafcaf1dd3	Remove trailing whitespace	2024-07-02 20:21:14 -03:00
bandoti	3a554ae63e	Update README.md	2024-07-02 18:38:22 -03:00
Mason M	019e4a3c7e	Add cflags from pkg-config to fix w64devkit build	2024-07-02 18:10:00 -03:00
MistApproach	a27152b602	fix: add missing short command line argument -mli for multiline-input (#8261 )	2024-07-02 22:56:46 +02:00
Mason M	a85b5d8ee0	Merge branch 'master' into vulkan-build-integration	2024-07-02 16:00:26 -03:00
Mason M	2f5a0e8e69	Remove trailing newline	2024-07-02 15:51:56 -03:00
Clint Herron	3e2618bc7b	Adding step to `clean` target to remove legacy binary names to reduce upgrade / migration confusion arising from #7809 . (#8257 )	2024-07-02 13:19:56 -04:00
Mason M	22323d50a3	Merge branch 'master' into vulkan-build-integration	2024-07-02 13:36:04 -03:00
Clint Herron	07a3fc0608	Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258 )	2024-07-02 12:18:10 -04:00
Faisal Zaghloul	968967376d	Add `JAIS` model(s) (#8118 ) * Add `JAIS` model(s) * cleanup * address review comments * remove hack * un-hardcode max-alibi-bias * minor tweaks --------- Co-authored-by: fmz <quic_fzaghlou@quic.com>	2024-07-02 16:36:00 +02:00
Daniel Bevenius	023b8807e1	convert-hf : print output file name when completed (#8181 ) * convert-hf : print output file name when completed This commit adds the output file name to the log message when the conversion is completed. The motivation for this change is that when `--outfile` option is not specified it might not be obvious where the output file is written. With this change the output of running the script will be something like the following: ```console INFO:hf-to-gguf:Model successfully exported to models/gemma-2-9b-it.gguf. ``` Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * squash! convert-hf : print output file name when completed Updates the output of to support printing the directory if the output is split into multiple files. Also the output file name is now retrieved from the model_instance object. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * squash! convert-hf : print output file name when completed Use parent attribute of Path object and string interpolation. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * squash! convert-hf : print output file name when completed Use os.sep instead of hardcoding the path separator. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> --------- Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-07-02 09:40:49 +03:00
slaren	0e0590adab	cuda : update supports_op for matrix multiplication (#8245 )	2024-07-02 09:39:38 +03:00
luoyu-intel	a9f3b10215	[SYCL] Fix win build conflict of math library (#8230 ) * fix win build conflict of math library * fix the condition: !(win32 & SYCL) * revert warp_size=16	2024-07-02 12:50:07 +08:00
luoyu-intel	d08c20edde	[SYCL] Fix the sub group size of Intel (#8106 ) * use warp_size macro for all sycl kernels * fix mask of permute_sub_group_by_xor * fix rms_norm with correct warp number * fix rms_norm_f32/group_norm_f32 * move norm to norm.cpp file * fix quantize bug * fix mmvq's batch size	2024-07-02 10:16:00 +08:00
Xuan Son Nguyen	5fac350b9c	Fix gemma2 tokenizer convert (#8244 ) * fix gemma2 tokenizer convert * remove scores * improve code, fix new line issue	2024-07-02 01:07:23 +02:00
Johannes Gäßler	cb5fad4c6c	CUDA: refactor and optimize IQ MMVQ (#8215 ) * CUDA: refactor and optimize IQ MMVQ * uint -> uint32_t * __dp4a -> ggml_cuda_dp4a * remove MIN_CC_DP4A checks * change default * try CI fix	2024-07-01 20:39:06 +02:00
Mateusz Charytoniuk	dae57a1ebc	readme: add Paddler to the list of projects (#8239 )	2024-07-01 20:13:22 +03:00
Xuan Son Nguyen	49122a873f	gemma2: add sliding window mask (#8227 ) * gemma2: add sliding window mask * fix data_swa uninitialized * better naming * add co-author Co-authored-by: Arlo Phoenix <arlo-phoenix@users.noreply.github.com> * replace list with single tensor * update * llama : minor styling * convert : add sanity check for query_pre_attn_scalar * fix small typo in README --------- Co-authored-by: Arlo Phoenix <arlo-phoenix@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-07-01 18:48:34 +02:00
Mason M	422bfb3e88	Merge branch 'master' into vulkan-build-integration	2024-07-01 11:52:49 -03:00
Mason M	9bca872be0	code review changes	2024-07-01 11:43:58 -03:00
Roni	0ddeff1023	readme : update tool list (#8209 ) * Added gppm to Tool list in README * Update README.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-07-01 15:48:16 +03:00
Michael Francis	3840b6f593	nix : enable curl (#8043 ) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-07-01 14:47:04 +03:00
Georgi Gerganov	257f8e41e2	nix : remove OpenCL remnants (#8235 ) * nix : remove OpenCL remnants * minor : remove parentheses	2024-07-01 14:46:18 +03:00
iacore	694c59cb42	Document BERT support. (#8205 ) * Update README.md document BERT support * Update README.md	2024-07-01 13:40:58 +02:00
zhentaoyu	197fe6c1d7	[SYCL] Update SYCL-Rope op and Refactor (#8157 ) * align with rope.cu and move sycl-op to a single file	2024-07-01 19:39:06 +08:00
Georgi Gerganov	d0a7145ba9	flake.lock: Update (#8218 )	2024-06-30 16:09:34 -07:00
Xuan Son Nguyen	9ef0780062	Fix new line issue with chat template, disable template when in-prefix/suffix is set (#8203 ) * preserve new line llama_chat_format_single * disable chat template if in-prefix/suffix is set * remove redundant change	2024-06-30 20:27:13 +02:00
Andrei	1c5eba6f8e	llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197 ) * Add attention and final logit softcapping. * fix * Add custom add_ functions * Disable flash attention for Gemma2 * Update src/llama.cpp Co-authored-by: slaren <slarengh@gmail.com> * Add default value for attention and final logit softcap value * Add custom kq scaling from Gemma2Attention * Remove custom pre attention scaling and use computed value instead. --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-29 23:44:08 -04:00
bandoti	4eab311ed0	Merge branch 'ggerganov:master' into vulkan-build-integration	2024-06-30 00:14:14 -03:00
Mason M	ac9a065d31	Remove Python dependency from Vulkan build	2024-06-30 00:02:41 -03:00
Xuan Son Nguyen	72272b83a3	fix code typo in llama-cli (#8198 )	2024-06-29 00:14:20 +02:00
Olivier Chafik	8748d8ac6f	json: attempt to skip slow tests when running under emulator (#8189 )	2024-06-28 18:02:05 +01:00
Xuan Son Nguyen	26a39bbd6b	Add MiniCPM, Deepseek V2 chat template + clean up `llama_chat_apply_template_internal` (#8172 ) * tmp_contains * minicpm chat template * add DeepSeek Lite template * change deepseek-lite to deepseek2 * correct code comment * correct code from master branch	2024-06-28 15:11:44 +02:00
Sigbjørn Skjæret	38373cfbab	Add SPM infill support (#8016 ) * add --spm-infill option * support --spm-infill * support --spm-infill	2024-06-28 12:53:43 +02:00
slaren	b851b3fba0	cmake : allow user to override default options (#8178 )	2024-06-28 12:37:45 +02:00
Olivier Chafik	139cc621e9	`json`: restore default additionalProperties to false, fix some pattern escapes (#8180 ) * json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset * json: revert default of additionalProperties to false * Update README.md	2024-06-28 09:26:45 +01:00
pculliton	e57dc62057	llama: Add support for Gemma2ForCausalLM (#8156 ) * Inference support for Gemma 2 model family * Update convert-hf-to-gguf.py, constants, and tensor mappings * cleanup * format fix * Fix special token vocab bug * Don't add space prefix * fix deleted lines * Update src/llama.cpp Co-authored-by: slaren <slarengh@gmail.com> * Add model type names * Add control vector * Fix model type identification --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com> Co-authored-by: slaren <slarengh@gmail.com>	2024-06-27 21:00:43 -07:00
Xuan Son Nguyen	a27aa50ab7	Add missing items in makefile (#8177 )	2024-06-28 02:19:11 +02:00
Olivier Chafik	cb0b06a8a6	`json`: update grammars/README w/ examples & note about additionalProperties (#8132 ) * json: update grammars/README * mention broken prefixItems * add mention to llama-gbnf-validator * json: explicit type: object for nested items object in cli example	2024-06-27 22:08:42 +01:00
loonerin	558f44bf83	CI: fix release build (Ubuntu+Mac) (#8170 ) * CI: fix release build (Ubuntu) PR #8006 changes defaults to build shared libs. However, CI for releases expects static builds. * CI: fix release build (Mac) --------- Co-authored-by: loonerin <loonerin@users.noreply.github.com>	2024-06-27 21:01:23 +02:00
slaren	8172ee9da9	cmake : fix deprecated option names not working (#8171 ) * cmake : fix deprecated option names not working * remove LlAMA_OPENMP	2024-06-27 20:04:39 +02:00

3424 commits