llama.cpp

Author	SHA1	Message	Date
HanishKVC	184ac322e3	ChatON: Make json_get efficient and flexible wrt its calling Also explicitly indicate that we are looking at a chain of keys	2024-05-13 16:21:02 +05:30
Neo Zhang	948f4ec7c5	[SYCL] rm wait() (#7233)	2024-05-13 18:11:26 +08:00
Joan Fontanals	9aa672490c	llama : rename jina tokenizers to v2 (#7249) * refactor: rename jina tokenizers to v2 * refactor: keep refactoring non-breaking	2024-05-13 11:35:14 +03:00
HanishKVC	eb7554ca3b	ChatON: Avoid -> to match simpcfg as well as corresponding keys	2024-05-13 10:37:14 +05:30
Brian	b1f8af1886	convert.py: Outfile default name change and additional metadata support (#4858) * convert.py: Outfile default name change and additional metadata support * convert.py: don't stringify Metadata load method output * convert.py: typo fix * convert.py: fix metadata format to sync with LLM_KV_NAMES in llama.cpp	2024-05-13 12:56:47 +10:00
Benjamin Findley	e586ee4259	change default temperature of OAI compat API from 0 to 1 (#7226) * change default temperature of OAI compat API from 0 to 1 * make tests explicitly send temperature to OAI API	2024-05-13 12:40:08 +10:00
Neo Zhang	cbf75894d2	[SYCL] Add oneapi runtime dll files to win release package (#7241) * add oneapi running time dlls to release package * fix path * fix path * fix path * fix path * fix path --------- Co-authored-by: Zhang <jianyu.zhang@intel.com>	2024-05-13 08:04:29 +08:00
Neo Zhang	0d5cef78ae	[SYCL] update CI with oneapi 2024.1 (#7235) Co-authored-by: Zhang <jianyu.zhang@intel.com>	2024-05-13 08:02:55 +08:00
HanishKVC	d5b0bfbaec	SimpCfg: Remove now unused SC_DEBUG, rather GroupKV uses equiv The code which was using SC_DEBUG moved to GroupKV and inturn GKV_DEBUG	2024-05-13 00:33:36 +05:30
HanishKVC	857570f8f8	SimpCfgTest: Update dump usage to GKV return string semantic	2024-05-13 00:20:58 +05:30
HanishKVC	9249649fb3	ChatON+TestPrgs: Use specific log files	2024-05-12 23:59:48 +05:30
Johannes Gäßler	dc685be466	CUDA: add FP32 FlashAttention vector kernel (#7188) * CUDA: add FP32 FlashAttention vector kernel * fixup! CUDA: add FP32 FlashAttention vector kernel * fixup! fixup! CUDA: add FP32 FlashAttention vector kernel * fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel	2024-05-12 19:40:45 +02:00
HanishKVC	3d33d62924	SimpCfg: Move testing code into its own file in tests Also set functions to inline or static as appropriate	2024-05-12 22:53:48 +05:30
HanishKVC	f2dd1263fd	GroupKV: Move test code into its own file in tests	2024-05-12 22:33:48 +05:30
HanishKVC	6048218383	SimpCFG: Convert to GroupKV extended version Reuse the code already moved into GroupKV Add explicit get and set wrt int32_t, which was added after move to GroupKV wrt basic MapOfMapOfVariant logic.	2024-05-12 21:58:59 +05:30
Georgi Gerganov	6f1b63606f	cmake : fix version cmp (#7227)	2024-05-12 18:30:23 +03:00
HanishKVC	db2ffabb18	ChatON: use templated json_get when loading bool key-value fields With this now even loading chaton_meta.json file will generate more informative exception, so that user can know which field is missing, if any.	2024-05-12 18:26:58 +05:30
HanishKVC	470b8885f3	ChatON: Switch to templated json_get for str/bool/etal	2024-05-12 18:19:18 +05:30
HanishKVC	0249c07e6b	ChatON:Switch to json_get_str to help identify missing keys better The json library generates less informative exception message, which doesnt help one identify which key is missing, so switch to the new json_get_str helper added in the last commit. It generates more informative exception message.	2024-05-12 17:44:13 +05:30
HanishKVC	4eae05a6b7	ChatON: json access helper which raises exception if key missing	2024-05-12 17:34:04 +05:30
HanishKVC	f94fed92d3	ChatON+MetaHpp: Had forgotten to conv reverse-prompt Also has dump was using get_value calls with fallback to default, so it wasnt identifying the missed field. Have fixed both of those. Also reconverted meta json file. Misc: interesting avesham and aattam	2024-05-12 16:20:28 +05:30
HanishKVC	4232ec1fb9	Main: Load json meta file only if specified This should be ok, given that there is a version of the chat tmpl meta data already included with the library. So only if user wants to change the chat template info wrt a existing model/template-standard or add a new one, then there is need to pass a json file with info for that model/standard.	2024-05-12 14:53:37 +05:30
HanishKVC	a3285e8e25	ChatON:Include auto converted ChatONMeta.hpp chat template data This should allow for using this generic chat templating code flow along with the included chat template data, without needing to load any json file at runtime. However If user wants to change the already included chat template data, or add new chat template standard/model related data, one can explicitly load json file. TODO: Need to cross check this flow once, but logically should work	2024-05-12 14:08:09 +05:30
HanishKVC	b8590e3e57	ChatON:P5:meta json to hpp: Add required c++ inc and global var Also comment to indicate that the hpp file is auto converted from the chaton_meta.json file	2024-05-12 14:06:24 +05:30
HanishKVC	b5b274a44b	ChatON:P4:meta json to hpp: Insert kv bool Rename kv helpers to match their semantic. * whether working with string or bool value * whether two keys or a single key Add support for kv with bool value inturn add the kv boolean pairs used in the chaton_meta.json file Add the closing bracket	2024-05-12 13:33:54 +05:30
HanishKVC	7b5fb0a2fa	ChatON:P3:meta json to hpp: Retain esc seqs and more kv pairs Use repr to retain the escape sequences in the read string. And parallely skip the single quote around strings wrt repr. Bring in more k-v pairs wrt chaton_meta.json	2024-05-12 13:06:22 +05:30
HanishKVC	078e04d32b	ChatON:P2:meta json to hpp conversion - add k-v pairs skeleton	2024-05-12 12:42:53 +05:30
HanishKVC	0c21a0084f	ChatON:p1: meta json to hpp conversion - Initial skeleton load the json file and put the template ids	2024-05-12 12:42:23 +05:30
slaren	b228aba91a	remove convert-lora-to-ggml.py (#7204)	2024-05-12 02:29:33 +02:00
HanishKVC	1574201f71	ChatON:LoadJSon:ChatTemplates: revPrompt, system-user flags WIP:NOTE: Initial go converting from json driven flow to ChatTemplatesGroupKV related flow done. Needs to be tested. A optional helper added to load ChatTemplates from a specified json file. Need to add a compile time initialized MapOfMapOfVariants wrt the chat template details of models/standards already known to the program. So that one can use the llama.cpp and this new chat template logic, even without json dependency, if one doesnt want to.	2024-05-12 01:45:19 +05:30
HanishKVC	444d2ccf9c	ChatON:LoadJSON: ChatTemplates - global/system/user/assistant Manually iterate the json object items using begin-end explicitly, because the implicit iteration for loop related helpers for the used json lib gives only the values and not a key-value pair.	2024-05-12 01:35:31 +05:30
HanishKVC	2efc09f2d0	ChatON: Unnecessarily indirect nlohmann json code used for exploring/testing committed just for future reference	2024-05-12 00:42:17 +05:30
Georgi Gerganov	7bd4ffb780	metal : fix warnings (skipme) (#0)	2024-05-11 21:38:13 +03:00
Georgi Gerganov	1622ac023f	sync : ggml	2024-05-11 21:35:05 +03:00
Georgi Gerganov	6aeff24f8b	metal : fix indent (ggml/0)	2024-05-11 21:34:21 +03:00
Georgi Gerganov	325756d28d	ggml : resolve merge (ggml/0) ggml-ci	2024-05-11 21:33:08 +03:00
HanishKVC	b9d9700de3	CMakeLists.txt: Compile C++ code for -std=c++20	2024-05-11 23:42:08 +05:30
HanishKVC	b944d04d08	ChatON: Add constructor for ChatTemplates which chains into GKV	2024-05-11 23:42:08 +05:30
HanishKVC	d9959b74e7	GroupKV: Get ready for use in llama.cpp ++ Avoid defining GKV_TEST_PRG, used for self testing, by default Add it to common library	2024-05-11 23:40:03 +05:30
Josh Ramer	fed0108491	Scripting & documenting debugging one test without anything else in the loop. (#7096) * A little documentation that shares my quick tips for working in the repository. * Update startup-testing-debugging.md * script that shows a menu of tests to pick from & run the debugger on * debug-test.sh: Refactor CLI help message * debug-test.sh: documentation update * debug-test.sh: CLI Help output corrections * debug-test.sh: minor doc fix --------- authored-by: Josh Ramer <ubuntu@ip-172-31-32-53.ec2.internal> Assisted-by: brian khuu <mofosyne@gmail.com>	2024-05-12 03:26:35 +10:00
HanishKVC	4a9a6ce256	ChatON: ChatONMetaDump switch to GKV/ChatTemplates based flow	2024-05-11 22:53:45 +05:30
Xuan Son Nguyen	72c177c1f6	fix system prompt handling (#7153)	2024-05-11 17:28:10 +02:00
HanishKVC	484c710eab	GroupKV:Add GetValue which throws exception	2024-05-11 20:49:51 +05:30
compilade	5a419926b0	convert-hf : support bfloat16 conversion (#7158) * convert-hf : support bfloat16 conversion * gguf-py : flake8 fixes * convert-hf : add missing space after comma * convert-hf : get bit-exact same output as ./quantize The quantization version was missing. * convert-hf : don't round bf16 NANs * convert-hf : save some memory with np.int16 intermediate bf16 weights * convert-hf : more closely match llama.cpp with which weights to keep in f32 * convert-hf : add --outtype auto-f16 A reason for this to exist is for model quantizers who want an initial GGUF with the most fidelity to the original model while still using a 16-bit float type instead of 32-bit floats. * convert-hf : remove a semicolon because flake8 doesn't like it It's a reflex from when programming in C/C++, I guess. * convert-hf : support outtype templating in outfile name * convert-hf : rename --outtype auto-f16 to --outtype auto	2024-05-11 11:06:26 -04:00
HanishKVC	9d4450d51a	GroupKV: Let dump return a string, rather than printing/logging	2024-05-11 19:43:34 +05:30
HanishKVC	e999934e91	ChatON:WIP: initial go at GroupKV based flow, instead of json	2024-05-11 19:41:58 +05:30
HanishKVC	f294fddf43	GroupKV: Add group_exists checker	2024-05-11 19:18:19 +05:30
HanishKVC	dde72df9d3	GroupKV: Rename the internal map	2024-05-11 18:23:06 +05:30
Georgi Gerganov	fae9d234b6	sync : ggml ggml-ci	2024-05-11 15:38:34 +03:00
Justina Cho	f5ef34e428	feat: implemented sigmoid function (ggml/806) * added sigmoid function * implemented metal kernel for sigmoid * implemented cuda kernel for sigmoid * added sigmoid unary op and incremented count	2024-05-11 15:38:34 +03:00


3111 commits