Commit graph

3288 commits

HanishKVC
b8590e3e57 ChatON:P5:meta json to hpp: Add required c++ inc and global var
Also add a comment to indicate that the hpp file is auto-converted
from the chaton_meta.json file.
2024-05-12 14:06:24 +05:30
HanishKVC
b5b274a44b ChatON:P4:meta json to hpp: Insert kv bool
Rename the kv helpers to match their semantics:
* whether they work with a string or a bool value
* whether they take two keys or a single key

Add support for kv with a bool value.

In turn, add the kv boolean pairs used in the chaton_meta.json file.

Add the closing bracket.
2024-05-12 13:33:54 +05:30
HanishKVC
7b5fb0a2fa ChatON:P3:meta json to hpp: Retain esc seqs and more kv pairs
Use repr to retain the escape sequences in the read string, and in
the same pass strip the single quotes that repr places around strings.

Bring in more k-v pairs from chaton_meta.json.
2024-05-12 13:06:22 +05:30
HanishKVC
078e04d32b ChatON:P2:meta json to hpp conversion - add k-v pairs skeleton 2024-05-12 12:42:53 +05:30
HanishKVC
0c21a0084f ChatON:P1: meta json to hpp conversion - Initial skeleton
Load the json file and emit the template ids.
2024-05-12 12:42:23 +05:30
slaren
b228aba91a remove convert-lora-to-ggml.py (#7204) 2024-05-12 02:29:33 +02:00
HanishKVC
1574201f71 ChatON:LoadJSON:ChatTemplates: revPrompt, system-user flags
WIP:NOTE:

Initial pass converting from the json-driven flow to the
ChatTemplatesGroupKV-related flow is done. Needs to be tested.

An optional helper was added to load ChatTemplates from a specified
json file.

Still need to add a compile-time initialized MapOfMapOfVariants
covering the chat template details of models/standards already known
to the program, so that one can use llama.cpp and this new chat
template logic without the json dependency, if one doesn't want it.
(A sketch of such a structure follows this entry.)
2024-05-12 01:45:19 +05:30
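
A minimal sketch of what such a compile-time initialized
MapOfMapOfVariants could look like, assuming a nested std::map of
std::variant; the template names and keys shown are illustrative
assumptions, not the actual dataset:

    #include <cstdint>
    #include <map>
    #include <string>
    #include <variant>

    // Illustrative sketch: chat template details for models already known
    // to the program, available without any json dependency at runtime.
    using Variant = std::variant<bool, int32_t, int64_t, double, std::string>;
    using MapOfMapOfVariants = std::map<std::string, std::map<std::string, Variant>>;

    static const MapOfMapOfVariants gKnownChatTemplates = {
        { "chatml", {
            { "user-prefix", std::string("<|im_start|>user\n") }, // assumed keys
            { "user-suffix", std::string("<|im_end|>\n") },
            { "system-user-has-prefix", true },
        }},
    };
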
HanishKVC
444d2ccf9c ChatON:LoadJSON: ChatTemplates - global/system/user/assistant
Manually iterate the json object items using begin()/end() explicitly,
because the json library's implicit range-for iteration yields only
the values and not key-value pairs.
2024-05-12 01:35:31 +05:30
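
For reference, the explicit begin()/end() iteration pattern with
nlohmann::json, where the iterator exposes both key() and value(); a
small self-contained sketch:

    #include <iostream>
    #include <nlohmann/json.hpp>

    void dump_object(const nlohmann::json &obj) {
        // Explicit iterators give access to both the key and the value.
        for (auto it = obj.begin(); it != obj.end(); ++it) {
            std::cout << it.key() << " => " << it.value() << "\n";
        }
        // By contrast, a plain `for (auto &v : obj)` yields only the values.
    }
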
HanishKVC
2efc09f2d0 ChatON: Unnecessarily indirect nlohmann json
Code used for exploring/testing, committed just for future reference.
2024-05-12 00:42:17 +05:30
Georgi Gerganov
7bd4ffb780 metal : fix warnings (skipme) (#0) 2024-05-11 21:38:13 +03:00
Georgi Gerganov
1622ac023f sync : ggml 2024-05-11 21:35:05 +03:00
Georgi Gerganov
6aeff24f8b metal : fix indent (ggml/0) 2024-05-11 21:34:21 +03:00
Georgi Gerganov
325756d28d ggml : resolve merge (ggml/0)
ggml-ci
2024-05-11 21:33:08 +03:00
HanishKVC
b9d9700de3 CMakeLists.txt: Compile C++ code for -std=c++20 2024-05-11 23:42:08 +05:30
HanishKVC
b944d04d08 ChatON: Add constructor for ChatTemplates which chains into GKV 2024-05-11 23:42:08 +05:30
HanishKVC
d9959b74e7 GroupKV: Get ready for use in llama.cpp ++
Avoid defining GKV_TEST_PRG (used for self-testing) by default; a
sketch of the guard follows this entry.

Add GroupKV to the common library.
2024-05-11 23:40:03 +05:30
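
A sketch of the self-test guard implied here: the test driver only
exists when GKV_TEST_PRG is defined, so library builds never pull it
in (the GroupKV usage shown is a hypothetical API):

    // Built only via e.g. `g++ -DGKV_TEST_PRG -std=c++20 groupkv.cpp`
    #ifdef GKV_TEST_PRG
    int main() {
        GroupKV gkv({}); // hypothetical constructor
        // ... exercise the set/get helpers and dump the contents here ...
        return 0;
    }
    #endif
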
Josh Ramer
fed0108491 Scripting & documenting debugging one test without anything else in the loop. (#7096)
* A little documentation that shares my quick tips for working in the repository.

* Update startup-testing-debugging.md

* script that shows a menu of tests to pick from & run the debugger on

* debug-test.sh: Refactor CLI help message

* debug-test.sh: documentation update

* debug-test.sh: CLI Help output corrections

* debug-test.sh: minor doc fix

---------

Authored-by: Josh Ramer <ubuntu@ip-172-31-32-53.ec2.internal>
Assisted-by: brian khuu <mofosyne@gmail.com>
2024-05-12 03:26:35 +10:00
HanishKVC
4a9a6ce256 ChatON: ChatONMetaDump switch to GKV/ChatTemplates based flow 2024-05-11 22:53:45 +05:30
Xuan Son Nguyen
72c177c1f6 fix system prompt handling (#7153) 2024-05-11 17:28:10 +02:00
HanishKVC
484c710eab GroupKV:Add GetValue which throws exception 2024-05-11 20:49:51 +05:30
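
A minimal sketch of a GetValue-style throwing getter over a nested
variant map; the real method presumably lives on GroupKV, and the
names and store layout here are illustrative assumptions:

    #include <cstdint>
    #include <map>
    #include <stdexcept>
    #include <string>
    #include <variant>

    using Variant = std::variant<bool, int32_t, std::string>;
    using Store = std::map<std::string, std::map<std::string, Variant>>;

    // Throw if the group or key is missing, instead of silently
    // falling back to a caller-provided default.
    template <typename T>
    T get_value(const Store &store, const std::string &group, const std::string &key) {
        auto git = store.find(group);
        if (git == store.end()) {
            throw std::out_of_range("group not found: " + group);
        }
        auto kit = git->second.find(key);
        if (kit == git->second.end()) {
            throw std::out_of_range("key not found: " + group + "/" + key);
        }
        return std::get<T>(kit->second); // throws bad_variant_access on type mismatch
    }
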
compilade
5a419926b0 convert-hf : support bfloat16 conversion (#7158)
* convert-hf : support bfloat16 conversion

* gguf-py : flake8 fixes

* convert-hf : add missing space after comma

* convert-hf : get bit-exact same output as ./quantize

The quantization version was missing.

* convert-hf : don't round bf16 NANs

* convert-hf : save some memory with np.int16 intermediate bf16 weights

* convert-hf : more closely match llama.cpp with which weights to keep in f32

* convert-hf : add --outtype auto-f16

A reason for this to exist is for model quantizers who want an initial
GGUF with the most fidelity to the original model while still using
a 16-bit float type instead of 32-bit floats.

* convert-hf : remove a semicolon because flake8 doesn't like it

It's a reflex from when programming in C/C++, I guess.

* convert-hf : support outtype templating in outfile name

* convert-hf : rename --outtype auto-f16 to --outtype auto
2024-05-11 11:06:26 -04:00
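
For context, the core of an fp32-to-bf16 conversion with
round-to-nearest-even and NaN preservation; this is the standard
bit-level technique, not code lifted from the PR:

    #include <cstdint>
    #include <cstring>

    // Truncate NaNs instead of rounding them (rounding could carry into the
    // exponent and turn a NaN into infinity); round everything else to
    // nearest-even before dropping the low 16 bits.
    static uint16_t fp32_to_bf16(float f) {
        uint32_t bits;
        std::memcpy(&bits, &f, sizeof(bits));
        if ((bits & 0x7fffffff) > 0x7f800000) { // NaN
            return (uint16_t)(bits >> 16);
        }
        return (uint16_t)((bits + (0x7fff + ((bits >> 16) & 1))) >> 16);
    }
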
HanishKVC
9d4450d51a GroupKV: Let dump return a string, rather than printing/logging 2024-05-11 19:43:34 +05:30
HanishKVC
e999934e91 ChatON:WIP: initial go at GroupKV based flow, instead of json 2024-05-11 19:41:58 +05:30
HanishKVC
f294fddf43 GroupKV: Add group_exists checker 2024-05-11 19:18:19 +05:30
HanishKVC
dde72df9d3 GroupKV: Rename the internal map 2024-05-11 18:23:06 +05:30
Georgi Gerganov
fae9d234b6 sync : ggml
ggml-ci
2024-05-11 15:38:34 +03:00
Justina Cho
f5ef34e428 feat: implemented sigmoid function (ggml/806)
* added sigmoid function

* implemented metal kernel for sigmoid

* implemented cuda kernel for sigmoid

* added sigmoid unary op and incremented count
2024-05-11 15:38:34 +03:00
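
The op itself is the standard logistic sigmoid,
sigmoid(x) = 1 / (1 + e^-x); a scalar reference version of the kind of
unary op added here (the actual commit adds vectorized CPU, Metal and
CUDA kernels):

    #include <cmath>

    // Reference scalar implementation of the sigmoid unary op.
    static inline float sigmoidf(float x) {
        return 1.0f / (1.0f + expf(-x));
    }

    static void sigmoid_forward(const float *src, float *dst, int n) {
        for (int i = 0; i < n; ++i) {
            dst[i] = sigmoidf(src[i]);
        }
    }
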
Borislav Stanimirov
ef0d5e3ec9 build: fix and ignore msvc warnings (ggml/805) 2024-05-11 15:38:34 +03:00
CrispStrobe
3292733f95 convert : skip inaccessible HF repos (#7210) 2024-05-11 11:18:35 +03:00
Steve Grubb
988631335a server : free llama_batch on exit (#7212)
* [server] Cleanup a memory leak on exit

There are a couple of memory leaks on exit of the server, and this
one hides others. After cleaning this up, you can see leaks in the
slots, but that is another patch, to be sent after this one.

* make tab into spaces
2024-05-11 11:13:02 +03:00
Haoxiang Fei
f99e1e456e llama : lookup word in vocab before doing BPE merges (#7193)
* fix: llama-3 ignore_merges

* test: add test for llama-3 bpe ignore_merges

* fix: set ignore_merges only for llama-3

* fix: test-tokenizer-1-bpe --ignore-merges detection

* fix: copy to fix fallthrough

* fix: change ignore_merges to bool

* fix: add ignore merges tests to cmake

* llama : alternative merge ignore logic

---------

Co-authored-by: Haoxiang Fei <feihaoxiang@idea.edu.cn>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-11 11:12:06 +03:00
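
The idea behind ignore_merges, per the PR title: check whether the
whole word is already a single vocabulary token before running the BPE
merge loop. A hedged sketch with placeholder data structures, not
llama.cpp's actual tokenizer types:

    #include <string>
    #include <unordered_map>
    #include <vector>

    // Assumed helper: the normal BPE merge-based tokenization, defined elsewhere.
    std::vector<int> bpe_merge_tokenize(const std::string &word,
                                        const std::unordered_map<std::string, int> &vocab);

    std::vector<int> tokenize_word(const std::string &word,
                                   const std::unordered_map<std::string, int> &vocab,
                                   bool ignore_merges) {
        if (ignore_merges) {
            // llama-3 style: emit the whole-word token directly when it exists.
            auto it = vocab.find(word);
            if (it != vocab.end()) {
                return { it->second };
            }
        }
        return bpe_merge_tokenize(word, vocab);
    }
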
Johannes Gäßler
5ae3426b0b server: fix reported top tokens for temperature 0 (#7203) 2024-05-11 10:11:28 +02:00
HanishKVC
fdefb39518 GroupKV:Make LDBUG macros conditional, avoid condition at usage site
Also change LWARN to LDBUG for the code that was previously
conditional on GKV_DEBUG.
2024-05-11 13:30:56 +05:30
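
A sketch of the pattern: make the debug macro conditional at its
definition, so call sites need no #ifdef of their own. The macro names
match the commit message; the exact expansion is an assumption:

    #include <cstdio>

    // When GKV_DEBUG is undefined, LDBUG compiles away entirely.
    // ##__VA_ARGS__ (a widely supported extension) also allows calls
    // with an empty argument list.
    #ifdef GKV_DEBUG
    #define LDBUG(fmt, ...) fprintf(stderr, "DBUG:" fmt "\n", ##__VA_ARGS__)
    #else
    #define LDBUG(fmt, ...) do {} while (0)
    #endif
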
Joan Fontanals
b83cc3f5b3 llama : add Jina Embeddings architecture (#6826)
* feat: first things to do

* feat: create tensors for Jina architecture

* fix: use other tensors

* feat: embedding gets results

* fix: fix usage of ALIBI

* fix: clean prints

* fix: do some cleanup unused vars

* fix: revert changes to Makefile and CMakeLists

* fix: revert some changes

* fix: fix small detail

* fix: fix convert formatting

* fix: fix linting and editor

* feat: set proper vocab settings

* fix: JinaBertForMaskedLM registration

* feat: support q_normalization and k_normalization in Jina arch

* feat: handle gpt2 tokenizer with Jina architecture

* feat: example comments in embedding

* feat: rename Jina Bert to Jina Bert V2

* fix: add some changes as per review

* feat: proper KQ_pos for Jina embeddings

* feat: add capacity to load models ES and DE for Spanish

* llama : fix pre-tokenizers

* ggml : full ALiBi support

* ggml : update ggml_soft_max_ext() CUDA, SYCL

* ggml : ggml_flash_attn_ext() support ALiBi (CPU)

* ggml : ggml_flash_attn_ext() support ALiBi (Metal)

* ggml : fix warning

* ggml : ggml_flash_attn_ext() support ALiBi (CUDA)

ggml-ci

* minor : clean-up

* embedding : add warning about missing SEP

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-11 10:46:09 +03:00
Georgi Gerganov
9cb317f77e ggml : full ALiBi support (#7192)
* ggml : full ALiBi support

* ggml : update ggml_soft_max_ext() CUDA, SYCL

* ggml : ggml_flash_attn_ext() support ALiBi (CPU)

* ggml : ggml_flash_attn_ext() support ALiBi (Metal)

* ggml : fix warning

* ggml : ggml_flash_attn_ext() support ALiBi (CUDA)

ggml-ci

* ggml : fix assert message

* vulkan : add dev notes

* ggml : require mask when using ALiBi

ggml-ci

* convert : fix convert for refact models
2024-05-11 10:32:41 +03:00
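
For reference, ALiBi replaces positional embeddings with a fixed,
head-specific linear bias on the attention logits: the score between
query position i and key position j gains m_h * (j - i), where
m_h = 2^(-8h/H) for heads h = 1..H (H a power of two). A scalar sketch
of applying the bias before the softmax; shapes and names here are
illustrative:

    #include <cmath>

    // Add the ALiBi bias for one head (0-indexed) to an [n_q x n_k]
    // row-major matrix of raw QK^T logits, before the softmax.
    void alibi_bias(float *scores, int n_q, int n_k, int head, int n_head) {
        const float slope = powf(2.0f, -8.0f * (float)(head + 1) / (float)n_head);
        for (int i = 0; i < n_q; ++i) {
            for (int j = 0; j < n_k; ++j) {
                scores[i * n_k + j] += slope * (float)(j - i); // more negative for farther-back keys
            }
        }
    }
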
HanishKVC
7f03dd0d4b GroupKV: Add int32_t to variant list, to simplify int use
So that there is no need to explicitly specify <int64_t> or an LL
suffix for int literals, which don't need 64-bit storage by default.

This also means one shouldn't/can't mix up the type of the stored
value and the default type specified when getting it. (A small
illustration follows this entry.)
2024-05-11 12:45:58 +05:30
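
A small illustration of the literal-type point: a plain `5` has type
int, so with int32_t in the alternative list it converts into the
variant without any cast or suffix. The variant composition here is
assumed from the commit messages:

    #include <cstdint>
    #include <string>
    #include <variant>

    using Variant = std::variant<bool, int32_t, int64_t, double, std::string>;

    int main() {
        Variant a = 5;   // plain int literal: picks int32_t, no <int64_t>/LL needed
        Variant b = 5LL; // 64-bit storage still available when asked for explicitly
        // Caveat from the commit: the stored alternative must match the type
        // used when getting; std::get<int64_t>(a) here would throw.
        return std::get<int32_t>(a) == 5 ? 0 : 1;
    }
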
HanishKVC
0342124946 GroupKV: Add to_str wrt vectors, help avoid compiler confusion 2024-05-11 12:27:42 +05:30
HanishKVC
7d7c59ec50 GroupKV:Simplify:P2: Rename tags, Make debug logs conditional
Rename all the log messages to use the GKV tag rather than SC.

The log messages in get_vector are now conditional on GKV_DEBUG;
this was missed earlier in simpcfg itself.
2024-05-11 11:57:27 +05:30
HanishKVC
d764a9d395 GroupKV: Simplify code to the minimal needed for GroupKV - P1 2024-05-11 11:37:06 +05:30
HanishKVC
86b842b172 GroupKV: Duplicate SimpCfg to chop down into GroupKV
I.e. a minimal MapOfMapOfVariant with some basic helpers.

This can be the basis of a ChatTemplates object, as well as
SimpCfg built on top of it.
2024-05-11 10:57:32 +05:30
HanishKVC
c0506f94bf SimpCfg: Allow for direct initialization lists based init
This should pave the way for having a default chat templates dataset
in the code, without needing to load it from a config file, if one
doesn't want to.

TODO: Allow loading config from json into simpcfg, so that a program
which uses llama.cpp can decide whether it is OK with what is already
in the internal dataset, allow loading template info at runtime using
simpcfg's simple text file, or additionally include the json code to
load template info at runtime from a json file.
2024-05-11 00:33:31 +05:30
HanishKVC
fe27902964 SimpCfg: Avoid iostream/cout and format for direct library use
It appears that std::format is not yet supported by the older
g++/libstdc++ still in wide use (e.g. current Debian stable), so
avoid it for direct library use.

Allow for empty VA_ARGS (variadic macro arguments).

NOTE: However, the test-program mode of the same still uses cout and
format.
2024-05-10 22:27:07 +05:30
slaren
e849648888 llama-bench : add pp+tg test type (#7199) 2024-05-10 18:03:54 +02:00
HanishKVC
1f9a0eb8ce ChatON: Remove unneeded iostream 2024-05-10 21:10:44 +05:30
Georgi Gerganov
18e437665c metal : fix flash attention kernel requirements (#7169)
* metal : fix flash attention kernel requirements

ggml-ci

* metal : fix ggml_metal_supports_op

ggml-ci
2024-05-10 18:20:10 +03:00
Georgi Gerganov
8c660242d7 convert : print "ignore_merges" field 2024-05-10 17:53:04 +03:00
slaren
25c6e82e7a llama : use n_vocab to differentiate between mistral 7B and llama3 8B (#7200) 2024-05-10 14:28:01 +02:00
Justine Tunney
4e3880978f Fix memory bug in grammar parser (#7194)
The llama.cpp grammar parser had a bug where forgetting to add a closing
quotation mark to strings would cause parsing to crash. Anyone running a
server on a public endpoint is advised to upgrade. To reproduce this bug

    ./llamafile -m foo.gguf -p bar --grammar 'root::="'

Credit for discovering and reporting this issue goes to Eclypsium
Security Researcher Richard Johnson <Richard.johnson@eclypsium.com>.
2024-05-10 21:01:08 +10:00
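
The underlying bug class is a closing-quote scan that never checks for
end of input and so reads past the buffer. A generic sketch of the
defensive pattern, not the actual llama.cpp parser code:

    #include <stdexcept>
    #include <string>

    // Return the index just past the closing '"', or report an
    // unterminated string instead of scanning past the end.
    static size_t parse_quoted(const std::string &src, size_t pos) {
        ++pos; // skip the opening '"'
        while (pos < src.size() && src[pos] != '"') {
            pos += (src[pos] == '\\') ? 2 : 1; // skip escaped characters
        }
        if (pos >= src.size()) {
            throw std::runtime_error("unterminated string in grammar");
        }
        return pos + 1;
    }
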
HanishKVC
f89fe2732c Main+: optionally allow special tokens from user in interactive mode (#7097)
@hanishkvc added a new `--interactive-specials` flag which allows inserting special tokens from the user side into the embedding stream.
2024-05-10 20:21:58 +10:00
HanishKVC
abb406b888 Merge branch 'master' into hkvc_chaton_v3
Merged the master branch as of 20240510IST12XY into this chaton_v3
branch.

As part of this, had to update the flow in examples/main/main.cpp
to reconcile the conversation-related commit in the master branch
with my chaton-related commits in this branch.
2024-05-10 13:14:26 +05:30