Commit graph

3111 commits

Author SHA1 Message Date
HanishKVC
184ac322e3 ChatON: Make json_get efficient and flexible wrt its calling
Also explicitly indicate that we are looking at a chain of keys
2024-05-13 16:21:02 +05:30
Neo Zhang
948f4ec7c5
[SYCL] rm wait() (#7233) 2024-05-13 18:11:26 +08:00
Joan Fontanals
9aa672490c
llama : rename jina tokenizers to v2 (#7249)
* refactor: rename jina tokenizers to v2

* refactor: keep refactoring non-breaking
2024-05-13 11:35:14 +03:00
HanishKVC
eb7554ca3b ChatON: Avoid -> to match simpcfg as well as corresponding keys 2024-05-13 10:37:14 +05:30
Brian
b1f8af1886
convert.py: Outfile default name change and additional metadata support (#4858)
* convert.py: Outfile default name change and additional metadata support

* convert.py: don't stringify Metadata load method output

* convert.py: typo fix

* convert.py: fix metadata format to sync with LLM_KV_NAMES in llama.cpp
2024-05-13 12:56:47 +10:00
Benjamin Findley
e586ee4259
change default temperature of OAI compat API from 0 to 1 (#7226)
* change default temperature of OAI compat API from 0 to 1

* make tests explicitly send temperature to OAI API
2024-05-13 12:40:08 +10:00
Neo Zhang
cbf75894d2
[SYCL] Add oneapi runtime dll files to win release package (#7241)
* add oneapi running time dlls to release package

* fix path

* fix path

* fix path

* fix path

* fix path

---------

Co-authored-by: Zhang <jianyu.zhang@intel.com>
2024-05-13 08:04:29 +08:00
Neo Zhang
0d5cef78ae
[SYCL] update CI with oneapi 2024.1 (#7235)
Co-authored-by: Zhang <jianyu.zhang@intel.com>
2024-05-13 08:02:55 +08:00
HanishKVC
d5b0bfbaec SimpCfg: Remove now unused SC_DEBUG, rather GroupKV uses equiv
The code which was using SC_DEBUG moved to GroupKV and inturn
GKV_DEBUG
2024-05-13 00:33:36 +05:30
HanishKVC
857570f8f8 SimpCfgTest: Update dump usage to GKV return string semantic 2024-05-13 00:20:58 +05:30
HanishKVC
9249649fb3 ChatON+TestPrgs: Use specific log files 2024-05-12 23:59:48 +05:30
Johannes Gäßler
dc685be466
CUDA: add FP32 FlashAttention vector kernel (#7188)
* CUDA: add FP32 FlashAttention vector kernel

* fixup! CUDA: add FP32 FlashAttention vector kernel

* fixup! fixup! CUDA: add FP32 FlashAttention vector kernel

* fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
2024-05-12 19:40:45 +02:00
HanishKVC
3d33d62924 SimpCfg: Move testing code into its own file in tests
Also set functions to inline or static as appropriate
2024-05-12 22:53:48 +05:30
HanishKVC
f2dd1263fd GroupKV: Move test code into its own file in tests 2024-05-12 22:33:48 +05:30
HanishKVC
6048218383 SimpCFG: COnvert to GroupKV extended version
Reuse the code already moved into GroupKV

Add explicit get and set wrt int32_t, which was added after move
to GroupKV wrt basic MapOfMapOfVariant logic.
2024-05-12 21:58:59 +05:30
Georgi Gerganov
6f1b63606f
cmake : fix version cmp (#7227) 2024-05-12 18:30:23 +03:00
HanishKVC
db2ffabb18 ChatON: use templated json_get when loading bool key-value fields
With this now even loading chaton_meta.json file will generate
more informative exception, so that user can know which field
is missing, if any.
2024-05-12 18:26:58 +05:30
HanishKVC
470b8885f3 ChatON: Switch to templated json_get for str/bool/etal 2024-05-12 18:19:18 +05:30
HanishKVC
0249c07e6b ChatON:Switch to json_get_str to help identify missing keys better
The json library generates less informative exception message,
which doesnt help one identify which key is missing, so switch to
the new json_get_str helper added in the last commit. It generates
more informative exception message.
2024-05-12 17:44:13 +05:30
HanishKVC
4eae05a6b7 ChatON: json access helper which raises exception if key missing 2024-05-12 17:34:04 +05:30
HanishKVC
f94fed92d3 ChatON+MetaHpp: Had forgotten to conv reverse-prompt
Also has dump was using get_value calls with fallback to default,
so it wasnt identifying the missed field.

Have fixed both of those. Also reconverted meta json file.

Misc: interesting avesham and aattam
2024-05-12 16:20:28 +05:30
HanishKVC
4232ec1fb9 Main: Load json meta file only if specified
This should be ok, given that there is a version of the chat tmpl
meta data already included with the library.

So only if user wants to change the chat template info wrt a existing
model/template-standard or add a new one, then there is need to
pass a json file with info for that model/standard.
2024-05-12 14:53:37 +05:30
HanishKVC
a3285e8e25 ChatON:Include auto converted ChatONMeta.hpp chat template data
This should allow for using this generic chat templating code flow
along with the included chat template data, without needing to
load any json file at runtime.

However If user wants to change the already included chat template
data, or add new chat template standard/model related data, one can
explicitly load json file.

TODO: Need to cross check this flow once, but logically should work
2024-05-12 14:08:09 +05:30
HanishKVC
b8590e3e57 ChatON:P5:meta json to hpp: Add required c++ inc and global var
Also comment to indicate that the hpp file is auto converted from
the chaton_meta.json file
2024-05-12 14:06:24 +05:30
HanishKVC
b5b274a44b ChatON:P4:meta json to hpp: Insert kv bool
Rename kv helpers to match their semantic.
* whether working with string or bool value
* whether two keys or a single key

Add support for kv with bool value

inturn add the kv boolean pairs used in the chaton_meta.json file

Add the closing bracket
2024-05-12 13:33:54 +05:30
HanishKVC
7b5fb0a2fa ChatON:P3:meta json to hpp: Retain esc seqs and more kv pairs
Use repr to retain the escape sequences in the read string.
And parallely skip the single quote around strings wrt repr.

Bring in more k-v pairs wrt chaton_meta.json
2024-05-12 13:06:22 +05:30
HanishKVC
078e04d32b ChatON:P2:meta json to hpp conversion - add k-v pairs skeleton 2024-05-12 12:42:53 +05:30
HanishKVC
0c21a0084f ChatON:p1: meta json to hpp conversion - Initial skeleton
load the json file and put the template ids
2024-05-12 12:42:23 +05:30
slaren
b228aba91a
remove convert-lora-to-ggml.py (#7204) 2024-05-12 02:29:33 +02:00
HanishKVC
1574201f71 ChatON:LoadJSon:ChatTemplates: revPrompt, system-user flags
WIP:NOTE:

Initial go converting from json driven flow to ChatTemplatesGroupKV
related flow done. Needs to be tested.

A optional helper added to load ChatTemplates from a specified
json file.

Need to add a compile time initialized MapOfMapOfVariants wrt
the chat template details of models/standards already known
to the program. So that one can use the llama.cpp and this new
chat template logic, even without json dependency, if one doesnt
want to.
2024-05-12 01:45:19 +05:30
HanishKVC
444d2ccf9c ChatON:LoadJSON: ChatTemplates - global/system/user/assistant
Manually iterate the json object items using begin-end explicitly,
because the implicit iteration for loop related helpers for the
used json lib gives only the values and not a key-value pair.
2024-05-12 01:35:31 +05:30
HanishKVC
2efc09f2d0 ChatON: Unnecessarily indirect nlohmann json
code used for exploring/testing commited just for future reference
2024-05-12 00:42:17 +05:30
Georgi Gerganov
7bd4ffb780
metal : fix warnings (skipme) (#0) 2024-05-11 21:38:13 +03:00
Georgi Gerganov
1622ac023f
sync : ggml 2024-05-11 21:35:05 +03:00
Georgi Gerganov
6aeff24f8b
metal : fix indent (ggml/0) 2024-05-11 21:34:21 +03:00
Georgi Gerganov
325756d28d
ggml : resolve merge (ggml/0)
ggml-ci
2024-05-11 21:33:08 +03:00
HanishKVC
b9d9700de3 CMakeLists.txt: Compile C++ code for -std=c++20 2024-05-11 23:42:08 +05:30
HanishKVC
b944d04d08 ChatON: Add constructor for ChatTemplates which chains into GKV 2024-05-11 23:42:08 +05:30
HanishKVC
d9959b74e7 GroupKV: Get ready for use in llama.cpp ++
Avoid defining GKV_TEST_PRG, used for self testing, by default

Add it to common library
2024-05-11 23:40:03 +05:30
Josh Ramer
fed0108491
Scripting & documenting debugging one test without anything else in the loop. (#7096)
* A little documentation that shares my quick tips for working in the repository.

* Update startup-testing-debugging.md

* script that shows a menu of tests to pick from & run the debugger on

* debug-test.sh: Refactor CLI help message

* debug-test.sh: documentation update

* debug-test.sh: CLI Help output corrections

* debug-test.sh: minor doc fix

---------

authored-by: Josh Ramer <ubuntu@ip-172-31-32-53.ec2.internal>
Assisted-by: brian khuu <mofosyne@gmail.com>
2024-05-12 03:26:35 +10:00
HanishKVC
4a9a6ce256 ChatON: ChatONMetaDump switch to GKV/ChatTemplates based flow 2024-05-11 22:53:45 +05:30
Xuan Son Nguyen
72c177c1f6
fix system prompt handling (#7153) 2024-05-11 17:28:10 +02:00
HanishKVC
484c710eab GroupKV:Add GetValue which throws exception 2024-05-11 20:49:51 +05:30
compilade
5a419926b0
convert-hf : support bfloat16 conversion (#7158)
* convert-hf : support bfloat16 conversion

* gguf-py : flake8 fixes

* convert-hf : add missing space after comma

* convert-hf : get bit-exact same output as ./quantize

The quantization version was missing.

* convert-hf : don't round bf16 NANs

* convert-hf : save some memory with np.int16 intermediate bf16 weights

* convert-hf : more closely match llama.cpp with which weights to keep in f32

* convert-hf : add --outtype auto-f16

A reason for this to exist is for model quantizers who want an initial
GGUF with the most fidelity to the original model while still using
a 16-bit float type instead of 32-bit floats.

* convert-hf : remove a semicolon because flake8 doesn't like it

It's a reflex from when programming in C/C++, I guess.

* convert-hf : support outtype templating in outfile name

* convert-hf : rename --outtype auto-f16 to --outtype auto
2024-05-11 11:06:26 -04:00
HanishKVC
9d4450d51a GroupKV: Let dump return a string, rather than printing/logging 2024-05-11 19:43:34 +05:30
HanishKVC
e999934e91 ChatON:WIP: initial go at GroupKV based flow, instead of json 2024-05-11 19:41:58 +05:30
HanishKVC
f294fddf43 GroupKV: Add group_exists checker 2024-05-11 19:18:19 +05:30
HanishKVC
dde72df9d3 GroupKV: Rename the internal map 2024-05-11 18:23:06 +05:30
Georgi Gerganov
fae9d234b6 sync : ggml
ggml-ci
2024-05-11 15:38:34 +03:00
Justina Cho
f5ef34e428 feat: implemented sigmoid function (ggml/806)
* added sigmoid function

* implemented metal kernel for sigmoid

* implemented cuda kernel for sigmoid

* added sigmoid unary op and incremented count
2024-05-11 15:38:34 +03:00