Commit graph

  • db2ffabb18 ChatON: use templated json_get when loading bool key-value fields HanishKVC 2024-05-12 18:24:02 +05:30
  • 470b8885f3 ChatON: Switch to templated json_get for str/bool/etal HanishKVC 2024-05-12 18:19:18 +05:30
  • 0249c07e6b ChatON:Switch to json_get_str to help identify missing keys better HanishKVC 2024-05-12 17:44:13 +05:30
  • 4eae05a6b7 ChatON: json access helper which raises exception if key missing HanishKVC 2024-05-12 17:08:03 +05:30
  • 6cf75b2711 fix path Zhang 2024-05-12 19:55:15 +08:00
  • f94fed92d3 ChatON+MetaHpp: Had forgotten to conv reverse-prompt HanishKVC 2024-05-12 15:59:37 +05:30
  • f547c4f54a Update server.cpp Maximilian Winter 2024-05-12 12:06:36 +02:00
  • 0add3107f7 spacing changes. Julia Longtin 2024-05-12 09:36:08 +00:00
  • 4232ec1fb9 Main: Load json meta file only if specified HanishKVC 2024-05-12 14:53:37 +05:30
  • f0d7be409d remove unrelevant debug code Achazwl 2024-05-12 16:40:51 +08:00
  • a3285e8e25 ChatON:Include auto converted ChatONMeta.hpp chat template data HanishKVC 2024-05-12 13:56:02 +05:30
  • b8590e3e57 ChatON:P5:meta json to hpp: Add required c++ inc and global var HanishKVC 2024-05-12 13:50:22 +05:30
  • d2ca97bc89 fix path Zhang 2024-05-12 16:17:32 +08:00
  • a7a5f42c07 fix path Zhang 2024-05-12 16:13:13 +08:00
  • 9b28d860a5 fix path Zhang 2024-05-12 16:08:53 +08:00
  • 22e9d24a1c fix path Zhang 2024-05-12 16:04:26 +08:00
  • b5b274a44b ChatON:P4:meta json to hpp: Insert kv bool HanishKVC 2024-05-12 13:21:17 +05:30
  • 6a3c7fe022 add oneapi running time dlls to release package Zhang 2024-05-12 15:41:24 +08:00
  • 7b5fb0a2fa ChatON:P3:meta json to hpp: Retain esc seqs and more kv pairs HanishKVC 2024-05-12 13:06:22 +05:30
  • 078e04d32b ChatON:P2:meta json to hpp conversion - add k-v pairs skeleton HanishKVC 2024-05-12 12:40:33 +05:30
  • 0c21a0084f ChatON:p1: meta json to hpp conversion - Initial skeleton HanishKVC 2024-05-12 12:29:16 +05:30
  • 9c5d3fcffb Update and fix Vulkan argsort implementation 0cc4m 2024-05-12 08:53:17 +02:00
  • 720f132280 Update and fix Vulkan softmax implementation 0cc4m 2024-05-12 08:52:54 +02:00
  • 69a0609d81 update CI with oneapi 2024.1 Zhang 2024-05-12 12:55:56 +08:00
  • 2b1e5ea37b convert-hf: add missing ftype Francis Couture-Harpin 2024-05-11 23:18:30 -04:00
  • 6c8fdb8e5a adding the components of half2 seems to be compiled faster Engininja2 2024-05-11 21:07:59 -06:00
  • b228aba91a remove convert-lora-to-ggml.py (#7204) b2860 slaren 2024-05-12 02:29:33 +02:00
  • 540d9b5970 make tests explicitly send temperature to OAI API Benjamin Findley 2024-05-11 17:25:58 -07:00
  • 62d786b4f3 remove comment slaren 2024-05-12 02:00:16 +02:00
  • 19a88d4640 Fixed the issue where some terminals wouldn't show the message colors Behnam Moh 2024-05-11 19:52:30 -04:00
  • 3bc8f2b1d5 Better ccache guide in Makefile Behnam Moh 2024-05-11 19:45:20 -04:00
  • 472a9b8be5 Merge branch 'ggerganov:master' into master Behnam Moh 2024-05-11 19:38:54 -04:00
  • 5c5911d69d Remove custom enum, rename left recursion check and move to "grammar internal" section, add handling for edge case where a leftmost nonterminal may be empty Haggai Nuchi 2024-05-11 16:28:59 -07:00
  • 65176e78ec Add left recursion check: quit early instead of going into an infinite loop Haggai Nuchi 2024-05-04 22:43:08 -07:00
  • a20edbf300 do 2 rounds of 4, instead of 4 rounds of 2. and properly offset unalligned reads across a 64 byte boundary. Julia Longtin 2024-05-11 20:28:47 +00:00
  • 1574201f71 ChatON:LoadJSon:ChatTemplates: revPrompt, system-user flags HanishKVC 2024-05-12 01:36:03 +05:30
  • 444d2ccf9c ChatON:LoadJSON: ChatTemplates - global/system/user/assistant HanishKVC 2024-05-12 01:15:04 +05:30
  • b23ab86eda make offset available in a register. Julia Longtin 2024-05-11 19:57:45 +00:00
  • 1072686dcf load from identical addresses for low and high side. Julia Longtin 2024-05-11 19:48:53 +00:00
  • 3449b0f359 minor comment fixes. Julia Longtin 2024-05-11 19:47:20 +00:00
  • efdb4116d1 make the offset of q4 available. Julia Longtin 2024-05-11 19:39:53 +00:00
  • 9e6f2e2aff cuda : add amd dpp version of warp_reduce_sum for half2 Engininja2 2024-05-11 13:38:34 -06:00
  • 9550ca516f add missing vector. Julia Longtin 2024-05-11 19:29:09 +00:00
  • 653a565a02 fill and increment r12 and r13. Julia Longtin 2024-05-11 19:24:11 +00:00
  • a880154f03 change default temperature of OAI compat API from 0 to 1 Benjamin Findley 2024-05-11 12:16:07 -07:00
  • 2efc09f2d0 ChatON: Unnecessarily indirect nlohmann json HanishKVC 2024-05-12 00:41:15 +05:30
  • 7fa2d73b0a relabel some other labels. Julia Longtin 2024-05-11 19:02:48 +00:00
  • 7bd4ffb780 metal : fix warnings (skipme) (#0) b2859 Georgi Gerganov 2024-05-11 21:36:20 +03:00
  • 1622ac023f sync : ggml Georgi Gerganov 2024-05-11 21:35:05 +03:00
  • 6aeff24f8b metal : fix indent (ggml/0) Georgi Gerganov 2024-05-11 16:57:53 +03:00
  • 325756d28d ggml : resolve merge (ggml/0) Georgi Gerganov 2024-05-11 16:25:50 +03:00
  • 7a3f7e94ba cuda : use amd wave sharing intrinsics for warp_reduce functions Engininja2 2024-04-04 15:09:03 -06:00
  • b9d9700de3 CMakeLists.txt: Compile C++ code for -std=c++20 HanishKVC 2024-05-11 23:36:02 +05:30
  • b944d04d08 ChatON: Add constructor for ChatTemplates which chains into GKV HanishKVC 2024-05-11 23:31:53 +05:30
  • d9959b74e7 GroupKV: Get ready for use in llama.cpp ++ HanishKVC 2024-05-11 23:30:12 +05:30
  • 047defea41 rename some labels. Julia Longtin 2024-05-11 17:56:10 +00:00
  • fed0108491 Scripting & documenting debugging one test without anything else in the loop. (#7096) Josh Ramer 2024-05-11 12:26:35 -05:00
  • 4a9a6ce256 ChatON: ChatONMetaDump switch to GKV/ChatTemplates based flow HanishKVC 2024-05-11 22:47:27 +05:30
  • 72c177c1f6 fix system prompt handling (#7153) b2854 Xuan Son Nguyen 2024-05-11 17:28:10 +02:00
  • 484c710eab GroupKV:Add GetValue which throws exception HanishKVC 2024-05-11 20:46:10 +05:30
  • 4b2c00f227 debug-test.sh: minor doc fix brian khuu 2024-05-12 01:17:28 +10:00
  • 3d8ed608e9 debug-test.sh: CLI Help output corrections brian khuu 2024-05-12 01:15:38 +10:00
  • d7e199e444 convert-hf : support q8_0 conversion Francis Couture-Harpin 2024-05-10 14:47:28 -04:00
  • 5a419926b0 convert-hf : support bfloat16 conversion (#7158) compilade 2024-05-11 11:06:26 -04:00
  • 7227161611 debug-test.sh: documentation update brian khuu 2024-05-12 01:04:41 +10:00
  • 132e5fc201 debug-test.sh: Refactor CLI help message brian khuu 2024-05-12 01:04:08 +10:00
  • a1d0da669d rename label 1 to 3. Julia Longtin 2024-05-11 14:24:30 +00:00
  • 9d4450d51a GroupKV: Let dump return a string, rather than printing/logging HanishKVC 2024-05-11 19:43:34 +05:30
  • e999934e91 ChatON:WIP: initial go at GroupKV based flow, instead of json HanishKVC 2024-05-11 19:18:50 +05:30
  • 0a0bb9b7db introduce r10 and r11, for vloadunpackhd. Julia Longtin 2024-05-11 14:02:36 +00:00
  • f294fddf43 GroupKV: Add group_exists checker HanishKVC 2024-05-11 19:18:19 +05:30
  • 9d7f967e88 spacing changes. Julia Longtin 2024-05-11 13:35:50 +00:00
  • aa9cbd7660 fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel Johannes Gäßler 2024-05-11 15:34:09 +02:00
  • f3c3eafa6e fixup! fixup! CUDA: add FP32 FlashAttention vector kernel Johannes Gäßler 2024-05-11 15:32:46 +02:00
  • 6c4e687b85 spacing changes. Julia Longtin 2024-05-11 13:26:00 +00:00
  • 41f5f3a4e4 fixup! CUDA: add FP32 FlashAttention vector kernel Johannes Gäßler 2024-05-11 15:12:36 +02:00
  • 595e15f35b allow test number arg & specify build output flag Ubuntu 2024-05-11 13:07:58 +00:00
  • b34575b1f3 add missing jump. Julia Longtin 2024-05-11 12:53:23 +00:00
  • dde72df9d3 GroupKV: Rename the internal map HanishKVC 2024-05-11 18:23:06 +05:30
  • bbeb952aca CUDA: add FP32 FlashAttention vector kernel Johannes Gäßler 2024-05-09 16:20:45 +02:00
  • fae9d234b6 sync : ggml b2852 Georgi Gerganov 2024-05-11 12:02:39 +03:00
  • f5ef34e428 feat: implemented sigmoid function (ggml/806) Justina Cho 2024-05-01 14:44:26 -07:00
  • ef0d5e3ec9 build: fix and ignore msvc warnings (ggml/805) Borislav Stanimirov 2024-04-25 17:24:07 +03:00
  • fa0226c8df look at the right final memory location. Julia Longtin 2024-05-11 11:27:52 +00:00
  • fba57c125c subtract the correct amount. Julia Longtin 2024-05-11 11:11:15 +00:00
  • 3156e639bf change from handling three iterations per loop to four. Julia Longtin 2024-05-11 11:07:16 +00:00
  • 96b89063e5 Merge 160d0f0a8b into 3292733f95 Julia Longtin 2024-05-11 18:50:44 +08:00
  • 3269efe70d Merge branch 'master' of https://github.com/JoanFM/llama.cpp into feat-jina-embeddings-v2-zh Joan Martinez 2024-05-11 11:50:26 +02:00
  • 6ee7c9886b sync : ggml Georgi Gerganov 2024-05-11 12:02:39 +03:00
  • 97e688e856 feat: implemented sigmoid function (ggml/806) Justina Cho 2024-05-01 14:44:26 -07:00
  • d6a4e035ad build: fix and ignore msvc warnings (ggml/805) Borislav Stanimirov 2024-04-25 17:24:07 +03:00
  • 3292733f95 convert : skip unaccessible HF repos (#7210) CrispStrobe 2024-05-11 10:18:35 +02:00
  • 988631335a server : free llama_batch on exit (#7212) b2848 Steve Grubb 2024-05-11 04:13:02 -04:00
  • f99e1e456e llama : lookup word in vocab before doing BPE merges (#7193) b2847 Haoxiang Fei 2024-05-11 16:12:06 +08:00
  • 5ae3426b0b server: fix reported top tokens for temperature 0 (#7203) b2846 Johannes Gäßler 2024-05-11 10:11:28 +02:00
  • b8d3cd5337 llama : alternative merge ignore logic Georgi Gerganov 2024-05-11 11:10:23 +03:00
  • fdefb39518 GroupKV:Make LDBUG macros conditional, avoid condition at usage site HanishKVC 2024-05-11 13:21:41 +05:30
  • b83cc3f5b3 llama : add Jina Embeddings architecture (#6826) b2845 Joan Fontanals 2024-05-11 09:46:09 +02:00
  • 49b3dbbe08 embedding : add warning about missing SEP Georgi Gerganov 2024-05-11 10:44:45 +03:00
  • 23499b81d9 Merge branch 'master' into HEAD Georgi Gerganov 2024-05-11 10:35:17 +03:00