Commit graph

  • aac1e664a7 Merge branch 'ggerganov:master' into jukofyork-command_r-control-vector-fix jukofyork 2024-06-22 17:16:13 +01:00
  • 3e58b0ee35 cvector: fix CI + correct help message (#8064) b3202 Xuan Son Nguyen 2024-06-22 18:11:30 +02:00
  • f57ebf281e Fixed all models' control vectors jukofyork 2024-06-22 17:10:08 +01:00
  • 2c42c51ad5 Merge branch 'ggerganov:master' into jukofyork-command_r-control-vector-fix jukofyork 2024-06-22 17:09:13 +01:00
  • 69a4755f74 server ci : disable thread sanitizer slaren 2024-06-22 17:38:30 +02:00
  • 5d7695e03f test-backend-ops : increase cpy max nmse slaren 2024-06-22 17:38:11 +02:00
  • cc6eec0c08 also correct --pca-iter ngxson 2024-06-22 17:26:02 +02:00
  • bc8cb23993 cvector: fix CI + correct help message ngxson 2024-06-22 17:23:17 +02:00
  • adf480c3ab cvector-generator: Moe Moe Fixie-Fixie for Lots of Formats~! ♡(ᐢ ᴥ ᐢ)♡ (#8052) b3201 HatsuneMikuUwU33 2024-06-22 17:19:37 +02:00
  • 3aa184a8c7 convert-hf : change assert to exception (#8015) 0xspringtime 2024-06-22 09:37:41 -04:00
  • 5b48cd53a8 Update llama-quantize ppl/file size output from LLaMA-v1 to Llama-3 values (#8058) b3199 ddh0 2024-06-22 07:16:10 -06:00
  • af2e033051 Fixes qwen2 too jukofyork 2024-06-22 11:07:17 +01:00
  • 8e092616a9 fixes #7999 jukofyork 2024-06-22 10:39:26 +01:00
  • 31388960d2 Merge 68e7c2579a into c5a8d4b749 kalomaze 2024-06-22 15:40:50 +08:00
  • c5a8d4b749 JSON Schema to GBNF integration tests (#7790) Clint Herron 2024-06-21 23:18:36 -04:00
  • 5d15dc832e Fixing grammar indentation to be consistent throughout file. Clint Herron 2024-06-21 23:00:31 -04:00
  • 3bdf7c3021 Fixing nits from ochafik. Removing escape slashes, adding additional failing cases, fixing some other strings. Clint Herron 2024-06-21 22:43:57 -04:00
  • 939b58ae6b Adding #define to temporarily remove failing tests so that this PR can pass CI, but still be useful for other PRs that want to leverage the framework. HanClinto 2024-06-12 10:47:01 -07:00
  • d4a63b0538 Merging improved schema test methods added by @ochafik in #7797 HanClinto 2024-06-12 10:42:37 -07:00
  • acd3c468af Uncommenting formerly commented tests so that they fail for others who are attempting to reproduce the bugs. Clint Herron 2024-06-05 22:41:03 -07:00
  • 74985def80 Adding additional examples as documented in #7789 . Also adding the ability to automatically output improperly failing grammars to debug output files so they can more easily be examined in the gbnf-validator program. Clint Herron 2024-06-05 22:29:25 -07:00
  • 2b174dd9c5 Adding simple bare-bones test for end-to-end integration test for json validation against auto-generated JSON-schema grammars. Clint Herron 2024-06-05 19:28:13 -07:00
  • c51b0e5e85 Update llama-quantize ppl/file size output from LLaMA-v1 to Llama-3 values ddh0 2024-06-21 14:58:56 -05:00
  • 5c4ba81933 Remove comment ltoniazzi 2024-06-21 18:00:12 +01:00
  • 65a9c9bac6 re-enabled mul_mat_batched_sycl path for batched Q*K & KQ*V OuadiElfarouki 2024-06-21 17:50:28 +01:00
  • 26df64ad04 Fix passing param ltoniazzi 2024-06-21 17:28:14 +01:00
  • 12112bfa48 Add basic cpu setup ltoniazzi 2024-06-21 16:44:33 +01:00
  • 1f6e1b0086 Merge branch 'ggerganov:master' into sgemm_iq4_nl Eve 2024-06-21 15:27:30 +00:00
  • b452e826cb Add tokenizer flag: clean_up_tokenization_spaces jaime-m-p 2024-06-21 16:12:26 +02:00
  • da4f6617c6 gguf-py, convert-hf : add model conversion support for T5ForConditionalGeneration and T5WithLMHeadModel Stanisław Szymczyk 2024-06-21 15:22:15 +02:00
  • e07ef1c440 Update cvector-generator.cpp HatsuneMikuUwU33 2024-06-21 11:23:27 +02:00
  • 6896afb23a Update cvector-generator.cpp HatsuneMikuUwU33 2024-06-21 11:10:16 +02:00
  • fe8205ba1c Update positive.txt HatsuneMikuUwU33 2024-06-21 11:09:38 +02:00
  • edcfc81533 Update negative.txt HatsuneMikuUwU33 2024-06-21 11:09:22 +02:00
  • 557b653dc9 vulkan: detect multiple devices by deviceUUID instead of deviceID (#8022) b3197 k.h.lai 2024-06-21 16:28:20 +08:00
  • 0520d88edf Merge branch 'ggerganov:master' into bitnet Eddie-Wang 2024-06-21 16:19:59 +08:00
  • 55a57a5063 reuse llm_build_kv Eddie-Wang1120 2024-06-21 16:12:48 +08:00
  • 4b65b648ce add preprocess to chatglm3 and chatglm4 toyer 2024-06-21 07:47:51 +00:00
  • 733cb122a6 vulkan: fix id query Adriankhl 2024-06-21 15:43:59 +08:00
  • 7d5e8777ae ggml : AVX IQ quants (#7845) b3196 Eve 2024-06-21 05:57:36 +00:00
  • a927b0f3dd llama : optimize long word tokenization with WPM (#8034) b3195 Georgi Gerganov 2024-06-21 08:51:28 +03:00
  • 80ea089d77 llama : allow pooled embeddings on any model (#7477) b3194 Douglas Hanley 2024-06-21 00:38:22 -05:00
  • 0e64591e82 swiftui : enable stream updating (#7754) b3193 Shuichi Tsutsumi 2024-06-21 14:30:58 +09:00
  • ffd430babc fix ci netrunnereve 2024-06-21 00:33:54 -04:00
  • ff0aa3abd1 fix part of mul_mat_id sycl-mul-mat-id Meng, Hengyu 2024-06-21 03:38:00 +00:00
  • ced082c9dd Update sgemm.cpp Eve 2024-06-21 03:30:16 +00:00
  • b54877c164 oops netrunnereve 2024-06-20 23:04:20 -04:00
  • c848b7135a Merge branch 'avx_iq' into sgemm_iq4_nl netrunnereve 2024-06-20 22:59:53 -04:00
  • 6559208845 iq4_nl sgemm netrunnereve 2024-06-20 22:59:06 -04:00
  • a055767ad8 Merge branch 'ggerganov:master' into avx_iq Eve 2024-06-21 02:49:18 +00:00
  • 626be75e73 Merge f5a3bbbdda into b1ef562bc1 S David 2024-06-20 18:09:38 -04:00
  • 6d233bc132 Remove previous space jaime-m-p 2024-06-21 00:01:31 +02:00
  • 0cc6593f10 Remove previous space jaime-m-p 2024-06-21 00:00:35 +02:00
  • 503b7531c7 Fix add_space_prefix, set false by default jaime-m-p 2024-06-20 22:48:24 +02:00
  • b1ef562bc1 requirements : Bump torch and numpy for python3.12 (#8041) Hamdoud Hakem 2024-06-20 21:01:15 +01:00
  • 17b291a6a5 convert-hf : Fix the encoding in the convert-hf-to-gguf-update.py (#8040) Hamdoud Hakem 2024-06-20 20:59:59 +01:00
  • 064b35eaff Update bruteforce random tests jaime-m-p 2024-06-20 21:41:37 +02:00
  • 88be57338f enable curl in nix build Michael Francis 2024-06-20 15:30:44 -04:00
  • 4198bb6b57 Merge b639e2a73f into abd894ad96 John 2024-06-20 18:36:40 +00:00
  • 071bf42f23 Clean old known problematic codepoints jaime-m-p 2024-06-20 19:25:32 +02:00
  • 03dbcc89f6 minor: confusing hexadecimal codepoint jaime-m-p 2024-06-20 19:20:37 +02:00
  • 16a7503dcc Fix tokenizer tests jaime-m-p 2024-06-20 19:18:23 +02:00
  • 40a66606a8 Using llama_tokenize() in tests jaime-m-p 2024-06-20 19:14:02 +02:00
  • 5d8fbacb08 Fixed packages versions Hamdoud Hakem 2024-06-20 18:12:36 +01:00
  • 839875f812 Fix the encoding in the convert-hf-to-gguf-update.py Hamdoud Hakem 2024-06-20 17:56:08 +01:00
  • d779bab49c Using llama_tokenize() in tests jaime-m-p 2024-06-20 18:20:16 +02:00
  • 7925db09e3 Merge 11dbcf02ae into abd894ad96 Xuan Son Nguyen 2024-06-20 16:07:02 +00:00
  • eea8dfab6b Add llama_detokenize() jaime-m-p 2024-06-20 17:51:16 +02:00
  • c6ddfa7e37 fix whitespace Eddie-Wang 2024-06-20 22:41:29 +08:00
  • abd894ad96 common: fix warning (#8036) b3190 Johannes Gäßler 2024-06-20 16:40:13 +02:00
  • abcdc5033a Merge branch 'ggerganov:master' into bitnet Eddie-Wang 2024-06-20 22:33:53 +08:00
  • 8d7034fda9 Update common/common.cpp Johannes Gäßler 2024-06-20 16:23:41 +02:00
  • c4a14e8150 common: fix warning Johannes Gäßler 2024-06-20 15:49:24 +02:00
  • 2f290b5cea vulkan: remove unneeded variables Adriankhl 2024-06-20 21:44:23 +08:00
  • de391e4c80 [SYCL] Fix windows build and inference (#8003) b3189 luoyu-intel 2024-06-20 13:19:05 +00:00
  • d50f8897a7 CUDA: stream-k decomposition for MMQ (#8018) b3188 Johannes Gäßler 2024-06-20 14:39:21 +02:00
  • a58cf0d61f remove q22_grid Eddie-Wang1120 2024-06-20 20:08:10 +08:00
  • 2b097682e0 remove q2_2 Eddie-Wang1120 2024-06-20 20:06:13 +08:00
  • 141d0810ec fix undefined memory reads for small matrices Johannes Gäßler 2024-06-20 14:05:50 +02:00
  • 677bf2e928 llama : optimize long word tokenization with WPM Georgi Gerganov 2024-06-20 14:47:48 +03:00
  • 005cf2e662 rpc : copy tensors across servers Radoslav Gerganov 2024-06-18 16:28:46 +03:00
  • d47e1371b0 rpc : enable async operations Radoslav Gerganov 2024-06-13 09:57:24 +03:00
  • e773174052 Fix eos tokens to glm4 and adapts to glm3 toyer 2024-06-20 08:43:33 +00:00
  • 26a2a91ac6 fix format luoyu-intel 2024-06-20 16:21:13 +08:00
  • 95fd910d32 remove unused log toyer 2024-06-20 08:20:12 +00:00
  • 7b8069e9ec update README luoyu-intel 2024-06-20 16:14:48 +08:00
  • 8c5f1b2b6c fix eos tokens to glm4 toyer 2024-06-20 08:10:00 +00:00
  • 2075a66a96 metal : fix ggml_metal_supports_op for BF16 (#8021) b3187 Michael de Gans 2024-06-19 22:32:01 -07:00
  • de3c909db0 support glm-4-9b-chat XingXing Qiao 2024-06-19 15:16:23 +08:00
  • 9b705f5836 revert linux build cmd luoyu-intel 2024-06-20 11:27:23 +08:00
  • 61b628fa6c use cl as c compiler luoyu-intel 2024-06-20 11:08:50 +08:00
  • 486d06106c check abort_callback on main thread only slaren 2024-06-20 02:42:04 +02:00
  • a9e8dc4073 vulkan: detect multiple devices by deviceUUID instead of deviceID Adriankhl 2024-06-20 08:43:33 +08:00
  • ba58993152 server : fix smart slot selection (#8020) b3186 sasha0552 2024-06-19 23:57:10 +00:00
  • ac1c5a72f2 Fix ggml_metal_supports_op Michael de Gans 2024-06-19 16:30:21 -07:00
  • ac29f10929 server : fix smart slot selection sasha0552 2024-06-19 22:59:29 +00:00
  • a7854743c5 un-ignore build-info.cmake and build-info.sh (#7996) Michael de Gans 2024-06-19 13:10:42 -07:00
  • d27f26ea0c ggml : remove ggml_task_type and GGML_PERF slaren 2024-06-19 17:59:18 +02:00
  • da1db13d6a CUDA: stream-k decomposition for MMQ Johannes Gäßler 2024-06-19 10:29:08 +02:00
  • 7e42358177 support --spm-infill Sigbjørn Skjæret 2024-06-19 22:01:27 +02:00