Commit graph

  • 70482f587f metal : support unary ops for nelements % 4 != 0 Georgi Gerganov 2024-04-15 22:37:25 +03:00
  • 4256fe6a1a fix-review simonJJJ 2024-04-16 03:36:03 +08:00
  • d8a671781c perplexity : require positive --ctx-size arg Georgi Gerganov 2024-04-15 21:54:51 +03:00
  • 5ab711e7e9 build: avoid exceeding max cmd line limit in makefile hex dump ochafik 2024-04-15 19:36:33 +01:00
  • edd8e2e5a0 merge qwen2moe simonJJJ 2024-04-16 02:15:44 +08:00
  • 1fb778750f Merge remote-tracking branch 'origin/master' into grammar-reps ochafik 2024-04-15 18:57:43 +01:00
  • 021643f041 Adding unicode regex mappings Kazim Abrar Mahi 2024-04-15 23:48:04 +06:00
  • 4a3342361c Merge remote-tracking branch 'origin/master' into generate-assets ochafik 2024-04-15 18:37:28 +01:00
  • 7593639ce3 main: add --json-schema / -j flag (#6659) b2679 Olivier Chafik 2024-04-15 18:35:21 +01:00
  • b2b45db93f build: don't use xxd in Makefile (od hackery instead) ochafik 2024-04-15 18:30:46 +01:00
  • 2082353020 fix autoawq quantized gemma model convert error Zheng.Deng 2024-04-15 23:44:00 +08:00
  • 021baca34a gguf : add special tokens metadata for FIM/Infill Daniel Bevenius 2024-04-15 13:01:20 +02:00
  • 17519e110f Implement '--keep-split' to quantize model into several shards z5269887 2024-04-14 20:08:08 +08:00
  • 132f55795e llama : fix restoring the number of outputs from state files (#6687) b2678 compilade 2024-04-15 08:56:55 -04:00
  • 5598afb778 llama : fix restoring the number of outputs from state files Francis Couture-Harpin 2024-04-15 08:40:17 -04:00
  • 3272896d79 server : revert "minor layout improvements" (#6684) Pierrick Hymbert 2024-04-15 14:18:47 +02:00
  • 92093acf2a Revert "minor layout improvements (#6572)" Pierrick Hymbert 2024-04-15 13:30:20 +02:00
  • b7499e0460 Add musa support dixyes 2024-04-15 19:05:35 +08:00
  • 7fc16a2c32 swift : linux support (#6590) b2676 Steven Prichard 2024-04-15 05:14:46 -05:00
  • 17e98d4c96 fix mul_mat_id() for new input, make the ut pass (#6682) b2675 Neo Zhang Jianyu 2024-04-15 17:12:26 +08:00
  • 811fa4574b fix mul_mat_id() for new input, make the ut pass jianyuzh 2024-04-15 13:58:16 +08:00
  • 1f6929e557 Flake8 format Ashish 2024-04-14 16:46:26 -07:00
  • b7f984a0df space after commas; Keep indentation multiple of 4 spaces Ashish 2024-04-14 16:34:40 -07:00
  • d2ab693066 Fix incorrect check for K norm Ashish 2024-04-14 16:25:55 -07:00
  • bf1a9a5514 format Ashish 2024-04-14 16:24:00 -07:00
  • 91728faac6 Formatting Ashish 2024-04-14 14:40:23 -07:00
  • 412a2807cb Format Ashish 2024-04-14 14:30:33 -07:00
  • 13c75c21eb Proper check for None type for new_name to avoid crash; formatting; revert change to base class write_tensors() Ashish 2024-04-14 14:28:12 -07:00
  • 96695fb96b refactor stablelm graph builder to support 1.6, 3b and 12b more efficiently Ashish 2024-04-14 14:21:25 -07:00
  • 1958f7e06c llama : add missing kv clear in llama_beam_search (#6664) b2674 David Renshaw 2024-04-14 15:24:15 -04:00
  • 04fbc5f23e Add Command R chat template (#6650) b2673 Chao Jiang 2024-04-15 00:16:34 +08:00
  • f184dd9208 flake.lock: Update (#6669) Georgi Gerganov 2024-04-14 16:55:30 +03:00
  • 422c2aff1c Added support for GGML_OP_CLAMP in Metal (#6662) b2671 Dave 2024-04-14 07:14:19 -04:00
  • 8800226d65 Fix --split-max-size (#6655) b2670 Sigbjørn Skjæret 2024-04-14 13:12:59 +02:00
  • e689fc4e91 [bug fix] convert github repository_owner to lowercase (#6673) b2669 Jaemin Son 2024-04-14 20:12:36 +09:00
  • bc06e6c978 [bug fix] convert github repository_owner to lowercase jaeminSon 2024-04-14 19:32:44 +09:00
  • 650db0f25f add --split-max-size to readme Sigbjørn Skjæret 2024-04-14 12:00:05 +02:00
  • 708a0b0516 explicitly define which scripts to run Sigbjørn Skjæret 2024-04-14 11:15:49 +02:00
  • e53bc29c25 clean up before and after test Sigbjørn Skjæret 2024-04-14 11:04:02 +02:00
  • a4ec34e1cd convert : enable the --use-temp-file cli flag (#6645) James A Capozzoli 2024-04-14 04:40:18 -04:00
  • e3f73604d5 Move QK norm stack to private function so it's easier to read Ashish 2024-04-13 23:56:32 -07:00
  • f7b40d7650 Revert formatter Ashish 2024-04-13 23:37:46 -07:00
  • 0dc779bff9 Removed warnings Ashish 2024-04-13 21:27:18 -07:00
  • 8dcd9978d2 Fix accidental removal Ashish 2024-04-13 21:15:15 -07:00
  • 0ec53cfff7 Converge StableLM and StableLM2 code to simplify graph construction Ashish 2024-04-13 21:12:30 -07:00
  • de17e3f745 fix memcpy() crash, add missed cmd in guide, fix softmax (#6622) b2667 Neo Zhang Jianyu 2024-04-14 10:42:29 +08:00
  • 29d940b0d7 Do QK norm stacking in model conversion step Ashish 2024-04-13 19:09:37 -07:00
  • 91a3db9e7d Formatting Ashish 2024-04-12 23:27:07 -07:00
  • 15a5e7db4c Removed autoformatting; resolved bug where model_arch was not selecting StableLM2 Ashish 2024-04-12 22:48:21 -07:00
  • 0eb8492ccb Added 12B support Ashish 2024-04-12 02:33:08 -07:00
  • b5afc44704 fix Ashish 2024-04-12 02:32:38 -07:00
  • b89fa9734d StableLM-2-12b model support Ashish 2024-04-12 02:28:44 -07:00
  • 13387d9c57 StableLM12 tensormapping and constants Ashish 2024-04-12 02:18:51 -07:00
  • d383c0d818 StableLM2 12B support for huggingface -> GGUF Ashish 2024-04-12 02:17:50 -07:00
  • 9db2000849 revert to malloc/free solution, for threaad safe Jianyu Zhang 2024-04-14 09:26:54 +08:00
  • e3012ac949 flake.lock: Update github-actions[bot] 2024-04-14 00:19:46 +00:00
  • b5e7285baf CUDA: fix matrix multiplication logic for tests (#6667) b2666 Johannes Gäßler 2024-04-14 00:21:55 +02:00
  • c6797da82e CUDA: fix matrix multiplication logic for tests Johannes Gäßler 2024-04-13 23:17:18 +02:00
  • 18ed9ed57f move WORK_PATH to a subdirectory Sigbjørn Skjæret 2024-04-13 22:22:29 +02:00
  • e5dda78bf5 Refactored code Kazim Abrar Mahi 2024-04-13 19:33:06 +06:00
  • 360849169f Updated/merged the deepseek coder pr Jaggzh 2024-02-12 04:18:06 -08:00
  • bb80290482 added and refactored unicode_regex_split and related functions Kazim Abrar Mahi 2024-04-01 00:48:49 +06:00
  • 83e924f40b Corrected size dave-fl 2024-04-13 14:47:06 -04:00
  • e969ada736 add missing kv clear in llama_beam_search David Renshaw 2024-04-13 14:26:19 -04:00
  • 17a86b351a build: more idiomatic hexing ochafik 2024-04-13 18:55:16 +01:00
  • 1be0389343 build: don't call xxd from build.zig ochafik 2024-04-13 18:39:22 +01:00
  • 35df587d9c build: don't use xxd in cmake ochafik 2024-04-13 18:12:56 +01:00
  • 0d82da6f79 quantize: add imatrix filename in KV Pierrick HYMBERT 2024-04-13 19:06:50 +02:00
  • 38c9cedd81 Remove debug print Chao Jiang 2024-04-14 00:46:54 +08:00
  • 3f1aa54a0b Add chat template test for command-r models and update the implementation to trim whitespaces Chao Jiang 2024-04-14 00:42:54 +08:00
  • 369825f91d Added support for GGML_OP_CLAMP in Metal dave-fl 2024-04-13 12:23:43 -04:00
  • 13a36efbef build: workaround lack of -n on gnu xxd ochafik 2024-04-13 17:04:17 +01:00
  • 851de160dd quantize: add imatrix m_last_call as quantize.imatrix.chunks_count Pierrick HYMBERT 2024-04-13 18:03:10 +02:00
  • b5f9c7021e build: generate hex dumps of server assets on the fly ochafik 2024-04-13 15:13:39 +01:00
  • de86bb0eb3 Resolved issues Kazim Abrar Mahi 2024-03-23 14:38:06 +06:00
  • 4812c79779 Moved header files Kazim Abrar Mahi 2024-03-23 01:16:04 +06:00
  • c848f8866e Moved regex patterns to unicode.cpp and updated unicode.h Kazim Abrar Mahi 2024-03-23 01:13:08 +06:00
  • c4d4f64d33 merged the changes from deepseeker models to main branch Jaggzh 2024-02-12 04:04:34 -08:00
  • d42add49d4 add examples test scripts to ci run Sigbjørn Skjæret 2024-04-13 16:04:00 +02:00
  • 6738215a10 add tests.sh Sigbjørn Skjæret 2024-04-13 16:01:34 +02:00
  • 3af4ac581c json: fix zig build ochafik 2024-04-13 14:55:43 +01:00
  • 8da85dc2a9 json: move json-schema-to-grammar to common lib ochafik 2024-04-13 14:28:54 +01:00
  • cbc43aa411 llama: remove kv override str_value initialization as it does not compile on some toolchain Pierrick HYMBERT 2024-04-13 15:12:03 +02:00
  • 262c95ab63 quantize: add imatrix n entries and dataset KV metadata quantize: factorize KV Overrides parsing between common #6656 Pierrick HYMBERT 2024-04-13 14:50:32 +02:00
  • a9202fb155 common: factorize KV Overrides parsing between common and server Pierrick HYMBERT 2024-04-13 14:49:48 +02:00
  • 01e7795930 llama: support kv overrides type string string Pierrick HYMBERT 2024-04-13 14:49:23 +02:00
  • 1ef7fab772 main: add --json-schema / -j ochafik 2024-04-13 13:41:13 +01:00
  • d766c5a62e imatrix: save the dataset file used in the output file Pierrick HYMBERT 2024-04-13 13:53:59 +02:00
  • 1d86bd87fb Fix --split-max-size Sigbjørn Skjæret 2024-04-13 12:09:09 +02:00
  • 4bd0f93e4a model: support arch DbrxForCausalLM (#6515) b2665 Pierrick Hymbert 2024-04-13 11:33:52 +02:00
  • 9f77484c91 minor: fix indent in llama_build_graph Pierrick HYMBERT 2024-04-13 11:07:30 +02:00
  • 29ebe38619 use host buff to reduce malloc times Jianyu Zhang 2024-04-13 15:48:01 +08:00
  • 955efb6181 Fix indentation Chao Jiang 2024-04-13 06:55:55 +00:00
  • 8657b8fc94 fix for use Miwa / Ensan 2024-04-13 15:48:36 +09:00
  • edb2ee4d2a Add chat template for command-r model series Chao Jiang 2024-04-13 11:38:29 +08:00
  • ab2fae200c When doing inference on a CPU, if you have F16C available, it's better to use AVX instead of the lookup table. Kunnis 2024-04-12 20:46:01 -05:00
  • 34dbfd1b68 Enable the --use-temp-file cli flag, since some models were failing to convert without it. James Capozzoli 2024-04-12 18:50:41 -04:00
  • e517585fba convert-hf-to-gguf.py: fix python linter Pierrick HYMBERT 2024-04-13 00:17:57 +02:00
  • f1256dc8c8 llama: rename build_moe to build_moe_ffn and fix grok is using gelu instead of silu. Do not pass too much time on this function as it will be replaced in #6505 Pierrick HYMBERT 2024-04-13 00:14:50 +02:00
  • 8e6758f2f4 convert: update comment of MOE tensors mapping Pierrick HYMBERT 2024-04-12 22:15:11 +02:00