Commit graph

  • c5734f1274 cont : drop "penalty prompt" support (#3727) Georgi Gerganov 2024-08-11 11:43:30 +03:00
  • e08100c851 cont : simplify logit_bias + add ignore_eos flag Georgi Gerganov 2024-08-10 14:55:39 +03:00
  • 6b7103cccd llama : introduce llama_sampling_params (wip) Georgi Gerganov 2024-08-10 14:17:44 +03:00
  • ae9d3f68e9 llama : remove sampling from llama_context Georgi Gerganov 2024-08-05 12:59:59 +03:00
  • cc53500f65 llama : add llama_sampling and combine it with llama_grammar Georgi Gerganov 2024-08-05 10:08:25 +03:00
  • 1262e7ed13 grammar-parser : fix possible null-deref (#9004) b3575 DavidKorczynski 2024-08-12 13:36:41 +01:00
  • 3e2eb6dc57 Merge branch 'master' into pr/8836 Nexesenex 2024-08-12 14:25:23 +02:00
  • df5478fbea ggml: fix div-by-zero (#9003) b3574 DavidKorczynski 2024-08-12 13:21:41 +01:00
  • 1ec79f04ab modify convert script and readme caitianchi 2024-08-12 20:17:56 +08:00
  • 627b7f906b grammar-parser: fix possible null-deref David Korczynski 2024-08-12 03:51:05 -07:00
  • 2589292cde Fix a spelling mistake (#9001) b3573 Liu Jia 2024-08-12 17:46:03 +08:00
  • b61c8c0fdb ggml: fix div-by-zero David Korczynski 2024-08-12 02:12:03 -07:00
  • e324149cd6 export-lora : throw error if lora is quantized Xuan Son Nguyen 2024-08-12 11:07:56 +02:00
  • 89d378c76b fix type-check caitianchi 2024-08-12 16:47:59 +08:00
  • d3ae0ee8d7 py : fix requirements check '==' -> '~=' (#8982) Georgi Gerganov 2024-08-12 11:02:01 +03:00
  • a945b3ca8b fix type-check caitianchi 2024-08-12 15:30:44 +08:00
  • 5ef07e25ac server : handle models with missing EOS token (#8997) b3571 Georgi Gerganov 2024-08-12 10:21:50 +03:00
  • 662d4c1402 fix type-check caitianchi 2024-08-12 15:06:22 +08:00
  • 2650b70387 server : handle models with missing EOS token Georgi Gerganov 2024-08-12 09:11:22 +03:00
  • bfebb38a3f ci : run on all requirements.txt Georgi Gerganov 2024-08-12 09:05:06 +03:00
  • 3a0bf17d57 gguf-py : Numpy (de)quantization for TQ1_0 and TQ2_0 Francis Couture-Harpin 2024-08-12 00:06:48 -04:00
  • 3c039c0239 fix: duplication n_predict key in the generation_settings Riceball LEE 2024-08-12 11:05:17 +08:00
  • 75febe2e17 Fix T5 model load bug zhenweijin 2024-08-12 10:25:10 +08:00
  • faaac59d16 llama : support NUL bytes in tokens compilade/nul-str-token Francis Couture-Harpin 2024-08-11 21:00:03 -04:00
  • de512ae86e Merge afa6800eb1 into 4134999e01 Meng Zhang 2024-08-11 17:06:11 -07:00
  • afa6800eb1 feat: whitelist jina bert v2 for llama-server embedding Meng Zhang 2024-08-11 17:03:41 -07:00
  • d911cd1f13 Merge branch 'master' into compilade/bitnet-ternary Francis Couture-Harpin 2024-08-11 15:52:29 -04:00
  • df9e6fda50 Adjustments on output and embeddings Nexesenex 2024-08-11 21:49:23 +02:00
  • 1ad18f80e9 Adjustments on attn_k Nexesenex 2024-08-11 21:44:29 +02:00
  • 4134999e01 gguf-py : Numpy dequantization for most types (#8939) b3570 compilade 2024-08-11 14:45:41 -04:00
  • 7eda5583fa server : fix segfault on long system prompt Francis Couture-Harpin 2024-08-11 14:18:17 -04:00
  • 346f64f0d8 Revert "ggml : remove OpenCL (#7735) + (#8235)" David Heidelberg 2024-08-11 23:50:26 +09:00
  • 8c2c03f4a7 Merge b3569 Nexes the Old 2024-08-11 16:46:15 +02:00
  • 91db53b645 IQ1_XL and some corrections Nexesenex 2024-08-11 16:41:23 +02:00
  • 8cd1bcfd3f flake.lock: Update (#8979) Georgi Gerganov 2024-08-11 16:58:58 +03:00
  • b9fc6784d0 fix-up coding style. Changyeon Kim 2024-08-11 21:28:19 +09:00
  • 4d56810a83 llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model. Changyeon Kim 2024-08-11 19:24:42 +09:00
  • 5b2717cf16 cont : fix the fix Georgi Gerganov 2024-08-11 12:05:25 +03:00
  • 5ae33eb971 Fix validation error with transfer queue memory barrier flags 0cc4m 2024-08-11 10:58:25 +02:00
  • a21c6fd450 update guide (#8909) b3568 Neo Zhang 2024-08-11 16:37:43 +08:00
  • 33309f661a llama : check all graph nodes when searching for result_embd_pooled (#8956) b3567 fairydreaming 2024-08-11 10:35:26 +02:00
  • 4f197e33ef Merge upstream changes, fix conflicts 0cc4m 2024-08-11 10:32:40 +02:00
  • 3eaf614d7d py : fix requirements check '==' -> '~=' Georgi Gerganov 2024-08-11 11:14:58 +03:00
  • cfb9864182 squash! ggml : move rope type enum to ggml.h Daniel Bevenius 2024-08-11 10:11:16 +02:00
  • 7c5bfd57f8 Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (#8943) b3566 Markus Tavenrath 2024-08-11 10:09:09 +02:00
  • ecd1c1e6b8 Revert "squash! ggml : move rope type enum to ggml.h" Daniel Bevenius 2024-08-11 10:08:51 +02:00
  • 3e4d01ce01 squash! ggml : move rope type enum to ggml.h Daniel Bevenius 2024-08-11 10:00:18 +02:00
  • b2f9d2246f Fix small typo 0cc4m 2024-08-11 09:40:39 +02:00
  • d74cc1674f squash! ggml : move rope type enum to ggml.h Daniel Bevenius 2024-08-11 08:52:07 +02:00
  • 6261222bd0 squash! ggml : move rope type enum to ggml.h Daniel Bevenius 2024-08-11 08:16:59 +02:00
  • 1b908d9143 llama : check all graph nodes when searching for result_embd_pooled (needed for gemma-2) Stanisław Szymczyk 2024-08-11 07:41:50 +02:00
  • c9206f63be squash! ggml : move rope type enum to ggml.h Daniel Bevenius 2024-08-11 06:58:14 +02:00
  • 1268d58ca8 More adjustments Nexesenex 2024-08-11 02:13:08 +02:00
  • 9c6eb9fa20 flake.lock: Update github-actions[bot] 2024-08-11 00:20:32 +00:00
  • ef83a87cfe Revert of ffn gate and up on IQ3_M Nexesenex 2024-08-11 01:30:18 +02:00
  • e2e2d77e8e misplaced file lol Nexesenex 2024-08-11 01:13:12 +02:00
  • 8ad71f4469 IQ1_XS Nexesenex 2024-08-11 01:11:24 +02:00
  • bdaec8f1da Merge branch 'ggerganov:master' into add-paligemma-support Andrei 2024-08-10 18:30:27 -04:00
  • 614cb6a5fb Merge branch 'master' into add-paligemma-support Andrei Betlen 2024-08-10 17:49:40 -04:00
  • 8af53b40c4 Merge branch 'ggerganov:master' into hk Henry Kroll III 2024-08-10 13:09:53 -08:00
  • 14f4f404d5 Merge b3565 Nexes the Old 2024-08-10 20:45:26 +02:00
  • 8bc7a9849e 2 forgotten files Nexesenex 2024-08-10 20:40:27 +02:00
  • f0806ac943 IQ2_XL , IQ3_XL , Q2_K_L Nexesenex 2024-08-10 20:34:17 +02:00
  • 66a4225825 fix: Fixes wrong input type for raw_dtype in ggml to gguf scripts farbod 2024-08-10 21:46:00 +03:30
  • 49617b1960 Advancing on several tensors Nexesenex 2024-08-10 18:37:29 +02:00
  • 415d5e40e1 Refactor furthermore attn.v Nexesenex 2024-08-10 17:32:29 +02:00
  • 8c8e43ce20 Settings for MOE >= 8 experts applied to >= 4 experts Nexesenex 2024-08-10 16:38:11 +02:00
  • aa4eb594ef Further refactor attn_k Nexesenex 2024-08-10 16:33:55 +02:00
  • 32b47f600f fix type-check caitianchi 2024-08-10 21:51:04 +08:00
  • 6e02327e8b metal : fix uninitialized abort_callback (#8968) b3565 slaren 2024-08-10 15:42:10 +02:00
  • 7034e8d38b metal : fix uninitialized abort_callback slaren 2024-08-10 15:25:33 +02:00
  • ebeba4cf00 llama : model-based max number of graph nodes calculation Nico Bosshard 2024-08-10 15:14:49 +02:00
  • 7c2768cc72 py: Assume v1.0 if version metadata is missing brian khuu 2024-08-10 21:33:04 +10:00
  • 4a87d1d93e modify readme caitianchi 2024-08-10 19:21:38 +08:00
  • 28d6a0f43d modify clip caitianchi 2024-08-10 19:21:27 +08:00
  • 9c972aa43c Merge branch 'embed_files' of https://github.com/katsu560/llama.cpp into embed_files katsu560 2024-08-10 20:16:29 +09:00
  • 8f1b99fee8 Shortening formatting Nexesenex 2024-08-10 13:09:11 +02:00
  • 7eb23840ed llama : default n_swa for phi-3 (#8931) b3564 Xuan Son Nguyen 2024-08-10 13:04:40 +02:00
  • 287f7db329 add gguf_add_file.py katsu560 2024-08-10 19:58:25 +09:00
  • 01b463f0e8 add write_data katsu560 2024-08-10 19:57:37 +09:00
  • a4289e2aab add EMBEDDED katsu560 2024-08-10 19:57:20 +09:00
  • 7212098755 IQ1 and IQ2 refactor Nexesenex 2024-08-10 12:52:57 +02:00
  • bffbe1cf44 add resampler of v2.6 caitianchi 2024-08-10 18:19:35 +08:00
  • fe39ecc1ee add readme caitianchi 2024-08-10 18:18:58 +08:00
  • 6cad864cbd modify convert caitianchi 2024-08-10 18:18:38 +08:00
  • 7c3f55c100 Add support for encoder-only T5 models (#8900) b3563 fairydreaming 2024-08-10 11:43:26 +02:00
  • 2913e5ff1e double check swa Xuan Son Nguyen 2024-08-10 11:24:50 +02:00
  • 61d8388721 Add Vulkan GROUP_NORM eps parameter 0cc4m 2024-08-10 11:19:30 +02:00
  • 804ddd70bb delete unused white space gtygo 2024-08-10 17:03:32 +08:00
  • 31d9233629 Merge remote-tracking branch 'upstream/master' into t5-encoder Stanisław Szymczyk 2024-08-10 10:41:04 +02:00
  • ce0d1a6f29 Merge pull request #24 from OpenBMB/master tc-mb 2024-08-10 16:36:27 +08:00
  • fc1c860bb8 Merge branch 'prepare-PR-of-minicpm-v2.6' into master tc-mb 2024-08-10 16:36:04 +08:00
  • f356e27769 llama : for clarity set is_encoding to true before building worst-case graph only if the model contains encoder Stanisław Szymczyk 2024-08-10 10:32:15 +02:00
  • 15c309cd02 Merge branch 'ggerganov:master' into embed_files katsu560 2024-08-10 16:07:30 +09:00
  • ea0c8283c8 modify convert caitianchi 2024-08-10 14:20:59 +08:00
  • 911b437f22 gguf-py : fix double call to add_architecture() (#8952) Matteo Mortari 2024-08-10 07:58:49 +02:00
  • 5e14dbf2ea squash! ggml : move rope type enum to ggml.h Daniel Bevenius 2024-08-10 06:50:54 +02:00
  • 73bc9350cd gguf-py : Numpy dequantization for grid-based i-quants compilade/gguf-py-dequant Francis Couture-Harpin 2024-08-09 23:47:31 -04:00
  • 066996d2eb ggml_metal_init : Metal Surpport for --main-gpu ( #8886) ifeanyipossibilities 2024-08-09 23:09:32 -04:00
  • 0d72b7562b Added support to select GPU using metal on Apple Intel or Apple Silicon using --main-gpu index ifeanyipossibilities 2024-08-09 22:54:12 -04:00