Commit graph

  • dcf752707d update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894) Meng, Hengyu 2024-06-12 17:05:35 +08:00
  • 32dd2ef133 Implement non-mapped async IO for CUDA on Windows. On a fast Gen5 NVMe drive this change improves model load time by >3x while it should be the same (or slightly faster) on any other drive. Markus Tavenrath 2024-06-12 10:53:25 +02:00
  • a54b791211 Apply suggestions from code review slaren 2024-06-12 10:32:20 +02:00
  • faaa86b7e4 ggml-qnn: refine ggml inference using QNN NPU zhou.weiguo 2024-06-12 16:30:50 +08:00
  • 5e5eee7b44 fix whitespace Eddie-Wang1120 2024-06-12 16:25:46 +08:00
  • 2ad8c49830 update intel docker Meng, Hengyu 2024-06-12 07:52:32 +00:00
  • f395dd9ca0 change table name Eddie-Wang1120 2024-06-12 14:28:24 +08:00
  • c0cd08d45e Merge branch 'ggerganov:master' into bitnet Eddie-Wang 2024-06-12 14:12:27 +08:00
  • 43d8d4bf9e examples : replace llama_kv_cache_seq_* with llama_past_seq_* Francis Couture-Harpin 2024-06-10 14:44:42 -04:00
  • 8d1d112a9f iq4_nl netrunnereve 2024-06-11 23:23:24 -04:00
  • f2b5764beb Fix a typo and add Fedora 40 package to install for Vulkan (#7794) [no ci] Patrice Ferlet 2024-06-12 03:18:16 +02:00
  • 9bed1aebbe Reserve logits when causal attention is disabled on context Andrei Betlen 2024-06-11 21:12:43 -04:00
  • 7612e4cdcc Merge branch 'master' of https://github.com/ggerganov/llama.cpp into add-paligemma-support Andrei Betlen 2024-06-11 21:10:52 -04:00
  • d67de1a364 Merge commit '148995e5' into tokenizer-bpe-fixes jaime-m-p 2024-06-12 00:24:20 +02:00
  • 0e48ea8ec2 fix _not_strings for substring overlaps Olivier Chafik 2024-06-11 22:48:33 +01:00
  • e06659811e fixes slaren 2024-06-08 17:00:26 +02:00
  • 1aad9d2004 Update convert-hf-to-gguf-update.py Iaroslav Chelombitko 2024-06-11 23:37:42 +03:00
  • e9cb3b336d fix .editorconfig ngxson 2024-06-11 22:09:14 +02:00
  • 2f37328052 Merge branch 'ggerganov:master' into avx_iq Eve 2024-06-11 19:55:36 +00:00
  • b7e1707069 fix ci netrunnereve 2024-06-11 15:54:59 -04:00
  • 73bac2b11d vulkan: select only one device for single gpu with multiple drivers (#7582) b3135 k.h.lai 2024-06-12 03:26:05 +08:00
  • ef52d1d16a Update Vulkan RoPE implementation (#7818) b3134 0cc4m 2024-06-11 21:20:29 +02:00
  • 8a1998aff8 Merge a808370c58 into 14f83526cd Anisse Astier 2024-06-11 21:42:31 +03:00
  • 5ffba9ecc3 add readme ngxson 2024-06-11 19:35:17 +02:00
  • 76512cbc92 Llama.cpp - Make Clean Andrew Ferrouolo 2024-06-11 13:15:01 -04:00
  • 04c91d29ff use ggml_format_name ngxson 2024-06-11 19:14:04 +02:00
  • 7c3402662c Merge branch 'ggerganov-master' Andrew Ferrouolo 2024-06-11 13:09:43 -04:00
  • 971dbaec02 Makefile merge fixed Andrew Ferrouolo 2024-06-11 13:08:25 -04:00
  • 9f8790bb49 Finished fixing some issues with llamacheck Andrew Ferrouolo 2024-06-11 13:05:10 -04:00
  • 54f77e2467 add to makefile all targets ngxson 2024-06-11 19:03:13 +02:00
  • 85db22dd20 Merge branch 'master' into xsn/control-vector-generator ngxson 2024-06-11 19:00:19 +02:00
  • ca581c7a50 server : restore numeric prompts Georgi Gerganov 2024-06-11 19:46:37 +03:00
  • 14f83526cd fix broken link in pr template (#7880) [no ci] Deven Mistry 2024-06-11 12:18:58 -04:00
  • 170b3ea6a1 Update pull_request_template.md [no ci] Brian 2024-06-12 02:18:21 +10:00
  • f740635dab fix broken link in pr template deven367 2024-06-11 11:51:47 -04:00
  • 5269e082aa ggml-qnn: refine ggml inference using QNN NPU zhou.weiguo 2024-06-11 23:05:00 +08:00
  • 6fe42d073f github: move PR template to .github/ root (#7868) Brian 2024-06-12 00:43:41 +10:00
  • da6babdf0a fix macos build ngxson 2024-06-11 15:47:35 +02:00
  • e474ef1df4 update llama-rpc-server bin name + doc Olivier Chafik 2024-06-11 14:42:03 +01:00
  • 3223133cf5 default n_pca_batch to 20 ngxson 2024-06-11 15:05:06 +02:00
  • 148995e5e5 llama-bench: more compact markdown tables (#7879) b3131 Johannes Gäßler 2024-06-11 14:45:40 +02:00
  • d41c719980 bring back n_completions ngxson 2024-06-11 14:31:45 +02:00
  • fd19d687e7 llama-bench: more compact markdown tables Johannes Gäßler 2024-06-11 14:24:24 +02:00
  • 446da906d9 fix n_completions Christian Zhou-Zheng 2024-06-11 08:22:38 -04:00
  • 163916864c remember to copy back the last_eigenvector ngxson 2024-06-11 12:40:07 +02:00
  • 1a088fb0a5 working version ngxson 2024-06-11 12:37:05 +02:00
  • 9e39571fc2 add n_batch for pca ngxson 2024-06-11 11:45:16 +02:00
  • 4bfe50f741 tests : check the Python version (#7872) b3130 Georgi Gerganov 2024-06-11 10:10:20 +03:00
  • bdcb8f4222 CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860) Johannes Gäßler 2024-06-11 08:26:07 +02:00
  • 4356325ef5 tests : check the Python version gg/check-python-version Georgi Gerganov 2024-06-11 09:02:52 +03:00
  • c2ce6c47e4 fix CUDA CI by using a windows-2019 image (#7861) slaren 2024-06-11 07:59:20 +02:00
  • 02e26344ab Update json-schema-to-grammar.cpp ochafik 2024-06-11 04:41:56 +01:00
  • 4e6375606d fix not_strings & port to js+py ochafik 2024-06-11 04:21:06 +01:00
  • 2322e9db9a Merge branch 'ggerganov:master' into bitnet Eddie-Wang 2024-06-11 10:50:12 +08:00
  • de1d5073e4 remove unused Eddie-Wang1120 2024-06-11 10:23:20 +08:00
  • cd81597d56 github: move PR template to .github/ root [no ci] brian khuu 2024-06-11 11:54:48 +10:00
  • 8b47473df3 port not_strings to python, add trailing space ochafik 2024-06-11 02:54:07 +01:00
  • 6743438607 Merge remote-tracking branch 'origin/master' into json-additional ochafik 2024-06-11 02:43:20 +01:00
  • ee3a086fdf Merge pull request #2 from HanClinto/bins-nits-2 Olivier Chafik 2024-06-11 02:36:25 +01:00
  • 166397f1e4 update grammar/README.md w/ new llama-* names ochafik 2024-06-11 02:35:30 +01:00
  • 2a9c4cd7ba Merge remote-tracking branch 'origin/master' into bins ochafik 2024-06-11 02:35:01 +01:00
  • b61eb9644d json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866) Olivier Chafik 2024-06-11 02:22:57 +01:00
  • ddc16b8d31 json: refine constraint for whitespace to avoid runaways yet allow pretty print ochafik 2024-06-11 01:12:12 +01:00
  • 396b18dfec json: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841) Olivier Chafik 2024-06-11 01:00:30 +01:00
  • 07a9487715 fix typos in repetition syntax ochafik 2024-06-11 00:58:37 +01:00
  • 6a5adf3d7c fix shape of v_diff_original ngxson 2024-06-11 01:33:16 +02:00
  • 8cf8c129d4 Update apps.nix ochafik 2024-06-11 00:18:47 +01:00
  • adca9af2f6 json: prevent additional props to redefine a typed prop ochafik 2024-06-11 00:16:12 +01:00
  • c241b500a1 clean up PCA ggml implementation ngxson 2024-06-11 01:13:10 +02:00
  • 1f5ec2c0b4 Updating two small main references missed earlier in the finetune docs. HanClinto 2024-06-10 16:12:50 -07:00
  • 82df7f9f0e Merge pull request #1 from HanClinto/bins-rename-nits Olivier Chafik 2024-06-10 23:58:12 +01:00
  • 70de0debab Updating documentation references for lookup-merge and export-lora HanClinto 2024-06-10 15:32:21 -07:00
  • 864a99e7a0 cmake : fix CMake requirement for CUDA (#7821) Jared Van Bortel 2024-06-10 18:32:10 -04:00
  • 72660c357c Updating run-with-preset.py to use new binary names. Updating docs around perplexity binary rename. HanClinto 2024-06-10 15:23:32 -07:00
  • c2f29e4617 json: better support for "type" arrays (e.g. {"type": ["array", "null"], "items": {"type": "string"}}) ochafik 2024-06-10 22:55:29 +01:00
  • 016548dc60 first notes ltoniazzi 2024-06-10 22:54:51 +01:00
  • 2fd66b2ce2 Updating a few lingering doc references for rename of main to llama-cli HanClinto 2024-06-10 14:53:23 -07:00
  • 8802d63c93 first notes ltoniazzi 2024-06-10 22:48:39 +01:00
  • e7e03733b2 Updating docs for eval-callback binary to use new llama- prefix. HanClinto 2024-06-10 14:41:56 -07:00
  • 70b863a9f8 try win-2019 slaren 2024-06-10 23:04:44 +02:00
  • 0be5f399c4 add two missing llama- prefixes ochafik 2024-06-10 22:00:28 +01:00
  • e1aae7c72d install vs build tools before cuda toolkit slaren 2024-06-10 22:26:35 +02:00
  • f7fa9f3d48 try exllama/bdashore3 method slaren 2024-06-10 22:03:10 +02:00
  • cfd87e9d46 another test slaren 2024-06-10 21:43:59 +02:00
  • 8cb2dbd1a2 CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) Johannes Gäßler 2024-06-09 23:40:26 +02:00
  • 7f0eebee7b trigger when build.yml changes slaren 2024-06-10 21:09:32 +02:00
  • 0a3b474388 try to fix CUDA ci with --allow-unsupported-compiler slaren 2024-06-10 21:06:14 +02:00
  • e9895d2ce9 Update gguf-py/gguf/gguf_writer.py Christian Zhou-Zheng 2024-06-10 14:55:14 -04:00
  • 05b183fe7b compatibility fix Christian Zhou-Zheng 2024-06-10 14:00:13 -04:00
  • 854bd64a5d Update gguf-py/gguf/gguf_writer.py Christian Zhou-Zheng 2024-06-10 13:55:08 -04:00
  • b843445827 Update gguf-py/gguf/gguf_writer.py Christian Zhou-Zheng 2024-06-10 13:54:41 -04:00
  • aa8a7cd350 fix: QWEN2MOE support for expert_feed_forward_length stefan 2024-06-10 17:50:31 +00:00
  • f9cfd04bd4 address gbnf-validator unused fread warning (switched to C++ / ifstream) Olivier Chafik 2024-06-10 17:38:36 +01:00
  • b8436395b4 rename: llama-cli-cmake-pkg(.exe) Olivier Chafik 2024-06-10 16:23:45 +01:00
  • 4881a94bee fix test-eval-callback Olivier Chafik 2024-06-10 16:21:14 +01:00
  • b8cb44e812 more llama-cli(.exe) Olivier Chafik 2024-06-10 16:08:06 +01:00
  • 051633ed2d update dockerfile refs Olivier Chafik 2024-06-10 16:05:11 +01:00
  • 1cc651446d rename(make): llama-baby-llama Olivier Chafik 2024-06-10 16:03:18 +01:00
  • 0fcf2c328e rename dockerfile w/ llama-cli Olivier Chafik 2024-06-10 15:44:49 +01:00
  • 0bb2a3f233 fix some missing -cli suffixes Olivier Chafik 2024-06-10 15:42:20 +01:00