Commit graph

  • 5e0a9a2d37 Add unary and binary op shader templates 0cc4m 2024-02-11 08:41:59 +01:00
  • 7107b9098e ws John 2024-02-11 03:44:07 +01:00
  • 7dcadb4ec3 whitespace corrections John 2024-02-11 03:30:36 +01:00
  • 7e64e376f0 flake.lock: Update github-actions[bot] 2024-02-11 00:17:31 +00:00
  • e5dfedacab remove static bobqianic 2024-02-10 23:22:59 +00:00
  • 98d5d20044 Fix bpe_gpt2_preprocess bobqianic 2024-02-10 23:17:39 +00:00
  • d02245cd76 make: add error message for bad CUDA version JohannesGaessler 2024-02-10 20:41:08 +01:00
  • 2a0cf851d4 Optimize dequant shaders for q4_1, q5_0, q5_1 and q8_0 0cc4m 2024-02-10 19:54:10 +01:00
  • 5169f928c7 Fix q4_0 dequant dispatch sizes 0cc4m 2024-02-10 17:42:49 +01:00
  • 76a0128bec revert low register pressure changes JohannesGaessler 2024-02-10 13:02:45 +01:00
  • f026f8120f metal : use autoreleasepool to avoid memory leaks (#5437) b2116 Ian Bull 2024-02-10 02:53:28 -08:00
  • 5205597069 Optimize dmmv non-kquants for GCN 0cc4m 2024-02-10 11:02:58 +01:00
  • 2bb97fca5e fix AMD JohannesGaessler 2024-02-10 09:54:36 +01:00
  • cd9aea63b5 scripts : update sync scripts with new backends Georgi Gerganov 2024-02-10 09:53:05 +02:00
  • 43b65f5eb8 sync : ggml b2114 Georgi Gerganov 2024-02-10 09:30:36 +02:00
  • 4633d93af0 ggml : add abort_callback for cpu backend (ggml/725) Michael Podvitskiy 2024-02-09 10:42:27 +01:00
  • cdf30d6780 Use @autoreleasepool to avoid memory leaks Ian Bull 2024-02-09 21:46:01 -08:00
  • 1a27406426 server: clean up using_chatml variable Xuan Son Nguyen 2024-02-10 00:11:13 +01:00
  • 97f8a7a2fa CUDA: mul_mat_vec_q tiling, refactor mul mat logic JohannesGaessler 2024-02-09 14:51:02 +01:00
  • b4172ca29f Improve dequant shaders, add fast q4_0 dequant 0cc4m 2024-02-09 20:05:47 +01:00
  • 6972e7e90e flake8 : add W503 to ignore list Jared Van Bortel 2024-02-09 13:50:31 -05:00
  • ab49e9ee45 bert : simplify token type embedding access Jared Van Bortel 2024-02-09 13:44:32 -05:00
  • 4b7b38bef5 vulkan: Set limit for task concurrency (#5427) b2112 Neuman Vong 2024-02-10 05:30:19 +11:00
  • 56afb2f60e undo attempted type_embd simplify Douglas Hanley 2024-02-09 12:00:41 -06:00
  • 961e98f245 style fixes Douglas Hanley 2024-02-09 11:53:17 -06:00
  • eb45d123a3 Apply suggestions from code review Alexey Parfenov 2024-02-09 16:30:47 +00:00
  • f6fd1a97d1 Apply suggestions from code review Alexey Parfenov 2024-02-09 16:28:18 +00:00
  • ebe3079539 server: validate "--chat-template" argument ngxson 2024-02-09 17:00:53 +01:00
  • 7efef47d2e server: format_llama2: remove BOS ngxson 2024-02-09 17:00:36 +01:00
  • 420aec1917 Merge branch 'master' into sigint-non-interactive Jaggz H 2024-02-09 07:31:30 -08:00
  • 2db0ca34d3 calc ppl on sakurallm prompt format correctly Reinforce-II 2024-02-09 21:46:27 +08:00
  • e00d2a62dd llava : add requirements.txt and update README.md (#5428) Daniel Bevenius 2024-02-09 14:00:59 +01:00
  • c54c048cf1 llava: fix typo in llava-surgery.py output Daniel Bevenius 2024-02-09 13:52:23 +01:00
  • 7c777fcd5d server : fix prompt caching for repeated prompts (#5420) b2110 Riley Stewart 2024-02-09 02:49:49 -08:00
  • e5ca3937c6 llama : do not cap thread count when MoE on CPU (#5419) b2109 Paul Tsochantaris 2024-02-09 10:48:06 +00:00
  • 72a9f4ea8c Add CUDA option to use the memory pool max release threshold YavorGIvanov 2024-02-09 10:46:02 +00:00
  • e4124c2477 readme : add JavaScript/Wasm repo (#5415) Marko Tasic 2024-02-09 11:17:00 +01:00
  • b2f87cb64d ggml : fix error C2078: too many initializers for MSVC ARM64 (#5404) b2107 Michael Podvitskiy 2024-02-09 10:56:43 +01:00
  • 269437e4eb server: rename template mistral to llama2 ngxson 2024-02-09 09:55:20 +01:00
  • 27976c31b6 server: fix typo ngxson 2024-02-09 09:32:51 +01:00
  • 60c5cab48d llava: add requirements.txt and update README.md Daniel Bevenius 2024-02-09 08:46:59 +01:00
  • 44fbe34360 Fix Vulkan crash on APUs with very little device memory (#5424) b2106 0cc4m 2024-02-09 06:52:33 +01:00
  • 3a1895d786 Merge branch 'bert' of github.com:iamlemec/llama.cpp into bert Douglas Hanley 2024-02-08 23:05:45 -06:00
  • d080bebcc6 hard-code token_type = 0 Douglas Hanley 2024-02-08 23:03:33 -06:00
  • 68758083d6 fix up model sizing and result acquisition Douglas Hanley 2024-02-08 22:43:26 -06:00
  • 504ee68716 Merge branch 'ggerganov:master' into master hsnmkls 2024-02-09 10:23:31 +08:00
  • b14c457fb4 bert : add some missing graph callbacks Jared Van Bortel 2024-02-08 21:22:23 -05:00
  • e78388d39a use ctx_output for tok_norm of BERT and BLOOM Jared Van Bortel 2024-02-08 21:17:57 -05:00
  • a887ba3c18 vulkan: Set limit for task concurrency Neuman Vong 2024-02-09 12:39:51 +11:00
  • 96d37f8d55 add causal attention gguf key Douglas Hanley 2024-02-08 16:41:05 -06:00
  • e3efcf13c8 Update convert-hf-to-gguf.py Douglas Hanley 2024-02-08 17:33:14 -05:00
  • 6d34ad7f3c Merge branch 'master' of https://github.com/bmtwl/llama.cpp root 2024-02-08 22:21:33 +00:00
  • 99a203d02f Update ggml.h bmwl 2024-02-08 14:21:16 -08:00
  • 2ebedda3d9 server: add mistral chat template ngxson 2024-02-08 23:16:58 +01:00
  • 16b91d138e Merge branch 'master' of https://github.com/bmtwl/llama.cpp root 2024-02-08 22:00:47 +00:00
  • e107c4cd54 fixed ggml_init_numa variable root 2024-02-08 22:00:35 +00:00
  • fecd66ac06 Merge branch 'ggerganov:master' into master bmwl 2024-02-08 13:42:06 -08:00
  • c2c31660a5 add missing enum ggml_numa_strategies declaration root 2024-02-08 21:41:36 +00:00
  • c9e28a4220 Fix debug output function names 0cc4m 2024-02-08 22:22:46 +01:00
  • 858187f1a5 Fix Vulkan crash on APUs with very little device memory 0cc4m 2024-02-08 22:20:42 +01:00
  • 8e6a9d2de0 CUDA: more warps for mmvq on NVIDIA (#5394) b2105 Johannes Gäßler 2024-02-08 21:56:40 +01:00
  • 41f308f58e llama : do not print "offloading layers" message in CPU-only builds (#5416) b2104 slaren 2024-02-08 21:33:03 +01:00
  • 314174ddc5 add missing enum ggml_numa_strategies declaration and revert sync problem with master root 2024-02-08 19:55:47 +00:00
  • 7218c7b613 Merge remote-tracking branch 'upstream/master' into bert Douglas Hanley 2024-02-08 13:33:32 -06:00
  • e0e14e31c1 Merge remote-tracking branch 'origin/master' into bert Douglas Hanley 2024-02-08 13:32:14 -06:00
  • 00ebc310cd CUDA: more warps for mmvq on NVIDIA JohannesGaessler 2024-02-07 00:27:48 +01:00
  • 5f1c21d0b6 put causal_attn flag in gguf Douglas Hanley 2024-02-08 13:28:25 -06:00
  • 59c1829b0c add in wordpiece tokenizer Douglas Hanley 2024-02-08 13:09:36 -06:00
  • 7bbe511b8e Revert bad merge with dynatemp flags root 2024-02-08 19:04:02 +00:00
  • 9535a7a32a server: fix prompt caching for same prompts (#4902) Riley Stewart 2024-02-08 10:48:19 -08:00
  • d5a6e865f6 Whitespace Paul Tsochantaris 2024-02-08 18:26:10 +00:00
  • f8dc954e0f Not capping thread count when MoE inference is running on CPU Paul Tsochantaris 2024-02-08 18:22:42 +00:00
  • b65c863947 Remote enum llama_numa_strategies root 2024-02-08 18:07:40 +00:00
  • cfaa525804 server: allow to specify tokens as strings in logit_bias ZXED 2024-01-16 21:49:23 +03:00
  • 6b4f287235 common: use enums for sampler types ZXED 2024-02-08 20:07:34 +03:00
  • 90668fb596 Merge branch 'ggerganov:master' into master bmwl 2024-02-08 09:17:23 -08:00
  • 6e99f2a04f Fix f16_sycl cpy call from Arc (#5411) b2103 Abhilash Majumder 2024-02-08 22:39:10 +05:30
  • c4ca30170f llama : do not print "offloading layers" message in CPU-only builds slaren 2024-02-08 17:48:46 +01:00
  • 18fb9a5382 Merge branch 'ggerganov:master' into master bmwl 2024-02-08 08:39:54 -08:00
  • 12c23b60c6 Fixed lingering init_llama_backend() bool calls in tests and examples root 2024-02-08 16:28:49 +00:00
  • 7320059891 format fix Abhilash Majumder 2024-02-08 21:48:52 +05:30
  • 7debf5c263 README.md: added JavaScript/Wasm (works in browser) tangledgroup/llama-cpp-wasm Marko Tasic 2024-02-08 17:04:22 +01:00
  • d8f132d165 llama.cpp: add MATMUL_INT8 capability to system_info Sunita Nadampalli 2024-02-08 15:20:48 +00:00
  • bca726f06a ggml: update unit tests for the new vec_dot interface Sunita Nadampalli 2024-02-02 21:30:35 +00:00
  • 9cd5b8de6e ggml: aarch64: implement smmla kernel for q4_1_q8_1 quantized gemm Sunita Nadampalli 2024-02-08 15:16:01 +00:00
  • ba668572ce ggml: aarch64: implement smmla kernel for q4_0_q8_0 quantized gemm Sunita Nadampalli 2024-02-08 15:15:02 +00:00
  • 52489546fb ggml: aarch64: implement smmla kernel for q8_0_q8_0 quantized gemm Sunita Nadampalli 2024-02-08 15:09:42 +00:00
  • ff4ff05c5f llava : add missing .py, and fix paths in README.md (#5414) Daniel Bevenius 2024-02-08 15:20:03 +01:00
  • 2c932cb3d1 llava: add missing .py, and fix paths in README.md Daniel Bevenius 2024-02-08 15:10:29 +01:00
  • 6bf368e7bd use macro Abhilash Majumder 2024-02-08 19:00:10 +05:30
  • 3cf123e8f6 Fuse matrix multiplication + SiLU JohannesGaessler 2024-02-08 11:52:13 +01:00
  • 6b40e5ac82 add fp16 build CI Abhilash Majumder 2024-02-08 17:37:49 +05:30
  • c4c32f2954 rm old logic Abhilash Majumder 2024-02-08 17:26:27 +05:30
  • de69ea86b0 fix f16_sycl cpy call Abhilash Majumder 2024-02-08 16:42:14 +05:30
  • b7b74cef36 fix trailing whitespace (#5407) b2101 Johannes Gäßler 2024-02-08 11:36:54 +01:00
  • 4aa43fab56 llama : fix MiniCPM (#5392) b2100 runfuture 2024-02-08 18:36:19 +08:00
  • 6a5d8236ee fix trailing whitespace JohannesGaessler 2024-02-08 11:26:40 +01:00
  • a47e4cb6d7 frequency_threshold JohannesGaessler 2024-02-08 10:16:00 +01:00
  • 6d2693bc8d fix hashmap code JohannesGaessler 2024-02-08 10:11:12 +01:00
  • a6e514a85f llava: fix typo/formatting in README.md (#5405) Daniel Bevenius 2024-02-08 09:58:19 +01:00