Commit graph

  • b4e079a7fd vulkan : add backend registry / device interfaces (#9721) Diego Devesa 2024-10-17 02:46:58 +02:00
  • 373ffc1cd9 fix: allocating CPU buffer with size 0 (#9917) Gilad S. 2024-10-17 02:34:22 +03:00
  • 8b4deb9434 fix: use vm_allocate to allocate CPU backend buffer on macOS (#9875) Gilad S. 2024-10-17 01:36:51 +03:00
  • 4fe75626dc llama : suppress conversion from 'size_t' to 'int' (#9046) Daniel Bevenius 2024-10-16 19:34:28 +02:00
  • 10dbdc85aa llava : fix typo in error message [no ci] (#9884) Daniel Bevenius 2024-10-16 19:24:05 +02:00
  • b8182325e7 grammar : fix JSON Schema for string regex with top-level alt. (#9903) Joe Eli McIlvain 2024-10-16 09:03:24 -07:00
  • 50a87204d2 llama : add tensor name for "result_norm" (#9907) Molly Sophia 2024-10-16 18:10:21 +08:00
  • aca8957c23 server : fix the disappearance of the end of the text (#9867) Alexey Parfenov 2024-10-16 08:35:53 +00:00
  • d2844e8ee3 sync : ggml Georgi Gerganov 2024-10-16 11:28:14 +03:00
  • 049b0bce63 ggml-alloc : remove buffer_id from leaf_alloc (ggml/987) Daniel Bevenius 2024-10-09 16:40:35 +02:00
  • 38f190c590 [CANN] Fix cann compilation error (#9891) leo-pony 2024-10-16 08:51:46 +08:00
  • 06e841beed llama : add infill sampler (#9896) Georgi Gerganov 2024-10-15 16:35:33 +03:00
  • ed0843d6c5 server : improve infill context reuse (#9894) Georgi Gerganov 2024-10-15 16:28:55 +03:00
  • 9c42815259 sampling : add XTC sampler (#9742) MaggotHATE 2024-10-15 15:54:55 +05:00
  • d84d567578 server : update preact (#9895) Georgi Gerganov 2024-10-15 12:48:44 +03:00
  • b9a3854838 readme : update bindings list (#9889) Michał Tuszyński 2024-10-15 10:20:34 +02:00
  • b9ebb55f66 server : handle "logprobs" field with false value (#9871) VoidIsVoid 2024-10-14 15:04:36 +08:00
  • f9d2c91df6 Vectorize load instructions in dmmv f16 CUDA kernel (#9816) agray3 2024-10-14 01:49:08 +01:00
  • 3719bf6f11 server : accept extra_context for the infill endpoint (#9874) Georgi Gerganov 2024-10-13 21:31:35 +03:00
  • 4682f73437 server : reuse cached context chunks (#9866) Georgi Gerganov 2024-10-13 18:52:48 +03:00
  • 695e30d25e flake.lock: Update (#9870) Georgi Gerganov 2024-10-13 06:11:26 +03:00
  • 3fd4de0020 server : add option to time limit the generation phase (#9865) Georgi Gerganov 2024-10-12 16:14:27 +03:00
  • 0b2de92468 server : remove self-extend features (#9860) Georgi Gerganov 2024-10-12 16:06:31 +03:00
  • b7aea356a3 server : remove legacy system_prompt feature (#9857) Georgi Gerganov 2024-10-12 14:51:54 +03:00
  • a0ddf1ffe8 llama : improve infill support and special token detection (#9798) Georgi Gerganov 2024-10-12 08:21:51 +03:00
  • bb832cab50 musa : update doc (#9856) R0CKSTAR 2024-10-12 13:09:53 +08:00
  • 63654cd1f2 ggml : move more prints to the ggml log system (#9839) Diego Devesa 2024-10-11 15:34:45 +02:00
  • 1467a7a064 add tests for N % 1024 != 0 Junhee Yoo 2024-10-18 23:28:32 +09:00
  • 526d1cf303 Add llama_cpp_canister to the README icpp 2024-10-18 11:16:24 -04:00
  • adbec7f5ad fix im2col and add unittest for N>=1024 Junhee Yoo 2024-10-18 20:20:16 +09:00
  • 5d99ae447b correct token pos in llama_batch_allocr Xuan Son Nguyen 2024-10-18 15:56:28 +02:00
  • 9dd7e77742 Merge branch 'master' into xsn/llama_batch_remove_compat Xuan Son Nguyen 2024-10-18 15:41:41 +02:00
  • bc82fc2ed8 llama-bench : add time-to-first-byte stat gg/ttfb Georgi Gerganov 2024-09-19 09:15:29 +03:00
  • 06159898e1 cont : avoid extra loop in temperature sampler for sub-zero temp Georgi Gerganov 2024-10-18 15:58:52 +03:00
  • 033a2410d3 fix mul_mat_vec_q and *_vec_q error arthw 2024-10-18 20:49:32 +08:00
  • afd9909a64 rpc : backend refactoring (#9912) b3942 Radoslav Gerganov 2024-10-18 14:33:58 +03:00
  • a5452db6dd sample: maintain token count in penalty sampler context zhenweijin 2024-10-17 16:19:57 +08:00
  • c9e549cce4 rpc : refactor server Radoslav Gerganov 2024-10-18 10:13:16 +03:00
  • 98f4e5d921 rpc : refactor backend Radoslav Gerganov 2024-10-17 10:35:19 +03:00
  • 87421a23e8 [SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705) b3941 Ouadie EL FAROUKI 2024-10-18 06:46:16 +01:00
  • 60ce97c9d8 add amx kernel for gemm (#8998) b3940 Ma Mingfei 2024-10-18 13:34:36 +08:00
  • 8901755ba3 server : add n_indent parameter for line indentation requirement (#9929) b3939 Georgi Gerganov 2024-10-18 07:32:19 +03:00
  • 3a25182685 Merge branch 'master' into sycl_async_data_load OuadiElfarouki 2024-10-18 05:17:30 +01:00
  • 2d3fc54ac6 add amx kernel for gemm pr_add_intel_amx_support mingfeima 2024-04-06 19:57:25 -07:00
  • 3c86af28f1 basic concept Roberto Tomás Collins 2024-10-17 23:10:54 -04:00
  • 6f55bccbb8 llama : rename batch_all to batch (#8881) b3938 Daniel Bevenius 2024-10-18 01:41:51 +02:00
  • 630bce5a7f ggml : fix possible buffer use after free in sched reserve sl/fix-sched-reserve slaren 2024-10-18 00:21:54 +02:00
  • 17bb928080 readme : remove --memory-f32 references (#9925) b3937 Georgi Gerganov 2024-10-17 23:43:05 +03:00
  • 9f45fc1e99 llama : change warning to debug log b3936 Georgi Gerganov 2024-10-17 23:26:32 +03:00
  • 4a5b5870f1 llama : handle temp <= 0.0 in the temp_ext sampler too Georgi Gerganov 2024-10-17 22:53:22 +03:00
  • f0ded27901 server : add n_indent parameter for line indentation requirement Georgi Gerganov 2024-10-17 22:06:20 +03:00
  • 99bd4ac28c llama : infill sampling handle very long tokens (#9924) b3935 Georgi Gerganov 2024-10-17 22:32:47 +03:00
  • cd978508ac tests : init prob correctly Georgi Gerganov 2024-10-17 18:23:02 +03:00
  • 57fb835e5b cont : no need for special "greedy" logic Georgi Gerganov 2024-10-17 18:09:57 +03:00
  • cb75bebcad sampling : change temperature sampler logic Georgi Gerganov 2024-10-17 17:19:23 +03:00
  • 33a69ec742 tests : replace macros with functions Georgi Gerganov 2024-10-17 17:51:54 +03:00
  • e31c8790ff llama : deprecate softmax sampler + fix dist sampler Georgi Gerganov 2024-10-15 14:24:05 +03:00
  • 7899c67f7c cont : better indices Georgi Gerganov 2024-10-17 16:55:33 +03:00
  • 99c4a39bf1 llama : infill sampling handle very long tokens Georgi Gerganov 2024-10-17 16:00:48 +03:00
  • dc68a59064 update spelling Clarissa Miranda 2024-10-17 21:49:31 +11:00
  • a33fbbe411 Update spelling in memoize Clarissa Miranda 2024-10-17 21:44:24 +11:00
  • 34fc44d03b Merge pull request #1 from ggerganov/gg/grammar-refactor Clarissa Miranda 2024-10-17 21:41:48 +11:00
  • 17b3a3e8cc llama : minor llama_grammar refactoring gg/grammar-refactor Georgi Gerganov 2024-10-17 12:19:28 +03:00
  • b4d3c16493 add pool_2d Junhee Yoo 2024-10-17 16:05:45 +09:00
  • 3752217ed5 readme : update bindings list (#9918) Tim Wang 2024-10-17 17:57:14 +11:00
  • 2aa6dd273a add stacks cache into llama_grammar Clarissa Miranda 2024-10-17 14:30:07 +11:00
  • f010b77a37 vulkan : add backend registry / device interfaces (#9721) b3933 Diego Devesa 2024-10-17 02:46:58 +02:00
  • 2363a4805e Merge remote-tracking branch 'origin/master' into sl/vulkan-reg-2 slaren 2024-10-17 01:41:51 +02:00
  • b1a5386fbb Add SwiftLlama to the Bindings list Tim Wang 2024-10-17 10:36:39 +11:00
  • 2194200278 fix: allocating CPU buffer with size 0 (#9917) b3932 Gilad S. 2024-10-17 02:34:22 +03:00
  • a12f3fdc7b fix: allocating CPU buffer with size 0 Gilad S 2024-10-17 02:21:45 +03:00
  • 73afe681aa fix: use vm_allocate to allocate CPU backend buffer on macOS (#9875) b3931 Gilad S. 2024-10-17 01:36:51 +03:00
  • 645eb3c6ad consolidated.safetensors CrispStrobe 2024-10-16 23:18:56 +02:00
  • 9e04102448 llama : suppress conversion from 'size_t' to 'int' (#9046) b3930 Daniel Bevenius 2024-10-16 19:34:28 +02:00
  • dbf18e4de9 llava : fix typo in error message [no ci] (#9884) Daniel Bevenius 2024-10-16 19:24:05 +02:00
  • f3141d563c Merge branch 'master' into sycl_async_data_load OuadiElfarouki 2024-10-16 18:20:51 +01:00
  • 66c2c93082 grammar : fix JSON Schema for string regex with top-level alt. (#9903) b3928 Joe Eli McIlvain 2024-10-16 09:03:24 -07:00
  • eaf938889c Merge branch 'ggerganov:master' into master dennyxbox890 2024-10-16 20:28:38 +08:00
  • 68824e2af5 change root 2024-10-16 15:25:02 +03:00
  • d87aa806b5 llama : bump max layers from 512 to 1024 Nico Bosshard 2024-10-16 14:20:20 +02:00
  • a524195f9d Delete common/stb_image.h VikingServer 2024-10-16 14:20:08 +03:00
  • 890a5ccdac llama : rename batch_all to batch Daniel Bevenius 2024-10-16 13:15:20 +02:00
  • 10433e8b45 llama : add tensor name for "result_norm" (#9907) b3927 Molly Sophia 2024-10-16 18:10:21 +08:00
  • dc00642502 Add files via upload VikingServer 2024-10-16 12:38:52 +03:00
  • c1fb24fe1f Take good advice and streamline code caitianchi 2024-10-16 17:15:22 +08:00
  • 9236d8e7e5 RWKV v6: Add tensor name for "result_norm" Molly Sophia 2024-10-16 16:45:34 +08:00
  • 1f66b699c4 server : fix the disappearance of the end of the text (#9867) b3926 Alexey Parfenov 2024-10-16 08:35:53 +00:00
  • 0e41b300ed sync : ggml b3925 Georgi Gerganov 2024-10-16 11:28:14 +03:00
  • cd60b88bf7 ggml-alloc : remove buffer_id from leaf_alloc (ggml/987) Daniel Bevenius 2024-10-09 16:40:35 +02:00
  • 92eca17ab7 Merge branch 'ggerganov:master' into master dennyxbox890 2024-10-16 14:56:18 +08:00
  • c5ae329409 Fix JSON Schema to Grammar for string regexp with top-level alternation. Joe Eli McIlvain 2024-10-15 18:55:47 -07:00
  • becfd387f6 [CANN] Fix cann compilation error (#9891) b3923 leo-pony 2024-10-16 08:51:46 +08:00
  • ecddb2452a fix: hbw_posix_memalign alignment Gilad S 2024-10-16 02:03:03 +03:00
  • 6e13d3faef Merge remote-tracking branch 'upstream/master' into fix-stop-trim ZXED 2024-10-15 20:33:35 +03:00
  • 755a9b2bf0 llama : add infill sampler (#9896) b3922 Georgi Gerganov 2024-10-15 16:35:33 +03:00
  • a0c403e4f6 Merge branch 'ggerganov:master' into master dennyxbox890 2024-10-15 21:34:54 +08:00
  • 223c25a72f server : improve infill context reuse (#9894) b3921 Georgi Gerganov 2024-10-15 16:28:55 +03:00
  • 6a308cbc06 vulkan : improve ggml_vk_create_buffer error handling Shupei Fan 2024-10-15 19:31:26 +08:00
  • d8d0eeaa86 llama : add infill sampler Georgi Gerganov 2024-10-15 12:57:32 +03:00
  • fbc98b748e sampling : add XTC sampler (#9742) b3920 MaggotHATE 2024-10-15 15:54:55 +05:00