Commit graph

  • 7670809ad4 metal : mul mat struct (wip) Georgi Gerganov 2024-11-09 17:54:40 +02:00
  • 4fd6fc5ab8 metal : cont + avoid potential int overflow [no ci] Georgi Gerganov 2024-11-09 16:39:36 +02:00
  • 9e07bcc06e metal : fattn args Georgi Gerganov 2024-11-09 16:09:31 +02:00
  • 6a2c025684 sycl : fix norm asserts in debug build Alberto Cabrera 2024-11-12 09:31:07 +00:00
  • 1198ae7749 metal : add kernel arg structs (wip) Georgi Gerganov 2024-11-09 15:28:55 +02:00
  • 689203d4c9 feat: update README.md tianzixuan 2024-11-12 17:17:39 +08:00
  • bb33473f08 Merge pull request #23 from NexaAI/david/vulkan2 Zack Li 2024-11-11 23:47:03 -08:00
  • 98297afbd5 Merge pull request #22 from NexaAI/david/vulkan2 Zack Li 2024-11-11 23:46:51 -08:00
  • 4e80184c32 fix vulkan build bug for external build Yicheng Qian 2024-11-11 23:35:11 -08:00
  • 89bcf5a6d9 Merge pull request #21 from NexaAI/master Zack Li 2024-11-11 23:17:58 -08:00
  • 82dbdbdb40 Merge pull request #20 from NexaAI/weili/master-release Zack Li 2024-11-11 22:36:57 -08:00
  • 55953d35a4 [omni-vlm] fixed the segmentation fault issue in nano-vlm-instruct(WIP, current solution is still not perfect) 李为 2024-11-12 14:17:42 +08:00
  • ad4a165f9b vulkan: Use macros to make the mat mul pipeline creation more concise Jeff Bolz 2024-11-11 21:46:09 -06:00
  • ab26fb9005 build fixes slaren 2024-11-12 02:32:22 +01:00
  • bf79cb3972 add rpc backend slaren 2024-11-11 23:55:27 +01:00
  • 8768c7c45a fix tests and examples slaren 2024-11-11 23:44:27 +01:00
  • 4428593487 Merge remote-tracking branch 'origin/master' into sl/dl-backend slaren 2024-11-11 23:20:55 +01:00
  • f6ea8b73f8 sycl : marks permuted MUL_MAT as unsupported Alberto Cabrera 2024-11-11 21:33:41 +00:00
  • 17b8a2e669 sycl : Fixes RWKV6 broken build in the cuda backend Alberto Cabrera 2024-11-11 21:31:41 +00:00
  • c2c819be86 always show copy btn for code snippet Xuan Son Nguyen 2024-11-11 16:32:18 -04:00
  • 5f2d958492 Merge pull request #19 from NexaAI/master Zack Li 2024-11-11 12:25:51 -08:00
  • 362bdf3292 Merge pull request #18 from NexaAI/weili/master-release Zack Li 2024-11-11 12:24:12 -08:00
  • a30f0b2c23 ggml : build backends as libraries slaren 2024-11-11 21:18:27 +01:00
  • 0945da8334 Samplers sequence: simplified and input field. MaggotHATE 2024-11-12 00:35:49 +05:00
  • b04b638852 vulkan: Optimize contiguous copies Jeff Bolz 2024-11-11 12:47:13 -06:00
  • 103f10ca29 tests: Fix memory bandwidth calculation for perf tests Jeff Bolz 2024-11-11 12:19:31 -06:00
  • 54ef9cfc72 vulkan: Throttle the number of shader compiles during the build step. (#10222) b4067 Jeff Bolz 2024-11-11 11:13:51 -06:00
  • 7cf07df5e2 reset model in every inference step to avoid nonsense output. 李为 2024-11-11 19:41:26 +08:00
  • 2268ce0c4f fix build error Charles Xu 2024-11-11 08:19:21 +01:00
  • b0cefea58a metal : more precise Q*K in FA vec kernel (#10247) b4066 Georgi Gerganov 2024-11-11 08:39:13 +02:00
  • b141e5f6ef server : enable KV cache defrag by default (#10233) b4065 Georgi Gerganov 2024-11-11 08:38:43 +02:00
  • 40465e2e86 Fix link in main.swift Jason Flax 2024-11-10 18:53:32 -05:00
  • b97f21851c Add corrections Jason Flax 2024-11-10 18:51:20 -05:00
  • 7752c97f3e Merge branch 'master' of https://github.com/ggerganov/llama.cpp Jason Flax 2024-11-10 18:37:36 -05:00
  • 9c76cdba16 Merge branch 'master' of https://github.com/ggerganov/llama.cpp Jason Flax 2024-11-10 18:21:17 -05:00
  • 0638c44821 llama: updated comments Michael Podvitskiy 2024-11-10 23:08:42 +01:00
  • ba2e8fc060 Update test-tokenizer-0.py Robert 2024-11-10 12:56:08 -08:00
  • 355206539f use settings-modal-short-input component Xuan Son Nguyen 2024-11-10 15:49:31 -04:00
  • 4b3a9212b6 flake.lock: Update (#10243) Georgi Gerganov 2024-11-10 21:45:25 +02:00
  • 3ffbbc3bf8 Merge branch 'master' into xsn/ui_copy_btn Xuan Son Nguyen 2024-11-10 15:44:52 -04:00
  • 505f33274d server : (web UI) Add back sampler settings (#10239) MaggotHATE 2024-11-11 00:42:25 +05:00
  • dcf039ac9c use component for settings input, move help msg to tooltips Xuan Son Nguyen 2024-11-10 14:33:43 -04:00
  • 76d8975873 rebased onto commit a0a4646 Charles Xu 2024-11-10 19:15:42 +01:00
  • 0f6f1c789c metal : more precise Q*K in FA vec kernel Georgi Gerganov 2024-11-10 16:22:29 +02:00
  • c7a54d1f2b [ggml-aarch64] impl the same logic as the ASM version in q4_0_4_4 gemm/gemv Shupei Fan 2024-11-10 21:33:40 +08:00
  • 160687b3ed vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (#10226) b4062 Jeff Bolz 2024-11-10 05:37:56 -06:00
  • 6bb6546dc5 Fixed stretching of input fields. MaggotHATE 2024-11-10 15:08:34 +05:00
  • 20ad68f968 Added tooltips with basic information MaggotHATE 2024-11-10 13:49:32 +05:00
  • 6f0e8c3ee6 Create CODEOWNERS Zack Li 2024-11-09 18:56:57 -08:00
  • bc80c85018 flake.lock: Update github-actions[bot] 2024-11-10 00:22:20 +00:00
  • 084afaa1e2 fix problem with api key Xuan Son Nguyen 2024-11-09 18:40:07 -04:00
  • 22160f000e server : (web ui) add copy btn for code blocks Xuan Son Nguyen 2024-11-09 18:19:24 -04:00
  • 84bcad6988 CUDA: no -sm row for very small matrices Johannes Gäßler 2024-11-08 21:26:33 +01:00
  • f9b1969097 Update README.md ochafik 2024-11-09 19:00:53 +00:00
  • 5789f69d2d minja: don't explode upon referencing a field on an array (fixes Hermes tool use template) ochafik 2024-11-09 18:57:09 +00:00
  • c059aecd37 agent: memorize, search_memory (sqlite-vec + sqlite-lembed), fetch + docling (pdf -> markdown), sparql for dbpedia and wikidata ochafik 2024-11-09 18:25:34 +00:00
  • 21bc833273 Merge pull request #17 from NexaAI/weili/master-release Zack Li 2024-11-09 10:03:46 -08:00
  • ee599f901a llama: correct reverting of the entire batch. also updates llama_kv_cache_find_slot, will correctly count the number of used cells for recurrent models Michael Podvitskiy 2024-10-22 19:57:15 +02:00
  • 0026c810d7 llama: restore a kv_cache in case of failed computation Michael Podvitskiy 2024-10-21 10:47:27 +02:00
  • acb9528362 llama: llama_kv_cache_state was removed, only the result of llama_graph_compute is returned Michael Podvitskiy 2024-10-21 09:05:33 +02:00
  • 4701893233 llama: reverting kv_cache in case of failed compute Michael Podvitskiy 2024-09-24 21:12:47 +02:00
  • 5e354e3ca2 llama: propagating the results of graph_compute to the user interface Michael Podvitskiy 2024-09-17 21:43:01 +02:00
  • 918f3f9ab7 Add back samplers to server MaggotHATE 2024-11-09 20:48:26 +05:00
  • d04e354f2f fix OCR template error. 李为 2024-11-09 20:35:55 +08:00
  • 014eb6f228 server : enable KV cache defrag by default Georgi Gerganov 2024-11-09 12:20:37 +02:00
  • 9cae93cfd8 Added shift_p_min parameter to control probabilities MaggotHATE 2024-11-09 14:54:49 +05:00
  • 6423c65aa8 metal : reorder write loop in mul mat kernel + style (#10231) b4061 Georgi Gerganov 2024-11-09 11:53:13 +02:00
  • 39a334a9aa metal : fix build and some more comments (#10229) b4060 Georgi Gerganov 2024-11-09 11:53:02 +02:00
  • bb38cdd8ba metal : fix F32 accumulation in FA vec kernel (#10232) b4059 Georgi Gerganov 2024-11-09 11:52:45 +02:00
  • f018acba22 llama : fix Qwen model type strings b4058 Georgi Gerganov 2024-11-09 11:26:34 +02:00
  • 46323fa9ef metal : hide debug messages from normal log b4057 Georgi Gerganov 2024-11-09 11:21:49 +02:00
  • ced3be94ac metal : fix F32 accumulation in FA vec kernel Georgi Gerganov 2024-11-09 11:15:37 +02:00
  • 3d1fe1bb4d metal : int -> short, style gg/metal-mul-mat-write-opt Georgi Gerganov 2024-11-08 15:38:25 +02:00
  • 535050572a metal : reorder write loop Georgi Gerganov 2024-11-08 15:15:25 +02:00
  • bd1198a67a metal : fix build and some more comments gg/metal-fix-build Georgi Gerganov 2024-11-09 10:09:50 +02:00
  • 5b359bb1e3 ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL operator when ‘ne’ is small (#10213) b4056 SXX 2024-11-09 15:35:46 +08:00
  • e89213492d ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156) b4055 amritahs-ibm 2024-11-09 12:47:50 +05:30
  • 8fc393f246 scripts : fix pattern and get n_tokens in one go (#10221) haopeng 2024-11-09 15:06:54 +08:00
  • a1a7ab0958 llama : use ggml_backend_dev_get_extra_bufts Daniel Bevenius 2024-11-09 05:27:10 +01:00
  • 29fd73f9f1 vulkan: Fix newly added tests for permuted mul_mat and 1D im2col Jeff Bolz 2024-11-08 20:52:26 -06:00
  • 667a6d9838 Merge pull request #16 from NexaAI/perry/android-dev Perry Cheng 2024-11-08 15:23:54 -08:00
  • ecfe0b487f changed download models and nlen zhycheng614 2024-11-08 23:22:26 +00:00
  • d5df53658f Merge pull request #14 from NexaAI/teliu/android/dev Zack Li 2024-11-08 13:25:00 -08:00
  • 8c417282d5 Merge pull request #15 from NexaAI/weili/master-release Zack Li 2024-11-08 13:23:46 -08:00
  • 2444764279 vulkan: Throttle the number of shader compiles during the build step. Jeff Bolz 2024-11-08 13:41:22 -06:00
  • ec450d3bbf metal : opt-in compile flag for BF16 (#10218) b4053 Georgi Gerganov 2024-11-08 21:59:46 +02:00
  • 695ad752b2 metal : improve clarity (minor) (#10171) b4052 Georgi Gerganov 2024-11-08 18:37:41 +02:00
  • f640fdd98b sycl: Add option to set the SYCL architecture for all targets romain.biessy 2024-11-07 10:23:43 +00:00
  • 871036d236 add check for tensor dimensions Charles Xu 2024-11-08 17:01:51 +01:00
  • 5947d72c84 retain the tensor type as Q4_0 Charles Xu 2024-11-07 11:06:08 +01:00
  • b632bf0fc5 refactor add new buffer type for online flow Charles Xu 2024-11-06 15:36:14 +01:00
  • 647eb3167c backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels Charles Xu 2024-10-17 09:17:35 +02:00
  • 7ae02add95 metal : fix BF16 check in MSL Georgi Gerganov 2024-11-08 16:51:55 +02:00
  • eb6d54679e update README.md 李为 2024-11-08 22:05:57 +08:00
  • 50a65ef1c5 scripts: fix pattern and get n_tokens in one go lhpqaq 2024-11-08 21:58:37 +08:00
  • 41f0cf65fd metal : has_float -> use_float Georgi Gerganov 2024-11-08 15:56:10 +02:00
  • 36b0ea612e swift : switch back to v12 Georgi Gerganov 2024-11-08 15:01:57 +02:00
  • 93c91526c6 ci : use BF16 Georgi Gerganov 2024-11-08 11:50:30 +02:00
  • b74aabfae4 metal : opt-in compile flag for BF16 Georgi Gerganov 2024-11-08 11:46:59 +02:00
  • 3d9c63a3ff remove omni-vlm-v2/ 李为 2024-11-08 21:00:42 +08:00