Commit graph

  • 49e7304cdf use id for mmproj tensors Xuan Son Nguyen 2024-10-03 10:51:20 +02:00
  • 5639971466 Fixed dequant precision issues in Q4_1 and Q5_1 (#9711) b3869 Ouadie EL FAROUKI 2024-10-03 07:50:44 +01:00
  • c6f4c22dcd Update README.md Daniel Kleine 2024-10-03 05:56:46 +02:00
  • 62b09b343c metal : fix wrong number of tokens per sequence in SSM_SCAN Francis Couture-Harpin 2024-10-02 21:35:50 -04:00
  • c83ad6d01e ggml-backend : add device and backend reg interfaces (#9707) b3868 Diego Devesa 2024-10-03 01:49:47 +02:00
  • 8c179f41fa get a new ckpt Yutong Dai 2024-10-02 23:32:35 +00:00
  • a9d172cf65 Merge remote-tracking branch 'origin/master' into sl/backend-registry-2 slaren 2024-10-03 01:31:07 +02:00
  • ffeca353a9 fix more naming inconsistencies, make interface structs const slaren 2024-10-03 00:44:08 +02:00
  • d7072f11d5 Fix compute pass descriptor autorelease crash Jack Mousseau 2024-10-02 13:22:51 -07:00
  • cfef355611 fix some inconsistencies in the names of functions slaren 2024-10-02 21:26:40 +02:00
  • d0c4954fa0 move device backend_reg to the struct slaren 2024-10-02 21:08:07 +02:00
  • fa8df0c350 agent: drop fastify.py -> simpler serve_tools.py, and expose other tools to python interpreter Olivier Chafik 2024-10-02 19:51:23 +01:00
  • dc475c3b20 fix consistency issues with the usage of main_gpu slaren 2024-10-02 20:40:26 +02:00
  • 6b4a454735 agent: hard-code max_results=10 in brave_search Olivier Chafik 2024-10-02 19:13:28 +01:00
  • 26e76f9704 agent: allow interactive chat by default, and don't reuse sessions Olivier Chafik 2024-10-02 19:12:57 +01:00
  • 6f2191d99e agent: remove *lots* of cruft from tool definitions derived from FastAPI catalog (and remove wait* tools which can be implemented in Python anyway) Olivier Chafik 2024-10-02 17:54:20 +01:00
  • e2a9ab68a3 agent: --openai flag (auto-fetches OPENAI_API_KEY), improved logging Olivier Chafik 2024-10-02 17:15:55 +01:00
  • 5b8ec2b978 metal : fix SSM_SCAN state head offset Francis Couture-Harpin 2024-10-02 12:11:45 -04:00
  • b5516aab64 fix align [no ci] slaren 2024-10-02 17:49:38 +02:00
  • 8b15bc6fa0 metal : add back n_seqs to SSM_SCAN args Francis Couture-Harpin 2024-10-02 11:47:56 -04:00
  • 7a351abc28 metal : remove unused arguments for SSM_SCAN Francis Couture-Harpin 2024-10-02 11:28:16 -04:00
  • 2428b73853 agent: ditch openai dependency, use cache_prompt and expose seed Olivier Chafik 2024-10-02 16:26:45 +01:00
  • 03d0e6eabe metal : use log and exp instead of log1pf and expf in SSM_SCAN Francis Couture-Harpin 2024-10-02 10:58:41 -04:00
  • 87b97d08f4 metal : fix SSM_SCAN pipeline scope Francis Couture-Harpin 2024-10-02 10:41:10 -04:00
  • 2c77d799f9 metal : attempt to adapt SSM_SCAN for Mamba-2 Francis Couture-Harpin 2024-10-02 10:36:22 -04:00
  • b559d64ecc Update README.md Olivier Chafik 2024-10-02 15:19:27 +01:00
  • 9e502e89a5 tool-call: promote getting chat templates w/ dedicated script rather than rely on test resources Olivier Chafik 2024-10-02 15:03:08 +01:00
  • f3538e755b update tools Olivier Chafik 2024-10-02 14:57:25 +01:00
  • a39ab216aa llama : reduce compile time and binary size (#9712) b3867 Xuan Son Nguyen 2024-10-02 15:49:55 +02:00
  • 5b01402655 agent: add brave_search & fetch_page tools + move to examples/agent/tools/ Olivier Chafik 2024-10-02 14:29:45 +01:00
  • 6ff0d67b36 remove move unused reg_init functions from backends slaren 2024-10-02 15:10:19 +02:00
  • 2a60833a01 Merge remote-tracking branch 'origin/master' into sl/backend-registry-2 slaren 2024-10-02 15:05:45 +02:00
  • f536f4c439 [SYCL] Initial cmake support of SYCL for AMD GPUs (#9658) b3866 Alberto Cabrera Pérez 2024-10-02 13:57:18 +01:00
  • f9cab02ee9 removed unused function, add missing statics slaren 2024-10-02 14:55:40 +02:00
  • 090cec28e4 rpc : enable vulkan Radoslav Gerganov 2024-09-10 17:10:27 +03:00
  • 00b7317e63 vulkan : do not use tensor->extra (#9407) b3865 Radoslav Gerganov 2024-10-02 13:49:16 +03:00
  • c2ec885264 add more kv metadata Xuan Son Nguyen 2024-10-02 12:37:50 +02:00
  • 76b37d1541 gguf-split : improve --split and --merge logic (#9619) b3864 Zhenwei Jin 2024-10-02 15:21:57 +08:00
  • 148844fe97 examples : remove benchmark (#9704) b3863 Georgi Gerganov 2024-10-02 10:14:44 +03:00
  • c702e55930 Add llama_token_in_embd function to embed input tokens Andrei Betlen 2024-10-01 23:57:13 -04:00
  • 7bab8ed5b7 using fma in Q4_1 and Q5_1 dequant to fix precision issues OuadiElfarouki 2024-10-02 03:25:32 +01:00
  • db53f8ef06 fix pipeline parallelism check slaren 2024-10-02 03:13:51 +02:00
  • 04ef648f3e update other backends slaren 2024-10-02 02:45:18 +02:00
  • 065a4406cc Fix whitespace Mason M 2024-10-01 21:29:11 -03:00
  • 49d45c50a2 Prevent null format string Mason M 2024-10-01 21:16:09 -03:00
  • 0e63704b9a Merge 424e3a52fe into 3f1ae2e32c Feng Jiang 2024-10-02 02:14:29 +02:00
  • 6ff0e7a32e add device props/caps, fully support async upload for all compatible backends slaren 2024-10-02 01:23:54 +02:00
  • b03cab541f fix build (2) Xuan Son Nguyen 2024-10-02 01:10:47 +02:00
  • c76b14501e tool-call: fix Makefile Olivier Chafik 2024-10-02 00:06:42 +01:00
  • c6087816d0 fix build Xuan Son Nguyen 2024-10-02 00:54:04 +02:00
  • 5f972a04f3 llama : speed up compile time Xuan Son Nguyen 2024-10-02 00:44:03 +02:00
  • c36a196f53 tool-call: prepare possible externalization of minja + factor tool call style out of template Olivier Chafik 2024-10-01 23:12:24 +01:00
  • 5422f0005a ci : fine-grant permission Xuan Son Nguyen 2024-10-01 23:21:41 +02:00
  • 5a34562100 Update examples/gguf-split/gguf-split.cpp Xuan Son Nguyen 2024-10-01 23:18:42 +02:00
  • 6089b0a50a simple example works Xuan Son Nguyen 2024-10-01 22:53:30 +02:00
  • 4897ff61c6 fix build Xuan Son Nguyen 2024-10-01 22:26:31 +02:00
  • e2535a113a Rename llama_state to llama_logger_state Mason M 2024-10-01 16:58:24 -03:00
  • a731280e61 Use GGML_LOG instead of GGML_PRINT Mason M 2024-10-01 16:42:18 -03:00
  • 4cd67cadd7 Merge d3df98d6ea into 3f1ae2e32c stduhpf 2024-10-01 21:39:05 +02:00
  • 31c3b89c88 Fix compile error Mason M 2024-10-01 16:11:59 -03:00
  • 675b7ead54 Use C memory allocation funcs Mason M 2024-10-01 15:57:51 -03:00
  • 5e614429eb Add enum tag to parameters Mason M 2024-10-01 15:42:37 -03:00
  • 3f1ae2e32c Update README.md (#9591) Paweł Wodnicki 2024-10-01 12:18:46 -05:00
  • 7d6cb36895 Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2024-10-01 13:09:40 -04:00
  • 273e7a495a llama : avoid redundant state copy for Mamba 1 and 2 Francis Couture-Harpin 2024-09-30 15:52:42 -04:00
  • 805fea97ac Update ggml/src/ggml-backend-impl.h slaren 2024-10-01 18:52:52 +02:00
  • c6b6540cb3 Cann backend now uses GGML logging Mason M 2024-10-01 13:51:20 -03:00
  • 15817d0f83 Cuda backend now uses GGML logging Mason M 2024-10-01 13:46:02 -03:00
  • 1f0c1a9292 Metal backend now uses GGML logging Mason M 2024-10-01 13:38:21 -03:00
  • 90222ac920 Add scaffolding for ggml logging macros Mason M 2024-10-01 13:15:29 -03:00
  • 0cbdf133d2 ggml-backend : add device and backend reg interfaces slaren 2024-10-01 17:24:28 +02:00
  • 84facfa472 examples : remove benchmark Georgi Gerganov 2024-10-01 17:56:22 +03:00
  • e7673556d3 Improved description of SYCL backend in docs Alberto Cabrera 2024-10-01 15:54:19 +01:00
  • 0617335927 Added documentation for SYCL using AMD GPUs Alberto Cabrera 2024-10-01 15:36:42 +01:00
  • f1b8c42711 sync : ggml b3861 Georgi Gerganov 2024-10-01 16:09:42 +03:00
  • e98c1c188e test: fix OPT_STEP_ADAMW for test-backend-ops (ggml/974) Johannes Gäßler 2024-09-30 09:55:23 +02:00
  • cb00020504 vulkan : mul_mat: fix UB with small warps (ggml/952) Salvatore Mesoraca 2024-09-30 09:14:09 +02:00
  • 6c5322481a ggml : fix ggml_cast (ggml/973) Borislav Stanimirov 2024-09-30 10:11:41 +03:00
  • 7254cdf7e8 ggml: fix gradient allocation logic (ggml/966) Johannes Gäßler 2024-09-29 23:18:02 +02:00
  • cad341d889 metal : reduce command encoding overhead (#9698) b3856 Georgi Gerganov 2024-10-01 16:00:25 +03:00
  • 9aecd38a8d Add embeddings scale to clip_ctx to rescale final image embeddings Andrei Betlen 2024-10-01 06:12:31 -04:00
  • 5648e30d3e llava cgraph ok Xuan Son Nguyen 2024-10-01 11:40:25 +02:00
  • 1b3e564f7b Added a comment to specify the tested AMD architectures Alberto Cabrera 2024-10-01 09:48:55 +01:00
  • a90484c6d9 llama : print correct model type for Llama 3.2 1B and 3B b3855 Georgi Gerganov 2024-10-01 11:42:01 +03:00
  • 5273e59b09 metal : add comments Georgi Gerganov 2024-10-01 10:58:52 +03:00
  • 43b9d694df metal : reduce command encoding overhead Georgi Gerganov 2024-10-01 10:23:44 +03:00
  • 5a49eddb89 Adapt GGML_VULKAN_CHECK_RESULTS to extra removal (#2) 0cc4m 2024-10-01 08:49:05 +02:00
  • 1927378bcc convert : refactor rope_freqs generation (#9396) compilade 2024-10-01 02:31:36 -04:00
  • 002e457788 implemented missing SYCL event APIs OuadiElfarouki 2024-09-30 21:49:12 +01:00
  • 323df41251 initial enablment of pinned mem model loading OuadiElfarouki 2024-09-26 14:51:46 +01:00
  • 6f1d9d71f4 Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS (#9641) b3853 serhii-nakon 2024-09-30 21:57:12 +03:00
  • a38fc04f4d Merge branch 'master' into compilade/convert-merges-pairs-to-old Francis Couture-Harpin 2024-09-30 14:25:40 -04:00
  • bfbac0e4e2 Merge branch 'master' into compilade/convert-separate-extra-tensors Francis Couture-Harpin 2024-09-30 14:18:40 -04:00
  • 511636df0c ci : reduce severity of unused Pyright ignore comments (#9697) compilade 2024-09-30 14:13:16 -04:00
  • a34fc0dd86 ci : reduce severity of unused Pyright ignore comments compilade/pyright-fix-ignores Francis Couture-Harpin 2024-09-30 13:29:08 -04:00
  • 6854ad4057 img pre processing Xuan Son Nguyen 2024-09-30 17:35:04 +02:00
  • 7a780222bf Update convert_llama_ggml_to_gguf.py Ferdaws 2024-09-30 10:22:08 -05:00
  • 8ba38584b2 convert : handle tokenizer merges format from transformers 4.45 Francis Couture-Harpin 2024-09-30 11:13:53 -04:00
  • 08a43d05b6 py : update transfomers version (#9694) vb 2024-09-30 17:03:47 +02:00
  • ace4f4be37 flake.lock: Update (#9680) Georgi Gerganov 2024-09-30 17:48:49 +03:00