Commit graph

  • 96e80dabc6 examples : improve base-translate.sh script (#4783) Georgi Gerganov 2024-01-06 11:40:24 +02:00
  • 918c3334a8 Merge upstream changes, fix staging buffer usage 0cc4m 2024-01-06 09:22:56 +01:00
  • ece0b0d855 improve graph splitting, partial fix for --no-kv-offload slaren 2024-01-06 00:58:22 +01:00
  • b614a86dd9 disable print statements for dynatemp Concedo 2024-01-06 11:14:58 +08:00
  • 123bff9a0f Full DynaTemp implementation + UI (#600) kalomaze 2024-01-05 21:13:16 -06:00
  • d107459321 ggml-backend : increase GGML_MAX_BACKENDS slaren 2024-01-05 10:57:30 +01:00
  • 863ef45539 llama : check for null tensor_split slaren 2024-01-05 10:57:05 +01:00
  • 1fa7ee2e51 batched-bench : add tensor_split param Georgi Gerganov 2024-01-05 10:31:15 +02:00
  • a1ab35c682 fix unmap after loading slaren 2024-01-05 03:14:06 +01:00
  • 6483328fa9 ggml-backend : add names to buffers slaren 2024-01-05 02:49:38 +01:00
  • 33f0761e9b llama : ggml-backend integration slaren 2023-12-28 19:15:46 +01:00
  • 28c3c8337e ggml: use __builtin_amdgcn_sudot4 in __dp4a for gfx11 Konstantin Zhuravlyov 2024-01-05 12:56:25 -05:00
  • eec22a1c63 cmake : check for openblas64 (#4134) b1775 a-n-n-a-l-e-e 2024-01-05 08:04:40 -08:00
  • be36bb946a flake.nix : fix typo (#4700) Ikko Eltociear Ashimine 2024-01-06 01:02:44 +09:00
  • 91d38876df metal : switch back to default.metallib (ggml/681) b1773 Georgi Gerganov 2024-01-05 16:30:52 +02:00
  • d061bf9405 ggml : fix q2_k bpw in comments (ggml/680) Georgi Gerganov 2024-01-05 15:36:04 +02:00
  • 1bf681f90e ggml : add error handling to graph_compute (whisper/1714) Finn Voorhees 2024-01-03 08:39:43 -05:00
  • 66e40093a2 metal : switch back to default.metallib (ggml/681) Georgi Gerganov 2024-01-05 16:30:52 +02:00
  • aaa3b9d932 ggml : fix q2_k bpw in comments (ggml/680) Georgi Gerganov 2024-01-05 15:36:04 +02:00
  • 45516ebb93 ggml : add error handling to graph_compute (whisper/1714) Finn Voorhees 2024-01-03 08:39:43 -05:00
  • 427ba21e62 add stub values for usage, revert cuda malloc pool implementation (+1 squashed commits) Concedo 2024-01-05 19:02:45 +08:00
  • c1d7cb28d3 ggml : do not sched_yield when calling BLAS (#4761) b1770 Georgi Gerganov 2024-01-05 15:18:21 +02:00
  • 3681f22443 examples : add few-shot translation example (#4783) Georgi Gerganov 2024-01-05 15:11:10 +02:00
  • 61e3ca8cd1 examples : add few-shot translation example Georgi Gerganov 2024-01-05 14:30:28 +02:00
  • c9fdd42da2 Merge branch 'master' into concedo_experimental Concedo 2024-01-05 18:32:54 +08:00
  • 20261049c9 try to reuse cloudflared file Concedo 2024-01-05 18:04:09 +08:00
  • b3a7c20b5c finetune : remove unused includes (#4756) b1768 Daniel Bevenius 2024-01-04 20:45:37 +01:00
  • 18b612b5c6 updated server readme to reflect the gg/server-token-probs-4088 commit Behnam M 2024-01-04 13:41:48 -05:00
  • 012cf349ae server : send token probs for "stream == false" (#4714) b1767 Georgi Gerganov 2024-01-04 19:56:33 +02:00
  • 4a0e7222c9 ggml : simplify do_yield logic Georgi Gerganov 2024-01-04 12:56:26 +02:00
  • f77882461f ggml : fix do_yield logic Georgi Gerganov 2024-01-04 11:43:01 +02:00
  • a91928014f Print backend name on test-backend-ops failure (#4751) b1766 Johannes Gäßler 2024-01-04 09:43:23 +01:00
  • 3c0b585561 llama.swiftui : support loading custom model from file picker (#4767) b1765 singularity 2024-01-04 16:22:38 +08:00
  • e5804313a1 server : fix options in README.md (#4765) Michael Coppola 2024-01-04 03:17:09 -05:00
  • 459769a08d Merge branch 'master' into swiftui-load-custom-model singularity 2024-01-04 16:16:54 +08:00
  • dc891b7f7a ggml : include stdlib.h before intrin.h (#4736) b1763 Georgi Gerganov 2024-01-04 10:12:26 +02:00
  • a78caeb7ac ggml : add comment Georgi Gerganov 2024-01-04 10:12:05 +02:00
  • 23b535f396 minor : fix whitespace Georgi Gerganov 2024-01-04 10:07:26 +02:00
  • 46cea79e1f llama.swiftui : fix build of ggml.metallib (#4754) singularity 2024-01-04 15:58:16 +08:00
  • 733634edc5 llama.swift : remove debug flags from metallib build Georgi Gerganov 2024-01-04 09:53:12 +02:00
  • 5cd52ddcd2 swiftui: remove trailing whitespace singularity 2024-01-04 15:51:42 +08:00
  • eb001f2fc7 swiftui: support load model from file picker singularity 2024-01-04 11:35:08 +08:00
  • 75dee075fc metal: build ggml.metallib instead of copy src singularity 2024-01-04 11:24:48 +08:00
  • 43a9177ef9 fix examples/server/README.md Michael Coppola 2024-01-03 18:01:38 -05:00
  • e3698b5f87 Work around MinGW bug Mosè Giordano 2024-01-03 21:33:59 +00:00
  • 14783532e3 Print backend name on test-backend-ops failure JohannesGaessler 2024-01-03 10:52:17 +01:00
  • cb1e2818e0 train : fix typo in overlapping-samples help msg (#4758) b1761 Daniel Bevenius 2024-01-03 18:53:40 +01:00
  • ece9a45e8f swift : update Package.swift to use ggml as dependency (#4691) b1760 Ashraful Islam 2024-01-03 11:30:02 -06:00
  • 23d9e5b6de ggml : do not sched_yield when calling BLAS Georgi Gerganov 2024-01-03 19:12:13 +02:00
  • 44f30434aa fixup! CUDA: faster softmax via shared memory + fp16 math JohannesGaessler 2024-01-03 17:29:27 +01:00
  • e1936bb52f fixup! fixup! CUDA: faster softmax via shared memory + fp16 math JohannesGaessler 2024-01-03 16:56:27 +01:00
  • ae26053d1f fixup! CUDA: faster softmax via shared memory + fp16 math JohannesGaessler 2024-01-03 16:46:18 +01:00
  • d37c94bcd9 Merge branch 'master' into concedo_experimental Concedo 2024-01-03 22:46:49 +08:00
  • 64c46fc6f5 CUDA: faster softmax via shared memory + fp16 math JohannesGaessler 2024-01-02 16:50:43 +01:00
  • 234f79fe9d Merge branch 'master' into concedo_experimental Concedo 2024-01-03 22:33:38 +08:00
  • 667a71725f train: fix typo in overlapping-samples help msg Daniel Bevenius 2024-01-03 14:39:31 +01:00
  • 91378bc596 finetune: remove unused includes Daniel Bevenius 2024-01-03 13:39:14 +01:00
  • 7bed7eba35 cuda : simplify expression b1759 Georgi Gerganov 2024-01-03 14:18:46 +02:00
  • d55356d3ba cuda : mark I16 and I32 ops as unsupported Georgi Gerganov 2024-01-03 13:01:44 +02:00
  • 75e3fd8581 sync : ggml Georgi Gerganov 2024-01-03 11:37:44 +02:00
  • 289313716f metal : add kernel_get_rows_i32 Georgi Gerganov 2024-01-03 11:35:46 +02:00
  • ab62fc3e55 scripts : fix sync order + metal sed Georgi Gerganov 2024-01-03 11:25:54 +02:00
  • 5f66ebca9c ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639) Guillaume Wenzek 2023-12-29 18:07:03 +01:00
  • dd22e4f908 cuda : simplify expression Georgi Gerganov 2024-01-03 14:18:46 +02:00
  • b13f3b223e cuda : mark I16 and I32 ops as unsupported Georgi Gerganov 2024-01-03 13:01:44 +02:00
  • 78c3d322d7 metal: fix metal backend init failure in swiftui singularity 2024-01-03 18:42:11 +08:00
  • 4fb1bd85a0 Fix CUDA diag_mask_inf tests with LLAMA_FAST JohannesGaessler 2024-01-03 11:20:28 +01:00
  • 99eb4c3fd3 sync : ggml Georgi Gerganov 2024-01-03 11:37:44 +02:00
  • 03655fbb05 metal : add kernel_get_rows_i32 Georgi Gerganov 2024-01-03 11:35:46 +02:00
  • fd7bf2aac1 scripts : fix sync order + metal sed Georgi Gerganov 2024-01-03 11:25:54 +02:00
  • 978305e691 ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639) Guillaume Wenzek 2023-12-29 18:07:03 +01:00
  • f2eb19bd8b server : throw an error when slot unavailable (#4741) Justin Parker 2024-01-03 03:43:19 -05:00
  • e49d398f73 use same struct size for cuda and non cuda (+1 squashed commits) Concedo 2024-01-03 16:03:36 +08:00
  • e4916e91d0 Throw an error to be caught by upstream callers when slot unavailable Justin Parker 2024-01-02 15:42:00 -05:00
  • f3f62f0d83 metal : optimize ggml_mul_mat_id (faster Mixtral PP) (#4725) b1752 Georgi Gerganov 2024-01-02 21:07:47 +02:00
  • 9f51f3e695 metal : opt mul_mm_id gg/metal-opt-mul-mat-id Georgi Gerganov 2024-01-02 20:50:18 +02:00
  • 9c11f61ff1 Simpler array sizes for msvc Henrik Forstén 2024-01-02 20:18:48 +02:00
  • 6683eb4960 Fix buffer size Henrik Forstén 2024-01-02 20:07:34 +02:00
  • 4cc78d3873 ggml : force F32 precision for ggml_mul_mat cuda-cublas-opts Georgi Gerganov 2023-12-19 16:23:39 +02:00
  • 0ef3ca2ac6 server : add token counts to html footer (#4738) b1751 Phil H 2024-01-02 15:48:49 +00:00
  • 7e03872f1e Fix QKK_64 Henrik Forstén 2024-01-02 15:44:45 +02:00
  • 5cbf4ba1cc server: generate hpp phiharri 2024-01-02 15:26:08 +00:00
  • 3b9658e369 server: add token counts to stats phiharri 2024-01-02 15:22:30 +00:00
  • 21e100d6dc Merge branch 'master' into gg/metal-opt-mul-mat-id Georgi Gerganov 2024-01-02 16:27:21 +02:00
  • 540938f890 llama : llama_model_desc print number of experts b1750 Georgi Gerganov 2024-01-02 16:26:45 +02:00
  • daf9b12472 metal : minor fix Georgi Gerganov 2024-01-02 16:25:41 +02:00
  • 74460d0065 Merge branch 'master' into gg/metal-opt-mul-mat-id Georgi Gerganov 2024-01-02 16:24:05 +02:00
  • c73e598d1c Merge branch 'master' into gg/metal-opt-mul-mat-id Georgi Gerganov 2024-01-02 16:22:47 +02:00
  • 0040d42eeb llama : replace all API facing int's with int32_t (#4577) b1749 Marcus Dunn 2024-01-02 06:15:16 -08:00
  • b5af7ad84f llama : refactor quantization to avoid <mutex> header gg/avoid-mutex Georgi Gerganov 2024-01-02 15:53:28 +02:00
  • 4b06507172 Cleanup Henrik Forstén 2024-01-01 13:59:42 +02:00
  • c92418dca9 ggml : include stdlib.h before intrin.h Georgi Gerganov 2024-01-02 14:38:24 +02:00
  • 83e633c27e llama : differentiate the KV dims in the attention (#4657) b1748 postmasters 2024-01-02 03:51:28 -08:00
  • 120a1a5515 llama : auto download HF models if URL provided gg/hf-auto-dl Georgi Gerganov 2024-01-02 13:19:56 +02:00
  • 32866c5edd editorconfig : fix whitespace and indentation #4710 b1747 Georgi Gerganov 2024-01-02 13:28:15 +02:00
  • 5d7002d437 server : add --override-kv parameter (#4710) b1746 minarchist 2024-01-02 04:38:15 -06:00
  • 26f3071d71 py : re-enable mmap in convert hf (#4732) Nam D. Tran 2024-01-02 16:23:38 +07:00
  • 775ac8712a finetune: fix typo in README.md (#4733) Daniel Bevenius 2024-01-02 10:16:55 +01:00
  • 58ba655af0 metal : enable shader debugging (cmake option) (#4705) b1743 Georgi Gerganov 2024-01-02 10:57:44 +02:00
  • dd59578a69 metal : fix mat-vec Q4_K kernel for QK_K == 64 Georgi Gerganov 2023-12-31 13:52:34 +02:00