Commit graph

  • d2c004fd27 metal : fix from ptr buffer name slaren 2024-11-06 01:10:53 +01:00
  • 30c6abb73f add GET method for CORS Xuan Son Nguyen 2024-11-05 23:51:32 +01:00
  • 29f6d82b19 make CORS preflight more explicit Xuan Son Nguyen 2024-11-05 23:50:16 +01:00
  • 58d0588fbd better error handling Xuan Son Nguyen 2024-11-05 23:35:47 +01:00
  • 9096e5ed5e docs: how to use legacy ui Xuan Son Nguyen 2024-11-05 22:21:47 +01:00
  • 38c6fa3b8f enable lib to be exported in nexa SDK Zack Zhiyuan Li 2024-11-05 20:56:33 +00:00
  • 712ee17aa0 small fixes Xuan Son Nguyen 2024-11-05 21:45:22 +01:00
  • b535cd941e Merge pull request #9 from NexaAI/master Zack Li 2024-11-05 10:59:26 -08:00
  • 6ea3315334 regenerate, edit, copy buttons Xuan Son Nguyen 2024-11-05 19:52:00 +01:00
  • 8c29230848 Merge branch 'ggerganov:master' into avx_opt Eve 2024-11-05 17:24:14 +00:00
  • 654ec7ce0d fix tests Xuan Son Nguyen 2024-11-05 17:10:22 +01:00
  • 255a3205c0 save theme preferences Xuan Son Nguyen 2024-11-05 17:09:57 +01:00
  • 521be4c31a fix bg-base classes Xuan Son Nguyen 2024-11-05 16:57:48 +01:00
  • 9719450232 add conversation history, save to localStorage Xuan Son Nguyen 2024-11-05 16:39:24 +01:00
  • 7f3daf09f3 basic markdown support Xuan Son Nguyen 2024-11-05 14:24:25 +01:00
  • b8deef0ec0 llama : add <|tool_call|> formatting to Granite template (#10177) b4034 Gabe Goodhart 2024-11-05 05:23:04 -07:00
  • 191887b771 embed deps into binary Xuan Son Nguyen 2024-11-05 13:19:16 +01:00
  • 32e0862a7e [ggml-aarch64] use intrinsics for iq4_nl_4x4 gemv&gemm Shupei Fan 2024-10-25 17:54:18 +08:00
  • 75e8cbb2ab [ggml] add iq4_nl_4x4 format Shupei Fan 2024-10-25 17:28:59 +08:00
  • 561b7f2364 [ggml-aarch64] use intrinsics in q4_0_4_4 gemm Shupei Fan 2024-10-25 15:18:59 +08:00
  • 102299e30c [ggml-aarch64] use intrinsics in q4_0_4_4 gemv Shupei Fan 2024-10-25 14:07:17 +08:00
  • 55a86969b8 [rebase] Fix build error. Qingtao Li 2024-11-05 18:40:31 +08:00
  • 9c13f952f8 metal : minor [no ci] Georgi Gerganov 2024-11-05 09:59:10 +02:00
  • 73f378df82 metal : float-correctness Georgi Gerganov 2024-11-05 09:24:06 +02:00
  • 810f06bd5b revert isotr0py 2024-11-05 15:15:19 +08:00
  • f84d25dd8f Limit enable_t_mac to take effect on INT_N only. Qingtao Li 2024-10-30 16:00:38 +08:00
  • 080d2ecc56 Add run_pipeline option of rechunk. Qingtao Li 2024-10-30 10:46:16 +08:00
  • e86c69df8b [Feat] Support TQ1_0 and TQ2_0 with T-MAC. Qingtao Li 2024-10-23 12:28:25 +08:00
  • b266290700 [llama.cpp] update convert_hf_to_gguf.py kalineid 2024-10-14 17:31:48 +08:00
  • 6bb4acae7c Remove unused code. Qingtao Li 2024-10-12 13:27:30 +08:00
  • f64c768055 Restore n_tensor check. Qingtao Li 2024-10-12 12:48:45 +08:00
  • dfac0c4b3e Remove uint8 branch in gguf_writer. Qingtao Li 2024-10-11 14:43:14 +08:00
  • f673699460 Remove is_lora in convert_hf_to_gguf, which is removed in master. Qingtao Li 2024-10-11 14:41:31 +08:00
  • 94502e44a7 Fix a Cmake variable fault. Qingtao Li 2024-10-11 14:40:58 +08:00
  • 351e345c6f Integrate T-MAC kernels Qingtao Li 2024-10-10 16:35:37 +08:00
  • 983b4625ef Merge pull request #8 from NexaAI/weili/master-release Zack Li 2024-11-04 22:39:36 -08:00
  • d805404e2d metal : fix shared memory calc + reduce smem + comments Georgi Gerganov 2024-11-05 08:13:48 +02:00
  • 07ef1a8a04 make mode compatiable isotr0py 2024-11-05 14:01:38 +08:00
  • d6c0627d31 Merge pull request #7 from NexaAI/master Zack Li 2024-11-04 22:00:30 -08:00
  • 91b3cafbb5 Merge pull request #6 from NexaAI/master-release-audio-lm Zack Li 2024-11-04 21:59:26 -08:00
  • 6a13722ca5 code format isotr0py 2024-11-05 12:42:19 +08:00
  • 31b6beabec Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-11-05 09:21:28 +05:00
  • 98e070c120 Merge branch 'ggerganov:master' into master Zhiyuan Li 2024-11-05 13:35:16 +11:00
  • 623db3b06f update lint Zhiyuan Li 2024-11-05 13:31:41 +11:00
  • a05f148e84 fix(granite chat): Add the <|tool_call|> formatting to the granite template Gabe Goodhart 2024-11-04 16:57:48 -07:00
  • 05853eb861 remove C++20 syntax Zack Zhiyuan Li 2024-11-04 23:03:49 +00:00
  • d42e0371f8 remove C++20 style Zack Zhiyuan Li 2024-11-04 22:50:33 +00:00
  • fdf0c07df2 move old files to legacy folder Xuan Son Nguyen 2024-11-04 23:38:19 +01:00
  • 29abe69ef9 Merge bd4d1221d9 into a9e8a9a030 Neo Zhang Jianyu 2024-11-04 22:23:44 +00:00
  • ec6366f7d0 Merge https://github.com/ggerganov/llama.cpp into avx_opt Eve 2024-11-04 17:17:22 -05:00
  • a9e8a9a030 ggml : fix arch check in bf16_to_fp32 (#10164) b4033 Diego Devesa 2024-11-04 23:17:01 +01:00
  • 120d05b7de server : simple chat UI with vuejs and daisyui Xuan Son Nguyen 2024-11-04 23:09:57 +01:00
  • 3407364776 Q6_K AVX improvements (#10118) b4032 Eve 2024-11-04 22:06:31 +00:00
  • f85336e263 have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86 Eve 2024-11-04 21:31:02 +00:00
  • b4e9c5998d convert : fix flake8 lint Francis Couture-Harpin 2024-11-04 15:26:15 -05:00
  • 8d8f065743 Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2024-11-04 14:30:18 -05:00
  • b0e9b96e5d rebase to master Eve 2024-11-04 14:25:06 -05:00
  • d5a409e57f ggml : fix gelu tables initialization (#10172) Diego Devesa 2024-11-04 20:06:58 +01:00
  • 4ec3e4a528 Merge branch 'ggerganov:master' into q6_k Eve 2024-11-04 19:05:05 +00:00
  • 3bc7103d2e ggml : avoid multiply by D in GGML_OP_SSM_SCAN Francis Couture-Harpin 2024-11-04 11:36:37 -05:00
  • 7418d9980e ggml : fix gelu tables initialization slaren 2024-11-04 18:56:28 +01:00
  • ad6fd8de25 revert unnecessary change isotr0py 2024-11-05 01:48:31 +08:00
  • a92c920eec revert unnecessary change isotr0py 2024-11-05 01:03:38 +08:00
  • 401558b7ba ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (#10167) Diego Devesa 2024-11-04 17:34:08 +01:00
  • af46dc2445 Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-11-04 21:26:36 +05:00
  • e264c35fc9 remove some codes Zhiyuan Li 2024-11-05 03:03:38 +11:00
  • acb1b9d22c Merge branch 'ggerganov:master' into master Zhiyuan Li 2024-11-05 02:58:54 +11:00
  • 4574795cd5 use recommended way GGML_TENSOR_LOCALS Zhiyuan Li 2024-11-05 02:57:08 +11:00
  • 1e129611b1 metal : clean-up (cont) Georgi Gerganov 2024-11-04 17:46:30 +02:00
  • 4693b4611f rewrite to be more inline with the common pattern for distributing threads Zhiyuan Li 2024-11-05 02:49:22 +11:00
  • a749ba7701 put the declaration outside the loop Zhiyuan Li 2024-11-05 02:45:40 +11:00
  • 6a1e977e34 Update ggml/src/ggml-sycl/concat.cpp Zhiyuan Li 2024-11-05 02:41:55 +11:00
  • 35a1a2dfa9 move element-wise functions outside Zhiyuan Li 2024-11-05 02:40:11 +11:00
  • 9e0ecfb697 server : clarify /slots endpoint, add is_processing (#10162) Xuan Son Nguyen 2024-11-04 16:33:29 +01:00
  • 6a066b9978 fix build break on arm64 linux (#10166) snadampal 2024-11-04 09:08:33 -06:00
  • 72e4432577 add appropriate asserts Zhiyuan Li 2024-11-05 01:20:52 +11:00
  • b81602477b Update ggml/src/ggml-cpu.c Zhiyuan Li 2024-11-05 01:14:27 +11:00
  • a878502f43 fix define error Zhiyuan Li 2024-11-05 01:07:33 +11:00
  • 0057e193cc ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment slaren 2024-11-04 15:05:26 +01:00
  • 81cb301224 update the function to use appropriate types Zhiyuan Li 2024-11-05 00:55:59 +11:00
  • bb0685fad5 Update ggml/src/ggml-sycl/wkv6.cpp Zhiyuan Li 2024-11-05 00:42:37 +11:00
  • 8c7b4ec22a Update ggml/src/ggml-sycl/outprod.cpp Zhiyuan Li 2024-11-05 00:42:31 +11:00
  • a5866b17c2 fix tests Xuan Son Nguyen 2024-11-04 14:36:04 +01:00
  • 04b464ef93 fix build break on arm64 linux Sunita Nadampalli 2024-11-04 13:05:14 +00:00
  • 9ea34a78cb fix: add defualt Zhiyuan Li 2024-11-04 23:28:26 +11:00
  • 1ba978549d ggml : fix arch check in bf16_to_fp32 slaren 2024-11-04 13:24:47 +01:00
  • ea02c753eb cuda : clear error after changing peer access (#10153) b4027 Diego Devesa 2024-11-04 13:10:23 +01:00
  • dd0d9ed102 metal : clean-up Georgi Gerganov 2024-11-04 14:04:11 +02:00
  • 13b87f212e metal : fix support check Georgi Gerganov 2024-11-04 13:40:52 +02:00
  • e9565ccf9a metal : add quantized FA (non-vec) support Georgi Gerganov 2024-11-04 09:10:49 +02:00
  • 6c484f35b0 metal : add quantized FA (vec) support Georgi Gerganov 2024-11-03 10:27:48 +02:00
  • 05697f670b metal : simplify f16 and f32 dequant kernels (#0) b4026 Georgi Gerganov 2024-11-04 13:49:34 +02:00
  • f8e58135cf metal : move dequantize templates to beginning of MSL source (#0) b4025 Georgi Gerganov 2024-11-04 13:43:32 +02:00
  • 61c665b7f1 fix: update changes to upstream Zhiyuan Li 2024-11-04 22:17:12 +11:00
  • 5f792141c5 Merge branch 'ggerganov:master' into master Zhiyuan Li 2024-11-04 22:12:31 +11:00
  • 153251f761 sync : ggml Georgi Gerganov 2024-11-04 10:33:37 +02:00
  • eb5711c496 cmake : make it possible linking ggml as external lib (ggml/1003) Yuri Khrustalev 2024-11-02 05:09:12 -04:00
  • 8050d021ab metal : fix minor string leaks (ggml/1004) Plamen Minev 2024-11-01 16:55:10 +02:00
  • 89812b157a ggml : move CPU backend to a separate file (#10144) Diego Devesa 2024-11-03 19:34:08 +01:00
  • b18963085b metal : minor fixup in FA kernel (#10143) Georgi Gerganov 2024-11-03 15:18:40 +02:00