Commit graph

  • 224d6e9da9 Update llama-run README.md Eric Curtin 2025-01-24 09:16:25 +00:00
  • 6adca19c94 ggml-cpu: Add CPU backend support for KleidiAI library Charles Xu 2025-01-24 10:17:04 +01:00
  • c07e87f38b server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364) stduhpf 2025-01-24 09:02:38 +01:00
  • e9bcdf8956 softmax: remove GGML_SYCL_DEBUG as they don't work in softmax.cpp Akarshan Biswas 2025-01-21 19:34:00 +05:30
  • 1c5611ef1e softmax: remove pragma unroll directive Akarshan Biswas 2025-01-21 19:29:39 +05:30
  • 45d6c58dba test-backend-ops: Add F16 mask test cases Akarshan Biswas 2025-01-21 09:11:54 +05:30
  • 53847e4a55 Review update: Use GGML_SYCL_DEBUG Akarshan Biswas 2025-01-20 20:26:02 +05:30
  • b913e83c81 softmax: review update Akarshan Biswas 2025-01-20 18:02:36 +05:30
  • 82d5c0dd80 Remove changes not related to softmax Akarshan Biswas 2025-01-20 17:41:55 +05:30
  • 495e7ea48d SYCL: SOFTMAX F16 mask support and other fixes Akarshan Biswas 2025-01-16 12:56:22 +05:30
  • 7620e4023c Update build.yml jiahao su 2025-01-24 10:31:51 +08:00
  • 36ed106f84 WIP chat handlers Olivier Chafik 2025-01-24 02:31:37 +00:00
  • c815d53e78 update build Xuan Son Nguyen 2025-01-24 00:18:51 +01:00
  • 0f0482fec5 Merge branch 'master' into R1-ui Xuan Son Nguyen 2025-01-24 00:08:24 +01:00
  • 9c4962ff98 build Xuan Son Nguyen 2025-01-24 00:05:55 +01:00
  • 4395d70d39 only filter </think> for assistant msg Xuan Son Nguyen 2025-01-24 00:05:03 +01:00
  • 7b3ac17c7b add jsdoc types Xuan Son Nguyen 2025-01-24 00:04:10 +01:00
  • fa225a423a ui fix, add configs Xuan Son Nguyen 2025-01-23 23:41:54 +01:00
  • cd0aee8981 llama: refactor llama_decode_impl Johannes Gäßler 2025-01-23 23:37:59 +01:00
  • b72d7557f3 Merge branch 'master' into xsn/vision_2 Xuan Son Nguyen 2025-01-23 23:07:20 +01:00
  • b986af80de py: a bit cleaner Xuan Son Nguyen 2025-01-23 23:07:08 +01:00
  • 08ad58dec5 convert_hf_to_gguf: fix typo Steve Grubb 2025-01-23 16:12:10 -05:00
  • 564804b79b tests: fix some mul_mat test gaps (#11375) b4539 Jeff Bolz 2025-01-23 14:51:24 -06:00
  • 05f63cc9ee Update documentation (#11373) b4538 Eric Curtin 2025-01-23 20:04:31 +00:00
  • 620acbb381 webui: no loading icon if the model isn't generating Stéphane du Hamel 2025-01-23 20:17:34 +01:00
  • 2c35d08a01 tests: fix some mul_mat test gaps Jeff Bolz 2025-01-23 13:14:32 -06:00
  • 7402727de4 webui: format+qol Stéphane du Hamel 2025-01-23 20:11:22 +01:00
  • 1760b92f6f Link ggml/ggml-base libraries to their targets Mason M 2025-01-23 15:10:04 -04:00
  • 817cff1d60 Handle ggml-cpu-* variants Mason M 2025-01-23 14:51:47 -04:00
  • 77c08a8fb8 webui: don't use regex to split cot and response Stéphane du Hamel 2025-01-23 19:31:55 +01:00
  • 2a7d711bc5 webui: refactor split Stéphane du Hamel 2025-01-23 19:31:25 +01:00
  • 09ab04ea14 Add git to msys2 workflow Mason M 2025-01-23 13:01:01 -04:00
  • f203a1ac25 Update documentation Eric Curtin 2025-01-23 16:20:23 +00:00
  • f7fb43cd0b Add -ngl (#11372) b4537 Eric Curtin 2025-01-23 16:16:18 +00:00
  • 30626cd326 Add -ngl Eric Curtin 2025-01-23 15:58:55 +00:00
  • c3a654c0fb add SmolVLM Xuan Son Nguyen 2025-01-23 15:51:30 +01:00
  • ea0a85abae Guard against adding to cache variable twice Mason M 2025-01-23 09:43:51 -04:00
  • 5845661640 server : add more clean up when cancel_tasks is called (#11340) b4536 Xuan Son Nguyen 2025-01-23 13:56:05 +01:00
  • b14e8294b1 Merge branch 'master' into cmake-ggml-find-pkg Mason M 2025-01-23 08:54:43 -04:00
  • 314f26cc8b Expand variables with GGML_ prefix Mason M 2025-01-23 08:48:44 -04:00
  • 25a97ce4cb correct positions for siglip Xuan Son Nguyen 2025-01-23 13:34:13 +01:00
  • 72893b60b6 Update readme to build targets for local docker build JafarAbdi 2025-01-23 12:01:02 +00:00
  • 8586d23c8a minicpm working without uhd Xuan Son Nguyen 2025-01-23 12:14:06 +01:00
  • f211d1dc10 Treat hf.co/ prefix the same as hf:// (#11350) b4535 Eric Curtin 2025-01-23 10:38:20 +00:00
  • 5b4c12e92e Add build numbers to ggml find-package Mason M 2025-01-23 05:58:56 -04:00
  • 4c16559f1f Treat hf.co/ prefix the same as hf:// Eric Curtin 2025-01-22 13:41:51 +00:00
  • 955a6c2d91 Vulkan-run-test: fix mmq_wg_denoms (#11343) b4534 amd-dwang 2025-01-23 15:14:28 +08:00
  • 1971adf55e vulkan: sort shaders for more deterministic binary (#11315) b4533 Jeff Bolz 2025-01-23 01:07:50 -06:00
  • 5245729e33 vulkan: fix diag_mask_inf (#11323) b4532 Jeff Bolz 2025-01-23 01:01:17 -06:00
  • ada42ebda0 webui : put DeepSeek R1 CoT in a collapsible <details> element Stéphane du Hamel 2025-01-23 03:26:51 +01:00
  • c0d93dd509 minicpmv works but missing uhd slices Xuan Son Nguyen 2025-01-22 22:42:00 +01:00
  • ba489b4743 wip minicpmv Xuan Son Nguyen 2025-01-22 22:26:38 +01:00
  • 530fd0cb95 Add initial ggml cmake package Mason M 2025-01-22 17:03:53 -04:00
  • 46415d7a51 Fix lazy trigger handling Olivier Chafik 2025-01-22 19:08:19 +00:00
  • c2d836f9d0 Update real tool call tests (use less models) Olivier Chafik 2025-01-22 18:47:32 +00:00
  • a46de6a03a Add grammar options + rename builder to common_grammar_builder Olivier Chafik 2025-01-22 18:36:04 +00:00
  • cdfa8b9d4f Update chat-template.hpp Olivier Chafik 2025-01-22 18:35:24 +00:00
  • 5e358ade59 fix msg init warning Olivier Chafik 2025-01-22 18:35:20 +00:00
  • 6152129d05 main : update README documentation for batch size (#11353) Diego Devesa 2025-01-22 19:22:20 +01:00
  • 16d3df7ab0 readme : add plugin links (#11355) Georgi Gerganov 2025-01-22 19:44:26 +02:00
  • c2f5b602d1 readme : add plugin links Georgi Gerganov 2025-01-22 19:42:57 +02:00
  • 12c2bdf2de server : fix draft context not being released (#11354) b4529 Diego Devesa 2025-01-22 17:44:40 +01:00
  • f0231a586e fix common_chat_msg invocations Olivier Chafik 2025-01-22 16:25:51 +00:00
  • 0410e03ced server : fix draft context not being released slaren 2025-01-22 17:24:49 +01:00
  • d186721e41 Merge remote-tracking branch 'origin/master' into tool-call Olivier Chafik 2025-01-22 16:22:16 +00:00
  • 9006401b2a minor Diego Devesa 2025-01-22 17:21:46 +01:00
  • b00f23ef78 fix formatting Diego Devesa 2025-01-22 17:20:38 +01:00
  • c1ea8ec6c9 main : update README documentation for batch size Diego Devesa 2025-01-22 17:19:11 +01:00
  • c64d2becb1 minja: sync at 0f5f7f2b37 (#11352) b4528 Olivier Chafik 2025-01-22 16:16:27 +00:00
  • 32c6074eaf minja: sync at 0f5f7f2b37 Olivier Chafik 2025-01-22 15:17:11 +00:00
  • 9ccc62b3c9 Sync minja after https://github.com/google/minja/pull/29 Olivier Chafik 2025-01-22 14:32:18 +00:00
  • 93864cda8a llama : experimental DeepSeek2 MLA implementation that caches latent kv representations Stanisław Szymczyk 2025-01-22 15:19:34 +01:00
  • 9716c7bff7 temporary refactor llama_vision_graph_builder Xuan Son Nguyen 2025-01-22 14:40:35 +01:00
  • a9db9b0048 Implement --no-byteswap argument to disable byteswapping on big endian platform Aleksei Nikiforov 2025-01-21 12:16:32 +01:00
  • f4217a81fc Disable mmap on s390x in llama-quant too Aleksei Nikiforov 2025-01-21 12:15:34 +01:00
  • 3c22daa66e Update preprocessor directives according to guidelines Aleksei Nikiforov 2025-01-15 11:59:30 +01:00
  • cfb2cd1ee9 Update alignment of byteswap function type definition Aleksei Nikiforov 2025-01-15 11:54:37 +01:00
  • 1d06f0f115 Make assert messages unique Aleksei Nikiforov 2025-01-15 11:53:02 +01:00
  • a9402ba2b6 Move conversion functions to common header Aleksei Nikiforov 2025-01-15 11:48:26 +01:00
  • 1d01548627 Fix unused variable warnings Aleksei Nikiforov 2025-01-14 11:11:46 +01:00
  • fa8fc317f3 Fix unicode flags conversion from and to uint16_t Aleksei Nikiforov 2025-01-10 18:19:47 +01:00
  • 27c19c4eb7 Implement write byteswap for tests Aleksei Nikiforov 2025-01-10 12:19:26 +01:00
  • 088f9a6c32 Get rid of additional memcpy calls Aleksei Nikiforov 2025-01-10 11:16:41 +01:00
  • 21f7ca2fb3 Disable mmap on s390x Aleksei Nikiforov 2025-01-09 14:59:22 +01:00
  • 0682209c66 Implement byteswap for tq1_0 and tq2_0 Aleksei Nikiforov 2025-01-09 14:50:32 +01:00
  • a8757fec66 Implement most of remaining byteswap functions Aleksei Nikiforov 2024-10-24 11:07:44 +02:00
  • 9a4b0df5e8 Load little-endian models on s390x Aleksei Nikiforov 2024-10-23 12:21:27 +02:00
  • d1bb943c10 gguf_convert_endian.py: implement byteswapping for q4_k and q6_k Aleksei Nikiforov 2025-01-22 14:04:11 +01:00
  • 32daa38333 Merge branch 'master' into xsn/vision_2 Xuan Son Nguyen 2025-01-22 13:28:31 +01:00
  • 0244e79763 fix std::remove_if Xuan Son Nguyen 2025-01-22 13:05:57 +01:00
  • b9e517106f std::remove_if Xuan Son Nguyen 2025-01-22 12:57:00 +01:00
  • 96f4053934 Adding logprobs to /v1/completions (#11344) b4527 Jiří Podivín 2025-01-22 12:51:32 +01:00
  • 30d33d9f68 Update test_chat_completion.py Olivier Chafik 2025-01-22 11:42:36 +00:00
  • c6a22edc57 Greedy sampling in tool call tests Olivier Chafik 2025-01-22 11:41:43 +00:00
  • cce1166b37 Update tool-call.cpp Olivier Chafik 2025-01-22 11:25:26 +00:00
  • a4226365bf nits Olivier Chafik 2025-01-22 11:23:37 +00:00
  • 63387c6dca smaller diff Olivier Chafik 2025-01-22 11:14:25 +00:00
  • 82b6e9a5c3 merge common_tool_calls into common_chat_msg Olivier Chafik 2025-01-22 11:05:05 +00:00
  • 01b345be0f Merge remote-tracking branch 'origin/master' into tool-call Olivier Chafik 2025-01-22 10:02:23 +00:00
  • a94f3b2727 common: utils to split / join / repeat strings (from json converter) (#11342) b4526 Olivier Chafik 2025-01-22 09:51:44 +00:00