Commit graph

  • a151674c3e CUDA/HIP: add warp_size to cuda_device_info uvos 2025-01-29 17:46:23 +01:00
  • 40cc3f2fde Merge branch 'tool-call' of github.com:ochafik/llama.cpp into tool-call ochafik 2025-01-29 16:45:59 +00:00
  • 6f29bcbdae use suggested idea instead of my overly verbose way Nigel Bosch 2025-01-29 10:29:03 -06:00
  • b407a4e9a7 return only "prompt" field in /apply-template Nigel Bosch 2025-01-29 10:25:11 -06:00
  • 384f54a135 Split bulk of tool call tests to slow lane Olivier Chafik 2025-01-29 16:13:45 +00:00
  • 923c805d04 rm dead code + nits Olivier Chafik 2025-01-29 15:57:58 +00:00
  • e51c47b401 server : update auto gen files comments [no ci] (#11484) Daniel Bevenius 2025-01-29 16:34:18 +01:00
  • 2711d0215f vulkan: Catch pipeline creation failure and print an error message (#11436) b4586 Jeff Bolz 2025-01-29 09:26:50 -06:00
  • ee640dd1b6 Merge 5d7bb10ee5 into f0d4b29edf 蕭澧邦 2025-01-29 10:21:34 -05:00
  • 7db8ee9b50 Merge 4fc8673d09 into f0d4b29edf Diego Devesa 2025-01-29 10:21:21 -05:00
  • 7c67a5e96d Merge 9bb2f9b63d into f0d4b29edf Robert 2025-01-29 10:21:12 -05:00
  • 710e2f9dc5 Merge 883dc22d44 into f0d4b29edf Robert 2025-01-29 10:21:11 -05:00
  • a6ace0a0c4 Merge f9e9792f1d into f0d4b29edf KenForever 2025-01-29 10:21:05 -05:00
  • c53588989a vulkan: fix pipeline creation logging Jeff Bolz 2025-01-29 08:51:58 -06:00
  • 453d204d8a add /apply-template documentation Nigel Bosch 2025-01-29 08:28:52 -06:00
  • 10448bf934 remove unnecessary line Nigel Bosch 2025-01-29 08:28:33 -06:00
  • 6662e28830 squash! server : update auto gen files comments [no ci] Daniel Bevenius 2025-01-29 14:04:26 +01:00
  • c30e34cdba Merge branch 'master' into gg/llama-kv-cache Georgi Gerganov 2025-01-29 15:01:26 +02:00
  • 6d21abfc20 squash! server : update auto gen files comments [no ci] Daniel Bevenius 2025-01-29 13:58:13 +01:00
  • 918885697e llama : resolve rwkv conflict Georgi Gerganov 2025-01-29 14:45:04 +02:00
  • ad622ca97e Disabled Auto-Format dhruvanand24 2025-01-29 17:55:41 +05:30
  • 2eead75fba server : update auto gen files comments [no ci] Daniel Bevenius 2025-01-29 12:41:11 +01:00
  • f0d4b29edf Parse https://ollama.com/library/ syntax (#11480) b4585 Eric Curtin 2025-01-29 12:23:10 +01:00
  • 234dd597ac Parse https://ollama.com/library/ syntax Eric Curtin 2025-01-29 09:28:20 +00:00
  • 815857791d sync : ggml Georgi Gerganov 2025-01-29 11:25:29 +02:00
  • 1a0e87d291 ggml : add option to not print stack on abort (ggml/1081) b4583 William Tambellini 2025-01-23 11:59:08 -08:00
  • d2e518e9b4 ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. (ggml/1065) issixx 2025-01-17 21:29:08 +09:00
  • b636228c0a embedding : enable --no-warmup option (#11475) b4581 Daniel Bevenius 2025-01-29 09:38:54 +01:00
  • 971f2f0d04 Readme Update: Added IRIS under UI section dhruvanand24 2025-01-29 12:23:15 +05:30
  • ef8ee22bd1 embedding : enable --no-warmup option Daniel Bevenius 2025-01-29 06:08:08 +01:00
  • d875c8e919 Merge branch 'ggerganov:master' into master Jianlin Shi 2025-01-28 21:40:24 -07:00
  • 325afb370a llama: fix missing k_cache store for rwkv6qwen2 (#11445) b4580 Molly Sophia 2025-01-29 12:07:21 +08:00
  • 4a1e8e9f91 refactor test-chat-handler ochafik 2025-01-29 04:00:01 +00:00
  • 18d5a1b2ca nits ochafik 2025-01-29 02:15:34 +00:00
  • 47be437356 Text fireworks v2 template ochafik 2025-01-29 01:51:07 +00:00
  • 4cdbb8c53f Revert breaking minja change ochafik 2025-01-29 01:50:49 +00:00
  • 64263910d8 Fix firefunction w/ jinja: requires two variables, use the chat handlers everywhere templates are used ochafik 2025-01-29 01:15:44 +00:00
  • a864590aef add /apply-template endpoint to server Nigel Bosch 2025-01-28 18:59:09 -06:00
  • a7054a11a9 format clip.cpp liyuhang 2025-01-29 08:33:58 +08:00
  • 96bde6fd51 format liyuhang 2025-01-29 08:26:13 +08:00
  • d603d067d5 sync: minja ochafik 2025-01-28 23:49:04 +00:00
  • 4f257550a2 minja: sync on https://github.com/google/minja/pull/33 ochafik 2025-01-28 23:46:51 +00:00
  • 794fe23f29 cmake: add hints for locating ggml on Windows using Llama find-package (#11466) Emreerdog 2025-01-29 02:22:06 +03:00
  • cf8cc856d7 server : Fixed wrong function name in llamacpp server unit test (#11473) peidaqi 2025-01-28 16:03:42 -07:00
  • d0c08040b6 ci : fix build CPU arm64 (#11472) Xuan-Son Nguyen 2025-01-29 00:02:56 +01:00
  • fce9cebf42 vulkan : jammy --> noble Xuan Son Nguyen 2025-01-28 23:44:36 +01:00
  • e24e4bb627 Fixed wrong function name in llamacpp server unit test: Daqi Pei 2025-01-28 15:23:25 -07:00
  • 02b490f92a vulkan: ubuntu 24 Xuan Son Nguyen 2025-01-28 23:18:53 +01:00
  • be5ef7963f HIP: Supress transformation warning in softmax.cu b4576 uvos 2025-01-28 23:06:32 +01:00
  • cb0037d8ed failed, trying ubuntu 22 Xuan Son Nguyen 2025-01-28 22:58:42 +01:00
  • 0eeb9fad0e ci : fix build CPU arm64 Xuan Son Nguyen 2025-01-28 22:53:31 +01:00
  • 570ade373d Hip: Supress transformation warning in softmax.cu, loops with bounds not known at compiletime indeed can not be unrolled. uvos 2025-01-27 23:25:55 +01:00
  • 58972e66a1 Merge 60e6e2af36 into cae9fb4361 Justine Tunney 2025-01-28 13:06:19 -06:00
  • 8a887decd3 llama : prompt processing optimizations in DeepSeek V2 Stanisław Szymczyk 2025-01-28 19:26:54 +01:00
  • 6db379dc81 Merge 996dc4cdd2 into cae9fb4361 Dhruv Anand 2025-01-29 01:50:00 +08:00
  • cae9fb4361 HIP: Only call rocblas_initialize on rocblas versions with the multiple instantation bug (#11080) b4575 Nikita Sarychev 2025-01-28 07:42:20 -08:00
  • cad1448ac7 Disable test-chat-handler on win32 like the other grammar-related tests ochafik 2025-01-28 14:46:37 +00:00
  • 7fee2889e6 Add github protocol pulling and http:// (#11465) b4574 Eric Curtin 2025-01-28 15:45:41 +01:00
  • cd63ba435e beef up test-chat-handler w/ delta expectations ochafik 2025-01-28 14:40:23 +00:00
  • bdf0846fba Windows ggml find_package fix in llama-config.cmake.in file Emreerdog 2025-01-28 17:21:06 +03:00
  • d7d1eccacc docker: allow installing pip packages system-wide (#11437) Nuno 2025-01-28 15:17:25 +01:00
  • 4bf3119d61 cmake : don't fail on GGML_CPU=OFF (#11457) b4572 someone13574 2025-01-28 09:15:34 -05:00
  • 89beb75960 Add github protocol pulling and http:// Eric Curtin 2025-01-28 11:49:53 +00:00
  • 05a6f739a7 Merge 9a2380ec32 into f643120bad Herman Semenoff 2025-01-28 05:46:26 -06:00
  • ba10b47ae5 Add missing link dep for windows build ochafik 2025-01-28 10:52:14 +00:00
  • b5a74d1a24 Simplify parser defs (incremental parsing for streaming will need more thinking) ochafik 2025-01-28 10:48:11 +00:00
  • f643120bad docker: add perplexity and bench commands to full image (#11438) Nuno 2025-01-28 11:42:32 +01:00
  • ec4aeaf18a Revert "Allow tool use + streaming" ochafik 2025-01-28 10:29:17 +00:00
  • 8ff0991eed convert : make lint happy Stanisław Szymczyk 2025-01-28 11:02:52 +01:00
  • 6e84b0ab8e SYCL : SOFTMAX F16 mask support and other fixes (#11261) b4570 Akarshan Biswas 2025-01-28 15:26:58 +05:30
  • 62d45a552f Disable slow tests where appropriate, + nits ochafik 2025-01-28 09:47:41 +00:00
  • d274ffcc95 build: Add missing optional include for gcc ochafik 2025-01-28 09:29:31 +00:00
  • 0a51e514f6 Update test-chat-handler.cpp ochafik 2025-01-28 09:24:35 +00:00
  • 2f99236f77 Tool-call: do last partial parse upon limit stop Olivier Chafik 2025-01-28 09:23:19 +00:00
  • 6d5682909f Cleanup dead code in llama_3_1 tool call code Olivier Chafik 2025-01-28 09:22:26 +00:00
  • 62717145f7 Allow tool use + streaming Olivier Chafik 2025-01-28 09:22:03 +00:00
  • 2b8525d5c8 Handle missing model in CLI parameters for llama-run (#11399) b4569 Michael Engel 2025-01-28 09:32:40 +01:00
  • 5fde721592 Merge remote-tracking branch 'upstream/master' into Remove_obsolete_HIP_workaround Nikita Sarychev 2025-01-27 21:39:20 -08:00
  • 61d341f818 Address code review feedback Nikita Sarychev 2025-01-27 21:16:54 -08:00
  • 5be217a828 tests: increase timeout for Vulkan llvmpipe backend Rémy O 2025-01-28 06:10:48 +01:00
  • aa17d321b3 vulkan: avoid using workgroup size before it is referenced Rémy O 2025-01-26 14:50:47 +01:00
  • ef9efc9ed3 Fix Llama 3.1 (incl. constrained builtin tools e.g. <|python_tag|>foo.call(arg=vallue)) ochafik 2025-01-28 01:04:06 +00:00
  • 2d607f1a68 Update test-chat-handler.cpp ochafik 2025-01-27 23:29:28 +00:00
  • b565ab2ab1 comment out broken tests in test_tool_call.py ochafik 2025-01-27 23:02:15 +00:00
  • cafea60922 Split e2e test_tool_call from test_chat_completion ochafik 2025-01-27 22:46:33 +00:00
  • 90effb845f Pass grammar laziness all the way down to sampler (need to print special trigger tokens e.g. for Nemo even w/ tool_choice=required) ochafik 2025-01-27 22:46:17 +00:00
  • ad229783c5 updated tool call example to be less ambiguous (deepseek likes to rant about hello world) ochafik 2025-01-27 22:44:44 +00:00
  • cc4601599b cmake: don't fail on GGML_CPU=OFF Owen Law 2025-01-27 16:46:24 -05:00
  • fa065eb095 Rehabilitate test_format_detection ochafik 2025-01-27 20:46:03 +00:00
  • add9124115 fix test-chat-handler grammar tests ochafik 2025-01-27 20:13:09 +00:00
  • a4417ddda9 Add new hf protocol for ollama (#11449) b4568 Eric Curtin 2025-01-27 19:36:10 +01:00
  • 118f799ae4 DeepSeek-R1: implement grammar constraints ochafik 2025-01-27 17:51:13 +00:00
  • 055e01b139 try CI fix Johannes Gäßler 2025-01-27 18:33:34 +01:00
  • 92ac336dfa Prepare DeepSeek-R1-Distill-Llama-8B support ochafik 2025-01-27 17:26:43 +00:00
  • c25557362a llama/ggml: add LLM training support Johannes Gäßler 2024-11-17 14:58:51 +01:00
  • 09971e626c Update test_chat_completion.py ochafik 2025-01-27 15:43:03 +00:00
  • 67709552ad tool-call: compact json output to cap # tokens generated ochafik 2025-01-27 15:42:27 +00:00
  • 57f40e366b tool-call: fix lazy grammar & mixed content + tool calls parsing ochafik 2025-01-27 15:41:54 +00:00
  • c00120a2bc Add new hf protocol for ollama Eric Curtin 2025-01-24 19:26:19 +00:00
  • 9517aee23c rm redundant clamp Xuan Son Nguyen 2025-01-27 16:36:20 +01:00