Commit graph

  • 521242ffb8 minor fix sasha0552 2024-11-01 09:29:08 +00:00
  • 4bc4056230 ggml : remove ggml_scratch Georgi Gerganov 2024-11-01 11:21:06 +02:00
  • 9ef8cb5a3e Removed custom reset MaggotHATE 2024-11-01 14:15:05 +05:00
  • aa458d1611 Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-11-01 13:55:32 +05:00
  • 8d32422d39 Fix smart selection of available slot sasha0552 2024-11-01 08:55:28 +00:00
  • 815fe72adc sync : ggml b4005 Georgi Gerganov 2024-11-01 10:28:24 +02:00
  • f221d56220 ggml : alloc ggml_contexts on the heap (whisper/2525) Georgi Gerganov 2024-11-01 10:23:05 +02:00
  • 2b7be22977 Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-11-01 09:30:39 +05:00
  • e597e50794 build: fix build error in Windows env with OneAPI setup (#10107) b4003 Zhenwei Jin 2024-11-01 11:09:59 +08:00
  • 48e6e4c28d llama : use smart pointers for ggml resources slaren 2024-11-01 03:36:35 +01:00
  • d84c372bbf Update ggml-quants.c Eve 2024-11-01 01:48:56 +00:00
  • ed6f845aff Merge branch 'ggerganov:master' into q6_k Eve 2024-11-01 01:42:10 +00:00
  • 5b367158c7 use shift Eve 2024-10-31 20:16:25 -04:00
  • 85679d37f3 llama : improve output buffer type selection (#10098) b4002 Diego Devesa 2024-11-01 00:49:53 +01:00
  • 1e9f94994e quantize : fix --keep-split (#10114) b4001 Diego Devesa 2024-11-01 00:45:34 +01:00
  • 35255d64f6 handle -32 offset separately. bsums exists for a reason! Eve 2024-10-31 18:25:25 -04:00
  • bbd518e7de Merge bb668b608e into c02e5ab2a6 Justine Tunney 2024-10-31 19:33:07 -04:00
  • 77c86271c4 handle case where base_model_name_or_path is invalid Xuan Son Nguyen 2024-11-01 00:32:42 +01:00
  • c02e5ab2a6 llama : fix buffer checks for mamba and rwk (#10111) b4000 Diego Devesa 2024-10-31 22:54:23 +01:00
  • 802687a4d8 Separate LlamaKitMacros; Add @Tool Macro; Fix extern LLAMA settings Jason Flax 2024-10-31 17:44:46 -04:00
  • de44a080c3 quantize : fix --keep-split slaren 2024-10-31 21:56:09 +01:00
  • 0ab2e7f424 Merge 645eb3c6ad into ab3d71f97f CrispStrobe 2024-10-31 23:37:55 +03:00
  • ab3d71f97f loader: refactor tensor weights storage (#9935) b3999 Zhenwei Jin 2024-11-01 02:50:39 +08:00
  • 9a99293174 use sorted map, sort weights by layer slaren 2024-10-31 19:34:15 +01:00
  • 13eba91a32 minor style changes slaren 2024-10-31 19:23:44 +01:00
  • 899351e066 disable sched SET_CAUSE slaren 2024-10-31 17:55:32 +01:00
  • d1faeca19d cuda : fix supports_op for norm slaren 2024-10-31 17:49:51 +01:00
  • b135927ca4 llama : fix missing worst case flag during reserve slaren 2024-10-31 17:49:15 +01:00
  • bc52c0a4f0 agent: add missing tool name in response! ochafik 2024-10-31 15:01:17 +00:00
  • 479c1520b1 tool-call: fix qwen template test ochafik 2024-10-31 14:49:59 +00:00
  • fe967b61a1 Update README.md ochafik 2024-10-31 14:37:55 +00:00
  • f5f74751b9 nits ochafik 2024-10-31 14:28:52 +00:00
  • c4a8050120 Update README.md ochafik 2024-10-31 14:27:40 +00:00
  • bcef54e10a improve performance isotr0py 2024-10-31 22:13:15 +08:00
  • dec6ce2535 llama : fix buffer checks for mamba and rwk slaren 2024-10-31 15:12:43 +01:00
  • 9477c54676 tool-call: functionary-small-v3.2 test now green ochafik 2024-10-31 14:11:34 +00:00
  • 19dbc442c6 lint Xuan Son Nguyen 2024-10-31 14:54:08 +01:00
  • d3998ab8b8 Merge branch 'master' of https://github.com/ggerganov/llama.cpp Jason Flax 2024-10-31 09:54:00 -04:00
  • b35aa4ae1c tool-call: add LLAMA_UPDATE_GOLDENS env for test-chat-template ochafik 2024-10-31 13:53:33 +00:00
  • c773516d57 tool-call: don't use -fa w/ Mistral-Nemo (hard crashes?) ochafik 2024-10-31 13:53:11 +00:00
  • f5b7825595 tool-call: code_interpreter & system + tool call support for all jinja templates! ochafik 2024-10-31 13:52:46 +00:00
  • 1aeb72ba58 convert-lora : make --base optional Xuan Son Nguyen 2024-10-31 14:49:37 +01:00
  • c395d4804f tool-call: behaviour-based detection of template features ochafik 2024-10-31 13:45:10 +00:00
  • 0a683e8088 server : include scheme when printing URL (#10106) b3998 Kevin Gibbons 2024-10-31 06:02:35 -07:00
  • 5ce2dbcf38 refactor gguf reader isotr0py 2024-10-31 20:19:24 +08:00
  • dea5e86051 ggml : check tensor name lengths in gguf files (#10100) b3997 Diego Devesa 2024-10-31 11:40:59 +01:00
  • 2cb96516d3 build: fix build error in Windows env with OneAPI setup zhenweijin 2024-10-31 17:19:10 +08:00
  • 1329c0a75e kompute: add mul_mat_q4_k shader (#10097) b3996 Sergio López 2024-10-31 10:09:52 +01:00
  • f742e3c0d5 loader: refactor tensor weights storage zhenweijin 2024-10-18 15:31:10 +08:00
  • 0320392968 server : include scheme when printing URL Kevin Gibbons 2024-10-30 21:51:10 -07:00
  • e8d9d711f6 Update tool_call.feature ochafik 2024-10-31 04:50:38 +00:00
  • 7d9c90f46b tool-call: nemo tweak (accept raw sql again) ochafik 2024-10-31 04:39:40 +00:00
  • 542853b34b tool-call: greedy sampling in server tests + tweak prompt ochafik 2024-10-31 04:38:22 +00:00
  • 5144fd90bf Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-10-31 09:26:25 +05:00
  • be9de3ed8a Update llama-sampling.cpp ochafik 2024-10-31 03:58:15 +00:00
  • a420e4cd44 optimize bit fiddling Eve 2024-10-30 23:38:00 -04:00
  • 243fd5dd37 Update test-cli.cpp ochafik 2024-10-31 02:54:06 +00:00
  • 0b75215f9d should be theoretically faster Eve 2024-10-30 22:11:32 -04:00
  • ca512cc934 test-cli: add llama-cli as order-only prerequisite in Makefile ochafik 2024-10-31 02:03:35 +00:00
  • 61655b9cdd Merge remote-tracking branch 'origin/master' into tool-call ochafik 2024-10-31 01:45:07 +00:00
  • e3a34321c4 better subtract method Eve 2024-10-30 21:43:22 -04:00
  • 499e9f2f49 q6_k instruction reordering attempt Eve 2024-10-30 21:42:22 -04:00
  • 82d5e91a6f test-cli: greedy sampling + print exception messages ochafik 2024-10-31 00:10:44 +00:00
  • 56f9d4b52a Init LlamaObjC Commit Jason Flax 2024-10-30 19:15:23 -04:00
  • 030fda09a9 main: add test-cli + ensure output still logged w/ --log-disable ochafik 2024-10-30 23:03:42 +00:00
  • afc4a7de65 llama : enable flash attn automatically when supported sl/auto-flash-attn slaren 2024-10-30 23:30:04 +01:00
  • 77b6c12325 ggml : check tensor name lengths in gguf files slaren 2024-10-30 22:44:49 +01:00
  • e4d5449638 tool-calls: test Qwen2.5-7B-Instruct-Q4_K_M.gguf Olivier Chafik 2024-10-30 21:40:15 +00:00
  • e3e1e0c96f llama : improve output buffer type selection slaren 2024-10-30 21:09:49 +01:00
  • c616263b01 Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-10-30 19:30:23 +05:00
  • 5227321dfd tool-call: when slow server tests fail, hint to run python scripts/fetch_server_test_models.py ochafik 2024-10-30 12:40:22 +00:00
  • 35ac17f3f1 tool-call: fix missing initializer errors ochafik 2024-10-30 12:38:34 +00:00
  • 9719dad859 ggml : avoid crashing on failed memory allocations when loading a gguf file slaren 2024-10-30 13:14:46 +01:00
  • f509f455a9 fix style slaren 2024-10-30 13:02:05 +01:00
  • 29234567a7 ggml : avoid crashing with GGML_ABORT when the KV has an invalid type slaren 2024-10-30 12:54:03 +01:00
  • 5bd963e90b ggml : fix gguf string leak when reading kv pairs fails slaren 2024-10-30 12:48:11 +01:00
  • 3ebdb2b805 tool-call: support tool_use variant in llama_chat_template_from_model + drop llama_get_chat_template ochafik 2024-10-30 10:07:10 +00:00
  • 274a7720d1 ggml : Fix a typo in RVV q4_0_8_8 GEMM Xiongchuan Tan 2024-10-30 14:17:26 +08:00
  • 62a878b125 Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-10-30 09:55:15 +05:00
  • 85efc90c9e kompute: add mul_mat_q4_k shader Sergio Lopez 2024-10-23 14:34:46 +02:00
  • 61408e7fad kompute: add backend registry / device interfaces (#10045) b3995 Sergio López 2024-10-30 17:01:52 +01:00
  • bb36c4bd70 kompute: add backend registry / device interfaces Sergio Lopez 2024-10-25 17:47:24 +02:00
  • b9e02e8184 ggml : fix memory leaks when loading invalid gguf files (#10094) b3994 Diego Devesa 2024-10-30 14:51:21 +01:00
  • 6763f713bb readme : more lora detail in main example readme (#10064) Rich Dougherty 2024-10-31 01:22:39 +13:00
  • 79a2bc042d convert : more detailed convert lora usage docs (#10065) Rich Dougherty 2024-10-31 01:22:21 +13:00
  • fc83a9e584 ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029) b3991 xctan 2024-10-30 15:00:40 +08:00
  • c5b0f4b5d9 llama : refactor model loader with backend registry (#10026) b3990 Diego Devesa 2024-10-30 02:01:23 +01:00
  • 4d1ab99ee7 Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-10-29 23:35:43 +05:00
  • 484984c8ec minor slaren 2024-10-29 18:48:46 +01:00
  • 92c384a5e8 nits Olivier Chafik 2024-10-29 17:24:59 +00:00
  • 39aa7e9e18 Merge 23214c92cf into 8f275a7c45 agray3 2024-10-29 09:48:36 -07:00
  • 773ff91b7a tool-call: force printing of lazy grammar trigger tokens to regularize function call parsing Olivier Chafik 2024-10-29 15:26:51 +00:00
  • fa4c1119c9 tool-call: use functionary-small-v3.2-Q8_0.gguf in test (Q4_K_M too dumb for function call) Olivier Chafik 2024-10-29 15:25:37 +00:00
  • 64287a328d tool-call: test Hermes-3-Llama-3.1-8B Olivier Chafik 2024-10-29 14:52:25 +00:00
  • 8f275a7c45 ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763) b3989 Changyeon Kim 2024-10-29 17:52:56 +09:00
  • 9c233c72be Merge branch 'master' into k-shift2 MaggotHATE 2024-10-29 13:48:09 +05:00
  • 8d8ff71536 llama : remove Tail-Free sampling (#10071) b3988 Georgi Gerganov 2024-10-29 10:42:05 +02:00
  • e83245e96e Merge branch 'ggerganov:master' into k-shift2 MaggotHATE 2024-10-29 09:46:50 +05:00
  • 3eb73ff595 convert : more detailed convert lora usage docs Rich Dougherty 2024-10-29 13:47:09 +13:00
  • 758a46af34 Remove an unnecessary return statement that was accidentally committed J M 2024-10-28 17:36:39 -07:00