Commit graph

  • 5c43ba6dae Fix cann compilation error after merging llama.cpp supports dynamically loadable backends leo-pony 2024-10-15 14:29:52 +08:00
  • 1017ddc11c server : improve infill context reuse Georgi Gerganov 2024-10-15 11:51:20 +03:00
  • 050eb7abc5 Merge branch 'ggerganov:master' into master MaggotHATE 2024-10-15 14:52:28 +05:00
  • dcdd535302 server : update preact (#9895) Georgi Gerganov 2024-10-15 12:48:44 +03:00
  • c854c97b5e server : update preact Georgi Gerganov 2024-10-15 11:57:54 +03:00
  • 4c42f93b22 readme : update bindings list (#9889) Michał Tuszyński 2024-10-15 10:20:34 +02:00
  • 3496f584cc Small fixes MaggotHATE 2024-10-15 11:23:11 +05:00
  • 28d2cff729 Merge branch 'master' of https://github.com/MaggotHATE/llama.cpp-xtc MaggotHATE 2024-10-15 09:46:14 +05:00
  • 2be814aa69 Fixed tests and outdated README MaggotHATE 2024-10-15 09:46:04 +05:00
  • de7836a4e3 fix: iOS page size Gilad S 2024-10-15 03:38:47 +03:00
  • 4455c7f073 fix: increase page size to 32 on iOS Gilad S 2024-10-15 03:29:23 +03:00
  • 1516f7b790 fix: convert TENSOR_ALIGNMENT to a macro Gilad S 2024-10-15 03:06:30 +03:00
  • cb224645e1 fix: page align to TENSOR_ALIGNMENT Gilad S 2024-10-15 02:55:13 +03:00
  • 33b430810b fix: page align to GGUF_DEFAULT_ALIGNMENT Gilad S 2024-10-15 02:51:39 +03:00
  • df67ae380e fix: unused var Gilad S 2024-10-15 02:01:02 +03:00
  • 8352354aa2 fix: transform GGML_ALIGNED_MALLOC and GGML_ALIGNED_FREE into functions and add them to ggml-impl.h Gilad S 2024-10-15 01:56:07 +03:00
  • 19a820fced fix: unused var Gilad S 2024-10-15 01:28:44 +03:00
  • fa79e0d0dd style: formatting Gilad S 2024-10-15 01:23:01 +03:00
  • 55298d2530 fix: move const outside of #ifndef Gilad S 2024-10-15 01:21:11 +03:00
  • c2259e3cd1 style: formatting Gilad S 2024-10-15 01:14:36 +03:00
  • 51ccdebc84 feat: move GGML_ALIGNED_MALLOC to ggml-backend-impl.h, add support for vm_allocate on macOS Gilad S 2024-10-15 01:08:53 +03:00
  • 90fb7c0767 Added swift bindings repo to readme Michał Tuszyński 2024-10-14 22:51:30 +02:00
  • 40a68f4d0c simplify "send text" checks ZXED 2024-10-14 21:53:56 +03:00
  • 65ec5f4326 Merge PR (#10) dennyxbox890 2024-10-15 00:29:57 +08:00
  • 17ad143ead Merge branch 'ggerganov:master' into master MaggotHATE 2024-10-14 18:36:52 +05:00
  • 3613a6d27b Renamed random distribution MaggotHATE 2024-10-14 18:36:03 +05:00
  • ba7a8a3728 llava : fix typo in error message Daniel Bevenius 2024-10-14 15:34:02 +02:00
  • 436a9919e3 Simplified algorithm since threshold_max is removed MaggotHATE 2024-10-14 16:10:13 +05:00
  • 93f85ea213 fix memory leaks in minicpmv caitianchi 2024-10-14 15:57:34 +08:00
  • a89f75e1b7 server : handle "logprobs" field with false value (#9871) b3917 VoidIsVoid 2024-10-14 15:04:36 +08:00
  • dfef2c4c37 Merge branch 'ggerganov:master' into master MaggotHATE 2024-10-14 11:44:50 +05:00
  • a3e652296a Merge branch 'master' of https://github.com/MaggotHATE/llama.cpp-xtc MaggotHATE 2024-10-14 11:44:00 +05:00
  • 44bbd6337a Quick fixes by comments MaggotHATE 2024-10-14 11:43:45 +05:00
  • 901a3479b1 move cache stack to advance stack Clarissa Miranda 2024-10-14 17:13:40 +11:00
  • 13dca2a54a Vectorize load instructions in dmmv f16 CUDA kernel (#9816) b3916 agray3 2024-10-14 01:49:08 +01:00
  • 563ff44ed9 server: Handle "logprobs" field with false value Gimling 2024-10-13 09:41:20 +08:00
  • cec2d4e265 fix: switch to posix_memalign to keep existing free() usages work Gilad S 2024-10-14 03:40:25 +03:00
  • 06b119a177 fix: use vm_allocate to allocate CPU backend buffer on macOS Gilad S 2024-10-14 02:30:25 +03:00
  • d4c19c0f5c server : accept extra_context for the infill endpoint (#9874) Georgi Gerganov 2024-10-13 21:31:35 +03:00
  • 33d9acf97a server : use repo-level FIM pattern if possible Georgi Gerganov 2024-10-13 21:08:39 +03:00
  • a28b8c81a8 server : update readme [no ci] Georgi Gerganov 2024-10-13 19:18:36 +03:00
  • 5a699f147e server : accept extra_context for the infill endpoint Georgi Gerganov 2024-10-13 18:58:51 +03:00
  • c7181bd294 server : reuse cached context chunks (#9866) b3914 Georgi Gerganov 2024-10-13 18:52:48 +03:00
  • 27addf545b server : reuse context chunks Georgi Gerganov 2024-10-12 11:32:10 +03:00
  • ea62e65fe9 Merge branch 'ggerganov:master' into master MaggotHATE 2024-10-13 13:45:40 +05:00
  • 92be9f1216 flake.lock: Update (#9870) Georgi Gerganov 2024-10-13 06:11:26 +03:00
  • 5d188fcc40 flake.lock: Update github-actions[bot] 2024-10-13 00:22:26 +00:00
  • 865ccb4e36 add nvidia nemotron chat template Xuan Son Nguyen 2024-10-13 00:18:55 +02:00
  • 4be7ecf25e fix perplexity Xuan Son Nguyen 2024-10-12 23:19:52 +02:00
  • 6395174a54 fix save-load-state example Xuan Son Nguyen 2024-10-12 23:09:05 +02:00
  • 7264596a5c null terminated seq_id list Xuan Son Nguyen 2024-10-12 22:53:05 +02:00
  • 734f9e29de use common_batch_add, reuse llama_batch in loop Xuan Son Nguyen 2024-10-12 22:51:30 +02:00
  • b4c9911ebe Merge branch 'master' into xsn/llama_batch_remove_compat Xuan Son Nguyen 2024-10-12 22:47:37 +02:00
  • 0639ff16d0 free batch before return Xuan Son Nguyen 2024-10-12 22:47:27 +02:00
  • 805512a73b ggml : remove unused fast broadcast path in GGML_MUL Francis Couture-Harpin 2024-10-12 16:20:26 -04:00
  • 038d958333 Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2024-10-12 16:12:06 -04:00
  • ae996a2fc8 server: fix the disappearance of the end of the text when streaming with stop strings ZXED 2024-10-12 18:31:43 +03:00
  • cca842fbd3 Fixed arg after update MaggotHATE 2024-10-12 18:46:13 +05:00
  • ea85a51af1 Merge branch 'ggerganov:master' into master MaggotHATE 2024-10-12 18:38:06 +05:00
  • 68557eb7a0 Merge branch 'master' of https://github.com/MaggotHATE/llama.cpp-xtc MaggotHATE 2024-10-12 18:36:14 +05:00
  • 9c43a01c5d Removed xtc_threshold_max MaggotHATE 2024-10-12 18:35:56 +05:00
  • edc265661c server : add option to time limit the generation phase (#9865) b3912 Georgi Gerganov 2024-10-12 16:14:27 +03:00
  • 1bde94dd02 server : remove self-extend features (#9860) b3911 Georgi Gerganov 2024-10-12 16:06:31 +03:00
  • a34cde99ee server : add option to time limit the generation phase Georgi Gerganov 2024-10-12 15:38:18 +03:00
  • b75afe34c2 server : fix context limit check to use slot.n_past Georgi Gerganov 2024-10-12 15:33:47 +03:00
  • 8a1f4393ee server : remove self-extend Georgi Gerganov 2024-10-12 10:58:20 +03:00
  • 95c76e8e92 server : remove legacy system_prompt feature (#9857) Georgi Gerganov 2024-10-12 14:51:54 +03:00
  • 0db72b63f5 server : fix non-transformer logic + remove response from /props Georgi Gerganov 2024-10-12 09:21:41 +03:00
  • 9ec6b49176 readme : update [no ci] Georgi Gerganov 2024-10-12 09:03:24 +03:00
  • f6dd38c2dd server : remove legacy system_prompt feature Georgi Gerganov 2024-10-12 08:56:14 +03:00
  • 11ac9800af llama : improve infill support and special token detection (#9798) b3909 Georgi Gerganov 2024-10-12 08:21:51 +03:00
  • 943d20b411 musa : update doc (#9856) R0CKSTAR 2024-10-12 13:09:53 +08:00
  • 5303b070fe mtgpu: update doc Xiaodong Ye 2024-10-12 10:35:09 +08:00
  • 2b2ab6fb6f New quant strategy / FTYPE IQ3_XL 4bpw Nexesenex 2024-10-12 02:48:51 +02:00
  • dfe587a5f3 Merge branch 'ggerganov:master' into master MaggotHATE 2024-10-12 00:41:34 +05:00
  • 351354115f Merge branch 'master' into sycl_async_data_load OuadiElfarouki 2024-10-11 17:07:24 +01:00
  • cdc3e78bb6 Merge 41efa86198 into 96776405a1 Sigbjørn Skjæret 2024-10-11 11:49:00 -04:00
  • d2b1d0e7f1 trigger build Yuri Khrustalev 2024-10-11 09:35:31 -04:00
  • 96776405a1 ggml : move more prints to the ggml log system (#9839) b3907 Diego Devesa 2024-10-11 15:34:45 +02:00
  • a85f67db5b change graph alloc fail print from debug to error slaren 2024-10-11 15:22:52 +02:00
  • fc06b628f8 Merge branch 'master' into fix-logging-main.cpp Kurt Manucredo 2024-10-11 15:20:06 +02:00
  • 32d1c6ea13 show BLAS OpenMP warnings in all builds using debug prints slaren 2024-10-11 15:16:17 +02:00
  • bc1bcb1c9d revert backend registering messages to only debug builds slaren 2024-10-11 15:10:21 +02:00
  • 6a9769a260 fix context shifting Xuan Son Nguyen 2024-10-11 14:36:48 +02:00
  • 7740c969d0 fix Xuan Son Nguyen 2024-10-11 13:59:26 +02:00
  • acada1a5e7 Made algorithm safer and more readable MaggotHATE 2024-10-11 15:36:25 +05:00
  • 23214c92cf ggml: avoid rebuild of GGML graph for each token (#7456) Alan Gray 2024-06-04 05:23:13 -07:00
  • 59fd6b6119 fix llama_bench Xuan Son Nguyen 2024-10-11 12:21:20 +02:00
  • 92769503dc fix simple.cpp Xuan Son Nguyen 2024-10-11 12:16:03 +02:00
  • 997031605b Merge branch 'master' into xsn/llama_batch_remove_compat Xuan Son Nguyen 2024-10-11 12:14:47 +02:00
  • 1c486169ed adapt all examples Xuan Son Nguyen 2024-10-11 12:11:00 +02:00
  • b226c5b1a7 refactor llama_batch_get_one Xuan Son Nguyen 2024-10-11 11:48:09 +02:00
  • 36815404c9 gguf : deprecate old FIM token KVs Georgi Gerganov 2024-10-11 10:07:14 +03:00
  • 3ae86704e6 server : update prompt on slot restore (#9800) Georgi Gerganov 2024-10-11 09:16:00 +03:00
  • 0fb9c91f14 llama : add more FIM token strings Georgi Gerganov 2024-10-10 13:37:56 +03:00
  • 3a8a89ac4c llama : improve infill support Georgi Gerganov 2024-10-08 14:24:22 +03:00
  • 3968369071 Fixed labels in old server UI MaggotHATE 2024-10-11 11:53:19 +05:00
  • 7eb1990d4b ggml : move more prints to the ggml log system slaren 2024-10-11 08:29:48 +02:00
  • 882a603bda Merge branch 'master' into master MaggotHATE 2024-10-11 11:26:05 +05:00
  • cb1632b593 llama : adds llama-grammar memorization stacks (#4218) Clarissa Miranda 2024-10-11 12:20:48 +11:00