Commit graph

  • a52e023969 fixed string format to float VJHack 2025-01-19 22:58:42 -06:00
  • 6c1ca58f07 changed sigma to float VJHack 2025-01-19 22:40:54 -06:00
  • 4f1b1f0349 Redded LLM_ARCH_PHIMOE Kyle Bruene 2025-01-19 20:55:08 -06:00
  • fee2aa687a no change convert py caitianchi 2025-01-20 10:40:16 +08:00
  • cf32856334 linenoise.cpp refactoring Eric Curtin 2025-01-19 17:22:43 +00:00
  • 4a7ab89d75 wip minicpmv Xuan Son Nguyen 2025-01-19 22:33:05 +01:00
  • 161c25f776 Merge branch 'ggerganov:master' into master Jianlin Shi 2025-01-19 13:04:21 -07:00
  • 92bc493917 tests : increase timeout when sanitizers are enabled (#11300) Georgi Gerganov 2025-01-19 20:22:30 +02:00
  • 0cc8e0224e run : fix BOS being added to each message Eric Curtin 2025-01-19 17:52:20 +00:00
  • 08a756aea0 tests : add DEFAULT_HTTP_TIMEOUT Georgi Gerganov 2025-01-19 19:28:26 +02:00
  • 9d65a3d65b tests : increase timeout when sanitizers are enabled Georgi Gerganov 2025-01-19 18:47:01 +02:00
  • b9daaffe02 simple-chat : fix BOS being added to each message (#11278) b4510 Georgi Gerganov 2025-01-19 18:12:09 +02:00
  • d0068ef0ed add mobilevlm Xuan Son Nguyen 2025-01-19 16:29:20 +01:00
  • 90a0349349 recommended way to check if the version is 0.3, as requested by ngxson cedo/add-outetts-v0.3 LostRuins Concedo 2025-01-19 21:43:59 +08:00
  • 99487b57d4 SYCL: Introducing memory host pool (#11251) b4509 Nicolò Scipione 2025-01-19 14:33:34 +01:00
  • a4aed1d302 Update cuda.Dockerfile ochafik 2025-01-19 12:33:56 +00:00
  • 60151daef8 Use ccache in Docker CUDA build ochafik 2025-01-19 02:07:58 +00:00
  • 0401a83b9b agent: add --greedy, --top-p, --top-k options ochafik 2025-01-19 02:07:06 +00:00
  • 6cabdda0df add back convert hf to gguf Xuan Son Nguyen 2025-01-18 22:56:04 +01:00
  • c146c3075a fix: llama-mmap: add include for cerrno Christopher Nielsen 2025-01-18 15:57:41 -05:00
  • 0a81051ae2 llama : second attempt to refactor vision API Xuan Son Nguyen 2025-01-18 20:56:35 +01:00
  • 3100a05ba1 ggml: reserve in gguf_writer and added const pointers as params Herman Semenov 2025-01-18 21:51:44 +03:00
  • c207fdcde6 Merge branch 'jinja' into tool-call ochafik 2025-01-18 18:05:11 +00:00
  • cc50356470 minja: fix vigogne (https://github.com/google/minja/pull/22) ochafik 2025-01-18 17:55:04 +00:00
  • 9a2380ec32 ggml: align structures for 64bit, reorder params and ignore error-warn for Clang 19 Herman Semenov 2025-01-18 19:33:02 +03:00
  • 1366444896 fix editorconfig-checker caitianchi 2025-01-18 23:49:48 +08:00
  • e3c475cd12 Disable jinja test that has a cryptic windows failure ochafik 2025-01-18 14:55:27 +00:00
  • d6f058da8c Merge branch 'jinja' into tool-call ochafik 2025-01-18 14:54:57 +00:00
  • a1649cc13f Adding linenoise.cpp to llama-run (#11252) b4508 Eric Curtin 2025-01-18 14:42:31 +00:00
  • 4dd34ff831 cmake : add sanitizer flags for llama.cpp (#11279) Georgi Gerganov 2025-01-18 16:18:15 +02:00
  • 27d0ec81ea update fix code caitianchi 2025-01-18 22:12:13 +08:00
  • f30f099228 server : implement cancellable request (#11285) b4506 Xuan Son Nguyen 2025-01-18 14:12:05 +01:00
  • 0e74c9dabe Add missing optional include to server.cpp ochafik 2025-01-18 11:58:00 +00:00
  • fc60802b6e Rm unused optional include ochafik 2025-01-18 11:35:54 +00:00
  • 76893f5880 Merge branch 'jinja' into tool-call ochafik 2025-01-18 11:26:56 +00:00
  • 06dfcdfcc0 fix i underflow Xuan Son Nguyen 2025-01-18 12:20:21 +01:00
  • 2a458d1a9d wip Xuan Son Nguyen 2025-01-18 12:19:25 +01:00
  • f26c874179 scripts : restore hf.sh (#11288) Georgi Gerganov 2025-01-18 13:18:32 +02:00
  • d3dde49b1a scripts : restore hf.sh Georgi Gerganov 2025-01-18 13:02:57 +02:00
  • b5486956ff added rudimentary support for outetts v0.3 500m and 1b models Concedo 2025-01-18 18:48:49 +08:00
  • 5074e6fecd Fix copy elision warning ochafik 2025-01-18 10:48:03 +00:00
  • 33322e823e Flush stdout in chat template before potential crash ochafik 2025-01-18 10:38:21 +00:00
  • e63520f37a Forward decl minja::chat_template to avoid eager json dep ochafik 2025-01-18 10:37:56 +00:00
  • e9c43b852d cmake : add status messages [no ci] Georgi Gerganov 2025-01-18 12:33:18 +02:00
  • 6390a998bf tts : add guide tokens support (#11186) b4504 LostRuins Concedo 2025-01-18 18:20:57 +08:00
  • d076b6acd0 ci : use sanitizer builds only in Debug mode Georgi Gerganov 2025-01-18 12:12:53 +02:00
  • 9945478438 unicode : silence gcc warnings Georgi Gerganov 2025-01-18 12:01:41 +02:00
  • 84aef8dfa8 dummy : trigger ggml-ci Georgi Gerganov 2025-01-18 11:30:25 +02:00
  • a3a2a064b7 gguf-test: tensor data comparison Johannes Gäßler 2025-01-18 09:49:47 +01:00
  • c318e0f209 httplib 0.18.5 Xuan Son Nguyen 2025-01-18 10:24:23 +01:00
  • 1131522211 Merge 1bc896fede into 44e18ef939 Xuan Son Nguyen 2025-01-18 00:53:27 -08:00
  • ba421dd04e gguf-test: tensor data comparison jg/llama-sanitize Johannes Gäßler 2025-01-18 09:49:47 +01:00
  • 44e18ef939 vulkan: fix coopmat2 flash attention for non-contiguous inputs (#11281) b4503 Jeff Bolz 2025-01-18 02:26:50 -06:00
  • a6013ded42 applied linting suggestions, updated to latest llama_vocab changes, added a safety check, added newline to guide token start Concedo 2025-01-18 12:53:18 +08:00
  • 9fa30422dc Merge branch 'master' into cedo/tts-guide-tokens Concedo 2025-01-18 11:27:08 +08:00
  • ee1e10e21e Normalize newlines in test-chat-templates for windows tests ochafik 2025-01-18 02:52:40 +00:00
  • acf7c240d8 tools: run tool call slow tests when SLOW_TESTS=1 (+ prefetch models) ochafik 2025-01-18 02:39:37 +00:00
  • 259d9e4511 tools: greedy sampling in tests ochafik 2025-01-18 02:39:10 +00:00
  • 2ceabee0f8 Fix fetch_server_test_models.py (avoid conv trap) ochafik 2025-01-18 01:36:46 +00:00
  • 045edd1d7e Merge branch 'jinja' into tool-call ochafik 2025-01-18 01:04:57 +00:00
  • d5fa351a24 Revert LLAMA_CHATML_TEMPLATE refactor ochafik 2025-01-18 01:04:12 +00:00
  • 138a4ba83f Merge branch 'jinja' into tool-call ochafik 2025-01-18 00:59:10 +00:00
  • 81c0d437a5 Attempt to fix linkage of LLAMA_CHATML_TEMPLATE ochafik 2025-01-18 00:56:19 +00:00
  • 40db78963b Merge remote-tracking branch 'origin/master' into jinja ochafik 2025-01-18 00:44:37 +00:00
  • b75d0622e4 Refactor common_chat_* functions to accept minja template + use_jinja option ochafik 2025-01-18 00:43:38 +00:00
  • 3c7784c51c Refactor common_chat_* functions to accept minja template + use_jinja option ochafik 2025-01-18 00:13:16 +00:00
  • 0d7245aa46 fix: Only flatten to Q8_0 if the raw target type is a quantization Gabe Goodhart 2025-01-17 16:34:21 -07:00
  • 059325d9b4 Merge branch 'ggerganov:master' into master Jianlin Shi 2025-01-17 15:57:22 -07:00
  • bdc22d606e fix typo Xuan Son Nguyen 2025-01-17 22:32:25 +01:00
  • 8ebe9b23f6 server : implement cancellable request Xuan Son Nguyen 2025-01-17 22:19:08 +01:00
  • 614c6e6544 fix: Use Q8_0 for all embedding quantizations for granite and granitemoe Gabe Goodhart 2025-01-17 11:23:29 -07:00
  • b2d861ba77 vulkan: fix coopmat2 flash attention for non-contiguous inputs Jeff Bolz 2025-01-16 22:30:01 -06:00
  • 7000623c00 tests : fix gguf context use in same_tensor_data Georgi Gerganov 2025-01-17 16:26:12 +02:00
  • e872097c35 cmake : apply only sanitizer flags at top level Georgi Gerganov 2025-01-17 15:48:39 +02:00
  • 9d1b20ad1a cmake : move llama.cpp compile flags to top level lists Georgi Gerganov 2025-01-17 15:40:03 +02:00
  • 9a03bc811f cmake : move sanitizer flags to llama_add_compile_flags Georgi Gerganov 2025-01-17 15:33:36 +02:00
  • ce293d837c tests : fix compile warnings Georgi Gerganov 2025-01-17 15:22:36 +02:00
  • 72dc7bff4d cmake : add sanitizer flags for llama.cpp Georgi Gerganov 2025-01-17 15:18:24 +02:00
  • 0ea354ccf3 Adding linenoise.cpp to llama-run Eric Curtin 2025-01-15 13:25:35 +00:00
  • 3edfa7d375 llama.android: add field formatChat to control whether to parse special tokens when send message (#11270) b4502 codezjx 2025-01-17 20:57:56 +08:00
  • 10d41f2ede simple-chat : fix BOS being added to each message Georgi Gerganov 2025-01-17 12:31:41 +02:00
  • 667d72846c rpc : early register backend devices (#11262) b4501 Radoslav Gerganov 2025-01-17 10:57:09 +02:00
  • b38e86f4d0 rpc : early register backend devices Radoslav Gerganov 2025-01-16 10:55:11 +02:00
  • a133566d34 vocab : fix double-eos check (#11273) b4500 Georgi Gerganov 2025-01-17 09:28:00 +02:00
  • c6123e69b0 added top-k sampler to improve performance VJHack 2025-01-17 01:17:40 -06:00
  • 960ec65273 llama : fix deprecation message: vocabable -> vocab (#11269) b4499 David Renshaw 2025-01-17 02:12:01 -05:00
  • bf2dab556f vocab : fix double-eos check Georgi Gerganov 2025-01-17 08:47:17 +02:00
  • f1b07d65f2 llama.android: add field formatChat to control whether to parse special tokens when send message codezjx 2025-01-17 11:03:12 +08:00
  • 406a5dd868 fix deprecation message: vocabable -> vocab David Renshaw 2025-01-16 19:28:00 -05:00
  • 7a689c415e README : added kalavai to infrastructure list (#11216) musoles 2025-01-17 00:10:49 +00:00
  • bd38ddea01 vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11166) b4497 Jeff Bolz 2025-01-16 15:47:10 -06:00
  • 466300fe14 vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (#11206) Jeff Bolz 2025-01-16 15:23:49 -06:00
  • 206bc53422 vulkan: optimize coopmat2 q2_k dequant function (#11130) Jeff Bolz 2025-01-16 15:16:39 -06:00
  • 4dbc8b9cb7 llama : add internlm3 support (#11233) RunningLeon 2025-01-17 02:10:38 +08:00
  • 9c8dcefe17 CUDA: backwards pass for misc. ops, add tests (#11257) b4493 Johannes Gäßler 2025-01-16 16:43:38 +01:00
  • 066f6cf3e1 fix pot. int overflows Johannes Gäßler 2025-01-16 15:31:37 +01:00
  • 1120d94b60 remove restrict from pointers Johannes Gäßler 2025-01-16 13:56:39 +01:00
  • 681149ced2 llama : add llama_model_load_from_splits (#11255) Xuan Son Nguyen 2025-01-16 13:54:08 +01:00
  • 49822bab15 update Xuan Son Nguyen 2025-01-16 12:44:21 +01:00
  • 963b685075 Address PR review feedback - remove warning nscipione 2025-01-16 11:22:56 +01:00