Commit graph

  • 3b82c9c7b0 cuda : ignore peer access already enabled errors slaren 2024-02-19 21:37:49 +01:00
  • e2b8d731bc server: health endpoint configurable failure on no slot Pierrick HYMBERT 2024-02-19 20:12:41 +01:00
  • 2c225c9598 server: use llama_chat_apply_template ngxson 2024-02-19 19:39:51 +01:00
  • 2976a05d3f Adjust macro for VMM Daniel Hiltgen 2024-02-19 09:17:00 -08:00
  • 8388e73b72 Add quant type GGML_TYPE_IQ1_S to unsupported Aidan 2024-02-19 16:51:06 +00:00
  • 69cb1ef0b1 Merge branch 'ggerganov:master' into server_branch pudepiedj 2024-02-19 16:38:20 +00:00
  • ea0e8ac758 Merge branch 'server_branch' of https://github.com/pudepiedj/llama.cpp into server_branch pudepiedj 2024-02-19 16:37:02 +00:00
  • efe38c636f server changes pudepiedj 2024-02-19 16:36:59 +00:00
  • 0ffba498a1 expose llava_image_embed_make_with_clip_img CJ Pais 2024-02-19 07:30:36 -08:00
  • 1c9b186452 Merge 4fb1bd85a0 into 9d679f0fcc Johannes Gäßler 2024-02-19 16:18:31 +02:00
  • 9d679f0fcc examples : support minItems/maxItems in JSON grammar converter (#5039) b2203 nopperl 2024-02-19 14:14:07 +00:00
  • c1362cbd63 Update examples/json-schema-to-grammar.py Georgi Gerganov 2024-02-19 16:13:25 +02:00
  • 1387cf60f7 llava : remove extra cont (#5587) b2202 Georgi Gerganov 2024-02-19 15:23:17 +02:00
  • 6fd413791a llava : replace ggml_cpy with ggml_cont b2201 slaren 2024-02-19 14:02:36 +01:00
  • 337c9cbd52 sync : ggml Georgi Gerganov 2024-02-19 14:54:21 +02:00
  • a3145bdc30 ggml-alloc : apply ggml/731 Georgi Gerganov 2024-02-19 14:53:48 +02:00
  • 890559ab28 metal : option to embed MSL source into compiled binary (whisper/1842) Didzis Gosko 2024-02-11 16:41:41 +02:00
  • e388894a10 llava : replace ggml_cpy with ggml_cont slaren 2024-02-19 14:02:36 +01:00
  • 2bf201545f sync : ggml Georgi Gerganov 2024-02-19 14:54:21 +02:00
  • 71e8fca158 ggml-alloc : apply ggml/731 Georgi Gerganov 2024-02-19 14:53:48 +02:00
  • 5d3631b32c metal : option to embed MSL source into compiled binary (whisper/1842) Didzis Gosko 2024-02-11 16:41:41 +02:00
  • 491e11b283 Merge branch 'ggerganov:master' into server_branch pudepiedj 2024-02-19 12:48:37 +00:00
  • d0e3ce51f4 ci : enable -Werror for CUDA builds (#5579) b2197 Georgi Gerganov 2024-02-19 14:45:41 +02:00
  • aaffb2387f Merge branch 'ggerganov:master' into server_branch pudepiedj 2024-02-19 12:44:51 +00:00
  • 8a4d202957 minor changes pudepiedj 2024-02-19 12:41:27 +00:00
  • 7f0d8987eb minor updates and TCPshellscript pudepiedj 2024-02-19 12:14:23 +00:00
  • c3520e9327 make, cmake : enable CUDA errors on warnings Georgi Gerganov 2024-02-19 14:10:45 +02:00
  • 46fd273724 cmake : pass -Werror through -Xcompiler Georgi Gerganov 2024-02-19 09:35:26 +02:00
  • 3fc455584f Use IQ4_NL instead of Q4_K when using k-quants is not possible Iwan Kawrakow 2024-02-19 13:42:13 +02:00
  • 68a6b98b3c make : fix CUDA build (#5580) b2196 Georgi Gerganov 2024-02-19 13:41:51 +02:00
  • f249c997a8 llama : adapt to F16 KQ_pos gg/flash-attn-sync Georgi Gerganov 2024-02-19 13:10:24 +02:00
  • 4bd8334f8d Update README.md Sol6e 2024-02-19 14:06:40 +03:00
  • 31109ca00a Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-02-19 12:58:18 +02:00
  • 427d7a4d3a squash! ci: add g++-10 and gcc-10 to the main-cuda Daniel Bevenius 2024-02-19 11:53:26 +01:00
  • 5d7d353707 fix bug Abhilash Majumder 2024-02-16 09:16:50 +05:30
  • 31bc422b55 revert suggestion on macro Abhilash Majumder 2024-02-16 09:03:54 +05:30
  • f58d29b104 Apply suggestions from code review AidanBeltonS 2024-02-15 12:09:29 +00:00
  • d8eba3cebb Update ggml_sycl_op_mul_mat_vec_q Aidan 2024-02-14 17:31:09 +00:00
  • 70d45af0ef readme : fix typo in README-sycl.md (#5353) valiray 2024-02-19 02:37:10 -08:00
  • e7b999c3dc iq4_nl: another fix after merging with master Iwan Kawrakow 2024-02-19 11:40:35 +02:00
  • 1d90021241 iq4_nl: Fix after merging with master Iwan Kawrakow 2024-02-19 11:29:26 +02:00
  • 412735ec70 Merge branch 'master' into gg/metal-batched gg/metal-batched Georgi Gerganov 2024-02-19 11:25:24 +02:00
  • 9b0d3a857e iq4_nl: squash commits for easier rebase Iwan Kawrakow 2024-02-19 10:45:31 +02:00
  • 13e2c771aa cmake : remove obsolete sycl compile flags (#5581) b2194 Abhilash Majumder 2024-02-19 14:45:18 +05:30
  • a706d1f523 format fix Abhilash Majumder 2024-02-19 14:38:59 +05:30
  • 094207e19d fix bug Abhilash Majumder 2024-02-19 14:37:35 +05:30
  • febdf14d36 fix bug Abhilash Majumder 2024-02-19 14:36:39 +05:30
  • 27899c1c79 Merge branch 'ggerganov:master' into sycl_cmake Abhilash Majumder 2024-02-19 14:28:34 +05:30
  • 600d517cde ci: add g++-10 and gcc-10 to the main-cuda Daniel Bevenius 2024-02-19 09:37:42 +01:00
  • f53119cec4 minor : fix trailing whitespace (#5538) b2193 Georgi Gerganov 2024-02-19 10:34:10 +02:00
  • 7084755396 llava : avoid changing the original BakLLaVA model (#5577) Daniel Bevenius 2024-02-19 09:31:59 +01:00
  • 4480542b22 baby-llama : allocate graphs in ggml_context (#5573) b2191 NawafAlansari 2024-02-19 03:25:38 -05:00
  • bf2976591e minor : fix whitespaces Georgi Gerganov 2024-02-19 10:25:02 +02:00
  • 11b12de39b llama : add llama_chat_apply_template() (#5538) b2190 Xuan Son Nguyen 2024-02-19 09:23:37 +01:00
  • 3a9cb4ca64 cuda, metal : fix nans in soft_max (#5574) b2189 slaren 2024-02-19 09:04:45 +01:00
  • 76d9cbf60e metal : fix nans in soft_max Georgi Gerganov 2024-02-19 10:00:16 +02:00
  • 769a716e30 readme : update (#5572) Mirko185 2024-02-19 08:39:31 +01:00
  • f0d1fafc02 ggml : android and old glibc NUMA incompatibility bugfixes (#5557) b2187 bmwl 2024-02-18 23:38:32 -08:00
  • 3ceebe945b llava: avoid chaging the original BakLLaVA model Daniel Bevenius 2024-02-19 07:37:42 +01:00
  • 657e6b176b rm unwanted sycl compile options Abhilash Majumder 2024-02-19 11:13:52 +05:30
  • 9ee3ba6b0f Fix cuda memory leaks Daniel Hiltgen 2024-02-17 10:13:44 -08:00
  • e05bd2f0ab Merge branch 'ggerganov:master' into master bmwl 2024-02-18 14:37:06 -08:00
  • 558d5d4b08 cuda : fix nans in soft_max slaren 2024-02-18 23:31:23 +01:00
  • 21b857ee36 Fixed the baby-llama issue (see issue #4830) nma5214 2024-02-18 17:22:24 -05:00
  • 83670169d2 Update README.md Mirko185 2024-02-18 23:12:30 +01:00
  • fb16cc9e9f remove todo CJ Pais 2024-02-18 13:30:08 -08:00
  • e801037de6 remove c++ style from header CJ Pais 2024-02-18 13:25:49 -08:00
  • a0c2dad9d4 build : pass all warning flags to nvcc via -Xcompiler (#5570) b2186 Jared Van Bortel 2024-02-18 16:21:52 -05:00
  • b51b78ea90 make : fix incorrect GF_CC_VER for CUDA host compiler Jared Van Bortel 2024-02-18 16:08:42 -05:00
  • 649f6f8290 llama_chat_apply_template: change variable name to "tmpl" ngxson 2024-02-18 22:06:44 +01:00
  • d42a8bf134 make : fix apparent mis-merge from #3952 Jared Van Bortel 2024-02-18 16:04:11 -05:00
  • 498fb769a5 Merge branch 'ggerganov:master' into master bmwl 2024-02-18 13:03:26 -08:00
  • 14278f55d2 ggml : restore vec dot stride arg names (#5453) b2185 Georgi Gerganov 2024-02-18 22:58:57 +02:00
  • 73fbd67901 llama_chat_apply_template: use term "chat" everywhere ngxson 2024-02-18 21:44:47 +01:00
  • 92fb52d456 build: pass all warning flags to nvcc via -Xcompiler Jared Van Bortel 2024-02-18 15:37:03 -05:00
  • 47c662b0de fix some spaces added by IDE in math op gg/rename-n_ctx Pierrick HYMBERT 2024-02-18 21:04:04 +01:00
  • 606873401c rename n_ctx to kv_size Pierrick HYMBERT 2024-02-18 20:59:26 +01:00
  • ef96e8b1f7 server: document the --ctx-size deprecation in server README.md Pierrick HYMBERT 2024-02-18 11:20:34 +01:00
  • 9a0695671d server: rename legacy --ctx-size to --kv-size Pierrick HYMBERT 2024-02-17 11:49:42 +01:00
  • b1de96824b ci : fix wikitext url + compile warnings (#5569) b2184 Georgi Gerganov 2024-02-18 22:39:30 +02:00
  • d03f66f24e ci : fix wikitext url + compile warnings Georgi Gerganov 2024-02-18 22:05:33 +02:00
  • d7d4d971a1 fix some spaces added by IDE in math op Pierrick HYMBERT 2024-02-18 21:04:04 +01:00
  • c8e172a35d rename n_ctx to kv_size Pierrick HYMBERT 2024-02-18 20:59:26 +01:00
  • 7ad554f90e metal : fix unused warnings (#0) Georgi Gerganov 2024-02-18 21:39:58 +02:00
  • 5ee99c32f5 common, server : surface min_keep as its own parameter (#5567) b2182 Robey Holderith 2024-02-18 11:11:16 -08:00
  • d2f97227ba Merge remote-tracking branch 'origin/master' into server_branch pudepiedj 2024-02-18 19:00:06 +00:00
  • d5b236caf8 Updated README with min_keep param Robey Holderith 2024-02-18 10:58:28 -08:00
  • 894adc8604 Feature - surface min_keep as its own parameter Robey Holderith 2024-02-18 10:51:10 -08:00
  • 63d186214e Merge branch 'ggerganov:master' into master bmwl 2024-02-18 09:59:14 -08:00
  • 12addf2d5f server: document the --ctx-size deprecation in server README.md Pierrick HYMBERT 2024-02-18 11:20:34 +01:00
  • d54696fed9 server: rename legacy --ctx-size to --kv-size Pierrick HYMBERT 2024-02-17 11:49:42 +01:00
  • c145f8a132 server : slots monitoring endpoint (#5550) b2181 Pierrick Hymbert 2024-02-18 18:39:57 +01:00
  • 689a091bbe sampling : do not set min_keep to n_probs (#5564) b2180 Georgi Gerganov 2024-02-18 19:38:06 +02:00
  • 7a98f62f6d server edit pudepiedj 2024-02-18 17:32:02 +00:00
  • f3f28c5395 cmake : fix GGML_USE_SYCL typo (#5555) b2179 Georgi Gerganov 2024-02-18 19:17:00 +02:00
  • 491cf7994a server: slots monitoring endpoint Pierrick HYMBERT 2024-02-17 15:43:44 +01:00
  • 20906de171 n_probs should be view only Robey Holderith 2024-02-18 08:49:09 -08:00
  • bad3de0511 server with flag pudepiedj 2024-02-18 16:35:26 +00:00
  • e75c6279d1 server : enhanced health endpoint (#5548) b2178 Pierrick Hymbert 2024-02-18 17:31:28 +01:00
  • 36376abe05 server : --n-predict option document and cap to max value (#5549) b2177 Pierrick Hymbert 2024-02-18 17:30:09 +01:00
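A flat listing in roughly this format (abbreviated hash, subject line, author, date) can be reproduced with `git log`; the exact column order and graph rendering of the viewer that produced the listing above are assumptions here, but the placeholders used are standard git pretty-format codes:

```shell
# Print each commit as: <short hash> <subject> <author> <ISO date>
# --graph draws the branch/merge topology down the left margin,
# which is where the merge commits above get their bullet structure.
git log --graph --date=iso --pretty=format:'%h %s %an %ad'
```

Run from a checkout of the repository, this walks the same history shown above, including the merge commits from the `server_branch` and `sycl_cmake` topic branches.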