Commit graph

  • 2ddc9bbef1 Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-01-31 18:49:43 +02:00
  • d3bac7d584 llama : reorder build_orion() at correct place (#5118) b2036 Georgi Gerganov 2024-01-31 18:47:10 +02:00
  • 3d0a552359 docs: clarify install vulkan SDK outside docker ngxson 2024-01-31 17:38:49 +01:00
  • 99892be490 Fix out of bounds memory clobber in GGML_OP_ROPE Justine Tunney 2024-01-31 08:20:53 -08:00
  • 5cb04dbc16 llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240) b2035 Georgi Gerganov 2024-01-31 17:30:17 +02:00
  • 6dfcb42afd llama : remove gpu includes from llama.h Georgi Gerganov 2024-01-31 17:10:25 +02:00
  • 1139b66f2e readme : change deprecation notice to "remove" and fix url Georgi Gerganov 2024-01-31 16:59:18 +02:00
  • ddfbc23ca9 docs: sycl: add docker section ngxson 2024-01-31 15:58:02 +01:00
  • 3cedb7ef4b readme : add deprecation notice Georgi Gerganov 2024-01-31 16:50:02 +02:00
  • 380eabd828 Merge branch 'master' into xsn/docs-sycl-vulkan ngxson 2024-01-31 15:46:57 +01:00
  • aa71356dc8 train : remove LLAMA_SUPPORTS_GPU_OFFLOAD Georgi Gerganov 2024-01-31 16:44:51 +02:00
  • 8bfb0b6a64 llama : remove LLAMA_SUPPORTS_GPU_OFFLOAD Georgi Gerganov 2024-01-31 16:38:00 +02:00
  • 3180a6468f server : remove LLAMA_MAX_DEVICES Georgi Gerganov 2024-01-31 16:15:08 +02:00
  • ffa4293245 Update llama.cpp Georgi Gerganov 2024-01-31 16:13:41 +02:00
  • 43312b2039 llama : remove LLAMA_MAX_DEVICES from llama.h Georgi Gerganov 2024-01-31 15:51:23 +02:00
  • efb7bdbbd0 metal : add im2col F32 dst support (#5132) b2034 Georgi Gerganov 2024-01-31 15:35:41 +02:00
  • 15606309a0 llava : add MobileVLM support (#5132) b2033 JidongZhang-THU 2024-01-31 21:10:15 +08:00
  • 18fd0b0ccc test-backend-ops : add dst_type to im2col slaren 2024-01-31 14:06:55 +01:00
  • b2b9f025e7 format license text, restore apache license by legal suggestion (#5233) b2032 Neo Zhang Jianyu 2024-01-31 21:04:46 +08:00
  • dabcc5b471 ggml : limit n_threads to the max n_tasks (#5238) b2031 slaren 2024-01-31 13:43:03 +01:00
  • 7a04afde66 ggml : limit n_threads to the max n_tasks slaren 2024-01-31 13:00:41 +01:00
  • f8e9140cb4 Vulkan Fixes (#5223) b2030 0cc4m 2024-01-31 11:44:19 +01:00
  • 7c8cf29925 Fix small matrix multiplication errors in AMD GPUs on Windows or with amdvlk 0cc4m 2024-01-31 06:26:32 +01:00
  • 77223b1d58 Merge branch 'master' into support-xcomposer2 John 2024-01-31 04:32:32 +01:00
  • 558007bc3b format license text, restore apache license by legal suggestion Zhang 2024-01-31 11:29:05 +08:00
  • 0285b88c2b xx John 2024-01-31 03:14:13 +00:00
  • d62520eb2c Fix typos of IQ2_XXS and IQ3_XXS in llama.cpp (#5231) b2029 Yiming Cui 2024-01-31 11:04:21 +08:00
  • 5188ea09d6 xcomposer2 support - loading of all lora tensors in correct shape [done] - make llm inference possible [done] - make clip conversion possible [done] - dynamic shape for lora tensors [] - add image token mask [] - add conditional "lora" on tensors during inference John 2024-01-31 03:01:43 +00:00
  • b35da1d812 Update llama.cpp Yiming Cui 2024-01-31 10:56:40 +08:00
  • 01684139c3 support SYCL backend windows build (#5208) b2028 Neo Zhang Jianyu 2024-01-31 10:38:07 +08:00
  • 66dd123b0f pool2d float padding fallback zhangjidong 2024-01-31 10:22:09 +08:00
  • 7fe612dc73 added docs explaining the rollback op l3utterfly 2024-01-31 10:47:24 +09:00
  • d5f4dec935 preserve "prev" length during rollback l3utterfly 2024-01-31 10:42:28 +09:00
  • a5c3f30cb6 fixed editor config check l3utterfly 2024-01-31 10:39:13 +09:00
  • b72292682b fix format issuse Zhang 2024-01-31 09:35:08 +08:00
  • 1d6e21acce Merge pull request #1 from netrunnereve/ci Eve 2024-01-31 01:20:02 +00:00
  • a6ea06faee vulkan ci Eve 2024-01-31 00:53:20 +00:00
  • f131a6200e Merge branch 'ggerganov:master' into master Eve 2024-01-31 00:22:34 +00:00
  • e8dc55d006 kompute : llama-bench support and ggml_cpu_has_kompute() (#5226) b2027 Jared Van Bortel 2024-01-30 19:04:37 -05:00
  • 722e483ad6 docs: remove trailing spaces ngxson 2024-01-31 00:25:40 +01:00
  • 1b2d22f3de docs: sycl build in docker ngxson 2024-01-31 00:10:44 +01:00
  • c3a0d28afb add docs for vulkan ngxson 2024-01-31 00:10:32 +01:00
  • 85bb983c3c fix vulkan dockerfile ngxson 2024-01-31 00:09:46 +01:00
  • b86d0ac514 intel dockerfile: compile sycl by default ngxson 2024-01-31 00:02:54 +01:00
  • 6d3a06717f add vulkan dockerfile ngxson 2024-01-31 00:02:31 +01:00
  • 3536cf6000 llama : remove obsolete set of n_threads=1 Jared Van Bortel 2024-01-30 16:37:00 -05:00
  • e3b420a407 kompute : llama-bench support and ggml_cpu_has_kompute() Jared Van Bortel 2024-01-30 16:21:47 -05:00
  • 3b0f74b428 latest kernel update, wrong values FSSRepo 2024-01-30 14:57:12 -05:00
  • 3d03bcb7af Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-01-30 21:49:13 +02:00
  • 78df5527e4 tests : ifdef Georgi Gerganov 2024-01-30 21:46:49 +02:00
  • d073e4f933 metal : fix array initialization Georgi Gerganov 2024-01-30 21:45:32 +02:00
  • e0085fdf7c Revert "server : change deps.sh xxd files to string literals (#5221)" b2026 Georgi Gerganov 2024-01-30 21:19:26 +02:00
  • ffe841cdd9 Fix bug in Vulkan CPY op 0cc4m 2024-01-30 19:57:10 +01:00
  • e6f291d158 server : fix context shift (#5195) Georgi Gerganov 2024-01-30 20:17:30 +02:00
  • 3c9b68df23 Add Vulkan to common.cpp dump_non_result_info_yaml function 0cc4m 2024-01-30 19:03:47 +01:00
  • 54b1181bed Fix Vulkan context shift crash 0cc4m 2024-01-30 19:02:48 +01:00
  • 4003be0e5f server : change deps.sh xxd files to string literals (#5221) JohnnyB 2024-01-30 12:15:05 -06:00
  • 98bd9a73fa XXD to string literals. JohnnyB 2024-01-30 12:07:45 -06:00
  • df5cc0b27c Comment on removing xxd. JohnnyB 2024-01-30 18:04:51 +00:00
  • fccec2a813 Dashes in literal variable names. JohnnyB 2024-01-30 18:04:13 +00:00
  • fd0ad6e521 Changed ugly xxd to literals. JohnnyB 2024-01-30 17:56:21 +00:00
  • 47cba0d23b fix format Abhilash Majumder 2024-01-30 23:15:41 +05:30
  • eb76719eea Fix Vulkan F16 models 0cc4m 2024-01-30 18:22:18 +01:00
  • fea4fd4ba7 ggml : fix IQ3_XXS on Metal (#5219) Kawrakow 2024-01-30 19:15:28 +02:00
  • 05350f2862 server : rever n_past_se changes Georgi Gerganov 2024-01-30 19:05:36 +02:00
  • 719a087138 iq3_xxs: forgotten update of the grid points ik/fix_iq3xxs_metal Iwan Kawrakow 2024-01-30 18:39:07 +02:00
  • d33f030896 separated sampling reset and sampling reset grammer only l3utterfly 2024-01-31 01:03:31 +09:00
  • 70074f6f10 added llama_sampling_rollback api l3utterfly 2024-01-31 01:02:34 +09:00
  • 5960879814 Update reader.py John 2024-01-30 16:07:01 +01:00
  • e86dff5818 Update reader.py John 2024-01-30 15:54:30 +01:00
  • 8f8ddfcfad sync : ggml (#0) b2022 Georgi Gerganov 2024-01-30 16:21:57 +02:00
  • 6fb50ebbf0 gguf : fix comparison (ggml/715) Georgi Gerganov 2024-01-29 21:08:18 +02:00
  • 625a699b54 ggml_cuda_cpy support for 4d tensors and float16->float32 upcasting (ggml/686) John Balis 2024-01-29 06:37:33 -06:00
  • a4b07c057a gguf : add input validation, prevent integer overflows (ggml/709) Georgi Gerganov 2024-01-29 14:00:10 +02:00
  • 549a1e6cd5 ci : fix yolo URLs + fix metal capture (ggml/712) Georgi Gerganov 2024-01-29 13:29:46 +02:00
  • 5f14ee0b0c metal : add debug capture backend function (ggml/694) Jack Mousseau 2024-01-29 01:22:23 -08:00
  • ca4ec6d867 Add assert in ggml_cuda_op_pool2d zhangjidong 2024-01-30 22:14:50 +08:00
  • ccd4e4c1da Create reader.py John 2024-01-30 15:07:09 +01:00
  • 2fc88aa6e1 Update CMakeLists.txt John 2024-01-30 14:54:26 +01:00
  • 0d94da7cbb cuda : more style fixes slaren 2024-01-30 14:52:54 +01:00
  • 8824e42786 test-backend-ops : remove f16 pool_2d tests slaren 2024-01-30 14:50:26 +01:00
  • bdf3b8ad70 ggml : check types in release builds too in pool_2d slaren 2024-01-30 14:49:55 +01:00
  • a8f58222fb fix format Abhilash Majumder 2024-01-30 19:01:24 +05:30
  • 8e14e3ddb3 Faster AVX2 dot product for IQ2_XS (#5187) b2016 Kawrakow 2024-01-30 15:15:07 +02:00
  • f4d7e54974 SOTA 3-bit quants (#5196) b2015 Kawrakow 2024-01-30 15:14:12 +02:00
  • caf2fc8294 cuda : fix warnings and formatting slaren 2024-01-30 14:05:35 +01:00
  • 04f10a2287 test-backend-ops : add more pool_2d tests slaren 2024-01-30 14:03:49 +01:00
  • 2256f36b79 Vulkan Windows APU Memory Handling (#5199) b2014 0cc4m 2024-01-30 13:59:30 +01:00
  • 49f09aa72c fix avg pooling, count_include_pad zhangjidong 2024-01-30 20:36:08 +08:00
  • d0e10bf1b2 server : more n_past fixes Georgi Gerganov 2024-01-30 13:22:33 +02:00
  • 5f1d91fbfe add newline Abhilash Majumder 2024-01-30 17:23:31 +05:30
  • 969f257383 fix format Abhilash Majumder 2024-01-30 17:19:53 +05:30
  • fb6576bc18 Add IQ3_XXS to test-backend-ops Iwan Kawrakow 2024-01-30 13:43:20 +02:00
  • 8772d3ee63 server : take system_tokens into account Georgi Gerganov 2024-01-29 15:52:18 +02:00
  • 51bb7f0eef server : fix context shift + simplify self-extend Georgi Gerganov 2024-01-29 14:58:40 +02:00
  • 7359016c7c quantize : fix typo (#5211) b2013 Vladimir Malyutin 2024-01-30 17:57:07 +07:00
  • 2f1262f46e Merge branch 'sycl_win_build' of https://github.com/NeoZhangJianyu/llama.cpp into sycl_win_build Zhang 2024-01-30 17:40:53 +08:00
  • 3e5b2eb163 allow to trigger manually, fix format issue Zhang 2024-01-30 17:39:32 +08:00
  • d6bf5fcf28 Update quantize.cpp Vladimir Malyutin 2024-01-30 16:25:47 +07:00
  • 813416991a main : allow empty --prompt-cache file (#5176) b2012 divinity76 2024-01-30 10:18:02 +01:00