Commit graph

  • ad52d5c259 doc: add references to hugging face GGUF-my-repo quantisation web tool. (#7288) Vaibhav Srivastav 2024-05-16 07:38:43 +02:00
  • 172b78210a ci: fix bin/Release path for windows-arm64 builds (#7317) b2897 Max Krasnyansky 2024-05-15 22:36:43 -07:00
  • e910cea4f3 ci: fix bin/Release path for windows-arm64 builds Max Krasnyansky 2024-05-15 21:36:43 -07:00
  • 13ad16af12 Add support for properly optimized Windows ARM64 builds with LLVM and MSVC (#7191) Max Krasnyansky 2024-05-15 19:47:36 -07:00
  • d467139e44 ROCm: use native CMake HIP support Gavin Zhao 2024-03-09 12:21:27 -05:00
  • 2bfab835e6 debug-test.sh: fix gdb brian khuu 2024-05-16 09:23:42 +10:00
  • 5de9b743f8 sched : support async weight copy sl/async-weight-copy slaren 2024-05-14 20:41:33 +02:00
  • f59edeeae9 increase eps to 1e-7 slaren 2024-05-16 00:04:02 +02:00
  • cc0332dfd7 CUDA: faster large batch FA without tensor cores Johannes Gäßler 2024-05-13 21:24:11 +02:00
  • 8f7080bf48 readme : remove stray double quote (#7310) Daniel Bevenius 2024-05-15 23:41:03 +02:00
  • 6fa6a9a10a ggml : fix quants nans when all the group weights are very close to zero slaren 2024-05-15 23:26:11 +02:00
  • f1bac2e995 readme : remove stray double quote Daniel Bevenius 2024-05-15 20:49:30 +02:00
  • 76af67c853 Update README.md Vaibhav Srivastav 2024-05-15 20:48:09 +02:00
  • e1b40ac3b9 ggml : use dynamic thread scheduling for matrix multiplication (#6915) b2894 kunnis 2024-05-15 12:59:12 -05:00
  • 7a3ac0cc15 Merge branch 'master' into hkvc_chaton_v3 HanishKVC 2024-05-15 23:17:11 +05:30
  • 50006b3f8d Qwen/Qwen-7b support added amd-lalithnc 2024-05-15 22:45:15 +05:30
  • 2185e5cf14 docs: Update and fix CLI help descriptions teleprint-me 2024-05-15 12:58:24 -04:00
  • b4b6f1fa00 fix: End messages with a user role due to jinja2 conditional checks teleprint-me 2024-05-15 12:54:55 -04:00
  • 14c104d166 Update ggml.c slaren 2024-05-15 18:46:01 +02:00
  • f2aabab436 Update ggml.c slaren 2024-05-15 18:45:54 +02:00
  • 6f201480de Tie weights for larger Granite Code models (20B, 34B) Steffen Röcker 2024-05-15 18:08:34 +02:00
  • ece01fc2e9 matmul-int8: remove unnecessary casts in q8_0_q8_0 Max Krasnyansky 2024-05-15 09:05:07 -07:00
  • 397249df61 DataUtilsString: string_as_hex and use direct log helpers HanishKVC 2024-05-15 19:44:07 +05:30
  • bb3fe48c16 SimpCfg+DataUtilsString: Move string helpers to its own file HanishKVC 2024-05-15 19:20:38 +05:30
  • dc020985b8 Avoid unnecessarily disabling CUDA graphs (#7302) b2893 agray3 2024-05-15 14:44:49 +01:00
  • 344f9126cc ggml : tag ggml_tensor::backend as deprecated (#7290) b2892 slaren 2024-05-15 15:08:48 +02:00
  • cdd91f5ad1 SimpCfg: Trap conversion error and raise appropriate exception HanishKVC 2024-05-15 18:37:15 +05:30
  • 48ff66c979 rpc : add command line arg for specifying backend memory Radoslav Gerganov 2024-05-15 15:29:07 +03:00
  • 9a17ab914b Add missing " (#7303) b2891 AidanBeltonS 2024-05-15 13:26:30 +01:00
  • 5a3391a820 Add missing " Aidan 2024-05-15 13:19:10 +01:00
  • c9c88ac378 Avoid unnecessarily disabling CUDA graphs Alan Gray 2024-05-15 04:24:55 -07:00
  • ea3b0590ee embedding : free the batch after execution (#7297) b2890 dm4 2024-05-15 20:01:12 +08:00
  • 4ac22a8239 Update convert-hf-to-gguf-update.py Bram Vanroy 2024-05-15 12:55:11 +02:00
  • 0c6ae1249f add phi 2 hash Bram Vanroy 2024-05-15 12:52:50 +02:00
  • 89b3236152 fix: vsnprintf terminates with 0, string use not correct Y. Velkov 2024-05-15 13:29:50 +03:00
  • 29499bb593 sync : ggml b2889 Georgi Gerganov 2024-05-15 13:23:41 +03:00
  • 48aa8fd1f2 ggml : add ggml_upscale_ext (ggml/814) John Balis 2024-05-15 03:52:33 -05:00
  • 9f8d92d690 fix compile error Y. Velkov 2024-05-15 12:52:09 +03:00
  • 5c65037280 Update README.md Vaibhav Srivastav 2024-05-15 11:45:19 +02:00
  • 85263f0568 Minor fixes after merging. Stanisław Szymczyk 2024-05-15 09:30:58 +02:00
  • 7a5df5fbd5 Merge branch 'ggerganov:master' into snowflake-arctic Stanisław Szymczyk 2024-05-15 11:42:38 +02:00
  • f4421f7cd8 convert-hf : Corrected sentencepiece API calls. Stanisław Szymczyk 2024-05-14 20:52:51 +02:00
  • 98a40c555c fix grammer lol. Vaibhav Srivastav 2024-05-15 11:38:15 +02:00
  • 518fbc1f74 embedding: free the batch after execution dm4 2024-05-15 17:14:44 +08:00
  • 4f5add68c6 GroupKV:Dump/Log type of the variant instance also HanishKVC 2024-05-15 13:36:27 +05:30
  • 26235eda50 logging: output capture in cuda module Y. Velkov 2024-05-15 11:57:44 +03:00
  • dc03a7134a CMakeLists: base std::variantC++17, specificTest std::formatC++20 HanishKVC 2024-05-15 12:56:05 +05:30
  • 583fd6b000 server bench: fix bench not waiting for model load (#7284) Johannes Gäßler 2024-05-15 08:44:16 +02:00
  • f70e6d1da6 Merge branch 'ggerganov:master' into jieran-dev Bizhao Shi 2024-05-15 14:32:58 +08:00
  • 397801476e matmul-int8: fixed typos in q8_0_q8_0 matmuls Max Krasnyansky 2024-05-12 20:24:21 -07:00
  • e838a3d459 ci: add support for optimized Windows ARM64 builds with MSVC and LLVM Max Krasnyansky 2024-05-14 20:36:57 -07:00
  • 18d20c0727 matmul-int8: enable matmul-int8 with MSVC and fix Clang warnings Max Krasnyansky 2024-04-30 17:10:38 -07:00
  • 7d4695370a build: add CMake Presets and toolchian files for Windows ARM64 Max Krasnyansky 2024-04-30 12:31:54 -07:00
  • ff48f5a4a0 logging: add proper checks for clang to avoid errors and warnings with VA_ARGS Max Krasnyansky 2024-04-30 12:29:34 -07:00
  • 2dd9f017e8 Going with unused because there's conditional logic that needs it. Kunnis 2024-05-14 20:51:47 -05:00
  • 2304113b1c debug-test.sh: comment style changes brian khuu 2024-05-15 11:49:28 +10:00
  • 1aa55e1452 ggml : tag ggml_tensor::backend as deprecated slaren 2024-05-15 01:40:17 +02:00
  • 741a1981b8 Fix Warnings Kunnis 2024-05-14 18:32:52 -05:00
  • 6b0c90fc71 More style fixes. Kunnis 2024-05-14 17:32:23 -05:00
  • 163dbfdd28 Couple more formatting fixes. Kunnis 2024-05-14 17:19:34 -05:00
  • d9ba30a204 Fix formatting Kunnis 2024-05-14 17:15:47 -05:00
  • 4a15989000 ChatON: Forgot this note earlier HanishKVC 2024-05-15 03:38:41 +05:30
  • a3d641b555 ChatON: Move loading from json file into its own file HanishKVC 2024-05-15 02:26:51 +05:30
  • 3775d0debb chore: add references to the quantisation space. Vaibhav Srivastav 2024-05-14 23:01:11 +02:00
  • 14c28e717e GroupKV+: dump cleanup - forgot to commit earlier HanishKVC 2024-05-15 02:11:26 +05:30
  • 2992479a42 avoid to get prompt in infill mode and embedding mode wudexiang 2024-05-15 01:54:13 +08:00
  • edec5ba3d7 debug-test: refactor for clarity brian khuu 2024-05-15 02:37:07 +10:00
  • 8975de996b ChatON: Update Notes to match the updated semantics and flows HanishKVC 2024-05-14 19:39:17 +05:30
  • 9f773486ab script : sync ggml-rpc b2886 Georgi Gerganov 2024-05-14 19:14:38 +03:00
  • e8a7fd4fb0 metal : support FA without mask + add asserts (#7278) b2885 Georgi Gerganov 2024-05-14 19:09:30 +03:00
  • a5e3fde857 sync : ggml b2884 Georgi Gerganov 2024-05-14 15:33:16 +03:00
  • f308ea7059 metal : tune soft_max number of threads (whisper/0) Georgi Gerganov 2024-05-13 11:01:07 +03:00
  • c3c88f296a ggml : try fix ppc64 (whisper/0) Georgi Gerganov 2024-05-12 20:36:31 +03:00
  • 182adefcf3 ggml : expose SSE3 and SSSE3 for MSVC when AVX is available (whisper/2128) Przemysław Pawełczyk 2024-05-08 17:33:43 +02:00
  • 0d26d8ccd8 ggml : optimize for ppc64le using VSX intrinsics (ggml/784) Hong Bo PENG 2024-05-12 17:17:18 +08:00
  • f8c0b474ec ChatON+:RenameTo chaton_meta_load_json to match semantic HanishKVC 2024-05-14 21:34:53 +05:30
  • bd5c39e0f0 ChatOn+GroupKV: Cleanup a bit, including using debug logging HanishKVC 2024-05-14 21:15:02 +05:30
  • 4f0263633b server: free sampling contexts on exit (#7264) b2879 Steve Grubb 2024-05-14 10:11:24 -04:00
  • 4d646f8f13 common: free ctx_gguf when exiting llama_control_vector_load_one Steve Grubb 2024-05-14 10:05:26 -04:00
  • f692dbd724 server bench: fix bench not waiting for model load Johannes Gäßler 2024-05-14 14:50:39 +02:00
  • d4239194cb debug-test.sh: refactor brian khuu 2024-05-14 23:18:57 +10:00
  • bb9ce52b11 ChatON+: ValidateDump dumps All, wrapped in optional LDBUG_LN HanishKVC 2024-05-14 18:25:58 +05:30
  • 1265c670fd Revert "move ndk code to a new library (#6951)" (#7282) b2878 Brian 2024-05-14 23:10:39 +10:00
  • f274de5386 debug-test.sh: combined execute and gdb test mode via -g flag brian khuu 2024-05-14 23:09:25 +10:00
  • 0b7c4e82c2 Revert "move ndk code to a new library (#6951)" Brian 2024-05-14 22:38:06 +10:00
  • a607a0d93d sync : ggml Georgi Gerganov 2024-05-14 15:33:16 +03:00
  • 71e51a0e9b metal : tune soft_max number of threads (whisper/0) Georgi Gerganov 2024-05-13 11:01:07 +03:00
  • 6555bb3848 ggml : try fix ppc64 (whisper/0) Georgi Gerganov 2024-05-12 20:36:31 +03:00
  • ec5bcc4c2e ggml : expose SSE3 and SSSE3 for MSVC when AVX is available (whisper/2128) Przemysław Pawełczyk 2024-05-08 17:33:43 +02:00
  • 469571d9a3 ggml : optimize for ppc64le using VSX intrinsics (ggml/784) Hong Bo PENG 2024-05-12 17:17:18 +08:00
  • d75ceb168b run-single-test.sh: added a single test function script and fix debug-test.sh to be more robust brian khuu 2024-05-14 21:53:33 +10:00
  • 5e31828d3e ggml : add RPC backend (#6829) b2877 Radoslav Gerganov 2024-05-14 14:27:19 +03:00
  • f8c96dfd97 metal : support non-contiguous KV Georgi Gerganov 2024-05-14 14:10:35 +03:00
  • a2e6b9dee1 ggml : fa without mask + add asserts Georgi Gerganov 2024-05-14 14:00:43 +03:00
  • 1519cb4582 fix compile warnings on macos Radoslav Gerganov 2024-05-13 17:28:48 +03:00
  • caa37b2bd6 Address review comments Radoslav Gerganov 2024-05-13 16:22:34 +03:00
  • cfd5d3e7ca win32 fix Radoslav Gerganov 2024-05-13 14:47:10 +03:00
  • b4bf42a9fc Address review comments Radoslav Gerganov 2024-05-13 10:11:46 +03:00
  • df54adabea readme : trim trailing whitespace Radoslav Gerganov 2024-05-10 14:35:24 +03:00
  • 7975f43eb1 add README Radoslav Gerganov 2024-05-10 10:43:51 +03:00