Commit graph

  • 028eebf180 WIndows on ARM build clarifications AndreasKunar 2024-07-18 14:11:58 +02:00
  • bfc0e0c923 WOA build clarifications AndreasKunar 2024-07-18 14:04:04 +02:00
  • 974410a684 ggml : remove special Q4_0 code for first 2 blocks Georgi Gerganov 2024-07-18 13:50:26 +03:00
  • 62a3185ca6 ggml : fix q8_0 Georgi Gerganov 2024-07-18 10:53:03 +03:00
  • 79b95e3420 ggml : fix q4_0 Georgi Gerganov 2024-07-18 10:47:14 +03:00
  • 15f5cc450c bug: fix allocation size overflow at log hongruichen 2024-07-18 19:44:05 +08:00
  • be8306d795 convert-*.py: autogen uuid brian khuu 2024-07-18 21:02:10 +10:00
  • 0d2c7321e9 server: use relative routes for static files in new UI (#8552) Eric Zhang 2024-07-18 18:43:49 +08:00
  • 672a6f1018 convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499) Brian 2024-07-18 20:40:15 +10:00
  • 3cc2edb073 Merge branch 'ggerganov:master' into snapdragonxwin-fix1 Andreas (Andi) Kunar 2024-07-18 12:36:21 +02:00
  • 15260c5ba8 fix layer input for swin norm nopperl 2024-07-18 10:26:34 +02:00
  • 6160a76efb sycl::queue can directly use as shared_ptr Chen Xi 2024-07-18 08:07:05 +00:00
  • 3807c3de04 server : respect --special cli arg (#8553) b3412 RunningLeon 2024-07-18 16:06:22 +08:00
  • f40cd2073a adapt to new lora implementation nopperl 2024-07-18 09:59:30 +02:00
  • 3f68842e1c ggml : fix iq4_nl metal Georgi Gerganov 2024-07-18 10:34:19 +03:00
  • f6f2ff9557 ggml : fix q5_1 Georgi Gerganov 2024-07-18 10:06:17 +03:00
  • 67b079fc98 ggml : fix q5_0 Georgi Gerganov 2024-07-18 10:04:11 +03:00
  • e5e7a24ee7 ggml : fix q4_1 Georgi Gerganov 2024-07-18 10:00:03 +03:00
  • fa568f6a82 check for k norm separately nopperl 2024-07-18 09:30:40 +02:00
  • da5e356dfb fix ci nopperl 2024-07-18 09:28:26 +02:00
  • bd71cdac0f file format issue Chen Xi 2024-07-18 07:19:24 +00:00
  • d096de2e90 remove unnecessary whitespace Chen Xi 2024-07-18 07:16:17 +00:00
  • 6b4f7b2ac1 fix some typo Chen Xi 2024-07-18 07:13:01 +00:00
  • 90e93db548 add warm up also for promp_len=32, warm up both gemm and gemv luoyu-intel 2024-07-18 15:04:41 +08:00
  • cd296feac3 fix multi-gpu issue on sycl Chen Xi 2024-07-17 06:38:36 +00:00
  • beaafb3cca fix special not work RunningLeon 2024-07-18 14:33:12 +08:00
  • 73899f74cf gguf-py : handle more name metadata extraction edge cases Francis Couture-Harpin 2024-07-18 02:28:57 -04:00
  • 126201d1a2 add comment to conversion nopperl 2024-07-18 08:28:27 +02:00
  • b16c09a1cc Merge branch 'master' into chameleon nopperl 2024-07-18 08:25:33 +02:00
  • c4b159958b Merge pull request #2 from zihaoccc/device_info Zihao Chen 2024-07-18 00:46:52 -05:00
  • 6d598b6b1f fetch device info and including in network registry Zihao Chen 2024-07-18 00:45:45 -05:00
  • 0d314db877 server: public: use relative routes for static files in new UI EZForever 2024-07-18 13:36:25 +08:00
  • f1eecf1add server: public: fix api_url on non-index pages EZForever 2024-07-18 13:28:03 +08:00
  • 6e3ab8736e remove empty line luoyu-intel 2024-07-18 11:41:18 +08:00
  • 9dd297f847 fix bug of dequant luoyu-intel 2024-07-18 03:34:47 +00:00
  • 4c9932c1e1 gguf-py : fix flake8 lint Francis Couture-Harpin 2024-07-17 23:26:45 -04:00
  • 3e67a42394 fix stride of src1 luoyu-intel 2024-07-18 11:25:05 +08:00
  • 2c18a9a4d4 gguf-py : extract metadata from model name more resiliently Francis Couture-Harpin 2024-07-17 23:17:39 -04:00
  • d82b3a0bdb feat: add GGML_UNARY_OP_GELU hongruichen 2024-07-18 10:25:45 +08:00
  • 85a1c1271e fix code bug luoyu-intel 2024-07-18 10:39:57 +08:00
  • 20d3b74020 fix deq kernel luoyu-intel 2024-07-18 10:39:32 +08:00
  • e62fab8a49 add q4_0 dequant luoyu-intel 2024-07-18 10:35:12 +08:00
  • 0b8565d979 use dmmv as default luoyu-intel 2024-07-16 15:46:49 +08:00
  • a8c75c041d fix buf luoyu-intel 2024-07-16 06:37:10 +00:00
  • 216201230c support new q4_0 layout luoyu-intel 2024-07-16 14:17:22 +08:00
  • 127d62fa06 fix the unalign size luoyu-intel 2024-07-15 15:15:41 +08:00
  • d9b89eb308 revert cpy code luoyu-intel 2024-07-11 07:51:23 +00:00
  • 92784e8059 add new format luoyu-intel 2024-07-11 07:13:15 +00:00
  • b67ff18382 fix code luoyu-intel 2024-07-10 09:25:35 +00:00
  • 46e9503851 update luoyu-intel 2024-07-10 09:20:28 +00:00
  • d838096ebe add new q40 dmmv luoyu-intel 2024-07-10 16:59:42 +08:00
  • 90e8f81556 ggml : fix iq4_nl dot product with odd number of blocks slaren 2024-07-17 23:11:05 +02:00
  • e02b597be3 lookup: fibonacci hashing, fix crashes (#8548) b3411 Johannes Gäßler 2024-07-17 23:35:44 +02:00
  • f203a49919 lookup: fibonacci hashing, fix crashes Johannes Gäßler 2024-07-17 22:09:04 +02:00
  • 1725de768e llama : fix t5 segfault Francis Couture-Harpin 2024-07-17 15:36:56 -04:00
  • 1fb5d4fdee llama : apply suggestions Francis Couture-Harpin 2024-07-17 14:48:09 -04:00
  • b3283448ce build : Fix docker build warnings (#8535) (#8537) Al Mochkin 2024-07-17 20:21:55 +02:00
  • ce199b2de7 refactoring: downgrade some log to debug level hongruichen 2024-07-17 23:43:22 +08:00
  • c76fc9aa2f fix warnings hongruichen 2024-07-17 23:30:14 +08:00
  • 6457a68bd7 disable qnn profiling in release build hongruichen 2024-07-17 23:24:29 +08:00
  • 90766e15e2 rem tabs nopperl 2024-07-17 17:18:09 +02:00
  • b7d781ec81 remove qnn dedicated unit tests since we're now using the test-backend-ops to cross-validate backend ops hongruichen 2024-07-17 23:08:16 +08:00
  • 30f80ca0bc CONTRIBUTING.md : remove mention of noci (#8541) Brian 2024-07-18 00:57:06 +10:00
  • 758612a984 suppress image token output nopperl 2024-07-17 16:19:13 +02:00
  • 2502b57203 fix warnings hongruichen 2024-07-17 21:39:25 +08:00
  • fb1746f381 Merge branch 'ggerganov:master' into snapdragonxwin-fix1 Andreas (Andi) Kunar 2024-07-17 16:03:37 +02:00
  • 66a1f66d2e Update CONTRIBUTING.md to remove mention of noci Brian 2024-07-17 23:42:24 +10:00
  • 454deef83c register qnn backend hongruichen 2024-07-17 20:53:53 +08:00
  • ee1c6a4d89 Update README.md to include steps to run cmake Amit Kumar Jha 2024-07-17 17:20:07 +04:00
  • eed960575f add build step of QNN backend at ggml hongruichen 2024-07-17 19:43:01 +08:00
  • 1bdd8ae19f [CANN] Add Ascend NPU backend (#6035) b3408 hipudding 2024-07-17 19:23:50 +08:00
  • 3d3523e432 implement swin norm nopperl 2024-07-17 12:55:47 +02:00
  • c460d5c3bb return qk norm weights and biases to original format nopperl 2024-07-17 12:53:56 +02:00
  • 1289e3516e Improvements for Windows with Snapdragon X AndreasKunar 2024-07-17 11:54:53 +02:00
  • 67cf0ea16f build : Fix docker build warnings (#8535) Al Mochkin 2024-07-17 11:16:34 +02:00
  • 861bb9c580 Merge tag 'b3405' into dev-refactoring hongruichen 2024-07-17 17:13:55 +08:00
  • 9944aa60a6 Delete Trailing whitespace huafengchun 2024-07-17 09:06:01 +00:00
  • 5b7c575282 Add logging for CANN backend huafengchun 2024-07-17 08:16:21 +00:00
  • 820665f3a1 Revert "Improvements for Windows with Snapdragon X" AndreasKunar 2024-07-17 09:56:54 +02:00
  • 7d87a0d1d8 llama : replace asserts with exceptions Georgi Gerganov 2024-07-17 10:47:29 +03:00
  • da3913d8f9 batched: fix n_predict parameter (#8527) b3407 Masaya, Kato 2024-07-17 16:34:28 +09:00
  • d65a8361fe llama : disable context-shift for DeepSeek v2 (#8501) b3406 Georgi Gerganov 2024-07-17 10:32:59 +03:00
  • b268edf87c llama : bump max layers from 256 to 512 Georgi Gerganov 2024-07-17 10:01:54 +03:00
  • c5b68515f0 fix issues for merging caitianchi 2024-07-17 15:04:25 +08:00
  • bf21397ae5 Improvements for Windows with Snapdragon X AndreasKunar 2024-07-17 08:59:21 +02:00
  • bb13795dce refactoring: remove unused functions and variables hongruichen 2024-07-17 14:13:42 +08:00
  • 63dc587dff refactoring: make the buffer alloc and free stay in same class hongruichen 2024-07-17 13:34:05 +08:00
  • 7b7db0bbee llama : logits_all has priority over batch->logits Francis Couture-Harpin 2024-07-17 01:14:26 -04:00
  • b1ef302991 refactoring: remove depend of dlsym at utils.hpp hongruichen 2024-07-17 12:21:33 +08:00
  • 5b393836a9 batched: fix n_predict parameter msy-kato 2024-07-17 12:38:47 +09:00
  • 2e4adb47ec llama : fix integer signedness mixing Francis Couture-Harpin 2024-07-16 22:12:47 -04:00
  • 57197b74b0 add ggml_cann prefix for acl funcs huafengchun 2024-07-17 02:06:36 +00:00
  • 94483d4260 ggml: Install all public headers regardless of build settings 65a 2024-07-14 12:49:29 -07:00
  • 96e09b979d Make ggml-common.h private huafengchun 2024-07-17 01:04:15 +00:00
  • 22504ec67e Merge branch 'master' into compilade/batch-splits Francis Couture-Harpin 2024-07-16 20:54:39 -04:00
  • c51daefc32 llama : advanced batch splits Francis Couture-Harpin 2024-07-16 20:33:45 -04:00
  • c140bc1c88 Add support for Lite-Mistral-Instruct chat template AmgadHasan 2024-07-16 22:54:35 +00:00
  • 5e116e8dd5 make/cmake: add missing force MMQ/cuBLAS for HIP (#8515) b3405 Johannes Gäßler 2024-07-16 21:20:59 +02:00
  • 0301b500cd refactoring: prevent leak the QNN_INTERFACE_VER_TYPE and QNN_SYSTEM_INTERFACE_VER_TYPE outside of qnn.hpp hongruichen 2024-07-16 22:52:16 +08:00
  • 7e9271cabf convert_lora_to_gguf.py: remove model_name parameter. Doesn't exist in LoraModel() brian khuu 2024-07-17 01:11:27 +10:00