Commit graph

  • daeaeb1222 Merge remote-tracking branch 'origin/master' into bins Olivier Chafik 2024-06-10 15:38:41 +01:00
  • 5265c15d4c rename llama|main -> llama-cli; consistent RPM bin prefixes Olivier Chafik 2024-06-10 15:34:14 +01:00
  • f2a029bd9d ggml : improve ggml_is_contiguous logic Georgi Gerganov 2024-06-10 17:04:14 +03:00
  • fd5ea0f897 ci : try win-2019 on server windows test (#7854) slaren 2024-06-10 14:18:41 +02:00
  • 348e8d5f02 try win-2019 on server windows test slaren 2024-06-10 12:04:38 +02:00
  • 5f8cfe4a1e ggml-qnn: refine source code of ggml-qnn.cpp to make reviewer more happy zhou.weiguo 2024-06-10 20:07:26 +08:00
  • c28a83902c examples : remove --instruct remnants (#7846) Georgi Gerganov 2024-06-10 15:00:15 +03:00
  • d9da0e4986 server : improve "prompt" handling (#7847) Georgi Gerganov 2024-06-10 14:59:55 +03:00
  • efa06098a6 swap bar orders Christian Zhou-Zheng 2024-06-10 07:58:17 -04:00
  • 4550826871 Update gguf-py/gguf/gguf_writer.py Christian Zhou-Zheng 2024-06-10 07:55:24 -04:00
  • c1b1a29266 Update gguf-py/gguf/gguf_writer.py Christian Zhou-Zheng 2024-06-10 07:55:01 -04:00
  • ad02c9409a Update gguf-py/gguf/gguf_writer.py Christian Zhou-Zheng 2024-06-10 07:54:50 -04:00
  • 99f9a24805 Update gguf-py/gguf/gguf_writer.py Christian Zhou-Zheng 2024-06-10 07:54:18 -04:00
  • 7eea552db8 Update gguf-py/gguf/gguf_writer.py Christian Zhou-Zheng 2024-06-10 07:54:06 -04:00
  • 514d83e0e0 try win-2019 on server windows test slaren 2024-06-10 12:04:38 +02:00
  • 1f0dabda8d CUDA: use tensor cores for MMQ (#7676) Johannes Gäßler 2024-06-10 11:45:13 +02:00
  • 4bb03cade0 ci : disable server-windows workflow gg/server-debug-win Georgi Gerganov 2024-06-10 12:30:18 +03:00
  • af4ae502dd use the correct SYCL context for host USM allocations (#7777) Ben Ashbaugh 2024-06-10 02:21:31 -07:00
  • 70d4cc1c33 Make updates to type cast based on compiler instead of OS Srihari-mcw 2024-06-10 00:18:35 -07:00
  • a64a81a294 fix writeback returning too early Johannes Gäßler 2024-06-10 09:17:58 +02:00
  • a9cde5c63e __builtin_assume -> GGML_CUDA_ASSUME Johannes Gäßler 2024-06-10 08:51:35 +02:00
  • 9e4d62e6ab server : improve "prompt" handling gg/server-fix-prompt Georgi Gerganov 2024-06-10 09:12:04 +03:00
  • 956bb14595 examples : remove --instruct remnants gg/remove-instruct Georgi Gerganov 2024-06-10 08:37:47 +03:00
  • 3695a2b00f examples: refine tensor dump zhou.weiguo 2024-06-10 12:37:27 +08:00
  • 0fd5a1bb58 initial iq4_xs netrunnereve 2024-06-09 23:48:36 -04:00
  • 2c96a5cd8b examples: refine tensor dump zhou.weiguo 2024-06-10 11:43:03 +08:00
  • c0fd4df883 fix merge Eddie-Wang 2024-06-10 03:07:38 +00:00
  • 841c903ff9 Merge branch 'ggerganov:master' into bitnet Eddie-Wang 2024-06-10 10:51:47 +08:00
  • abd798d70f fix code Eddie-Wang 2024-06-10 02:50:14 +00:00
  • 46054d1aab truncate intermediate fp32 if converting bf16 to bf16 Sigbjørn Skjæret 2024-06-10 04:30:47 +02:00
  • 6a52bfe332 add truncate_bf16 Sigbjørn Skjæret 2024-06-10 04:26:55 +02:00
  • 6a9b626ba5 Merge remote-tracking branch 'origin/master' into grammar-fast ochafik 2024-06-10 02:02:50 +01:00
  • d6483a9c07 add min/max constrained int field to pydantic json schema example ochafik 2024-06-10 02:00:04 +01:00
  • 124f44dd38 json: add doc to grammar readme ochafik 2024-06-10 01:27:54 +01:00
  • 8cf46e0424 json: prevent number precision & whitespace runaways in example grammars ochafik 2024-06-10 00:35:15 +01:00
  • 52c4bcf1d2 json: fix char pattern in grammar converters ochafik 2024-06-10 00:29:35 +01:00
  • 79bd2bfcb0 catch oversights Christian Zhou-Zheng 2024-06-09 20:22:17 -04:00
  • f7e7983946 Update gguf-py/gguf/gguf_writer.py Christian Zhou-Zheng 2024-06-09 20:17:25 -04:00
  • 10ceba354a flake.lock: Update (#7838) Georgi Gerganov 2024-06-10 02:04:50 +03:00
  • 43f74e0c1f json: update pydantic example to set additionalProperties: false ochafik 2024-06-09 23:14:18 +01:00
  • 87d506f523 json: allow space after enum/const ochafik 2024-06-09 22:45:43 +01:00
  • 8785a6eea0 json: don't force additional props after normal properties! ochafik 2024-06-09 22:38:58 +01:00
  • 1e2d9cb589 progress bar, fix split logic Christian Zhou-Zheng 2024-06-09 17:31:25 -04:00
  • 70a6bc91cc Update gguf-py/gguf/gguf_writer.py Christian Zhou-Zheng 2024-06-09 17:08:11 -04:00
  • 0417104397 fix linting Christian Zhou-Zheng 2024-06-09 16:05:08 -04:00
  • 12a061c8eb json: default additionalProperty to true ochafik 2024-06-09 21:03:52 +01:00
  • 9d7f694438 fix typing and clean up Christian Zhou-Zheng 2024-06-09 16:02:23 -04:00
  • cad377d3a1 add C++11-compatible replacement for std::string_view ochafik 2024-06-09 19:35:36 +01:00
  • aede2f51f1 spacing changes. Julia Longtin 2024-05-12 09:36:08 +00:00
  • bd22e9d28a do 2 rounds of 4, instead of 4 rounds of 2. and properly offset unalligned reads across a 64 byte boundary. Julia Longtin 2024-05-11 20:28:47 +00:00
  • 7925fb1f64 make offset available in a register. Julia Longtin 2024-05-11 19:57:45 +00:00
  • 084e3683fb load from identical addresses for low and high side. Julia Longtin 2024-05-11 19:48:53 +00:00
  • 420e9dbd44 minor comment fixes. Julia Longtin 2024-05-11 19:47:20 +00:00
  • 3d39d619da make the offset of q4 available. Julia Longtin 2024-05-11 19:39:53 +00:00
  • 257c06b73c add missing vector. Julia Longtin 2024-05-11 19:29:09 +00:00
  • 50887fc9fd fill and increment r12 and r13. Julia Longtin 2024-05-11 19:24:11 +00:00
  • 0c0137ef18 relabel some other labels. Julia Longtin 2024-05-11 19:02:48 +00:00
  • eefa650da0 rename some labels. Julia Longtin 2024-05-11 17:56:10 +00:00
  • 9aa34c8884 rename label 1 to 3. Julia Longtin 2024-05-11 14:24:30 +00:00
  • 9f3623fffc introduce r10 and r11, for vloadunpackhd. Julia Longtin 2024-05-11 14:02:36 +00:00
  • a273a9ebf2 spacing changes. Julia Longtin 2024-05-11 13:35:50 +00:00
  • fc23c22fd2 spacing changes. Julia Longtin 2024-05-11 13:26:00 +00:00
  • 4d948317c8 add missing jump. Julia Longtin 2024-05-11 12:53:23 +00:00
  • 1b7ca0b413 look at the right final memory location. Julia Longtin 2024-05-11 11:27:52 +00:00
  • 47ca67a062 subtract the correct amount. Julia Longtin 2024-05-11 11:11:15 +00:00
  • 511ad8043f change from handling three iterations per loop to four. Julia Longtin 2024-05-11 11:07:16 +00:00
  • 4097cde569 comment clarification. Julia Longtin 2024-05-10 21:57:16 +00:00
  • f3b86eb792 correct a comment, and use jz when comparing to zero. Julia Longtin 2024-05-10 20:30:56 +00:00
  • 9a1a53be8e use values inside of the loop as soon as we have them. Julia Longtin 2024-05-10 19:33:58 +00:00
  • 270204e57b fix loop. Julia Longtin 2024-05-10 17:07:27 +00:00
  • dda250f637 move sub earlier, and move the compare of iterations to outside, and at the end of the loop. Julia Longtin 2024-05-10 17:03:41 +00:00
  • f555f9d075 spacing and comment changes. Julia Longtin 2024-05-10 16:50:39 +00:00
  • 204bc1ffdc remove useless prefetches. Julia Longtin 2024-05-10 16:28:53 +00:00
  • d8d574c56f perform better prefetches, and invert the test of our clear flag for clarity. Julia Longtin 2024-05-10 16:14:28 +00:00
  • a14fe02cf8 use vbroadcastss in place of vbroadcast32x4. Julia Longtin 2024-05-10 15:52:35 +00:00
  • b1c9622d9e Use a vectorized assembly function to handle remaining chunks less than vector wide. Julia Longtin 2024-05-10 14:52:46 +00:00
  • 6e0258abac broadcast a single int8, instead of 4 of them. Julia Longtin 2024-05-10 14:19:27 +00:00
  • 664a6025a1 use different restrict syntax, to make g++ happy. Julia Longtin 2024-05-09 23:08:43 +00:00
  • 2cf193efc0 fix typo Julia Longtin 2024-05-09 20:41:50 +00:00
  • c39fa8b6b8 remove a warning. Julia Longtin 2024-05-09 20:40:50 +00:00
  • 9fa06f4767 add batch fp16<->fp32 conversion functions. Julia Longtin 2024-05-09 19:31:28 +00:00
  • 1c2fdc3412 minor spacing and comment changes. Julia Longtin 2024-05-09 16:57:59 +00:00
  • 54f181d24a spacing and capitalization changes. Fix the register list of GGML_5bit_Unpacked_Unaligned. Julia Longtin 2024-04-26 14:44:08 +00:00
  • 9a799ebdae spacing and capitalization changes. Julia Longtin 2024-04-25 21:23:22 +00:00
  • 0124f7acd8 use or, instead of and. bug fix? Julia Longtin 2024-04-24 17:50:12 +00:00
  • dc1f639bf0 comment and spacing fixes. Julia Longtin 2024-04-24 17:38:42 +00:00
  • 4fb1547ba6 fix an offset error, and get rid of tabs. Julia Longtin 2024-04-22 18:29:31 +00:00
  • e37b7f8497 fix some small errors. Julia Longtin 2024-04-22 18:22:22 +00:00
  • c3d438bce2 further optimizations. 0.99 tokens per second. Julia Longtin 2024-04-22 18:16:28 +00:00
  • d966ac2ebe replace tabs with spaces. Julia Longtin 2024-04-03 23:42:34 +00:00
  • fb83cd987d reformat, and label what these files are. Julia Longtin 2024-04-03 23:21:24 +00:00
  • b8abefbec6 use GGML_F32_EPR, and remove some dead code. Julia Longtin 2024-04-03 22:04:45 +00:00
  • f84859a926 whoops. missing tab. Julia Longtin 2024-04-03 21:58:29 +00:00
  • ded4da4518 add Makefile rule for generation .s file, for manual inspection. Julia Longtin 2024-04-03 20:30:25 +00:00
  • aeb5ae85ad formatting changes. Julia Longtin 2024-04-03 20:24:00 +00:00
  • 3ff09248ff indent headers consistently. Julia Longtin 2024-04-03 19:01:18 +00:00
  • 3cf6eb0cc0 formatting. Julia Longtin 2024-04-02 17:01:53 +00:00
  • 90498c1181 minor changes. Julia Longtin 2024-04-02 16:55:40 +00:00
  • 33cc1d8c8e massively rewrite assembly routines. Julia Longtin 2024-04-02 15:41:56 +00:00
  • 20c2bc53f9 fix vector sizes. Julia Longtin 2024-03-25 19:43:37 +00:00