Commit graph

  • 4aa73e37b0 swift : exclude ggml-metal.metal from the package Georgi Gerganov 2024-01-08 15:58:33 +02:00
  • 42ea63c5a3 llama.swiftui : update readme Georgi Gerganov 2024-01-08 15:57:36 +02:00
  • 550829ed98 don't get stuck if cloudflared failed to download correctly Concedo 2024-01-08 21:11:17 +08:00
  • 4813e17548 llama : abort ctx if cuda backend init fails slaren 2024-01-08 14:00:34 +01:00
  • 5a62db30c3 llama : only initialize used backends, free backends on context free slaren 2024-01-08 13:54:50 +01:00
  • d41cef9326 minor slaren 2024-01-08 13:42:20 +01:00
  • 444b975edd llama : rewrite session kv load/set without graphs slaren 2024-01-08 12:56:31 +01:00
  • 5684d79079 Fixing tests Iwan Kawrakow 2024-01-08 13:39:36 +02:00
  • 542ae3b44c Merge upstream changes, fix conflicts 0cc4m 2024-01-08 12:34:35 +01:00
  • 7db967e811 Fix bug in quantize_row_iq2_xxs Iwan Kawrakow 2024-01-08 12:12:00 +01:00
  • 61c04053a4 Fix missing MMQ ops when on hipBLAS Iwan Kawrakow 2024-01-08 10:09:50 +01:00
  • 47ae9b8f34 iq2_xxs: fix MoE on Metal Iwan Kawrakow 2024-01-06 06:32:37 +01:00
  • fd42737c09 iq2_xxs: add to llama ftype enum Iwan Kawrakow 2024-01-04 10:13:00 +01:00
  • c19d0d09ba iq2_xxs: slightly faster CUDA dot product Iwan Kawrakow 2024-01-04 08:56:17 +02:00
  • 8240521901 iq2_xxs: quantized CUDA dot product (MMVQ) Iwan Kawrakow 2024-01-04 07:32:39 +02:00
  • 06e6908a6b iq2_xxs: dequantize CUDA kernel - fix conflict with master Iwan Kawrakow 2024-01-04 06:25:06 +02:00
  • 065cc8cb47 iq2_xxs: even faster Metal dot product Iwan Kawrakow 2024-01-04 02:25:59 +01:00
  • e211fadc8a iq2_xxs: slightly faster dot product Iwan Kawrakow 2024-01-04 02:06:23 +01:00
  • 1c96aa0d7f iq2_xxs: slightly faster dot product Iwan Kawrakow 2024-01-03 19:32:57 +01:00
  • dd29610153 iq2_xxs: Metal dot product now works Iwan Kawrakow 2024-01-03 18:59:14 +01:00
  • d383f00eea iq2_xxs: WIP Metal Iwan Kawrakow 2024-01-03 18:25:02 +01:00
  • 7b72318e6f iq2_xxs: ARM_NEON dot product Iwan Kawrakow 2024-01-03 15:35:24 +01:00
  • 7ef6389694 iq2_xxs: scalar and AVX2 dot products Iwan Kawrakow 2024-01-03 15:46:59 +02:00
  • 4af24881b3 iq2_xxs: basics Iwan Kawrakow 2024-01-03 10:58:19 +02:00
  • 52531fdff8 main : add self-extend support (#4815) b1789 Georgi Gerganov 2024-01-08 11:18:32 +02:00
  • 82048d4750 llama : add comment about llama_kv_cache_seq_div Georgi Gerganov 2024-01-08 11:17:00 +02:00
  • 66ad819dca Merge remote-tracking branch 'origin/master' into gg/self-extend-part-2 Georgi Gerganov 2024-01-08 11:15:55 +02:00
  • b0034d93ce examples : add passkey test (#3856) b1788 Georgi Gerganov 2024-01-08 11:14:04 +02:00
  • d57cb9c294 passkey : add readme passkey Georgi Gerganov 2024-01-08 11:13:44 +02:00
  • 164d7a0546 passkey : add "self-extend"-like context extension (#4810) Georgi Gerganov 2024-01-08 11:10:32 +02:00
  • a42feb1885 make : add passkey target Georgi Gerganov 2024-01-08 11:09:07 +02:00
  • 5ba6593252 add printing codes luffy06 2024-01-08 16:10:15 +08:00
  • 06e3b4f5c9 llama : break session version due to serialization changes David Friehs 2024-01-08 08:58:15 +01:00
  • 69d44e3e3f llama : serialize rng into minimum amount of space required David Friehs 2024-01-08 08:58:06 +01:00
  • e872af8dee llama : use ostringstream and istringstream for save and load David Friehs 2024-01-08 08:57:01 +01:00
  • 5ee581473e llama : only save and restore used logits David Friehs 2024-01-08 08:56:20 +01:00
  • b9c60dec98 llama : always reserve n_vocab * n_batch for logits David Friehs 2024-01-08 08:54:39 +01:00
  • 0093dea953 llama : only reserve n_vocab * n_batch at most for logits David Friehs 2024-01-08 08:54:13 +01:00
  • aee95df8f1 examples : save-load-state: save only required state David Friehs 2024-01-08 08:53:57 +01:00
  • f04b6e7287 Merge branch 'master' into concedo_experimental Concedo 2024-01-08 14:18:49 +08:00
  • c6af89e5ce chore: Apply ruff formatting to convert.py teleprint-me 2024-01-07 22:01:49 -05:00
  • 0614c338f8 refactor: Further refine functionality, improve user interaction, and streamline vocabulary handling teleprint-me 2024-01-07 21:54:42 -05:00
  • ac145fd2e3 ggml : fix mul_mat_id work size slaren 2024-01-08 03:51:15 +01:00
  • 226cea270e refactor: Improve code organization, argument parsing, and user interface teleprint-me 2024-01-07 21:42:58 -05:00
  • 8aa5818a20 feat: Introduce VocabFactory for flexible vocabulary management in model conversion teleprint-me 2024-01-07 21:32:42 -05:00
  • 5fa1a08c2f refactor: Update OutputFile class for enhanced model vocabulary management teleprint-me 2024-01-07 20:36:00 -05:00
  • 7e4a4ebc10 refactor: Enhance readability, functionality, and code quality teleprint-me 2024-01-07 20:20:38 -05:00
  • db4b8ac37a refactor: Standardize vocabulary handling with HfVocab teleprint-me 2024-01-07 20:05:38 -05:00
  • 3ca2b100a9 Restore BpeVocab and SentencePieceVocab classes teleprint-me 2024-01-07 19:39:20 -05:00
  • 15e18973da Refine Model Hyperparameters and Params Class teleprint-me 2024-01-07 19:25:07 -05:00
  • acf8f4b20f Merge branch 'master' into convert-py teleprint-me 2024-01-07 19:02:43 -05:00
  • b69021ef7f Update Imports and Add Notes for Future Reference teleprint-me 2024-01-07 18:51:51 -05:00
  • 5e879c9977 llama : add cparam (split_mode) and command line argument (--split-mode, -sm) to configure the split mode (none, layer or row) slaren 2024-01-07 23:26:49 +01:00
  • b7e7982953 readme : add lgrammel/modelfusion JS/TS client for llama.cpp (#4814) b1787 Lars Grammel 2024-01-07 21:24:11 +01:00
  • ea12921826 main : add Self-Extend support Georgi Gerganov 2024-01-07 22:22:44 +02:00
  • cde1c78e0d Add lgrammel/modelfusion JS/TS client for llama.cpp server Lars Grammel 2024-01-07 21:08:35 +01:00
  • d91c467379 ROCm: use native CMake HIP support Gavin Zhao 2024-01-07 12:14:23 -05:00
  • 87c8207a04 Merge remote-tracking branch 'origin/master' into sl/backend-sched slaren 2024-01-07 17:59:26 +01:00
  • 226460cc0d llama-bench : add no-kv-offload parameter (#4812) b1786 slaren 2024-01-07 17:59:01 +01:00
  • 32d9f7d991 llama-bench : add no-kv-offload parameter slaren 2024-01-07 17:28:43 +01:00
  • d5a410e855 CUDA: fixed redundant value dequantization (#4809) b1785 Johannes Gäßler 2024-01-07 17:24:08 +01:00
  • f64cddc76d passkey : add comment Georgi Gerganov 2024-01-07 16:37:02 +02:00
  • 2f40c9f6c5 llama : "self-extend"-like context extension Georgi Gerganov 2024-01-07 16:16:19 +02:00
  • f2c9800dfb passkey : simplify n_past logic Georgi Gerganov 2024-01-07 17:52:12 +02:00
  • b9f89db84a metal : check for simdgroup matrix mul. feature Georgi Gerganov 2024-01-07 17:47:42 +02:00
  • 42b54f1419 CUDA: fixed redundant value dequantization JohannesGaessler 2024-01-07 13:57:35 +01:00
  • c6a638d054 metal : fix Metal3 family check Georgi Gerganov 2024-01-07 15:01:18 +02:00
  • 7c16cf106d test-backend-ops : check buffer allocation failures slaren 2024-01-07 13:50:02 +01:00
  • bda3f2c892 passkey : select pass key pos from CLI Georgi Gerganov 2024-01-07 14:48:09 +02:00
  • fbb999f592 passkey : better prints Georgi Gerganov 2023-10-30 11:13:44 +02:00
  • 21196da114 examples : add passkey test Georgi Gerganov 2023-10-30 10:44:07 +02:00
  • 9dede37d81 llama : remove unused vars (#4796) b1784 Georgi Gerganov 2024-01-07 14:29:36 +02:00
  • f77c72f371 ggml : fix null backend dereference (#4807) Georgi Gerganov 2024-01-07 13:06:57 +02:00
  • 44c93c6746 ggml : also check ggml_backend_is_cpu Georgi Gerganov 2024-01-07 12:26:50 +02:00
  • cdb81d4ea1 ggml : fix null backend dereference Georgi Gerganov 2024-01-07 12:20:04 +02:00
  • 3c36213df8 llama : remove redundant GQA check (#4796) b1783 Georgi Gerganov 2024-01-07 11:21:53 +02:00
  • 72d8407b36 llama.swiftui : use llama.cpp as SPM package (#4804) b1782 Alex Azarov 2024-01-07 09:20:50 +01:00
  • d117d4dc5d llama : print tensor meta for debugging b1781 Georgi Gerganov 2024-01-07 09:50:31 +02:00
  • 3418c03ecc llama.swiftui : add visionOS target (#4805) Alex Azarov 2024-01-07 08:46:55 +01:00
  • 63ee677efd ggml : use __builtin_amdgcn_sudot4 in __dp4a for gfx11 (#4787) b1779 Konstantin Zhuravlyov 2024-01-07 01:52:42 -05:00
  • 67984921a7 server : fix n_predict check (#4798) b1778 Georgi Gerganov 2024-01-07 08:45:26 +02:00
  • e56a6e7865 Use llama.cpp as SPM package in iOS sample Alex Azarov 2024-01-07 05:19:51 +01:00
  • da95578195 Add support for visionOS in SwiftUI sample Alex Azarov 2024-01-07 05:58:37 +01:00
  • 1a108adf8a Merge branch 'master' into phi-1 teleprint-me 2024-01-06 19:32:30 -05:00
  • 72b74f364b cuda : do not create buffer types for devices that don't exist (fixes usage without CUDA devices available) slaren 2024-01-07 00:33:51 +01:00
  • 2f2c36799d cuda : add ggml-backend split buffer support slaren 2024-01-06 23:07:43 +01:00
  • c75ca5d96f llama.swiftui : use correct pointer for llama_token_eos (#4797) b1777 Daniel Illescas Romero 2024-01-06 16:12:59 +01:00
  • 58de673662 server : fix n_predict check Georgi Gerganov 2024-01-06 17:10:33 +02:00
  • b5ce9349df Use correct pointer for llama_token_eos Daniel Illescas Romero 2024-01-06 15:41:09 +01:00
  • 77bb72cd8c metal : normalize encoder:setComputePipelineStatus calls Georgi Gerganov 2024-01-06 16:33:16 +02:00
  • fef1dbf2eb metal : free allocations Georgi Gerganov 2024-01-06 16:21:28 +02:00
  • 7cfde78190 llama : remove redundant GQA check gg/remove-gqa-check-4657 Georgi Gerganov 2024-01-06 16:04:20 +02:00
  • 05f808cc83 metal : check for Metal 3 Georgi Gerganov 2024-01-06 15:42:50 +02:00
  • 90fd43c7eb metal : fix check for simdgroup reduction support Georgi Gerganov 2024-01-06 15:04:27 +02:00
  • ae551e30ee metal : print only skipped kernels Georgi Gerganov 2024-01-06 14:24:23 +02:00
  • 0658b0275f metal : take into account simdgroup reduction support Georgi Gerganov 2024-01-06 14:22:12 +02:00
  • cd160f85f1 metal : fix kernel init + fix compile options Georgi Gerganov 2024-01-06 14:05:13 +02:00
  • 8437ff5ee2 metal : set kernel family requirements Georgi Gerganov 2024-01-06 13:47:08 +02:00
  • 9bc3618de8 metal : refactor kernel loading Georgi Gerganov 2024-01-06 13:01:35 +02:00
  • 2c3ffd2944 metal : detect more GPU families Georgi Gerganov 2024-01-06 12:20:09 +02:00