Commit graph

  • 8fb5cc99a2 PR comments; using portable initialization of uint32x2_t Eric Sommerlade 2023-09-04 20:28:27 +01:00
  • e36ecdccc8 build : on Mac OS enable Metal by default (#2901) b1177 Georgi Gerganov 2023-09-04 22:26:24 +03:00
  • 30ac7a4117 gitignore : metal build-metal-default Georgi Gerganov 2023-09-04 22:23:16 +03:00
  • 28eea84ac0 make : fix merge conflict remnants Georgi Gerganov 2023-09-04 22:21:45 +03:00
  • 65520729a2 Merge branch 'master' into build-metal-default Georgi Gerganov 2023-09-04 22:20:51 +03:00
  • ac4038aab1 readme : update Metal instructions Georgi Gerganov 2023-09-04 22:19:24 +03:00
  • 23360b15b6 common : better n_gpu_layers assignment Georgi Gerganov 2023-09-04 22:14:22 +03:00
  • f3a84b2e0d llama : better express the KV cache dependencies in the graph metal-cont-bug Georgi Gerganov 2023-09-04 21:44:48 +03:00
  • 60c2ef6d92 metal : utilize view_src to see of tensor is a view Georgi Gerganov 2023-09-04 20:49:09 +03:00
  • ebd3467cc8 metal : more readable kernel Georgi Gerganov 2023-09-04 20:48:46 +03:00
  • 7704db2521 ggml : just in case Georgi Gerganov 2023-09-04 20:48:25 +03:00
  • ad80e5a4a7 llama : add ggml_cont to trigger bug with Metal Georgi Gerganov 2023-09-04 19:46:52 +03:00
  • c25ed2a522 Added magic for file types John Boero 2023-09-04 14:19:25 +01:00
  • bd33e5ab92 ggml-opencl : store GPU buffer in ggml_tensor::extra (#2994) b1176 slaren 2023-09-04 14:59:52 +02:00
  • c79d130f74 make : fix speculative build speculative-grammar Georgi Gerganov 2023-09-04 15:50:04 +03:00
  • 2db2471c13 speculative : avoid grammar_mem Georgi Gerganov 2023-09-04 15:42:54 +03:00
  • e7dc5b08ac speculative : reuse grammar parser + better logs and comments Georgi Gerganov 2023-09-04 15:18:38 +03:00
  • 6c150d763e speculative : print draft token pieces Georgi Gerganov 2023-09-04 12:54:38 +03:00
  • ebe41d49a6 common : warm-up with 2 tokens - seems to work better Georgi Gerganov 2023-09-03 21:07:01 +03:00
  • 013457885a grammar : remove one nested level Georgi Gerganov 2023-09-03 19:10:43 +03:00
  • 2d89da4f77 grammar : add comments to new grammar file Georgi Gerganov 2023-09-03 18:47:38 +03:00
  • e0a8658e7c grammars : add json_arr.gbnf Georgi Gerganov 2023-09-03 17:52:49 +03:00
  • 69f2fafebc speculative : add grammar support Georgi Gerganov 2023-09-03 15:25:53 +03:00
  • 2d5f5d7499 Also guard against extremely small weights Iwan Kawrakow 2023-09-04 15:04:09 +03:00
  • 37873ae77a update format jameswu2014 2023-09-04 19:10:22 +08:00
  • 79f1757661 Guard against all weights in a super-block being zero Iwan Kawrakow 2023-09-04 14:10:15 +03:00
  • 2cc8dce48f Merge branch 'ggerganov:master' into master jameswu2014 2023-09-04 19:02:30 +08:00
  • 9248528d6e Constrain minimum n_draft to 2 Lengyue 2023-09-04 06:59:51 -04:00
  • 3103568144 llama-bench : make cpp file non-executable (#2999) b1175 Cebtenzzre 2023-09-04 06:40:18 -04:00
  • 5b8530d88c make : add speculative example (#3003) b1174 Leng Yue 2023-09-04 03:39:57 -07:00
  • d1940a3646 update format jameswu2014 2023-09-04 18:12:25 +08:00
  • bd72ba0445 Feature: support baichuan serial models, by now, including Baichuan-7B, Baichuan-13B,in the feature, we will support more Baichuan-models jameswu2014 2023-09-04 17:43:41 +08:00
  • 245f9e8d82 Merge pull request #1 from es0m/last-working-old-file-format Eric Sommerlade 2023-09-04 10:29:01 +01:00
  • 416f98789b Merge branch 'master' into last-working-old-file-format Eric Sommerlade 2023-09-04 10:26:17 +01:00
  • 98230ef656 Add heuristic algo for speculative Lengyue 2023-09-04 04:51:30 -04:00
  • e4386f417f server : add a subtle loading animation to the edit box (#2466) b1173 Aarni Koskela 2023-09-04 10:28:55 +02:00
  • 687f77b7ef Fix speculative makefile Lengyue 2023-09-04 03:55:10 -04:00
  • 35195689cd 2x faster (rms) norm cuda kernels (3.7% e2e improvement) (#2985) b1172 Jiahao Li 2023-09-04 14:53:30 +08:00
  • 578371693a server: add a subtle loading animation to the edit box Aarni Koskela 2023-07-31 12:15:28 +02:00
  • 2796e750b1 editorconfig: add override for the server HTML (which already is 2-space indented) Aarni Koskela 2023-07-31 12:15:11 +02:00
  • 84220dfec5 Merge branch 'master' of https://github.com/goerch/llama.cpp goerch 2023-09-04 07:22:49 +02:00
  • 64c87bfe8c examples : add compiler version and target to build info Cebtenzzre 2023-09-03 19:47:01 -04:00
  • 5453cc6483 llama-bench : make cpp file non-executable Cebtenzzre 2023-09-03 21:49:58 -04:00
  • 98a6004aa1 build-info : cleanup Cebtenzzre 2023-09-03 13:55:20 -04:00
  • b1593bc17c build-info : fix newlines correctly Cebtenzzre 2023-09-03 13:49:29 -04:00
  • 045718f07a examples : refactor build info prints into print_build_info Cebtenzzre 2023-09-03 13:38:03 -04:00
  • 9ea2f7ff58 Merge branch 'master' into finetune-lora xaedes 2023-09-04 02:40:44 +02:00
  • d2a4c682ef win arm fixes Eric Sommerlade 2023-09-03 23:39:14 +01:00
  • ca588a3c39 Added FIM readme. apaz-cli 2023-09-03 16:48:03 -05:00
  • 82dcadde0d Added -fsanitize=address to the makefile. apaz-cli 2023-09-03 16:33:59 -05:00
  • 314c29cc32 Debugging crash. apaz-cli 2023-09-03 16:32:55 -05:00
  • 5e4063a508 update the package kchro3 2023-09-03 12:38:09 -07:00
  • 2cab21c3db nother small improvement for Q3_K on metal Iwan Kawrakow 2023-09-03 21:50:26 +03:00
  • 9eb1d4d347 Slowly progressing on Q3_K on metal Iwan Kawrakow 2023-09-03 19:38:57 +03:00
  • 123a870b36 Another Q3_K speedup on metal Iwan Kawrakow 2023-09-03 17:35:41 +03:00
  • ec13de521c Slightly faster Q3_K and Q5_K on metal Iwan Kawrakow 2023-09-03 16:03:32 +03:00
  • cf9b08485c ggml-alloc : use virtual memory for measurement (#2973) b1171 slaren 2023-09-03 20:34:09 +02:00
  • 50589ed6be load default rms_norm and rope parameters from base model xaedes 2023-09-03 20:05:54 +02:00
  • bdb7092e82 add missing gguf_free in load_checkpoint_lora_file xaedes 2023-09-03 20:04:03 +02:00
  • e07f5c57bb fix printf format warnings xaedes 2023-09-03 20:03:39 +02:00
  • 2d63144343 ggml-opencl : store GPU buffer in ggml_tensor::extra slaren 2023-09-03 20:01:51 +02:00
  • 406e0750cc update README.md xaedes 2023-09-03 19:25:18 +02:00
  • 203afcf6bc fallback to fixed address for systems without virtual memory slaren 2023-09-03 15:20:15 +02:00
  • 5b335cbe95 Remove trailing whitespace Mason M 2023-09-03 10:17:29 -03:00
  • 47068e5170 speculative : PoC for speeding-up inference via speculative sampling (#2926) b1170 Georgi Gerganov 2023-09-03 15:12:08 +03:00
  • 847896aba7 speculative : add --draft CLI arg speculative Georgi Gerganov 2023-09-03 13:51:07 +03:00
  • a15ca746c7 speculative : print encoding speed Georgi Gerganov 2023-09-03 13:40:42 +03:00
  • c82c808da0 speculative : initial example Georgi Gerganov 2023-09-03 13:34:50 +03:00
  • 8f429fa511 perplexity : fix ETA by warming up the model with an empty run b1169 Georgi Gerganov 2023-09-03 13:42:56 +03:00
  • 6519e9c99c gguf(python): Fix special vocab handling when id < 0 (#2984) Kerfuffle 2023-09-03 04:38:43 -06:00
  • b7f2aa9e51 metal : restore 363f0bf and fix reduce in F16_F32 kernels (#2986) Georgi Gerganov 2023-09-03 13:23:33 +03:00
  • 73a12a6344 cov : disable comment in PRs (#2989) Alon 2023-09-03 13:19:01 +03:00
  • 3730134776 llama : fix bpe tokenize from byte (#2889) b1165 opparco 2023-09-03 19:18:09 +09:00
  • 0b02f60c4f disable comment in PRs Alon Faraj 2023-09-03 13:16:16 +03:00
  • 6731796b02 Fix bug intriduced in PR #2959 Iwan Kawrakow 2023-09-03 13:14:32 +03:00
  • 8f653ddc5e metal : restore 363f0bf and fix reduce in F16_F32 kernels Georgi Gerganov 2023-09-03 13:03:21 +03:00
  • 9dc817e57f Fix code style lijiahao 2023-09-03 17:54:22 +08:00
  • d9151e6f57 metal : revert 6af0bab until we fix it Georgi Gerganov 2023-09-03 12:40:56 +03:00
  • 54ddacaa8b 2x faster (rms) norm cuda kernels lijiahao 2023-09-03 16:25:29 +08:00
  • afc43d5f82 cov : add Code Coverage and codecov.io integration (#2928) b1163 Alon 2023-09-03 11:48:49 +03:00
  • 6460f758db opencl : fix a bug in ggml_cl_pool_malloc() for ggml_cl_mul_mat_f32() (#2955) b1162 Wentai Zhang 2023-09-03 16:46:44 +08:00
  • 457de66764 Merge branch 'master' into add-code-coverage Alon 2023-09-03 11:38:16 +03:00
  • 2c6ae862a4 wrap coverage output files in COV_TARGETS Alon Faraj 2023-09-03 11:29:04 +03:00
  • ca82cf7bac metal : more optimizations (#2959) Kawrakow 2023-09-03 11:06:22 +03:00
  • c3d3ea9b3a gguf: Fix special vocab handling when id < 0 KerfuffleV2 2023-09-03 01:41:26 -06:00
  • 323a9d3b8c llama : fix vocab_only logic when GPU is enabled Georgi Gerganov 2023-09-03 10:39:15 +03:00
  • 99161230c4 llama : enable GPU inference by default with Metal Georgi Gerganov 2023-09-03 10:30:53 +03:00
  • 15f1790a75 make : fix target clean Georgi Gerganov 2023-09-03 10:13:44 +03:00
  • b59beebdbf make : move targets back to the top Georgi Gerganov 2023-09-03 10:04:51 +03:00
  • 4de22829d9 Merge branch 'master' into build-metal-default Georgi Gerganov 2023-09-03 10:03:59 +03:00
  • e2ac2965d4 Merge b46ae7bde9 into 6a31a3bd98 Shouzheng Liu 2023-09-03 02:21:50 -04:00
  • 6a31a3bd98 swift : add support for k-quants (#2983) kchro3 2023-09-02 23:21:05 -07:00
  • 63cef16956 merge conflict kchro3 2023-09-02 23:18:23 -07:00
  • e7056a465e add support for k quantization for swift kchro3 2023-09-02 23:10:45 -07:00
  • 6af0bab347 ~4-5% improvement for Q8_0 TG on metal Iwan Kawrakow 2023-09-03 09:00:27 +03:00
  • cff7b0bf07 convert.py : BPE fixes (#2938) Kerfuffle 2023-09-02 23:52:13 -06:00
  • 340af42f09 docs : add catai to README.md (#2967) Ido S 2023-09-03 08:50:51 +03:00
  • c42f0ec6b3 examples : fix gpt-neox (#2943) b1157 momonga 2023-09-03 14:36:28 +09:00
  • 2753415afd swift : add missing c file to Package.swift (#2978) kchro3 2023-09-02 22:27:25 -07:00
  • bc054af97a make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS (#2886) b1155 Cebtenzzre 2023-09-03 01:26:59 -04:00