Commit graph

  • 4b2ee61f62 move graph map to backend object hongruichen 2024-07-05 11:56:31 +08:00
  • ca0d999c2a add ggml_qnn_graph hongruichen 2024-07-04 23:32:21 +08:00
  • 7c7b9c0bae Merge 5175117a09 into f09b7cb609 Farris 2024-07-05 11:18:41 +08:00
  • e7e17cf364 fix minor typos [no ci] Pieter Ouwerkerk 2024-07-04 22:58:49 -04:00
  • 87098db626 rebase work_space api luoyu-intel 2024-07-05 10:57:05 +08:00
  • ac8a4bd9d5 move QK_WARP_SIZE to presets.hpp luoyu-intel 2024-07-03 13:33:48 +08:00
  • d7cf5f5abb revert debug code luoyu-intel 2024-07-03 03:15:51 +00:00
  • 870b607c76 add concat support condition luoyu-intel 2024-07-03 03:05:21 +00:00
  • 0012f2c149 revert qx_k luoyu-intel 2024-07-02 09:07:49 +00:00
  • d70305b343 fix softmax luoyu-intel 2024-07-02 08:27:09 +00:00
  • e50517b64f split softmax luoyu-intel 2024-07-02 07:25:42 +00:00
  • c675aaf0b5 fix group_norm ut luoyu-intel 2024-07-02 06:55:06 +00:00
  • f09b7cb609 rm get_work_group_size() by local cache for performance (#8286) b3307 Neo Zhang Jianyu 2024-07-05 10:32:29 +08:00
  • e47dadd037 Merge 5b2736d949 into a38b884c6c Sanjay S Kumar 2024-07-04 18:00:03 -05:00
  • 54557c3d4a CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 Johannes Gäßler 2024-07-05 00:23:57 +02:00
  • 9b38f8bf65 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-07-04 17:33:52 -04:00
  • af514c8d77 sycl : refactored helper headers into multiple files Alberto Cabrera 2024-07-04 22:10:36 +01:00
  • 906476f6ff style: spaces jaime-m-p 2024-07-04 22:52:09 +02:00
  • f324f42c4e gguf-hash: missing stdalign.h in windows bypass brian khuu 2024-06-28 00:12:11 +10:00
  • d8dd43f94f gguf-hash: add cpp and python implementation of layer + model wide hashing brian khuu 2024-06-21 13:52:11 +10:00
  • 4db8c0d5d9 Update bruteforce test: add more models jaime-m-p 2024-07-04 22:38:12 +02:00
  • 11ac641c1e Update bruteforce test: header files location jaime-m-p 2024-07-04 22:35:21 +02:00
  • 2f150197e4 Better leading space removal jaime-m-p 2024-07-04 22:32:12 +02:00
  • 8f5e1e0c76 'viking' detokenizer clean spaces jaime-m-p 2024-07-04 22:30:48 +02:00
  • 91deef4606 py : rename requirements for convert_legacy_llama.py Francis Couture-Harpin 2024-07-04 16:16:05 -04:00
  • 902de8826b gguf-py : use snake_case in scripts entrypoint export Francis Couture-Harpin 2024-07-04 16:08:15 -04:00
  • 3e3cc7102f cont : fix link Georgi Gerganov 2024-07-04 22:36:36 +03:00
  • 9f3e9e3a09 CUDA: revert part of the RDNA1 optimizations Daniele 2024-07-04 19:35:10 +00:00
  • 963fe1d70d llama : prefer n_ over num_ prefix Georgi Gerganov 2024-07-04 22:33:44 +03:00
  • c172b322c2 cont Georgi Gerganov 2024-07-04 22:28:19 +03:00
  • a38b884c6c cli: add EOT when user hit Ctrl+C (#8296) b3306 Xuan Son Nguyen 2024-07-04 20:55:03 +02:00
  • 6fa99b4fad added support for Authorization Bearer tokens Derrick T. Woolworth 2024-07-04 13:48:32 -05:00
  • 600398fe60 ggml : remove k_quants_per_iteration macro Georgi Gerganov 2024-07-04 21:19:09 +03:00
  • d8f2da6b9f cont Georgi Gerganov 2024-07-04 20:47:03 +03:00
  • 39a41a53b0 py : switch to snake_case Georgi Gerganov 2024-07-04 20:44:32 +03:00
  • d7fd29fff1 llama : add OpenELM support (#7359) b3305 Icecream95 2024-07-05 05:14:21 +12:00
  • 8be4fe43c3 llama : minor comment Georgi Gerganov 2024-07-04 20:13:51 +03:00
  • 8072089c4e Merge commit 'f8c4c073' into detokenizer jaime-m-p 2024-07-04 18:50:04 +02:00
  • 6f63d646c1 tokenize : add --show-count (token) option (#8299) b3304 Daniel Bevenius 2024-07-04 18:38:58 +02:00
  • fc67799107 typo fix OuadiElfarouki 2024-07-04 17:37:35 +01:00
  • f55b647300 llama : minor indentation during tensor loading gg/indent Georgi Gerganov 2024-07-04 19:34:04 +03:00
  • 5be10ddd5e do not format system prompt if it is empty ngxson 2024-07-04 17:59:56 +02:00
  • 18e92879d5 llama : fix t5 uses of n_head and n_ff Francis Couture-Harpin 2024-07-04 11:52:48 -04:00
  • c6ac198424 Merge branch 'master' into openelm Francis Couture-Harpin 2024-07-04 11:45:21 -04:00
  • 269e07bb00 llama : use const ref for print_f and fix division by zero Francis Couture-Harpin 2024-07-04 11:39:32 -04:00
  • 51d2ebadbb build: Export hf-to-gguf as snakecase b3303 ditsuke 2024-07-04 20:54:35 +05:30
  • 1e920018d3 doc: Add context for why we add an explicit pytorch source ditsuke 2024-07-03 01:02:56 +05:30
  • 01a5f06550 chore: Remove rebase artifacts ditsuke 2024-07-02 15:48:13 +05:30
  • 07786a61a2 chore: Fixup requirements and build ditsuke 2024-07-02 15:35:43 +05:30
  • de14e2ea2b chore: ignore all __pychache__ ditsuke 2024-07-02 15:18:13 +05:30
  • 821922916f fix: Update script paths in CI scripts ditsuke 2024-03-10 23:21:46 +05:30
  • b1c3f26e5e fix: Actually include scripts in build ditsuke 2024-02-29 01:47:15 +05:30
  • b0a46993df build(python): Package scripts with pip-0517 compliance ditsuke 2024-02-27 12:01:02 +05:30
  • 80555483aa Caching device_info in device_ext to avoid repetitive queries OuadiElfarouki 2024-07-04 16:30:34 +01:00
  • 000240cf62 add clang format file and reformating hongruichen 2024-07-04 22:18:45 +08:00
  • 199d0fb0c9 Merge branch 'master' into pr/7359 Georgi Gerganov 2024-07-04 18:25:16 +03:00
  • 4c6f3d6183 build: Export hf-to-gguf as snakecase ditsuke 2024-07-04 20:54:35 +05:30
  • a9d644271c doc: Add context for why we add an explicit pytorch source ditsuke 2024-07-03 01:02:56 +05:30
  • 29e6edcc9b chore: Remove rebase artifacts ditsuke 2024-07-02 15:48:13 +05:30
  • 45f29bf802 chore: Fixup requirements and build ditsuke 2024-07-02 15:35:43 +05:30
  • 4bebdfdb1b chore: ignore all __pychache__ ditsuke 2024-07-02 15:18:13 +05:30
  • d766dc4a47 fix: Update script paths in CI scripts ditsuke 2024-03-10 23:21:46 +05:30
  • 2181a3e951 fix: Actually include scripts in build ditsuke 2024-02-29 01:47:15 +05:30
  • db85d49ce9 build(python): Package scripts with pip-0517 compliance ditsuke 2024-02-27 12:01:02 +05:30
  • 3fe395d220 llama : handle n_head == 0 Georgi Gerganov 2024-07-04 18:23:17 +03:00
  • 98fc182312 style : remove spaces jaime-m-p 2024-07-04 17:11:45 +02:00
  • 952fcf50d7 Merge f4d3bdabff into 807b0c49ff zhangkaihuo 2024-07-04 10:43:15 -04:00
  • d068ca13d4 tokenize : add --show-count (token) option Daniel Bevenius 2024-07-04 16:21:23 +02:00
  • 33099de0b9 Merge 47d821a08c into 807b0c49ff LDLINGLINGLING 2024-07-04 22:48:52 +09:00
  • 807b0c49ff Inference support for T5 and FLAN-T5 model families (#5763) b3295 fairydreaming 2024-07-04 15:46:11 +02:00
  • 22a648f8cc Merge branch 'master' into pr/7359 Georgi Gerganov 2024-07-04 16:41:27 +03:00
  • 9971c38ada llama : do not print hparams for vocab-only models Georgi Gerganov 2024-07-04 16:39:02 +03:00
  • b59ddf945e llama : fix save/load state Georgi Gerganov 2024-07-04 15:55:23 +03:00
  • 29ab5a0ed1 llama : use std::array for per-layer hparams Georgi Gerganov 2024-07-04 15:35:15 +03:00
  • d7a877e244 main: add need_insert_eot ngxson 2024-07-04 14:23:18 +02:00
  • 9bcecf1de5 Merge branch 'ggerganov:master' into t5-clean-3 fairydreaming 2024-07-04 13:51:33 +02:00
  • 8b560e63ec llama : silence compiler warnings Stanisław Szymczyk 2024-07-04 13:48:34 +02:00
  • f8c4c0738d tests : add _CRT_SECURE_NO_WARNINGS for WIN32 (#8231) b3294 Daniel Bevenius 2024-07-04 12:53:42 +02:00
  • 402d6feffa llama : suppress unref var in Windows MSVC (#8150) b3293 Daniel Bevenius 2024-07-04 12:50:57 +02:00
  • 763d4aaf56 Update src/llama.cpp Georgi Gerganov 2024-07-04 13:50:47 +03:00
  • 01cd5a6670 llama : minor Georgi Gerganov 2024-07-04 13:45:36 +03:00
  • ded682d43b llama-batched : add encoder support Georgi Gerganov 2024-07-04 13:38:08 +03:00
  • 88270a3613 llama : simplify llama_encode_internal Georgi Gerganov 2024-07-04 12:10:32 +03:00
  • 977941d9fe imitate reshape bug of python code caitianchi 2024-07-04 17:25:02 +08:00
  • 03ab5dd67c llama : change naming to prefer "_enc" suffix Georgi Gerganov 2024-07-04 11:24:13 +03:00
  • 20fc3804bf convert : fix gemma v1 tokenizer convert (#8248) b3292 Georgi Gerganov 2024-07-04 10:41:03 +03:00
  • 9b9593c177 Merge branch 'master' into gg/fix-gemma Georgi Gerganov 2024-07-04 10:40:34 +03:00
  • fdef7d606e replace get_work_group_size() by local buf Neo Zhang 2024-07-04 11:55:23 +08:00
  • 5dece9f922 rm get_work_group_size() by local cache for performance arthw 2024-07-04 09:40:50 +08:00
  • f619024764 [SYCL] Remove unneeded semicolons (#8280) b3291 AidanBeltonS 2024-07-04 02:07:19 +01:00
  • 2493479958 skip UT for BF16 Neo Zhang 2024-07-04 08:28:58 +08:00
  • 05ef32bd4a Merge da43a545ef into d23287f122 Nathaniel Le Sage 2024-07-03 16:19:51 -07:00
  • d23287f122 Define and optimize RDNA1 (#8085) b3290 Daniele 2024-07-03 23:02:58 +00:00
  • 68b57ed481 Define and optimize RDNA1 Daniele 2024-06-23 23:38:28 +00:00
  • 2727b02b4d server : remove extra \n after <|eot_id|> in llama3 template mgroeber9110 2024-07-03 22:33:33 +02:00
  • 8cdf7f0735 Remove trailing whitespace. Clint Herron 2024-07-03 14:45:13 -04:00
  • 9068136659 Wordsmithing readme. Clint Herron 2024-07-03 14:35:39 -04:00
  • 4acbef2a84 Updating comments and message text for clarity. Clint Herron 2024-07-03 14:34:44 -04:00
  • 8445732bb5 Adding a simple program to provide a deprecation warning that can exist to help people notice the binary name change from #7809 and migrate to the new filenames. Clint Herron 2024-07-03 14:24:53 -04:00
  • 5f2d4e60e2 ppl : fix n_seq_max for perplexity (#8277) b3289 slaren 2024-07-03 19:33:31 +02:00