Commit graph

  • 03c0946d73
    convert : support models with multiple chat templates (#6588) Sigbjørn Skjæret 2024-04-18 13:49:01 +02:00
  • fa9e8c6689
    Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-04-18 14:39:23 +03:00
  • e11b2e6e1e
    Qwen2 : assume tied weights if lm_head/output weights are missing (#6738) b2692 Ren Xuancheng 2024-04-18 19:38:04 +08:00
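The two Qwen2 commits about tied weights (this one and c7ab76eacc below) describe a loader fallback: when a converted model ships no separate output/lm_head tensor, the token-embedding matrix is reused for the output projection. A minimal Python sketch of that lookup, using GGUF-style tensor names; the logic is illustrative, not the actual loader code:

```python
def resolve_output_weight(tensors):
    # Tied weights: models exported without a separate lm_head keep only
    # the token embedding matrix; the output projection then reuses
    # ("ties to") that same tensor instead of failing to load.
    if "output.weight" in tensors:
        return tensors["output.weight"]
    return tensors["token_embd.weight"]
```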
  • 105332cc17
    metal : add BS=1 kernel for flash attention (#6508) Georgi Gerganov 2024-04-18 14:33:07 +03:00
  • 260cdb2d08
    llama-bench : add -fa,--flash-attn arg Georgi Gerganov 2024-04-18 14:28:19 +03:00
  • 4980e3507e ggml-ci slaren 2024-04-18 13:17:23 +02:00
  • 87968de9a9 fix KQ FP32 precision for parallel_blocks > 1 Johannes Gäßler 2024-04-17 17:31:03 +02:00
  • 2f538b9547 Add __hgt2_mask implementation for CUDA 11 Johannes Gäßler 2024-04-17 16:29:28 +02:00
  • 0bc67dd1c8 Calculate KQ as FP32 if KQV has GGML_PREC_F32 Johannes Gäßler 2024-04-16 16:22:29 +02:00
  • a5b0e2dea0 store temp KQ in registers Johannes Gäßler 2024-04-16 15:58:21 +02:00
  • ef9e1593f3 flush softmax exp below threshold to 0 Johannes Gäßler 2024-04-15 16:05:07 +02:00
  • 6a3b84236d fix flash_attn_vec_f16 race condition Johannes Gäßler 2024-04-13 22:05:43 +02:00
  • 34f93bbb39 CUDA: refactor host code, dyn. par. blocks Johannes Gäßler 2024-04-09 11:39:16 +02:00
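Several of the CUDA FlashAttention commits above concern softmax numerics: computing KQ in FP32 when GGML_PREC_F32 is requested, and flushing exponentials below a threshold to zero. A schematic Python version of a stable softmax with such a flush; the threshold value is illustrative, and the real kernels do this per KV block in registers:

```python
import math

def softmax_with_flush(scores, flush_threshold=-20.0):
    # Subtract the max for numerical stability, then flush any
    # exponential whose argument falls below the threshold to exactly
    # 0.0 ("flush softmax exp below threshold to 0"); tiny terms would
    # otherwise contribute noise in low-precision accumulation.
    m = max(scores)
    exps = [math.exp(x - m) if (x - m) > flush_threshold else 0.0
            for x in scores]
    s = sum(exps)
    return [e / s for e in exps]
```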
  • 2080a97c5b
    llama : simplify moe reshapes Georgi Gerganov 2024-04-18 13:53:01 +03:00
  • f153e7e7c0
    flake-- Sigbjørn Skjæret 2024-04-18 12:05:39 +02:00
  • 57b93282a2
    Merge branch 'master' into multiple-chat-templates Sigbjørn Skjæret 2024-04-18 12:00:51 +02:00
  • 980bb1637f
    Add files via upload Sigbjørn Skjæret 2024-04-18 11:56:14 +02:00
  • a0782056c5
    New script to add/modify/remove metadata Sigbjørn Skjæret 2024-04-18 11:54:57 +02:00
  • c7ab76eacc
    Qwen2: assume tied weights if lm_head/output weights are missing Ren Xuancheng 2024-04-18 17:52:43 +08:00
  • c71bfd736e
    llama : fix compatibility with old 2 expert models (#6735) b2691 slaren 2024-04-18 09:04:47 +02:00
  • ce80217fa9 llama : fix compatibility with old 2 expert models slaren 2024-04-18 03:52:39 +02:00
  • 798c29d6b9 gritlm example using llama_get_embeddings_mean_pooled Matt Grosso 2024-04-17 17:49:29 -07:00
  • 2a24f71497 llama_get_embeddings_mean_pooled Matt Grosso 2024-04-17 17:47:25 -07:00
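The llama_get_embeddings_mean_pooled commits expose a sequence-level embedding for the gritlm example. Mean pooling itself is simple; a dependency-free sketch of just the math (the real API returns the pooled vector from the context):

```python
def mean_pool(token_embeddings, attention_mask):
    # Average the per-token embeddings over the non-masked positions to
    # produce one sequence-level embedding vector.
    dim = len(token_embeddings[0])
    acc = [0.0] * dim
    count = 0
    for emb, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, v in enumerate(emb):
                acc[i] += v
    return [v / count for v in acc]
```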
  • 4d8fe0764b metal : enable buffer log prints again slaren 2024-04-18 01:01:42 +02:00
  • 0e6963da8f cuda : fix warnings slaren 2024-04-17 23:47:04 +02:00
  • d18b19c8fe Merge remote-tracking branch 'origin/master' into sl/moe-rework-2 slaren 2024-04-17 23:45:06 +02:00
  • 5668c79ea0 server: bench: enable flash_attn param Pierrick HYMBERT 2024-04-17 23:26:29 +02:00
  • 3b8f1ec4b1
    llamafile : tmp disable + build sgemm.o when needed (#6716) b2690 Georgi Gerganov 2024-04-17 23:58:26 +03:00
  • 0dd7505ad4
    llamafile : tmp disable due to MoE bug Georgi Gerganov 2024-04-17 23:53:37 +03:00
  • 44ca5764d6 fix KQ FP32 precision for parallel_blocks > 1 Johannes Gäßler 2024-04-17 17:31:03 +02:00
  • 4e4d58ab6a Add __hgt2_mask implementation for CUDA 11 Johannes Gäßler 2024-04-17 16:29:28 +02:00
  • a9d6591652 Calculate KQ as FP32 if KQV has GGML_PREC_F32 Johannes Gäßler 2024-04-16 16:22:29 +02:00
  • aef96ff40a store temp KQ in registers Johannes Gäßler 2024-04-16 15:58:21 +02:00
  • 049533d99f flush softmax exp below threshold to 0 Johannes Gäßler 2024-04-15 16:05:07 +02:00
  • 359d0f565e fix flash_attn_vec_f16 race condition Johannes Gäßler 2024-04-13 22:05:43 +02:00
  • a83d993183 CUDA: refactor host code, dyn. par. blocks Johannes Gäßler 2024-04-09 11:39:16 +02:00
  • f7fe79a31d cuda : fix binbcast slaren 2024-04-17 19:28:21 +02:00
  • 997a9b5bd2 cleanup slaren 2024-04-17 19:12:34 +02:00
  • d68c935c8d cuda : fix bin bcast with non-cont src0 slaren 2024-04-17 19:02:52 +02:00
  • d9157cdf34
    Update server.cpp example with correct startup sequence ManniX-ITA 2024-04-17 18:55:09 +02:00
  • bf56fdecb3 cleanup slaren 2024-04-17 18:37:28 +02:00
  • fb168ac5f7 fix merge slaren 2024-04-17 18:12:03 +02:00
  • 42003fdc32 Merge remote-tracking branch 'origin/master' into sl/moe-rework-2 slaren 2024-04-17 17:56:41 +02:00
  • fc363e4afc add metal impl slaren 2024-04-12 23:33:46 +02:00
  • 8dd1ec8b3f
    readme : add UI (#6724) Yaroslav 2024-04-17 14:47:50 +02:00
  • 1f2a1c0d73
    Update README.md Georgi Gerganov 2024-04-17 15:47:35 +03:00
  • 405385726e server: support flash_attn param Pierrick HYMBERT 2024-04-17 14:05:02 +02:00
  • bac2af7872
    Update README.md Yaroslav 2024-04-17 13:51:38 +02:00
  • df175f7c60 Added --hf-token argument support Sourabrata Bose 2024-04-17 15:34:52 +05:30
  • 8d4e096b09
    fix build issue reported by GitHub CI system zhou.weiguo 2024-04-17 17:52:50 +08:00
  • 6a8e2ddcec
    fix build issue reported by GitHub CI system zhou.weiguo 2024-04-17 17:30:05 +08:00
  • 8b11ef1426
    fix build issue reported by GitHub CI system zhou.weiguo 2024-04-17 17:23:39 +08:00
  • 9f844a3d0c
    log: refine log function for Android zhou.weiguo 2024-04-17 17:12:40 +08:00
  • 599ce84a71
    llama : flash_attn cparam + fix defrag Georgi Gerganov 2024-04-17 12:00:35 +03:00
  • 2c41180e88
    Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-04-17 10:13:09 +03:00
  • 196e54f3c7
    build : sgemm.o only when needed Georgi Gerganov 2024-04-17 09:24:47 +03:00
  • d58d9d80f8 Fixed issue with gpt2 regex custom preprocessor Kazim Abrar Mahi 2024-04-17 07:40:40 +06:00
  • 49acdcb9c7
    add support to format input json as typescript function str Yingbei 2024-04-16 16:17:43 -07:00
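The commit above formats a JSON function/tool spec as a TypeScript-style signature string for the prompt. The exact rendering is not visible from the log; the sketch below assumes an OpenAI-style spec, and both the type mapping and the "any" fallback are assumptions for illustration:

```python
def tool_to_ts_signature(tool):
    # Render {"name": ..., "parameters": {"properties": {...}}} as a
    # TypeScript-like signature string. Hypothetical format, not the
    # one the commit actually emits.
    type_map = {"string": "string", "number": "number",
                "integer": "number", "boolean": "boolean"}
    props = tool.get("parameters", {}).get("properties", {})
    args = ", ".join(f"{name}: {type_map.get(spec.get('type'), 'any')}"
                     for name, spec in props.items())
    return f"function {tool['name']}({args}): any;"
```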
  • 2e33911400
    Hacky func streaming (#2) Yingbei Tong 2024-04-16 21:22:32 +00:00
  • facb8b56f8
    convert : fix autoawq gemma (#6704) b2688 Zheng.Deng 2024-04-17 04:51:07 +08:00
  • 532c1737a1
    llama : make general.name optional (#6709) b2687 Georgi Gerganov 2024-04-16 23:50:38 +03:00
  • 666867b799
    ggml : fix llamafile sgemm wdata offsets (#6710) b2686 Georgi Gerganov 2024-04-16 23:50:22 +03:00
  • 42b5d17c32
    ggml : fix llamafile sgemm wdata offsets Georgi Gerganov 2024-04-16 22:41:03 +03:00
  • f02ea667c1
    ggml : temporary disable llamafile sgemm until fixed gg/disable-sgemm Georgi Gerganov 2024-04-16 22:41:03 +03:00
  • ab0dee580a
    llama : make general.name optional Georgi Gerganov 2024-04-16 22:10:52 +03:00
  • 8cc91dc63c
    ggml : add llamafile sgemm (#6414) b2685 Justine Tunney 2024-04-16 14:55:30 -04:00
  • 8dd93aa06d Merge remote-tracking branch 'origin/support_codeqwen' into support_codeqwen JustinLin610 2024-04-17 02:11:15 +08:00
  • 6d84a42d1e change code to full string match and print necessary message Zheng.Deng 2024-04-17 02:10:12 +08:00
  • 9eef1dfb58 fix typo JustinLin610 2024-04-17 02:09:36 +08:00
  • 9ae221c32f
    Merge branch 'ggerganov:master' into support_codeqwen Junyang Lin 2024-04-17 02:00:54 +08:00
  • f22900b30c override load_hparams JustinLin610 2024-04-17 01:53:56 +08:00
  • 6cf0d467f8 add support of codeqwen due to tokenizer JustinLin610 2024-04-17 00:37:49 +08:00
  • dbceec87c0
    llama : add StableLM2 12B (#6635) b2684 Ashish 2024-04-16 08:48:35 -07:00
  • f4dea7da18
    llama : add qwen2moe (#6074) b2683 Shijie 2024-04-16 23:40:48 +08:00
  • 245565fc6d
    llama : add model type name Georgi Gerganov 2024-04-16 18:39:48 +03:00
  • 94e8c490fe Format Ashish 2024-04-16 07:57:36 -07:00
  • e6ec203336 Fixed incorrect tensor passing Ashish 2024-04-16 07:11:58 -07:00
  • 6ae4dad004 Removed unused comment Ashish 2024-04-16 06:11:47 -07:00
  • 8a15f932af Removed unnecessary conditional branches Ashish 2024-04-16 06:03:47 -07:00
  • 0060ccdde6
    Merge branch 'ggerganov:master' into fix-awq-gemma-convert Zheng.Deng 2024-04-16 20:58:36 +08:00
  • 1cd0a03720 fix zhangkaihuo 2024-04-16 20:34:21 +08:00
  • 4da6e3e3e4 merge upstream zhangkaihuo 2024-04-16 20:29:29 +08:00
  • de3555119e
    llama : reuse build_moe_ffn() Georgi Gerganov 2024-04-16 13:42:19 +03:00
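build_moe_ffn(), reused here across MoE models (see also the qwen2moe and moe-reshape commits above), implements top-k expert routing. A schematic version with experts as plain callables standing in for the FFN tensors; whether the selected weights are renormalized varies by model, and this sketch renormalizes:

```python
import math

def moe_ffn(x, router_logits, experts, top_k=2):
    # Softmax the router logits, keep the top_k experts, renormalize
    # their probabilities, and return the weighted sum of their outputs.
    exps = [math.exp(g) for g in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=probs.__getitem__)[-top_k:]
    norm = sum(probs[i] for i in top)
    return sum(probs[i] / norm * experts[i](x) for i in top)
```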
  • b00d38b0b1 feat: embedding gets results Joan Martinez 2024-04-16 11:51:38 +02:00
  • eedd42e376 KV Cache defrag hash overflow - TMP Fix by @slaren #6685 hp/tmp/kv-cache-defrag Pierrick HYMBERT 2024-04-16 10:24:34 +02:00
  • ec3cc36dc8 Use CMAKE_CURRENT_LIST_DIR dixyes 2024-04-16 16:13:54 +08:00
  • 8a56075b07
    gritlm : add --outdir option to hf.sh script (#6699) Daniel Bevenius 2024-04-16 08:34:06 +02:00
  • 58227ffdeb
    perplexity : require positive --ctx-size arg (#6695) b2681 Georgi Gerganov 2024-04-16 09:28:33 +03:00
  • dfb6a0139c Add cmake for MUSA support dixyes 2024-04-16 14:18:38 +08:00
  • 4fbd8098e6
    gguf : add special tokens metadata for FIM/Infill (#6689) b2680 Daniel Bevenius 2024-04-16 08:13:13 +02:00
  • f88e6844a4
    names : for brevity "SHARED_EXP" -> "SHEXP" Georgi Gerganov 2024-04-16 09:01:40 +03:00
  • 7355ca84b5 fix-review simonJJJ 2024-04-16 13:29:42 +08:00
  • 1618626292
    gritlm : add --outdir option to hf.sh script Daniel Bevenius 2024-04-16 06:18:05 +02:00
  • 183c4bb365
    Improve cpu prompt eval speed Justine Tunney 2024-04-15 19:43:40 -07:00
  • cc8d52921c Fixed issues Kazim Abrar Mahi 2024-04-16 05:53:29 +06:00
  • 6c80b3c504 Added needed functionality, testing remains Kazim Abrar Mahi 2024-04-16 04:56:35 +06:00
  • b9286a4d7b build: nits ochafik 2024-04-15 21:17:28 +01:00
  • 0c991bebb4 Adding unicode regex function Kazim Abrar Mahi 2024-04-16 01:52:33 +06:00
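The unicode-regex commits above (together with "Fixed issue with gpt2 regex custom preprocessor") revolve around the GPT-2 pre-tokenizer split pattern, which uses Unicode categories that std::regex and Python's re lack, hence the custom implementation. An ASCII-only approximation for illustration, with [A-Za-z] standing in for \p{L} and [0-9] for \p{N}:

```python
import re

GPT2_SPLIT = re.compile(
    r"'s|'t|'re|'ve|'m|'ll|'d"
    r"| ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+(?!\S)|\s+")

def pretokenize(text):
    # Split text into pre-tokens before BPE; each match keeps its
    # leading space, as in the original GPT-2 pattern.
    return GPT2_SPLIT.findall(text)
```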
  • 0010280042
    metal : require contiguousness for float4 unary kernels (cont) Georgi Gerganov 2024-04-15 22:51:42 +03:00
  • 7c1ab98183
    metal : require contiguousness for float4 unary kernels Georgi Gerganov 2024-04-15 22:48:39 +03:00
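The two Metal commits above add a contiguousness requirement before dispatching the float4 (vectorized) unary kernels: indexing memory as flat 4-float words is only valid when strides match a dense row-major layout. A byte-stride check in Python, with itemsize 4 for float32 (the real guard lives in the C backend, and size-1 dimensions are not special-cased here):

```python
def is_contiguous(shape, strides_bytes, itemsize=4):
    # Dense row-major layout: the innermost stride equals the element
    # size, and each outer stride equals the next-inner stride times
    # that dimension's extent.
    expected = itemsize
    for dim, stride in zip(reversed(shape), reversed(strides_bytes)):
        if stride != expected:
            return False
        expected *= dim
    return True
```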