Commit graph

  • c7e3cd08ce Merge branch 'master' into update_flake_lock_action Philip Taron 2024-06-24 08:26:19 -07:00
  • b9b64ca889 Merge aa3fd500b1 into d62e4aaa02 Alexander Komarov 2024-06-24 09:32:11 -04:00
  • 61f3cb6e22 CUDA: use MMQ instead of cuBLAS by default Johannes Gäßler 2024-06-23 12:07:04 +02:00
  • 3a4d5790bf add eos_id_list to llama.cpp toyer 2024-06-24 12:27:02 +00:00
  • d62e4aaa02 gguf-py : fix tensor groups for encoder-decoder models in gguf-dump.py (#8090) fairydreaming 2024-06-24 14:13:39 +02:00
  • a2e9934baa llama : return nullptr from llama_grammar_init Daniel Bevenius 2024-06-24 13:03:35 +02:00
  • 64f2a194db Merge branch 'master' into gguf-dump-grouping-fix Brian 2024-06-24 20:58:06 +10:00
  • eb1c225f0e gguf-dump.py: Rename variables and adjust comments brian khuu 2024-06-24 18:37:31 +10:00
  • de61181079 gguf-dump: add --data-alignment brian khuu 2024-06-21 21:08:59 +10:00
  • c0e6537508 gguf-dump: refactor GGUFReader for clarity brian khuu 2024-06-21 20:26:09 +10:00
  • 0fee5b6d56 gguf-dump: add tensor data offset table brian khuu 2024-06-21 20:24:15 +10:00
  • b664c3cad7 gguf-dump: add --data-offset brian khuu 2024-06-21 20:23:34 +10:00
  • 9a590c8226 CUDA: optimize MMQ int8 tensor core performance (#8062) Johannes Gäßler 2024-06-24 12:41:23 +02:00
  • 5db2131250 simplify code, make functions constexpr Johannes Gäßler 2024-06-22 19:00:00 +02:00
  • cab5981951 only a single get_mma_tile_x_k function Johannes Gäßler 2024-06-22 15:02:42 +02:00
  • db6dae797b CUDA: optimize MMQ int8 tensor core performance Johannes Gäßler 2024-06-21 11:16:18 +02:00
  • 52fc8705a0 Option to split during conversion (#6942) Christian Zhou-Zheng 2024-06-24 05:42:03 -04:00
  • 6fdcb3b9a6 Merge f05a0e0a00 into 8cb508d0d5 Galunid 2024-06-24 11:09:26 +02:00
  • a28e70fde8 code style ngxson 2024-06-24 11:07:08 +02:00
  • c530ce4c17 Merge branch 'master' into xsn/main_chat_template_2 ngxson 2024-06-24 11:00:00 +02:00
  • 7a7650231a code style ngxson 2024-06-24 10:57:47 +02:00
  • ea784c1051 [SYCL] re-enabled mul_mat_batched_sycl path for batched Q*K & KQ*V (#8057) Ouadie EL FAROUKI 2024-06-24 09:57:12 +01:00
  • a1e9520995 fix server ngxson 2024-06-24 10:56:55 +02:00
  • a3dbfabe93 add llama_chat_format_example ngxson 2024-06-24 10:52:17 +02:00
  • 43cab6bfc6 improve ngxson 2024-06-24 10:45:31 +02:00
  • e2b45a763b gguf-py : fix tensor groups for encoder-decoder models in gguf-dump.py Stanisław Szymczyk 2024-06-24 10:28:58 +02:00
  • e1056da1c0 fix op handle checker hongruichen 2024-06-24 12:06:42 +08:00
  • c2c799cefa llama : add T5 model architecture, tensors and model header parameters Stanisław Szymczyk 2024-06-24 08:27:18 +02:00
  • 6e4182c42d Merge branch 'master' into convert-split Christian Zhou-Zheng 2024-06-24 02:14:48 -04:00
  • afed7ee0db Merge d7a7a780c9 into 8cb508d0d5 Adrian Liechti 2024-06-24 13:54:49 +08:00
  • 8cb508d0d5 disable publishing the full-rocm docker image (#8083) b3212 slaren 2024-06-24 07:36:11 +02:00
  • 646ef4a9cf embedding : more cli arguments (#7458) b3211 Yann Follet 2024-06-24 13:30:24 +08:00
  • de0d6a68ac gguf-py, convert-hf : model conversion support for T5 and FLAN-T5 model variants (#5763) fairydreaming 2024-06-24 07:06:05 +02:00
  • a6821563e8 Add download-release.sh for easy install Nick Crews 2024-06-20 14:31:07 -08:00
  • 3853e3af73 convert-hf : remove duplicated initialization of variables Stanisław Szymczyk 2024-06-24 06:11:53 +02:00
  • cb8cfb9d4d Merge pull request #15 from OpenBMB/master tc-mb 2024-06-24 11:29:30 +08:00
  • 77beb4d153 Merge branch 'prepare-PR-of-minicpm-v2.5' into master tc-mb 2024-06-24 11:29:17 +08:00
  • 8a30bd039e Merge branch 'ggerganov:master' into t5-clean fairydreaming 2024-06-24 05:23:37 +02:00
  • d8928ef619 Merge branch 'ggerganov:master' into master ZeusXuan 2024-06-24 10:43:39 +08:00
  • d94eaa69d1 By changing priorty between --token_embedding_type, --output_tensor_type and --pure, it is more friendly for user to define own quantization strategy zzx 2024-06-24 10:39:22 +08:00
  • 25fc226143 fix code formating group of parameters // embedding print usage for embedding parameters Yann Follet 2024-06-24 02:07:24 +00:00
  • 95f57bb5d5 ggml : remove ggml_task_type and GGML_PERF (#8017) b3209 slaren 2024-06-24 03:07:59 +02:00
  • 9fb8a75e33 Merge remote-tracking branch 'origin/master' into json-bounds2 ochafik 2024-06-24 00:51:35 +01:00
  • 33933b8504 Merge remote-tracking branch 'origin/master' into json-type ochafik 2024-06-24 00:50:57 +01:00
  • 65680f9434 Merge remote-tracking branch 'origin/master' into json-additional ochafik 2024-06-24 00:46:42 +01:00
  • 3c64db18d7 reformat grammar integ tests w/ R"""()""" strings where there's escapes ochafik 2024-06-24 00:27:32 +01:00
  • 6cf4cc2847 update tests ochafik 2024-06-24 00:10:02 +01:00
  • 3d5019ffbd Merge remote-tracking branch 'upstream/master' into t5-clean Stanisław Szymczyk 2024-06-23 21:51:40 +02:00
  • 44c8648461 Fix detokenizer(): jaime-m-p 2024-06-23 21:12:24 +02:00
  • 60d655bf47 Merge branch 'master' into sl/remove-task-type slaren 2024-06-23 21:08:00 +02:00
  • 38d54b3c39 tets: skip unicode surrogaes and undefined jaime-m-p 2024-06-23 20:56:32 +02:00
  • 0cf2989b6c tests: gracefully exit threads jaime-m-p 2024-06-23 20:54:46 +02:00
  • 9af762c0ac tests: unexpected vocab type as test fail instead of error jaime-m-p 2024-06-23 20:49:02 +02:00
  • 4d19147db7 disable publishing the full-rocm docker image slaren 2024-06-23 20:32:01 +02:00
  • e112b610a1 llama : add support for BitnetForCausalLM (#7931) b3208 Eddie-Wang 2024-06-24 02:27:57 +08:00
  • 698ad95ea8 remove LLAMA_PERF slaren 2024-06-23 20:09:54 +02:00
  • 13fe28259d gfx908 optimizations uvos 2024-06-23 18:59:36 +02:00
  • 843b1b7edc gguf-py : whitespace formatting fixes Stanisław Szymczyk 2024-06-23 19:25:48 +02:00
  • 61b96a5e76 vulkan : remove usage of ggml_compute_params slaren 2024-06-23 19:27:54 +02:00
  • 9b00a7eba5 Merge remote-tracking branch 'origin/master' into sl/remove-task-type slaren 2024-06-23 19:22:01 +02:00
  • b6d4c7832e moved curl to base joecryptotoo 2024-06-23 09:51:42 -07:00
  • b4c4377a7e moved curl to base joecryptotoo 2024-06-23 09:51:24 -07:00
  • a59f4f9eef Merge branch 'ggerganov:master' into t5-clean fairydreaming 2024-06-23 18:49:16 +02:00
  • be03af4d3d added healthcheck joecryptotoo 2024-06-23 09:47:51 -07:00
  • 237ae63342 added healthcheck joecryptotoo 2024-06-23 09:47:05 -07:00
  • 7341e26b0f added healthcheck joecryptotoo 2024-06-23 09:45:48 -07:00
  • d081b64140 added healthcheck joecryptotoo 2024-06-23 09:45:00 -07:00
  • 892f252bf9 added healthcheck joecryptotoo 2024-06-23 09:44:31 -07:00
  • 226c5eed4e fix bo Eddie-Wang 2024-06-23 15:58:30 +00:00
  • 7f156a7223 Resolving the problem in chktxt Iaroslav Chelombitko 2024-06-23 18:51:12 +03:00
  • 16f0c30d28 Merge branch 'ggerganov:master' into bitnet Eddie-Wang 2024-06-23 23:50:52 +08:00
  • 6a2f298bd7 server : fix JSON-Scheme typo (#7975) Aarni Koskela 2024-06-23 18:03:08 +03:00
  • 98931f87d4 convert-hf : for T5 skip both decoder.embed_tokens and encoder.embed_tokens tensors (they are duplicates of shared tensor) Stanisław Szymczyk 2024-06-23 15:52:54 +02:00
  • 11318d9aa1 Fix typo in llama_set_embeddings comment (#8077) b3206 Daniel Bevenius 2024-06-23 15:39:45 +02:00
  • a41075a810 code style ngxson 2024-06-23 15:32:15 +02:00
  • 6a11a39a8e add mean method ngxson 2024-06-23 15:31:24 +02:00
  • 2ee4573719 llama : fix typo in llama_set_embeddings comment Daniel Bevenius 2024-06-23 14:37:21 +02:00
  • b6b9a8e606 fix CI failures (#8066) b3205 slaren 2024-06-23 13:14:45 +02:00
  • 163712e7e3 Update convert-hf-to-gguf.py Brian 2024-06-23 19:41:16 +10:00
  • 47a0a0cdff gguf-py, convert-hf : conversion support for FLAN-T5 model family Stanisław Szymczyk 2024-06-23 10:57:07 +02:00
  • 45c0e2e4c1 Refactor Vulkan backend to allow multiple contexts (#7961) b3204 0cc4m 2024-06-23 10:21:25 +02:00
  • 772b68ed66 flake.lock: Update github-actions[bot] 2024-06-23 00:19:09 +00:00
  • bd989c21d4 fix inverted vector ngxson 2024-06-23 00:33:59 +02:00
  • 8a52b546a0 remove completions file ngxson 2024-06-23 00:14:06 +02:00
  • 962be6a834 more consistent naming ngxson 2024-06-22 22:57:16 +02:00
  • f714d7f1a7 json: nit: simplify condition ochafik 2024-06-22 21:15:51 +01:00
  • 670d5a6195 json: add integration tests for min/max bounds ochafik 2024-06-22 21:11:42 +01:00
  • 948e55e890 fix merge ochafik 2024-06-22 21:10:05 +01:00
  • 5c2d3fa1ae json: add integ. test case for additionalProperties ochafik 2024-06-22 20:55:46 +01:00
  • 2f1a087c6b json: fix additionalProperties default, uncomment tests ochafik 2024-06-22 20:52:02 +01:00
  • 9352712c80 json: add test for type: [array, null] fix ochafik 2024-06-22 20:26:25 +01:00
  • 9c2cc11fd7 Merge remote-tracking branch 'origin/master' into json-type ochafik 2024-06-22 19:49:05 +01:00
  • 6c859ee422 Merge remote-tracking branch 'origin/master' into json-additional ochafik 2024-06-22 19:48:34 +01:00
  • 6fa73649a4 Merge remote-tracking branch 'origin/master' into json-bounds2 ochafik 2024-06-22 19:47:32 +01:00
  • 317452730d server: simplify format_chat ngxson 2024-06-22 20:30:33 +02:00
  • b5a5f34efa Removing extra blank lines that were breaking Lint. (#8067) b3203 Clint Herron 2024-06-22 14:28:18 -04:00
  • c91f972775 add help message ngxson 2024-06-22 20:25:26 +02:00
  • 5a2fde8385 add chat template support for llama-cli ngxson 2024-06-22 20:24:14 +02:00
  • 3f6a259dc3 Removing extra blank lines that were breaking Lint. Clint Herron 2024-06-22 14:10:46 -04:00
  • f393795a79 Removed double calls to cb(cur, "l_out", il) jukofyork 2024-06-22 18:02:11 +01:00