Commit graph

  • 4dff06da04 finetune: Rename command name in README.md (#8343) standby24x7 2024-07-07 19:38:02 +09:00
  • a490e75225 finetune: Rename an old command name in finetune.sh (#8344) standby24x7 2024-07-07 19:37:47 +09:00
  • 0162c9c537 server: Retrieve prompt template in /props (#8337) Bjarke Viksøe 2024-07-07 11:10:38 +02:00
  • a82ac78c6f added support for Authorization Bearer tokens when downloading model (#8307) Derrick T. Woolworth 2024-07-06 15:32:04 -05:00
  • 6234b41211 update main readme (#8333) Xuan Son Nguyen 2024-07-06 19:01:23 +02:00
  • 091a7af9fe llama : add early return for empty range (#8327) Daniel Bevenius 2024-07-06 09:22:16 +02:00
  • 0106884e98 Detokenizer fixes (#8039) jaime-m-p 2024-07-05 19:01:35 +02:00
  • 16ab65b7b9 Reorganize documentation pages (#8325) Xuan Son Nguyen 2024-07-05 18:08:32 +02:00
  • 401892e563 llama : fix compile warning (#8304) Georgi Gerganov 2024-07-05 17:32:09 +03:00
  • c667e897e9 cmake : add GGML_BUILD and GGML_SHARED macro definitions (#8281) Natsu 2024-07-05 22:29:35 +08:00
  • c738f1bc89 convert : remove AWQ remnants (#8320) Georgi Gerganov 2024-07-05 10:15:36 +03:00
  • 655a624782 llama : minor indentation during tensor loading (#8304) Georgi Gerganov 2024-07-05 10:15:24 +03:00
  • 1dfab16f5d CUDA: MMQ support for iq4_nl, iq4_xs (#8278) Johannes Gäßler 2024-07-05 09:06:31 +02:00
  • 4bb7223486 CUDA: revert part of the RDNA1 optimizations (#8309) Daniele 2024-07-05 07:06:09 +00:00
  • d49328a3bf llama : streamline embeddings from "non-embedding" models (#8087) Douglas Hanley 2024-07-05 02:05:56 -05:00
  • 972fbf7fbf CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (#8311) Johannes Gäßler 2024-07-05 09:05:34 +02:00
  • df8b4d8e39 readme : fix minor typos [no ci] (#8314) Pieter Ouwerkerk 2024-07-05 02:58:41 -04:00
  • 53da9d276e passkey : add short intro to README.md [no-ci] (#8317) Daniel Bevenius 2024-07-05 08:14:24 +02:00
  • f5cb88cc73 llama : prefer n_ over num_ prefix (#8308) Georgi Gerganov 2024-07-05 09:10:03 +03:00
  • 8696144105 contributing : update guidelines (#8316) Georgi Gerganov 2024-07-05 09:09:47 +03:00
  • a4c8edcb67 fix for multiple cards Neo Zhang 2024-07-14 00:15:55 +08:00
  • 17eb6aa8a9 vulkan : cmake integration (#8119) b3386 bandoti 2024-07-13 13:12:39 -03:00
  • acc877f4ab llama : fix Gemma-2 Query scaling factor Georgi Gerganov 2024-07-13 18:40:43 +03:00
  • df78f19612 9B - query_pre_attn_scalar = 256 not 224 Daniel Han 2024-07-10 22:02:21 -07:00
  • aa0a0ffedf server: update README.md with llama-server --help's output [no ci] Marc-Antoine Ruel 2024-07-13 11:35:12 -04:00
  • c917b67f06 metal : template-ify some of the kernels (#8447) b3385 Georgi Gerganov 2024-07-13 18:32:33 +03:00
  • f2d6eb0f4d gguf_hash.py: rename string UUIDv5 --> uuid brian khuu 2024-07-14 00:36:50 +10:00
  • 77b0ca4c7f gguf_hash.py: Add sha256 brian khuu 2024-07-14 00:17:42 +10:00
  • aeaed61904 Merge pull request #1 from arthw/update_warp Neo Zhang 2024-07-13 16:44:28 +08:00
  • 74e3185cfd fix editorconfig check format issue arthw 2024-07-13 16:02:15 +08:00
  • 4cd9e48670 cherry-pick b549a1bbef, [SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266) * fix group_norm ut arthw 2024-07-13 14:43:57 +08:00
  • 59ce85318a test-tokenizer-random : reduce potential conflicts with #8379 compilade/fix-mpt-pretok Francis Couture-Harpin 2024-07-13 01:03:32 -04:00
  • c1e2283887 expose op at unit test hongruichen 2024-07-13 10:55:36 +08:00
  • 100ccd5e7f add unary op template and more ops hongruichen 2024-07-13 00:06:58 +08:00
  • 7e7ff2bc9a llama: fix some return values to match hparams types Borislav Stanimirov 2024-07-12 19:52:23 +03:00
  • 7cbc4fbd8c add mul hongruichen 2024-07-12 23:26:38 +08:00
  • e3aa43adbd suppress warning hongruichen 2024-07-12 23:26:11 +08:00
  • 0eb595cc6e use table to simplify the op mapping hongruichen 2024-07-12 19:52:35 +08:00
  • c42d7d7c3d ggml : suppress unknown pragma 'GCC' on windows Daniel Bevenius 2024-07-12 15:08:59 +01:00
  • 3277bb88e5 Merge branch 'ggerganov:master' into master hackingthekernel 2024-07-12 14:34:19 +01:00
  • 24fd7d3d52 Merge branch 'master' into vulkan-build-integration Mason M 2024-07-12 10:21:41 -03:00
  • 88fd99bb66 remove clblast from nix pkg Mason M 2024-07-12 10:19:21 -03:00
  • f0894d897a wip hongruichen 2024-07-12 19:40:55 +08:00
  • 4e24cffd8c server : handle content array in chat API (#8449) b3384 Georgi Gerganov 2024-07-12 14:48:15 +03:00
  • 6af51c0d96 main : print error on empty input (#8456) b3383 Georgi Gerganov 2024-07-12 14:48:04 +03:00
  • 0cc7241df4 main : print error on empty input Georgi Gerganov 2024-07-12 13:32:28 +03:00
  • 0f41b3b14d llama : suppress unary minus operator warning Daniel Bevenius 2024-07-12 10:26:39 +01:00
  • 8428a98770 Update examples/server/utils.hpp Georgi Gerganov 2024-07-12 12:37:59 +03:00
  • cb2c688648 Update doc for MUSA Xiaodong Ye 2024-07-12 12:49:39 +08:00
  • f53226245f llama : suppress unary minus operator warning (#8448) b3382 Daniel Bevenius 2024-07-12 11:05:21 +02:00
  • 9eec4c1eb3 server : handle content array in chat API Georgi Gerganov 2024-07-12 12:02:44 +03:00
  • 8bf778435c metal : template-ify some of the kernels Georgi Gerganov 2024-07-11 13:43:42 +03:00
  • c3ebcfa148 server : ensure batches are either all embed or all completion (#8420) b3381 Douglas Hanley 2024-07-12 03:14:12 -05:00
  • 8a4441ea1a docker : fix filename for convert-hf-to-gguf.py in tools.sh (#8441) Armen Kaleshian 2024-07-12 04:08:19 -04:00
  • 5aefbce27a convert : remove fsep token from GPTRefactForCausalLM (#8237) Jiří Podivín 2024-07-12 10:06:33 +02:00
  • 1d1da06e55 Removing fsep token from GPTRefactForCausalLM Jiri Podivin 2024-07-01 14:45:44 +02:00
  • 71c1121d11 examples : sprintf -> snprintf (#8434) b3378 Georgi Gerganov 2024-07-12 10:46:14 +03:00
  • 370b1f7e7a ggml : minor naming changes (#8433) Georgi Gerganov 2024-07-12 10:46:02 +03:00
  • fee7936705 Merge branch 'ggerganov:master' into master Daniel Han 2024-07-11 22:21:03 -07:00
  • 7c63eb09d2 Merge e6a5a6c6ac into b549a1bbef Paulo de Castro 2024-07-12 00:49:13 -04:00
  • 9d96328bdf convert_lora : MoE LoRA conversion support Francis Couture-Harpin 2024-07-09 18:26:38 -04:00
  • b549a1bbef [SYCL] fix the mul_mat_id ut issues (#8427) b3376 Chen Xi 2024-07-12 00:52:04 +00:00
  • f5166cbf4c fix bug zhhan 2024-07-11 14:35:49 -07:00
  • 9b8e05b148 Merge commit 'f4444d99' into tokenizer-fixes jaime-m-p 2024-07-11 22:07:46 +02:00
  • 58ccca0a22 Fix filename for convert-hf-to-gguf.py Armen Kaleshian 2024-07-11 15:38:12 -04:00
  • c4956e4a05 update test: fix special and added token lists jaime-m-p 2024-07-11 19:50:48 +02:00
  • 368645698a ggml : add NVPL BLAS support (#8329) (#8425) b3375 Nicholai Tukanov 2024-07-11 11:49:15 -05:00
  • b078c619aa cuda : suppress 'noreturn' warn in no_device_code (#8414) b3374 Daniel Bevenius 2024-07-11 17:53:42 +02:00
  • 808aba3916 CUDA: optimize and refactor MMQ (#8416) b3373 Johannes Gäßler 2024-07-11 16:47:47 +02:00
  • 301afaa6df squash! cuda : suppress 'noreturn' warn in no_device_code Daniel Bevenius 2024-07-11 16:26:32 +02:00
  • 7c011abc33 ggml : replace <BLASLIB>_ENABLE_CBLAS with GGML_BLAS_USE_<BLASLIB> ntukanov 2024-07-11 07:07:09 -07:00
  • 3c80cddb85 explicit q8_1 memory layouts, add documentation Johannes Gäßler 2024-07-11 15:26:30 +02:00
  • 25167e0ea7 examples : use sizeof() instead of hardcoded constants Georgi Gerganov 2024-07-11 15:45:02 +03:00
  • e7416df4bb Update convert_hf_to_gguf_update.py Iaroslav Chelombitko 2024-07-11 13:07:49 +03:00
  • a48e6ac06f Update convert_hf_to_gguf.py Iaroslav Chelombitko 2024-07-11 13:07:06 +03:00
  • 22ea911d63 examples : sprintf -> snprintf Georgi Gerganov 2024-07-11 13:01:36 +03:00
  • f07f1b8afa ggml : revert FA K/Q names Georgi Gerganov 2024-07-11 12:57:48 +03:00
  • 2a3cd5de2c ggml : use PRId64 [no ci] Georgi Gerganov 2024-07-11 12:54:50 +03:00
  • 8229ee52e7 ggml : minor naming changes Georgi Gerganov 2024-07-11 12:04:34 +03:00
  • a977c11544 gitignore : deprecated binaries Georgi Gerganov 2024-07-11 11:20:40 +03:00
  • 9a55ffe6fb tokenize : add --no-parse-special option (#8423) b3371 compilade 2024-07-11 03:41:48 -04:00
  • 7a221b672e llama : use F32 precision in Qwen2 attention and no FA (#8412) b3370 Georgi Gerganov 2024-07-11 10:21:30 +03:00
  • 3a2e615f68 9B - query_pre_attn_scalar = 256 not 224 Daniel Han 2024-07-10 22:02:21 -07:00
  • e6a5a6c6ac server : avoid breaking KV cache when prompt >= n_ctx (#6855) Paulo de Castro 2024-04-28 00:34:07 -03:00
  • be3aa9631f use template function directly hongruichen 2024-07-11 00:09:56 +08:00
  • c3747056f2 skip the bfloat 16 sycl ut Chen Xi 2024-07-11 02:43:15 +00:00
  • 90c8470568 fix part of mul_mat_id Meng, Hengyu 2024-06-21 03:38:00 +00:00
  • 371cb8df86 non-embedding batch for sampled tokens; fix unused params warning Douglas Hanley 2024-07-10 21:07:06 -05:00
  • 278d0e1846 Initialize default slot sampling parameters from the global context. (#8418) b3369 Clint Herron 2024-07-10 20:08:17 -04:00
  • 0b6de3a135 ggml : add NVPL BLAS support ntukanov 2024-07-10 15:59:24 -07:00
  • 916e95928b llm_build_lora_mm_id ngxson 2024-07-11 00:30:07 +02:00
  • ba06b2deb7 tokenize : add --no-parse-special option compilade/tokenize-example-parse-special Francis Couture-Harpin 2024-07-10 17:59:19 -04:00
  • 1de1d07e4e fix file naming pattern sequence zhhan 2024-07-10 14:49:11 -07:00
  • 1caa20fc7a convert_hf : reduce usages of UNKNOWN for InternLM2 Francis Couture-Harpin 2024-07-10 17:33:04 -04:00
  • 9501f66e8c Make glslc required Vulkan component Mason M 2024-07-10 17:18:02 -03:00
  • afa6119850 Merge branch 'master' into compilade/fix-mpt-pretok Francis Couture-Harpin 2024-07-10 15:32:04 -04:00
  • 9c14c0b036 Initialize default slot sampling parameters from the global context. HanClinto 2024-07-10 15:24:07 -04:00
  • f4b8df4996 CUDA: optimize and refactor MMQ Johannes Gäßler 2024-07-07 14:52:38 +02:00
  • bb1bd5e2b7 cuda : suppress 'noreturn' warn in no_device_code Daniel Bevenius 2024-07-10 20:26:57 +02:00
  • c09c574d13 update multiple adaptation readme zhhan 2024-07-10 11:19:48 -07:00