Commit graph

  • 0478174d59 [SYCL] Updated SYCL device filtering (#8901) b3540 Ouadie EL FAROUKI 2024-08-07 11:25:36 +01:00
  • 4c6a7bb0b0 Add GGML_VULKAN_PERF option to output performance data per operator 0cc4m 2024-08-06 21:34:04 +02:00
  • f78487b86a Use Vulkan GLSL fused multiply-add instruction where possible 0cc4m 2024-08-06 21:20:40 +02:00
  • 0645ed5c97 Optimize Vulkan REPEAT performance 0cc4m 2024-08-05 09:19:01 +02:00
  • a8dbc6f753 CUDA/HIP: fix tests/test-backend-ops (#8896) b3539 Johannes Gäßler 2024-08-07 09:07:52 +02:00
  • 0b83303465 Small related update to example/sycl Readme OuadiElfarouki 2024-08-07 06:21:22 +01:00
  • b13ed28fbf Updated device filter to depend on default_selector (fixes non-intel device issues) OuadiElfarouki 2024-08-06 09:14:03 +01:00
  • 86d3155377 make : use C compiler to build metal embed object slaren 2024-08-07 03:31:22 +02:00
  • 506122d854 llama-bench : add support for getting cpu info on Windows (#8824) b3538 Zhenwei Jin 2024-08-07 09:01:06 +08:00
  • 205c4b78f7 refactor slaren 2024-08-07 01:51:01 +02:00
  • 725e3d9437 quantize : update usage comment in quantize.cpp (#8889) b3537 Daniel Bevenius 2024-08-07 01:43:00 +02:00
  • 31958546c3 typo correction (#8891) b3536 Nexes the Old 2024-08-07 01:41:54 +02:00
  • 96bcc9e94b cuda : more reliable async copy, fix stream used when the devices are the same slaren 2024-08-07 00:48:20 +02:00
  • de7cf9d376 CUDA/HIP: fix tests/test-backend-ops Johannes Gäßler 2024-08-06 21:05:09 +02:00
  • a5eae7a43f ggml-backend : fix async copy from CPU slaren 2024-08-07 00:17:52 +02:00
  • cad8abb49b add tool to allow plotting tensor allocation maps within buffers sl/dump-allocs slaren 2024-08-06 22:09:51 +02:00
  • a3aac23df1 server: when result doesn't fit in max_tokens, finished_reason should be length Bjarke Viksøe 2024-08-06 21:26:27 +02:00
  • 5f1fceaba3 win32 default thread implementation savesanketsw 2024-08-06 10:26:28 -07:00
  • 5ec4de7a48 Merge branch 'master' into prepare-PR-of-minicpm-v2.5 tc-mb 2024-08-07 01:14:24 +08:00
  • 7c7f7f1b8a typo correction Nexes the Old 2024-08-06 19:12:50 +02:00
  • 52858d8ec6 quantize : update usage comment in quantize.cpp Daniel Bevenius 2024-08-06 18:32:25 +02:00
  • cfd5a113e1 llama : rename llama_reorder_outputs to llama_output_reorder Francis Couture-Harpin 2024-08-06 11:30:50 -04:00
  • 1e6f6554aa server : add lora hotswap endpoint (WIP) (#8857) b3535 Xuan Son Nguyen 2024-08-06 17:33:39 +02:00
  • 641f5dd2a6 CUDA: fix padding logic for FP16/FP32 (#8884) b3534 Johannes Gäßler 2024-08-06 17:13:55 +02:00
  • 5f4dcb1e60 simple : update name of executable to llama-simple (#8885) Daniel Bevenius 2024-08-06 16:44:35 +02:00
  • 32e7e0b2b7 Add support for getting cpu info on Windows for llama_bench zhenweijin 2024-08-02 17:11:15 +08:00
  • 5eb9b3d783 simple : update name of executable to llama-simple Daniel Bevenius 2024-08-06 16:19:51 +02:00
  • f01bc37c35 fix style Xuan Son Nguyen 2024-08-06 16:06:20 +02:00
  • 57baa45204 CUDA: fix padding logic for FP16/FP32 Johannes Gäßler 2024-08-06 15:29:58 +02:00
  • db20f50cf4 cmake : Link vulkan-shaders-gen with pthreads (#8835) b3532 Jaeden Amero 2024-08-06 17:21:47 +04:00
  • 98da78b228 add LoRA test Xuan Son Nguyen 2024-08-06 13:43:29 +02:00
  • efda90c93a [Vulkan] Fix compilation of vulkan-shaders-gen on w64devkit after e31a4f6 (#8880) b3531 MaggotHATE 2024-08-06 16:32:03 +05:00
  • f0a35519b0 avoid adding new function Jia Liu 2024-08-06 19:14:59 +08:00
  • 6ed2f795ae fix: crash on token not found at spm thxCode 2024-08-06 17:25:49 +08:00
  • 0b90345749 refactor: respect special token from metadata thxCode 2024-08-06 17:10:21 +08:00
  • bb55b19c04 refactor: let ubatch-size = batch-size if non-casual thxCode 2024-08-06 17:03:58 +08:00
  • a18fb2f5b0 Merge remote-tracking branch 'myfork/test-dry-sampler' into test-dry-sampler wwoodsTM 2024-08-06 03:00:47 -06:00
  • 6579e64f26 Attempt at slightly optimized vector of strings DRY implementation wwoodsTM 2024-08-06 02:54:57 -06:00
  • f04c6e27f9 del common.h in clip caitianchi 2024-08-06 16:54:16 +08:00
  • f33071db60 Merge pull request #19 from ggerganov/prepare-PR-of-minicpm-v2.5-gg tc-mb 2024-08-06 16:50:31 +08:00
  • 0bf16de07b contributing : add note about write access Georgi Gerganov 2024-08-06 11:48:01 +03:00
  • 6e299132e7 clip : style changes prepare-PR-of-minicpm-v2.5-gg Georgi Gerganov 2024-08-06 11:44:29 +03:00
  • e91c5780a6 fix build Xuan Son Nguyen 2024-08-06 10:37:46 +02:00
  • 2d5dd7bb3f ggml : add epsilon as a parameter for group_norm (#8818) b3529 Molly Sophia 2024-08-06 15:26:46 +08:00
  • cdd1889de6 convert : add support for XLMRoberta embedding models (#8658) b3528 Douglas Hanley 2024-08-06 02:20:54 -05:00
  • 6dab6bfd90 Guard it under #ifdef _WIN32 MaggotHATE 2024-08-06 11:15:43 +05:00
  • ed6b90906f Merge branch 'master' into dry-sampler l3utterfly 2024-08-06 14:43:49 +09:00
  • d1676a10f9 Merge pull request #29 from wwoodsTM/test-dry-sampler l3utterfly 2024-08-06 14:38:43 +09:00
  • 070c67a6f6 Merge branch 'ggerganov:master' into master MaggotHATE 2024-08-06 09:43:50 +05:00
  • 3d38312bc6 Fix compilation issue in vulkan-shaders-gen MaggotHATE 2024-08-06 09:43:25 +05:00
  • c21a896405 [CANN]: Fix ggml_backend_cann_buffer_get_tensor (#8871) b3527 Mengqing Cao 2024-08-06 12:42:42 +08:00
  • f5ca40bf95 fix backend cann set_tensor MengqingCao 2024-08-06 01:36:16 +00:00
  • d4ff847153 [SYCL] correct cmd name (#8877) Neo Zhang 2024-08-06 09:09:12 +08:00
  • 2ca313830e Fix compiler complaints jaime-m-p 2024-08-05 23:55:17 +02:00
  • 928aa66a92 py: remove license_content from metadata heuristics and add additional license file checks brian khuu 2024-08-06 07:26:09 +10:00
  • c58a332fcd clean up struct def Xuan Son Nguyen 2024-08-05 23:23:37 +02:00
  • 21cb13384c Merge branch 'master' into xsn/lora_server_hotswap Xuan Son Nguyen 2024-08-05 23:01:13 +02:00
  • aa3cea0fa9 updae docs Xuan Son Nguyen 2024-08-05 23:00:54 +02:00
  • 674f0faa74 Fix copy/paste wrong variable jaime-m-p 2024-08-05 21:43:32 +02:00
  • d558c736fd Binary constants are a C++14 feature jaime-m-p 2024-08-05 21:24:13 +02:00
  • 3b36703c8a Update bruteforce test: - Faster failing text range selection. - Show unique failing texts differences. - Add more recent models. jaime-m-p 2024-08-05 21:10:45 +02:00
  • fd6d9b9e6a Update bruteforce test: fix pyright complaints jaime-m-p 2024-08-05 20:58:15 +02:00
  • 735105edf9 Use GGML_ASSERT and GGML_ABORT jaime-m-p 2024-08-05 20:54:30 +02:00
  • 85c59df9ce minor: remove trailing whitespaces and extra semicolons jaime-m-p 2024-08-05 20:52:25 +02:00
  • 16dab13bde correct cmd name fix_cmd_name Neo Zhang 2024-08-06 00:15:33 +08:00
  • 0a4ce78681 common : Changed tuple to struct (TODO fix) (#8823) b3525 Liu Jia 2024-08-06 00:14:10 +08:00
  • 7f576a7b49 Merge branch 'ggerganov:master' into hk Henry Kroll III 2024-08-05 07:42:28 -08:00
  • 5ea980d92e Merge branch 'master' into dev-refactoring hongruichen 2024-08-05 22:56:52 +08:00
  • bc0f887e15 cann: fix buffer_num and runtime speed slowly error (#8865) b3524 wangshuai09 2024-08-05 21:10:37 +08:00
  • b42978e7e4 readme : add ramalama to the availables UI (#8811) Eric Curtin 2024-08-05 13:45:01 +01:00
  • b9dfc25ca3 ggml : fix overflows in elu function (#8866) b3522 Justine Tunney 2024-08-05 05:43:40 -07:00
  • d79e8e12ff py: Add LICENSE_CONTENT to metadata override brian khuu 2024-08-05 22:39:29 +10:00
  • 2340ff52d5 cann: fix ggml_backend_cann_buffer_get_tensor MengqingCao 2024-08-05 12:34:19 +00:00
  • 323bf80091 py: if LICENSE exist then include a copy of it brian khuu 2024-08-05 22:29:52 +10:00
  • 1ef14b3007 py: Add more authorship metadata from model card (#8810) Brian 2024-08-05 21:15:28 +10:00
  • 67c892699d Fix overflows in elu function Justine Tunney 2024-08-05 02:30:03 -07:00
  • 6b8db06292 cann: fix buffer_num and runtime speed slowly error wangshuai09 2024-08-05 07:24:10 +00:00
  • 3c8d1abcff delete the extra whitespace Jia Liu 2024-08-05 16:41:02 +08:00
  • d3f0c7166a Stop the generation when <|eom_id|> token is encountered - needed for Llama 3.1 tool call support (#8858) b3520 fairydreaming 2024-08-05 09:38:01 +02:00
  • ea3aebbc3d Merge branch 'ggerganov:master' into hk Henry Kroll III 2024-08-04 23:27:41 -08:00
  • 1d1e6e8ee4 fix build bug on msys2-clang64 and ucrt64 Jia Liu 2024-08-05 15:17:34 +08:00
  • 20dc562f45 Delete pr-6839.diff wwoodsTM 2024-08-05 00:41:26 -06:00
  • f72945fc0d cann: add doc and docker image wangshuai09 2024-07-27 06:34:10 +00:00
  • e31a4f6797 cmake: fix paths for vulkan shaders compilation on Windows (#8573) b3519 stduhpf 2024-08-05 08:18:27 +02:00
  • 9105cf435b Add DRY sampling parameters to gpt_params and server_context wwoodsTM 2024-08-05 00:03:38 -06:00
  • 400ae6f65f readme : update model list (#8851) b3518 BarfingLemurs 2024-08-05 01:54:10 -04:00
  • f1ea5146d7 llama : better replace_all (#8852) b3517 Georgi Gerganov 2024-08-05 08:53:39 +03:00
  • 064cdc265f vulkan : fix Qantized Mat-Vec Mul on AMD GPUs for ncols < 64 (#8855) b3516 0cc4m 2024-08-05 07:52:55 +02:00
  • 5587e57a76 sync : ggml b3515 Georgi Gerganov 2024-08-04 19:13:25 +03:00
  • a3738b2fa7 vulkan : implement Stable Diffusion operators (ggml/904) 0cc4m 2024-08-04 17:28:08 +02:00
  • 655858ace0 ggml : move c parameter comment to ggml_rope_ext (ggml/901) Daniel Bevenius 2024-07-29 15:06:06 +02:00
  • c02b0a8a4d cann: support q4_0 model (#8822) b3512 wangshuai09 2024-08-05 12:22:30 +08:00
  • a15ceea15c delete llama_init_default_params() Jia Liu 2024-08-05 11:34:44 +08:00
  • 514678a249 cann: support q4_0 model wangshuai09 2024-08-02 08:24:53 +00:00
  • 8bd3749c26 Merge commit '978ba3d8' into tokenizer-codepoint-categs jaime-m-p 2024-08-05 01:13:22 +02:00
  • 5679a3bdbb Merge branch 'master' into compilade/batch-splits Francis Couture-Harpin 2024-08-04 17:24:14 -04:00
  • 952ed35ba8 llama : minor cosmetic changes Francis Couture-Harpin 2024-08-04 17:23:44 -04:00
  • aeac342132 Add more comments jaime-m-p 2024-08-04 23:22:56 +02:00
  • 5efd826426 llama : whitespace formatting Stanisław Szymczyk 2024-08-04 21:14:43 +02:00
  • 0b7211387e llama : Use token_to_id map find() method instead of iterating over all tokens. Stanisław Szymczyk 2024-08-04 20:47:47 +02:00