Commit graph

  • 383065ade6 feat(llama.cpp): Add config parsing for Granite multiplier params Gabe Goodhart 2024-09-05 12:06:34 -06:00
  • 406833d779 feat(convert_hf_to_gguf): Add registration and param setup for Granite Gabe Goodhart 2024-09-04 12:16:56 -06:00
  • 5ebc5ef572 feat(gguf-py): Add Granite model and params to gguf-py Gabe Goodhart 2024-09-04 12:16:21 -06:00
  • 57064fbaee ggml : move common CPU backend impl to new header slaren 2024-09-16 15:17:55 +02:00
  • 736e0e6a28 llama.cpp: Add a missing header for cpp23 Yuri Khrustalev 2024-09-16 08:47:49 -04:00
  • acb2c32c33 llama : rename n_embed to n_embd in rwkv6_time_mix (#9504) b3771 Daniel Bevenius 2024-09-16 13:07:13 +02:00
  • a6a3a5c531 ggml : link MATH_LIBRARY not by its full path (#9339) b3770 Michael Podvitskiy 2024-09-16 13:06:50 +02:00
  • 603a3f8fb7 ggml: link MATH_LIBRARY not by its full path Michael Podvitskiy 2024-09-06 22:01:42 +02:00
  • d54c21df7e convert : identify missing model files (#9397) b3769 compilade 2024-09-16 03:30:22 -04:00
  • 19514d632e cmake : do not hide GGML options + rename option (#9465) Georgi Gerganov 2024-09-16 10:27:50 +03:00
  • 5c3d0f1824 ggml : IQ4_NL sgemm + Q4_0 AVX optimization (#9422) b3767 Eve 2024-09-16 06:48:24 +00:00
  • 0aadac10c7 llama : support OLMoE (#9462) b3766 Shane A 2024-09-15 23:47:37 -07:00
  • 0bfa0dfa2e llama : rename n_embed to n_embd in rwkv6_time_mix Daniel Bevenius 2024-09-16 08:45:33 +02:00
  • 95ca85168b llama : support MiniCPM3 (#9322) b3765 CarryFun 2024-09-16 14:45:20 +08:00
  • 441b72b91f main : option to disable context shift (#9484) b3764 Vinesh Janarthanan 2024-09-16 01:20:01 -05:00
  • cc1c017191 naming : normalize the name of callback-related identifiers gg/cb-naming Georgi Gerganov 2024-09-16 09:11:42 +03:00
  • c4965a64f7 metal : handle zero-sized allocs (#9466) b3763 Georgi Gerganov 2024-09-16 09:05:56 +03:00
  • f80e679696 build : rename flag GGML_CUDA_USE_GRAPHS -> GGML_CUDA_GRAPHS Georgi Gerganov 2024-09-16 09:00:43 +03:00
  • 2ac8a91fbe cmake : do not hide GGML options Georgi Gerganov 2024-09-13 10:08:55 +03:00
  • 169e8a3875 white space VJHack 2024-09-15 21:28:16 -05:00
  • 2736688af4 removed server changes VJHack 2024-09-15 21:26:46 -05:00
  • 90a2fff0e7 flake.lock: Update (#9488) Georgi Gerganov 2024-09-16 05:14:23 +03:00
  • f5a23928c7 added server example to --no-context-shift args VJHack 2024-09-15 20:57:57 -05:00
  • 2d887f0975 Merge branch 'server-disable-context-shift' of https://github.com/VJHack/llama.cpp into server-disable-context-shift VJHack 2024-09-15 20:35:38 -05:00
  • c73756ab24 resolve merge conflicts VJHack 2024-09-15 20:35:34 -05:00
  • 63f0fa572d Update common/arg.cpp Vinesh Janarthanan 2024-09-15 20:35:01 -05:00
  • 6262d13e0b common : reimplement logging (#9418) b3761 Georgi Gerganov 2024-09-15 20:46:12 +03:00
  • e6deac31f7 gguf-split : add basic checks (#9499) b3760 slaren 2024-09-15 19:02:27 +02:00
  • 6988da94a2 cmake : correct order of sycl flags (#9497) b3759 Michael Podvitskiy 2024-09-15 18:55:52 +02:00
  • 721e2b1d8b Merge 82755ed08a into 3c7989fd29 Ma Mingfei 2024-09-15 18:45:03 +02:00
  • e410850051 Merge 066996d2eb into 3c7989fd29 Ifeanyi 2024-09-15 10:42:55 -06:00
  • 78f3caa88e gguf-split : error when too many arguments are passed slaren 2024-09-15 18:27:55 +02:00
  • d3922ac9e8 gguf-split : do not overwrite existing files when merging slaren 2024-09-15 18:24:43 +02:00
  • 73ef3f769c Update llama-server-intel.Dockerfile sycl-cmake-append Meng, Hengyu 2024-09-15 23:21:46 +08:00
  • 4e6035af97 sycl flag should come before the other flags Michael Podvitskiy 2024-09-15 15:47:47 +02:00
  • 3956cf92a9 Update llama-cli-intel.Dockerfile Meng, Hengyu 2024-09-15 23:21:21 +08:00
  • af95b1424f [SYCL] fix cmake broken Meng, Hengyu 2024-09-15 22:57:56 +08:00
  • cf77a846c6 allow disabling context shift in the server VJHack 2024-09-15 09:12:24 -05:00
  • 252f3a88ac added null check for llava decode l3utterfly 2024-09-15 21:45:48 +09:00
  • 3c7989fd29 py : add "LLaMAForCausalLM" conversion support (#9485) b3758 Csaba Kecskemeti 2024-09-15 00:48:25 -07:00
  • d6b37c881f readme : update tools list (#9475) b3757 OSecret 2024-09-15 10:36:53 +03:00
  • 7596487beb cmake : try to fix sycl+intel build (#9487) b3756 Michael Podvitskiy 2024-09-15 09:06:38 +02:00
  • e83d2db931 flake.lock: Update github-actions[bot] 2024-09-15 00:22:31 +00:00
  • 70ca91e5c1 Update README.md OSecret 2024-09-14 23:43:39 +03:00
  • e57f508ac5 sycl+intel build fix Michael Podvitskiy 2024-09-14 21:52:37 +02:00
  • aaf7f53d46 nvidia uses the LLaMAForCausalLM string in their config.json, example nvidia/Llama3-ChatQA-2-8B Csaba Kecskemeti 2024-09-14 10:48:09 -07:00
  • e244300df5 white space VJHack 2024-09-14 11:37:41 -05:00
  • 0680710b06 updated README.md for main VJHack 2024-09-14 11:30:10 -05:00
  • c52b922d98 reverted precommit VJHack 2024-09-14 11:16:54 -05:00
  • 173d4bb336 added cli arg to disable context shift VJHack 2024-09-14 11:15:51 -05:00
  • aa9e72158b Update clip.cpp Tejaakshaykumar 2024-09-14 18:54:26 +05:30
  • 822b6322de ggml : ggml_type_name return "NONE" for invalid values (#9458) b3755 Yuri Khrustalev 2024-09-14 05:54:37 -04:00
  • dcdcee3a74 server: add data: [DONE] to /chat/completions stream response (#9459) b3754 VoidIsVoid 2024-09-14 17:36:44 +08:00
  • 1f4111e540 cmake : use list(APPEND ...) instead of set() + dedup linker (#9463) b3753 Georgi Gerganov 2024-09-14 10:55:05 +03:00
  • befaf1197f llama : make cell_id const in inp_s_mask block (#9470) b3752 Daniel Bevenius 2024-09-14 09:50:12 +02:00
  • f83e9c9737 Support MiniCPM3. 范睿凯 2024-09-05 17:48:40 +08:00
  • 8241151f16 set context default to avoid memory issue, update guide arthw 2024-09-14 09:01:05 +08:00
  • 8f358c4c94 server: add data: [DONE] to /chat/completions stream response VoidIsVoid 2024-09-13 11:09:41 +08:00
  • da31f52722 Added link to proprietary wrapper for Unity3d into README.md OSecret 2024-09-14 00:27:48 +03:00
  • 40638f7136 log : cleanup, comments, build flags Georgi Gerganov 2024-09-13 21:55:11 +03:00
  • fb8f142554 one more CMAKE_CXX_FLAGS fix (#9471) gg/cmake-dedup-link Michael Podvitskiy 2024-09-13 15:13:07 +02:00
  • aee23b5462 one more CMAKE_CXX_FLAGS fix Michael Podvitskiy 2024-09-13 15:04:42 +02:00
  • feff4aa846 server : add loading html page while model is loading (#9468) b3751 Xuan Son Nguyen 2024-09-13 14:23:11 +02:00
  • 228df2bc11 cmake : fix sycl build (#9469) Michael Podvitskiy 2024-09-13 14:11:21 +02:00
  • 13226dc83e log : option to disable the log prefix Georgi Gerganov 2024-09-13 14:48:57 +03:00
  • 9dec071bea llama : make cell_id const in inp_s_mask block Daniel Bevenius 2024-09-13 13:48:43 +02:00
  • ff3b3809d8 server : fix verbose check Georgi Gerganov 2024-09-13 14:12:58 +03:00
  • 013b6502ba Merge branch 'gg/cmake-dedup-link' into sycl-build-fix Georgi Gerganov 2024-09-13 14:22:45 +03:00
  • b653b1e922 cmake : try to fix sycl 2 Georgi Gerganov 2024-09-13 14:05:00 +03:00
  • a7feae74e7 also support .html files Xuan Son Nguyen 2024-09-13 12:58:00 +02:00
  • 8aa5bb38af use CMAKE_CXX_FLAGS as a string variable Michael Podvitskiy 2024-09-13 12:32:23 +02:00
  • 9eceb1a005 try fix sycl build Michael Podvitskiy 2024-09-13 12:10:20 +02:00
  • 0d0dc11185 server : improve log format Georgi Gerganov 2024-09-13 12:42:35 +03:00
  • ae9475de40 cmake : try fix sycl Georgi Gerganov 2024-09-13 12:41:33 +03:00
  • 8f84210df8 log : add comments + adjust defaults Georgi Gerganov 2024-09-13 12:09:43 +03:00
  • 2afe0a0c7d examples : move gpt_init() after parsing the cli args Georgi Gerganov 2024-09-13 11:28:20 +03:00
  • 078be074a7 log : print if build is debug [no ci] Georgi Gerganov 2024-09-13 11:15:01 +03:00
  • 2948768e25 common : reimplement the logger Georgi Gerganov 2024-09-10 20:40:43 +03:00
  • 0792375c66 metal : handle zero-sized allocs Georgi Gerganov 2024-09-13 10:21:55 +03:00
  • 0abc6a2c25 llama : llama_perf + option to disable timings during decode (#9355) b3750 Georgi Gerganov 2024-09-13 09:53:38 +03:00
  • 19ecca1946 cmake : use list(APPEND ...) instead of set() + dedup linker Georgi Gerganov 2024-09-13 09:44:55 +03:00
  • 3a3c9ae8af Implement OLMoE architecture Shane A 2024-09-12 22:52:49 -07:00
  • 739ea75015 made loading message more descriptive VJHack 2024-09-12 23:14:29 -05:00
  • df9f16747f removed print statement VJHack 2024-09-12 23:04:53 -05:00
  • e51eb59861 revert changes to pre-commit VJHack 2024-09-12 22:27:34 -05:00
  • cd80fce5e8 eol fix VJHack 2024-09-12 22:16:45 -05:00
  • 69c97bbead Merge branch 'ggerganov:master' into master Vinesh Janarthanan 2024-09-12 22:14:53 -05:00
  • 42abdd0207 precommit corrections VJHack 2024-09-12 22:04:08 -05:00
  • b3b84732f9 Prevent crash on quantization executable Yuri Khrustalev 2024-09-12 22:46:34 -04:00
  • cb13382136 account for both api and web browser requests VJHack 2024-09-12 21:44:52 -05:00
  • 7c39f2d3ab ggml: rwkv_wkv op CUDA impl Molly Sophia 2024-09-06 16:33:46 +08:00
  • daf64fc4a9 revert test VJHack 2024-09-12 20:57:51 -05:00
  • bd35cb0ae3 feat: remove a sampler from a chain (#9445) b3749 Gilad S. 2024-09-13 04:54:49 +03:00
  • 8b7daaaef2 catch 503 before parsing json VJHack 2024-09-12 20:44:45 -05:00
  • 7da90fb350 fix: safer casting Gilad S. 2024-09-13 04:38:55 +03:00
  • 0b174abc3d ggml: CUDA unary op EXP Molly Sophia 2024-09-05 18:18:51 +08:00
  • 78203641fe server : Add option to return token pieces in /tokenize endpoint (#9108) b3748 Mathijs Henquet 2024-09-12 22:30:11 +02:00
  • 661a740d55 maybe this fix windows ci? Xuan Son Nguyen 2024-09-12 21:58:24 +02:00
  • 444b757bce perf : abort on invalid sampler pointer Georgi Gerganov 2024-09-12 15:08:48 +03:00
  • ad971140c3 Merge branch 'master' into feature/tokenize-with-pieces Xuan Son Nguyen 2024-09-12 13:49:52 +02:00