Commit graph

  • e9719576c4 ggml : also faster TQ1_0 Francis Couture-Harpin 2024-07-31 00:06:21 -04:00
  • 560873f337 ggml : even faster TQ2_0 Francis Couture-Harpin 2024-07-30 23:36:52 -04:00
  • 9b4294da9c Adding Gemma 2 2B configs pculliton 2024-07-30 21:19:09 -04:00
  • 04a0dcebcc fix missing file from copy Alex O'Connell 2024-07-30 20:48:50 -04:00
  • c6a7c8edef Only enable backtrace on GLIBC linux systems Alex O'Connell 2024-07-30 20:44:56 -04:00
  • 77b8f84ae7 ggml : add TQ1_0 and TQ2_0 ternary quantization types Francis Couture-Harpin 2024-07-30 17:55:54 -04:00
  • 268c566006 nix: cuda: rely on propagatedBuildInputs (#8772) Someone 2024-07-30 23:35:30 +03:00
  • 22cd988062 Removing unnecessary linker flags. HanClinto 2024-07-30 16:21:49 -04:00
  • 71606d5fbd Adding in CXXFLAGS and LDFLAGS HanClinto 2024-07-30 15:41:13 -04:00
  • b17c191b57 Reference the .o rather than rebuilding every time. HanClinto 2024-07-30 14:20:33 -04:00
  • 70d5c43ece Fix potential race condition as pointed out by @fairydreaming in #8776 HanClinto 2024-07-30 13:19:08 -04:00
  • c1d255e91b Fixed compilation error when using hipblas matteo serva 2024-07-16 19:39:17 +02:00
  • f258b9273b refactoring: Moved the unified memory code in the correct location. matteo serva 2024-07-16 18:45:23 +02:00
  • 82fadbd792 adding again the documentation about unified memory matteo serva 2024-07-06 09:39:41 +02:00
  • 5577cada21 Adding support for unified memory matteo serva 2024-06-20 13:51:07 +02:00
  • a463b79c79 server: update llama-server embedding flag documentation (#8763) Igor Okulist 2024-07-30 11:20:09 -05:00
  • 1e87ac55d6 Support phi-2 GGUF creation, add vocab. results. Marc 2024-07-30 17:52:49 +02:00
  • d5380f3af2 refactor device in sycl_device, restore ctx in create_queue arthw 2024-07-30 23:49:34 +08:00
  • cc50e78fbe llama-vocab, llama : handle <|eom_id|> Llama-3.1 token Stanisław Szymczyk 2024-07-30 16:57:47 +02:00
  • 7e72aa74fd py: add_array() will not add to kv store if value is an empty array (#8774) b3493 Brian 2024-07-31 00:57:03 +10:00
  • 7c27a19b2e added android implementation of ggml_print_backtrace_symbols (#8751) l3utterfly 2024-07-30 23:40:18 +09:00
  • e96d263407 Apply suggestions from code review Brian 2024-07-31 00:29:33 +10:00
  • 02665ba32e gguf_writer.py: add_array() should not add to kv store if empty brian khuu 2024-07-31 00:16:00 +10:00
  • 5a618f0138 nix: cuda: rely on propagatedBuildInputs Someone Serge 2024-07-07 00:44:35 +00:00
  • 140074bb86 flake.lock: Update (#8729) Georgi Gerganov 2024-07-30 15:58:57 +03:00
  • 079ac82124 Add support for cpu_get_num_phsical_cores() on Windows Jia Liu 2024-07-30 18:56:27 +08:00
  • 6e2b6000e5 cann: update cmake (#8765) b3490 wangshuai09 2024-07-30 18:37:35 +08:00
  • 3b681370e6 cann: update cmake wangshuai09 2024-07-30 07:25:45 +00:00
  • c887d8b017 [SYCL] Add TIMESTEP_EMBEDDING OP (#8707) b3489 zhentaoyu 2024-07-30 14:56:51 +08:00
  • dd3f4085c6 sycl: fix half_ceil in tsembd zhentaoyu 2024-07-30 05:47:42 +00:00
  • 2202990ce9 sycl: add timestep_embedding op zhentaoyu 2024-07-26 06:09:55 +00:00
  • 75af08c475 ggml: bugfix: fix the inactive elements is agnostic for risc-v vector (#8748) b3488 CarterLi999 2024-07-30 00:38:34 +08:00
  • 1dff278c77 Add fallback for max_tokens ardfork 2024-07-29 16:19:30 +00:00
  • eab4a88210 Using dp4a ptx intrinsics for an improved Mul8MAT perf [By Alcpz] codeplay/sycl-main OuadiElfarouki 2024-07-29 16:52:29 +01:00
  • 980f3c81c6 Don't ignore llama.cpp params ardfork 2024-07-29 15:09:36 +00:00
  • 9a5f802bb6 refactoring: add convient macro to disable copy and move of class hongruichen 2024-07-29 22:18:48 +08:00
  • a272a7425d Update ggml/src/ggml.c l3utterfly 2024-07-29 22:43:39 +09:00
  • 687f8fdecb Update ggml/src/ggml.c l3utterfly 2024-07-29 22:43:34 +09:00
  • 45b6b799b6 Update ggml/src/ggml.c l3utterfly 2024-07-29 22:43:29 +09:00
  • 8a3eceba0c Update ggml/src/ggml.c l3utterfly 2024-07-29 22:43:24 +09:00
  • e089459355 Update ggml/src/ggml.c l3utterfly 2024-07-29 22:43:17 +09:00
  • 439b3fc75a cuda : organize vendor-specific headers into vendors directory (#8746) b3487 R0CKSTAR 2024-07-29 20:56:12 +08:00
  • 0bb7932d8d added android implementation of ggml_print_backtrace_symbols l3utterfly 2024-07-29 21:02:39 +09:00
  • e862defaa9 use int32_t for dry_penalty_last_n due to negative value needed as config l3utterfly 2024-07-29 20:53:42 +09:00
  • 236da599d4 fixed int/size_t comparison l3utterfly 2024-07-29 20:25:56 +09:00
  • 0229fc8255 added final new line for editor config check l3utterfly 2024-07-29 20:12:46 +09:00
  • 12bfa7820c added llama_sample_dry_impl in header l3utterfly 2024-07-29 19:44:23 +09:00
  • 802ddd78bf added sample_dry_impl l3utterfly 2024-07-29 19:41:47 +09:00
  • 2f9a36a4f9 Merge branch 'master' into dry-sampler l3utterfly 2024-07-29 19:41:33 +09:00
  • 6da82947df refactoring: set the default qnn lib search path at CMakeLists.txt by GGML_QNN_DEFAULT_LIB_SEARCH_PATH hongruichen 2024-07-29 15:51:54 +08:00
  • 9aed263d7a refactor: Organize vendor-specific headers into vendors directory Xiaodong Ye 2024-07-29 14:11:52 +08:00
  • 3f9698c9d4 ggml: bugfix: fix the inactive elements is agnostic for risc-v vector carter.li 2024-07-29 13:54:26 +08:00
  • 0832de7236 [SYCL] add conv support (#8688) b3486 Meng, Hengyu 2024-07-29 10:50:27 +08:00
  • 5ecbeb5842 Merge branch 'master' into dev-refactoring hongruichen 2024-07-29 10:26:39 +08:00
  • 6b45680f21 Swift: Add cxx settings kingbri 2024-07-28 21:34:49 -04:00
  • 79a278e922 Merge branch 'master' into compilade/bitnet-ternary Francis Couture-Harpin 2024-07-28 21:27:33 -04:00
  • aac0c69dda Swift: Fix Windows build kingbri 2024-07-28 21:20:12 -04:00
  • dd3e62a703 ggml : add some informative comments in q1_3 vec_dot Francis Couture-Harpin 2024-07-28 21:17:16 -04:00
  • 964ee4b2ca Merge branch 'ggerganov:master' into gguf-model-template Austin 2024-07-28 18:57:52 -04:00
  • 6eeaeba126 cmake: use 1 more thread for non-ggml in CI (#8740) b3485 Johannes Gäßler 2024-07-28 22:32:44 +02:00
  • c64368a7d0 cmake: use 1 more thread for non-ggml in CI Johannes Gäßler 2024-07-28 20:01:44 +02:00
  • 1f9d2a7e22 refactoring: improve tensor print hongruichen 2024-07-28 22:05:51 +08:00
  • 4730faca61 chore : Fix vulkan related compiler warnings, add help text, improve CLI options (#8477) b3484 Austin 2024-07-28 03:52:42 -04:00
  • e2bd03092c Merge branch 'ggerganov:master' into fix-vulkan-shader-warnings Austin 2024-07-28 02:53:56 -04:00
  • 19aa1321be chore : Remove void and apply C++ style empty parameters teleprint-me 2024-07-28 02:50:32 -04:00
  • 963590ba13 Merge branch 'fix-vulkan-shader-warnings' of github.com:teleprint-me/llama.cpp into fix-vulkan-shader-warnings teleprint-me 2024-07-28 02:48:37 -04:00
  • a9984327d2 chore : Remove void and apply C++ style empty parameters teleprint-me 2024-07-28 02:48:33 -04:00
  • 704a303323 llama : fix Mamba session save and restore Francis Couture-Harpin 2024-07-28 01:59:10 -04:00
  • 0dea4263aa Merge branch 'master' into compilade/batch-splits Francis Couture-Harpin 2024-07-28 01:20:13 -04:00
  • 4c676c85e5 llama : refactor session file management (#8699) b3483 compilade 2024-07-28 00:42:05 -04:00
  • b8a415ac8d Update ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp Austin 2024-07-27 21:45:06 -04:00
  • aa694d491f flake.lock: Update github-actions[bot] 2024-07-28 00:20:33 +00:00
  • e54c35e4fb feat: Support Moore Threads GPU (#8383) b3482 R0CKSTAR 2024-07-28 07:41:25 +08:00
  • 40d169874d Update convert_hf_to_gguf.py Robert Sinclair 2024-07-28 02:03:40 +03:00
  • 9758c5a92b Update convert_hf_to_gguf.py Robert Sinclair 2024-07-28 02:03:28 +03:00
  • 9cddd9aeec llama : cast seq_id in comparison with unsigned n_seq_max compilade/refactor-session-files Francis Couture-Harpin 2024-07-27 15:50:23 -04:00
  • ffd5117def llama : more graceful error handling of invalid session files Francis Couture-Harpin 2024-07-27 14:31:57 -04:00
  • a59cae21c4 Added error handling for malloc and strdup norgera 2024-07-27 12:30:07 -04:00
  • 6db4f52d1c convert-*.py: hash pytorch array as numpy without type conversion (except for bf16 which is typecasted upward) brian khuu 2024-07-28 02:29:37 +10:00
  • e5ca7e9507 Merge branch 'ggerganov:master' into master MONONOKE 2024-07-27 23:43:11 +08:00
  • 5e2727fe03 scripts : sync vulkan-shaders (#0) b3481 Georgi Gerganov 2024-07-27 18:08:31 +03:00
  • 56f20aa25d scripts : sync ggml-aarch64 sources Georgi Gerganov 2024-07-27 17:19:35 +03:00
  • 345c8c0c87 ggml : add missing semicolon (#0) b3479 Georgi Gerganov 2024-07-27 15:57:09 +03:00
  • ae7985cd7b sync : ggml Georgi Gerganov 2024-07-27 15:53:48 +03:00
  • a05ca93697 ggml : loop tiling optimizations for scalar path (ggml/898) Mahesh Madhav 2024-07-25 00:54:08 -07:00
  • 9f77d899b7 ggml: add support for float16 input tensors in pooling operations (ggml/895) Ivan Filipov 2024-07-22 14:32:02 +03:00
  • 203b7f1531 vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (ggml/893) Tony Wasserka 2024-07-20 20:49:44 +02:00
  • d2b851bfa1 cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (ggml/885) Borislav Stanimirov 2024-07-12 17:24:20 +03:00
  • c12b6e8ee7 ggml : remove unnecessary UNUSED macro call (ggml/880) Daniel Bevenius 2024-07-08 12:03:42 +02:00
  • e66117076c llama : add support for llama 3.1 rope scaling factors (#8676) Jeffrey Morgan 2024-07-27 05:03:45 -07:00
  • 67501cf059 llama : add function for model-based max number of graph nodes (#8622) Georgi Gerganov 2024-07-27 14:59:29 +03:00
  • 1c3e99a199 common : add --no-warmup option for main/llama-cli (#8712) Daniel Bevenius 2024-07-27 12:45:02 +02:00
  • e2b8cf83eb cann: Fix Multi-NPU execution error (#8710) wangshuai09 2024-07-27 16:36:44 +08:00
  • e702f2ff11 ggml : reduce hash table reset cost (#8698) slaren 2024-07-27 04:41:55 +02:00
  • a1cf044dd1 llama : fix order of parameters (#8706) Judd 2024-07-26 16:38:12 +08:00
  • 3395a68a2d server : add Speech Recognition & Synthesis to UI (#8679) Yaiko 2024-07-25 18:10:16 -04:00
  • fbc71e9312 examples : export-lora : fix issue with quantized base models (#8687) Xuan Son Nguyen 2024-07-25 23:49:39 +02:00
  • df106e9211 ggml: handle ggml_init failure to fix NULL pointer deref (#8692) DavidKorczynski 2024-07-25 22:23:05 +01:00
  • 1353a813cc llama : fix build + fix fabs compile warnings (#8683) Georgi Gerganov 2024-07-25 19:57:31 +03:00
  • 21905dd445 ggml : fix build on Windows with Snapdragon X (#8531) Andreas (Andi) Kunar 2024-07-25 18:01:00 +02:00