Commit graph

  • 1f3512de68
    ppl : add --chunks argument to limit max number of chunks Georgi Gerganov 2023-07-18 13:48:00 +03:00
  • da4a773cbc
    ci : add README.md Georgi Gerganov 2023-07-18 13:30:26 +03:00
  • 9e8392a0c0
    ci : add short perplexity tests Georgi Gerganov 2023-07-18 12:55:21 +03:00
  • fd90d52127
    API: Replace modelbusy bool with a lock. Ycros 2023-07-18 20:09:50 +10:00
  • 3d90f9f166
    ci : add K-quants Georgi Gerganov 2023-07-18 11:47:45 +03:00
  • a404142aec
    tests : try to fix tail free sampling test Georgi Gerganov 2023-07-17 17:35:01 +03:00
  • d7d1828613
    ci : add open llama 3B-v2 tg tests for q4 and q5 quantizations Georgi Gerganov 2023-07-17 17:13:11 +03:00
  • 5fd6650af4
    ci : disable wget progress output Georgi Gerganov 2023-07-17 17:03:41 +03:00
  • 68d4dd301d
    ci : add open llama 3B-v2 tests Georgi Gerganov 2023-07-17 16:46:56 +03:00
  • d2c3214a1a
    ci : run ctest Georgi Gerganov 2023-07-17 16:25:50 +03:00
  • 6cbf9dfb32
    llama : shorten quantization descriptions master-6cbf9df Georgi Gerganov 2023-07-18 11:50:49 +03:00
  • 64b8aafce1
    support bpe tokenizer in convert, fix ldwang 2023-07-18 11:18:12 +08:00
  • 3db70b5f0a
    Merge 'origin/master' into hipblas Henri Vasserman 2023-07-18 01:54:17 +03:00
  • 8d351b8bd8
    Merge upstream changes, fix conflict 0cc4m 2023-07-17 22:14:22 +02:00
  • 7568d1a2b2
    Support dup & cont ops on CUDA (#2242) master-7568d1a Jiahao Li 2023-07-18 01:39:29 +08:00
  • f0a8ba0414
    Add an api to get max devices at runtime. Yaohui Liu 2023-07-17 22:59:54 +08:00
  • 1102ff56db
    fix double-free with --no-mmap slaren 2023-07-17 12:00:17 +02:00
  • 4e94af3060
    improve layer backend printing with ranges slaren 2023-07-17 11:53:01 +02:00
  • c2beeb8e3a
    only allocate as much memory as is required in each backend for the model slaren 2023-07-17 11:18:19 +02:00
  • 4088df14ca
    metal: update rms_norm kernel lshzh-ww 2023-07-16 22:28:59 -04:00
  • 2c9385289e
    More changes Howard Su 2023-07-17 09:47:31 +08:00
  • b7647436cc
    llama : fix t_start_sample_us initialization warning (#2238) master-b764743 Alex Klinkhamer 2023-07-16 14:01:45 -07:00
  • 1ea010a5c3
    llama : fix t_start_sample_us initialization warning grencez 2023-07-16 13:57:17 -07:00
  • 672dda10e4
    ggml : fixed runtime bugs and compile errors related to GGML_PERF and GGML_DEBUG (#2219) master-672dda1 Qingyou Meng 2023-07-17 03:57:28 +08:00
  • 27ab66e437
    py : turn verify-checksum-models.py into executable (#2245) Jiří Podivín 2023-07-16 21:54:47 +02:00
  • 36ffa16130
    Turning verify-checksum-models.py into executable Jiri Podivin 2023-07-16 21:40:53 +02:00
  • 6035abe170
    Setting new target for test binaries Jiri Podivin 2023-07-16 17:46:37 +02:00
  • 9c72e7e916
    rebase to master (except ggml-cuda) slaren 2023-07-16 14:36:32 +02:00
  • 33ab185dd1
    fix NVCC version on Makefile, __halves2half2 -> make_half2 slaren 2023-07-16 00:20:43 +02:00
  • 24cc6f008f
    minor fixes slaren 2023-07-15 19:04:37 +02:00
  • 5765d7a587
    restore simple.cpp for now slaren 2023-07-15 12:44:47 +02:00
  • 0d2b66c638
    ggml backend interface wip slaren 2023-07-10 17:32:06 +02:00
  • 929ae2017f
    Support dup & cont ops on CUDA lijiahao 2023-07-16 19:37:14 +08:00
  • 362da7b310
    cmake : fix server example building on MSYS2 Przemyslaw Pawelczyk 2023-07-16 10:52:24 +02:00
  • 931a8921de
    Fix F32 matmul 0cc4m 2023-07-16 07:55:53 +02:00
  • cdba17d262
    chore (ci): remove release from ci Hunter LaTourette 2023-07-15 22:12:53 -04:00
  • b477e17406
    docs (gh): remove ISSUE_TEMPLATE Hunter LaTourette 2023-07-15 22:12:35 -04:00
  • ff29017f48
    chore (*): remove pocs Hunter LaTourette 2023-07-15 21:41:56 -04:00
  • 656449e4ea
    make : fix embdinput library and server examples building on MSYS2 Przemyslaw Pawelczyk 2023-07-16 01:25:56 +02:00
  • 6bfbdf84ce
    fix NVCC version on Makefile, __halves2half2 -> make_half2 slaren 2023-07-16 00:20:43 +02:00
  • 88c88778ad
    Fix macro expansion on gcc grahameth 2023-07-15 23:02:53 +02:00
  • f58fa51fd0
    Increase matmul test runs for consistent results 0cc4m 2023-07-15 22:46:24 +02:00
  • 2dfe0aefc6
    add log_callback to llama_context_params for custom logging. grahameth 2023-07-15 20:48:36 +02:00
  • 22a4cb7f03
    Handle stage flags during command buffer submission properly 0cc4m 2023-07-15 22:00:47 +02:00
  • 39edee5136
    Add flag to make reverse prompt case insensitive Dewi Jones 2023-07-15 19:56:57 +00:00
  • 83595ecbd6
    minor fixes slaren 2023-07-15 19:04:37 +02:00
  • 5d03303bdc
    remove ifdef GGML_PERF; update fmt mqy 2023-07-15 17:36:39 +08:00
  • 09ab5c1718
    restore simple.cpp for now slaren 2023-07-15 12:44:47 +02:00
  • fea4e9d25e
    ggml backend interface wip slaren 2023-07-10 17:32:06 +02:00
  • 6e7cca4047
    llama : add custom RoPE (#2054) master-6e7cca4 Xiao-Yong Jin 2023-07-15 06:34:16 -04:00
  • 6024bccdb9
    ggml : fix asserts Georgi Gerganov 2023-07-15 11:28:37 +03:00
  • ad3d28ee0a
    Reuse semaphores 0cc4m 2023-07-15 09:18:54 +02:00
  • 0c4d841cf1
    Fix synchronization on AMD, add barriers for buffer ownership transfer, add debug flag and prints 0cc4m 2023-07-15 09:06:53 +02:00
  • d0b6c942fc
    style : minor fixes, mostly indentations Georgi Gerganov 2023-07-15 09:57:35 +03:00
  • bbce392890
    metal: use uint16_t instead of uint8_t. lshzh-ww 2023-07-15 02:15:02 -04:00
  • ee6bc1426e
    support bpe tokenizer in convert ldwang 2023-07-15 14:14:00 +08:00
  • d7aab2e900
    support bpe tokenizer in convert ldwang 2023-07-15 14:12:25 +08:00
  • a6803cab94
    flake : add runHook preInstall/postInstall to installPhase so hooks function (#2224) Dave Della Costa 2023-07-14 15:13:38 -04:00
  • 7dabc66f3c
    make : use pkg-config for OpenBLAS (#2222) master-7dabc66 wzy 2023-07-15 03:05:08 +08:00
  • 7cdd30bf1f
    cuda : allocate all temporary ggml_tensor_extra_gpu from a fixed-size buffer (#2220) master-7cdd30b Bach Le 2023-07-15 03:00:58 +08:00
  • e8035f141e
    ggml : fix static_assert with older compilers #2024 (#2218) master-e8035f1 Evan Miller 2023-07-14 14:55:56 -04:00
  • 7513b7b0a1
    llama : add functions that work directly on model (#2197) master-7513b7b Bach Le 2023-07-15 02:55:24 +08:00
  • de8342423d
    build.zig : install config header (#2216) Ali Chraghi 2023-07-14 11:50:58 -07:00
  • c48c525f87
    examples : fixed path typos in embd-input (#2214) Shangning Xu 2023-07-15 02:40:05 +08:00
  • 206e01de11
    cuda : support broadcast add & mul (#2192) master-206e01d Jiahao Li 2023-07-15 02:38:24 +08:00
  • 4fc401434e
    Merge branch 'master' into bcast-cuda Georgi Gerganov 2023-07-14 21:33:19 +03:00
  • 4abdcd5479
    adds runHook preInstall/postInstall to installPhase so hooks function Dave Della Costa 2023-07-14 13:59:44 -04:00
  • 4304bd3cde
    CUDA: mul_mat_vec_q kernels for k-quants (#2203) master-4304bd3 Johannes Gäßler 2023-07-14 19:44:08 +02:00
  • 229aab351c
    make : fix combination of LLAMA_METAL and LLAMA_MPI (#2208) master-229aab3 James Reynolds 2023-07-14 11:34:40 -06:00
  • b16785f713
    Merge 2777168618 into 697966680b Chad Brewbaker 2023-07-14 18:49:17 +02:00
  • 2a1add9374
    Fix #2221, use pkg-config Wu Zhenyu 2023-07-14 22:24:36 +08:00
  • 268d03b447
    Allocate all temporary ggml_tensor_extra_gpu from a fixed-size buffer Bach Le 2023-07-14 21:59:29 +08:00
  • 697966680b
    ggml : sync (ggml_conv_2d, fix mul_mat bug, CUDA GLM rope) master-6979666 Georgi Gerganov 2023-07-14 16:36:41 +03:00
  • 218ab9ef89
    fixed runtime bugs and compile errors related to GGML_PERF and GGML_DEBUG mqy 2023-07-14 20:21:53 +08:00
  • 9234f32bea
    Fix static_assert with older compilers #2024 Evan Miller 2023-07-14 06:57:02 -04:00
  • 27ad57a69b
    Metal: faster Q4_0 and Q4_1 matrix x vector kernels (#2212) Kawrakow 2023-07-14 12:46:21 +03:00
  • da730c53bf
    Merge branch 'custom_rope' of github.com:jxy/llama.cpp into custom_rope Xiao-Yong Jin 2023-07-13 19:54:50 -04:00
  • a6b5695764
    Merge remote-tracking branch 'upstream/master' into custom_rope Xiao-Yong Jin 2023-07-13 19:52:28 -04:00
  • 4cae9f5673
    Port CFG to server. Henri Vasserman 2023-07-13 22:37:57 +03:00
  • 0c5a305d9d
    build.zig: install config header Ali Chraghi 2023-07-13 22:58:49 +03:30
  • 358dcf0934
    CUDA: mul_mat_vec_q kernels for k-quants JohannesGaessler 2023-07-11 20:58:44 +02:00
  • 3a13d1e829
    Apply formatting. Henri Vasserman 2023-07-13 20:24:31 +03:00
  • 32c5411631
    Revert "Support using mmap when applying LoRA (#2095)" (#2206) master-32c5411 Howard Su 2023-07-13 21:58:25 +08:00
  • ff5d58faec
    Fix compile error on Windows CUDA (#2207) master-ff5d58f Howard Su 2023-07-13 21:58:09 +08:00
  • b782422a3e
    devops : add missing quotes to bash script (#2193) Bodo Graumann 2023-07-13 15:49:14 +02:00
  • 495245bc32
    examples: fixed path typos in embd-input Shangning Xu 2023-07-13 21:05:51 +08:00
  • 2ec4466db5
    Update build flags. Henri Vasserman 2023-07-13 13:44:02 +03:00
  • cd36b185ff
    Merge 'origin/master' into hipblas Henri Vasserman 2023-07-13 13:03:01 +03:00
  • 08812fe2e6
    Oops, forgot to delete the original Q4_1 kernel Iwan Kawrakow 2023-07-13 11:33:34 +02:00
  • 0f7967089f
    7-25% faster Q4_1 on Metal Iwan Kawrakow 2023-07-13 10:51:45 +02:00
  • 585ac35b42
    3-5% faster Q4_0 on Metal Iwan Kawrakow 2023-07-13 10:32:19 +02:00
  • 6fb06123dc
    Enable LLAMA_METAL and LLAMA_MPI in Makefile James Reynolds 2023-07-12 19:41:46 -06:00
  • cac6746cc6
    Fix compile error on Windows CUDA Howard Su 2023-07-13 08:19:37 +08:00
  • 183de43647
    Revert "Support using mmap when applying LoRA (#2095)" Howard Su 2023-07-13 07:52:29 +08:00
  • 7e01bc5ec6
    Use loader field for clarity. Spencer Sutton 2023-07-12 18:33:59 -04:00
  • 421cc6cc01
    Change comment Spencer Sutton 2023-07-12 18:26:06 -04:00
  • c14cde156e
    Rename parameter Spencer Sutton 2023-07-12 18:07:33 -04:00
  • f6c4e8dd6a
    Set different mmap flags for lora/non-lora Spencer Sutton 2023-07-12 17:56:54 -04:00
  • b3c1434d2e
    Merge branch 'ggerganov:master' into master m3ndax 2023-07-12 22:12:18 +02:00
  • 48e3e99ea0
    fix codre readability mendax0110 2023-07-12 22:11:24 +02:00