Commit graph

  • 03e566977b
    examples : fix typo in minigpt4.py (#2298) Ikko Eltociear Ashimine 2023-07-21 20:53:07 +09:00
  • 513f861953
    ggml : fix rope args order + assert (#2054) master-513f861 Georgi Gerganov 2023-07-21 14:51:34 +03:00
  • 3973b25a64
    gitignore : fix final newline Georgi Gerganov 2023-07-21 14:42:41 +03:00
  • 3d679827e7 improved memory management fixes slaren 2023-07-21 12:41:46 +02:00
  • ab0e26bdfb
    llama : remove cfg smooth factor as it is only a reparameterization of the guidance scale (#2280) master-ab0e26b Guillaume "Vermeille" Sanchez 2023-07-21 12:58:36 +02:00
  • 73643f5fb1
    gitignore : changes for Poetry users + chat examples (#2284) master-73643f5 Jose Maldonado 2023-07-21 06:53:27 -04:00
  • eef66e1d2e
    Merge branch 'master' into master Georgi Gerganov 2023-07-21 13:52:25 +03:00
  • a814d04f81
    make : fix indentation master-a814d04 Georgi Gerganov 2023-07-21 13:50:55 +03:00
  • 4c013bb738
    ci : fix MNT realpath usage (#2250) Georgi Gerganov 2023-07-21 13:48:18 +03:00
  • 56e9ae062c llama.cpp: partially restore state support, graph export slaren 2023-07-21 12:39:51 +02:00
  • 42c7c2e2e9
    make : support customized LLAMA_CUDA_NVCC and LLAMA_CUDA_CCBIN (#2275) master-42c7c2e Sky Yan 2023-07-21 18:38:57 +08:00
  • 78a3d13424
    flake : remove intel mkl from flake.nix due to missing files (#2277) master-78a3d13 wzy 2023-07-21 18:26:34 +08:00
  • ae178ab46b
    llama : make tensor_split ptr instead of array (#2272) master-ae178ab Georgi Gerganov 2023-07-21 13:10:51 +03:00
  • 54e3bc76fe
    make : add new target for test binaries (#2244) master-54e3bc7 Jiří Podivín 2023-07-21 12:09:16 +02:00
  • 647cef8bbd
    Merge branch 'master' into testtarget-removal Georgi Gerganov 2023-07-21 13:08:59 +03:00
  • b068f2f4b5 Adjusted look ahead in ggml_cuda_pool_malloc to 5% Iwan Kawrakow 2023-07-21 11:58:52 +03:00
  • 019fe257bb
    MIKU MAYHEM: Upgrading the Default Model for Maximum Fun 🎉 (#2287) Hatsune Miku 2023-07-21 08:13:18 +00:00
  • d3c3624c7b Better Q3_K for QK_K = 64 Iwan Kawrakow 2023-07-21 09:38:38 +03:00
  • 0099570f04 Q3_K for QK_K = 64 Iwan Kawrakow 2023-07-21 09:14:12 +03:00
  • 8dba28c00a Additional Q3_K speedup on Metal Iwan Kawrakow 2023-07-20 20:28:28 +03:00
  • 5bb23b5ab5 Faster Q3_K on Metal Iwan Kawrakow 2023-07-20 20:07:38 +03:00
  • e68c96f7fe
    Faster Q2_K on Metal (#2297) Kawrakow 2023-07-21 10:44:40 +03:00
  • 9cf022a188
    make : fix embdinput library and server examples building on MSYS2 (#2235) master-9cf022a Przemysław Pawełczyk 2023-07-21 09:42:21 +02:00
  • bdf3b6e0d7 Fixed bug in new metal Q2_K implementation Iwan Kawrakow 2023-07-21 10:21:19 +03:00
  • 1a61c1a5e1 server: allow json array in prompt or content Xiao-Yong Jin 2023-07-21 00:35:58 -05:00
  • 47031e4094 add --in-prefix-bos to prefix BOS to user inputs; keep EOS Xiao-Yong Jin 2023-07-20 22:04:06 -05:00
  • c047e8aec2 only sample full tokens (no peeking or truncation) Evan Jones 2023-07-19 23:28:57 -04:00
  • 37d3f6a260 remove unused code slaren 2023-07-21 02:33:06 +02:00
  • cd6f5dec92 improved memory management slaren 2023-07-21 00:28:49 +02:00
  • ad97ee3676
    Fix flake build on darwin Charles Duffy 2023-07-20 14:52:46 -05:00
  • 3432e378d5 Replace VMA library with native Vulkan buffer management 0cc4m 2023-07-20 21:57:33 +02:00
  • d45c1631bc
    metal : rewrite to fit new backend interface correctly (WIP) ggml-backends-metal Georgi Gerganov 2023-07-20 16:36:33 +03:00
  • b5b133723a Don't free before queue done 0cc4m 2023-07-20 19:32:17 +02:00
  • 66c2632f1d
    examples : fix typo in minigpt4.py Ikko Eltociear Ashimine 2023-07-21 02:32:00 +09:00
  • 417546c8c3 Deleting unnoticed and dangerous trailing white space Iwan Kawrakow 2023-07-20 20:30:50 +03:00
  • 09b51fc648 Faster Q2_K on Metal Iwan Kawrakow 2023-07-20 18:53:45 +03:00
  • 0f8d5aa091
    Update README.md repo-reviews 2023-07-20 17:26:08 +02:00
  • e782c9e735
    Faster Q5_K and Q6_K on Metal (#2294) Kawrakow 2023-07-20 18:19:45 +03:00
  • 1cdbbbb37c Custom RoPE + better memory management for CUDA Iwan Kawrakow 2023-07-20 17:52:27 +03:00
  • de69f8f20d initial implementation of delayed graph allocation slaren 2023-07-20 15:57:48 +02:00
  • 5f2e4bd8ba Another Q5_K speedup Iwan Kawrakow 2023-07-20 16:33:15 +03:00
  • 463f420710 Faster Q5_K on Metal Iwan Kawrakow 2023-07-20 16:09:40 +03:00
  • 06c08576f7 Merge remote-tracking branch 'origin/master' into concedo_experimental Concedo 2023-07-20 21:02:40 +08:00
  • f036109110 script for henky Concedo 2023-07-20 21:02:12 +08:00
  • fa9d54e36e Faster Q6_K on Metal Iwan Kawrakow 2023-07-20 16:00:22 +03:00
  • e4db70720d
    [wip] chat now has parameter and cfg Henri Vasserman 2023-07-20 15:37:31 +03:00
  • 785829dfe8
    Faster Q4_K on Metal (#2290) Kawrakow 2023-07-20 15:18:43 +03:00
  • cb82adadb8
    metal : first working version of the inference without prompt processing Georgi Gerganov 2023-07-20 14:56:29 +03:00
  • 290cb700bf
    metal : map the CPU buffers to Metal buffers (WIP) Georgi Gerganov 2023-07-20 14:30:34 +03:00
  • 8e03cfcb6a Faster Q4_K on Metal Iwan Kawrakow 2023-07-20 14:19:08 +03:00
  • fff0e0eafe llama : fix regression from #2000 - could not load no-mmap models master-fff0e0e Georgi Gerganov 2023-07-20 13:47:26 +03:00
  • 417a85a001
    metal: minor q4 optimization and reduce code size (#2248) Shouzheng Liu 2023-07-20 06:32:22 -04:00
  • e85557f798 launcher for rope Concedo 2023-07-20 17:45:50 +08:00
  • 4379ed7085 Miku.sh: Switch sampler to mirostat_v2 and tiny prompt improvements at8u 2023-07-20 08:13:21 +01:00
  • 39dc1a46c4 added token count, updated lite Concedo 2023-07-20 14:41:06 +08:00
  • 569916ab44 Little changes in .gitignore for Poetry users; a fix in Makefile for FreeBSD users (on that platform x86_64 is amd64), which resolves compilation using CFLAGS and CXXFLAGS with -march=native and -mtune=native; add two examples for interactive mode using Llama2 models (thx TheBloke for models) Jose Yukiteru Amano 2023-07-20 00:11:30 -04:00
  • f3f2e8eee3 metal: use template to reduce size lshzh-ww 2023-07-19 23:16:18 -04:00
  • ea0ea9ad36
    Merge branch 'ggerganov:master' into server-improve-yazan Yazan Agha-Schrader 2023-07-20 05:00:47 +02:00
  • 082dd81286
    [wip] chat improvements Henri Vasserman 2023-07-20 03:48:48 +03:00
  • cb205c0d13 automatically calculate compute buffer sizes (without graph allocator) slaren 2023-07-20 02:22:54 +02:00
  • 77ac8deaf1 llama.cpp: remove backend-specific code where possible slaren 2023-07-20 00:59:26 +02:00
  • 43694ca867
    consistent semicolons Henri Vasserman 2023-07-20 00:58:16 +03:00
  • 890d1b8446
    Merge master into server-cfg Henri Vasserman 2023-07-20 00:48:03 +03:00
  • dd3cf5760a
    last n tokens done Henri Vasserman 2023-07-20 00:36:36 +03:00
  • 42591a0acd
    remove "smooth factor" Henri Vasserman 2023-07-20 00:02:13 +03:00
  • 2cb8469e7f
    refactor evaluation logic Henri Vasserman 2023-07-19 23:45:40 +03:00
  • 9e97cb0baf Don't force aligned matmul 0cc4m 2023-07-19 21:59:03 +02:00
  • 105fd199be Use pinned memory for f16 preprocessing 0cc4m 2023-07-19 21:03:11 +02:00
  • 1e78b1b0a1 remove cfg smooth factor as it is only a reparameterization of the guidance scale Guillaume Sanchez 2023-07-19 16:50:32 +00:00
  • f38433ef5d
    Merge remote-tracking branch 'origin/ggml-backends' into ggml-backends-metal Georgi Gerganov 2023-07-19 17:45:45 +03:00
  • 02de94ef82
    Remove intel mkl from flake.nix due to missing files Wu Zhenyu 2023-07-19 21:33:08 +08:00
  • 70c55c17c7
    metal : create backend, mostly reuse CPU backend interface Georgi Gerganov 2023-07-19 16:47:43 +03:00
  • c49a469a79 updated lite Concedo 2023-07-19 21:13:00 +08:00
  • 6065f5dd15 Miku.sh: Add in-prefix/in-suffix opts at8u 2023-07-19 13:48:27 +01:00
  • 187b7dd297 Miku.sh: Set ctx_size to 4096 at8u 2023-07-19 13:37:39 +01:00
  • df15fcb598 Support customized LLAMA_CUDA_NVCC and LLAMA_CUDA_CCBIN Yan Lin 2023-07-19 20:25:49 +08:00
  • 79479bd201 Miku.sh: Set default model to llama-2-7b-chat at8u 2023-07-19 13:20:54 +01:00
  • 2a88d6d3ec Merge remote-tracking branch 'ycros/api-modelbusy-fix' into concedo_experimental Concedo 2023-07-19 18:32:13 +08:00
  • 13e34d5058 Merge remote-tracking branch 'origin/master' into concedo_experimental Concedo 2023-07-19 18:28:29 +08:00
  • e9467f5a44 auto rope scale adjustments, added sched yield fix for apple, adjust warning for mirostat Concedo 2023-07-19 16:44:44 +08:00
  • e4903957ec Add vectorized loading and zeropadding for matrix multiplication 0cc4m 2023-07-19 10:13:51 +02:00
  • 63ba9f3306
    llama : make tensor_split ptr instead of array Georgi Gerganov 2023-07-19 10:25:41 +03:00
  • 294f424554
    llama : extend API to get max devices at runtime (#2253) vbatts-ggmlv3-2023-july master-294f424 Rinne 2023-07-19 15:06:40 +08:00
  • 45a1b07e9b
    flake : update flake.nix (#2270) master-45a1b07 wzy 2023-07-19 15:01:55 +08:00
  • b1f4290953
    cmake : install targets (#2256) master-b1f4290 wzy 2023-07-19 15:01:11 +08:00
  • 3eefb221b0
    Update flake.nix Wu Zhenyu 2023-07-19 14:12:53 +08:00
  • 0d7240b320 modified rope for cuda Concedo 2023-07-19 14:16:27 +08:00
  • 8d37755bdc add inverse char ranges Evan Jones 2023-07-18 21:54:44 -04:00
  • 295f85654a allocators wip; renamed ggml_backend functions; changed ggml_buffer and ggml_backend to always be used as pointers; rename ggml_tensor::params -> op_params slaren 2023-07-17 19:03:51 +02:00
  • 374fffb9c6 Reworking rope WIP Concedo 2023-07-19 00:54:41 +08:00
  • 63ec354ad1
    Fix #2252, add install() to CMakeLists.txt Wu Zhenyu 2023-07-18 15:51:06 +08:00
  • ed960fa1ab
    llama : separate compute buffer for metal Georgi Gerganov 2023-07-18 19:19:59 +03:00
  • 652c849643
    ggml : add is_ram_shared to ggml_backend Georgi Gerganov 2023-07-18 18:51:02 +03:00
  • 90503f150d
    llama : init metal backend as CPU backend for now Georgi Gerganov 2023-07-18 17:52:13 +03:00
  • 0a3861c47b
    metal : adapting to ggml_backend (WIP) Georgi Gerganov 2023-07-18 16:54:41 +03:00
  • 0a11f50da8 reenabled sched_yield, reduced sampler warning msg to once per session Concedo 2023-07-18 20:26:18 +08:00
  • d01bccde9f
    ci : integrate with ggml-org/ci (#2250) master-d01bccd Georgi Gerganov 2023-07-18 14:24:43 +03:00
  • 6d32e7fc8b Merge commit 'a6803cab94' into concedo_experimental Concedo 2023-07-18 19:12:06 +08:00
  • 775fb18857
    ci : update README Georgi Gerganov 2023-07-18 13:58:13 +03:00
  • 37855781fd updated runtimes to henky version Concedo 2023-07-18 18:48:54 +08:00