Commit graph

  • d919c6da2d Added numa options to allow finer grained control as well as plumbing for a new mirror mode that will require numa.h root 2024-02-06 08:54:14 +00:00
  • 78dd88a3ff format Abhilash Majumder 2024-02-06 14:08:06 +05:30
  • 815b956c54 add nv/amd sycl target build cmd Abhilash Majumder 2024-02-06 14:02:50 +05:30
  • 4ffc7a17d4 server : various fixes for the prompt field in /completion (#5300) b2076 Niall Coates 2024-02-06 08:16:23 +00:00
  • f9cf07881a correct minicpm model type (size) vincent 2024-02-06 16:07:38 +08:00
  • d3cc153362 Q5_K: slightly better quantization Iwan Kawrakow 2024-02-06 09:35:54 +02:00
  • 95a492a8c5 feat: add new GGUFValueType.OBJ virtual type Riceball LEE 2024-01-26 16:32:12 +08:00
  • f58d49e5ce Q4_K: slightly better quantization Iwan Kawrakow 2024-02-06 07:53:10 +02:00
  • 906cff55c2 py : handle byte tokens in get_token_type (#5341) Georgi Gerganov 2024-02-06 07:47:22 +02:00
  • 40b9ba117c Merge branch 'ggerganov:master' into master hsnmkls 2024-02-06 13:32:24 +08:00
  • 52c81e4365 Update README.md Ben Williams 2024-02-05 17:07:47 -08:00
  • b5d00541f5 fix for editorconfig vincent 2024-02-06 07:31:42 +08:00
  • 05aef26850 remove convert-minicpm.py vincent 2024-02-06 07:28:00 +08:00
  • 319ab9d18c fix for flake8 lint vincent 2024-02-06 07:27:15 +08:00
  • d5f757b996 Update README.md Michael Coppola 2024-02-05 16:30:46 -05:00
  • 2d8fcd8a6f server: added dynatemp_range and dynatemp_exponent Michael Coppola 2024-02-05 16:28:12 -05:00
  • c5239333be Fix possible typo in README-sycl.md valiray 2024-02-05 15:27:56 -06:00
  • 087ae64e5e Only do device info print in the beginning and initialize one backend for cpu assist 0cc4m 2024-02-05 20:52:47 +01:00
  • 034403de71 include total "num_slots" in default_generation_settings_for_props Justin Parker 2024-02-05 13:42:23 -05:00
  • 098f6d737b make: Use ccache for faster compilation (#5318) b2074 Johannes Gäßler 2024-02-05 19:33:00 +01:00
  • adcf16fd68 py : fix empty bytes arg gg/convert-fix-byte-tokens Georgi Gerganov 2024-02-05 19:53:07 +02:00
  • b698d87e9a fix bug for quantize minicpm vincent 2024-02-06 00:41:52 +08:00
  • 36084596c1 try to make tokenizer work vincent 2024-02-06 00:10:27 +08:00
  • ec4507a99c convert minicpm model via convert-hf-gguf.py vincent 2024-02-05 23:20:19 +08:00
  • 78b00dda6c README: updated introduction (#5343) Johannes Gäßler 2024-02-05 15:55:10 +01:00
  • 853b6b980d readme : update Georgi Gerganov 2024-02-05 16:34:08 +02:00
  • 550ab5e1c5 fix tab/space typo. vincent 2024-02-05 22:16:39 +08:00
  • 60ed8191ff support minicpm arch. vincent 2024-02-05 22:10:27 +08:00
  • 05c9cd81a9 README: updated introduction JohannesGaessler 2024-02-05 14:18:09 +01:00
  • c6b395535a ggml : make use of ggml-quants.h possible in C++ code (#5338) b2072 Kawrakow 2024-02-05 14:09:47 +02:00
  • 0c3519912b fixup! fixup! make: Use ccache for faster compilation JohannesGaessler 2024-02-05 13:03:21 +01:00
  • ded2ad5b88 py : handle byte tokens in get_token_type Georgi Gerganov 2024-02-05 13:42:54 +02:00
  • 91c453fb11 One cannot possibly be defining static_assert in a C++ compilation ik/ggml-quants-cpp Iwan Kawrakow 2024-02-05 13:20:49 +02:00
  • abb61944a5 ggml : avoid duplicating function calls using MIN/MAX macros (#5325) b2071 Dr. Tom Murphy VII Ph.D 2024-02-05 06:13:57 -05:00
  • 522660eaa3 Update ggml.c Georgi Gerganov 2024-02-05 13:13:12 +02:00
  • 51f3a8568b fixup! make: Use ccache for faster compilation JohannesGaessler 2024-02-05 12:01:01 +01:00
  • 853dbf17cd Spelling JohnnyB 2024-02-05 10:35:00 +00:00
  • 89503dcb5f iq3_xxs: quards for the no-imatrix situation (#5334) b2070 Kawrakow 2024-02-05 12:32:27 +02:00
  • 34432a39a8 Fix spelling JohnnyB 2024-02-05 10:32:17 +00:00
  • 44bf949248 Make use of ggml-quants.h possible in C++ code Iwan Kawrakow 2024-02-05 11:22:10 +02:00
  • 7e1ae372f3 py : fix internlm2-hf convert to gguf (#5305) Guoteng 2024-02-05 17:04:06 +08:00
  • 6fdfa2ecc6 iq2_xxs: tune quantization (#5320) b2068 Kawrakow 2024-02-05 10:46:06 +02:00
  • 7278b0e5a2 iq3_xxs: quards for the no-imatrix situation Iwan Kawrakow 2024-02-05 10:32:38 +02:00
  • 01d17ff137 Merge branch 'ggerganov:master' into Simple_chat_templates Christian Gollwitzer 2024-02-05 09:27:02 +01:00
  • a2d60c9158 server : allow to get default generation settings for completion (#5307) b2067 Alexey Parfenov 2024-02-05 08:10:22 +00:00
  • e6f8177532 common : add dynamic temperature parameters to main example cli (#5295) b2066 l3utterfly 2024-02-05 17:00:47 +09:00
  • 30679d438d scripts : fix typos, cleanup (#5303) Georgi Gerganov 2024-02-05 09:48:03 +02:00
  • 4be04c8965 scripts : add non-interactive server-llm.sh (#5303) Нияз Гарифзянов 2024-02-05 10:43:57 +03:00
  • 6662929be2 Update scripts/server-llm.sh Georgi Gerganov 2024-02-05 09:43:23 +02:00
  • 5d55b0cd82 readme : add CodeShell models to the supported models list (#5330) chiranko 2024-02-05 15:41:38 +08:00
  • a5aa793ed4 quantize: add iq3_xxs warning ymcui 2024-02-05 15:14:40 +08:00
  • 4833ac209d [SYCL] Fix cpy with dims of 3 (#5289) b2062 AidanBeltonS 2024-02-05 07:08:24 +00:00
  • c7220fc61e rm asserts Abhilash Majumder 2024-02-05 09:38:27 +05:30
  • 7bd7abc57f add CodeShell models to the supported models list chiranko 2024-02-05 04:01:54 +00:00
  • daa6a9c303 generalize LLAMA_SPLIT_LAYER for all backends, do not expose device count and memory in llama.h slaren 2024-02-05 02:16:19 +01:00
  • 2f2191fb70 Add minimal chat template support in server UI (pull-down menu) chris 2024-02-04 23:47:29 +01:00
  • c71316f825 Reduce code duplication in tensor split layer assignment 0cc4m 2024-02-04 21:57:13 +01:00
  • 43975144d6 Avoid duplicating function calls when using MIN/MAX macros. Tom 7 2024-02-04 14:39:06 -05:00
  • a1f9c008db Add further missing cleanup code 0cc4m 2024-02-04 19:11:59 +01:00
  • 5a1ad8c3e5 Add names to backend device functions 0cc4m 2024-02-04 18:17:21 +01:00
  • 770e435e1c Merge branch 'ggerganov:master' into master hsnmkls 2024-02-05 01:02:28 +08:00
  • 9392ebd49e flake.lock: Update b2061 github-actions[bot] 2024-02-04 00:17:24 +00:00
  • 15c2df9126 Modify the vocab selection algorithm. Sang-Kil Park 2024-02-05 00:50:27 +09:00
  • df25b638a2 Fix padding Raphaël Bourgeat 2024-02-04 16:32:36 +01:00
  • e72defac41 Work with Mac M1 krolhm 2024-02-04 16:21:33 +01:00
  • ca8110cfc9 Initial Vulkan multi-gpu implementation 0cc4m 2024-02-04 12:35:01 +01:00
  • edf46a38ff Merge branch 'ggerganov:master' into master hsnmkls 2024-02-04 18:56:19 +08:00
  • 49a483e0f2 wip gg/flash-attn-interleave-cc Georgi Gerganov 2024-02-04 12:34:36 +02:00
  • a647257b47 cuda : express strides with helper constants gg/flash-attn-32x8 Georgi Gerganov 2024-02-04 11:08:47 +02:00
  • 1846e92a90 cuda : minor Georgi Gerganov 2024-02-04 09:57:58 +02:00
  • 5ed26e1fc9 Adding some imatrix tools (#5302) b2060 Kawrakow 2024-02-04 10:39:58 +02:00
  • 89cba4796c make: Use ccache for faster compilation JohannesGaessler 2024-02-04 01:13:04 +01:00
  • f3798f7736 iq2_xxs: tune quantization Iwan Kawrakow 2024-02-04 09:39:47 +02:00
  • 277fad30c6 cmake : use set() for LLAMA_WIN_VER (#5298) b2059 Welby Seely 2024-02-03 23:18:51 -05:00
  • 4ddaedac03 flake.lock: Update github-actions[bot] 2024-02-04 00:17:24 +00:00
  • ed6b91d57c Fix for LLAMA_WIN_VER default value, fixes #5158 Welby Seely 2024-02-03 02:00:38 -05:00
  • d7285be0e9 Merge f3098f1a32 into 3c0d25c475 Nathan Ringo 2024-02-03 15:11:33 -05:00
  • 3c0d25c475 make: add nvcc info print (#5310) b2058 Johannes Gäßler 2024-02-03 20:15:13 +01:00
  • 3cc5ed353c make: fix nvcc optimization flags for host code (#5309) b2057 Johannes Gäßler 2024-02-03 20:14:59 +01:00
  • 60ecf099ed add Vulkan support to Nix flake Martin Schwaighofer 2024-01-28 12:59:43 +01:00
  • 54dd7daed2 ggml-ci 877825076@qq.com 2024-02-04 01:25:03 +08:00
  • a57445bf48 make: add nvcc info print JohannesGaessler 2024-02-03 18:40:19 +01:00
  • 1582e4e373 Merge branch 'build.zig' Hasan Mukhlis 2024-02-04 01:22:45 +08:00
  • 5444f24893 Vulkan Intel Fixes, Optimizations and Debugging Flags (#5301) 0cc4m 2024-02-03 18:15:00 +01:00
  • a19a4b2a2f update build.zig to master build Hasan Mukhlis 2024-02-04 01:19:45 +08:00
  • e920ed393d Vulkan Intel Fixes, Optimizations and Debugging Flags (#5301) b2055 0cc4m 2024-02-03 18:15:00 +01:00
  • 7bab17262c make: fix nvcc optimization flags for host code JohannesGaessler 2024-02-03 18:05:58 +01:00
  • ef68fac2a8 cuda : fix matrix names Georgi Gerganov 2024-02-03 18:36:58 +02:00
  • cfd9732b2e cuda : simplify softmax Georgi Gerganov 2024-02-03 18:31:55 +02:00
  • e04ff39181 cuda : fix -INF block check Georgi Gerganov 2024-02-03 16:57:46 +02:00
  • 5b263dd83a cuda : unroll Q*K^T loop Georgi Gerganov 2024-02-03 16:12:20 +02:00
  • e59f825872 py : fix internlm2-hf convert to gguf 877825076@qq.com 2024-02-03 22:06:15 +08:00
  • fe0524f240 Also add Vulkan run tests command to Makefile and CMakeLists.txt 0cc4m 2024-02-03 15:01:40 +01:00
  • 91e7766848 Disable Vulkan async backend functions for now 0cc4m 2024-02-03 14:43:21 +01:00
  • 3b1c4e7673 cuda : speed-up reduce part of the kernel Georgi Gerganov 2024-02-03 15:36:05 +02:00
  • a7b471569b cuda : switch to 1 warp for bs > 16 Georgi Gerganov 2024-02-03 15:17:49 +02:00
  • b958151e3f cuda : use half2 in softmax Georgi Gerganov 2024-02-03 15:00:25 +02:00
  • c51f27c0db cuda : avoid __hisinf branches Georgi Gerganov 2024-02-03 14:27:36 +02:00
  • dcc179ed1c server: allow to get default generation settings for completion ZXED 2024-02-03 14:53:22 +03:00
  • 92472ea22c cuda : unroll some of the loops Georgi Gerganov 2024-02-03 14:10:01 +02:00