Commit graph

  • 9281c2801f fix: don't send headers twice when streaming duncannah 2023-08-02 23:42:43 +02:00
  • f8f0d59765 Update Vim plugin Austin Mroz 2023-08-02 16:33:17 -05:00
  • 44bbc85aaf Add missing barrier 0cc4m 2023-08-02 22:04:43 +02:00
  • 468ea24fb4 CUDA: faster non k-quant mul_mat_q kernels (#2483) master-468ea24 Johannes Gäßler 2023-08-02 18:04:04 +02:00
  • eec1ef8738 Fix missing abstract methods Keiichi TABATA 2023-08-03 00:14:30 +09:00
  • d6154f5b3a CUDA: faster non k-quant mul_mat_q kernels JohannesGaessler 2023-08-01 09:22:31 +02:00
  • b2eaec4261 updated lite Concedo 2023-08-02 22:54:17 +08:00
  • 4f6b60c776 CUDA: Fix models with output size != 32000 (#2480) master-4f6b60c Johannes Gäßler 2023-08-02 16:48:10 +02:00
  • 4c90fdc5cd Merge remote-tracking branch 'johannes/cuda-fix-output-size' into concedo_experimental Concedo 2023-08-02 22:37:41 +08:00
  • 6fe92318f8 Merge branch 'master' into concedo_experimental Concedo 2023-08-02 22:36:00 +08:00
  • df659f6bef cleaning up code a little bit with removing extra printfs needed during debug Aniket 2023-08-02 09:16:00 -04:00
  • 48bea64d47 server : update index.html.hpp Jhen 2023-08-02 18:01:58 +08:00
  • c5ba5efda2 convert-llama-h5-to-gguf.py : special tokens klosax 2023-08-02 11:26:07 +02:00
  • e1e9b28547 convert-llama-h5-to-gguf.py : accumulate kv / ti + special tokens klosax 2023-08-02 11:15:33 +02:00
  • cc1ae32d41 server : Fix regenerated prompt Jhen 2023-08-02 16:34:00 +08:00
  • 6c798db041 added stream saving context data to file to avoid allocating unnecessary amounts of memory l3utterfly 2023-08-02 16:41:25 +08:00
  • 8772c255ab make use_buff and get_buf_max_mem static mendax0110 2023-08-02 10:33:16 +02:00
  • 750299726d server : adjust for dark/light mode Jhen 2023-08-02 16:29:15 +08:00
  • a51d1a416c Merge branch 'ggerganov:master' into master m3ndax 2023-08-02 10:28:12 +02:00
  • 1e64d511d5 CUDA: Fix models with output size != 32000 JohannesGaessler 2023-08-01 15:11:36 +02:00
  • 220d931864 readme : add Aquila-7B model series to supported models (#2487) ldwang 2023-08-02 16:21:11 +08:00
  • 368c41cb5b server : make n_probs max to 10 for easy scroll Jhen 2023-08-02 16:14:40 +08:00
  • c3a65c4bbe gguf-util.h : update note M. Yusuf Sarıgöz 2023-08-02 11:16:23 +03:00
  • 7f02fead8c server : handle bytes Jhen 2023-08-02 16:14:10 +08:00
  • cf365fbc20 gguf : gguf counterpart of llama-util.h M. Yusuf Sarıgöz 2023-08-02 11:13:56 +03:00
  • 81844fbcfd tests : Fix compilation warnings (Linux/GCC) (#2451) master-81844fb Eve 2023-08-02 04:06:19 -04:00
  • d37be8dc9e server : implement Probabilites Jhen 2023-08-02 15:57:19 +08:00
  • b9b6cd2f21 server : fix completion_probabilities undefined if not set n_probs Jhen 2023-08-02 15:53:10 +08:00
  • 2a5bab4c9f server : add simple popover component Jhen 2023-08-02 15:46:49 +08:00
  • 6a8b9c27d4 server : keep message data array & show in probabilites component Jhen 2023-08-02 15:43:33 +08:00
  • 7862118886 server : add n_probs param in chat UI Jhen 2023-08-02 15:33:47 +08:00
  • 803c2ff7bf Up Aquila-7B models in README.md ldwang 2023-08-02 15:18:36 +08:00
  • 35ed27b1af Add Aquila-7B models in README.md ldwang 2023-08-02 14:48:03 +08:00
  • 128b2f1e47 Merge branch 'ggerganov:master' into master ldwang 2023-08-02 14:19:50 +08:00
  • a312193e18 readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models (#2475) Yiming Cui 2023-08-02 14:18:31 +08:00
  • 455b8d58b3 Merge 5dc35d3b59 into c574bddb36 asctime 2023-08-02 01:46:41 -04:00
  • 8ee4cef747 Merge branch 'ggerganov:master' into master Richard Roberson 2023-08-01 23:27:04 -06:00
  • 24dcf26b83 remove white spaces ymcui 2023-08-02 12:39:10 +08:00
  • 73b6402cff Merge remote-tracking branch 'upstream/master' Upstream merge staviq 2023-08-02 04:56:19 +02:00
  • 847f0af99c support for templates in browser LocalStorage staviq 2023-08-02 05:48:20 +02:00
  • 712c2e90b1 Use conv.set_system_message from upstream Elsa 2023-08-02 09:33:23 +08:00
  • 59484c6121 Merge remote-tracking branch 'origin/master' Elsa 2023-08-02 09:25:36 +08:00
  • 98369f62c5 Add comment for llama_log_callback and replace remaining printf calls grahameth 2023-08-02 01:16:12 +02:00
  • c857a33b19 Add back all the new lines in the logging strings grahameth 2023-08-02 00:50:31 +02:00
  • e39e45493c Merge branch 'master' into logging_callback grahameth 2023-08-02 00:33:57 +02:00
  • e23ba19da1 use initializer list for ggml_init_params netrunnereve 2023-08-01 17:53:14 -04:00
  • 1b4f9c8eb9 convert-gptneox-h5-to-gguf.py : accumulate kv and ti + special tokens klosax 2023-08-01 23:40:50 +02:00
  • 49380a23a3 gguf.py : accumulate kv and tensor info data + special tokens klosax 2023-08-01 23:37:48 +02:00
  • ff1cb02397 constants.py : special tokens klosax 2023-08-01 23:17:21 +02:00
  • 0081032087 Update Makefile Alex 2023-08-01 14:13:17 -07:00
  • c638955bfa Fix tests 0cc4m 2023-08-01 21:52:57 +02:00
  • 75788fe9b0 Add submission batching to mul_f32 0cc4m 2023-08-01 21:28:40 +02:00
  • f33b3dc306 Merge branch 'ggerganov:master' into master Eve 2023-08-01 14:36:46 -04:00
  • c574bddb36 fix a typo in examples/server/README.md (#2478) Bono Lv 2023-08-01 20:54:28 +08:00
  • 36a36c32a3 Update gptneox-main.cpp klosax 2023-08-01 14:44:28 +02:00
  • c77fabb1f9 gptneox-main.cpp : special tokens klosax 2023-08-01 14:32:53 +02:00
  • e7a741695c convert-gptneox-h5-to-gguf.py : Special tokens klosax 2023-08-01 14:30:00 +02:00
  • 556134cfe1 fix a typo Bono Lv 2023-08-01 20:00:01 +08:00
  • 5b61ec41e0 Replaced the usage of ReadConsoleInputW and fgetwc with standard C++ input functions to make getchar32() work consistently in all environments, including cases when stdin is redirected. Kerim Büyükakyüz 2023-08-01 14:05:46 +03:00
  • c58ffc92e5 fixed compile error Concedo 2023-08-01 18:28:49 +08:00
  • 84b28c4282 Merge branch 'master' into concedo_experimental Concedo 2023-08-01 18:13:27 +08:00
  • 46682e5cb3 added mmq launch flag Concedo 2023-08-01 17:57:13 +08:00
  • 86aeb27734 server : Support dark mode (#2414) master-86aeb27 ebraminio 2023-08-01 01:56:23 -07:00
  • 78edf98735 Setting correct format string for long unsigned. Jiri Podivin 2023-08-01 10:52:25 +02:00
  • 1873ff586b metal : add gqa8 kernel to allow llama-2-70B on metal (#2459) Matteo Boschini 2023-08-01 09:43:12 +02:00
  • 92e60dba8b port c tests to c++ netrunnereve 2023-07-31 20:59:43 -04:00
  • 0131ac0484 add support for chinese llama-2 / alpaca-2 ymcui 2023-08-01 08:07:05 +08:00
  • da4900e835 Update convert-llama-h5-to-gguf.py klosax 2023-07-31 23:04:03 +02:00
  • f3de876a12 fix : update convert-llama-h5-to-gguf.py M. Yusuf Sarıgöz 2023-07-31 23:58:29 +03:00
  • 995c2204e5 Added ne03==ne13 assertion Matteo Boschini 2023-07-31 21:25:09 +02:00
  • 49e7cb5bb1 CUDA: fixed LLAMA_FAST compilation option (#2473) master-49e7cb5 Johannes Gäßler 2023-07-31 21:02:19 +02:00
  • 5be88d1a30 CUDA: fixed LLAMA_FAST compilation option JohannesGaessler 2023-07-31 20:17:23 +02:00
  • b772bba42e CUDA: fixed cmake F16 option (#2471) master-b772bba Johannes Gäßler 2023-07-31 19:52:22 +02:00
  • d91456aaf1 fix half2 decomposition ardfork 2023-07-31 20:35:00 +03:00
  • c1cb70d64d new build arg LLAMA_CUDA_MMQ_Y Henri Vasserman 2023-07-31 19:56:44 +03:00
  • f1c03f4b16 more bug fixn Aniket 2023-07-31 13:20:32 -04:00
  • 971464b920 CUDA: fixed cmake F16 option JohannesGaessler 2023-07-31 18:40:11 +02:00
  • c1664a00ae Merge 'origin/master' into hipblas Henri Vasserman 2023-07-31 19:32:27 +03:00
  • e221843147 trying out mmq Concedo 2023-07-31 22:51:15 +08:00
  • bb42aefaeb gguf : mmap tensor data example M. Yusuf Sarıgöz 2023-07-31 17:46:12 +03:00
  • 3e370f83ef Warning: Very experimental merge, do not use until confirmed stable. Concedo 2023-07-31 22:33:43 +08:00
  • 0728c5a8b9 CUDA: mmq CLI option, fixed mmq build issues (#2453) master-0728c5a Johannes Gäßler 2023-07-31 15:44:35 +02:00
  • aebccdbf00 fixing bug that didnt unroll the 1d karpathy arrays Aniket 2023-07-31 09:33:57 -04:00
  • b26f5b2e43 gguf : fix typo in function call M. Yusuf Sarıgöz 2023-07-31 16:23:54 +03:00
  • d28b07ca7c Extend kernel_mul_mat_f16_f32 to handle gqa broadcast Matteo Boschini 2023-07-31 14:41:23 +02:00
  • 1b09b9439f Merge 204f76d52e into 1215ed7d5c maddes8cht 2023-07-31 14:37:52 +02:00
  • 1215ed7d5c CUDA: Implemented row flattening for non-glm RoPE (#2468) master-1215ed7 Johannes Gäßler 2023-07-31 14:32:30 +02:00
  • 5b5f04be97 CUDA: mmq CLI option, fixed mmq build issues JohannesGaessler 2023-07-30 12:34:18 +02:00
  • fee39ecd48 Update ggml-metal.m Matteo Boschini 2023-07-31 07:52:46 +02:00
  • ae58ac7dd4 Added gqa8 kernel to allow llama-2-70B on metal Matteo Boschini 2023-07-31 00:02:04 +02:00
  • 204f76d52e Fix: possible out-of-bounds error, remove default_params Mathias Bachmann 2023-07-31 13:19:13 +02:00
  • 2dbf518911 CUDA: fewer memory bank conflicts for mul_mat_q (#2458) master-2dbf518 Johannes Gäßler 2023-07-31 13:18:51 +02:00
  • 58ff5e17e1 CUDA: Implemented row flattening for non-glm RoPE JohannesGaessler 2023-07-31 12:21:51 +02:00
  • 4d92be8813 the cur parameter is missing gklab 2023-07-31 17:41:34 +08:00
  • 84ce184c4f layout Concedo 2023-07-31 17:33:31 +08:00
  • 9d2382b3e4 Fix Metal backend broken from the allocator changes (#2455) master-9d2382b slaren 2023-07-31 11:02:53 +02:00
  • f27972777f correct semantic error in import_vars (#355) YellowRoseCx 2023-07-31 02:51:35 -05:00
  • 7aa0a0e7f7 gguf : support custom alignment value M. Yusuf Sarıgöz 2023-07-31 09:59:36 +03:00
  • eab8335e33 use memcpy in test-double-float.c netrunnereve 2023-07-30 23:12:25 -04:00
  • 5ad9c2f320 Fix broken build for LLAMA_METAL 唐鳳 2023-07-31 09:35:55 +08:00