Commit graph

  • e95b6554b4 ggml : add Q8_0 quantization for intermediate results (#951) master-e95b655 Georgi Gerganov 2023-04-15 17:53:22 +03:00
  • 02258616ef minor jon-chuang 2023-04-15 22:27:23 +08:00
  • a38b9d7fab minor jon-chuang 2023-04-15 21:58:10 +08:00
  • 6bf6543a6a format jon-chuang 2023-04-15 21:57:39 +08:00
  • 00e86b97cc commit jon-chuang 2023-04-15 21:54:37 +08:00
  • 60f27ed887 ggml : fix q4_1 dot func Georgi Gerganov 2023-04-15 15:06:38 +03:00
  • 69511b2c4a Merge branch 'master' into jon/tall-and-skinny-matmul jon-chuang 2023-04-15 19:57:48 +08:00
  • 73e7601bf3 stash jon-chuang 2023-04-15 19:57:01 +08:00
  • 01de5c5420 quantize-stats : delete obsolete strings Georgi Gerganov 2023-04-14 21:57:05 +03:00
  • 3a111abd63 minor : updates after rebase to latest master Georgi Gerganov 2023-04-14 21:38:24 +03:00
  • 312a927f0b ggml : fix quantize_row_q8_0() ARM_NEON rounding Georgi Gerganov 2023-04-14 21:27:55 +03:00
  • 2c4f9b658d Q8: use int8_t, AVX/AVX2 optimizations Stephan Walter 2023-04-13 18:55:55 +02:00
  • 19e7a6575d quantize-stats : fix test + add it to Makefile default Georgi Gerganov 2023-04-13 23:33:35 +03:00
  • 3b894ec657 ggml : add Q8_0 quantization for intermediate results Georgi Gerganov 2023-04-13 23:03:27 +03:00
  • aa485cee33 ggml : use posix_memalign on non-Windows env master-aa485ce Georgi Gerganov 2023-04-15 14:25:45 +03:00
  • 8fbfc80e03 Fix clblast device selection on Linux 0cc4m 2023-04-15 12:02:36 +02:00
  • c12b14b77f benchmark : fix result validation in benchmark-q4_0-matmult (#987) master-c12b14b Ivan Komarov 2023-04-15 07:51:54 +02:00
  • 106faaf297 cmake : add finding the OpenBLAS header file (#992) master-106faaf katsu560 2023-04-15 14:51:11 +09:00
  • d00b865eb1 Merge branch 'master' into concedo Concedo 2023-04-15 11:33:43 +08:00
  • 9c6118c3fc convert.py: Fix loading safetensors and ggml format on Windows comex 2023-04-14 18:40:57 -07:00
  • 185dc24a19 add finding the OpenBLAS header file katsu560 2023-04-15 10:59:20 +09:00
  • c90261449f fix conflict wbpxre150 2023-04-15 09:19:46 +08:00
  • d071650c2c Merge branch 'master' into test wbpxre150 2023-04-15 08:50:11 +08:00
  • b148cd1eba Fix potential int8 overflow in non-SIMD vec_dot Stephan Walter 2023-04-14 22:31:38 +02:00
  • 0ef5704ea7 Fix result validation in benchmark-q4_0-matmult Ivan Komarov 2023-04-14 22:19:15 +02:00
  • c85e03d12e Revert "main : alternative instruct mode (Vicuna support, etc.) (#863)" (#982) master-c85e03d Pavol Rusnak 2023-04-14 21:58:43 +02:00
  • 489093548c py : bump sentencepiece to 0.1.98 to support Python 3.11 (#976) Pavol Rusnak 2023-04-14 21:46:49 +02:00
  • 93265e988a make : fix dependencies, use auto variables (#983) master-93265e9 Stephan Walter 2023-04-14 19:39:48 +00:00
  • dcf397c313 Revert "main : alternative instruct mode (Vicuna support, etc.) (#863)" Pavol Rusnak 2023-04-14 21:23:03 +02:00
  • 814d411ea1 Makefile: fix dependencies, use auto variables Stephan Walter 2023-04-14 21:20:04 +02:00
  • 4d0f761a4c nix: use convert.py instead of legacy wrapper convert-pth-to-ggml.py Pavol Rusnak 2023-04-14 21:17:35 +02:00
  • 218201acfd py : bump sentencepiece to 0.1.98 to support Python 3.11 Pavol Rusnak 2023-04-14 20:42:47 +02:00
  • 59fb9e9eb8 ggml : fix quantize_row_q8_0() ARM_NEON rounding Georgi Gerganov 2023-04-14 21:27:55 +03:00
  • c56b715269 Expose type name from ggml (#970) master-c56b715 Pavol Rusnak 2023-04-14 20:05:37 +02:00
  • 327940beae add command line mode wbpxre150 2023-04-15 02:05:04 +08:00
  • 801aab14aa Q8: use int8_t, AVX/AVX2 optimizations Stephan Walter 2023-04-13 18:55:55 +02:00
  • 949dec0e42 Merge branch 'master' into wbpxre150 wbpxre150 2023-04-15 01:20:25 +08:00
  • ea5d01002f Merge branch 'concedo' of https://github.com/LostRuins/llamacpp-for-kobold into concedo Concedo 2023-04-15 01:14:10 +08:00
  • 8dc06c7ab3 Fixed compile error in OSX Concedo 2023-04-15 01:13:56 +08:00
  • 624dc8809e Added openblas and clblas package names for debian (#63) AlpinDale 2023-04-14 21:38:56 +04:30
  • 58b91f1011 Expose type name from ggml Håkon H. Hitland 2023-04-14 00:33:30 +02:00
  • c3b810868d fixed an offset bug? Concedo 2023-04-15 00:30:00 +08:00
  • f4d277ae17 main : alternative instruct mode (Vicuna support, etc.) (#863) master-f4d277a Tomáš Pazdiora 2023-04-14 17:19:17 +02:00
  • 1b1c0730f5 Idk why people keep thinking its an error lol. Concedo 2023-04-14 22:58:45 +08:00
  • 1003c971ad update embedded kobold lite Concedo 2023-04-14 22:54:16 +08:00
  • c9a59b70a5 ggml : add unary and binary map operations (#874) master-c9a59b7 Kerfuffle 2023-04-14 08:43:55 -06:00
  • 932d981222 more make targets Concedo 2023-04-14 21:54:18 +08:00
  • a819f22cac Merge branch 'master' into concedo Concedo 2023-04-14 21:40:33 +08:00
  • a32f7acc9f py : cleanup dependencies (#962) Pavol Rusnak 2023-04-14 15:37:11 +02:00
  • 8ad42a1102 read from inputs Concedo 2023-04-14 21:30:26 +08:00
  • adb4df78d6 Added SmartContext mode, a way of prompt context manipulation that avoids frequent context recalculation. Concedo 2023-04-14 21:24:16 +08:00
  • 995fe0303e py : cleanup dependencies Pavol Rusnak 2023-04-14 10:35:00 +02:00
  • 43ffdefb74 py : fix flake8 and isort nitpicks (#960) Pavol Rusnak 2023-04-14 14:23:21 +02:00
  • 3bbcbe441f Merge remote-tracking branch 'origin/master' into cli-ui-update Tomáš Pazdiora 2023-04-14 14:00:12 +02:00
  • 1623a6e9b4 ggml : minor master-1623a6e Georgi Gerganov 2023-04-14 13:31:29 +03:00
  • c14e0d2f23 ggml : always allocate buffers with size multiple of GGML_MEM_ALIGN Georgi Gerganov 2023-04-14 13:31:15 +03:00
  • 7d03e6e417 Fix position of map ops cases in ggml_compute_forward KerfuffleV2 2023-04-14 04:01:50 -06:00
  • 7d695973a5 Various cleanups. KerfuffleV2 2023-04-14 03:55:46 -06:00
  • 1c73d4eec7 GGML map ops proof of concept. KerfuffleV2 2023-04-10 07:10:01 -06:00
  • dea05626de py : fix flake8 and isort nitpicks Pavol Rusnak 2023-04-14 10:28:04 +02:00
  • 723dac55fa py : new conversion script (#545) comex 2023-04-14 00:03:03 -07:00
  • 0f07cacb05 ggml : fix q4_1 dot product types master-0f07cac Georgi Gerganov 2023-04-14 09:45:42 +03:00
  • c5d70f5c9e ggml : optimize rope function to avoid call powf in the tight loop (#807) master-c5d70f5 Howard Su 2023-04-14 14:24:52 +08:00
  • acb404ce4b Merge pull request #3 from wbpxre150/master wbpxre150 2023-04-14 12:52:21 +08:00
  • 030306d36a Merge pull request #2 from wbpxre150/wbpxre150 wbpxre150 2023-04-14 12:50:29 +08:00
  • 18cf46c781 Merge pull request #1 from ggerganov/master wbpxre150 2023-04-14 12:48:36 +08:00
  • 10e73b08be fix conflict wbpxre150 2023-04-14 12:41:33 +08:00
  • e6468f95c1 whitespace wbpxre150 2023-04-14 12:38:56 +08:00
  • aa6bca453f Fix prints. wbpxre150 2023-04-14 12:36:20 +08:00
  • 241065eccd New conversion script (#545) comex 2023-04-13 21:06:30 -07:00
  • e524ce99fe add macos headers jon-chuang 2023-04-14 10:22:52 +08:00
  • be87b6ed20 perplexity : add support for batch size to --perplexity (#407) master-be87b6e Gary Linscott 2023-04-13 14:50:42 -07:00
  • 3f93a00d9d quantize-stats : fix test + add it to Makefile default Georgi Gerganov 2023-04-13 23:33:35 +03:00
  • a520b33b3a ggml : add Q8_0 quantization for intermediate results Georgi Gerganov 2023-04-13 23:03:27 +03:00
  • 3fa8837068 improve jon-chuang 2023-04-14 04:00:41 +08:00
  • 02b0fe86f2 improve jon-chuang 2023-04-14 03:55:33 +08:00
  • b17d54eda3 Merge branch 'master' of https://github.com/ggerganov/llama.cpp into jon/use-hardware-cores jon-chuang 2023-04-14 03:07:49 +08:00
  • e0325353be apply code review jon-chuang 2023-04-14 03:07:45 +08:00
  • 315e69fd7f fix code indentation. wbpxre150 2023-04-14 01:58:38 +08:00
  • fa651909bb refector code into function. wbpxre150 2023-04-14 01:24:32 +08:00
  • 0e07e6a839 common : remove unnecessary includes (#947) master-0e07e6a CRD716 2023-04-13 10:39:25 -05:00
  • a3a2a0eda8 ggml : add GGML_DEFAULT_N_THREADS master-a3a2a0e Georgi Gerganov 2023-04-13 18:36:40 +03:00
  • d990e3fffc ggml : speed-up ggml_vec_dot_q4_1() ARM_NEON + 32-bit ARM support (#900) master-d990e3f Georgi Gerganov 2023-04-13 18:32:36 +03:00
  • 99e7c9b9e6 ggml : try to use correct ifdef Georgi Gerganov 2023-04-13 18:31:15 +03:00
  • 63f7ecf47c ggml : fix comment Georgi Gerganov 2023-04-13 18:28:18 +03:00
  • 1b59a07380 ggml : implement vzip when missing Georgi Gerganov 2023-04-13 18:26:44 +03:00
  • 23fd782d35 Update batch size for efficiency Gary Linscott 2023-04-13 08:20:54 -07:00
  • be21d538e6 ggml : implement vminvq and vmaxvq when missing Georgi Gerganov 2023-04-13 18:19:20 +03:00
  • 14a0b207bc ggml : implement vaddvq when missing Georgi Gerganov 2023-04-13 18:16:35 +03:00
  • fbcecd59a9 Merge remote-tracking branch 'origin/master' into batch_perplexity Gary Linscott 2023-04-13 08:13:09 -07:00
  • decac7b124 remove unnecessary includes CRD716 2023-04-13 10:08:29 -05:00
  • 2ae3164d29 ggml : speed-up q4_1 ARM_NEON by ~5% Georgi Gerganov 2023-04-11 20:41:15 +03:00
  • 9190e8eac8 llama : merge llama_internal.h into llama.h master-9190e8e Georgi Gerganov 2023-04-13 18:04:45 +03:00
  • c85980acd0 gitignore : benchmark Georgi Gerganov 2023-04-13 18:01:22 +03:00
  • 6232f2d7fd ggml : optimize non-SIMD Q4_0 vector dot product (#703) master-6232f2d Stephan Walter 2023-04-13 14:59:50 +00:00
  • b7f38eec58 Optimize non-SIMD Q4_0 vector dot product Stephan Walter 2023-04-01 19:05:14 +02:00
  • 6c248707f5 ggml : introduce GGML_ALIGNED_MALLOC/GGML_ALIGNED_FREE macros (#884) master-6c24870 Pavol Rusnak 2023-04-13 16:08:32 +02:00
  • bac39666cd Introduce GGML_ALIGNED_MALLOC/GGML_ALIGNED_FREE macros Pavol Rusnak 2023-04-10 22:53:16 +02:00
  • 8cda5c981d fix whitespace (#944) master-8cda5c9 CRD716 2023-04-13 09:03:57 -05:00
  • ec29272175 readme : remove python 3.10 warning (#929) CRD716 2023-04-13 08:59:53 -05:00