Commit graph

  • eb363627fd cuda : deduplicated dequantization code (#1453) Johannes Gäßler 2023-05-14 20:53:23 +02:00
  • 7cec8c8b15 remove trailing whitespace FSSRepo 2023-05-14 11:31:23 -06:00
  • bec44ebdbc removed some whitespaces FSSRepo 2023-05-14 11:02:24 -06:00
  • 9939b87cbb Fix Q8_0 Henri Vasserman 2023-05-14 19:12:09 +03:00
  • 4339f8cf28 improve softmax backward pass xaedes 2023-05-14 17:55:02 +02:00
  • 79b2d5b69d ggml : alternative fix for race condition bug in non-inplace ggml_compute_forward_diag_mask_f32 (#1454) master-79b2d5b xaedes 2023-05-14 17:55:02 +02:00
  • d03a85a747 Update ggml.c Georgi Gerganov 2023-05-14 18:51:26 +03:00
  • 279c372ea0 Merge branch 'master' into fix-race-condition-diag-mask Georgi Gerganov 2023-05-14 18:48:11 +03:00
  • b81a09a152 remove trailing whitespace xaedes 2023-05-14 17:38:16 +02:00
  • 13c351ad72 ggml : various fixes (#1450) master-13c351a Georgi Gerganov 2023-05-14 18:22:50 +03:00
  • 0bb1ff4402 changed json11 to nlohmann-json FSSRepo 2023-05-14 09:17:07 -06:00
  • ec1aea09ec implement ggml_soft_max_back for more performant backward pass of soft_max xaedes 2023-05-14 17:16:26 +02:00
  • dc48799b85 Merge remote-tracking branch 'upstream/master' FSSRepo 2023-05-14 09:12:48 -06:00
  • 1e6b5bf111 fix race condition bug in non-inplace ggml_compute_forward_diag_mask_f32 xaedes 2023-05-14 17:00:19 +02:00
  • c77966524a Refactor OpenCL code to work more like the CUDA code, add missing functions 0cc4m 2023-05-14 17:01:46 +02:00
  • 82bc517b9a Move back to C++ for OpenCL 0cc4m 2023-05-14 17:00:37 +02:00
  • f89c278d83 fix race condition bug in ggml_compute_forward_diag_mask_f32 xaedes 2023-05-14 17:00:19 +02:00
  • 0e8ca777e5 Deduplicated dequantization code JohannesGaessler 2023-05-14 15:36:53 +02:00
  • 6e968d22b0 add text generating baby-llama from scratch example xaedes 2023-05-14 16:07:08 +02:00
  • 9c7dea12ec ggml : various fixes Georgi Gerganov 2023-05-14 15:21:50 +03:00
  • c81dd58e76 Merge commit 'f954edda93' into archive_lib Concedo 2023-05-14 18:34:56 +08:00
  • 394dabbc1a remove qk as well Henri Vasserman 2023-05-14 13:22:39 +03:00
  • 9074e353dd minor nitpicks Henri Vasserman 2023-05-14 13:15:09 +03:00
  • 60f8c361ca ggml : add AVX support based on AVX2 code (#1430) katsu560 2023-05-14 19:03:51 +09:00
  • 0453ce3f8b Remove all constants Henri Vasserman 2023-05-14 12:47:41 +03:00
  • b692e4d2a4 wip Concedo 2023-05-14 17:21:07 +08:00
  • 601a033475 ggml : add GGML_QNT_VERSION to track quantization format changes master-601a033 Georgi Gerganov 2023-05-14 10:20:19 +03:00
  • e01e373e63 Merge branch 'master' into concedo_experimental Concedo 2023-05-14 11:34:41 +08:00
  • 9cd5b9a769 up ver Concedo 2023-05-14 11:10:26 +08:00
  • 8a5fe628df recognize q8_0 as an older format as the new clblast doesnt work correctly with it Concedo 2023-05-14 11:06:23 +08:00
  • 45ef62680e Improve support and API FSSRepo 2023-05-13 20:42:20 -06:00
  • 262a757989 ggml : delete SIMD optimizations for the quantization of the Q5 format katsu560 2023-05-14 07:02:59 +09:00
  • 81b65da7aa ggml : merge AVX2/AVX code in ggml_vec_dot_q4_1_q8_1, ggml_vec_dot_q8_0_q8_0 katsu560 2023-05-14 06:23:56 +09:00
  • 61a3046630 ggml : add AVX support to quantize_row_q5_0, quantize_row_q5_1 katsu560 2023-05-14 04:59:01 +09:00
  • b8fb5cdf5c rewrite platform and device selection Henri Vasserman 2023-05-13 22:04:46 +03:00
  • 4561838719 fix get_num_physical_cores() had been broken on complex topologies because "cpu cores" in /proc/cpuinfo is per-"physical id" zrm 2023-05-13 15:04:55 -04:00
  • bb5c3e2c70 remove constants Henri Vasserman 2023-05-13 22:04:17 +03:00
  • 6e88dc93bd update python bindings xaedes 2023-05-13 19:05:24 +02:00
  • 49d6334dc1 try fix kernel Concedo 2023-05-14 00:41:26 +08:00
  • e05455f852 fixed wrong sized struct from legacy q8_1, fixed opencl varsize arrays Concedo 2023-05-13 23:56:08 +08:00
  • 08737ef720 cuda : fix convert function (#1412) Georgi Gerganov 2023-05-13 17:40:58 +03:00
  • ed6b64fb98 add python bindings for functions to get and set the whole llama state (rng, logits, embedding and kv_cache) xaedes 2023-04-14 03:16:50 +02:00
  • 5f6b715071 fix decoding error. adds errors=ignore parameter xaedes 2023-04-14 14:40:06 +02:00
  • bc9e84daca add python wrapper xaedes 2023-04-21 16:31:12 +02:00
  • bda4d7c215 make : fix PERF build with cuBLAS master-bda4d7c Georgi Gerganov 2023-05-13 17:25:09 +03:00
  • 5a5aeb1e91 llama : fix unused warning master-5a5aeb1 Georgi Gerganov 2023-05-13 16:55:14 +03:00
  • 66841fdb0e ggml : multi-thread mul and diag_mask ops (#1428) master-66841fd Georgi Gerganov 2023-05-13 16:48:03 +03:00
  • b849461e62 ggml : fix clang-tidy warning Georgi Gerganov 2023-05-13 16:47:39 +03:00
  • 6d7c47b8de ggml : multi-thread mul and diag_mask ops Georgi Gerganov 2023-05-13 11:29:32 +03:00
  • 905d87b70a ggml : GPU-accelerated token generation (#1412) master-905d87b Johannes Gäßler 2023-05-13 15:38:36 +02:00
  • ad8a9e6971 llama : offload "output" tensor to GPU too + coding style fixes Georgi Gerganov 2023-05-13 16:35:21 +03:00
  • 7b6f3f3970 ggml : add AVX support based on AVX2 code katsu560 2023-05-13 22:26:58 +09:00
  • f954edda93 ggml : implement backward pass for llama + small training-llama-from-scratch example (#1360) master-f954edd xaedes 2023-05-13 14:56:40 +02:00
  • dae6ba2abe baby-llama : couple of clang-tidy warnings Georgi Gerganov 2023-05-13 15:38:50 +03:00
  • ef3d42a3aa ggml : fix clang-tidy warnings Georgi Gerganov 2023-05-13 15:34:56 +03:00
  • 95a487a17e ggml : remove Q4_2 remnants Georgi Gerganov 2023-05-13 15:22:24 +03:00
  • 092913ecea Merge remote-tracking branch 'origin/master' into HEAD Georgi Gerganov 2023-05-13 15:20:22 +03:00
  • 2956630a3d Merge 'origin/master' into hipblas Henri Vasserman 2023-05-13 13:12:52 +03:00
  • f048af0230 ggml : sync alibi fix from ggml repo master-f048af0 Georgi Gerganov 2023-05-13 11:54:33 +03:00
  • ac0cd259d5 Adding SSE instructions to ggml_vec_dot_q4_0_q8_0 (#1413) master-ac0cd25 3ooabkhxtn 2023-05-13 10:43:33 +02:00
  • 0cd22e190a llama : fix various warnings master-0cd22e1 Georgi Gerganov 2023-05-13 11:23:15 +03:00
  • c9eb2ba1c5 Merge branch 'master' into concedo_experimental Concedo 2023-05-13 15:51:05 +08:00
  • b6594ab91e do not show tokenizer warning Concedo 2023-05-13 15:48:17 +08:00
  • 6456a4eb9f embedding : remove unused code (#1426) master-6456a4e Rinne 2023-05-13 15:24:20 +08:00
  • 0fa4624cb7 Remove extra code of embedding example. Yaohui Liu 2023-05-13 15:19:45 +08:00
  • 33034cfede ggml : fix null ptr deref in backward pass Georgi Gerganov 2023-05-13 10:08:01 +03:00
  • bb0993ed48 dequantize_mul_mat_vec kernels for q5_1, q8_0, f16 JohannesGaessler 2023-05-13 08:10:38 +02:00
  • f977243ded minor : fix compiler warnings + indentation style Georgi Gerganov 2023-05-13 09:55:17 +03:00
  • cdd5350892 readme : update Q4_0 perplexities Georgi Gerganov 2023-05-13 09:12:44 +03:00
  • 738ace394a llama : free ggml context in set / copy state data (close #1425) master-738ace3 Georgi Gerganov 2023-05-13 09:08:52 +03:00
  • 699b1ad7fe opencl : fix kernels for the new formats (#1422) master-699b1ad Henri Vasserman 2023-05-13 09:01:15 +03:00
  • 5a0ecf768d More readable dequantize_mul_mat_vec logic JohannesGaessler 2023-05-13 07:14:27 +02:00
  • 9da44fdcb3 q5_0 dequantize_mul_mat kernel JohannesGaessler 2023-05-12 23:57:10 +02:00
  • 0986c2f44e Shorter dequantize_mul_mat_vec line JohannesGaessler 2023-05-12 23:30:17 +02:00
  • cee8042793 integrated new version of clblast kernels as a separate file Concedo 2023-05-13 12:53:28 +08:00
  • 017023e477 updated kobold lite Concedo 2023-05-13 12:12:20 +08:00
  • 53e7256a25 should be good to merge, only thing missing is clblast new quants Concedo 2023-05-13 12:07:29 +08:00
  • 098277cf5e ADD Chatbot UI example Brendan Hubble 2023-05-13 11:16:29 +10:00
  • 05cf5f7d6e partially working, but the blas matmul is broken Concedo 2023-05-13 11:35:38 +08:00
  • cc798cc08c Fix Q5_0 alignment issues. Henri Vasserman 2023-05-13 03:37:28 +03:00
  • 3243b9943a Fix OpenCL kernels for the new formats Henri Vasserman 2023-05-13 01:02:36 +03:00
  • 1a8f93442f Add files via upload morpheus2448 2023-05-12 22:28:31 +01:00
  • 25b448a32f - rearranged defines, SSSE3 function only compiled if used 3ooabkhxtn 2023-05-12 20:48:41 +00:00
  • f0af475739 --gpu_layers -> --gpu-layers JohannesGaessler 2023-05-12 21:43:47 +02:00
  • 7dc2f57e5e Added missing __syncthreads(); JohannesGaessler 2023-05-12 21:37:34 +02:00
  • 12fc292ee6 Added q4_1 via template JohannesGaessler 2023-05-12 12:42:09 +02:00
  • 637be12f16 CUDA kernel for q4_0 dequant. + mat. vec. mult. JohannesGaessler 2023-05-08 22:21:03 +02:00
  • fb62f92433 llama : fix --mtest option (close #1414) master-fb62f92 Georgi Gerganov 2023-05-12 21:44:20 +03:00
  • b335f73a60 BACKWARDS COMPAT QUANT SHIM is ready, but upstream model converter is BORKED. BORK BORK. Concedo 2023-05-13 01:30:11 +08:00
  • 08810d5fee interim merge. do not use Concedo 2023-05-13 00:33:55 +08:00
  • e9caff1cda Interim merge. Do not use. Concedo 2023-05-12 23:20:27 +08:00
  • 773ee249fb CLI args use - instead of _, backwards compatible (#1416) master-773ee24 Johannes Gäßler 2023-05-12 16:34:55 +02:00
  • 0fe6384755 fix makefile Henri Vasserman 2023-05-12 17:22:11 +03:00
  • a3e6d62283 cuda : alternative q4_q8 kernel dequantize-matmul-3-gg Georgi Gerganov 2023-05-12 15:54:07 +03:00
  • fc26f54e74 - Put the whole line into defined() - Use __SSSE3__ instead of __SSE__ 3ooabkhxtn 2023-05-12 13:59:20 +00:00
  • e3c7dcf5c1 CLI args use - instead of _, backwards compatible JohannesGaessler 2023-05-12 15:23:49 +02:00
  • 553fd4d4b5 Add clang-tidy reviews to CI (#1407) master-553fd4d slaren 2023-05-12 15:40:53 +02:00
  • 70c2b6c696 Put __SSE3__ into defined() 3ooabkhxtn 2023-05-12 13:32:00 +00:00
  • 605560d9ec Merge 'origin/master' into hipblas Henri Vasserman 2023-05-12 16:12:53 +03:00
  • ca54314a2f - Improved prefetching 3ooabkhxtn 2023-05-12 10:17:13 +00:00