Commit graph

  • 8699fd0d43 Cleanup 3ooabkhxtn 2023-05-12 09:25:00 +00:00
  • 7379dd2dba Added prefetch 3ooabkhxtn 2023-05-12 09:20:48 +00:00
  • 78bbb3cdfe Use 4 accumulations instead of 2 - Removed first accumulation 3ooabkhxtn 2023-05-12 09:15:46 +00:00
  • 607b9c7373 Split multiplication and addition to make it easier for the compiler to optimise - Accumulate two acc instead of one 3ooabkhxtn 2023-05-12 08:04:54 +00:00
  • 524d6c9447 added sse instructions for ggml_vec_dot_q4_0_q8_0 3ooabkhxtn 2023-05-12 07:54:33 +00:00
  • e7b9d97bae More int mult, less float mult, worse performance JohannesGaessler 2023-05-12 09:11:47 +02:00
  • 089b1c93ba readme : add C#/.NET bindings repo (#1409) Rinne 2023-05-12 13:39:40 +08:00
  • e052d53e51 Update gpt_params_parse and fix a merge error take 2 Jason McCartney 2023-05-11 21:17:04 -07:00
  • 121c986d02 Revert "Update gpt_params_parse and fix a merge error" Jason McCartney 2023-05-11 21:11:55 -07:00
  • 2bb2ff1748 Update gpt_params_parse and fix a merge error Jason McCartney 2023-05-11 21:00:11 -07:00
  • ddc64202f6 Add the dotnet binding info. Yaohui Liu 2023-05-12 11:44:30 +08:00
  • d882d1c2fe Performance no longer terrible JohannesGaessler 2023-05-11 23:27:06 +02:00
  • b9fd7eee57 ggml : remove bit shuffling (#1405) master-b9fd7ee Georgi Gerganov 2023-05-12 00:23:08 +03:00
  • cbb6a3a7e8 llama : fix return for unknown version Georgi Gerganov 2023-05-12 00:08:36 +03:00
  • b58b1f4bf6 readme : add note that Q4 and Q5 have been changed Georgi Gerganov 2023-05-12 00:00:40 +03:00
  • 4b12881329 WAKE ME UP JohannesGaessler 2023-05-11 22:47:38 +02:00
  • ca7f069f39 ggml : back to original bit order Georgi Gerganov 2023-05-11 23:33:07 +03:00
  • f92faf50ce Add clang-tidy reviews to CI slaren 2023-05-10 23:27:59 +02:00
  • 832c53f427 ggml : fix WASM comments Georgi Gerganov 2023-05-11 21:59:25 +03:00
  • 1c87847b6b llama : update v2 PR number to 1405 Georgi Gerganov 2023-05-11 21:48:56 +03:00
  • 927afddf95 Merge branch 'master' into add_stop_token Jason McCartney 2023-05-11 11:40:17 -07:00
  • 51c25fd995 readme : update timings + remove warning banner Georgi Gerganov 2023-05-11 21:38:47 +03:00
  • e038e01e28 sha : update hashes for 7B and 13B Georgi Gerganov 2023-05-11 21:33:29 +03:00
  • 5bc286ab18 ggml : fix AVX2 implementation Georgi Gerganov 2023-05-11 21:22:27 +03:00
  • bd5e373058 Revert "AVX implementations (#1370)" Georgi Gerganov 2023-05-11 20:57:28 +03:00
  • 6680244838 ggml : fix Q8_0 and Q8_1 rounding Georgi Gerganov 2023-05-11 20:47:41 +03:00
  • 582a39fff5 ggml : simplify Q8_1 - no need for low / high sums anymore Georgi Gerganov 2023-05-11 20:11:37 +03:00
  • 695f3963b1 ggml : preserve old Q4 and Q5 formats Georgi Gerganov 2023-05-11 19:46:11 +03:00
  • b7ad385d42 ggml : speed-up Q5_0 + Q5_1 at 4 threads Georgi Gerganov 2023-05-10 22:58:45 +03:00
  • 09032e0290 llama : fix model magic/version write Georgi Gerganov 2023-05-09 18:25:28 +03:00
  • d52172a509 llama : produce error upon loading old model files Georgi Gerganov 2023-05-09 18:19:13 +03:00
  • 489bd13fad ggml : uniform 5th bit extraction Georgi Gerganov 2023-05-08 22:18:15 +03:00
  • 9e49d20150 AVX implementations (#1370) Stephan Walter 2023-05-08 19:14:06 +00:00
  • 928d2f335f scripts : add script for measuring the time per token Georgi Gerganov 2023-05-08 22:06:54 +03:00
  • 83674556b8 ggml : fix Q5_0 quantization Georgi Gerganov 2023-05-07 20:26:02 +03:00
  • b08c39b16c ggml : minor formatting Georgi Gerganov 2023-05-07 20:00:01 +03:00
  • 4bf1c8a43e ggml : remove Q4_2 mode Georgi Gerganov 2023-05-07 18:26:59 +03:00
  • cdc9607329 ggml : update cuBLAS + normalize variable names Georgi Gerganov 2023-05-07 18:23:59 +03:00
  • 9472d0ea8b ggml : fix Q4_1 quantization Georgi Gerganov 2023-05-07 18:07:11 +03:00
  • 0add6402bd ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit Georgi Gerganov 2023-05-05 17:23:41 +03:00
  • caaacd5765 ggml : simplify scalar dot Georgi Gerganov 2023-05-05 17:12:58 +03:00
  • 292a778ca2 ggml : remove Q5_1 bit shuffling (ARM NEON + scalar) Georgi Gerganov 2023-05-05 17:09:11 +03:00
  • b37a08f646 ggml : 2x faster scalar implementations Georgi Gerganov 2023-05-04 23:31:35 +03:00
  • aa78dfed7d ggml : remove Q5_0 bit shuffling (ARM NEON) Georgi Gerganov 2023-05-04 22:55:10 +03:00
  • 9f3285f741 ggml : remove Q4_2 bit shuffling (WIP, BROKEN) Georgi Gerganov 2023-05-04 22:07:40 +03:00
  • fd2a137fac ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON) Georgi Gerganov 2023-05-04 21:51:42 +03:00
  • 844d2af89d ggml : remove Q4_1 bit shuffling (ARM NEON + reference) Georgi Gerganov 2023-05-04 20:53:14 +03:00
  • 5fa47bf6c7 ggml : remove Q4_0 bit shuffling (ARM NEON) Georgi Gerganov 2023-05-03 23:13:37 +03:00
  • c46320ddf7 Eliminate shared memory, faster summation JohannesGaessler 2023-05-11 20:20:41 +02:00
  • f7229f23d7 Making requested review changes Jason McCartney 2023-05-11 11:14:08 -07:00
  • b9ef08ccab remove trailing whitespace xaedes 2023-05-11 20:03:18 +02:00
  • 581e5eb954 cleanup code for batched training xaedes 2023-05-11 19:49:41 +02:00
  • 3e3ed9560c add parallel batched forward function for baby-llama training xaedes 2023-05-11 19:31:46 +02:00
  • 099a07fb87 Merge branch 'master' into add_stop_token Jason McCartney 2023-05-11 10:26:00 -07:00
  • 127f68eb5a Merge 'origin/master' into hipblas Henri Vasserman 2023-05-11 20:21:27 +03:00
  • e116eb638c ggml : speed-up Q5_0 + Q5_1 at 4 threads remove-vzip Georgi Gerganov 2023-05-10 22:58:45 +03:00
  • b608b55a3e prompts : model agnostic DAN (#1304) CRD716 2023-05-11 10:10:19 -05:00
  • 298d6a38fd Update README.md svupper 2023-05-11 11:03:51 +02:00
  • 8a9d7ce624 fixup! Store layers in VRAM JohannesGaessler 2023-05-11 07:05:52 +02:00
  • 2db16be308 Merge branch 'master' into stop-keywords Evan Jones 2023-05-10 22:52:34 -04:00
  • 2a8935297b Remove commented-out regex code Branden Butler 2023-05-10 17:49:52 -05:00
  • 827ac3a457 Add check for whether regex should be used Branden Butler 2023-05-10 17:48:37 -05:00
  • 58d848dadd Remove unneeded llama_get_num_logits() function Branden Butler 2023-05-10 17:47:22 -05:00
  • e0acd1a7bf Fix partial_completion not being reset Branden Butler 2023-05-10 14:54:32 -05:00
  • 30754bbaf9 Add allowed response regex, response bias regex, and response bias value to main example Branden Butler 2023-05-10 14:39:40 -05:00
  • 779d7969c0 Add llama_get_num_logits() function to Llama.cpp API Branden Butler 2023-05-10 14:32:05 -05:00
  • cf348a60e0 main : add option to save full output to session (#1338) master-cf348a6 Evan Jones 2023-05-10 11:37:14 -04:00
  • ac5584b3a8 Fix whitespace Evan Jones 2023-05-10 11:32:44 -04:00
  • 95218177b0 Merge branch 'ggerganov:master' into master CRD716 2023-05-10 09:19:12 -05:00
  • 19dbb3b2a5 Merge branch 'master' into concedo_experimental Concedo 2023-05-10 18:35:53 +08:00
  • dbd6c204b4 AVX2 implementation of Q8 quantization Stephan Walter 2023-05-10 11:28:36 +02:00
  • 7642b434cd fix for zig 0.11 version Sigis Dagilis 2023-05-10 08:59:16 +02:00
  • 2041e1e0b5 simplify code Evan Jones 2023-05-09 23:16:40 -04:00
  • 9d641b0d36 Fix the build error on Macbook FSSRepo 2023-05-09 21:03:14 -06:00
  • b4d04d1613 fix endline Claude Doppler 2023-04-08 10:05:34 +00:00
  • 72f102a4ae feat: add "stop" keywords as alternative to eot token Claude Doppler 2023-04-04 20:33:09 +00:00
  • 8826fb8e2f move the check for incompatible parameters to gpt_params_parse Evan Jones 2023-05-09 22:28:40 -04:00
  • 8c88b172c5 PR comments Evan Jones 2023-05-08 23:10:51 -04:00
  • e4429e912b restore original implementation with new names Evan Jones 2023-05-07 22:46:06 -04:00
  • 56758f033c split behavior into --session and --prompt-cache Evan Jones 2023-05-06 20:27:15 -04:00
  • 4c76d52bb8 main : add option to save full output to session Evan Jones 2023-05-05 23:01:12 -04:00
  • 7beb59a80b Remove Q4/Q5 bit shuffling without breaking compatibility Stephan Walter 2023-05-09 19:47:29 +02:00
  • e6a46b0ed1 Locale fix for Windows (#1379) master-e6a46b0 DannyDaemonic 2023-05-09 10:53:28 -07:00
  • a62e2b8913 Locale fix for Windows Danny Daemonic 2023-05-09 09:54:17 -07:00
  • ffd76e18d6 llama : fix model magic/version write Georgi Gerganov 2023-05-09 18:25:28 +03:00
  • 4201fa5cb8 llama : produce error upon loading old model files Georgi Gerganov 2023-05-09 18:19:13 +03:00
  • 9266be259b ggml : add AVX support based on AVX2 code katsu560 2023-05-09 21:41:30 +09:00
  • 9f8dbc4787 use pause asm insn in busyloop to run the CPU (13600K) 10 °C cooler (#1314) master-9f8dbc4 Sami Farin 2023-05-09 15:29:20 +03:00
  • e47f7ade05 updated kobold lite, patch oom errors Concedo 2023-05-09 19:16:45 +08:00
  • 7cea31c568 fix: typo Carsten Seeger 2023-05-09 11:58:00 +02:00
  • 6d87f67572 up ver Concedo 2023-05-09 17:25:46 +08:00
  • 3ed4588e22 Store layers in VRAM JohannesGaessler 2023-05-09 11:05:58 +02:00
  • 54194911ac Merge branch 'master' into concedo_experimental Concedo 2023-05-09 16:50:43 +08:00
  • e4c6a1e3ed update readme Concedo 2023-05-09 16:17:52 +08:00
  • 28086b52de fix: missing CLBLAS documentation Carsten Seeger 2023-05-09 10:14:22 +02:00
  • d052a0ed4c Faster than CPU without 80% runtime memcpy JohannesGaessler 2023-05-08 23:50:00 +02:00
  • 229aa1f504 Works but slower than CPU JohannesGaessler 2023-05-08 22:21:03 +02:00
  • 41654efea8 Interface improvements and --multiline-input (previously --author-mode) (#1040) master-41654ef DannyDaemonic 2023-05-08 19:45:48 -07:00
  • f7caabf2d4 Merge 4aa91a230a into 56551bc11f DannyDaemonic 2023-05-08 20:26:22 -04:00
  • 5423eb6642 use _mm_pause() in busyloop on x86_64 to reduce power consumption Sami Farin 2023-05-08 23:02:21 +03:00
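Commits 5423eb6642 and 9f8dbc4787 refer to inserting the x86 `pause` instruction into a spin-wait loop via the `_mm_pause()` intrinsic. A minimal sketch of the idea follows; the flag name and loop here are illustrative, not llama.cpp's actual thread-synchronization code:

```c
#include <immintrin.h>   // _mm_pause
#include <stdatomic.h>

// Hypothetical ready flag, set by another thread. Illustration only;
// the real change is inside ggml's worker-thread busyloop.
static atomic_int ready;

void spin_until_ready(void) {
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        // Hints to the CPU that this is a spin-wait, lowering power draw
        // and easing contention with the sibling hyper-thread.
        _mm_pause();
    }
}
```

The commit message attributes a roughly 10 °C temperature drop on a 13600K to this hint, since the core no longer speculates aggressively while waiting.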
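Commits 607b9c7373 and 78bbb3cdfe describe splitting a dot-product kernel across multiple independent accumulators so the compiler and CPU can overlap multiply-add latency. A sketch of that idea on plain floats (the real kernel, ggml_vec_dot_q4_0_q8_0, works on quantized blocks; this function is a hypothetical scalar analogue):

```c
#include <stddef.h>

// Four independent accumulator chains: each iteration's adds do not
// depend on the previous iteration's other chains, so the FMA units
// can be kept busy instead of serializing on one accumulator.
float dot_4acc(const float *x, const float *y, size_t n) {
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += x[i + 0] * y[i + 0];
        acc1 += x[i + 1] * y[i + 1];
        acc2 += x[i + 2] * y[i + 2];
        acc3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; ++i) acc0 += x[i] * y[i];  // scalar tail
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Note that the summation order differs from a single-accumulator loop, so floating-point results can differ in the last bits; the commits trade that for throughput.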