Commit graph

  • a779a4bf9c server: tests - print only in case of DEBUG Pierrick HYMBERT 2024-02-24 11:13:43 +01:00
  • 60781f0a2b server: tests - add explanation about KV Cache. Pierrick HYMBERT 2024-02-24 11:13:31 +01:00
  • 482eb30f89 server: tests - README.md add build instruction and notice on @bug and @wrong_usage. Pierrick HYMBERT 2024-02-24 11:13:14 +01:00
  • 5957a2dbcb server: tests - allow print on debug Pierrick HYMBERT 2024-02-24 10:55:20 +01:00
  • decea31220 llama : add comment about rope values Georgi Gerganov 2024-02-24 11:42:55 +02:00
  • 0db4cb0f5f code : cont Georgi Gerganov 2024-02-24 11:39:06 +02:00
  • 31e1ec928f llama : update llama_rope_type Georgi Gerganov 2024-02-24 11:38:00 +02:00
  • 42ddf4846c llama : revert enum name changes from this PR Georgi Gerganov 2024-02-24 11:23:37 +02:00
  • cb246633ed code : cont Georgi Gerganov 2024-02-24 11:25:49 +02:00
  • 0aead81f4a coda : normalize enum names Georgi Gerganov 2024-02-24 11:07:57 +02:00
  • 5f5b1b57ca llama : reuse n_rot from the build context Georgi Gerganov 2024-02-24 10:44:59 +02:00
  • 2b9a9bff2b minor : fix MPI builds Georgi Gerganov 2024-02-24 10:41:21 +02:00
  • 89b2a43cac llama : cont k-shift refactoring + normalize type names Georgi Gerganov 2024-02-24 10:28:44 +02:00
  • dd392191ca llama : rename llama_kv_cache_seq_shift to llama_kv_cache_seq_add Georgi Gerganov 2024-02-24 09:42:21 +02:00
  • 124ca773c6 server: tests: removing debug print Pierrick HYMBERT 2024-02-24 08:23:19 +01:00
  • 3c960953fd fix UNAME_M lindeer 2024-02-24 09:30:21 +08:00
  • 2a0d74d52e fix build ngxson 2024-02-24 00:57:23 +01:00
  • aeed190d9f wip: add support for functionary ngxson 2024-02-24 00:52:20 +01:00
  • 1e7325b613 fix(ggml): typo Peron 2024-02-23 22:07:12 +00:00
  • 2d107babc4 server: tests: add a note regarding inference speed. Pierrick HYMBERT 2024-02-23 22:25:39 +01:00
  • c86d5f2b23 wip: model merge ngxson 2024-02-23 21:54:14 +01:00
  • fd43d66f46 server : add KV cache quantization options (#5684) b2251 AlpinDale 2024-02-23 19:31:54 +00:00
  • 54fbcd2ce6 convert : fix missing ftype for gemma (#5690) Jared Van Bortel 2024-02-23 13:39:14 -05:00
  • 1a999819a2 llama : refactor k-shift implementation Georgi Gerganov 2024-02-23 20:29:40 +02:00
  • 71831494b1 server: tests: fix concurrent OAI streaming request Pierrick HYMBERT 2024-02-23 19:28:06 +01:00
  • fc252ea6ff convert : fix missing ftype for gemma Jared Van Bortel 2024-02-23 13:14:15 -05:00
  • 77b8589dbb server: tests: linter Pierrick HYMBERT 2024-02-23 18:57:38 +01:00
  • 6c0e6f4f9c server: tests: adding concurrent embedding in issue #5655 allow to enable VERBOSE mode Pierrick HYMBERT 2024-02-23 18:41:11 +01:00
  • c315736270 make: use arch variable for cublas lindeer 2024-02-24 01:22:56 +08:00
  • 30f802d0d7 server: tests: check if the server has not crashed after a scenario Pierrick HYMBERT 2024-02-23 18:28:05 +01:00
  • 608f449880 swift : fix build gg/float-pos Georgi Gerganov 2024-02-23 19:02:09 +02:00
  • 2559f5ff0d build(nix): Introduce flake.formatter for nix fmt ditsuke 2024-02-23 22:13:48 +05:30
  • 2109743fe3 server: tests: print server logs only on github action Pierrick HYMBERT 2024-02-23 17:12:33 +01:00
  • 1c1fd40576 server: tests: allow to pass argument to the test file add wrong_usage.feature to demonstrate user issue which will not be fixed. Pierrick HYMBERT 2024-02-23 17:12:16 +01:00
  • 9e73cc17a7 server: add KV cache quantization AlpinDale 2024-02-23 16:02:45 +00:00
  • e1b8efb9ac Will this fix ROCm? Iwan Kawrakow 2024-02-23 16:24:52 +02:00
  • cbd950b220 iq3_s: make it work on metal for QK_K = 64 Iwan Kawrakow 2024-02-23 16:20:47 +02:00
  • fff1e8a54a batched.swift : fix build Georgi Gerganov 2024-02-23 16:15:37 +02:00
  • e6e61e3158 iq3_s: partial fix for QK_K = 64 Iwan Kawrakow 2024-02-23 16:04:28 +02:00
  • 4d27466ca5 server: tests: move all requests call to asyncio Pierrick HYMBERT 2024-02-23 14:44:12 +01:00
  • e10b83a217 server: test: ci rename job name to Server Pierrick HYMBERT 2024-02-23 13:54:19 +01:00
  • 777bdcf58f server: test: ci rename step name to Test, change matrix order for better clarity Pierrick HYMBERT 2024-02-23 13:44:37 +01:00
  • 2c8bf2407b server: test: ci give up with nvidia as it requires the nvidia docker runtime Pierrick HYMBERT 2024-02-23 13:32:39 +01:00
  • 8772658b11 ggml : add I32 <-> F32 conversion Georgi Gerganov 2024-02-23 14:14:49 +02:00
  • c75e0e106b server: test: ci switch to nvidia based docker image for cuda Pierrick HYMBERT 2024-02-23 13:18:37 +01:00
  • 0d380aefc3 server: test: ci debug CI LD path Pierrick HYMBERT 2024-02-23 13:04:43 +01:00
  • 1d47de3258 ROCm again Iwan Kawrakow 2024-02-23 14:03:52 +02:00
  • 0d6d185e0f Attempt to fix ROCm Iwan Kawrakow 2024-02-23 13:52:33 +02:00
  • 83c386f237 server: test: ci debug LD path Pierrick HYMBERT 2024-02-23 12:51:49 +01:00
  • 6dc3af5432 server: test: fix CUDA LD PATH Pierrick HYMBERT 2024-02-23 12:42:51 +01:00
  • 5b2ce45d57 server: test: display server logs in case of failure Pierrick HYMBERT 2024-02-23 12:32:30 +01:00
  • 54ea4d4d8c server: test: ax512 experimental Pierrick HYMBERT 2024-02-23 12:28:59 +01:00
  • 5a621e714d server: test: ci make arch not available pass the test Pierrick HYMBERT 2024-02-23 12:04:01 +01:00
  • b94809b63e server: test: ci cmake remove all warning as it is done by the classical build and does not matter for testing Pierrick HYMBERT 2024-02-23 11:53:52 +01:00
  • 4d3791a4cb server: test: ci matrix, experimental on matrix avx512 entry which fail test Pierrick HYMBERT 2024-02-23 11:43:06 +01:00
  • 13863ef956 server: test: ci matrix Pierrick HYMBERT 2024-02-23 11:36:21 +01:00
  • d159e29d4b server: test: ci fix openblas build Pierrick HYMBERT 2024-02-23 11:34:22 +01:00
  • fc775366f1 llama : switch to floating-point token positions Georgi Gerganov 2024-02-23 12:18:30 +02:00
  • 606738eeef server: test: ci fix clblast Pierrick HYMBERT 2024-02-23 11:32:25 +01:00
  • fa51baca9a server: test: ci fix matrix Pierrick HYMBERT 2024-02-23 11:30:24 +01:00
  • 2edd995f2a server: test: ci fix cublas build Pierrick HYMBERT 2024-02-23 11:27:19 +01:00
  • e4fb790077 server: test: ci fix cuda build Pierrick HYMBERT 2024-02-23 11:19:49 +01:00
  • 62ef858d00 Update README.md pudepiedj 2024-02-23 10:01:24 +00:00
  • 9aa1457066 Update README.md pudepiedj 2024-02-23 10:00:02 +00:00
  • fce2e00023 server: tests: ci : fix cuda install Pierrick HYMBERT 2024-02-23 10:58:59 +01:00
  • a9dd5f3769 Revised server hostname pudepiedj 2024-02-23 09:56:14 +00:00
  • 334902b13e server: tests: ci : fix step id duplicated Pierrick HYMBERT 2024-02-23 10:56:07 +01:00
  • 86896aadd0 server: tests: ci : continue on error Pierrick HYMBERT 2024-02-23 10:53:46 +01:00
  • 68cd1a4c16 server: tests: ci : matrix cuda Pierrick HYMBERT 2024-02-23 10:46:17 +01:00
  • 12bb797193 server: tests: ci : add git Pierrick HYMBERT 2024-02-23 10:41:41 +01:00
  • 29f8833058 server: tests: ci : fix wget missing Pierrick HYMBERT 2024-02-23 10:39:45 +01:00
  • 0b0f0565dd server: tests: ci : build and run tests for all matrix defines, sanitizer and type Pierrick HYMBERT 2024-02-23 10:33:21 +01:00
  • 36ddb962d8 server: tests: parallel fix server is started twice, add colors to help to monitor in the CI jobs Pierrick HYMBERT 2024-02-23 10:09:19 +01:00
  • 303f3f3258 Another attempt to fix the Windows builds Iwan Kawrakow 2024-02-23 10:38:02 +02:00
  • 436a146f98 Attempt to fix failing tests Iwan Kawrakow 2024-02-23 10:16:15 +02:00
  • 011ea9852a Minor updates pudepiedj 2024-02-23 07:48:00 +00:00
  • 593627a8b1 lookahead: set parameter W,N,G from environment variable pomoke 2024-02-23 13:38:33 +08:00
  • cd6a0f08be Move Q3_K_XS mix to 3.25 bpw Iwan Kawrakow 2024-02-23 07:41:43 +02:00
  • 47cf30b0ee iq3_s: make tests pass Iwan Kawrakow 2024-02-22 18:14:41 +02:00
  • 2730225c5f iq3_xs: rename to iq3_s Iwan Kawrakow 2024-02-22 18:01:34 +02:00
  • 272c7f7739 Q3_K_XS now uses a mix of IQ3_XS and IQ3_XXS Iwan Kawrakow 2024-02-22 17:45:25 +02:00
  • b25f99607d Fix stupid warning Iwan Kawrakow 2024-02-22 16:55:13 +02:00
  • 4d5feebeb6 iq3_xs: tiny Metal speed improvement Iwan Kawrakow 2024-02-22 16:51:00 +02:00
  • 87038fe198 iq3_xs: tiny Metal speed improvement Iwan Kawrakow 2024-02-22 16:30:09 +02:00
  • 1777825550 iq3_xs: make new version work on metal Iwan Kawrakow 2024-02-22 15:57:34 +02:00
  • 1328331db7 iq3_s: make ARM_NEON work with new version Iwan Kawrakow 2024-02-22 15:26:45 +02:00
  • 1fef4b8b68 iq3_xs: make scalar and AVX2 work for new version Iwan Kawrakow 2024-02-22 11:58:42 +02:00
  • eacff4aa81 iq3_xs: make CUDA work for new version Iwan Kawrakow 2024-02-22 11:09:10 +02:00
  • d83fddaa3b iiq3_xs: a 3.4375 bpw variant Iwan Kawrakow 2024-02-22 09:57:24 +02:00
  • 2ec600b7a4 Adding IQ3_M - IQ3_XS mix with mostly Q4_K Iwan Kawrakow 2024-02-21 16:24:47 +02:00
  • 38aa7b176f iq3_xs: working Metal implementation Iwan Kawrakow 2024-02-20 19:49:59 +02:00
  • 76214ab655 iq3_xs: ARM_NEON dot product - works but extremely slow (10 t/s) Iwan Kawrakow 2024-02-20 18:24:01 +02:00
  • f1255c50c0 iq3_xs: working scalar and AVX2 dot products Iwan Kawrakow 2024-02-20 17:11:31 +02:00
  • 5be4e7ac4a Minor improvement via 3 neighbours Iwan Kawrakow 2024-02-20 14:54:11 +02:00
  • 76aff093b4 Minor PPL improvement via a block scale fudge factor Iwan Kawrakow 2024-02-20 13:10:46 +02:00
  • 5691fecd06 Resurrecting iq3_xs Iwan Kawrakow 2024-02-20 12:26:17 +02:00
  • 10a47fa678 iq4_nl: squash commits for easier rebase Iwan Kawrakow 2024-02-19 10:45:31 +02:00
  • 3f96f1c079 P-Step truncation sampling Philipp Emanuel Weidmann 2024-02-23 10:46:06 +05:30
  • e31fb67267 Update build.zig Hasan Mukhlis 2024-02-23 12:55:27 +08:00
  • 530d3ae4c4 server: tests: reducing sleep time during scenario Pierrick HYMBERT 2024-02-23 02:38:54 +01:00