Commit graph

  • e190f1fca6 nix: make xcrun visible in Nix sandbox for precompiling Metal shaders (#6118) b2534 Joseph Stahl 2024-03-25 20:51:46 -04:00
  • 2cc1b8cd31 Update llava-cli.cpp cpumaxx 2024-03-25 17:47:53 -07:00
  • d0541094a1 Update examples/embedding/embedding.cpp Minsoo Cheong 2024-03-26 09:25:16 +09:00
  • 280345968d cuda : rename build flag to LLAMA_CUDA (#6299) slaren 2024-03-26 01:16:01 +01:00
  • 9c27b0e6ea Update quantize.cpp - mix label Nexesenex 2024-03-26 01:12:35 +01:00
  • 93434fdc7e ci: bench: add mermaid in case of image cannot be uploaded Pierrick HYMBERT 2024-03-26 01:08:59 +01:00
  • 784fa90cbe Add support to parse Openai function call input and results format to mistral_rubra format. TODO: need to clean up prints after testing. Yingbei 2024-03-25 17:00:26 -07:00
  • 41d7c5eaca Update llava-cli.cpp to support comma-delimited image lists cpumaxx 2024-03-25 16:44:38 -07:00
  • 3031c01db0 Update llama.cpp - correction wrong case declaration Nexesenex 2024-03-26 00:06:41 +01:00
  • 066efbb18f Update llama.cpp - adjustements non-FFN layer tensors Nexesenex 2024-03-25 23:08:19 +01:00
  • 5c0b2a2b59 server: bench: fix graph, fix output artifact Pierrick HYMBERT 2024-03-25 21:44:45 +01:00
  • 0a0ef09aca zig: add unicodedata.cpp Jared Van Bortel 2024-03-25 16:32:34 -04:00
  • bb27cd95d8 swift : add unicodedata.cpp Jared Van Bortel 2024-03-25 16:32:19 -04:00
  • 89e60cbfa3 make : fix unicodedata.o build Jared Van Bortel 2024-03-25 16:30:55 -04:00
  • 4af8617e81 add precompileMetalShaders flag (defaults to false) to disable precompilation of metal shader Joseph Stahl 2024-03-23 09:07:05 -04:00
  • 0af5c68a0c cmake - copy default.metallib to install directory Joseph Stahl 2024-03-21 21:08:28 -04:00
  • 973057a879 Symlink to /usr/bin/xcrun so that xcrun binary is usable during build (used for compiling Metal shaders) Joseph Stahl 2024-03-17 15:19:10 -04:00
  • 799317b27d server: bench: reduce list of GPU nodes Pierrick HYMBERT 2024-03-25 21:15:09 +01:00
  • b460e7f5b4 wpm : portable unicode tolower Jared Van Bortel 2024-03-25 16:01:58 -04:00
  • e5ddf2fcdd llama : split unicodedata.cpp from unicode.cpp Jared Van Bortel 2024-03-25 16:00:03 -04:00
  • 4146960d52 server: bench: init Pierrick HYMBERT 2024-03-24 08:16:22 +01:00
  • b3553335a3 Update llama.h - change IQ1_XS enum number Nexesenex 2024-03-25 21:06:46 +01:00
  • ddc7701588 Update llama.cpp - Non-FFN layer-tensors strategy Nexesenex 2024-03-25 21:04:01 +01:00
  • b80c0af078 wpm : use C locale for ispunct/isspace Jared Van Bortel 2024-03-25 15:52:28 -04:00
  • 12c9576aec fix vector sizes. Julia Longtin 2024-03-25 19:43:37 +00:00
  • 1c4da5ddac Update llama.cpp - Embeddings and output tensors strategy. Nexesenex 2024-03-25 20:37:11 +01:00
  • 51ff04e77e Update llama.cpp - Fix possible typo Nexesenex 2024-03-25 19:31:51 +01:00
  • 8eff402498 Update llama.cpp - Case IQ1_XS Nexesenex 2024-03-25 19:30:19 +01:00
  • 3d88431113 Update llama.h - Enum IQ1_XS Nexesenex 2024-03-25 19:25:31 +01:00
  • 8f7a7ee370 Update quantize.cpp - Quant option IQ1_XS Nexesenex 2024-03-25 19:23:21 +01:00
  • 7c5ad052af Rename LLAMA_CUBLAS flag to LLAMA_CUDA in fulfilment of the prophecy. JohnnyB 2024-03-25 18:16:19 +00:00
  • f4949bc1ca b2532 Nexesenex 2024-03-25 19:13:45 +01:00
  • 22fa121344 iq1_m: add to backend-ops tests Iwan Kawrakow 2024-03-25 20:04:08 +02:00
  • 62dd11f3e5 iq1_m: remove unused variable Iwan Kawrakow 2024-03-25 18:58:05 +01:00
  • b06c16ef9f nix: fix blas support (#6281) Christian Kögler 2024-03-25 18:52:45 +01:00
  • 480d6d6c36 iq1_m: adapt to CUDA refactoring Iwan Kawrakow 2024-03-25 19:40:39 +02:00
  • 1f2fd4e727 tests : include IQ2_XXS and IQ2_XS in test-quantize-fns (#6303) b2531 Kawrakow 2024-03-25 18:33:15 +01:00
  • 3d9c21f61a iq1_m: small PPL improvement via super-block scale adjustment Iwan Kawrakow 2024-03-25 17:39:35 +02:00
  • 78ce561a31 iq1_m: another minor ARM_NEON dot product improvement Iwan Kawrakow 2024-03-25 13:38:38 +01:00
  • b1d1c26034 iq1_m: faster ARM_NEON dot product Iwan Kawrakow 2024-03-25 12:47:21 +01:00
  • f664692fa8 iiq1_m: slightly faster ARM_NEON dot product Iwan Kawrakow 2024-03-25 12:34:25 +01:00
  • dff85a804b iq1_m: checking pure iq1_m quantization Iwan Kawrakow 2024-03-25 10:57:31 +02:00
  • abc1d4f951 iq1_m: minor Iwan Kawrakow 2024-03-25 09:32:14 +02:00
  • 19fb974d77 iq1_m: Metal now works Iwan Kawrakow 2024-03-25 07:41:26 +01:00
  • 0e36afa0ca iq1_m: Metal - dequantize works, dot product does not Iwan Kawrakow 2024-03-25 07:18:23 +01:00
  • 8009b6d63b iq1_m: ARM_NEON dot product Iwan Kawrakow 2024-03-24 19:48:55 +01:00
  • 379fdb671b iq1_m: very slightly faster AVX2 dot product Iwan Kawrakow 2024-03-24 19:52:22 +02:00
  • a139de51b6 iq1_m: AVX2 dot product Iwan Kawrakow 2024-03-23 16:38:13 +02:00
  • 64b9dfd7ff iq1_m: scalar dot product Iwan Kawrakow 2024-03-23 15:57:31 +02:00
  • 308c50d030 iq1_m: go to 3-bit scales Iwan Kawrakow 2024-03-23 12:26:41 +02:00
  • 282f2788af iq1_m: separate shifts for each group of 8 in a block Iwan Kawrakow 2024-03-23 08:33:26 +02:00
  • 1df37b654b iq1_m: CUDA dequantize works Iwan Kawrakow 2024-03-22 19:46:35 +02:00
  • ac8b3dd2eb iq1_m: basics-2 Iwan Kawrakow 2024-03-22 19:28:03 +02:00
  • 2a2d66de46 iq1_m: basics Iwan Kawrakow 2024-03-22 19:19:12 +02:00
  • fe080353ff update more files slaren 2024-03-25 17:31:05 +01:00
  • 6f20e2672f Include IQ2_XXS and IQ2_XS in teet-quantize-fns ik/test_quantize_fns Iwan Kawrakow 2024-03-25 19:01:20 +02:00
  • 485358aa62 Whitespace trollkotze 2024-03-25 17:51:13 +01:00
  • f5b8622b3a Copy usage output for control-vector params to server.cpp trollkotze 2024-03-25 17:38:49 +01:00
  • f0722b1352 clean up failed attempt at implementing control-vector hot-swapping trollkotze 2024-03-25 17:04:31 +01:00
  • 43139cc528 flake.lock: Update (#6266) Georgi Gerganov 2024-03-25 17:22:27 +02:00
  • 4a60e88065 cuda : rename build flag to LLAMA_CUDA slaren 2024-03-25 16:04:50 +01:00
  • 2f34b865b6 cuda : fix LLAMA_CUDA_F16 build (#6298) b2529 slaren 2024-03-25 15:43:22 +01:00
  • 210e469114 cuda : fix LLAMA_CUDA_F16 build sl/cuda-f16-fix3 slaren 2024-03-25 15:31:10 +01:00
  • c14d4e8723 Merge branch 'master' of https://github.com/hxer7963/llama.cpp into xverse root 2024-03-25 22:20:53 +08:00
  • ffa9abd9c3 Merge branch 'master' into compilade/smaller-output-buffer Francis Couture-Harpin 2024-03-25 08:43:01 -04:00
  • 082611a2d0 server : add n_discard parameter to specify the number of tokens to discard when context is shifted Jan Boon 2024-03-21 19:04:38 +08:00
  • b1d933793c make get_weights to return pointer ngxson 2024-03-25 14:21:21 +01:00
  • ae1f211ce2 cuda : refactor into multiple files (#6269) b2528 slaren 2024-03-25 13:50:23 +01:00
  • 551f5a0378 fix format abhilash1910 2024-03-25 04:54:43 -07:00
  • 7ea2e1574f fix format abhilash1910 2024-03-25 04:48:28 -07:00
  • 6e27406352 embedding: assign n_ubatch value, print error on n_batch overflow Minsoo Cheong 2024-03-25 20:22:02 +09:00
  • d6dcd1738e Merge branch 'master' into sycl_readme_update OuadiElfarouki 2024-03-25 10:36:24 +00:00
  • 36c7f02abc Merge branch 'ggerganov:master' into iq2_s Abhilash Majumder 2024-03-25 15:02:29 +05:30
  • ad3a0505e3 Server: clean up OAI params parsing function (#6284) b2527 Xuan Son Nguyen 2024-03-25 09:42:17 +01:00
  • 95ad616cdd [SYCL] fix SYCL backend build on windows is break by LOG() error (#6290) b2526 Neo Zhang Jianyu 2024-03-25 15:52:41 +08:00
  • 64e7b47c69 examples : add "retrieval" (#6193) Minsoo Cheong 2024-03-25 16:38:22 +09:00
  • f6be52d793 retrieval : minor Georgi Gerganov 2024-03-25 09:35:50 +02:00
  • 11a64f8827 add newline at end of file Zhang 2024-03-25 14:51:46 +08:00
  • 743dd102b1 Merge 44a80b4119 into 7733f0c760 Johannes Gäßler 2024-03-25 14:45:04 +08:00
  • 27f007b4a9 rollback to bash Zhang 2024-03-25 14:43:28 +08:00
  • 7733f0c760 ggml : support AVX512VNNI (#6280) Justine Tunney 2024-03-25 01:39:56 -04:00
  • e139651c0f Merge branch 'xverse' into master willhe 2024-03-25 11:59:42 +08:00
  • 92070cab2a Maybe adding a memory leak? But it werks now. trollkotze 2024-03-25 04:33:44 +01:00
  • eb5fd2d8c5 increase file read buffer size Minsoo Cheong 2024-03-25 10:57:00 +09:00
  • 446dc1a128 fix LOG() error for SYCL, enhance erro check by CI Zhang 2024-03-25 09:47:03 +08:00
  • 7dbed974dc hmm... trollkotze 2024-03-25 02:07:54 +01:00
  • a32b77c4b2 Fix heap corruption from wmode out-of-bound writes on windows (#6272) b2523 Rick G 2024-03-24 14:45:56 -07:00
  • 0274e6b364 Control vectors in server trollkotze 2024-03-24 22:13:55 +01:00
  • ada487b283 update docs ngxson 2024-03-24 21:43:29 +01:00
  • 8e2d769a64 add TODO for logprobs ngxson 2024-03-24 21:30:49 +01:00
  • 950db2bd77 minor fixes ngxson 2024-03-24 21:28:12 +01:00
  • f73c470980 fix empty response_format ngxson 2024-03-24 21:07:33 +01:00
  • 4aeef9bc31 fix response_format ngxson 2024-03-24 19:44:52 +01:00
  • faaec65fdb server: clean up oai parsing function ngxson 2024-03-24 19:25:58 +01:00
  • 96941546aa fix HIP build slaren 2024-03-24 17:16:15 +01:00
  • 475824edf5 update Makefile for HIP slaren 2024-03-24 16:35:44 +01:00
  • 290f81aa4f update cmake for HIP slaren 2024-03-24 16:34:01 +01:00
  • 121230540d nix: fix blas support Christian Kögler 2024-03-24 16:32:51 +01:00
  • 2cdb44ddf3 update Makefile slaren 2024-03-24 16:32:09 +01:00
  • 209df3defb Support AVX512VNNI Justine Tunney 2024-03-24 08:31:48 -07:00