Commit graph

  • 4189cf7089 log : fix MSVC compile errors (#5643) UEXTM.com 2024-03-08 04:35:04 -05:00
  • 5fd456b5a9 llama-bench : add embeddings option (#5924) Georgi Gerganov 2024-03-07 16:32:38 +02:00
  • 979373c17f Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" (#5918) Neo Zhang Jianyu 2024-03-07 19:14:49 +08:00
  • c810764c7e server : add /v1/completions endpoint (#5914) Minsoo Cheong 2024-03-07 19:42:39 +09:00
  • cd045ddfca server : refactor (#5882) Georgi Gerganov 2024-03-07 11:41:53 +02:00
  • b111721998 fix conflict Jianyu Zhang 2024-03-12 21:00:07 +08:00
  • 504850f851 ggml : use SYS_get_cpu if SYS_getcpu is not defined (#5906) Jared Van Bortel 2024-03-06 15:42:23 -05:00
  • f11260a020 ggml : use uint8x16_t return type for ggml_vqtbl1q_u8 (#5894) bobqianic 2024-03-06 07:35:07 +00:00
  • 4a1d95062d convert : remove AWQ remnants (#5768) Georgi Gerganov 2024-03-06 09:12:25 +02:00
  • 3da33990f4 add wait() to make code stable (#5895) Neo Zhang Jianyu 2024-03-06 12:08:32 +08:00
  • 8030da7afe ggml : reuse quantum structs across backends (#5943) b2408 Georgi Gerganov 2024-03-12 14:27:20 +02:00
  • 1c2b593ba4 fix block_iq1_s; Jianyu Zhang 2024-03-12 20:04:00 +08:00
  • 184215e783 ggml : fix UB in IQ2_S and IQ3_S (#6012) b2407 Georgi Gerganov 2024-03-12 13:49:55 +02:00
  • 66b88a80cb ggml : define helper quantum constants for SYCL Georgi Gerganov 2024-03-12 12:59:12 +02:00
  • 831075cc63 ggml : silence thread sanitizer warnings Georgi Gerganov 2024-03-12 12:43:10 +02:00
  • 895f437e54 Merge branch 'master' into gg/ggml-common-decl Georgi Gerganov 2024-03-12 11:26:25 +02:00
  • 792aa3487f Merge branch 'master' into fix_build_break_iq1s Neo Zhang Jianyu 2024-03-12 17:20:30 +08:00
  • 59f1f6aefc fix build break by iq1s Jianyu Zhang 2024-03-12 17:15:59 +08:00
  • 48358b2e5b sycl : update IQ1_S kernels (WIP - not working!) (#5995) b2406 Georgi Gerganov 2024-03-12 11:15:05 +02:00
  • 59c899daac json: dirty include for test ochafik 2024-03-12 04:32:24 +00:00
  • bed826fa56 json: nit fixes ochafik 2024-03-12 04:26:47 +00:00
  • 6165c55d3a json: revert from c++17 to 11 ochafik 2024-03-12 04:07:22 +00:00
  • ee6166af73 json: cleanup test ochafik 2024-03-12 03:55:14 +00:00
  • 917b5d2260 json: nits + regen deps ochafik 2024-03-12 03:51:32 +00:00
  • 7e1440cc01 Merge branch 'json-fixes-cpp' into json-fixes ochafik 2024-03-12 03:48:38 +00:00
  • 192a58a5d3 json: test C++, JS & Python versions ochafik 2024-03-12 03:46:55 +00:00
  • f36726edd2 rename function jianyuzh 2024-03-12 11:30:59 +08:00
  • 8d09376e62 order the device by backend type and max compute unit jianyuzh 2024-03-12 10:59:28 +08:00
  • a740bfacad Update json-schema-to-grammar.mjs.hpp ochafik 2024-03-12 02:09:07 +00:00
  • 0be059def8 json: fix mjs implementation + align outputs ochafik 2024-03-12 02:08:07 +00:00
  • 8fee84b45c Update json-schema-to-grammar.cpp ochafik 2024-03-12 02:06:48 +00:00
  • 8caaf1641d Update json-schema-to-grammar.cpp ochafik 2024-03-12 00:02:45 +00:00
  • d934adccea Update json-schema-to-grammar.cpp ochafik 2024-03-11 23:09:51 +00:00
  • cb364ef542 Merge branch 'json-fixes' into json-fixes-cpp ochafik 2024-03-11 22:23:19 +00:00
  • 51ca7cb863 json: nits Olivier Chafik 2024-03-11 22:20:19 +00:00
  • d0dd75c902 json: port schema converter to C++, wire in ./server Olivier Chafik 2024-03-11 22:19:26 +00:00
  • a601da6fd4 specify types ngxson 2024-03-11 22:14:27 +01:00
  • 5cdb371731 grammar : fix unnecessarily retained pointer to rules (#6003) b2405 gliptic 2024-03-11 20:59:03 +01:00
  • f889aa83e4 Fix retained pointer to rules parameter gliptic 2024-03-11 20:26:24 +01:00
  • dca5020a74 ggml : define helper constants only for CUDA and SYCL Georgi Gerganov 2024-03-11 19:21:40 +02:00
  • 1a29871348 use multitask for embd endpoint ngxson 2024-03-11 18:02:32 +01:00
  • 54ebe70ea5 Merge branch 'master' into gg/ggml-common-decl Georgi Gerganov 2024-03-11 17:53:54 +02:00
  • 44ca159faf 1.5 bit: we can do even better (#5999) b2404 Kawrakow 2024-03-11 16:53:15 +01:00
  • 05b06210c9 llama : more consistent names of count variables (#5994) b2403 Georgi Gerganov 2024-03-11 17:49:47 +02:00
  • 83796e62bc llama : refactor unicode stuff (#5992) b2402 Georgi Gerganov 2024-03-11 17:47:47 +02:00
  • 18504b6a34 handle wide characters in llama_file examples Bruce MacDonald 2024-03-11 11:13:35 -04:00
  • 8a11598b1e Revert "move repeated llama_file logic to llama.cpp" Bruce MacDonald 2024-03-11 10:55:38 -04:00
  • 555c4976df call set_single/mul_gpu_mode in init, order the devices Jianyu Zhang 2024-03-11 22:54:28 +08:00
  • 3680bc244b unicode : pass as cpts as const ref Georgi Gerganov 2024-03-11 15:59:58 +02:00
  • af0621e6bd unicode : add <cstdint> Georgi Gerganov 2024-03-11 15:51:24 +02:00
  • 828defefb6 Update server docker image URLs (#5997) Jakub N 2024-03-11 14:40:42 +01:00
  • 5440a127c7 iq1_s: fix dequantize on the CPU ik/even_better_iq1s Iwan Kawrakow 2024-03-11 14:17:28 +01:00
  • 76be02aebc sycl : fix grid type gg/try-fix-sycl-iq1_s Georgi Gerganov 2024-03-11 15:17:08 +02:00
  • 6d682fb62d examples : fix param name Georgi Gerganov 2024-03-11 15:12:23 +02:00
  • 52492a1cb9 common : fix param name Georgi Gerganov 2024-03-11 15:04:30 +02:00
  • cb5a702e9c sycl : iq1s_grid -> iq1s_grid_gpu Georgi Gerganov 2024-03-11 14:50:03 +02:00
  • 9046268fed Update docker URLs Jakub N 2024-03-11 13:46:15 +01:00
  • 491d2da02f llama : n_parallel -> n_seq_max Georgi Gerganov 2024-03-11 14:45:55 +02:00
  • 436c65e1a8 iq1_s: very slightly faster dequantize on Metal Iwan Kawrakow 2024-03-11 13:13:38 +01:00
  • 58d549152c unicode : add BOM Georgi Gerganov 2024-03-11 14:04:34 +02:00
  • da4528bce3 iq1_s: make Metal work with new version Iwan Kawrakow 2024-03-11 13:03:49 +01:00
  • 4fba3e00c6 iq1_s: make Neon work with new version. Iwan Kawrakow 2024-03-11 12:55:28 +01:00
  • 77d586f534 sycl : try to fix after IQ1_S changes Georgi Gerganov 2024-03-11 13:31:49 +02:00
  • b816734c17 json: preserve order of props from TS defs Olivier Chafik 2024-03-11 11:48:08 +00:00
  • c09f73490d iq1_s: make scalar and AVX2 work with the new version Iwan Kawrakow 2024-03-11 13:44:02 +02:00
  • 32daccd755 llama : more consistent names of count variables Georgi Gerganov 2024-03-11 13:25:38 +02:00
  • 4600538baa swift : fix build Georgi Gerganov 2024-03-11 13:21:12 +02:00
  • 6568c62bca unicode : put nfd normalization behind API Georgi Gerganov 2024-03-11 13:19:55 +02:00
  • 82380acf10 iq1_s: we can do even better Iwan Kawrakow 2024-03-11 13:12:33 +02:00
  • be12d8b12a zig : fix build Georgi Gerganov 2024-03-11 13:09:43 +02:00
  • caa106d4e0 Server: format error to json (#5961) b2400 Xuan Son Nguyen 2024-03-11 10:56:41 +01:00
  • e607540ec9 unicode : straighten tables Georgi Gerganov 2024-03-11 11:53:17 +02:00
  • de0929ae7d unicode : names Georgi Gerganov 2024-03-11 11:44:42 +02:00
  • 9f3f7d8085 make : fix c++ compiler Georgi Gerganov 2024-03-11 11:42:45 +02:00
  • 9654d62f7e unicode : names Georgi Gerganov 2024-03-11 11:41:29 +02:00
  • 3202361c5b ggml, ci : Windows ARM runner and build fixes (#5979) b2399 Michael Podvitskiy 2024-03-11 10:28:51 +01:00
  • 0458996ec1 llama : refactor unicode stuff Georgi Gerganov 2024-03-11 11:21:43 +02:00
  • 77414564f9 ggml : reuse quant blocks across backends Georgi Gerganov 2024-03-11 10:52:10 +02:00
  • 3f9f970f2c Merge 4447e95ec5 into 332bdfd798 Riceball LEE 2024-03-11 16:35:23 +08:00
  • 332bdfd798 server : maintain chat completion id for streaming responses (#5988) b2398 Minsoo Cheong 2024-03-11 17:09:32 +09:00
  • fc7442aaba Update examples/server/utils.hpp Georgi Gerganov 2024-03-11 10:09:17 +02:00
  • ade9b90e95 Update examples/server/utils.hpp Georgi Gerganov 2024-03-11 10:09:11 +02:00
  • ecab1c75de cmake : fix subdir for LLAMA_METAL_EMBED_LIBRARY (#5985) b2397 Gilad S 2024-03-11 10:00:08 +02:00
  • ee35600b90 llama : fix F16/F32 downcast + improve names (#5980) b2396 Georgi Gerganov 2024-03-11 09:56:47 +02:00
  • be858f6205 Better 1.5 bit quantization (#5971) b2395 Kawrakow 2024-03-11 07:51:49 +01:00
  • 9d8317122e iq1s_blocks16: adjust to ggml-common.h Iwan Kawrakow 2024-03-10 11:22:05 +02:00
  • 34bc21ff90 iq1s_blocks16: faster AVX2 dot product Iwan Kawrakow 2024-03-09 11:48:30 +02:00
  • 101b18d509 iq1s_blocks16: slightly faster Neon dot product Iwan Kawrakow 2024-03-09 10:36:43 +02:00
  • 156220f8ca iq1s_blocks16: uint32_t codebook is also better in CUDA Iwan Kawrakow 2024-03-09 08:58:14 +02:00
  • 7545d69312 Formatting Iwan Kawrakow 2024-03-09 08:31:46 +02:00
  • d3da9d1617 iq1s_blocks16: speedup Metal by packing codebook into uint32_t's Iwan Kawrakow 2024-03-09 07:05:20 +01:00
  • 8561139a48 iq1s_blocks16: very slightly faster TG on Metal Iwan Kawrakow 2024-03-08 17:51:08 +01:00
  • 15acc7923b iq1s_blocks16: fixed Neon Iwan Kawrakow 2024-03-08 16:20:50 +01:00
  • fbb001e698 iq1s_blocks16: Metal works, Neon does not Iwan Kawrakow 2024-03-08 16:13:39 +01:00
  • f092d049fa iq1s_blocks16: CUDA dot product Iwan Kawrakow 2024-03-08 16:12:48 +02:00
  • 864a5c2ce4 iq1s_blocks16: scalar and AVX2 dot products Iwan Kawrakow 2024-03-08 15:50:36 +02:00
  • c55e66f997 iq1s_blocks16: Use 2*<x^2> as sigma2 in weight adjustment Iwan Kawrakow 2024-03-08 15:14:28 +02:00
  • 4c4404ace5 iq1s_blocks16: going to blocks of 32 Iwan Kawrakow 2024-03-08 14:42:55 +02:00
  • cd83a7d362 iq1s_blocks16: Adjust scale fudge factor to 1.125 Iwan Kawrakow 2024-03-08 13:38:45 +02:00
  • c9e9acf2be Trying blocvks of 16 for IQ1_S - seems slightly better Iwan Kawrakow 2024-03-08 11:36:42 +02:00