Commit graph

  • 160acecaba iq3_s_multiplier: CUDA and AVX2 works Iwan Kawrakow 2024-03-01 13:44:06 +02:00
  • 3ab8b3a92e llama : cleanup unused mmq flags (#5772) b2303 Pierrick Hymbert 2024-03-01 12:39:06 +01:00
  • 4c21c826e1 WIP Iwan Kawrakow 2024-03-01 13:28:20 +02:00
  • 1cc7cb2b46 iq3_s(multiplier): use SIMD also in dequantize Iwan Kawrakow 2024-03-01 12:02:39 +02:00
  • b67b8f6451 handle rope-theta Sourab Mangrulkar 2024-03-01 15:29:36 +05:30
  • 9c752ff0d3 Trying IQ3_S without a lookup table Iwan Kawrakow 2024-03-01 11:52:17 +02:00
  • fdd886f7b4 remove redundant changes Sourab Mangrulkar 2024-03-01 15:14:26 +05:30
  • 5db8896f25 fix merge error Jianyu Zhang 2024-03-01 17:29:08 +08:00
  • 9600d59e01 unicode : switch to multimap based nfd_map (#5799) b2302 Douglas Hanley 2024-03-01 03:15:36 -06:00
  • 5cb02b4a01 server: allow to override threads server pool with --threads-http (#5794) b2301 Pierrick Hymbert 2024-03-01 10:08:08 +01:00
  • 705f1237bf initial support Ariadne 2024-03-01 17:04:31 +08:00
  • 6ea0f010ff ci : add Ubuntu 22 Vulkan CI run (#5789) b2300 Eve 2024-03-01 08:54:53 +00:00
  • fff73b75c2 Add unit offset after dtype change kunal-vaishnavi 2024-03-01 00:21:23 -08:00
  • 0231bbc516 server : remove api_like_OAI.py proxy script Georgi Gerganov 2024-03-01 10:17:00 +02:00
  • f105471ef6 server : fix newlines in help (#5785) b2299 Georgi Gerganov 2024-03-01 09:59:43 +02:00
  • 6b01068081 Merge branch 'master' into mulcards Neo Zhang Jianyu 2024-03-01 15:59:18 +08:00
  • 47a572df16 update news Jianyu Zhang 2024-03-01 15:52:41 +08:00
  • 38d1521608 [SYCL] Use batched mul_mat pathway (#5591) b2298 AidanBeltonS 2024-03-01 07:36:47 +00:00
  • 4c29df303d rebase with master, support tow new OPs, close feature for -sm=row, fix for unit test Jianyu Zhang 2024-03-01 15:36:37 +08:00
  • ed9ff0c93d Fix flag name in help message Miwa / Ensan 2024-03-01 16:32:27 +09:00
  • 5c06625f58 Update llama.cpp Sourab Mangrulkar 2024-03-01 12:35:18 +05:30
  • 10aa6e927e resolve comments Sourab Mangrulkar 2024-03-01 11:09:35 +05:30
  • 4a24bdfabd dont construct new locale every time Douglas Hanley 2024-02-29 16:08:44 -06:00
  • c3eba7c1c9 Rework matmul pipeline selection 0cc4m 2024-02-29 22:34:09 +01:00
  • 052051d8ae Server: normalize naming (#5779) b2297 Xuan Son Nguyen 2024-02-29 21:42:11 +01:00
  • 4bffc07144 simplify multimap keys Douglas Hanley 2024-02-29 14:17:12 -06:00
  • 8150be09d9 first commit Ariadne 2024-03-01 03:12:56 +08:00
  • 13d0948fdc server tweak pudepiedj 2024-02-29 18:14:08 +00:00
  • ff23aced4c switch to multimap based nfd_map due to compile time issues Douglas Hanley 2024-02-29 11:50:17 -06:00
  • 7463569cad fix uniform int distribution initialization Minsoo Cheong 2024-03-01 02:24:55 +09:00
  • 71f885f2d0 Llamaserver.py changes pudepiedj 2024-02-29 16:56:51 +00:00
  • d62ce1c6b4 skip rope freq and rotary embeddings from being serialized Sourab Mangrulkar 2024-02-29 19:32:04 +05:30
  • b6da762d68 Merge branch 'master' into xsn/model_merge ngxson 2024-02-29 14:48:22 +01:00
  • a451708e90 n_ctx change pudepiedj 2024-02-29 12:40:53 +00:00
  • 6c108068b1 handle rope type Sourab Mangrulkar 2024-02-29 17:56:32 +05:30
  • ab4eab3a82 Add support for starcoder2 Sourab Mangrulkar 2024-02-29 17:31:25 +05:30
  • abed262eeb Explicitly state scaled data type Aidan 2024-02-27 12:02:41 +00:00
  • b2aaee3500 rm extra line Abhilash Majumder 2024-02-27 17:03:16 +05:30
  • 1b3c1fe475 Use batched mul_mat pathway Aidan 2024-02-19 16:43:09 +00:00
  • 6e0733b3b7 update llama-bench slaren 2024-02-29 11:39:53 +01:00
  • a55e8fc6f0 server: allow to override threads server pool with --threads-http Pierrick HYMBERT 2024-02-29 11:35:49 +01:00
  • 4b08df1f8b fix broken merge hazelnutcloud 2024-02-29 18:13:15 +08:00
  • 36fed7af50 remove: mul_mat_q in compare llama bench and usage Pierrick HYMBERT 2024-02-29 10:38:36 +01:00
  • 0046e58d85 fix(convert-hf-to-gguf): requires einops for InternLM2ForCausalLM models nold 2024-02-29 09:26:40 +01:00
  • e063bce2b6 Merge branch 'ggerganov:master' into master hsnmkls 2024-02-29 16:21:59 +08:00
  • d5ab29757e llama : constified llama_set_state_data's src (#5774) b2296 Marcus Dunn 2024-02-29 00:17:23 -08:00
  • c2cd292307 fix bug in active_seqs sync Minsoo Cheong 2024-02-29 16:01:34 +09:00
  • 2ad3f7c28c randomly select next sequence to verify + fix bug in memory freeing Minsoo Cheong 2024-02-29 15:47:41 +09:00
  • 6b35c8b3cf fix r random generation Minsoo Cheong 2024-02-29 13:27:29 +09:00
  • bd025377bf add ubuntu 22 vulkan ci Eve 2024-02-29 03:05:34 +00:00
  • 8a566ab8b8 Merge branch 'fix/xcframework-ci' Philipp Zagar 2024-02-28 14:51:41 -08:00
  • 8f83ea7d47 Fix CI build, lift package availability to macOS and visionOS Philipp Zagar 2024-02-28 14:45:41 -08:00
  • 4a7ab3f173 Temp commit Philipp Zagar 2024-02-28 14:16:58 -08:00
  • 6baa61c1e0 Enable CORS requests on all routes StrangebytesDev 2024-02-28 13:36:17 -08:00
  • 6314096db9 Speed up q4_0 dequant code, enable mmq for q4_0 0cc4m 2024-02-28 22:14:14 +01:00
  • 51381f8f5d fix spacing ngxson 2024-02-28 22:06:21 +01:00
  • e2992ea332 server: normalize naming ngxson 2024-02-28 21:56:49 +01:00
  • b52285d056 Fix external parameters Philipp Zagar 2024-02-28 12:44:01 -08:00
  • 8776d1c051 Add XCFramework building Philipp Zagar 2024-02-28 12:19:44 -08:00
  • b389e90ec8 Merge a402d3cf74 into 87c91c0766 GangT 2024-02-28 19:56:45 +00:00
  • 87c91c0766 ci : reduce 3b ppl chunks to 1 to avoid timeout (#5771) b2295 Georgi Gerganov 2024-02-28 21:44:21 +02:00
  • 317709b2a8 make portability_enumeration_ext apple only (#5757) b2294 Eve 2024-02-28 19:33:37 +00:00
  • ce70bdc096 constified llama_set_state_data marcus 2024-02-28 09:57:21 -08:00
  • 645bb3d5a2 Update convert-hf-to-gguf.py rombodawg 2024-02-28 12:51:08 -05:00
  • bcb60f306f server: error handling ZXED 2024-02-28 20:46:10 +03:00
  • 2e64897d2e cleanup unused --no-mul-mat-q,-nommq, -mmq, --mul-mat-q, mul_mat_q Pierrick HYMBERT 2024-02-28 18:02:55 +01:00
  • 08c5ee87e4 llama : remove deprecated API (#5770) b2293 Georgi Gerganov 2024-02-28 18:43:38 +02:00
  • 5fc98fb8ac ci : reduce 3b chunks to 1 to avoid timeout Georgi Gerganov 2024-02-28 18:40:34 +02:00
  • e4896e71b5 fixes based on review (@JohannesGaessler) Minsoo Cheong 2024-02-29 00:41:31 +09:00
  • 78aacf3634 awq-py : remove (#5768) Georgi Gerganov 2024-02-28 17:36:53 +02:00
  • 5834217d3f llama : remove deprecated API Georgi Gerganov 2024-02-28 17:35:24 +02:00
  • 94f6256fd0 replace use of rand() with mt19937 sampling Minsoo Cheong 2024-02-29 00:26:23 +09:00
  • 3921ff5c88 awq-py : remove Georgi Gerganov 2024-02-28 16:28:31 +02:00
  • ee7f05b52b Exploring stdout redirection pudepiedj 2024-02-28 12:22:25 +00:00
  • b56b9895ed std::cerr pudepiedj 2024-02-28 12:05:08 +00:00
  • dade1cefd4 Merge branch 'server_branch' of https://github.com/pudepiedj/llama.cpp into server_branch pudepiedj 2024-02-28 12:03:04 +00:00
  • 09e087f691 Merge remote-tracking branch 'origin/master' into server_branch pudepiedj 2024-02-28 12:03:01 +00:00
  • 7516a5b9ee Merge branch 'ggerganov:master' into server_branch pudepiedj 2024-02-28 12:01:51 +00:00
  • 9f40bb7983 LOG_VERBOSE sorted pudepiedj 2024-02-28 11:59:45 +00:00
  • 33563a8a52 rm warning Jianyu Zhang 2024-02-28 19:54:44 +08:00
  • f87da8ebf3 suport multiple cards: split-mode - layer|row Jianyu Zhang 2024-02-28 19:34:10 +08:00
  • f50bf00c02 fix type ngxson 2024-02-28 10:54:46 +01:00
  • 8c0e8f4e73 sync : ggml b2291 Georgi Gerganov 2024-02-28 11:17:32 +02:00
  • 2774b0c974 add google magika inference example (ggml/748) slaren 2024-02-25 20:41:35 +01:00
  • 5f70671856 Introduce backend GUIDs (ggml/743) UEXTM.com 2024-02-24 11:27:36 -05:00
  • a693bea1e6 server : hit Ctrl+C twice to exit (#5734) b2288 Xuan Son Nguyen 2024-02-28 09:55:37 +01:00
  • adcb12a9ba llama : fix non-quantization of expert gating tensors (#5754) b2287 compilade 2024-02-28 03:52:56 -05:00
  • 177628bfd8 llama : improve BERT tokenization (#5740) b2286 Douglas Hanley 2024-02-28 02:51:11 -06:00
  • 6c4416868d readme : add link to LLaVA 1.6 models (#5758) Daniel Bevenius 2024-02-28 09:39:39 +01:00
  • efc72253f7 server : add "/chat/completions" alias for "/v1/...` (#5722) b2284 Jorge A 2024-02-28 01:39:15 -07:00
  • 7c4263d426 ggml : make i-quants work with super-blocks of 64 (CPU,Metal) (#5760) b2283 Kawrakow 2024-02-28 10:37:02 +02:00
  • f993d1448b minor : fix trailing whitespace Georgi Gerganov 2024-02-28 10:35:18 +02:00
  • f0cbb6ddf6 iq1_s: turn off SIMD implementation for QK_K = 64 (it does not work) ik/i-quants-64 Iwan Kawrakow 2024-02-28 08:28:10 +02:00
  • 47d52b2b24 Q2_K: fixed bug in imatrix quantization for QK_K = 64 Iwan Kawrakow 2024-02-28 08:15:52 +02:00
  • 41bb3a4382 readme: add link to LLaVA 1.6 models Daniel Bevenius 2024-02-28 06:37:40 +01:00
  • bc88fc6371 make portability_enumeration_ext apple only Eve 2024-02-28 01:28:15 +00:00
  • 0fcd27dc12 Update examples/server/server.cpp Xuan Son Nguyen 2024-02-28 00:25:49 +01:00
  • 1add0ea518 Update convert-hf-to-gguf.py rombodawg 2024-02-27 17:41:13 -05:00
  • ebdc0d3907 Merge branch 'server_branch' of https://github.com/pudepiedj/llama.cpp into server_branch pudepiedj 2024-02-27 22:27:12 +00:00
  • 87d501fc10 Enable log redirection pudepiedj 2024-02-27 22:27:10 +00:00