Commit graph

  • 6cdabe6526 llama-bench : add embeddings option (#5924) b2360 Georgi Gerganov 2024-03-07 16:32:38 +02:00
  • da952a6011 llama-bench : do not hard code embd default value slaren 2024-03-07 15:29:46 +01:00
  • ca4c069229 llama-bench : add embeddings option Georgi Gerganov 2024-03-07 16:18:54 +02:00
  • e0504d536c PR clean up Michael Podvitskiy 2024-02-29 18:01:14 +01:00
  • afa9d0953b models without vocabulary, llama.cpp part Michael Podvitskiy 2024-02-28 10:49:26 +01:00
  • 4f4258fbde models without vocabulary, convert.py part Michael Podvitskiy 2024-02-27 18:29:34 +01:00
  • cc3fe18b43 vocab size as a part of a model metadata Michael Podvitskiy 2024-02-27 14:34:29 +01:00
  • e700b44217 additional methods to read model and ctx parameters Michael Podvitskiy 2024-02-26 11:58:25 +01:00
  • 6b2921423e llama: fix crash when tokenize unkown spm vocab token. iohub 2024-03-07 19:17:26 +08:00
  • 89fb735fcf Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" (#5918) b2359 Neo Zhang Jianyu 2024-03-07 19:14:49 +08:00
  • 55a2a900ff server : add /v1/completions endpoint (#5914) b2358 Minsoo Cheong 2024-03-07 19:42:39 +09:00
  • 8f6584a710 add legacy comment to /completion endpoint Minsoo Cheong 2024-03-07 19:26:47 +09:00
  • 1d23285b4e add-/v1/completions-endpoint Minsoo Cheong 2024-03-07 19:23:23 +09:00
  • 15c537ef18 Add lobe-chat in UIs list admin 2024-03-07 10:59:54 +01:00
  • 94f33d7ae3 rm macro abhilash1910 2024-03-07 01:54:26 -08:00
  • 2002bc96bf server : refactor (#5882) b2357 Georgi Gerganov 2024-03-07 11:41:53 +02:00
  • 87a4a105b2 server : add comments Georgi Gerganov 2024-03-07 11:35:03 +02:00
  • 818d898fe7 server : allow to override FQDN in tests Georgi Gerganov 2024-03-07 11:15:58 +02:00
  • b5b0270372 Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" revert-5901-fix_set_gpu Neo Zhang Jianyu 2024-03-07 17:11:18 +08:00
  • 234ab58af1 server : rename server structs Georgi Gerganov 2024-03-07 10:36:39 +02:00
  • ceca1aef07 [SYCL] fix error when set main gpu to non-zero (#5901) b2356 Neo Zhang Jianyu 2024-03-07 16:34:31 +08:00
  • fd74b5ea34 server : simplify json parsing + add comment about t_last Georgi Gerganov 2024-03-07 10:17:10 +02:00
  • f618e5060a add to gitignore Douglas Hanley 2024-03-07 01:38:30 -06:00
  • 1ab6aeeeee gritlm embeddings are back babeee Douglas Hanley 2024-03-07 01:37:08 -06:00
  • fd4b59be14 add eos Seungwon 2024-03-07 16:28:39 +09:00
  • 9c8d3c8a25 server: refactor: better http codes Pierrick HYMBERT 2024-03-06 22:48:39 +01:00
  • c53d84ec16 server : avoid n_available var Georgi Gerganov 2024-03-06 23:23:17 +02:00
  • c50a510092 server: refactor: clean up http code (#5912) Pierrick Hymbert 2024-03-06 22:04:12 +01:00
  • 3166ccf5aa server: tests: embeddings, no need to wait for server idle as it can timout Pierrick HYMBERT 2024-03-06 21:59:31 +01:00
  • 65cb1fbbf9 server: refactor: clean up http code Pierrick HYMBERT 2024-03-06 21:49:14 +01:00
  • e04e04f8fa ggml : use SYS_get_cpu if SYS_getcpu is not defined (#5906) b2355 Jared Van Bortel 2024-03-06 15:42:23 -05:00
  • 59850f18e5 server: tests: embeddings fix build type Debug is randomly failing (#5911) Pierrick Hymbert 2024-03-06 21:25:55 +01:00
  • 396106c532 server: tests: embeddings, fixed prompt do not exceed n_batch, increase embedding timeout, reduce number of concurrent embeddings Pierrick HYMBERT 2024-03-06 21:18:48 +01:00
  • ec6ba3bff1 server: tests: embeddings, use different KV Cache size Pierrick HYMBERT 2024-03-06 21:03:26 +01:00
  • 36e12f8fd3 server, tests : bump batch to fit 1 embedding prompt Georgi Gerganov 2024-03-06 21:28:10 +02:00
  • 79ef3c0585 server: tests: embeddings use a real embeddings model (#5908) Pierrick Hymbert 2024-03-06 20:24:20 +01:00
  • bfb121fd2e server : do not process more than n_batch tokens per iter Georgi Gerganov 2024-03-06 21:04:09 +02:00
  • fa7214c3f8 server: tests: embeddings, add dedicated feature and real model, if KV cache size exceeds batch size, embeddings differs Pierrick HYMBERT 2024-03-06 19:58:47 +01:00
  • aef02b11ec server : disable cached prompts with self-extend Georgi Gerganov 2024-03-06 18:51:40 +02:00
  • 069d671a69 ggml : use SYS_get_cpu if SYS_getcpu is not defined Jared Van Bortel 2024-03-06 11:08:08 -05:00
  • ad2ed8fa11 fix delete condition Jianyu Zhang 2024-03-06 22:35:39 +08:00
  • 2d50a35988 Merge remote-tracking branch 'origin/master' into server_branch pudepiedj 2024-03-06 12:06:44 +00:00
  • c810047b53 enable ops abhilash1910 2024-03-06 03:06:25 -08:00
  • b060e8a3cc fix error when set main gpu to non-zero Jianyu Zhang 2024-03-06 17:41:11 +08:00
  • 50f9ba353c fix build abhilash1910 2024-03-05 23:37:27 -08:00
  • 15ecc09971 Merge 17dfcde615 into e25fb4b18f Robey Holderith 2024-03-06 15:35:43 +08:00
  • e25fb4b18f ggml : use uint8x16_t return type for ggml_vqtbl1q_u8 (#5894) b2354 bobqianic 2024-03-06 07:35:07 +00:00
  • 1e35d619a6 convert : remove AWQ remnants (#5768) Georgi Gerganov 2024-03-06 09:12:25 +02:00
  • 600193ca9a fix build abhilash1910 2024-03-05 23:13:32 -08:00
  • 5a790a38b0 Free Cublas GPU memory Zoli Somogyi 2024-03-06 07:56:41 +01:00
  • 97936078b7 rebase to new embed Douglas Hanley 2024-03-05 23:23:17 -06:00
  • a98a166d12 add back the [img-id] CJ Pais 2024-03-05 20:45:39 -08:00
  • 5db4c71a16 fix num tokens for multimodal + empty prompt in response CJ Pais 2024-03-05 20:37:16 -08:00
  • 8ced9f7e32 add wait() to make code stable (#5895) b2352 Neo Zhang Jianyu 2024-03-06 12:08:32 +08:00
  • 418f4f7c98 add wait() to make code stable Jianyu Zhang 2024-03-06 11:30:51 +08:00
  • 304b9b7732 Update ggml-quants.c bobqianic 2024-03-05 22:21:29 +00:00
  • 8a16be0fd3 use uint8x16_t bobqianic 2024-03-05 22:16:21 +00:00
  • 652ca2bded compare-llama-bench.py : remove mul_mat_q (#5892) slaren 2024-03-05 22:27:29 +01:00
  • eeaf8037af compare-llama-bench.py : remove mul_mat_q slaren 2024-03-05 21:05:03 +01:00
  • 805ae529c4 comment out debug printing Douglas Hanley 2024-03-04 00:18:41 -06:00
  • a71842d7ef tabs to spaces Douglas Hanley 2024-03-04 00:16:29 -06:00
  • e79195fc53 gritlm results match Douglas Hanley 2024-03-03 23:59:28 -06:00
  • 4be8fb18ed add gritlm example Douglas Hanley 2024-02-29 09:50:41 -06:00
  • 61b63705dc server : refactor system prompt update at start Georgi Gerganov 2024-03-05 19:55:19 +02:00
  • 4a2d5f63f2 server : merge oai.hpp in utils.hpp Georgi Gerganov 2024-03-05 19:47:36 +02:00
  • cb3ce0bfff server : reorganize structs and enums + naming fixes Georgi Gerganov 2024-03-05 19:25:22 +02:00
  • 5544f5211b Merge branch 'master' into support-mamba-ssm Francis Couture-Harpin 2024-03-05 12:12:01 -05:00
  • 93fd4b8d5b mamba : clarify some comments Francis Couture-Harpin 2024-03-04 15:57:40 -05:00
  • 22ae1a622e server : do not process embedding requests when disabled Georgi Gerganov 2024-03-05 18:58:26 +02:00
  • bd836944f8 quants : use MM256_SET_M128I consistently to fix gcc 7 build (#5889) b2350 Jared Van Bortel 2024-03-05 11:56:37 -05:00
  • 3de31677d3 grammars : blacklists character control set (#5888) ExtReMLapin 2024-03-05 17:33:08 +01:00
  • 10865122c1 quants : use MM256_SET_M128I consistently to fix gcc 7 build Jared Van Bortel 2024-03-05 11:22:55 -05:00
  • f84809b7ad llama : llama_chat_apply_template support null buf Georgi Gerganov 2024-03-05 17:23:18 +02:00
  • 7635b13ad7 server : minor Georgi Gerganov 2024-03-05 17:22:28 +02:00
  • f4800d54e7 server : code style Georgi Gerganov 2024-03-05 16:42:50 +02:00
  • 1917585810 Prevent control characters from being served in json string (array) ExtReMLapin 2024-03-05 15:42:18 +01:00
  • fb2f376103 Prevent control characters from being served in json string ExtReMLapin 2024-03-05 15:41:36 +01:00
  • b1b3ba886e server : simplify model chat template validation Georgi Gerganov 2024-03-05 16:18:15 +02:00
  • 293378b20b consolidate gen_chatcmplid() callsite Minsoo Cheong 2024-03-05 23:11:35 +09:00
  • 82cb31eb93 Revert "grammars : don't allow to output unescaped new line in string (#5885)" Georgi Gerganov 2024-03-05 15:56:24 +02:00
  • b1a4e994fd grammars : don't allow to output unescaped new line in string (#5885) ExtReMLapin 2024-03-05 14:44:29 +01:00
  • fef64c587d server : code style Georgi Gerganov 2024-03-05 15:36:14 +02:00
  • ad1d746caa server : normalize id vars Georgi Gerganov 2024-03-05 14:57:15 +02:00
  • c999536320 fix build Abhilash Majumder 2024-03-05 18:17:33 +05:30
  • 6fd581e075 fix compilation Abhilash Majumder 2024-03-05 18:09:00 +05:30
  • 61d1c88e15 Vulkan Improvements (#5835) b2346 0cc4m 2024-03-05 13:33:42 +01:00
  • 134f5fec22 server : fix empty prompt handling + all slots idle logic Georgi Gerganov 2024-03-05 14:33:12 +02:00
  • ad251954eb Add q3_s and q1_s Abhilash Majumder 2024-03-05 17:51:29 +05:30
  • 372ea4a7b4 Don't allow new line in json object string ExtReMLapin 2024-03-05 11:33:57 +01:00
  • 2cf420218b Don't allow grammar json array to output unescaped new line in string ExtReMLapin 2024-03-05 11:31:51 +01:00
  • ef7eb33937 server : remove llava/clip objects from build Georgi Gerganov 2024-03-05 11:51:43 +02:00
  • aa95dc5568 persist completion_id only for same stream Minsoo Cheong 2024-03-05 18:24:05 +09:00
  • f4e6e7e61f server : refactoring (wip) Georgi Gerganov 2024-03-05 11:16:43 +02:00
  • c5efd837b6 server: maintain chat completion id for streaming responses Minsoo Cheong 2024-03-05 17:37:13 +09:00
  • 31cecc8734 iq3_s_mult_shuffle: use lookup table on Metal ik/iq3_s_multiplier Iwan Kawrakow 2024-03-05 10:19:44 +02:00
  • 21b0867433 [SYCL] fix mul_mat fault in CI/unit-test (#5862) b2345 Neo Zhang Jianyu 2024-03-05 16:08:35 +08:00
  • 93034df760 iq3_s_mult_shuffle: use lookup table on CUDA Iwan Kawrakow 2024-03-05 10:06:07 +02:00
  • 6d15da1ec0 iq3_s_mult_shuffle: use new multiplier and cleanup Iwan Kawrakow 2024-03-05 08:36:57 +02:00
  • b1d753be34 iq3_s_mult: remove SLOW_MULT option Iwan Kawrakow 2024-03-05 08:23:37 +02:00
  • 6a87ac3a52 fix editorconfig check break (#5879) Minsoo Cheong 2024-03-05 15:12:23 +09:00