Commit graph

  • 8b350356b2 server: docs - refresh and tease a little bit more the http server (#5718) Pierrick Hymbert 2024-02-25 21:46:29 +01:00
  • bb363b9879 adapt to new api change ngxson 2024-02-25 21:46:13 +01:00
  • 69e8e66afd Update README.md Pierrick Hymbert 2024-02-25 21:44:11 +01:00
  • e647ed4ada Update examples/server/README.md Pierrick Hymbert 2024-02-25 21:43:34 +01:00
  • 42d781e264 Update examples/server/README.md Pierrick Hymbert 2024-02-25 21:43:01 +01:00
  • 72a8d59d48 move llama_client_slot back to server.cpp ngxson 2024-02-25 21:41:32 +01:00
  • 85c0334084 Merge branch 'master' into xsn/improve_server_works ngxson 2024-02-25 21:41:21 +01:00
  • 1294debdc5 Rephrase README.md server doc Pierrick Hymbert 2024-02-25 21:23:19 +01:00
  • bf08e00643 llama : refactor k-shift implementation + KV defragmentation (#5691) b2264 Georgi Gerganov 2024-02-25 22:12:24 +02:00
  • 4b419a2c92 Merge 5271c75666 into f7625019c5 slaren 2024-02-25 14:20:33 -05:00
  • b852f75a95 Merge 608f449880 into f7625019c5 Georgi Gerganov 2024-02-25 14:17:53 -05:00
  • f7625019c5 server : fix crash when system prompt is bigger than batch size (#5714) b2263 compilade 2024-02-25 13:43:50 -05:00
  • abbabc5e51 ggml-quants : provide ggml_vqtbl1q_u8 for 64bit compatibility (#5711) b2262 Radosław Gryta 2024-02-25 19:43:00 +01:00
  • 18239fa7fb server: docs - refresh and tease a little bit more the http server Pierrick HYMBERT 2024-02-25 19:30:15 +01:00
  • cc9288b0c7 [github-workflows] Do not skip Android armeabi-v7a build Radosław Gryta 2024-02-25 13:38:48 +01:00
  • f5006c140a [android-example] Remove abi filter after arm v7a fix Radosław Gryta 2024-02-25 13:35:09 +01:00
  • a3ee7a1342 [ggml-quants] Provide ggml_vqtbl1q_u8 for 64bit compatibility Radosław Gryta 2024-02-25 12:27:12 +01:00
  • f09d46e3b9 sampling: do not flood log sampled token Pierrick HYMBERT 2024-02-25 18:46:50 +01:00
  • 8dcf10028f server: tests - longer inference timeout for CI Pierrick HYMBERT 2024-02-25 18:21:01 +01:00
  • f1a98c5254 make : fix nvcc version is empty (#5713) b2261 kwin1412 2024-02-26 00:46:49 +08:00
  • 93cdea1d7b Merge upstream changes, fix conflicts 0cc4m 2024-02-25 17:35:05 +01:00
  • 5a122c25a0 llama : add comments Georgi Gerganov 2024-02-25 18:16:45 +02:00
  • 63d409551a server : fix crash when system prompt is bigger than batch size Francis Couture-Harpin 2024-02-25 10:54:14 -05:00
  • bfbf31715f fix nvcc version is empty kwin1412 2024-02-26 00:06:39 +08:00
  • aa0f428e2a Add argsort 0cc4m 2024-02-25 17:06:10 +01:00
  • 7d548a1827 readme : add Msty to UI list (#5618) Ashok Gelal 2024-02-25 10:57:34 -05:00
  • 0b72ded501 llama : switch the loop order in build_defrag Georgi Gerganov 2024-02-25 17:51:02 +02:00
  • 4eaaace394 llama : ggml_graph based defrag implementation Georgi Gerganov 2024-02-25 17:36:37 +02:00
  • 65323bc770 llama : defragment via non-overlapping moves Georgi Gerganov 2024-02-25 17:21:33 +02:00
  • 624214aedf revert test addr change ngxson 2024-02-25 15:45:10 +01:00
  • a5603ded45 move llama_client_slot to utils.hpp ngxson 2024-02-25 15:44:03 +01:00
  • 0eac1a3ec3 Merge branch 'master' into xsn/improve_server_works ngxson 2024-02-25 15:01:23 +01:00
  • c420e05383 remove_waiting_task_id: also clean pending results ngxson 2024-02-25 14:58:38 +01:00
  • 2d7203b975 llama : remove llama_kv_cache_compress Georgi Gerganov 2024-02-25 15:32:02 +02:00
  • 1b6aeb8309 llama : comments Georgi Gerganov 2024-02-25 15:30:06 +02:00
  • d141c749d9 Merge branch 'master' into gg/refactor-k-shift Georgi Gerganov 2024-02-25 15:20:13 +02:00
  • 65f21ec5d3 llama : add llama_kv_cache_defrag Georgi Gerganov 2024-02-25 15:00:45 +02:00
  • 930b178026 server: logs - unified format and --log-format option (#5700) b2259 Pierrick Hymbert 2024-02-25 13:50:32 +01:00
  • d52d7819b8 server: concurrency fix + monitoring - add /metrics prometheus compatible endpoint (#5708) b2258 Pierrick Hymbert 2024-02-25 13:49:43 +01:00
  • 3b2dea18c3 condition_results.notify_all ngxson 2024-02-25 13:18:55 +01:00
  • d5ef563161 check opt first lindeer 2024-02-25 20:10:15 +08:00
  • 9ec749df59 llama : add alternative KV cache merging (EXPERIMENTAL) Georgi Gerganov 2024-02-25 13:57:43 +02:00
  • 91e7e0ff17 refactor work queue related stuff ngxson 2024-02-25 12:56:34 +01:00
  • 542f42a604 server: metrics - move to a dedicated struct Pierrick HYMBERT 2024-02-25 12:44:39 +01:00
  • 7b29648da5 server: concurrency issue, when 2 task are waiting for results, only one call thread is notified Pierrick HYMBERT 2024-02-25 12:19:34 +01:00
  • 1289408817 cmake : fix compilation for Android armeabi-v7a (#5702) b2257 Radosław Gryta 2024-02-25 11:53:11 +01:00
  • 0d6f8734b3 Merge branch 'master' into gg/refactor-k-shift Georgi Gerganov 2024-02-25 12:28:08 +02:00
  • a69c446f4b server: logs PR feedback: change text log format to: LEVEL [function_name] message | additional=data Pierrick HYMBERT 2024-02-25 11:24:32 +01:00
  • ab336a9d5e code : normalize enum names (#5697) b2256 Georgi Gerganov 2024-02-25 12:09:09 +02:00
  • 69917dfa55 py : fix StableLM conversion after config.json changes (#5703) Anas Ahouzi 2024-02-25 10:54:04 +01:00
  • c6c9f7cfd4 server: logs avoid static in general Pierrick Hymbert 2024-02-25 10:49:03 +01:00
  • 371e955b99 Add StableLMEpochForCausalLM for safety 2 Anas Ahouzi 2024-02-25 10:34:21 +01:00
  • 2e0ae2e053 Add StableLMEpochForCausalLM for safety Anas Ahouzi 2024-02-25 10:33:18 +01:00
  • 6f0bfdbe55 Merge branch 'server_branch' of https://github.com/pudepiedj/llama.cpp into server_branch pudepiedj 2024-02-25 09:29:35 +00:00
  • c80d429c42 Various server updates pudepiedj 2024-02-25 09:29:31 +00:00
  • fdfa5bc76b llama : add llama_kv_cache_compress (EXPERIMENTAL) Georgi Gerganov 2024-02-25 11:00:19 +02:00
  • 715a343343 llama : some llama_kv_cell simplifications Georgi Gerganov 2024-02-25 10:59:52 +02:00
  • 032ff85706 passkey : fix llama_kv_cache_seq_pos_max() usage Georgi Gerganov 2024-02-25 10:58:18 +02:00
  • 107bca3e2d cuda: improve gpu architecture detection for cuda version < 11.7 KyL0N 2024-02-25 17:38:47 +09:00
  • b8322be564 server: monitoring - add /metrics prometheus compatible endpoint Pierrick HYMBERT 2024-02-25 08:47:49 +01:00
  • e442c50c54 flake.lock: Update github-actions[bot] 2024-02-25 00:17:11 +00:00
  • 3471052504 server: logs lower case as other log messages Pierrick HYMBERT 2024-02-24 23:10:09 +01:00
  • 5ffcbd7346 server: logs reduce level VERBOSE to VERB to max 4 chars Pierrick HYMBERT 2024-02-24 23:07:50 +01:00
  • a0c57b39a5 fix typo Jared Van Bortel 2024-02-24 16:31:36 -05:00
  • 19891864e2 Support rotary_factor for LlavaStableLM Anas Ahouzi 2024-02-24 22:26:24 +01:00
  • 79959cae73 Add missing parenthesis Anas Ahouzi 2024-02-24 22:19:33 +01:00
  • a6dbec8822 Support layer_norm_eps for LlavaStableLM Anas Ahouzi 2024-02-24 22:06:08 +01:00
  • 48582575ab sync ngxson 2024-02-24 22:02:36 +01:00
  • d4a952e099 Fix hard coded layer_norm_eps Anas Ahouzi 2024-02-24 12:46:35 -08:00
  • b5e44777e7 server: logs ensure value json value does not raised error Pierrick HYMBERT 2024-02-24 21:17:17 +01:00
  • 440dd7aecd server: logs switch init logs to server logs macro Pierrick HYMBERT 2024-02-24 21:01:48 +01:00
  • fea35ac0a0 server: tests: output server logs in text Pierrick HYMBERT 2024-02-24 20:53:39 +01:00
  • b9bebddcef Merge branch 'master' into feature/server-logs-improvment Pierrick HYMBERT 2024-02-24 20:47:58 +01:00
  • 54e4271487 server: logs: allow to choose log format in json or plain text Pierrick HYMBERT 2024-02-24 20:20:45 +01:00
  • 8a529990fa Fix issues during StableLM models conversion Anas Ahouzi 2024-02-24 10:37:44 -08:00
  • 2e12ae7a3e Merge branch 'ggerganov:master' into master Radosław Gryta 2024-02-24 19:28:55 +01:00
  • 28d627b71b cmake: Add fix compilation for Android armeabi-v7a Radosław Gryta 2024-02-24 19:28:32 +01:00
  • 8b8b491179 Merge aea81772d6 into 9e359a4f47 Xuan Son Nguyen 2024-02-24 13:19:39 -05:00
  • 9e359a4f47 server: continue to update other slots on embedding concurrent request (#5699) b2254 Pierrick Hymbert 2024-02-24 19:16:04 +01:00
  • 04b3189377 server: logs: PR feedback on log level Pierrick HYMBERT 2024-02-24 19:07:45 +01:00
  • 04f4cbbd9e server: tests: adding OAI compatible embedding with multiple inputs Pierrick HYMBERT 2024-02-24 18:32:29 +01:00
  • 466987eb7b server: tests: adding OAI compatible embedding concurrent endpoint Pierrick HYMBERT 2024-02-24 18:06:32 +01:00
  • 4c4cb30736 IQ3_S: a much better alternative to Q3_K (#5676) b2253 Kawrakow 2024-02-24 16:23:52 +02:00
  • fb36d874ce server : fix tests regex patterns on M2 Ultra Georgi Gerganov 2024-02-24 15:24:45 +02:00
  • af1ba7ea88 server : fix compile warning Georgi Gerganov 2024-02-24 15:24:22 +02:00
  • cd5b922f26 server : log style consistency Georgi Gerganov 2024-02-24 15:05:15 +02:00
  • 84df4b58d9 server : no need to repeat log in comment Georgi Gerganov 2024-02-24 15:04:24 +02:00
  • f23aa90994 server : change message format of server_log() Georgi Gerganov 2024-02-24 15:03:08 +02:00
  • 65e65fd6c0 server : skip GH copilot requests from logging Georgi Gerganov 2024-02-24 15:02:25 +02:00
  • 04b5a92478 server: logs - always use JSON logger, add add thread_id in message, log task_id and slot_id Pierrick HYMBERT 2024-02-24 13:12:47 +01:00
  • 09b77b4c9e server: #5655 - continue to update other slots on embedding concurrent request. Pierrick HYMBERT 2024-02-24 13:01:48 +01:00
  • aea81772d6 first working version ngxson 2024-02-24 12:29:06 +01:00
  • 525213d2f5 server: init functional tests (#5566) b2252 Pierrick Hymbert 2024-02-24 12:28:55 +01:00
  • b75ec64ed2 llama : add llama_kv_cache_seq_pos_max() Georgi Gerganov 2024-02-24 12:54:29 +02:00
  • 18da970e1c llama : change name to llama_kv_cache_update() Georgi Gerganov 2024-02-24 12:46:33 +02:00
  • 79e276175e passkey : apply kv cache updates explicitly Georgi Gerganov 2024-02-24 12:44:02 +02:00
  • 8f9fe6dd7f llama : fix build Georgi Gerganov 2024-02-24 12:40:44 +02:00
  • 99163c83bd github issue template: add link to the tests server framework Pierrick HYMBERT 2024-02-24 11:26:39 +01:00
  • 5ed445283c server: tests: improved README.md Pierrick HYMBERT 2024-02-24 11:26:08 +01:00
  • a2a928c5a9 server: add link to tests in the README.md Pierrick HYMBERT 2024-02-24 11:25:50 +01:00