Commit graph

  • 15d5c257a0 fix cb_eval ngxson 2024-06-02 10:58:11 +02:00
  • aa44fb6d76 Update rpc-server.cpp nickp27 2024-06-02 18:04:52 +10:00
  • fe3f6958bd Fix crash when using split mode none and setting a main GPU 0cc4m 2024-06-02 09:50:02 +02:00
  • 6e0e0beb56 Update Vulkan CPU offload for MUL_MAT_ID 0cc4m 2024-06-02 09:05:47 +02:00
  • a95a6d995d receive review comments and modify caitianchi 2024-06-02 14:23:45 +08:00
  • eb589d5e36 llama : avoid copies for simple batch splits Francis Couture-Harpin 2024-06-01 23:05:13 -04:00
  • ce8524afd0 Merge branch 'ignore-gen-themes' into auto-model-support teleprint-me 2024-06-01 22:25:28 -04:00
  • e2b760800d chore: Add ignore rule for generated server themes teleprint-me 2024-06-01 22:23:20 -04:00
  • a23c72e4c0 fix ggml errors and make new ones Christian Zhou-Zheng 2024-06-01 22:19:33 -04:00
  • 250bddfa63 Merge branch 'master' into auto-model-support teleprint-me 2024-06-01 21:59:35 -04:00
  • 4e49532cc1 refine .gitignore zhou.weiguo 2024-06-02 09:35:14 +08:00
  • ea7a26e3d5 refine .gitignore zhou.weiguo 2024-06-02 09:22:41 +08:00
  • 3af9371811 convert-hf : match model part name prefix and suffix Francis Couture-Harpin 2024-06-01 20:22:40 -04:00
  • b67ea65983 tentatively translate the rest Christian Zhou-Zheng 2024-06-01 20:47:28 -04:00
  • 8564c1989a Update phi-3 GGUF file (obsolete since 917dc8c) jaime-m-p 2024-06-02 02:13:04 +02:00
  • 0e1f9734de translated everything but PCA (I think) Christian Zhou-Zheng 2024-06-01 19:50:46 -04:00
  • df623fffe8 interim fix memory leak Christian Zhou-Zheng 2024-06-01 18:36:54 -04:00
  • 3090c485b6 remove unnecessary multithreading Christian Zhou-Zheng 2024-06-01 18:32:14 -04:00
  • e141ce624a Fix FlashAttention debug test, FP32 assert (#7684) b3066 Johannes Gäßler 2024-06-01 23:26:10 +02:00
  • 544268888b in-series multithreading for prompt embedding? Christian Zhou-Zheng 2024-06-01 17:25:21 -04:00
  • 61200ef29f llama : fix edge case finding batch seq_id of split recurrent cell Francis Couture-Harpin 2024-06-01 16:41:22 -04:00
  • 451023633f Fix FlashAttention debug test, FP32 assert Johannes Gäßler 2024-06-01 22:19:28 +02:00
  • 2e666832e6 server : new UI (#7633) b3065 Yazan Agha-Schrader 2024-06-01 21:31:48 +02:00
  • 01c9229186 Refactor + add 'jina-v2' for testing 'lstrip' jaime-m-p 2024-06-01 21:22:57 +02:00
  • 18d1c14047 llama : minimize swaps when reordering logits Francis Couture-Harpin 2024-06-01 15:01:34 -04:00
  • ada961cec2 Implement 'rstrip' properly jaime-m-p 2024-06-01 20:30:42 +02:00
  • 33de247948 bugfix: assertions, wrong special token list jaime-m-p 2024-06-01 20:27:32 +02:00
  • 3ead1b9757 Using phi-3 for testing 'rstrip' jaime-m-p 2024-06-01 19:45:14 +02:00
  • cec6a3bde9 Add per token attrib enum jaime-m-p 2024-06-01 19:42:21 +02:00
  • 2c3d0b42f3 Fix MUL_MAT_ID matrix vector shader and dispatch code 0cc4m 2024-06-01 19:35:57 +02:00
  • 72eea49224 llama : fix batch split output count for embeddings Francis Couture-Harpin 2024-06-01 12:24:19 -04:00
  • 2ac95c9d56 SimpleChat: Simple histogram/repeatMatching driven garbageTrimming, Settings UI, Streaming mode, OpenAi Compat (Model, Authorization Bearer), Save/Restore session, Auto Settings UI (#7548) b3064 HanishKVC 2024-06-01 21:50:18 +05:30
  • 5d3c7b9585 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-06-01 11:51:41 -04:00
  • 3587a94987 llama : use equal-sequence-length sub-batches for recurrent models Francis Couture-Harpin 2024-06-01 11:37:14 -04:00
  • 750f60c03e CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681) b3063 Johannes Gäßler 2024-06-01 15:47:04 +02:00
  • 9002161509 CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 Johannes Gäßler 2024-06-01 10:33:21 +02:00
  • 4d78cff3ed Merge branch 'ggerganov:master' into patches Robert Sinclair 2024-06-01 16:15:50 +03:00
  • 1ac70bae7e Update server.cpp Robert Sinclair 2024-06-01 16:13:48 +03:00
  • c4141a593c SimpleChat: Update main README wrt usage with server HanishKVC 2024-06-01 00:38:06 +05:30
  • bb0f0c8a9a SimpleChat: AutoCreate ChatRequestOptions settings to an extent HanishKVC 2024-05-31 20:04:50 +05:30
  • bc68803605 SimpleChat:Show chat session restore button, only if saved session HanishKVC 2024-05-31 15:19:11 +05:30
  • 6ef57cc1bf SimpleChat:Readme updated wrt save and restore chat session info HanishKVC 2024-05-31 01:05:20 +05:30
  • 4abcfde4e7 SimpleChat:ODS:Move restore/load saved chat btn setup to Me HanishKVC 2024-05-31 00:47:20 +05:30
  • 5d408660f2 SimpleChat:ODS:WIP:TMP: Add UI to load previously saved chat HanishKVC 2024-05-31 00:34:05 +05:30
  • a15d4dc6a2 SimpleChat:ODS: Add a prefix to chatid wrt ondiskstorage key HanishKVC 2024-05-31 00:20:38 +05:30
  • e2efcb4fc2 SimpleChat: Add basic skeleton for saving and loading chat HanishKVC 2024-05-31 00:05:43 +05:30
  • 1d7739b7f5 SimpleChat: Allow for multiline system prompt HanishKVC 2024-05-30 10:47:18 +05:30
  • 3d925cbdc9 SimpleChat: Show Non SettingsUI config field by default HanishKVC 2024-05-30 10:00:28 +05:30
  • 803ee72b00 SimpleChat:UI: CreateDiv Divs map to GridX2 class HanishKVC 2024-05-30 09:41:43 +05:30
  • ec79b8d350 SimpleChat:Cleanup: Add spacing wrt shown req-options HanishKVC 2024-05-30 09:17:35 +05:30
  • 872ee2c73d SimpleChat: Save message internally in handle_response itself HanishKVC 2024-05-30 04:46:23 +05:30
  • cdb4f6d243 SimpleChat:theResp-origMsg: Undo a prev change to fix non trim HanishKVC 2024-05-30 04:32:35 +05:30
  • b75b3db7bf SimpleChat:WIP:Collate internally, Stream mode Trap exceptions HanishKVC 2024-05-30 03:51:28 +05:30
  • 009563d1d7 Readme: Add a entry for simplechat in the http server section HanishKVC 2024-05-30 00:54:38 +05:30
  • 48f02e0b5c SimpleChat: readme stream-utf-8 trim-english deps, exception2error HanishKVC 2024-05-30 00:39:58 +05:30
  • 0e7880a694 SimpleChat: model request field for openai/equivalent compat HanishKVC 2024-05-29 23:23:50 +05:30
  • 85fd2d0d84 SimpleChat: readme wrt authorization, maybe minimal openai testing HanishKVC 2024-05-29 23:16:25 +05:30
  • 7a0399e582 SimpleChat:UI+: Return div and element wrt creatediv helpers HanishKVC 2024-05-29 20:44:31 +05:30
  • af342b3bd0 SimpleChat: Allow Authorization header to be set by end user HanishKVC 2024-05-29 20:39:52 +05:30
  • c9559d2b26 SimpleChat: Rather need to use append to insert headers HanishKVC 2024-05-29 20:30:34 +05:30
  • dce4e6a64b SimpleChat: Move request headers into Me and gMe HanishKVC 2024-05-29 20:15:40 +05:30
  • f54e000039 SimpleChat: Add support for changing the base url HanishKVC 2024-05-29 19:49:56 +05:30
  • ebf978d2bf SimpleChat:UI: Add input element helper HanishKVC 2024-05-29 19:49:13 +05:30
  • 104848b097 SimpleChat: Move baseUrl to Me and inturn gMe HanishKVC 2024-05-29 19:04:14 +05:30
  • ace37042fa SimpleChat:MultiPart/Stream flow cleanup HanishKVC 2024-05-29 12:46:40 +05:30
  • fcd385c36a SimpleChat: Disable console debug by default by making it dummy HanishKVC 2024-05-28 23:16:18 +05:30
  • 07923745cf SimpleChat:HandleResponseMultiPart using NewLines helper HanishKVC 2024-05-28 21:54:45 +05:30
  • 7251714bcb SimpleChat:DU: Make NewLines shift more robust and flexible HanishKVC 2024-05-28 21:48:06 +05:30
  • b7a5424c13 SimpleChat:DU: Add NewLines helper class HanishKVC 2024-05-28 20:32:26 +05:30
  • 4d354556dc SimpleChat: show streamed generative text as it becomes available HanishKVC 2024-05-28 10:15:15 +05:30
  • 08b117b4a7 SimpleChat: Add MultiPart Response handling, common trimming HanishKVC 2024-05-28 09:20:35 +05:30
  • aecf0e23fd SimpleChat: Move multi part server response handling in HanishKVC 2024-05-28 09:01:20 +05:30
  • 8f97c23895 SimpleChat: Move handling oneshot mode server response HanishKVC 2024-05-28 08:46:32 +05:30
  • 9d0e65d16a SimpleChat:Stream:Initial handshake skeleton HanishKVC 2024-05-28 06:06:16 +05:30
  • 060925cda3 SimpleChat: Cleanup readme a bit, add one more chathistory length HanishKVC 2024-05-28 02:15:36 +05:30
  • f5f9a2b35e SimpleChat:DU: Bring in both trim garbage logics to try trim HanishKVC 2024-05-28 00:32:37 +05:30
  • 269cf3f596 SimpleChat:Move extracting assistant response to SimpleChat class HanishKVC 2024-05-27 20:15:41 +05:30
  • b2c10b960d SimpleChat: Cleanup a bit wrt Api end point related flow HanishKVC 2024-05-27 18:45:42 +05:30
  • f9fc543190 SimpleChat: highlight trim, garbage trimming bitmore aggressive HanishKVC 2024-05-27 18:07:36 +05:30
  • 42b4fe555e SimpleChat: GarbageTrim enable/disable, show trimmed part ifany HanishKVC 2024-05-27 04:18:51 +05:30
  • 1db965d00d SimpleChat: Update a bit wrt readme and notes in du HanishKVC 2024-05-27 03:00:39 +05:30
  • 452813f235 SimpleChat:UI:Settings make boolean button text show meaning HanishKVC 2024-05-27 02:05:49 +05:30
  • 0dae12ba6b SimpleChat:UI:Add settings button and bring in settings ui HanishKVC 2024-05-27 01:56:10 +05:30
  • e17f5e0204 SimpleChat:UI: Add Div wrapped label+element helpers HanishKVC 2024-05-27 01:40:52 +05:30
  • 94bc0b08d8 SimpleChat:UI:Select: dict-name-value, value wrt default, change HanishKVC 2024-05-27 00:55:02 +05:30
  • 1e47a48b30 SimpleChat:UI: Add Select helper and use it wrt ChatHistoryInCtxt HanishKVC 2024-05-27 00:46:42 +05:30
  • e42249d82d SimpleChat:UI: Helper to create bool button and use it wrt settings HanishKVC 2024-05-27 00:11:16 +05:30
  • ae7e66d27a SimpleChat:UI: Add and use a para-create-append helper HanishKVC 2024-05-26 22:27:35 +05:30
  • ed345abac8 SimpleChat:DU:Avoid setting frequence/Presence penalty HanishKVC 2024-05-26 22:09:32 +05:30
  • a41f701159 SimpleChat:UI: Move html ui base helpers into its own module HanishKVC 2024-05-26 18:19:24 +05:30
  • 15152af94f SimpleChat:DU: Cleanup debug log messages HanishKVC 2024-05-26 02:42:24 +05:30
  • ae9f610663 SimpleChat:DU: Bring in maxType to the mix along with maxUniq HanishKVC 2024-05-26 02:14:26 +05:30
  • d1e73d8777 SimpleChat:DU: Switch trim garbage hist based to maxUniq simple HanishKVC 2024-05-26 01:57:28 +05:30
  • f33aa28149 SimpleChat:DU: Try trim using histogram based info HanishKVC 2024-05-26 01:07:07 +05:30
  • 6390f3489a SimpleChat:DU:TrimGarbage if unable try skip char and retry HanishKVC 2024-05-25 23:37:24 +05:30
  • 54802dc184 SimpleChat:DU: Add trim garbage at end in loop helper HanishKVC 2024-05-25 21:45:54 +05:30
  • c83c19ad4c SimpleChat:DU:BringIn local helper js modules using importmap HanishKVC 2024-05-25 20:28:38 +05:30
  • 715fb750df Update rpc-server.cpp to include SYCL backend nickp27 2024-06-01 20:57:48 +10:00
  • 9b596417af CUDA: quantized KV support for FA vec (#7527) Johannes Gäßler 2024-06-01 08:44:14 +02:00
  • 86842b20e5 fix compiler warnings Christian Zhou-Zheng 2024-05-31 22:25:46 -04:00