Commit graph

  • 2c4f566c88
    tests : gitignore ggml-common.h Georgi Gerganov 2024-03-09 14:17:11 +02:00
  • e41add2238
    server : clarify some items in the readme Georgi Gerganov 2024-03-09 13:51:23 +02:00
  • 0db32beaf0
    server : fix passing prompt as tokens (#5955) b2372 Alexey Parfenov 2024-03-09 11:16:53 +00:00
  • f2118a0e61
    Update examples/server/server.cpp Georgi Gerganov 2024-03-09 13:16:28 +02:00
  • 08d2ea1edb output normalize embedding in '/v1/embeddings' Seungwon 2024-03-09 20:14:30 +09:00
  • 46132b81f3
    server: fix passing prompt as tokens ZXED 2024-03-09 13:58:20 +03:00
  • 8a3012a4ad
    ggml : add ggml-common.h to deduplicate shared code (#5940) b2371 Georgi Gerganov 2024-03-09 12:47:57 +02:00
  • 9674aaf35c
    server : simplify logic for empty prompts (#5953) b2370 Georgi Gerganov 2024-03-09 12:34:18 +02:00
  • 950ba1ab84
    Server: reorganize some http logic (#5939) b2369 Xuan Son Nguyen 2024-03-09 11:27:53 +01:00
  • 51f80eb7ee Merge branch 'master' into xsn/cleanup_oai ngxson 2024-03-09 11:20:44 +01:00
  • a5b6086065
    Merge remote-tracking branch 'upstream/master' into errors ZXED 2024-03-09 13:05:57 +03:00
  • c4d1b5aaf1 server: bench: fix assistant message sent instead of user message Pierrick HYMBERT 2024-03-09 11:04:27 +01:00
  • 7809dbca32
    server: error handling: always use new format for API errors ZXED 2024-03-09 13:00:46 +03:00
  • ba7114c0e8 server: bench: fix assistant message sent instead of user message Pierrick HYMBERT 2024-03-09 10:57:33 +01:00
  • 29c635b411 server: bench: allow to filter out conversation in the dataset based on env variable Pierrick HYMBERT 2024-03-09 10:57:14 +01:00
  • e1fa9569ba
    server : add SSL support (#5926) b2368 Gabe Goodhart 2024-03-09 02:57:09 -07:00
  • 28dae045b5
    server : simplify logic for empty prompts Georgi Gerganov 2024-03-09 11:44:31 +02:00
  • fd72d2d2a5
    server: tests: add truncated prompt tests, better kv cache size (#5933) b2367 Pierrick Hymbert 2024-03-09 10:30:04 +01:00
  • 46148592bc
    server, tests : update regex Georgi Gerganov 2024-03-09 11:28:39 +02:00
  • bd5cca11b0
    server: error handling: convert all unhandled exceptions to API errors ZXED 2024-03-09 12:07:44 +03:00
  • a4f25f8793
    server: error handling: fix double free of ctx_sampling ZXED 2024-03-09 12:01:23 +03:00
  • a4b0d107d3 server: bench: filter dataset too short and too long sequences Pierrick HYMBERT 2024-03-09 09:56:31 +01:00
  • 06e225f843 server: bench: doc add an option to debug http request Pierrick HYMBERT 2024-03-09 09:55:11 +01:00
  • 572758a665 server: bench: change gauge custom metrics to trend; server: bench: add trend custom metrics for total tokens per second average Pierrick HYMBERT 2024-03-09 09:15:15 +01:00
  • bed1cdda9a server: bench: change gauge custom metrics to trend Pierrick HYMBERT 2024-03-09 08:58:22 +01:00
  • a179e55256
    Merge branch 'master' into master Robert Washbourne 2024-03-09 02:53:27 -05:00
  • 9914a71e34
    ggml : fix unnecessary f32 -> f16 -> f32 casts (mmla) Georgi Gerganov 2024-03-09 09:38:15 +02:00
  • 492ad4b0e0 Fix Vulkan no kv offload incoherence 0cc4m 2024-03-09 07:41:36 +01:00
  • ac07f7d0f7 set cparams.n_parallel to the number of sequences slaren 2024-03-09 03:42:37 +01:00
  • f425240e1d server: bench: fix doc Pierrick HYMBERT 2024-03-09 01:23:52 +01:00
  • 7d999555b7 perplexity : support using multiple sequences to allow larger batch sizes slaren 2024-03-08 22:49:24 +01:00
  • ab0a59d6d3 server: bench: remove llamacpp_completions_tokens_seconds as it include prompt processing time and it's misleading Pierrick HYMBERT 2024-03-09 01:09:56 +01:00
  • 548bc9635a server: bench: PR feedback and improved k6 script configuration Pierrick HYMBERT 2024-03-09 00:13:54 +01:00
  • c2101a2e90
    llama : support Mamba Selective State Space Models (#5328) b2366 compilade 2024-03-08 17:31:00 -05:00
  • 09d3b658d0 fix code style ngxson 2024-03-08 23:30:09 +01:00
  • 1a7c5fd50a fix test case CORS Options ngxson 2024-03-08 22:37:49 +01:00
  • 0b822b6a0f server: bench: EOL EOF Pierrick HYMBERT 2024-03-08 19:49:49 +01:00
  • 444f32e370
    server: error handling: rename error fields to match OpenAI API ZXED 2024-03-08 20:32:52 +03:00
  • 8a8aaee714
    server: error handling: fixes after merge ZXED 2024-03-08 20:26:27 +03:00
  • 39579d3ceb mamba : move state_seq and state_mask views outside layer loop Francis Couture-Harpin 2024-03-08 12:24:11 -05:00
  • fb838636d1
    Merge remote-tracking branch 'upstream/master' into errors ZXED 2024-03-08 20:12:35 +03:00
  • 616948d5b8 Add LLAMA_SERVER_SSL variable setup to top-level Makefile Gabe Goodhart 2024-03-08 09:53:44 -07:00
  • a7bfcb2e61 Update readme for SSL support in server Gabe Goodhart 2024-03-08 09:53:29 -07:00
  • 1d822b0890
    Merge 14d757066b into 515f7d0d4f Georgi Gerganov 2024-03-09 00:28:16 +08:00
  • 7bb531421f
    increase grid space Abhilash Majumder 2024-03-08 21:45:57 +05:30
  • 3e5685f7ea readme : add Mamba to supported models, and add recent API changes Francis Couture-Harpin 2024-03-08 11:03:37 -05:00
  • 515f7d0d4f
    llama : fix quantization of shared token_embd (#5944) b2365 compilade 2024-03-08 10:53:37 -05:00
  • 1ea68ab495
    sycl : try to fix build Georgi Gerganov 2024-03-08 17:43:42 +02:00
  • d0d32dced9 convert-hf : omit output.weight when identical with token_embd.weight Francis Couture-Harpin 2024-03-08 10:06:33 -05:00
  • 8d3db362cb llama : fix quantization of shared token_embd Francis Couture-Harpin 2024-03-08 10:02:38 -05:00
  • 7445bb8ad9 fix embedding response ngxson 2024-03-08 15:17:11 +01:00
  • bd52541110 correct http verb for endpoints ngxson 2024-03-08 15:11:34 +01:00
  • a1351efc0b Merge branch 'master' into xsn/cleanup_oai ngxson 2024-03-08 14:59:56 +01:00
  • 1866e18513 merge embedding handlers ngxson 2024-03-08 14:55:54 +01:00
  • a167b6df7d
    ggml : minor Georgi Gerganov 2024-03-08 15:43:21 +02:00
  • 97ff2abc0e Resolve merge conflicts in server pudepiedj 2024-03-08 13:38:16 +00:00
  • 2498f6ad50 fix error C2065: '__fp16': undeclared identifier Michael Podvitskiy 2024-03-08 11:55:37 +01:00
  • 8dd390519b fix warning C4146: unary minus operator applied to unsigned type, result still unsigned Michael Podvitskiy 2024-03-08 11:49:23 +01:00
  • bbde9e2269 fix error C2078: too many initializers with ggml_vld1q_u32 macro for MSVC ARM64 Michael Podvitskiy 2024-03-08 11:46:31 +01:00
  • 11dcc4144a windows arm ci Michael Podvitskiy 2024-03-08 01:00:12 +01:00
  • 07f120eb4e use set_pre_routing_handler for validate_api_key ngxson 2024-03-08 14:04:11 +01:00
  • a6d2611624 utils.hpp refactored pudepiedj 2024-03-08 12:53:28 +00:00
  • 68d1d8fe28 server: bench: Init a bench scenario with K6. See #5827 Pierrick HYMBERT 2024-03-08 13:16:16 +01:00
  • ddc6397a20
    ggml : minor Georgi Gerganov 2024-03-08 14:47:20 +02:00
  • f44aa1f23a Merge remote-tracking branch 'origin/master' into server_branch pudepiedj 2024-03-08 12:31:31 +00:00
  • b39b443ba6
    sycl : reuse quantum tables Georgi Gerganov 2024-03-08 14:19:46 +02:00
  • fc427b724c
    scripts : update sync scripts Georgi Gerganov 2024-03-08 13:40:21 +02:00
  • e2a4760bd8
    ggml : add ggml-common.h to shared code Georgi Gerganov 2024-03-08 13:33:52 +02:00
  • 76e868821a
    server: metrics: add llamacpp:prompt_seconds_total and llamacpp:tokens_predicted_seconds_total, reset bucket only on /metrics. Fix values cast to int. Add Process-Start-Time-Unix header. (#5937) b2364 Pierrick Hymbert 2024-03-08 12:25:04 +01:00
  • d8a8bd4cc6 refactor static file handler ngxson 2024-03-08 12:06:13 +01:00
  • 55f13005f2 Change host IP pudepiedj 2024-03-08 11:01:28 +00:00
  • e457fb3540
    llama : assume tied weights if lm_head/output weights is missing (#5824) b2363 Don Mahurin 2024-03-08 02:41:50 -08:00
  • af37fd8b30
    server : fix EOS token detection with disabled cache (#5938) b2362 Georgi Gerganov 2024-03-08 12:40:02 +02:00
  • 8f7c98ba3d Fix errors Aidan 2024-03-04 12:00:01 +00:00
  • 9745ac3b42 Update sycl read-me for Nvidia target Aidan 2024-02-26 16:46:34 +00:00
  • acaf1ac2d5 Add support for nvidia target in CMake Aidan 2024-02-26 16:16:17 +00:00
  • b2fe31a572
    server : fix EOS token detection with disabled cache Georgi Gerganov 2024-03-08 12:19:14 +02:00
  • 3700bc3326 Yet more Llamaserver.py indentation fixes pudepiedj 2024-03-08 10:06:53 +00:00
  • e06a3d5e1b More Llamaserver.py indent fixes pudepiedj 2024-03-08 10:03:08 +00:00
  • c31e3b89c7 Fix Llamaserver.py indentation pudepiedj 2024-03-08 09:58:55 +00:00
  • ab385fd812 server: metrics: add llamacpp:prompt_seconds_total and llamacpp:tokens_predicted_seconds_total, reset bucket only on /metrics. Fix values cast to int. Add Process-Start-Time-Unix header. Closes #5850 Pierrick HYMBERT 2024-03-07 16:11:17 +01:00
  • 3d938e8803 New Carmichaels pudepiedj 2024-03-08 09:55:28 +00:00
  • 8d7ea8ec68 examples: fix utf8 decoding error zhangfuwen 2024-03-08 17:25:18 +08:00
  • 581ed5c4fe
    log : fix MSVC compile errors (#5643) b2361 UEXTM.com 2024-03-08 04:35:04 -05:00
  • 13ee965026 Carmichael revision pudepiedj 2024-03-08 09:31:09 +00:00
  • 4b174af020 Sorting lint pudepiedj 2024-03-08 09:30:48 +00:00
  • 3690db5f43 server: tests: add truncated prompt tests, better size Pierrick HYMBERT 2024-03-08 10:19:31 +01:00
  • 1c8ea55843 mamba : add missing spaces Francis Couture-Harpin 2024-03-07 22:29:45 -05:00
  • 17e4d6c96a mamba : rename metadata to be more similar to transformers library Francis Couture-Harpin 2024-03-07 21:32:48 -05:00
  • 660e8321f5 Update json-schema-to-grammar.mjs ochafik 2024-03-08 01:28:47 +00:00
  • d8024a486b convert-hf : support new metadata keys for Mamba Francis Couture-Harpin 2024-03-07 20:28:42 -05:00
  • 2c65d77583 restore locale after opening windows file Bruce MacDonald 2024-03-07 17:27:02 -05:00
  • accdc9bb23 remove trailing whitespace Bruce MacDonald 2024-03-07 16:46:10 -05:00
  • d90b523a0b Add Unicode model filename support for Windows Bruce MacDonald 2024-03-07 16:38:14 -05:00
  • 7cd5a1f986 server : fix cache_tokens not getting correctly resized Francis Couture-Harpin 2024-03-07 13:52:58 -05:00
  • 916b586386 Merge branch 'master' into support-mamba-ssm Francis Couture-Harpin 2024-03-07 10:56:26 -05:00
  • 0e94d72f12 add flags for ssl key/cert files and use SSLServer if set Gabe Goodhart 2024-03-06 15:45:41 -07:00
  • 1d56060f8c add cmake build toggle to enable ssl support in server Gabe Goodhart 2024-03-06 15:45:09 -07:00
  • bd3d9fbfed allow to toggle embedding mode Douglas Hanley 2024-03-07 11:55:27 -06:00
  • 0ba20ed97a
    llama : compute BERT graph with F16 K, V gg/bert-f16 Georgi Gerganov 2024-03-05 21:22:20 +02:00