Commit graph

  • ef3ced26a3 [SYCL] Add q3_s and q1_s (#5886) b2394 Abhilash Majumder 2024-03-11 10:27:56 +05:30
  • e1ed7a04d6 json: add date, time, date-time formats ochafik 2024-03-11 04:03:05 +00:00
  • 989e15b3c1 Merge branch 'master' into sycl_q3s_q1s sycl_q3s_q1s Abhilash Majumder 2024-03-11 08:41:35 +05:30
  • 9a61802a28 json: add date format + fix uuid ochafik 2024-03-11 02:58:14 +00:00
  • 5125896beb Revert "move cfloat to try and fix ci build" Bruce MacDonald 2024-03-10 22:54:02 -04:00
  • 7e72b9915d server: maintain chat completion id for streaming responses Minsoo Cheong 2024-03-11 11:51:15 +09:00
  • 54a704a041 move cfloat to try and fix ci build Bruce MacDonald 2024-03-10 22:38:06 -04:00
  • d736e928d2 json: support prefixItems alongside array items ochafik 2024-03-11 02:32:58 +00:00
  • a8a922ca18 move repeated llama_file logic to llama.cpp Bruce MacDonald 2024-03-10 22:17:00 -04:00
  • 56b8744158 Update ts-type-to-grammar.sh ochafik 2024-03-11 02:11:22 +00:00
  • c8254e5f8a json: port fixes from mjs to python ochafik 2024-03-11 02:10:48 +00:00
  • 4e2d06c741 json: updated server & chat ( cd examples/server && ./deps.sh ) ochafik 2024-03-11 01:51:26 +00:00
  • 5389820453 Update json-schema-to-grammar.mjs ochafik 2024-03-11 01:47:22 +00:00
  • 3814a07392 [SYCL] Add support for SYCL Nvidia target (#5738) b2393 AidanBeltonS 2024-03-11 01:13:57 +00:00
  • 11813a6b0a json: rm trailing spaces ochafik 2024-03-11 00:27:50 +00:00
  • 0e9494183b json: custom regex parser, adds dot support & JS-portable ochafik 2024-03-11 00:24:34 +00:00
  • f8a0b9a8fe fix: make compiling with LLAMA_METAL_EMBED_LIBRARY work when embedding llama.cpp in another project Gilad S 2024-03-11 00:37:39 +02:00
  • bb6d00bbf9 metal : move mm_id indices to shared mem (#5982) b2392 Georgi Gerganov 2024-03-10 23:12:48 +02:00
  • 7ab7b733bb android : fix utf8 decoding error (#5935) b2391 Dean 2024-03-11 04:03:17 +08:00
  • 13d21fa4bf android : minor Georgi Gerganov 2024-03-10 22:02:44 +02:00
  • 192dd23835 metal : move mm_id indeces to shared mem Georgi Gerganov 2024-03-10 21:43:15 +02:00
  • d9f65c97c3 readme : update hot topics Georgi Gerganov 2024-03-10 20:58:26 +02:00
  • 13a39058d3 quantize: fix F16/F32 downcast to q6_K Johannes Gäßler 2024-03-10 19:50:11 +01:00
  • b838b53ad6 sync : ggml b2389 Georgi Gerganov 2024-03-10 20:10:46 +02:00
  • df4dc3e7cb ggml : try fix 32-bit arm compat (whisper/1938) Georgi Gerganov 2024-03-08 23:45:07 +02:00
  • bf47a5eefc ggml : remove __constant__ specifier for CUDA tables (#5940) b2387 Georgi Gerganov 2024-03-10 20:09:24 +02:00
  • 27b1fefdf4 Delete commit.txt ochafik 2024-03-10 17:44:46 +00:00
  • 478f62ef5c json: support negative ranges in patterns ochafik 2024-03-10 17:35:32 +00:00
  • d1fda6f450 json: simplify range escapes ochafik 2024-03-10 17:32:45 +00:00
  • 80f66a8af7 llama_vocab_type update (renamed the new key) Michael Podvitskiy 2024-03-10 18:32:32 +01:00
  • 0c69016171 converter scrypt fixes Michael Podvitskiy 2024-03-10 18:28:10 +01:00
  • f57b467c74 json: add --allow-fetch ochafik 2024-03-10 17:20:05 +00:00
  • 54291e10d0 json: fix literal escapes ochafik 2024-03-10 17:19:27 +00:00
  • fa8a809a91 server: ci: windows build and tests (#5968) b2386 Pierrick Hymbert 2024-03-10 18:17:47 +01:00
  • e8f25d6f0c json: handle uuid string format ochafik 2024-03-10 16:50:06 +00:00
  • 37b59d1d3b json: reuse regexp pattern subrules ochafik 2024-03-10 16:49:53 +00:00
  • e8b78c28eb json: revert space to 1 at most ochafik 2024-03-10 16:49:15 +00:00
  • ade339d55e json: accept duplicate identical rules ochafik 2024-03-10 16:48:56 +00:00
  • dab2ea91a6 json: simplify nullable fields handling ochafik 2024-03-10 16:48:27 +00:00
  • bcebd7dbf6 llama : add support for GritLM (#5959) b2385 DAN™ 2024-03-10 11:56:30 -04:00
  • ecad2afbdd llama : minor Georgi Gerganov 2024-03-10 17:55:32 +02:00
  • 8ee58929fc gritml : minor Georgi Gerganov 2024-03-10 17:51:57 +02:00
  • 8597caa685 Update ts-type-to-grammar.sh ochafik 2024-03-10 15:47:03 +00:00
  • ce05fff8ec Merge branch 'master' into HEAD Georgi Gerganov 2024-03-10 17:46:16 +02:00
  • 364bf9ec3d Update ts-type-to-grammar.sh ochafik 2024-03-10 15:44:51 +00:00
  • 5764d9ffbc Update json-schema-to-grammar.py ochafik 2024-03-10 15:33:59 +00:00
  • 2960eae847 grammar : verify parsed state (#5950) b2384 Clint Herron 2024-03-10 11:17:43 -04:00
  • ee492c9e4d Merge remote-tracking branch 'origin/master' into json-fixes ochafik 2024-03-10 15:01:23 +00:00
  • 307110ad2c Update json-schema-to-grammar.py ochafik 2024-03-10 15:00:07 +00:00
  • f37ad0a043 json: handle schema from pydantic Optional fields ochafik 2024-03-10 14:55:03 +00:00
  • ee6854a71e Merge 10c477b8a8 into c78541479c Xuan Son Nguyen 2024-03-10 16:49:07 +02:00
  • c78541479c nix: update flake.lock (#5969) Georgi Gerganov 2024-03-10 16:43:08 +02:00
  • ba57964f92 Update json-schema-to-grammar.py ochafik 2024-03-10 14:42:39 +00:00
  • b061de52a7 Update json-schema-to-grammar.py ochafik 2024-03-10 13:49:27 +00:00
  • 259f3505bc Update json-schema-to-grammar.py ochafik 2024-03-10 13:38:40 +00:00
  • 1cde8ded7c json: extract repeated regexp patterns to subrule ochafik 2024-03-10 13:29:56 +00:00
  • add8fee04a Create regex-to-grammar.py ochafik 2024-03-10 13:23:00 +00:00
  • e9d251ef5f update readme ngxson 2024-03-10 13:01:44 +01:00
  • 0e728796bf update webui ngxson 2024-03-10 12:57:32 +01:00
  • f22a18c90a server: tests: ci windows: pid exists better handling Pierrick HYMBERT 2024-03-10 09:36:37 +01:00
  • eea4f20f06 server: tests: remove dependency to killall Pierrick HYMBERT 2024-03-10 09:24:52 +01:00
  • 479ba3c064 server: tests: server kill, if pid exists Pierrick HYMBERT 2024-03-10 09:23:45 +01:00
  • c12ea239a9 server: tests: remove wrong comment on server starting, close_fds is always true Pierrick HYMBERT 2024-03-10 09:05:02 +01:00
  • accbdac394 server: tests: remove python2 unicode string Pierrick HYMBERT 2024-03-10 09:02:14 +01:00
  • 2071f85340 server: tests: server graceful shutdown, then kill, then hard kill Pierrick HYMBERT 2024-03-10 09:00:49 +01:00
  • 1b0d1577f5 Use builti Pierrick Hymbert 2024-03-10 09:03:07 +01:00
  • be2faadfda Merge branch 'ggerganov:master' into master hsnmkls 2024-03-10 15:40:02 +08:00
  • 7206589e02 add paralled example exe Hasan Mukhlis 2024-03-10 15:37:47 +08:00
  • 057cc747d8 server: benchmark: chat/completions scenario and other llm servers comparison (#5941) Pierrick Hymbert 2024-03-09 23:41:49 +01:00
  • cb88292884 server : print chat template info Georgi Gerganov 2024-03-09 22:04:00 +02:00
  • fa4ffdc0c7 perplexity : support using multiple sequences to allow larger batch sizes (#5946) slaren 2024-03-09 19:55:54 +01:00
  • 1792a60645 readme : update hot topics Georgi Gerganov 2024-03-09 18:14:13 +02:00
  • 1d0795eb6f ggml : fix unnecessary f32 -> f16 -> f32 casts (mmla) (#5951) Georgi Gerganov 2024-03-09 17:36:20 +02:00
  • 0339d9a675 server : fix metrics init (#5964) Georgi Gerganov 2024-03-09 17:34:15 +02:00
  • e1b5c210d1 ggml : remove old quantization functions (#5942) Georgi Gerganov 2024-03-09 15:53:59 +02:00
  • 303515a13d server : clarify some items in the readme (#5957) Georgi Gerganov 2024-03-09 15:47:47 +02:00
  • 373f9169ca server : normalize embeddings (#5956) SeungWon Jeong 2024-03-09 21:27:58 +09:00
  • f0fa2370cb tests : gitignore ggml-common.h Georgi Gerganov 2024-03-09 14:17:11 +02:00
  • c1eddf621e server : fix passing prompt as tokens (#5955) Alexey Parfenov 2024-03-09 11:16:53 +00:00
  • 005364ef8f ggml : add ggml-common.h to deduplicate shared code (#5940) Georgi Gerganov 2024-03-09 12:47:57 +02:00
  • 9db4c93b4f server : simplify logic for empty prompts (#5953) Georgi Gerganov 2024-03-09 12:34:18 +02:00
  • 65c2636440 Server: reorganize some http logic (#5939) Xuan Son Nguyen 2024-03-09 11:27:53 +01:00
  • c6cf089896 server : add SSL support (#5926) Gabe Goodhart 2024-03-09 02:57:09 -07:00
  • 76f630308f server: tests: add truncated prompt tests, better kv cache size (#5933) Pierrick Hymbert 2024-03-09 10:30:04 +01:00
  • e0719f6180 llama : support Mamba Selective State Space Models (#5328) compilade 2024-03-08 17:31:00 -05:00
  • 14807954b4 llama : fix quantization of shared token_embd (#5944) compilade 2024-03-08 10:53:37 -05:00
  • a318c84460 server: metrics: add llamacpp:prompt_seconds_total and llamacpp:tokens_predicted_seconds_total, reset bucket only on /metrics. Fix values cast to int. Add Process-Start-Time-Unix header. (#5937) Pierrick Hymbert 2024-03-08 12:25:04 +01:00
  • e30f26b988 llama : assume tied weights if lm_head/output weights is missing (#5824) Don Mahurin 2024-03-08 02:41:50 -08:00
  • 2aba0cb697 server : fix EOS token detection with disabled cache (#5938) Georgi Gerganov 2024-03-08 12:40:02 +02:00
  • d3c6811adc log : fix MSVC compile errors (#5643) UEXTM.com 2024-03-08 04:35:04 -05:00
  • 5d5962c932 llama-bench : add embeddings option (#5924) Georgi Gerganov 2024-03-07 16:32:38 +02:00
  • d4ef64e74d Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" (#5918) Neo Zhang Jianyu 2024-03-07 19:14:49 +08:00
  • fefe748283 server : add /v1/completions endpoint (#5914) Minsoo Cheong 2024-03-07 19:42:39 +09:00
  • a057bff5c8 server : refactor (#5882) Georgi Gerganov 2024-03-07 11:41:53 +02:00
  • 71917d8358 [SYCL] fix error when set main gpu to non-zero (#5901) Neo Zhang Jianyu 2024-03-07 16:34:31 +08:00
  • 8994d689e2 ggml : use SYS_get_cpu if SYS_getcpu is not defined (#5906) Jared Van Bortel 2024-03-06 15:42:23 -05:00
  • acb4657a3a ggml : use uint8x16_t return type for ggml_vqtbl1q_u8 (#5894) bobqianic 2024-03-06 07:35:07 +00:00
  • 1f745cd87c convert : remove AWQ remnants (#5768) Georgi Gerganov 2024-03-06 09:12:25 +02:00
  • 57b7f51a8e add wait() to make code stable (#5895) Neo Zhang Jianyu 2024-03-06 12:08:32 +08:00
  • 906a743113 compare-llama-bench.py : remove mul_mat_q (#5892) slaren 2024-03-05 22:27:29 +01:00