2439 commits

Author SHA1 Message Date
ochafik
d934adccea Update json-schema-to-grammar.cpp 2024-03-11 23:09:51 +00:00
ochafik
cb364ef542 Merge branch 'json-fixes' into json-fixes-cpp 2024-03-11 22:23:19 +00:00
Olivier Chafik
51ca7cb863 json: nits 2024-03-11 22:20:19 +00:00
Olivier Chafik
d0dd75c902 json: port schema converter to C++, wire in ./server 2024-03-11 22:19:26 +00:00
Olivier Chafik
b816734c17 json: preserve order of props from TS defs 2024-03-11 11:48:08 +00:00
ochafik
e1ed7a04d6 json: add date, time, date-time formats 2024-03-11 04:03:05 +00:00
ochafik
9a61802a28 json: add date format + fix uuid 2024-03-11 02:58:14 +00:00
ochafik
d736e928d2 json: support prefixItems alongside array items 2024-03-11 02:32:58 +00:00
ochafik
56b8744158 Update ts-type-to-grammar.sh 2024-03-11 02:11:22 +00:00
ochafik
c8254e5f8a json: port fixes from mjs to python 2024-03-11 02:10:48 +00:00
ochafik
4e2d06c741 json: updated server & chat ( cd examples/server && ./deps.sh ) 2024-03-11 01:51:26 +00:00
ochafik
5389820453 Update json-schema-to-grammar.mjs 2024-03-11 01:47:22 +00:00
ochafik
11813a6b0a json: rm trailing spaces 2024-03-11 00:27:50 +00:00
ochafik
0e9494183b json: custom regex parser, adds dot support & JS-portable 2024-03-11 00:24:34 +00:00
ochafik
27b1fefdf4 Delete commit.txt 2024-03-10 17:44:46 +00:00
ochafik
478f62ef5c json: support negative ranges in patterns 2024-03-10 17:35:32 +00:00
ochafik
d1fda6f450 json: simplify range escapes 2024-03-10 17:32:45 +00:00
ochafik
f57b467c74 json: add --allow-fetch 2024-03-10 17:20:05 +00:00
ochafik
54291e10d0 json: fix literal escapes 2024-03-10 17:19:27 +00:00
ochafik
e8f25d6f0c json: handle uuid string format 2024-03-10 16:50:06 +00:00
ochafik
37b59d1d3b json: reuse regexp pattern subrules 2024-03-10 16:49:53 +00:00
ochafik
e8b78c28eb json: revert space to 1 at most 2024-03-10 16:49:15 +00:00
ochafik
ade339d55e json: accept duplicate identical rules 2024-03-10 16:48:56 +00:00
ochafik
dab2ea91a6 json: simplify nullable fields handling 2024-03-10 16:48:27 +00:00
ochafik
8597caa685 Update ts-type-to-grammar.sh 2024-03-10 15:47:03 +00:00
ochafik
364bf9ec3d Update ts-type-to-grammar.sh 2024-03-10 15:44:51 +00:00
ochafik
5764d9ffbc Update json-schema-to-grammar.py 2024-03-10 15:33:59 +00:00
ochafik
ee492c9e4d Merge remote-tracking branch 'origin/master' into json-fixes 2024-03-10 15:01:23 +00:00
ochafik
307110ad2c Update json-schema-to-grammar.py 2024-03-10 15:00:07 +00:00
ochafik
f37ad0a043 json: handle schema from pydantic Optional fields 2024-03-10 14:55:03 +00:00
Georgi Gerganov
c78541479c
nix: update flake.lock (#5969)
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/1536926ef5621b09bba54035ae2bb6d806d72ac8' (2024-02-29)
  → 'github:NixOS/nixpkgs/9df3e30ce24fd28c7b3e2de0d986769db5d6225d' (2024-03-06)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-03-10 16:43:08 +02:00
ochafik
ba57964f92 Update json-schema-to-grammar.py 2024-03-10 14:42:39 +00:00
ochafik
b061de52a7 Update json-schema-to-grammar.py 2024-03-10 13:49:27 +00:00
ochafik
259f3505bc Update json-schema-to-grammar.py 2024-03-10 13:38:40 +00:00
ochafik
1cde8ded7c json: extract repeated regexp patterns to subrule 2024-03-10 13:29:56 +00:00
ochafik
add8fee04a Create regex-to-grammar.py 2024-03-10 13:23:00 +00:00
Pierrick Hymbert
621e86b331
server: benchmark: chat/completions scenario and other llm servers comparison (#5941)
* server: bench: Init a bench scenario with K6
See #5827

* server: bench: EOL EOF

* server: bench: PR feedback and improved k6 script configuration

* server: bench: remove llamacpp_completions_tokens_seconds as it includes prompt processing time and is misleading

server: bench: add max_tokens from SERVER_BENCH_MAX_TOKENS

server: bench: increase truncated rate to 80% before failing

* server: bench: fix doc

* server: bench: change gauge custom metrics to trend

* server: bench: change gauge custom metrics to trend
server: bench: add trend custom metrics for total tokens per second average

* server: bench: doc add an option to debug http request

* server: bench: filter dataset too short and too long sequences

* server: bench: allow to filter out conversation in the dataset based on env variable

* server: bench: fix assistant message sent instead of user message

* server: bench: fix assistant message sent instead of user message

* server : add defrag thold parameter

* server: bench: select prompts based on the current iteration id not randomly to make the bench more reproducible

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-09 23:41:49 +01:00
Georgi Gerganov
77d1ac7e00 server : print chat template info 2024-03-09 22:04:00 +02:00
slaren
d894f352bf
perplexity : support using multiple sequences to allow larger batch sizes (#5946)
* perplexity : support using multiple sequences to allow larger batch sizes

ggml-ci

* set cparams.n_parallel to the number of sequences

* print tested n_ctx, add assert
2024-03-09 19:55:54 +01:00
Georgi Gerganov
098dbaab44 readme : update hot topics 2024-03-09 18:14:13 +02:00
Georgi Gerganov
8380ecfb21 ggml : fix unnecessary f32 -> f16 -> f32 casts (mmla) (#5951) 2024-03-09 17:36:20 +02:00
Georgi Gerganov
58308a0ecc server : fix metrics init (#5964) 2024-03-09 17:34:15 +02:00
Georgi Gerganov
5b09797321
ggml : remove old quantization functions (#5942)
* ggml : remove old quantization functions

ggml-ci

* ggml : simplify ggml_quantize_chunk

ggml-ci

* ggml : restrict correctness

ggml-ci

* ggml : remove hist data from the quantization API

ggml-ci

* tests : remove hist usage in test-backend-ops

ggml-ci

* vulkan : remove hist and fix typo
2024-03-09 15:53:59 +02:00
Georgi Gerganov
97c09585d6
server : clarify some items in the readme (#5957)
* server : clarify some items in the readme

* server : fix typo
2024-03-09 15:47:47 +02:00
SeungWon Jeong
fb215c3832
server : normalize embeddings (#5956)
* output normalize embedding in '/v1/embeddings'

* common : reuse llama_embd_normalize

* common : better normalize impl

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-09 14:27:58 +02:00
Georgi Gerganov
2c4f566c88 tests : gitignore ggml-common.h 2024-03-09 14:17:11 +02:00
Alexey Parfenov
0db32beaf0
server : fix passing prompt as tokens (#5955)
* server: fix passing prompt as tokens

* Update examples/server/server.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-09 13:16:53 +02:00
Georgi Gerganov
8a3012a4ad
ggml : add ggml-common.h to deduplicate shared code (#5940)
* ggml : add ggml-common.h to shared code

ggml-ci

* scripts : update sync scripts

* sycl : reuse quantum tables

ggml-ci

* ggml : minor

* ggml : minor

* sycl : try to fix build
2024-03-09 12:47:57 +02:00
Georgi Gerganov
9674aaf35c server : simplify logic for empty prompts (#5953) 2024-03-09 12:34:18 +02:00
Xuan Son Nguyen
950ba1ab84
Server: reorganize some http logic (#5939)
* refactor static file handler

* use set_pre_routing_handler for validate_api_key

* merge embedding handlers

* correct http verb for endpoints

* fix embedding response

* fix test case CORS Options

* fix code style
2024-03-09 11:27:53 +01:00