Commit graph

2504 commits

Author SHA1 Message Date
Georgi Gerganov
83796e62bc
llama : refactor unicode stuff (#5992)
* llama : refactor unicode stuff

ggml-ci

* unicode : names

* make : fix c++ compiler

* unicode : names

* unicode : straighten tables

* zig : fix build

* unicode : put nfd normalization behind API

ggml-ci

* swift : fix build

* unicode : add BOM

* unicode : add <cstdint>

ggml-ci

* unicode : pass cpts as const ref
2024-03-11 17:47:47 +02:00
Jakub N
828defefb6
Update server docker image URLs (#5997) 2024-03-11 14:40:42 +01:00
Olivier Chafik
b816734c17 json: preserve order of props from TS defs 2024-03-11 11:48:08 +00:00
Xuan Son Nguyen
caa106d4e0
Server: format error to json (#5961)
* server: format error to json

* server: do not crash on grammar error

* fix api key test case

* revert limit max n_predict

* small fix

* correct coding style

* update completion.js

* launch_slot_with_task

* update docs

* update_slots

* update webui

* update readme
2024-03-11 10:56:41 +01:00
Michael Podvitskiy
3202361c5b
ggml, ci : Windows ARM runner and build fixes (#5979)
* windows arm ci

* fix `error C2078: too many initializers` with ggml_vld1q_u32 macro for MSVC ARM64

* fix `warning C4146: unary minus operator applied to unsigned type, result still unsigned`

* fix `error C2065: '__fp16': undeclared identifier`
2024-03-11 11:28:51 +02:00
Minsoo Cheong
332bdfd798
server : maintain chat completion id for streaming responses (#5988)
* server: maintain chat completion id for streaming responses

* Update examples/server/utils.hpp

* Update examples/server/utils.hpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-11 10:09:32 +02:00
Gilad S
ecab1c75de
cmake : fix subdir for LLAMA_METAL_EMBED_LIBRARY (#5985) 2024-03-11 10:00:08 +02:00
Georgi Gerganov
ee35600b90
llama : fix F16/F32 downcast + improve names (#5980) 2024-03-11 09:56:47 +02:00
Kawrakow
be858f6205
Better 1.5 bit quantization (#5971)
* Trying blocks of 16 for IQ1_S - seems slightly better

* iq1s_blocks16: Adjust scale fudge factor to 1.125

* iq1s_blocks16: going to blocks of 32

with 2048 lattice points, so same bpw.
This is even better than blocks of 16.
Should I try blocks of 64? But to keep the same
bpw, when I go to 4096 lattice points, I need to
remove blocks altogether and just have superblocks of
256 weights.

* iq1s_blocks16: Use 2*<x^2> as sigma2 in weight adjustment

* iq1s_blocks16: scalar and AVX2 dot products

* iq1s_blocks16: CUDA dot product

* iq1s_blocks16: Metal works, Neon does not

Metal works but TG is dog slow (35 t/s). PP is OKish (493 t/s).
Not seeing the bug in the Neon implementation for now.

* iq1s_blocks16: fixed Neon

* iq1s_blocks16: very slightly faster TG on Metal

Still pathetic at 37 t/s

* iq1s_blocks16: speedup Metal by packing codebook into uint32_t's

* Formatting

* iq1s_blocks16: uint32_t codebook is also better in CUDA

TG-128 is now 204 t/s up from 194 t/s.
PP-512 is 5890 t/s, so significantly better than other quants

* iq1s_blocks16: slightly faster Neon dot product

* iq1s_blocks16: faster AVX2 dot product

* iq1s_blocks16: adjust to ggml-common.h

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-03-11 07:51:49 +01:00
Abhilash Majumder
ef3ced26a3
[SYCL] Add q3_s and q1_s (#5886)
* Add q3_s and q1_s

* fix compilation

* fix build

* fix build

* fix build

* enable ops

* rm macro

* increase grid space
2024-03-11 10:27:56 +05:30
ochafik
e1ed7a04d6 json: add date, time, date-time formats 2024-03-11 04:03:05 +00:00
ochafik
9a61802a28 json: add date format + fix uuid 2024-03-11 02:58:14 +00:00
ochafik
d736e928d2 json: support prefixItems alongside array items 2024-03-11 02:32:58 +00:00
ochafik
56b8744158 Update ts-type-to-grammar.sh 2024-03-11 02:11:22 +00:00
ochafik
c8254e5f8a json: port fixes from mjs to python 2024-03-11 02:10:48 +00:00
ochafik
4e2d06c741 json: updated server & chat ( cd examples/server && ./deps.sh ) 2024-03-11 01:51:26 +00:00
ochafik
5389820453 Update json-schema-to-grammar.mjs 2024-03-11 01:47:22 +00:00
AidanBeltonS
3814a07392
[SYCL] Add support for SYCL Nvidia target (#5738)
* Add support for nvidia target in CMake

* Update sycl read-me for Nvidia target

* Fix errors
2024-03-11 09:13:57 +08:00
ochafik
11813a6b0a json: rm trailing spaces 2024-03-11 00:27:50 +00:00
ochafik
0e9494183b json: custom regex parser, adds dot support & JS-portable 2024-03-11 00:24:34 +00:00
Georgi Gerganov
bb6d00bbf9
metal : move mm_id indices to shared mem (#5982) 2024-03-10 23:12:48 +02:00
Dean
7ab7b733bb
android : fix utf8 decoding error (#5935)
* examples: fix utf8 decoding error

some models have a tokenizer that decodes an id into an incomplete utf8 sequence, so the output needs to be validated and held back until the next token arrives
one example would be: https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat-GGUF/resolve/main/qwen1_5-1_8b-chat-q4_0.gguf and an example of the token is 18137

* android : minor

---------

Co-authored-by: zhangfuwen <zhangfuwen@foxmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-10 22:03:17 +02:00
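The idea described in 7ab7b733bb (hold back an incomplete utf8 sequence until the next token completes it) can be sketched as below. This is an illustration only, not the code from that commit: the class name `TokenTextStream` and its methods are hypothetical, and it relies on Python's standard incremental UTF-8 decoder rather than the manual validation an Android example would need.

```python
import codecs


class TokenTextStream:
    """Hypothetical helper: turns per-token byte pieces into printable text."""

    def __init__(self):
        # The incremental decoder keeps a trailing incomplete sequence
        # in its internal buffer instead of raising or emitting garbage.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

    def push(self, piece: bytes) -> str:
        # Returns only the text that is complete so far; an unfinished
        # multi-byte character is held until a later piece completes it.
        return self._decoder.decode(piece, final=False)

    def finish(self) -> str:
        # Flushes the buffer at end of generation; any bytes that never
        # completed are replaced rather than silently dropped.
        return self._decoder.decode(b"", final=True)


stream = TokenTextStream()
first = stream.push(b"\xe2\x82")   # first two bytes of a 3-byte character
second = stream.push(b"\xac")      # the final byte arrives with the next token
```

Here `first` is an empty string because the character is not yet complete, and `second` carries the whole character once the last byte arrives.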
Georgi Gerganov
d9f65c97c3
readme : update hot topics 2024-03-10 20:58:26 +02:00
Georgi Gerganov
b838b53ad6
sync : ggml 2024-03-10 20:10:46 +02:00
Georgi Gerganov
df4dc3e7cb
ggml : try fix 32-bit arm compat (whisper/1938)
* ggml : try fix 32-bit arm compat

* ggml : fix cont
2024-03-10 20:10:39 +02:00
Georgi Gerganov
bf47a5eefc
ggml : remove __constant__ specifier for CUDA tables (#5940) 2024-03-10 20:09:24 +02:00
ochafik
27b1fefdf4 Delete commit.txt 2024-03-10 17:44:46 +00:00
ochafik
478f62ef5c json: support negative ranges in patterns 2024-03-10 17:35:32 +00:00
ochafik
d1fda6f450 json: simplify range escapes 2024-03-10 17:32:45 +00:00
ochafik
f57b467c74 json: add --allow-fetch 2024-03-10 17:20:05 +00:00
ochafik
54291e10d0 json: fix literal escapes 2024-03-10 17:19:27 +00:00
Pierrick Hymbert
fa8a809a91
server: ci: windows build and tests (#5968)
* server: ci: windows build and tests

* server: ci: remove tmp push branch

* server: ci: EOF EOL

* Use builti

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* server: tests: server graceful shutdown, then kill, then hard kill

* server: tests: remove python2 unicode string

* server: tests: remove wrong comment on server starting, close_fds is always true

* server: tests: server kill, if pid exists

* server: tests: remove dependency to killall

* server: tests: ci windows: pid exists better handling

---------

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-03-10 18:17:47 +01:00
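The escalation named in fa8a809a91 (graceful shutdown first, then kill) can be sketched roughly as follows. This is a simplified two-stage illustration under stated assumptions, not the test suite's actual code: the function name `stop_process` and the `grace_seconds` parameter are hypothetical, and only the standard `subprocess` module is used, so no external tool such as killall is needed.

```python
import subprocess
import sys


def stop_process(proc, grace_seconds=5.0):
    """Stop a child process, escalating only if it does not exit in time."""
    if proc.poll() is not None:
        # The process has already exited, nothing to stop.
        return proc.returncode
    # Stage 1: polite request (SIGTERM on POSIX).
    proc.terminate()
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # Stage 2: forceful stop (SIGKILL on POSIX).
        proc.kill()
        return proc.wait()


# Usage: start a long-running child, then stop it.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(child)
```

On Windows, `terminate()` and `kill()` both call the same underlying API, so the two stages behave identically there.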
ochafik
e8f25d6f0c json: handle uuid string format 2024-03-10 16:50:06 +00:00
ochafik
37b59d1d3b json: reuse regexp pattern subrules 2024-03-10 16:49:53 +00:00
ochafik
e8b78c28eb json: revert space to 1 at most 2024-03-10 16:49:15 +00:00
ochafik
ade339d55e json: accept duplicate identical rules 2024-03-10 16:48:56 +00:00
ochafik
dab2ea91a6 json: simplify nullable fields handling 2024-03-10 16:48:27 +00:00
DAN™
bcebd7dbf6
llama : add support for GritLM (#5959)
* add gritlm example

* gritlm results match

* tabs to spaces

* comment out debug printing

* rebase to new embed

* gritlm embeddings are back babeee

* add to gitignore

* allow to toggle embedding mode

* Clean-up GritLM sample code.

* Fix types.

* Flush stdout and output ending newline if streaming.

* mostly style fixes; correct KQ_mask comment

* add causal_attn flag to llama_cparams

* gritlm : minor

* llama : minor

---------

Co-authored-by: Douglas Hanley <thesecretaryofwar@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-10 17:56:30 +02:00
ochafik
8597caa685 Update ts-type-to-grammar.sh 2024-03-10 15:47:03 +00:00
ochafik
364bf9ec3d Update ts-type-to-grammar.sh 2024-03-10 15:44:51 +00:00
ochafik
5764d9ffbc Update json-schema-to-grammar.py 2024-03-10 15:33:59 +00:00
Clint Herron
2960eae847
grammar : verify parsed state (#5950) 2024-03-10 17:17:43 +02:00
ochafik
ee492c9e4d Merge remote-tracking branch 'origin/master' into json-fixes 2024-03-10 15:01:23 +00:00
ochafik
307110ad2c Update json-schema-to-grammar.py 2024-03-10 15:00:07 +00:00
ochafik
f37ad0a043 json: handle schema from pydantic Optional fields 2024-03-10 14:55:03 +00:00
Georgi Gerganov
c78541479c
nix: update flake.lock (#5969)
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/1536926ef5621b09bba54035ae2bb6d806d72ac8' (2024-02-29)
  → 'github:NixOS/nixpkgs/9df3e30ce24fd28c7b3e2de0d986769db5d6225d' (2024-03-06)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-03-10 16:43:08 +02:00
ochafik
ba57964f92 Update json-schema-to-grammar.py 2024-03-10 14:42:39 +00:00
ochafik
b061de52a7 Update json-schema-to-grammar.py 2024-03-10 13:49:27 +00:00
ochafik
259f3505bc Update json-schema-to-grammar.py 2024-03-10 13:38:40 +00:00
ochafik
1cde8ded7c json: extract repeated regexp patterns to subrule 2024-03-10 13:29:56 +00:00