Commit graph

3913 commits

Author SHA1 Message Date
Joe Eli McIlvain
c5ae329409
Fix JSON Schema to Grammar for string regexp with top-level alternation.
Prior to this commit, using a JSON Schema containing a string
whose `pattern` regular expression uses top-level alternation
(e.g. `"pattern": "^A|B|C|D$"`) would result in invalid JSON
output from the constrained sampling grammar, because it
ended up creating a grammar rule like this for the string:

```
thing ::= "\"" "A" | "B" | "C" | "D" "\"" space
```

Note that this rule requires the starting quote only in the "A" case
and the ending quote only in the "D" case,
so sampling with it always produces invalid JSON
(the output is missing the starting quote,
the ending quote, or both).

The fix simply adds parentheses to the generated rule
(applied to all string pattern rules, to keep the implementation simple),
so that the new generated rule is correct:

```
thing ::= "\"" ("A" | "B" | "C" | "D") "\"" space
```
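
To make the before/after concrete, here is a minimal, self-contained Python sketch. It is not the actual json_schema_to_grammar converter; `pattern_to_rule` is a hypothetical helper that only handles a flat top-level alternation. It shows how parenthesizing the alternation lets the surrounding quote terminals apply to every alternative:

```
# Hypothetical sketch: illustrates the quoting bug described above, not the
# real json_schema_to_grammar implementation. Handles only a flat
# top-level alternation such as "^A|B|C|D$".

def pattern_to_rule(name: str, pattern: str, parenthesize: bool) -> str:
    body = pattern.lstrip("^").rstrip("$")           # drop the anchors
    alts = " | ".join(f'"{alt}"' for alt in body.split("|"))
    if parenthesize:
        alts = f"({alts})"                           # the fix: group the alternation
    return f'{name} ::= "\\"" {alts} "\\"" space'

print(pattern_to_rule("thing", "^A|B|C|D$", parenthesize=False))
# thing ::= "\"" "A" | "B" | "C" | "D" "\"" space    (quotes bind only to "A" and "D")
print(pattern_to_rule("thing", "^A|B|C|D$", parenthesize=True))
# thing ::= "\"" ("A" | "B" | "C" | "D") "\"" space  (quotes wrap every alternative)
```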
2024-10-15 18:58:29 -07:00
Georgi Gerganov
edc265661c
server : add option to time limit the generation phase (#9865)
ggml-ci
2024-10-12 16:14:27 +03:00
Georgi Gerganov
1bde94dd02
server : remove self-extend features (#9860)
* server : remove self-extend

ggml-ci

* server : fix context limit check to use slot.n_past

ggml-ci
2024-10-12 16:06:31 +03:00
Georgi Gerganov
95c76e8e92
server : remove legacy system_prompt feature (#9857)
* server : remove legacy system_prompt feature

ggml-ci

* readme : update [no ci]

* server : fix non-transformer logic + remove response from /props
2024-10-12 14:51:54 +03:00
Georgi Gerganov
11ac9800af
llama : improve infill support and special token detection (#9798)
* llama : improve infill support

ggml-ci

* llama : add more FIM token strings

ggml-ci

* server : update prompt on slot restore (#9800)

* gguf : deprecate old FIM token KVs
2024-10-12 08:21:51 +03:00
R0CKSTAR
943d20b411
musa : update doc (#9856)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-10-12 08:09:53 +03:00
Diego Devesa
96776405a1
ggml : move more prints to the ggml log system (#9839)
* ggml : move more prints to the ggml log system

* show BLAS OpenMP warnings in all builds using debug print
2024-10-11 15:34:45 +02:00
Diego Devesa
7eee341bee
common : use common_ prefix for common library functions (#9805)
* common : use common_ prefix for common library functions

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-10-10 22:57:42 +02:00
Diego Devesa
0e9f760eb1
rpc : add backend registry / device interfaces (#9812)
* rpc : add backend registry / device interfaces

* llama : add llama_supports_rpc API

* ggml_backend_rpc_start_rpc_server -> ggml_backend_rpc_start_server
2024-10-10 20:14:55 +02:00
R0CKSTAR
cf8e0a3bb9
musa: add docker image support (#9685)
* mtgpu: add docker image support

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* mtgpu: enable docker workflow

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-10-10 20:10:37 +02:00
Diego Devesa
c7499c557c
examples : do not use common library in simple example (#9803)
* examples : do not use common library in simple example

* add command line parser, simplify code
2024-10-10 19:50:49 +02:00
Diego Devesa
c81f3bbb05
cmake : do not build common library by default when standalone (#9804) 2024-10-09 18:49:52 +02:00
Georgi Gerganov
e7022064ab
perplexity : fix integer overflow (#9783)
* perplexity : fix integer overflow

ggml-ci

* perplexity : keep n_vocab as int and make appropriate casts

ggml-ci
2024-10-09 17:00:18 +03:00
Georgi Gerganov
3dc48fe75a
examples : remove llama.vim
An updated version will be added in #9787
2024-10-09 10:55:42 +03:00
Diego Devesa
dca1d4b58a
ggml : fix BLAS with unsupported types (#9775)
* ggml : do not use BLAS with types without to_float

* ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies

* ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits

it's not really internal if everybody uses it
2024-10-08 14:21:43 +02:00
Xuan Son Nguyen
458367a906
server : better security control for public deployments (#9776)
* server : more explicit endpoint access settings

* protect /props endpoint

* fix tests

* update server docs

* fix typo

* fix tests
2024-10-08 13:27:04 +02:00
standby24x7
fa42aa6d89
scripts : fix spelling typo in messages and comments (#9782)
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
2024-10-08 09:19:53 +03:00
Diego Devesa
6374743747
ggml : add backend registry / device interfaces to BLAS backend (#9752)
* ggml : add backend registry / device interfaces to BLAS backend

* fix mmap usage when using host buffers
2024-10-07 21:55:08 +02:00
Andrew Minh Nguyen
f1af42fa8c
Update building for Android (#9672)
* docs : clarify building Android on Termux

* docs : update building Android on Termux

* docs : add cross-compiling for Android

* cmake : link dl explicitly for Android
2024-10-07 09:37:31 -07:00
Georgi Gerganov
6279dac039
flake.lock: Update (#9753)
Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/bcef6817a8b2aa20a5a6dbb19b43e63c5bf8619a?narHash=sha256-HO4zgY0ekfwO5bX0QH/3kJ/h4KvUDFZg8YpkNwIbg1U%3D' (2024-09-12)
  → 'github:hercules-ci/flake-parts/3d04084d54bedc3d6b8b736c70ef449225c361b1?narHash=sha256-K5ZLCyfO/Zj9mPFldf3iwS6oZStJcU4tSpiXTMYaaL0%3D' (2024-10-01)
• Updated input 'flake-parts/nixpkgs-lib':
    'https://github.com/NixOS/nixpkgs/archive/356624c12086a18f2ea2825fed34523d60ccc4e3.tar.gz?narHash=sha256-Ss8QWLXdr2JCBPcYChJhz4xJm%2Bh/xjl4G0c0XlP6a74%3D' (2024-09-01)
  → 'https://github.com/NixOS/nixpkgs/archive/fb192fec7cc7a4c26d51779e9bab07ce6fa5597a.tar.gz?narHash=sha256-0xHYkMkeLVQAMa7gvkddbPqpxph%2BhDzdu1XdGPJR%2BOs%3D' (2024-10-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/1925c603f17fc89f4c8f6bf6f631a802ad85d784?narHash=sha256-J%2BPeFKSDV%2BpHL7ukkfpVzCOO7mBSrrpJ3svwBFABbhI%3D' (2024-09-26)
  → 'github:NixOS/nixpkgs/bc947f541ae55e999ffdb4013441347d83b00feb?narHash=sha256-NOiTvBbRLIOe5F6RbHaAh6%2B%2BBNjsb149fGZd1T4%2BKBg%3D' (2024-10-04)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-10-07 09:35:42 -07:00
Georgi Gerganov
d5ac8cf2f2
ggml : add metal backend registry / device (#9713)
* ggml : add metal backend registry / device

ggml-ci

* metal : fix names [no ci]

* metal : global registry and device instances

ggml-ci

* cont : alternative initialization of global objects

ggml-ci

* llama : adapt to backend changes

ggml-ci

* fixes

* metal : fix indent

* metal : fix build when MTLGPUFamilyApple3 is not available

ggml-ci

* fix merge

* metal : avoid unnecessary singleton accesses

ggml-ci

* metal : minor fix [no ci]

* metal : g_state -> g_ggml_ctx_dev_main [no ci]

* metal : avoid reference of device context in the backend context

ggml-ci

* metal : minor [no ci]

* metal : fix maxTransferRate check

* metal : remove transfer rate stuff

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-10-07 18:27:51 +03:00
Paul Tsochantaris
96b6912103
metal : single allocation of encode_async block (#9747)
* Single allocation of encode_async block with non-ARC capture in ggml-metal.m

* Moving Block_release to the deallocation code

* Release encode block when re-setting encoding buffer count if needed

* Update ggml/src/ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-10-07 15:26:31 +03:00
Georgi Gerganov
d5cb86844f
contrib : simplify + minor edits [no ci] 2024-10-06 14:15:27 +03:00
Georgi Gerganov
f4b2dcdf49
readme : fix typo [no ci] 2024-10-06 13:49:41 +03:00
Georgi Gerganov
b6d6c5289f
sync : llama.cpp 2024-10-06 12:53:28 +03:00
SRHMorris
b0915d5b51
vulkan : retry allocation with fallback flags (whisper/2451)
Co-authored-by: Samuel Morris <samuel.morris@artlist.io>
2024-10-06 12:52:11 +03:00
Georgi Gerganov
8c475b97b8
rerank : use [SEP] token instead of [BOS] (#9737)
* rerank : use [SEP] token instead of [BOS]

ggml-ci

* common : sanity check for non-NULL tokens

ggml-ci

* ci : adjust rank score interval

ggml-ci

* ci : add shebang to run.sh

ggml-ci
2024-10-05 15:55:04 +03:00
Georgi Gerganov
58b16695e1
sync : ggml 2024-10-05 15:53:49 +03:00
Georgi Gerganov
905f5485b2
metal : zero-init buffer contexts (whisper/0) 2024-10-05 15:53:00 +03:00
Viet-Anh NGUYEN (Andrew)
71967c2a6d
Add Llama Assistant (#9744) 2024-10-04 20:29:35 +02:00
Georgi Gerganov
17880771ad
sync : ggml 2024-10-04 18:50:25 +03:00
Daniel Bevenius
55951c018d
ggml : fix typo in example usage ggml_gallocr_new (ggml/984) 2024-10-04 18:50:05 +03:00
Diego Devesa
ff565769f2
ggml : fixes after sync (ggml/983)
ggml : remove test-backend-buffer

ggml : fix CUDA build warnings
2024-10-04 18:50:04 +03:00
Xuan Son Nguyen
f3fdcfaa79
ci : fine-grained permission (#9710) 2024-10-04 11:47:19 +02:00
Daniel Kleine
133c7b46b3
Fixed RNG seed docs (#9723)
* Update README.md

fixed RNG seed info

* changed print format to unsigned
2024-10-04 10:54:44 +02:00
Georgi Gerganov
d5ed2b929d
metal : remove abort (skip) (ggml/0) 2024-10-03 21:18:19 +03:00
Georgi Gerganov
1bb8a64ebf
sync : ggml 2024-10-03 21:17:49 +03:00
Johannes Gäßler
fabdc3bda3
ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) 2024-10-03 21:17:26 +03:00
Johannes Gäßler
eee39bdc96
ggml: refactor cross entropy loss CPU impl. (ggml/976) 2024-10-03 21:17:26 +03:00
Jack Mousseau
5d5ab1e5cc
metal : fix compute pass descriptor autorelease crash (#9718) 2024-10-03 21:01:46 +03:00
Diego Devesa
a7ad553513
ggml-backend : add device description to CPU backend (#9720) 2024-10-03 17:39:18 +02:00
bandoti
d6fe7abf04
ggml: unify backend logging mechanism (#9709)
* Add scaffolding for ggml logging macros

* Metal backend now uses GGML logging

* Cuda backend now uses GGML logging

* Cann backend now uses GGML logging

* Add enum tag to parameters

* Use C memory allocation funcs

* Fix compile error

* Use GGML_LOG instead of GGML_PRINT

* Rename llama_state to llama_logger_state

* Prevent null format string

* Fix whitespace

* Remove log callbacks from ggml backends

* Remove cuda log statement
2024-10-03 17:39:03 +02:00
compilade
e3c355ba65
convert : handle tokenizer merges format from transformers 4.45 (#9696) 2024-10-03 17:22:15 +03:00
Radoslav Gerganov
841713e1e4
rpc : enable vulkan (#9714)
closes #8536
2024-10-03 13:00:52 +03:00
Ouadie EL FAROUKI
5639971466
Fixed dequant precision issues in Q4_1 and Q5_1 (#9711) 2024-10-03 07:50:44 +01:00
Diego Devesa
c83ad6d01e
ggml-backend : add device and backend reg interfaces (#9707)
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-10-03 01:49:47 +02:00
Xuan Son Nguyen
a39ab216aa
llama : reduce compile time and binary size (#9712)
* llama : speed up compile time

* fix build

* fix build (2)
2024-10-02 15:49:55 +02:00
Alberto Cabrera Pérez
f536f4c439
[SYCL] Initial cmake support of SYCL for AMD GPUs (#9658)
sycl: initial cmake support of SYCL for AMD GPUs
2024-10-02 13:57:18 +01:00
Radoslav Gerganov
00b7317e63
vulkan : do not use tensor->extra (#9407)
* vulkan : do not use tensor->extra

This patch allows using the Vulkan backend with the RPC backend as
tensor->extra is no longer used.

Ref: #8536

* Adapt GGML_VULKAN_CHECK_RESULTS to extra removal (#2)

---------

Co-authored-by: 0cc4m <picard12@live.de>
2024-10-02 13:49:16 +03:00
Zhenwei Jin
76b37d1541
gguf-split : improve --split and --merge logic (#9619)
* make sure params --split and --merge are not specified at the same time

* update gguf-split params parse logic

* Update examples/gguf-split/gguf-split.cpp

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2024-10-02 10:21:57 +03:00