ochafik
b8c0025c6d
Update server.feature
2024-03-20 22:12:06 +00:00
ochafik
9260350384
json: fix zig build
2024-03-20 22:03:58 +00:00
ochafik
d0600d91e9
json: avoid using namespace std
2024-03-20 20:26:33 +00:00
ochafik
df00efbba1
json: fix naming of top-level c++ function (+ drop unused one)
2024-03-20 20:09:10 +00:00
ochafik
6dcf856259
Merge remote-tracking branch 'origin/master' into json-fixes
2024-03-20 20:05:11 +00:00
slaren
1c51f98adc
cuda : print the returned error when CUDA initialization fails ( #6185 )
2024-03-20 21:03:26 +01:00
Ziang Wu
f9c7ba3447
llava : update MobileVLM-README.md ( #6180 )
2024-03-20 17:29:51 +02:00
Ziang Wu
272935b281
llava : add MobileVLM_V2 backup ( #6175 )
* Add MobileVLM_V2 backup
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/llava/convert-image-encoder-to-gguf.py
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* clip : fix whitespace
* fix definition mistake in clip.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-20 17:02:32 +02:00
Olivier Chafik
10ee30f1b8
json: indent 4 spaces
2024-03-20 14:47:21 +00:00
Olivier Chafik
7628bd8c76
json: move json.hpp & json-schema-to-grammar.{cpp,h} to common
2024-03-20 14:35:10 +00:00
slaren
ccf58aa3ec
cuda : refactor to remove global resources ( #6170 )
* cuda : refactor to remove global resources
2024-03-20 14:42:59 +01:00
Xuan Son Nguyen
91f8ad167d
Server: version bump for httplib and json ( #6169 )
* server: version bump for httplib and json
* fix build
* bring back content_length
2024-03-20 13:30:36 +01:00
Georgi Gerganov
6b7e76d28c
gitignore : ignore curl-related files
2024-03-20 14:17:34 +02:00
Georgi Gerganov
bc0baab2ea
server : allow to override -ngl in tests ( #6170 )
2024-03-20 14:14:32 +02:00
Georgi Gerganov
d795988d9e
Revert "llava : add a MobileVLM_V2-1.7B backup ( #6152 )"
This reverts commit f8c4e745e1.
2024-03-20 13:29:49 +02:00
Ziang Wu
f8c4e745e1
llava : add a MobileVLM_V2-1.7B backup ( #6152 )
* Add MobileVLM_V2 backup
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/llava/convert-image-encoder-to-gguf.py
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* clip : fix whitespace
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-20 13:20:37 +02:00
Karthick
47cc7a7bf9
Server: Handle n_keep parameter in the request ( #6174 )
2024-03-20 12:02:34 +01:00
Jared Van Bortel
bd60d82d0c
server tests : more pythonic process management; fix bare except: ( #6146 )
* server tests : remove seemingly redundant newlines in print()
* server tests : use built-in subprocess features, not os.kill and psutil
* server tests : do not catch e.g. SystemExit; use print_exc
* server tests: handle TimeoutExpired exception
* server tests: fix connect on dual-stack systems
* server: tests: add new tokens regex on windows generated following new repeat penalties default changed in (#6127 )
* server: tests: remove the hack on windows since now we get the good socket family
* server: tests: add new tokens regex following new repeat penalties default changed in (#6127 )
* server: tests: add new tokens regex following new repeat penalties default changed in (#6127 )
---------
Co-authored-by: Pierrick HYMBERT <pierrick.hymbert@gmail.com>
2024-03-20 06:33:49 +01:00
Neo Zhang Jianyu
6c0b287748
update readme sycl for new update ( #6151 )
* update readme sycl for new update
* Update README-sycl.md
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
* Update README-sycl.md
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
* Update README-sycl.md
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
* Update README-sycl.md
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
* Update README-sycl.md
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
* Update README-sycl.md
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
* update by review comments
* update w64devkit link
* update for verify device id part
* Update README-sycl.md
Co-authored-by: Meng, Hengyu <airdldl@163.com>
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
Co-authored-by: Meng, Hengyu <airdldl@163.com>
2024-03-20 11:21:41 +08:00
Abhilash Majumder
d26e8b669d
increase igpu cluster limit ( #6159 )
2024-03-20 08:28:49 +05:30
DAN™
d8b009a945
Remove unneeded header file. ( #6158 )
2024-03-19 17:16:09 +01:00
Olivier Chafik
7fc759b84f
json: fix date pattern
2024-03-19 11:59:06 +00:00
Pierrick Hymbert
d0d5de42e5
gguf-split: split and merge gguf per batch of tensors ( #6135 )
* gguf-split: split and merge gguf files per tensor
* gguf-split: build with make toolchain
* gguf-split: rename `--split-tensors-size` to `--split-max-tensors`. Set general.split_count KV to all split
* split : minor style + fix compile warnings
* gguf-split: remove --upload not implemented
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-19 12:05:44 +01:00
ochafik
874599e749
json: create examples/json-schema-pydantic-example.py
2024-03-19 09:10:39 +00:00
Georgi Gerganov
b80cf3b2d1
common : disable repeat penalties by default ( #6127 )
2024-03-19 10:21:54 +02:00
slaren
970a48060a
ci : exempt some labels from being tagged as stale ( #6140 )
2024-03-19 10:06:54 +02:00
DAN™
4c28b82529
common : print usage on '-h' and '--help' ( #6145 )
2024-03-19 07:59:36 +02:00
ochafik
263a86e148
json: cleaner build of test
2024-03-19 02:12:15 +00:00
ochafik
02e3bde6b4
json: don't complain about unknown format type in server if unset
2024-03-19 01:45:23 +00:00
ochafik
e7de6433cb
json: catch schema conversion errors in server
2024-03-19 01:21:49 +00:00
ochafik
05fd7e3020
json: fix json handling in server when there's no response_format
2024-03-18 20:46:57 +00:00
github-actions[bot]
2d15886bb0
flake.lock: Update
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/9df3e30ce24fd28c7b3e2de0d986769db5d6225d' (2024-03-06)
→ 'github:NixOS/nixpkgs/d691274a972b3165335d261cc4671335f5c67de9' (2024-03-14)
2024-03-18 18:51:30 +00:00
Jared Van Bortel
d199ca79f2
mpt : implement backwards compatibility with duped output tensor ( #6139 )
2024-03-18 12:49:02 -04:00
Felix
104f5e0fc1
clip : fix memory leak ( #6138 )
2024-03-18 17:40:22 +02:00
slaren
5e1b7f94a0
backend : set max split inputs to GGML_MAX_SRC ( #6137 )
2024-03-18 16:33:44 +01:00
Georgi Gerganov
ac9ee6a4ad
ci : disable stale issue messages ( #6126 )
2024-03-18 13:45:38 +02:00
Georgi Gerganov
4f6d1337ca
ci : temporarily disable sanitizer builds ( #6128 )
2024-03-18 13:45:27 +02:00
slaren
2bf8d0f7c4
backend : offload large batches to GPU ( #6083 )
* backend : offload large batches to GPU
* fix hip
* code cleanup
* fix CUDA split buffers
* Update ggml-backend-impl.h
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* cuda : fix memset without set_device
* imatrix : remove sched affix from weight names
* sched : add a new split if the current one has too many inputs
reduce max inputs per split
more cleanup
* update backends
ggml-ci
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-03-18 11:03:04 +01:00
DAN™
496bc79bc2
common : tidy-up argument parsing ( #6105 )
* Tidy-up argument parsing.
* Missing ref.
* common : minor
* common : add static classifier
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-18 10:27:44 +02:00
Thérence
9b03719ad7
convert : add support for CamembertModel architecture ( #6119 )
Adding support for CamembertModel architecture used by :
https://huggingface.co/dangvantuan/sentence-camembert-large
2024-03-18 10:17:00 +02:00
Romain D
3a6efdd03c
convert : use f32 outtype for bf16 tensors ( #6106 )
The old behaviour is to use f16, but bf16 to f16 is not a lossless conversion.
Change the outtype to f32 to default to a lossless conversion.
2024-03-18 10:04:41 +02:00
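The reasoning in the convert commit above (bf16 to f16 is lossy, bf16 to f32 is not) can be illustrated with a minimal Python sketch. This is not the code from the convert script; the helper names are hypothetical, and it assumes bf16 is stored as the upper 16 bits of an IEEE-754 float32. bf16 keeps the 8-bit exponent of f32, while f16 has only 5 exponent bits, so small bf16 values fall outside the f16 range.

```python
import struct

def bf16_bits_to_f32(bits: int) -> float:
    """Widen bf16 bits to float32 by placing them in the upper 16 bits (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]

def f32_to_bf16_bits(x: float) -> int:
    """Take the upper 16 bits of a float32 (plain truncation, no rounding)."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16

def roundtrip_through_f16(x: float) -> float:
    """Store x as IEEE-754 half precision and read it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# 0x3080 is the bf16 bit pattern for 2**-30, below the smallest f16 subnormal (2**-24).
bits = 0x3080
as_f32 = bf16_bits_to_f32(bits)

# bf16 -> f32 -> bf16 gives back the same bits, so the f32 outtype loses nothing.
print(hex(f32_to_bf16_bits(as_f32)))

# bf16 -> f16 flushes this value to zero, so the f16 outtype loses information.
print(roundtrip_through_f16(as_f32))
```

Values that are too large for f16 are a second failure mode; `struct.pack("<e", ...)` raises `OverflowError` for those rather than returning a rounded result.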
ochafik
bd96df4e85
json: ws nit
2024-03-18 04:42:25 +00:00
ochafik
24f0b941cf
json: fix string patterns (was missing quotes)
2024-03-18 04:06:23 +00:00
ochafik
dd922a4da3
json: test/fix additional props corner cases
2024-03-18 01:32:15 +00:00
ochafik
bbd70800c8
json: improve grammar parsing failures
2024-03-18 00:34:02 +00:00
ochafik
618247885c
json: test/fix top-level anyOf
2024-03-18 00:13:58 +00:00
ochafik
20869ede26
Merge remote-tracking branch 'origin/master' into json-fixes
2024-03-17 22:53:04 +00:00
ochafik
edbd2e9862
json: add server tests for OAI JSON response_format
2024-03-17 22:51:29 +00:00
ochafik
3e1bf44e5e
json: check parsing in test + fix value & string refs
2024-03-17 22:47:20 +00:00
ochafik
84e383c1d7
json: test (& simplify output of) empty schema
2024-03-17 21:51:10 +00:00