Pierrick Hymbert
1a179bfc4e
fix loop over pointer
...
Co-authored-by: slaren <slarengh@gmail.com>
2024-03-22 00:38:23 +01:00
Pierrick Hymbert
0fd652eba7
spacing
...
Co-authored-by: slaren <slarengh@gmail.com>
2024-03-22 00:37:01 +01:00
Pierrick HYMBERT
f9a29735fc
llama_model_loader: fail if any backend buffer cannot be allocated
2024-03-22 00:25:11 +01:00
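For illustration, a minimal sketch of the failure check this commit describes, using the public ggml-backend allocator API; the wrapper function and error message are assumptions, not taken from the actual patch:

```cpp
#include <stdexcept>

#include "ggml-alloc.h"
#include "ggml-backend.h"

// Sketch only: allocate every tensor of a ggml context into one backend
// buffer and fail hard when the backend cannot provide the memory.
ggml_backend_buffer_t alloc_or_fail(ggml_context * ctx, ggml_backend_buffer_type_t buft) {
    ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors_from_buft(ctx, buft);
    if (buf == NULL) {
        throw std::runtime_error("unable to allocate backend buffer");
    }
    return buf;
}
```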
Pierrick HYMBERT
6df9757ad6
llama_model_loader: minor, use the same variable name for consistency, fix spacing in type casts
2024-03-22 00:02:55 +01:00
Pierrick HYMBERT
69bdee939a
llama_model_loader: only map tensors included in the context
2024-03-21 23:58:12 +01:00
Pierrick HYMBERT
078a1aca06
llama_model_loader: map file to backend buffer only if the allocation succeeds
2024-03-21 23:57:43 +01:00
slaren
02020b0463
fix mmap buffer management
2024-03-21 22:06:37 +01:00
Pierrick HYMBERT
d8b567d254
llama_model_loader: fail if backend cannot allocate buffer
2024-03-21 21:05:15 +01:00
Pierrick Hymbert
1c931f3d4f
Handle optional tensors
...
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-21 20:50:28 +01:00
Pierrick Hymbert
c34a5deee8
Simplify this by making these optional; make some layer-creation tensors optional
...
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-21 20:50:11 +01:00
Pierrick HYMBERT
00381b07bb
avoid copying the entire vector
2024-03-21 19:18:39 +01:00
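Illustrative only (the actual call site is not shown in this log): the usual fix for this class of issue is to take the vector by const reference rather than by value, as in this generic sketch:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// By value: the whole vector is copied on every call.
std::size_t count_copy(std::vector<std::string> names) { return names.size(); }

// By const reference: the caller's vector is read in place, no copy made.
std::size_t count_ref(const std::vector<std::string> & names) { return names.size(); }
```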
Pierrick HYMBERT
1892ae7eb1
llama_model_loader: PR feedback:
...
- use a single gguf_context for metadata only
- store all ggml_context in a vector, like the files and mappings
- store all weights in a vector along with their source tensor
- rename ctx_gguf to meta
- rename ctx_meta to contexts
(the resulting layout is sketched after this entry)
2024-03-21 19:11:37 +01:00
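A minimal sketch of the loader layout the list above describes. The struct and field names are assumed from the commit text, not copied from the actual llama.cpp source:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#include "ggml.h" // declares gguf_context and ggml_context (as of this period)

// One weight record per tensor, kept next to its source tensor.
struct llama_tensor_weight {
    uint16_t             idx;    // index of the source file
    size_t               offs;   // offset of the tensor data within that file
    struct ggml_tensor * tensor; // the source tensor itself
};

// The shape described above: one metadata-only gguf context, plus one
// ggml context per file/mapping.
struct llama_model_loader {
    struct gguf_context *              meta;     // was: ctx_gguf
    std::vector<struct ggml_context *> contexts; // was: ctx_meta
    std::vector<llama_tensor_weight>   weights;
};
```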
Pierrick HYMBERT
60a87ae051
Merge branch 'master' into hp/split/load-model
2024-03-21 11:48:58 +01:00
Vaibhav Srivastav
1943c01981
ci : fix indentation error ( #6195 )
2024-03-21 11:30:40 +02:00
Vaibhav Srivastav
5e43ba8742
build : add mac pre-build binaries ( #6182 )
...
* Initial commit - add mac prebuilds.
* forward contribution credits for building the workflow.
* minor : remove trailing whitespaces
---------
Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-21 11:13:12 +02:00
Kawrakow
76aa30a263
Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache ( #6183 )
...
* k_cache: be able to use Q5_0
* k_cache: be able to use Q5_1 on CUDA
* k_cache: be able to use Q5_0 on Metal
* k_cache: be able to use Q5_1 on Metal
* k_cache: be able to use IQ4_NL - just CUDA for now
* k_cache: be able to use IQ4_NL on Metal
* k_cache: add newly added supported types to llama-bench and CUDA supports_op
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-03-21 08:27:57 +01:00
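For context, a hedged sketch of how a quantized K cache is requested through the public API; the field names follow llama.h of this period, and the helper function is an assumption:

```cpp
#include "llama.h"

// Create a context whose K cache uses one of the newly supported types.
llama_context * make_ctx_q5_0_kcache(llama_model * model) {
    llama_context_params params = llama_context_default_params();
    params.type_k = GGML_TYPE_Q5_0;      // K cache quantized to Q5_0
    // params.type_k = GGML_TYPE_IQ4_NL; // or IQ4_NL (CUDA/Metal per the notes above)
    return llama_new_context_with_model(model, params);
}
```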
AidanBeltonS
c5b8595e3f
Add nvidia and amd backends ( #6157 )
2024-03-21 11:40:52 +05:30
Pierrick HYMBERT
18ff6ca847
split: move llama_tensor_offset to llama_model_loader
2024-03-21 07:06:14 +01:00
Pierrick Hymbert
b8feff411f
Avoid copying the entire vector
...
Co-authored-by: slaren <slarengh@gmail.com>
2024-03-21 04:36:06 +01:00
slaren
42e21c6882
cuda : fix conflict with std::swap ( #6186 )
2024-03-21 01:47:46 +01:00
Pierrick HYMBERT
7c64fef91b
split: support in llama_model_loader
2024-03-20 22:30:20 +01:00
slaren
1c51f98adc
cuda : print the returned error when CUDA initialization fails ( #6185 )
2024-03-20 21:03:26 +01:00
Ziang Wu
f9c7ba3447
llava : update MobileVLM-README.md ( #6180 )
2024-03-20 17:29:51 +02:00
Ziang Wu
272935b281
llava : add MobileVLM_V2 backup ( #6175 )
...
* Add MobileVLM_V2 backup
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/llava/convert-image-encoder-to-gguf.py
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* clip : fix whitespace
* fix definition mistake in clip.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-20 17:02:32 +02:00
slaren
ccf58aa3ec
cuda : refactor to remove global resources ( #6170 )
...
* cuda : refactor to remove global resources
2024-03-20 14:42:59 +01:00
Xuan Son Nguyen
91f8ad167d
Server: version bump for httplib and json ( #6169 )
...
* server: version bump for httplib and json
* fix build
* bring back content_length
2024-03-20 13:30:36 +01:00
Georgi Gerganov
6b7e76d28c
gitignore : ignore curl-related files
2024-03-20 14:17:34 +02:00
Georgi Gerganov
bc0baab2ea
server : allow overriding -ngl in tests ( #6170 )
2024-03-20 14:14:32 +02:00
Georgi Gerganov
d795988d9e
Revert "llava : add a MobileVLM_V2-1.7B backup ( #6152 )"
...
This reverts commit f8c4e745e1.
2024-03-20 13:29:49 +02:00
Ziang Wu
f8c4e745e1
llava : add a MobileVLM_V2-1.7B backup ( #6152 )
...
* Add MobileVLM_V2 backup
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/llava/convert-image-encoder-to-gguf.py
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* clip : fix whitespace
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-20 13:20:37 +02:00
Karthick
47cc7a7bf9
Server: Handle n_keep parameter in the request ( #6174 )
2024-03-20 12:02:34 +01:00
Jared Van Bortel
bd60d82d0c
server tests : more pythonic process management; fix bare except: ( #6146 )
...
* server tests : remove seemingly redundant newlines in print()
* server tests : use built-in subprocess features, not os.kill and psutil
* server tests : do not catch e.g. SystemExit; use print_exc
* server tests: handle TimeoutExpired exception
* server tests: fix connect on dual-stack systems
* server: tests: add new tokens regex on windows following the new repeat penalties default changed in (#6127 )
* server: tests: remove the hack on windows since now we get the good socket family
* server: tests: add new tokens regex following new repeat penalties default changed in (#6127 )
---------
Co-authored-by: Pierrick HYMBERT <pierrick.hymbert@gmail.com>
2024-03-20 06:33:49 +01:00
Neo Zhang Jianyu
6c0b287748
update readme sycl for new update ( #6151 )
...
* update readme sycl for new update
* Update README-sycl.md
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
* Update README-sycl.md
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
* Update README-sycl.md
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
* Update README-sycl.md
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
* Update README-sycl.md
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
* Update README-sycl.md
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
* update per review comments
* update w64devkit link
* update the device id verification section
* Update README-sycl.md
Co-authored-by: Meng, Hengyu <airdldl@163.com>
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
Co-authored-by: Meng, Hengyu <airdldl@163.com>
2024-03-20 11:21:41 +08:00
Abhilash Majumder
d26e8b669d
increase igpu cluster limit ( #6159 )
2024-03-20 08:28:49 +05:30
DAN™
d8b009a945
Remove unneeded header file. ( #6158 )
2024-03-19 17:16:09 +01:00
Pierrick Hymbert
d0d5de42e5
gguf-split: split and merge gguf per batch of tensors ( #6135 )
...
* gguf-split: split and merge gguf files per tensor
* gguf-split: build with make toolchain
* gguf-split: rename `--split-tensors-size` to `--split-max-tensors`. Set the general.split_count KV in all splits
* split : minor style + fix compile warnings
* gguf-split: remove --upload not implemented
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-19 12:05:44 +01:00
Georgi Gerganov
b80cf3b2d1
common : disable repeat penalties by default ( #6127 )
2024-03-19 10:21:54 +02:00
slaren
970a48060a
ci : exempt some labels from being tagged as stale ( #6140 )
2024-03-19 10:06:54 +02:00
DAN™
4c28b82529
common : print usage on '-h' and '--help' ( #6145 )
2024-03-19 07:59:36 +02:00
github-actions[bot]
2d15886bb0
flake.lock: Update
...
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/9df3e30ce24fd28c7b3e2de0d986769db5d6225d' (2024-03-06)
→ 'github:NixOS/nixpkgs/d691274a972b3165335d261cc4671335f5c67de9' (2024-03-14)
2024-03-18 18:51:30 +00:00
Jared Van Bortel
d199ca79f2
mpt : implement backwards compatibility with duped output tensor ( #6139 )
2024-03-18 12:49:02 -04:00
Felix
104f5e0fc1
clip : fix memory leak ( #6138 )
2024-03-18 17:40:22 +02:00
slaren
5e1b7f94a0
backend : set max split inputs to GGML_MAX_SRC ( #6137 )
2024-03-18 16:33:44 +01:00
Georgi Gerganov
ac9ee6a4ad
ci : disable stale issue messages ( #6126 )
2024-03-18 13:45:38 +02:00
Georgi Gerganov
4f6d1337ca
ci : temporary disable sanitizer builds ( #6128 )
2024-03-18 13:45:27 +02:00
slaren
2bf8d0f7c4
backend : offload large batches to GPU ( #6083 )
...
* backend : offload large batches to GPU
* fix hip
* code cleanup
* fix CUDA split buffers
* Update ggml-backend-impl.h
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* cuda : fix memset without set_device
* imatrix : remove sched affix from weight names
* sched : add a new split if the current one has too many inputs
reduce max inputs per split
more cleanup
* update backends
ggml-ci
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-03-18 11:03:04 +01:00
DAN™
496bc79bc2
common : tidy-up argument parsing ( #6105 )
...
* Tidy-up argument parsing.
* Missing ref.
* common : minor
* common : add static classifier
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-18 10:27:44 +02:00
Thérence
9b03719ad7
convert : add support for CamembertModel architecture ( #6119 )
...
Add support for the CamembertModel architecture used by:
https://huggingface.co/dangvantuan/sentence-camembert-large
2024-03-18 10:17:00 +02:00
Romain D
3a6efdd03c
convert : use f32 outtype for bf16 tensors ( #6106 )
...
The old behaviour is to use f16, but bf16 to f16 is not a lossless conversion.
Change the outtype to f32 to default to a lossless conversion.
2024-03-18 10:04:41 +02:00
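A small illustration of why the conversion is lossy: bf16 keeps float32's 8 exponent bits while f16 has only 5, so bf16 can represent magnitudes far beyond f16's maximum of 65504. Widening bf16 to f32, by contrast, is an exact bit reinterpretation:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// bf16 -> f32 is lossless: a bf16 value is exactly the top 16 bits of the
// corresponding float32 bit pattern.
float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t) h << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

int main() {
    // 0x7E80 in bf16 is 2^126, about 8.5e37: representable in bf16 and f32,
    // but an f16 conversion would overflow it to infinity.
    printf("%g\n", bf16_to_f32(0x7E80));
    return 0;
}
```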
Pierrick Hymbert
d01b3c4c32
common: llama_load_model_from_url using --model-url ( #6098 )
...
* common: llama_load_model_from_url with libcurl dependency
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-17 19:12:37 +01:00
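A hedged sketch of calling the new helper from C++; the signature is assumed from the commit title and common.h of this period, and the URL and local path are placeholders:

```cpp
#include "common.h"
#include "llama.h"

int main() {
    llama_backend_init();

    // Downloads the file with libcurl if needed, then loads it. The URL and
    // local cache path below are illustrative, not from the commit.
    llama_model * model = llama_load_model_from_url(
        "https://example.com/models/ggml-model-q4_0.gguf", // remote gguf
        "ggml-model-q4_0.gguf",                            // local cache path
        llama_model_default_params());

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```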