llama.cpp

Author	SHA1	Message	Date
Georgi Gerganov	dbc35acff0	llama : introduce some typedef helpers	2024-03-22 10:58:42 +02:00
Georgi Gerganov	8326607cfe	llama : minor ggml-ci	2024-03-22 10:18:04 +02:00
Pierrick HYMBERT	e474e456eb	llama_split_prefix: use a clearer version, not pass split path len but dest max len. Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>	2024-03-22 07:48:50 +01:00
Pierrick HYMBERT	4c04400969	llama_model_loader: fix map -> unordered map	2024-03-22 07:07:00 +01:00
Pierrick HYMBERT	b19af3643f	llama_model_loader: be sure the model mappings has enough capacity before allocating backend buffer	2024-03-22 07:03:14 +01:00
Pierrick HYMBERT	a9e88c6e57	llama_model_loader: immediately add the backend buffer to the model buffers in order to free them if an error occurs in the next allocation. Reserve the expected size.	2024-03-22 06:59:04 +01:00
Pierrick HYMBERT	ec372c66a4	llama_model_loader: use at instead of operator[] if this should never add to the map.	2024-03-22 06:52:00 +01:00
Pierrick HYMBERT	9940df4f11	llama_model_loader: ensure mappings vector has the expected size	2024-03-22 06:51:21 +01:00
Pierrick HYMBERT	7cbe1eac78	llama_model_loader: if n_tensors declared not equals to loaded tensors in split, throw an exception instead of asserting	2024-03-22 06:48:15 +01:00
Pierrick Hymbert	1a179bfc4e	fix loop over pointer Co-authored-by: slaren <slarengh@gmail.com>	2024-03-22 00:38:23 +01:00
Pierrick Hymbert	0fd652eba7	spacing Co-authored-by: slaren <slarengh@gmail.com>	2024-03-22 00:37:01 +01:00
Pierrick HYMBERT	f9a29735fc	llama_model_loader: fail if any of backend buffer cannot be allocated	2024-03-22 00:25:11 +01:00
Pierrick HYMBERT	6df9757ad6	llama_model_loader: minor, use same variable name for consistency, fix spacing in types cast	2024-03-22 00:02:55 +01:00
Pierrick HYMBERT	69bdee939a	llama_model_loader: only map tensors included in the context	2024-03-21 23:58:12 +01:00
Pierrick HYMBERT	078a1aca06	llama_model_loader: map file to backend buffer if the allocation succeeds only	2024-03-21 23:57:43 +01:00
slaren	02020b0463	fix mmap buffer management	2024-03-21 22:06:37 +01:00
Pierrick HYMBERT	d8b567d254	llama_model_loader: fail if backend cannot allocate buffer	2024-03-21 21:05:15 +01:00
Pierrick Hymbert	1c931f3d4f	Handle optional tensors Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-21 20:50:28 +01:00
Pierrick Hymbert	c34a5deee8	Simplify this by making these optional, switch some layer creation tensor optional Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-21 20:50:11 +01:00
Pierrick HYMBERT	00381b07bb	avoid copying the entire vector	2024-03-21 19:18:39 +01:00
Pierrick HYMBERT	1892ae7eb1	llama_model_loader: PR feedbacks: - use only one gguf_context for metadata only - store all ggml_context in a vector as the files and mappings - store all weights in a vector along with the source tensor - rename ctx_gguf to meta - rename ctx_meta to contexts	2024-03-21 19:11:37 +01:00
Pierrick HYMBERT	60a87ae051	Merge branch 'master' into hp/split/load-model	2024-03-21 11:48:58 +01:00
Vaibhav Srivastav	1943c01981	ci : fix indentation error (#6195)	2024-03-21 11:30:40 +02:00
Vaibhav Srivastav	5e43ba8742	build : add mac pre-build binaries (#6182) * Initial commit - add mac prebuilds. * forward contribution credits for building the workflow. * minor : remove trailing whitespaces --------- Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-21 11:13:12 +02:00
Kawrakow	76aa30a263	Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183) * k_cache: be able to use Q5_0 * k_cache: be able to use Q5_1 on CUDA * k_cache: be able to use Q5_0 on Metal * k_cache: be able to use Q5_1 on Metal * k_cache: be able to use IQ4_NL - just CUDA for now * k_cache: be able to use IQ4_NL on Metal * k_cache: add newly added supported types to llama-bench and CUDA supports_op --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-03-21 08:27:57 +01:00
AidanBeltonS	c5b8595e3f	Add nvidia and amd backends (#6157)	2024-03-21 11:40:52 +05:30
Pierrick HYMBERT	18ff6ca847	split: move llama_tensor_offset to llama_model_loader	2024-03-21 07:06:14 +01:00
Pierrick Hymbert	b8feff411f	Avoid copying the entire vector Co-authored-by: slaren <slarengh@gmail.com>	2024-03-21 04:36:06 +01:00
slaren	42e21c6882	cuda : fix conflict with std::swap (#6186)	2024-03-21 01:47:46 +01:00
Pierrick HYMBERT	7c64fef91b	split: support in llama_model_loader	2024-03-20 22:30:20 +01:00
slaren	1c51f98adc	cuda : print the returned error when CUDA initialization fails (#6185)	2024-03-20 21:03:26 +01:00
Ziang Wu	f9c7ba3447	llava : update MobileVLM-README.md (#6180)	2024-03-20 17:29:51 +02:00
Ziang Wu	272935b281	llava : add MobileVLM_V2 backup (#6175) * Add MobileVLM_V2 backup * Update MobileVLM-README.md * Update examples/llava/MobileVLM-README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/llava/convert-image-encoder-to-gguf.py Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * clip : fix whitespace * fix definition mistake in clip.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-20 17:02:32 +02:00
slaren	ccf58aa3ec	cuda : refactor to remove global resources (#6170) * cuda : refactor to remove global resources	2024-03-20 14:42:59 +01:00
Xuan Son Nguyen	91f8ad167d	Server: version bump for httplib and json (#6169) * server: version bump for httplib and json * fix build * bring back content_length	2024-03-20 13:30:36 +01:00
Georgi Gerganov	6b7e76d28c	gitignore : ignore curl-related files	2024-03-20 14:17:34 +02:00
Georgi Gerganov	bc0baab2ea	server : allow to override -ngl in tests (#6170)	2024-03-20 14:14:32 +02:00
Georgi Gerganov	d795988d9e	Revert "llava : add a MobileVLM_V2-1.7B backup (#6152)" This reverts commit `f8c4e745e1`.	2024-03-20 13:29:49 +02:00
Ziang Wu	f8c4e745e1	llava : add a MobileVLM_V2-1.7B backup (#6152) * Add MobileVLM_V2 backup * Update MobileVLM-README.md * Update examples/llava/MobileVLM-README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/llava/convert-image-encoder-to-gguf.py Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * clip : fix whitespace --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-20 13:20:37 +02:00
Karthick	47cc7a7bf9	Server: Handle n_keep parameter in the request (#6174)	2024-03-20 12:02:34 +01:00
Jared Van Bortel	bd60d82d0c	server tests : more pythonic process management; fix bare `except:` (#6146) * server tests : remove seemingly redundant newlines in print() * server tests : use built-in subprocess features, not os.kill and psutil * server tests : do not catch e.g. SystemExit; use print_exc * server tests: handle TimeoutExpired exception * server tests: fix connect on dual-stack systems * server: tests: add new tokens regex on windows generated following new repeat penalties default changed in (#6127) * server: tests: remove the hack on windows since now we get the good socket family * server: tests: add new tokens regex following new repeat penalties default changed in (#6127) * server: tests: add new tokens regex following new repeat penalties default changed in (#6127) --------- Co-authored-by: Pierrick HYMBERT <pierrick.hymbert@gmail.com>	2024-03-20 06:33:49 +01:00
Neo Zhang Jianyu	6c0b287748	update readme sycl for new update (#6151) * update readme sycl for new update * Update README-sycl.md Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com> * Update README-sycl.md Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com> * Update README-sycl.md Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com> * Update README-sycl.md Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com> * Update README-sycl.md Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com> * Update README-sycl.md Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com> * update by review comments * update w64devkit link * update for verify device id part * Update README-sycl.md Co-authored-by: Meng, Hengyu <airdldl@163.com> --------- Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com> Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com> Co-authored-by: Meng, Hengyu <airdldl@163.com>	2024-03-20 11:21:41 +08:00
Abhilash Majumder	d26e8b669d	increase igpu cluster limit (#6159)	2024-03-20 08:28:49 +05:30
DAN™	d8b009a945	Remove unneeded header file. (#6158)	2024-03-19 17:16:09 +01:00
Pierrick Hymbert	d0d5de42e5	gguf-split: split and merge gguf per batch of tensors (#6135) * gguf-split: split and merge gguf files per tensor * gguf-split: build with make toolchain * gguf-split: rename `--split-tensors-size` to `--split-max-tensors`. Set general.split_count KV to all split * split : minor style + fix compile warnings * gguf-split: remove --upload not implemented --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-19 12:05:44 +01:00
Georgi Gerganov	b80cf3b2d1	common : disable repeat penalties by default (#6127)	2024-03-19 10:21:54 +02:00
slaren	970a48060a	ci : exempt some labels from being tagged as stale (#6140)	2024-03-19 10:06:54 +02:00
DAN™	4c28b82529	common : print usage on '-h' and '--help' (#6145)	2024-03-19 07:59:36 +02:00
github-actions[bot]	2d15886bb0	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/9df3e30ce24fd28c7b3e2de0d986769db5d6225d' (2024-03-06) → 'github:NixOS/nixpkgs/d691274a972b3165335d261cc4671335f5c67de9' (2024-03-14)	2024-03-18 18:51:30 +00:00
Jared Van Bortel	d199ca79f2	mpt : implement backwards compatibility with duped output tensor (#6139)	2024-03-18 12:49:02 -04:00

2508 commits