llama.cpp

Author	SHA1	Message	Date
Jeremy Song	64d83e1fd5	Enrich and reword README.md (squashed) Update README.md	2023-12-16 16:18:16 +08:00
Jeremy Song	22ab495a79	fix warning in ggml.c (#5 ) Co-authored-by: syx <yixinsong@sjtu.edu.com>	2023-12-16 16:16:25 +08:00
Jeremy Song	1557b81743	Add solver (#4 ) * add solver * update solver --------- Co-authored-by: syx <yixinsong@sjtu.edu.com>	2023-12-16 16:16:25 +08:00
Holden X	9adba26a1a	Polish README (#1 ) * Upload eval figures * fix some typos * update model links	2023-12-16 00:43:16 +08:00
Holden X	66a1bb4602	add gpu index opts and udpate doc commands (#2 )	2023-12-16 00:42:08 +08:00
Holden X	fe3bc49e81	Add our README (#7 ) * add our readme * add live demo video * update README --------- Co-authored-by: syx <yixinsong@sjtu.edu.cn>	2023-12-15 23:54:07 +08:00
Holden X	15b193729b	Offloading tensors based on total VRAM budget and offloading policy (#6 ) * deprecate ffn_b * get tensor offloading levels * wip: split tensor loading * wip: framework of loading sparse model tensors * save and flush gpu alloc buffer * vram budget will fall back to remaining free memory * minor: remove vram safety margin * add options for vram budget; clean old env vars * minor: bugfix	2023-12-15 23:46:51 +08:00
Holden X	b89a0b7296	Delete README copy.md	2023-12-15 21:31:34 +08:00
Jeremy Song	bb55e4af2c	Full cpu (#5 ) * cancel assert in convert.py * add full CPU support --------- Co-authored-by: syx <yixinsong@sjtu.edu.com>	2023-12-15 21:29:10 +08:00
Holden	a456d83bbe	add fallback for m chip & fix compiler bugs (#4 )	2023-12-14 22:53:14 +08:00
Jeremy Song	e44f6401ec	Merge pull request #3 from hodlen/fix/gpu-dependency support powerinfer without GPU	2023-12-12 15:40:29 +08:00
Yixin Song	182316ecfd	support powerinfer without GPU	2023-12-12 15:40:07 +08:00
Jeremy Song	e4b798a735	Merge pull request #2 from hodlen/fix/axpy_q4 support axpy q4_0 for loop	2023-12-12 15:05:31 +08:00
syx	c796dd4c90	support axpy q4_0 for loop	2023-12-12 15:03:10 +08:00
Jeremy Song	9975f4aaa7	Merge pull request #1 from hodlen/fix/axpy Add fall back for axpy mulmat	2023-12-12 13:53:16 +08:00
syx	6f997d299a	add fall back for axpy mulmat	2023-12-12 13:50:25 +08:00
Holden	a3c295a2ae	merge PowerInfer impl from the internal codebase	2023-12-12 11:08:10 +08:00
Michael Potter	6bb4908a17	Fix MacOS Sonoma model quantization (#4052 ) Co-authored-by: Jared Van Bortel <jared@nomic.ai> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-14 12:34:41 -05:00
Galunid	36eed0c42c	stablelm : StableLM support (#3586 ) * Add support for stablelm-3b-4e1t * Supports GPU offloading of (n-1) layers	2023-11-14 11:17:12 +01:00
afrideva	b46d12f86d	convert.py: also look for plain model.safetensors (#4043 ) * add safetensors to convert.py help message * Check for single-file safetensors model * Update convert.py "model" option help message * revert convert.py help message change	2023-11-13 18:03:40 -07:00
M. Yusuf Sarıgöz	bd90eca237	llava : fix regression for square images in #3613 (#4056 )	2023-11-13 18:20:52 +03:00
Georgi Gerganov	3d68f364f1	ggml : sync (im2col, GPU conv, 32-bit arm compat) (#4060 ) ggml-ci	2023-11-13 16:55:52 +02:00
Georgi Gerganov	c049b37d7b	readme : update hot topics	2023-11-13 14:18:08 +02:00
Georgi Gerganov	4760e7cc0b	sync : ggml (backend v2) (#3912 ) * sync : ggml (backend v2) (wip) * sync : migrate examples and llama.cpp to dynamic graphs (wip) * sync : update tests + fix max op params to 64 ggml-ci * sync : ggml-cuda ggml-ci * llama : fix save/load state context size ggml-ci * sync : try to fix build on tvOS * sync : pass custom graph sizes in training examples * sync : update graph copies to new ggml API * sync : update sync-ggml.sh with new files * scripts : fix header in sync script * train : fix context size calculations * llama : increase inference graph size up to 4096 nodes * train : allocate grads for backward graphs * train : allocate grads for gb_tmp	2023-11-13 14:16:23 +02:00
Kerfuffle	bb50a792ec	Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4041 ) * Add ReLU and SQR CUDA ops to fix Persimmon offloading * Persimmon loader: More helpful error on CUDA/ROCM when offloading too many layers	2023-11-13 01:58:15 -07:00
Kerfuffle	21fd874c8d	gguf-py: gguf_writer: Use bytearray to build metadata (#4051 ) * gguf-py: gguf_writer: Use BytesIO to build metadata * Use bytearray instead Bump gguf-py package version	2023-11-12 16:39:37 -07:00
Richard Kiss	532dd74e38	Fix some documentation typos/grammar mistakes (#4032 ) * typos * Update examples/parallel/README.md Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com> --------- Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>	2023-11-11 23:04:58 -07:00
M. Yusuf Sarıgöz	e86fc56f75	Fix gguf-convert-endian script (#4037 ) * Fix gguf-convert-endian script * Bump version and update description	2023-11-11 08:35:31 -07:00
Alexey Parfenov	d96ca7ded7	server : fix crash when prompt exceeds context size (#3996 )	2023-11-10 23:48:21 -06:00
Kerfuffle	34b0a08207	gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981 ) * gguf-py: Refactor and add file reading support * Replay changes from #3871 Credit to @cebtenzzre for that pull * Various type annotation fixes. * sort imports with isort (again) * Fix missing return statement in add_tensor * style cleanup with flake8 * fix NamedTuple and Enum usage * Fix an issue with state init in GGUFReader Move examples to an examples/ directory Clean up examples Add an example of modifying keys in a GGUF file Update documentation with info on examples Try to support people importing gguf/gguf.py directly * Damagage is not a word. * Clean up gguf-py/examples/modify_gguf.py whitespace Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update gguf-py/examples/modify_gguf.py formatting Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update gguf-py/gguf/gguf_reader.py type hint Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Make examples executable, formatting changes * Add more information to GGUFReader and examples comments * Include a gguf Python package version bump * Add convert-gguf-endian.py script * cleanup * gguf-py : bump minor version * Reorganize scripts * Make GGUFReader endian detection less arbitrary * Add JSON dumping support to gguf-dump.py Which I kind of regret now * A few for gguf-dump.py cleanups * Murder accidental tuple in gguf-py/scripts/gguf-dump.py Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * cleanup * constants : remove unneeded type annotations * fix python 3.8 compat * Set up gguf- scripts in pyproject.toml * And include scripts/__init__.py, derp * convert.py: We can't currently support Q8_0 on big endian. * gguf-py: SpecialVocab: Always try available sources for special token ids gguf-py: SpecialVocab: Try to load merges from merges.txt if not in tokenizer.json gguf-py: SpecialVocab: Add 'add_bos_token' type bools to GGUF metadata u * cleanup * Promote add_X_token to GGUF metadata for BOS and EOS --------- Co-authored-by: Jared Van Bortel <jared@nomic.ai> Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>	2023-11-11 08:04:50 +03:00
Jhen-Jie Hong	4a4fd3eefa	server : allow continue edit on completion mode (#3950 ) * server : allow continue edit on completion mode * server : handle abort case in runCompletion * server : style improvement	2023-11-10 16:49:33 -06:00
Galunid	df9d1293de	Unbreak persimmon after #3837 (#4010 )	2023-11-10 14:24:54 +01:00
Galunid	a75fa576ab	scripts: Generalize convert scripts (#3838 ) * Replace convert-*-hf-to-gguf.py files with convert-hf-to-gguf.py	2023-11-09 11:09:29 +01:00
Mihai	57ad015dc3	server : add min_p param (#3877 ) * Update server.cpp with min_p after it was introduced in https://github.com/ggerganov/llama.cpp/pull/3841 * Use spaces instead of tabs * Update index.html.hpp after running deps.sh * Fix test - fix line ending	2023-11-08 20:00:34 -06:00
slaren	875fb42871	ggml-alloc : fix backend assignments of views (#3982 )	2023-11-08 13:15:14 +01:00
Jared Van Bortel	0a7c980b6f	gguf : track writer state, free unneeded tensors, cleanup (#3871 )	2023-11-07 12:43:04 -05:00
Georgi Gerganov	413503d4b9	make : do not add linker flags when compiling static llava lib (#3977 )	2023-11-07 20:25:32 +03:00
xaedes	e9c1cecb9d	ggml : fix backward rope after YaRN (#3974 ) * fix backward process of rope rope backward process was broken after YaRN RoPE (#2268) implementation, due to missing changes in backward functions. the code for the backward process is nearly identically to the forward process: the only difference is the sign of the sin-values. to avoid future regressions remove the near-duplicate backward functions and reuse the forward code: for this a new function argument `bool forward` was added to `ggml_compute_forward_rope_f32` and `ggml_compute_forward_rope_f16`. the sin-values will be negated when forward is false. * fix finetune rope call to use correct default attn_factor of 1.0f * remove unused `ggml_rope_xpos_back` it is better to have only one `ggml_rope_back` function that accepts all rope parameters, so that `ggml_compute_backward` can propagate all parameters without having to switch between different rope_back variants. * fix comments explaining the sinus sign in ggml_forward_rope * add missing function arguments in declaration * fix function argument type in declaration	2023-11-07 10:04:51 +02:00
Matthew Tejo	54b4df8886	Use params when loading models in llava-cli (#3976 ) llava-cli was loading models with default params and ignoring settings from the cli. This switches to a generic function to load the params from the cli options.	2023-11-07 10:43:59 +03:00
Meng Zhang	46876d2a2c	cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946 ) * protyping the idea that supports running on CPU for a GGML_USE_CUBLAS=on build * doc: add comments to ggml_cublas_loaded() * fix defined(...)	2023-11-07 08:49:08 +02:00
Damian Stewart	381efbf480	llava : expose as a shared library for downstream projects (#3613 ) * wip llava python bindings compatibility * add external llava API * add base64 in-prompt image support * wip refactor image loading * refactor image load out of llava init * cleanup * further cleanup; move llava-cli into its own file and rename * move base64.hpp into common/ * collapse clip and llava libraries * move llava into its own subdir * wip * fix bug where base64 string was not removed from the prompt * get libllava to output in the right place * expose llava methods in libllama.dylib * cleanup memory usage around clip_image_* * cleanup and refactor again * update headerdoc * build with cmake, not tested (WIP) * Editorconfig * Editorconfig * Build with make * Build with make * Fix cyclical depts on Windows * attempt to fix build on Windows * attempt to fix build on Windows * Upd TODOs * attempt to fix build on Windows+CUDA * Revert changes in cmake * Fix according to review comments * Support building as a shared library * address review comments --------- Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com> Co-authored-by: Jared Van Bortel <jared@nomic.ai>	2023-11-07 00:36:23 +03:00
slaren	2833a6f63c	ggml-cuda : fix f16 mul mat (#3961 ) * ggml-cuda : fix f16 mul mat ggml-ci * silence common.cpp warning (bonus)	2023-11-05 18:45:16 +01:00
Kerfuffle	d9ccce2e33	Allow common process_escapes to handle \x sequences (#3928 ) * Allow common process_escapes to handle \x sequences * Fix edge case when second hex digit is NUL	2023-11-05 10:06:06 -07:00
Thái Hoàng Tâm	bb60fd0bf6	server : fix typo for --alias shortcut from -m to -a (#3958 )	2023-11-05 18:15:27 +02:00
Jared Van Bortel	132d25b8a6	cuda : fix disabling device with --tensor-split 1,0 (#3951 ) Co-authored-by: slaren <slarengh@gmail.com>	2023-11-05 10:08:57 -05:00
Meng Zhang	3d48f42efc	llama : mark LLM_ARCH_STARCODER as full offload supported (#3945 ) as done in https://github.com/ggerganov/llama.cpp/pull/3827	2023-11-05 14:40:08 +02:00
Eve	c41ea36eaa	cmake : MSVC instruction detection (fixed up #809 ) (#3923 ) * Add detection code for avx * Only check hardware when option is ON * Modify per code review sugguestions * Build locally will detect CPU * Fixes CMake style to use lowercase like everywhere else * cleanup * fix merge * linux/gcc version for testing * msvc combines avx2 and fma into /arch:AVX2 so check for both * cleanup * msvc only version * style * Update FindSIMD.cmake --------- Co-authored-by: Howard Su <howard0su@gmail.com> Co-authored-by: Jeremy Dunn <jeremydunn123@gmail.com>	2023-11-05 10:03:09 +02:00
Eve	a7fac013cf	ci : use intel sde when ci cpu doesn't support avx512 (#3949 )	2023-11-05 09:46:44 +02:00
slaren	48ade94538	cuda : revert CUDA pool stuff (#3944 ) * Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)" This reverts commit `629f917cd6`. * Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)" This reverts commit `d6069051de`. ggml-ci	2023-11-05 09:12:13 +02:00
Kerfuffle	f28af0d81a	gguf-py: Support 01.AI Yi models (#3943 )	2023-11-04 16:20:34 -06:00

1533 commits