llama.cpp

Author	SHA1	Message	Date
M. Yusuf Sarıgöz	83321c6958	gguf-py rel pipeline (#8410) * Upd gguf-py/readme * Bump patch version for release	2024-07-10 15:12:35 +03:00
Borislav Stanimirov	cc61948b1f	llama : C++20 compatibility for u8 strings (#8408)	2024-07-10 14:45:44 +03:00
Borislav Stanimirov	7a80710d93	msvc : silence codecvt c++17 deprecation warnings (#8395)	2024-07-10 14:40:53 +03:00
fairydreaming	a8be1e6f59	llama : add assert about missing llama_encode() call (#8400) Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>	2024-07-10 14:38:58 +03:00
RunningLeon	e4dd31ff89	py : fix converter for internlm2 (#8321) * update internlm2 * remove unused file * fix lint	2024-07-10 14:26:40 +03:00
laik	8f0fad42b9	py : fix extra space in convert_hf_to_gguf.py (#8407)	2024-07-10 14:19:10 +03:00
Xuan Son Nguyen	4fe0861a89	Merge pull request #9 from ggerganov/sl/fix_fix_lora fix lora issues	2024-07-10 10:33:42 +02:00
slaren	9841fbda7c	llama : lora fixes	2024-07-10 02:21:53 +02:00
slaren	f15167a4c7	cuda : do not use dmmv if the tensor does not have enough cols	2024-07-10 02:21:38 +02:00
ngxson	713665db2e	fix types	2024-07-10 00:36:52 +02:00
Clint Herron	a59f8fdc85	Server: Enable setting default sampling parameters via command-line (#8402) * Load server sampling parameters from the server context by default. * Wordsmithing comment	2024-07-09 18:26:40 -04:00
ngxson	ee2b35c65f	conversion: only allow selected models	2024-07-10 00:23:07 +02:00
Andy Salerno	fd560fe680	Update README.md to fix broken link to docs (#8399) Update the "Performance troubleshooting" doc link to be correct - the file was moved into a dir called 'development'	2024-07-09 14:58:44 -04:00
Clint Herron	e500d6135a	Deprecation warning to assist with migration to new binary names (#8283) * Adding a simple program to provide a deprecation warning that can exist to help people notice the binary name change from #7809 and migrate to the new filenames. * Build legacy replacement binaries only if they already exist. Check for their existence every time so that they are not ignored.	2024-07-09 11:54:43 -04:00
Johannes Gäßler	a03e8dd99d	make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (#8392)	2024-07-09 17:11:07 +02:00
Alberto Cabrera Pérez	5b0b8d8cfb	sycl : Reenabled mmvq path for the SYCL Nvidia Backend (#8372) * SYCL : Reenabled mmvq path for the SYCL Nvidia Backend * Reduced verbosity of comment	2024-07-09 22:03:15 +08:00
Borislav Stanimirov	9925ca4087	cmake : allow external ggml (#8370)	2024-07-09 11:38:00 +03:00
daghanerdonmez	9beb2dda03	readme : fix typo [no ci] (#8389) Bakus-Naur --> Backus-Naur	2024-07-09 09:16:00 +03:00
compilade	7d0e23d72e	gguf-py : do not use internal numpy types (#7472)	2024-07-09 01:04:49 -04:00
Georgi Gerganov	7fdb6f73e3	flake.lock: Update (#8342) Flake lock file updates: • Updated input 'flake-parts': 'github:hercules-ci/flake-parts/2a55567fcf15b1b1c7ed712a2c6fadaec7412ea8?narHash=sha256-iKzJcpdXih14qYVcZ9QC9XuZYnPc6T8YImb6dX166kw%3D' (2024-06-01) → 'github:hercules-ci/flake-parts/9227223f6d922fee3c7b190b2cc238a99527bbb7?narHash=sha256-pQMhCCHyQGRzdfAkdJ4cIWiw%2BJNuWsTX7f0ZYSyz0VY%3D' (2024-07-03) • Updated input 'flake-parts/nixpkgs-lib': 'https://github.com/NixOS/nixpkgs/archive/eb9ceca17df2ea50a250b6b27f7bf6ab0186f198.tar.gz?narHash=sha256-lIbdfCsf8LMFloheeE6N31%2BBMIeixqyQWbSr2vk79EQ%3D' (2024-06-01) → 'https://github.com/NixOS/nixpkgs/archive/5daf0514482af3f97abaefc78a6606365c9108e2.tar.gz?narHash=sha256-Fm2rDDs86sHy0/1jxTOKB1118Q0O3Uc7EC0iXvXKpbI%3D' (2024-07-01) • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/b2852eb9365c6de48ffb0dc2c9562591f652242a?narHash=sha256-C8e9S7RzshSdHB7L%2Bv9I51af1gDM5unhJ2xO1ywxNH8%3D' (2024-06-27) → 'github:NixOS/nixpkgs/9f4128e00b0ae8ec65918efeba59db998750ead6?narHash=sha256-rwz8NJZV%2B387rnWpTYcXaRNvzUSnnF9aHONoJIYmiUQ%3D' (2024-07-03) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-07-08 15:36:38 -07:00
Alberto Cabrera Pérez	a130eccef4	labeler : updated sycl to match docs and code refactor (#8373)	2024-07-08 22:35:17 +02:00
Xuan Son Nguyen	03d24cae19	Merge pull request #8 from ngxson/xsn/fix_lora_convert add HF lora convert script	2024-07-08 22:10:57 +02:00
ngxson	95b3eb057b	fix outfile	2024-07-08 22:05:35 +02:00
ngxson	802565ca43	fix requirements	2024-07-08 22:01:23 +02:00
ngxson	d52455f2be	add requirements	2024-07-08 22:00:13 +02:00
ngxson	7a83f200d3	fix ftype	2024-07-08 21:55:41 +02:00
ngxson	6c617e20ef	add sanity check	2024-07-08 21:36:35 +02:00
ngxson	0e16188985	add metadata check	2024-07-08 17:44:14 +02:00
ngxson	41ced241f2	Merge branch 'master' into xsn/fix_lora	2024-07-08 17:06:46 +02:00
ngxson	84288ff9f7	add f16 convert	2024-07-08 17:05:17 +02:00
ngxson	712fecba61	no more transpose A	2024-07-08 16:48:55 +02:00
ngxson	847135aaa2	add convert script	2024-07-08 16:35:27 +02:00
b4b4o	c4dd11d1d3	readme : fix web link error [no ci] (#8347)	2024-07-08 17:19:24 +03:00
Alberto Cabrera Pérez	2ec846d558	sycl : fix powf call in device code (#8368)	2024-07-08 14:22:41 +01:00
Georgi Gerganov	3f2d538b81	scripts : fix sync for sycl	2024-07-08 13:51:31 +03:00
ngxson	79e2982788	update based on review comments	2024-07-08 11:59:01 +02:00
Georgi Gerganov	2ee44c9a18	sync : ggml ggml-ci	2024-07-08 12:23:00 +03:00
Georgi Gerganov	6847d54c4f	tests : fix whitespace (#0)	2024-07-08 12:23:00 +03:00
John Balis	fde13b3bb9	feat: cuda implementation for `ggml_conv_transpose_1d` (ggml/854) * conv transpose 1d passing test for 1d input and kernel * working for different input and output channel counts, added test for variable stride * initial draft appears to work with stride other than 1 * working with all old and new conv1d tests * added a test for large tensors * removed use cuda hardcoding * restored test-conv-transpose.c * removed unused arugments, and fixed bug where test failure would cause subsequent tests to fail * fixed accumulator bug * added test to test-backend-ops * fixed mistake * addressed review * fixed includes * removed blank lines * style and warning fixes * return failure when test fails * fix supports_op --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-07-08 12:23:00 +03:00
Kevin Wang	470939d483	common : preallocate sampling token data vector (#8363) `emplace_back` repeatedly-called is slower than preallocating the vector to the vocab size and directly inserting the data. Some rudimentary profiling with `chrono` improves the performance of this block of code from ~500us/op to ~40us/op. Overall, this slightly improves the sampling performance which has a more substantial impact for the `examples/lookahead` implementation -- I am able to see a ~10% performance boost in lookahead inference.	2024-07-08 10:26:53 +03:00
Georgi Gerganov	6f0dbf6ab0	infill : assert prefix/suffix tokens + remove old space logic (#8351)	2024-07-08 09:34:35 +03:00
Kevin Wang	ffd00797d8	common : avoid unnecessary logits fetch (#8358)	2024-07-08 09:31:55 +03:00
toyer	04ce3a8b19	readme : add supported glm models (#8360)	2024-07-08 08:57:19 +03:00
compilade	3fd62a6b1c	py : type-check all Python scripts with Pyright (#8341) * py : type-check all Python scripts with Pyright * server-tests : use trailing slash in openai base_url * server-tests : add more type annotations * server-tests : strip "chat" from base_url in oai_chat_completions * server-tests : model metadata is a dict * ci : disable pip cache in type-check workflow The cache is not shared between branches, and it's 250MB in size, so it would become quite a big part of the 10GB cache limit of the repo. * py : fix new type errors from master branch * tests : fix test-tokenizer-random.py Apparently, gcc applies optimisations even when pre-processing, which confuses pycparser. * ci : only show warnings and errors in python type-check The "information" level otherwise has entries from 'examples/pydantic_models_to_grammar.py', which could be confusing for someone trying to figure out what failed, considering that these messages can safely be ignored even though they look like errors.	2024-07-07 15:04:39 -04:00
Denis Spasyuk	a8db2a9ce6	Update llama-cli documentation (#8315) * Update README.md * Update README.md * Update README.md fixed llama-cli/main, templates on some cmds added chat template sections and fixed typos in some areas * Update README.md * Update README.md * Update README.md	2024-07-07 17:08:28 +02:00
Alex Tuddenham	4090ea5501	ci : add checks for cmake,make and ctest in ci/run.sh (#8200) * Added checks for cmake,make and ctest * Removed erroneous whitespace	2024-07-07 17:59:14 +03:00
ngxson	30faf1f3de	fix auto merge	2024-07-07 16:36:50 +02:00
ngxson	a1666aaaca	Merge branch 'master' into xsn/fix_lora	2024-07-07 16:35:41 +02:00
ngxson	f6d090d7de	add llm_build_mm	2024-07-07 16:01:05 +02:00
Andy Tai	f1948f1e10	readme : update bindings list (#8222) * adding guile_llama_cpp to binding list * fix formatting * fix formatting	2024-07-07 16:21:37 +03:00

3439 commits