llama.cpp

Author	SHA1	Message	Date
M. Yusuf Sarıgöz	1f8c866408	attempt to fix build on Windows	2023-11-06 03:27:03 +03:00
M. Yusuf Sarıgöz	71ea278ad8	Merge branch 'master' into llava-lib	2023-11-05 23:40:51 +03:00
M. Yusuf Sarıgöz	ad97e0eda8	attempt to fix build on Windows	2023-11-05 23:36:16 +03:00
slaren	2833a6f63c	ggml-cuda : fix f16 mul mat (#3961) * ggml-cuda : fix f16 mul mat ggml-ci * silence common.cpp warning (bonus)	2023-11-05 18:45:16 +01:00
Kerfuffle	d9ccce2e33	Allow common process_escapes to handle \x sequences (#3928) * Allow common process_escapes to handle \x sequences * Fix edge case when second hex digit is NUL	2023-11-05 10:06:06 -07:00
Thái Hoàng Tâm	bb60fd0bf6	server : fix typo for --alias shortcut from -m to -a (#3958)	2023-11-05 18:15:27 +02:00
Jared Van Bortel	132d25b8a6	cuda : fix disabling device with --tensor-split 1,0 (#3951) Co-authored-by: slaren <slarengh@gmail.com>	2023-11-05 10:08:57 -05:00
M. Yusuf Sarıgöz	01f06e26c3	Fix cyclical depts on Windows	2023-11-05 17:34:48 +03:00
M. Yusuf Sarıgöz	b9277727a6	Build with make	2023-11-05 17:10:54 +03:00
M. Yusuf Sarıgöz	53dca51fd1	Build with make	2023-11-05 17:00:35 +03:00
Meng Zhang	3d48f42efc	llama : mark LLM_ARCH_STARCODER as full offload supported (#3945) as done in https://github.com/ggerganov/llama.cpp/pull/3827	2023-11-05 14:40:08 +02:00
M. Yusuf Sarıgöz	32bf7bf61f	Editorconfig	2023-11-05 15:33:16 +03:00
M. Yusuf Sarıgöz	c6b88446e9	Merge branch 'master' into llava-lib	2023-11-05 15:25:31 +03:00
M. Yusuf Sarıgöz	52143f799b	Editorconfig	2023-11-05 15:22:47 +03:00
Eve	c41ea36eaa	cmake : MSVC instruction detection (fixed up #809) (#3923) * Add detection code for avx * Only check hardware when option is ON * Modify per code review sugguestions * Build locally will detect CPU * Fixes CMake style to use lowercase like everywhere else * cleanup * fix merge * linux/gcc version for testing * msvc combines avx2 and fma into /arch:AVX2 so check for both * cleanup * msvc only version * style * Update FindSIMD.cmake --------- Co-authored-by: Howard Su <howard0su@gmail.com> Co-authored-by: Jeremy Dunn <jeremydunn123@gmail.com>	2023-11-05 10:03:09 +02:00
Eve	a7fac013cf	ci : use intel sde when ci cpu doesn't support avx512 (#3949)	2023-11-05 09:46:44 +02:00
slaren	48ade94538	cuda : revert CUDA pool stuff (#3944) * Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)" This reverts commit `629f917cd6`. * Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)" This reverts commit `d6069051de`. ggml-ci	2023-11-05 09:12:13 +02:00
Kerfuffle	f28af0d81a	gguf-py: Support 01.AI Yi models (#3943)	2023-11-04 16:20:34 -06:00
Peter Sugihara	d9b33fe95b	metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938)	2023-11-03 21:18:18 +02:00
Xiao-Yong Jin	5ba3746171	ggml-metal: fix yarn rope (#3937)	2023-11-03 14:00:31 -04:00
slaren	abb77e7319	ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921)	2023-11-03 12:13:09 +01:00
Georgi Gerganov	8f961abdc4	speculative : change default p_accept to 0.5 + CLI args (#3919) ggml-ci	2023-11-03 09:41:56 +02:00
Georgi Gerganov	05816027d6	common : YAYF (yet another YARN fix) (#3925) ggml-ci	2023-11-03 09:24:00 +02:00
cebtenzzre	3fdbe6b66b	llama : change yarn_ext_factor placeholder to -1 (#3922)	2023-11-03 08:31:58 +02:00
M. Yusuf Sarıgöz	803703478d	build with cmake, not tested (WIP)	2023-11-03 01:34:52 +03:00
M. Yusuf Sarıgöz	e84003b430	Move llava back to examples	2023-11-03 01:10:26 +03:00
Kerfuffle	629f917cd6	cuda : add ROCM aliases for CUDA pool stuff (#3918)	2023-11-02 21:58:22 +02:00
Andrei	51b2fc11f7	cmake : fix relative path to git submodule index (#3915)	2023-11-02 21:40:31 +02:00
Georgi Gerganov	224e7d5b14	readme : add notice about #3912	2023-11-02 20:44:12 +02:00
Georgi Gerganov	c7743fe1c1	cuda : fix const ptrs warning causing ROCm build issues (#3913)	2023-11-02 20:32:11 +02:00
Oleksii Maryshchenko	d6069051de	cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903) * Using cuda memory pools for async alloc/dealloc. * If cuda device doesnt support memory pool than use old implementation. * Removed redundant cublasSetStream --------- Co-authored-by: Oleksii Maryshchenko <omaryshchenko@dtis.com>	2023-11-02 19:10:39 +02:00
Georgi Gerganov	4ff1046d75	gguf : print error for GGUFv1 files (#3908)	2023-11-02 16:22:30 +02:00
slaren	21958bb393	cmake : disable LLAMA_NATIVE by default (#3906)	2023-11-02 14:10:33 +02:00
Georgi Gerganov	2756c4fbff	gguf : remove special-case code for GGUFv1 (#3901) ggml-ci	2023-11-02 11:20:21 +02:00
Georgi Gerganov	1efae9b7dc	llm : prevent from 1-D tensors being GPU split (#3697)	2023-11-02 09:54:44 +02:00
cebtenzzre	b12fa0d1c1	build : link against build info instead of compiling against it (#3879) * cmake : fix build when .git does not exist * cmake : simplify BUILD_INFO target * cmake : add missing dependencies on BUILD_INFO * build : link against build info instead of compiling against it * zig : make build info a .cpp source instead of a header Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com> * cmake : revert change to CMP0115 --------- Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>	2023-11-02 08:50:16 +02:00
Georgi Gerganov	4d719a6d4e	cuda : check if this fixes Pascal card regression (#3882)	2023-11-02 08:35:10 +02:00
Georgi Gerganov	183b3fac6c	metal : fix build errors and kernel sig after #2268 (#3898)	2023-11-02 08:33:37 +02:00
cebtenzzre	2fffa0d61f	cuda : fix RoPE after #2268 (#3897)	2023-11-02 07:49:44 +02:00
cebtenzzre	0eb332a10f	llama : fix llama_context_default_params after #2268 (#3893)	2023-11-01 19:29:14 -04:00
slaren	d02e98cde0	ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891) * ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel * fix warnings	2023-11-01 23:10:09 +01:00
cebtenzzre	898aeca90a	llama : implement YaRN RoPE scaling (#2268) Co-authored-by: cebtenzzre <cebtenzzre@gmail.com> Co-authored-by: Jeffrey Quesnelle <jquesnelle@gmail.com>	2023-11-01 18:04:33 -04:00
Georgi Gerganov	c43c2da8af	llm : fix llm_build_kqv taking unused tensor (benign, #3837)	2023-11-01 23:08:30 +02:00
Georgi Gerganov	523e49b111	llm : fix falcon norm after refactoring (#3837)	2023-11-01 23:00:50 +02:00
Georgi Gerganov	e16b9fa4ba	metal : multi-simd softmax (#3710) ggml-ci	2023-11-01 21:25:00 +02:00
Georgi Gerganov	ff8f9a88da	common : minor (#3715)	2023-11-01 21:15:55 +02:00
Georgi Gerganov	50337961a6	llm : add llm_build_context (#3881) * llm : add llm_build_context * llm : deduce norm eps based on type + explict max_alibi_bias, clamp_kqv * llm : restore the non-graph llm_build_ functional API ggml-ci * llm : cleanup + comments	2023-11-01 20:11:02 +02:00
bandoti	0e40806c1c	common : allow caller to handle help/argument exceptions (#3715) * Allow caller to handle help/argument exceptions * Prepend newline to usage output * Add new gpt_params_parse_ex function to hide arg-parse impl * Fix issue blocking success case * exit instead of returning false * Update common/common.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update common/common.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-01 19:42:01 +02:00
staviq	a2758d08e4	log : make generating separate log files optional (#3787) * impl --log-new, --log-append * Update common/log.h Co-authored-by: cebtenzzre <cebtenzzre@gmail.com> * Update common/log.h Co-authored-by: cebtenzzre <cebtenzzre@gmail.com> * Apply suggestions from code review Co-authored-by: cebtenzzre <cebtenzzre@gmail.com> --------- Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>	2023-11-01 16:18:27 +02:00
l3utterfly	e75dfdd31b	sampling : null grammar field after reset (#3885)	2023-11-01 15:40:43 +02:00

1520 commits