M. Yusuf Sarıgöz
1f8c866408
attempt to fix build on Windows
2023-11-06 03:27:03 +03:00
M. Yusuf Sarıgöz
71ea278ad8
Merge branch 'master' into llava-lib
2023-11-05 23:40:51 +03:00
M. Yusuf Sarıgöz
ad97e0eda8
attempt to fix build on Windows
2023-11-05 23:36:16 +03:00
slaren
2833a6f63c
ggml-cuda : fix f16 mul mat (#3961)
* ggml-cuda : fix f16 mul mat
ggml-ci
* silence common.cpp warning (bonus)
2023-11-05 18:45:16 +01:00
Kerfuffle
d9ccce2e33
Allow common process_escapes to handle \x sequences (#3928)
* Allow common process_escapes to handle \x sequences
* Fix edge case when second hex digit is NUL
2023-11-05 10:06:06 -07:00
Thái Hoàng Tâm
bb60fd0bf6
server : fix typo for --alias shortcut from -m to -a (#3958)
2023-11-05 18:15:27 +02:00
Jared Van Bortel
132d25b8a6
cuda : fix disabling device with --tensor-split 1,0 (#3951)
Co-authored-by: slaren <slarengh@gmail.com>
2023-11-05 10:08:57 -05:00
M. Yusuf Sarıgöz
01f06e26c3
Fix cyclical deps on Windows
2023-11-05 17:34:48 +03:00
M. Yusuf Sarıgöz
b9277727a6
Build with make
2023-11-05 17:10:54 +03:00
M. Yusuf Sarıgöz
53dca51fd1
Build with make
2023-11-05 17:00:35 +03:00
Meng Zhang
3d48f42efc
llama : mark LLM_ARCH_STARCODER as full offload supported (#3945)
as done in https://github.com/ggerganov/llama.cpp/pull/3827
2023-11-05 14:40:08 +02:00
M. Yusuf Sarıgöz
32bf7bf61f
Editorconfig
2023-11-05 15:33:16 +03:00
M. Yusuf Sarıgöz
c6b88446e9
Merge branch 'master' into llava-lib
2023-11-05 15:25:31 +03:00
M. Yusuf Sarıgöz
52143f799b
Editorconfig
2023-11-05 15:22:47 +03:00
Eve
c41ea36eaa
cmake : MSVC instruction detection (fixed up #809) (#3923)
* Add detection code for avx
* Only check hardware when option is ON
* Modify per code review suggestions
* Build locally will detect CPU
* Fixes CMake style to use lowercase like everywhere else
* cleanup
* fix merge
* linux/gcc version for testing
* msvc combines avx2 and fma into /arch:AVX2 so check for both
* cleanup
* msvc only version
* style
* Update FindSIMD.cmake
---------
Co-authored-by: Howard Su <howard0su@gmail.com>
Co-authored-by: Jeremy Dunn <jeremydunn123@gmail.com>
2023-11-05 10:03:09 +02:00
Eve
a7fac013cf
ci : use intel sde when ci cpu doesn't support avx512 (#3949)
2023-11-05 09:46:44 +02:00
slaren
48ade94538
cuda : revert CUDA pool stuff (#3944)
* Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)"
This reverts commit 629f917cd6.
* Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)"
This reverts commit d6069051de.
ggml-ci
2023-11-05 09:12:13 +02:00
Kerfuffle
f28af0d81a
gguf-py: Support 01.AI Yi models (#3943)
2023-11-04 16:20:34 -06:00
Peter Sugihara
d9b33fe95b
metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938)
2023-11-03 21:18:18 +02:00
Xiao-Yong Jin
5ba3746171
ggml-metal: fix yarn rope (#3937)
2023-11-03 14:00:31 -04:00
slaren
abb77e7319
ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921)
2023-11-03 12:13:09 +01:00
Georgi Gerganov
8f961abdc4
speculative : change default p_accept to 0.5 + CLI args (#3919)
ggml-ci
2023-11-03 09:41:56 +02:00
Georgi Gerganov
05816027d6
common : YAYF (yet another YARN fix) (#3925)
ggml-ci
2023-11-03 09:24:00 +02:00
cebtenzzre
3fdbe6b66b
llama : change yarn_ext_factor placeholder to -1 (#3922)
2023-11-03 08:31:58 +02:00
M. Yusuf Sarıgöz
803703478d
build with cmake, not tested (WIP)
2023-11-03 01:34:52 +03:00
M. Yusuf Sarıgöz
e84003b430
Move llava back to examples
2023-11-03 01:10:26 +03:00
Kerfuffle
629f917cd6
cuda : add ROCM aliases for CUDA pool stuff (#3918)
2023-11-02 21:58:22 +02:00
Andrei
51b2fc11f7
cmake : fix relative path to git submodule index (#3915)
2023-11-02 21:40:31 +02:00
Georgi Gerganov
224e7d5b14
readme : add notice about #3912
2023-11-02 20:44:12 +02:00
Georgi Gerganov
c7743fe1c1
cuda : fix const ptrs warning causing ROCm build issues (#3913)
2023-11-02 20:32:11 +02:00
Oleksii Maryshchenko
d6069051de
cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)
* Using cuda memory pools for async alloc/dealloc.
* If the cuda device doesn't support memory pools, then use the old implementation.
* Removed redundant cublasSetStream
---------
Co-authored-by: Oleksii Maryshchenko <omaryshchenko@dtis.com>
2023-11-02 19:10:39 +02:00
Georgi Gerganov
4ff1046d75
gguf : print error for GGUFv1 files (#3908)
2023-11-02 16:22:30 +02:00
slaren
21958bb393
cmake : disable LLAMA_NATIVE by default (#3906)
2023-11-02 14:10:33 +02:00
Georgi Gerganov
2756c4fbff
gguf : remove special-case code for GGUFv1 (#3901)
ggml-ci
2023-11-02 11:20:21 +02:00
Georgi Gerganov
1efae9b7dc
llm : prevent 1-D tensors from being GPU split (#3697)
2023-11-02 09:54:44 +02:00
cebtenzzre
b12fa0d1c1
build : link against build info instead of compiling against it (#3879)
* cmake : fix build when .git does not exist
* cmake : simplify BUILD_INFO target
* cmake : add missing dependencies on BUILD_INFO
* build : link against build info instead of compiling against it
* zig : make build info a .cpp source instead of a header
Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>
* cmake : revert change to CMP0115
---------
Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>
2023-11-02 08:50:16 +02:00
Georgi Gerganov
4d719a6d4e
cuda : check if this fixes Pascal card regression (#3882)
2023-11-02 08:35:10 +02:00
Georgi Gerganov
183b3fac6c
metal : fix build errors and kernel sig after #2268 (#3898)
2023-11-02 08:33:37 +02:00
cebtenzzre
2fffa0d61f
cuda : fix RoPE after #2268 (#3897)
2023-11-02 07:49:44 +02:00
cebtenzzre
0eb332a10f
llama : fix llama_context_default_params after #2268 (#3893)
2023-11-01 19:29:14 -04:00
slaren
d02e98cde0
ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891)
* ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel
* fix warnings
2023-11-01 23:10:09 +01:00
cebtenzzre
898aeca90a
llama : implement YaRN RoPE scaling (#2268)
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: Jeffrey Quesnelle <jquesnelle@gmail.com>
2023-11-01 18:04:33 -04:00
Georgi Gerganov
c43c2da8af
llm : fix llm_build_kqv taking unused tensor (benign, #3837)
2023-11-01 23:08:30 +02:00
Georgi Gerganov
523e49b111
llm : fix falcon norm after refactoring (#3837)
2023-11-01 23:00:50 +02:00
Georgi Gerganov
e16b9fa4ba
metal : multi-simd softmax (#3710)
ggml-ci
2023-11-01 21:25:00 +02:00
Georgi Gerganov
ff8f9a88da
common : minor (#3715)
2023-11-01 21:15:55 +02:00
Georgi Gerganov
50337961a6
llm : add llm_build_context (#3881)
* llm : add llm_build_context
* llm : deduce norm eps based on type + explicit max_alibi_bias, clamp_kqv
* llm : restore the non-graph llm_build_ functional API
ggml-ci
* llm : cleanup + comments
2023-11-01 20:11:02 +02:00
bandoti
0e40806c1c
common : allow caller to handle help/argument exceptions (#3715)
* Allow caller to handle help/argument exceptions
* Prepend newline to usage output
* Add new gpt_params_parse_ex function to hide arg-parse impl
* Fix issue blocking success case
* exit instead of returning false
* Update common/common.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-01 19:42:01 +02:00
staviq
a2758d08e4
log : make generating separate log files optional (#3787)
* impl --log-new, --log-append
* Update common/log.h
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
* Update common/log.h
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
* Apply suggestions from code review
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
---------
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
2023-11-01 16:18:27 +02:00
l3utterfly
e75dfdd31b
sampling : null grammar field after reset (#3885)
2023-11-01 15:40:43 +02:00