Commit graph

3439 commits

Author SHA1 Message Date
M. Yusuf Sarıgöz
83321c6958
gguf-py rel pipeline (#8410)
* Upd gguf-py/readme

* Bump patch version for release
2024-07-10 15:12:35 +03:00
Borislav Stanimirov
cc61948b1f
llama : C++20 compatibility for u8 strings (#8408) 2024-07-10 14:45:44 +03:00
Borislav Stanimirov
7a80710d93
msvc : silence codecvt c++17 deprecation warnings (#8395) 2024-07-10 14:40:53 +03:00
fairydreaming
a8be1e6f59
llama : add assert about missing llama_encode() call (#8400)
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-07-10 14:38:58 +03:00
RunningLeon
e4dd31ff89
py : fix converter for internlm2 (#8321)
* update internlm2

* remove unused file

* fix lint
2024-07-10 14:26:40 +03:00
laik
8f0fad42b9
py : fix extra space in convert_hf_to_gguf.py (#8407) 2024-07-10 14:19:10 +03:00
Xuan Son Nguyen
4fe0861a89
Merge pull request #9 from ggerganov/sl/fix_fix_lora
fix lora issues
2024-07-10 10:33:42 +02:00
slaren
9841fbda7c
llama : lora fixes 2024-07-10 02:21:53 +02:00
slaren
f15167a4c7
cuda : do not use dmmv if the tensor does not have enough cols 2024-07-10 02:21:38 +02:00
ngxson
713665db2e
fix types 2024-07-10 00:36:52 +02:00
Clint Herron
a59f8fdc85
Server: Enable setting default sampling parameters via command-line (#8402)
* Load server sampling parameters from the server context by default.

* Wordsmithing comment
2024-07-09 18:26:40 -04:00
ngxson
ee2b35c65f
conversion: only allow selected models 2024-07-10 00:23:07 +02:00
Andy Salerno
fd560fe680
Update README.md to fix broken link to docs (#8399)
Update the "Performance troubleshooting" doc link to be correct - the file was moved into a dir called 'development'
2024-07-09 14:58:44 -04:00
Clint Herron
e500d6135a
Deprecation warning to assist with migration to new binary names (#8283)
* Adding a simple program to provide a deprecation warning that can exist to help people notice the binary name change from #7809 and migrate to the new filenames.

* Build legacy replacement binaries only if they already exist. Check for their existence every time so that they are not ignored.
2024-07-09 11:54:43 -04:00
Johannes Gäßler
a03e8dd99d
make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (#8392) 2024-07-09 17:11:07 +02:00
Alberto Cabrera Pérez
5b0b8d8cfb
sycl : Reenabled mmvq path for the SYCL Nvidia Backend (#8372)
* SYCL : Reenabled mmvq path for the SYCL Nvidia Backend

* Reduced verbosity of comment
2024-07-09 22:03:15 +08:00
Borislav Stanimirov
9925ca4087
cmake : allow external ggml (#8370) 2024-07-09 11:38:00 +03:00
daghanerdonmez
9beb2dda03
readme : fix typo [no ci] (#8389)
Bakus-Naur --> Backus-Naur
2024-07-09 09:16:00 +03:00
compilade
7d0e23d72e
gguf-py : do not use internal numpy types (#7472) 2024-07-09 01:04:49 -04:00
Georgi Gerganov
7fdb6f73e3
flake.lock: Update (#8342)
Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/2a55567fcf15b1b1c7ed712a2c6fadaec7412ea8?narHash=sha256-iKzJcpdXih14qYVcZ9QC9XuZYnPc6T8YImb6dX166kw%3D' (2024-06-01)
  → 'github:hercules-ci/flake-parts/9227223f6d922fee3c7b190b2cc238a99527bbb7?narHash=sha256-pQMhCCHyQGRzdfAkdJ4cIWiw%2BJNuWsTX7f0ZYSyz0VY%3D' (2024-07-03)
• Updated input 'flake-parts/nixpkgs-lib':
    'https://github.com/NixOS/nixpkgs/archive/eb9ceca17df2ea50a250b6b27f7bf6ab0186f198.tar.gz?narHash=sha256-lIbdfCsf8LMFloheeE6N31%2BBMIeixqyQWbSr2vk79EQ%3D' (2024-06-01)
  → 'https://github.com/NixOS/nixpkgs/archive/5daf0514482af3f97abaefc78a6606365c9108e2.tar.gz?narHash=sha256-Fm2rDDs86sHy0/1jxTOKB1118Q0O3Uc7EC0iXvXKpbI%3D' (2024-07-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/b2852eb9365c6de48ffb0dc2c9562591f652242a?narHash=sha256-C8e9S7RzshSdHB7L%2Bv9I51af1gDM5unhJ2xO1ywxNH8%3D' (2024-06-27)
  → 'github:NixOS/nixpkgs/9f4128e00b0ae8ec65918efeba59db998750ead6?narHash=sha256-rwz8NJZV%2B387rnWpTYcXaRNvzUSnnF9aHONoJIYmiUQ%3D' (2024-07-03)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-07-08 15:36:38 -07:00
Alberto Cabrera Pérez
a130eccef4
labeler : updated sycl to match docs and code refactor (#8373) 2024-07-08 22:35:17 +02:00
Xuan Son Nguyen
03d24cae19
Merge pull request #8 from ngxson/xsn/fix_lora_convert
add HF lora convert script
2024-07-08 22:10:57 +02:00
ngxson
95b3eb057b
fix outfile 2024-07-08 22:05:35 +02:00
ngxson
802565ca43
fix requirements 2024-07-08 22:01:23 +02:00
ngxson
d52455f2be
add requirements 2024-07-08 22:00:13 +02:00
ngxson
7a83f200d3
fix ftype 2024-07-08 21:55:41 +02:00
ngxson
6c617e20ef
add sanity check 2024-07-08 21:36:35 +02:00
ngxson
0e16188985
add metadata check 2024-07-08 17:44:14 +02:00
ngxson
41ced241f2
Merge branch 'master' into xsn/fix_lora 2024-07-08 17:06:46 +02:00
ngxson
84288ff9f7
add f16 convert 2024-07-08 17:05:17 +02:00
ngxson
712fecba61
no more transpose A 2024-07-08 16:48:55 +02:00
ngxson
847135aaa2
add convert script 2024-07-08 16:35:27 +02:00
b4b4o
c4dd11d1d3
readme : fix web link error [no ci] (#8347) 2024-07-08 17:19:24 +03:00
Alberto Cabrera Pérez
2ec846d558
sycl : fix powf call in device code (#8368) 2024-07-08 14:22:41 +01:00
Georgi Gerganov
3f2d538b81
scripts : fix sync for sycl 2024-07-08 13:51:31 +03:00
ngxson
79e2982788
update based on review comments 2024-07-08 11:59:01 +02:00
Georgi Gerganov
2ee44c9a18
sync : ggml
ggml-ci
2024-07-08 12:23:00 +03:00
Georgi Gerganov
6847d54c4f
tests : fix whitespace (#0) 2024-07-08 12:23:00 +03:00
John Balis
fde13b3bb9
feat: cuda implementation for ggml_conv_transpose_1d (ggml/854)
* conv transpose 1d passing test for 1d input and kernel

* working for different input and output channel counts, added test for variable stride

* initial draft appears to work with stride other than 1

* working with all old and new conv1d tests

* added a test for large tensors

* removed use cuda hardcoding

* restored test-conv-transpose.c

* removed unused arguments, and fixed bug where test failure would cause subsequent tests to fail

* fixed accumulator bug

* added test to test-backend-ops

* fixed mistake

* addressed review

* fixed includes

* removed blank lines

* style and warning fixes

* return failure when test fails

* fix supports_op

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-07-08 12:23:00 +03:00
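The ggml_conv_transpose_1d commit adds a CUDA kernel whose code is not shown in this log. As a plain reference for what such a kernel computes, here is a minimal CPU sketch of a 1D transposed convolution. It assumes a single batch, no padding, dilation 1, and a hypothetical memory layout (`input[c_in][l_in]`, `kernel[c_in][c_out][k]`, `output[c_out][l_out]`) that is not necessarily ggml's; the function name is also hypothetical. Each input element scatters a scaled copy of the kernel into the output starting at `i * stride`, so the output length is `(l_in - 1) * stride + k`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for a 1D transposed convolution (not ggml's kernel).
// Assumed layouts: input[c_in][l_in], kernel[c_in][c_out][k], output[c_out][l_out].
// Assumes l_in >= 1, no padding, dilation 1.
std::vector<float> conv_transpose_1d_ref(
        const std::vector<float>& input, std::size_t c_in, std::size_t l_in,
        const std::vector<float>& kernel, std::size_t c_out, std::size_t k,
        std::size_t stride) {
    const std::size_t l_out = (l_in - 1) * stride + k;
    std::vector<float> out(c_out * l_out, 0.0f);  // accumulators start at zero

    for (std::size_t ci = 0; ci < c_in; ++ci) {
        for (std::size_t i = 0; i < l_in; ++i) {
            const float x = input[ci * l_in + i];
            for (std::size_t co = 0; co < c_out; ++co) {
                for (std::size_t j = 0; j < k; ++j) {
                    // Input element i scatters x * kernel into out[i*stride + j];
                    // overlapping contributions from neighbouring inputs are summed.
                    out[co * l_out + i * stride + j] += x * kernel[(ci * c_out + co) * k + j];
                }
            }
        }
    }
    return out;
}
```

For example, input `{1, 2}` with a single-channel kernel `{1, 1, 1}` and stride 2 yields five outputs `{1, 1, 3, 2, 2}`: the two scattered kernel copies overlap only at index 2. A GPU version would typically parallelize over output positions instead of scattering, to avoid write conflicts on the accumulators.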
Kevin Wang
470939d483
common : preallocate sampling token data vector (#8363)
Calling `emplace_back` repeatedly is slower than preallocating the vector to the vocab size and writing the data directly. Some rudimentary profiling with `chrono` shows this change improving the performance of this block of code from ~500us/op to ~40us/op.

Overall, this slightly improves the sampling performance which has a more substantial impact for the `examples/lookahead` implementation -- I am able to see a ~10% performance boost in lookahead inference.
2024-07-08 10:26:53 +03:00
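The change described in #8363 can be illustrated with a minimal sketch. `TokenData` is a hypothetical struct loosely modeled on a (token id, logit, probability) sampling candidate, and both function names are hypothetical; the actual code in `common` is not reproduced here. The first function grows the vector with `emplace_back`, which may reallocate and move existing elements as capacity grows; the second sizes the vector to the vocabulary once and writes each slot by index.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a sampling candidate: token id, logit, probability.
struct TokenData {
    int id;
    float logit;
    float p;
};

// Grows the vector one element at a time; capacity increases (and
// reallocations) happen repeatedly as the loop walks the vocabulary.
std::vector<TokenData> candidates_emplace(const std::vector<float>& logits) {
    std::vector<TokenData> cur;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        cur.emplace_back(TokenData{static_cast<int>(i), logits[i], 0.0f});
    }
    return cur;
}

// Allocates storage for the whole vocabulary once, then assigns by index.
std::vector<TokenData> candidates_prealloc(const std::vector<float>& logits) {
    std::vector<TokenData> cur(logits.size());
    for (std::size_t i = 0; i < logits.size(); ++i) {
        cur[i] = TokenData{static_cast<int>(i), logits[i], 0.0f};
    }
    return cur;
}
```

Both functions produce the same candidates; only the allocation pattern differs. Calling `reserve(logits.size())` before an `emplace_back` loop is another common way to avoid the repeated reallocations.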
Georgi Gerganov
6f0dbf6ab0
infill : assert prefix/suffix tokens + remove old space logic (#8351) 2024-07-08 09:34:35 +03:00
Kevin Wang
ffd00797d8
common : avoid unnecessary logits fetch (#8358) 2024-07-08 09:31:55 +03:00
toyer
04ce3a8b19
readme : add supported glm models (#8360) 2024-07-08 08:57:19 +03:00
compilade
3fd62a6b1c
py : type-check all Python scripts with Pyright (#8341)
* py : type-check all Python scripts with Pyright

* server-tests : use trailing slash in openai base_url

* server-tests : add more type annotations

* server-tests : strip "chat" from base_url in oai_chat_completions

* server-tests : model metadata is a dict

* ci : disable pip cache in type-check workflow

The cache is not shared between branches, and it's 250MB in size,
so it would become quite a big part of the 10GB cache limit of the repo.

* py : fix new type errors from master branch

* tests : fix test-tokenizer-random.py

Apparently, gcc applies optimisations even when pre-processing,
which confuses pycparser.

* ci : only show warnings and errors in python type-check

The "information" level otherwise has entries
from 'examples/pydantic_models_to_grammar.py',
which could be confusing for someone trying to figure out what failed,
considering that these messages can safely be ignored
even though they look like errors.
2024-07-07 15:04:39 -04:00
Denis Spasyuk
a8db2a9ce6
Update llama-cli documentation (#8315)
* Update README.md

* Update README.md

* Update README.md

fixed llama-cli/main naming and the templates on some commands, added chat template sections, and fixed typos in some areas

* Update README.md

* Update README.md

* Update README.md
2024-07-07 17:08:28 +02:00
Alex Tuddenham
4090ea5501
ci : add checks for cmake, make and ctest in ci/run.sh (#8200)
* Added checks for cmake, make and ctest

* Removed erroneous whitespace
2024-07-07 17:59:14 +03:00
ngxson
30faf1f3de
fix auto merge 2024-07-07 16:36:50 +02:00
ngxson
a1666aaaca
Merge branch 'master' into xsn/fix_lora 2024-07-07 16:35:41 +02:00
ngxson
f6d090d7de
add llm_build_mm 2024-07-07 16:01:05 +02:00
Andy Tai
f1948f1e10
readme : update bindings list (#8222)
* adding guile_llama_cpp to binding list

* fix formatting

* fix formatting
2024-07-07 16:21:37 +03:00