Georgi Gerganov
7528c705b0
llama : fix uninitialized tensors
2024-05-21 22:02:00 +03:00
Georgi Gerganov
92711138f9
convert : read/write n_head_kv
2024-05-21 19:40:01 +03:00
Georgi Gerganov
e9acbce624
cuda : fix compile warning
2024-05-21 19:08:12 +03:00
Georgi Gerganov
23b72b871c
llama : remove tmp assert
2024-05-21 18:29:12 +03:00
Georgi Gerganov
600896b882
llama : move rope factors from KV header to tensors
2024-05-21 18:26:55 +03:00
Georgi Gerganov
d93b5cad0a
minor : cleanup
2024-05-21 17:51:17 +03:00
Georgi Gerganov
4f787ead14
backends : fix pragma semicolons
2024-05-21 17:51:17 +03:00
Georgi Gerganov
e7c7d8ca42
tests : update to use new rope API
2024-05-21 17:51:17 +03:00
Georgi Gerganov
f4cb482c62
minor : style
2024-05-21 17:51:16 +03:00
Georgi Gerganov
352c3859a7
backends : add dev messages to support rope freq. factors
2024-05-21 17:51:16 +03:00
Georgi Gerganov
471d8170bc
ggml : update ggml_rope_ext API to support freq. factors
2024-05-21 17:51:15 +03:00
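For context on the rope entries above and below, a minimal sketch of rotary embedding with per-dimension frequency factors, assuming the LongRoPE-style convention theta_i = pos * base^(-2i/d) / factor_i; the function name and layout are illustrative, not the ggml internals.

```cpp
#include <cmath>

// Sketch: rope where each dimension pair's frequency is divided by a
// model-provided factor; passing freq_factors == nullptr gives plain rope.
static void rope_ext_sketch(float * x, int n_dims, int pos,
                            float freq_base, const float * freq_factors) {
    for (int i = 0; i < n_dims; i += 2) {
        float theta = pos * std::pow(freq_base, -(float)i / n_dims);
        if (freq_factors) {
            theta /= freq_factors[i/2]; // per-dimension scaling from the model
        }
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        const float x0 = x[i], x1 = x[i + 1];
        x[i]     = x0*c - x1*s;
        x[i + 1] = x0*s + x1*c;
    }
}
```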
Georgi Gerganov
2d473a4a9a
metal : support rope freq_factors
2024-05-21 17:51:01 +03:00
liuwei
8a9c897fd0
add a one-line comment
2024-05-21 17:51:01 +03:00
liuwei
d05ae12e93
use the short freq factor when the context size is smaller than the trained context size
2024-05-21 17:51:00 +03:00
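A sketch of the selection rule this commit describes, assuming the model ships two factor sets as Phi-3 128k does (short factors up to the trained length, long factors beyond it); the names are illustrative.

```cpp
// Pick the freq factor set based only on the requested context size.
const float * select_freq_factors(unsigned n_ctx, unsigned n_ctx_train,
                                  const float * short_factors,
                                  const float * long_factors) {
    // at or below the trained context size, the short factors apply
    return n_ctx > n_ctx_train ? long_factors : short_factors;
}
```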
liuwei
b1f491a297
fix lint warnings in convert-hf-to-gguf.py
2024-05-21 17:51:00 +03:00
liuwei
5683db3bf7
remove unused rope scaling type 'su' from gguf converter
2024-05-21 17:51:00 +03:00
liuwei
6333ed1a30
make freq factors only depend on ctx size
2024-05-21 17:51:00 +03:00
liuwei
c5569311a4
add long rope support in ggml cpu backend
2024-05-21 17:51:00 +03:00
liuwei
9f871298b6
adjust index value in cuda long rope freq factors
2024-05-21 17:51:00 +03:00
liuwei
cc19780a55
address build warnings in llama.cpp
2024-05-21 17:51:00 +03:00
Wei Liu
56d9fa72de
add phi3 128k support in cuda
2024-05-21 17:50:58 +03:00
Wei Liu
8fa413d8b5
add phi3 128k support in convert-hf-to-gguf
2024-05-21 17:49:56 +03:00
Amir
11474e756d
examples: cache hf model when --model not provided ( #7353 )
...
* examples: cache hf model when --model not provided
2024-05-21 17:13:12 +03:00
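A hedged sketch of the fallback this commit describes: when --model is absent but a HF file is given, resolve to a per-user cache path. The exact directory layout used by the examples is an assumption here.

```cpp
#include <cstdlib>
#include <string>

// Derive a cache path for a downloaded HF model file (layout assumed).
std::string default_model_cache_path(const std::string & hf_file) {
    const char * home = std::getenv("HOME");
    return std::string(home ? home : ".") + "/.cache/llama.cpp/" + hf_file;
}
```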
Johannes Gäßler
d8ee902227
CUDA: deduplicate mmq code ( #7397 )
2024-05-21 16:02:12 +02:00
jaime-m-p
d7e852c1bc
Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) ( #7425 )
...
* Update brute force test: add_special
* Update brute force test: default values for add_bos_token and add_eos_token
* Enable rtrim when pre-inserting BOS
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Revert "server : fix test regexes"
2024-05-21 14:39:48 +02:00
jaime-m-p
917dc8cfa6
Tokenizer SPM fixes for phi-3 and llama-spm ( #7375 )
...
* Update brute force test: special tokens
* Fix added tokens
- Try to read 'added_tokens.json'.
- Try to read 'tokenizer_config.json'.
- Try to read 'tokenizer.json'.
* Fix special tokens rtrim
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : fix test regexes
2024-05-20 20:15:57 +02:00
Georgi Gerganov
fabf30b4c4
llama : remove Persimmon ( #7408 )
...
* llama : remove Persimmon
* requirements : remove
2024-05-21 02:35:28 +10:00
Johannes Gäßler
20385cebcc
perplexity: update README FP16 results [no ci] ( #7413 )
2024-05-20 18:15:38 +02:00
Radoslav Gerganov
db10f01310
rpc : track allocated buffers ( #7411 )
...
* rpc : track allocated buffers
ref: #7407
* rpc : pack rpc_tensor tightly
2024-05-20 16:36:55 +03:00
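A sketch of both ideas in this commit, with assumed names: the RPC server keeps a registry of every buffer it hands out so frees can be validated, and the wire struct is packed so no padding bytes leak into the protocol.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <unordered_set>

#pragma pack(push, 1)
struct rpc_tensor_wire {      // field set is illustrative, not the real layout
    uint64_t id;
    uint32_t type;
    uint64_t ne[4];           // element count per dimension
    uint64_t data;            // remote address
};
#pragma pack(pop)

struct rpc_server_state {
    std::unordered_set<void *> buffers; // every buffer allocated for clients

    void * alloc_buffer(std::size_t size) {
        void * buf = ::operator new(size);
        buffers.insert(buf);
        return buf;
    }

    bool free_buffer(void * buf) {
        if (buffers.erase(buf) == 0) {
            return false; // unknown pointer from the client: reject it
        }
        ::operator delete(buf);
        return true;
    }
};
```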
Georgi Gerganov
3bc10cb485
server : fix temperature + disable some tests ( #7409 )
...
* server : fix temperature
* server : disable tests relying on parallel determinism
* ci : change server Debug -> RelWithDebInfo
2024-05-20 22:10:03 +10:00
AidanBeltonS
6bf9b66fa3
[SYCL] Update SYCL upscale operation ( #7321 )
...
* Update SYCL upscale operation
* Formatting
* Remove messages
2024-05-20 16:38:23 +05:30
Bingan
26cd4237bc
Update README.md ( #7410 )
2024-05-20 11:55:34 +02:00
Herman Semenov
213e90ed73
ggml-opencl, llama: using reserve() if count already known ( #7272 )
2024-05-20 10:33:21 +03:00
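The pattern named in the commit title, as a minimal sketch: when the final element count is known up front, reserve() once so push_back never reallocates.

```cpp
#include <cstddef>
#include <vector>

std::vector<float> make_row(const float * src, std::size_t n) {
    std::vector<float> row;
    row.reserve(n);              // one allocation instead of repeated growth
    for (std::size_t i = 0; i < n; ++i) {
        row.push_back(src[i]);
    }
    return row;
}
```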
junchao-loongson
65c58207ec
ggml : add loongarch lsx and lasx support ( #6454 )
...
* add loongarch lsx and lasx optimize code
* Add loongarch compilation support to makefile
* revert stb_image.h
* opt bytes_from_nibbles_32 and sum_i16_pairs_float
* fix undeclared
* format code
* update
* update 2
---------
Co-authored-by: Jinyang He <hejinyang@loongson.cn>
2024-05-20 10:19:21 +03:00
Georgi Gerganov
1cc0155d04
server : tuning tests ( #7388 )
...
* server : don't pass temperature as string
* server : increase timeout
* tests : fix the fix 0.8f -> 0.8
ggml-ci
* tests : set explicit temperature
2024-05-20 10:16:41 +03:00
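The bug class behind "don't pass temperature as string": a request carrying "temperature": "0.8" instead of the number 0.8. Sketched with nlohmann::json, which the server examples use; the request shape here is illustrative.

```cpp
#include <nlohmann/json.hpp>

nlohmann::json make_completion_request() {
    nlohmann::json req;
    req["prompt"]      = "Hello";
    req["temperature"] = 0.8;   // a JSON number, not the string "0.8"
    return req;
}
```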
Georgi Gerganov
e932094d58
server : return error on too large embedding input ( #7389 )
2024-05-20 08:56:05 +03:00
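A sketch of the guard implied by the title; the limit and the message wording are assumptions, not the server's exact code.

```cpp
#include <cstddef>
#include <string>

// Reject embedding inputs that cannot fit in one batch instead of failing later.
bool check_embedding_input(std::size_t n_tokens, std::size_t n_batch,
                           std::string & error) {
    if (n_tokens > n_batch) {
        error = "input is too large to process: increase the batch size";
        return false;
    }
    return true;
}
```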
Georgi Gerganov
2789baf480
tests : fix --keep_split -> --keep-split ( #7374 )
2024-05-20 08:55:09 +03:00
Srihari-mcw
33c8d50acc
Add Windows support for BF16 code, including a CMake provision for enabling AVX512_BF16 ( #7258 )
2024-05-20 12:18:39 +10:00
slaren
d359f30921
llama : remove MPI backend ( #7395 )
2024-05-20 01:17:03 +02:00
Fred Douglas
1ea2a0036e
quantize : fix --keep-split check ( #7374 )
2024-05-19 19:37:04 +03:00
0cc4m
f030ec1f7a
Vulkan Embedding Fix ( #7360 )
...
* Fix empty Vulkan host buffers
Add fp32 fp16 matmul shader
Fix matmul shader alignment
* Remove deprecated tensor->backend uses
* Fix Vulkan validation errors on embedding models with no offloaded layers
* Fix Vulkan llava segfault when not offloading layers
2024-05-19 17:19:53 +02:00
slaren
e4e6f67be6
ggml : fix another case of quants nans ( #7387 )
2024-05-19 17:08:46 +02:00
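The usual source of NaNs in block quantization is dividing by a group maximum that is zero (or nearly so) when all weights in the group are ~0. A sketch of the guard pattern; the epsilon value is an assumption.

```cpp
#include <algorithm>
#include <cmath>

static const float GROUP_MAX_EPS = 1e-15f; // assumed threshold

// Scale for a signed 8-bit block; an all-zero block gets scale 0.
float block_scale_q8(const float * x, int n) {
    float amax = 0.0f;
    for (int i = 0; i < n; ++i) {
        amax = std::max(amax, std::fabs(x[i]));
    }
    return amax < GROUP_MAX_EPS ? 0.0f : amax / 127.0f;
}

// The inverse scale must be guarded too, or 1/0 reappears downstream.
float quantize_one(float v, float d) {
    const float id = d != 0.0f ? 1.0f/d : 0.0f;
    return v * id;
}
```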
Johannes Gäßler
5ca49cbecd
ggml: implement quantized KV cache for FA ( #7372 )
2024-05-19 16:46:13 +02:00
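A hedged sketch of driving this feature from the API side: the KV cache types and the flash-attention switch live in the context parameters. The field names below match llama.h of this period to the best of our knowledge; treat them as assumptions.

```cpp
#include "llama.h"

// Context params for a quantized KV cache on the flash-attention path.
llama_context_params quantized_kv_params() {
    llama_context_params p = llama_context_default_params();
    p.flash_attn = true;            // quantized V cache requires the FA path
    p.type_k     = GGML_TYPE_Q8_0;  // quantized K cache
    p.type_v     = GGML_TYPE_Q8_0;  // quantized V cache
    return p;
}
```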
Johannes Gäßler
1b01f06db0
server: add test for token probs ( #7347 )
2024-05-19 16:26:02 +02:00
Johannes Gäßler
41858392e1
server: fix seed being reported back ( #7382 )
2024-05-19 17:06:33 +03:00
Anas Ahouzi
6aade19ee7
Add StableLM2 pre-tokenizer ( #7349 )
...
* Add StableLM pre-tokenizer
* Fix space
* Fix trailing whitespace
2024-05-19 22:46:46 +10:00
slaren
ab33f7a338
cuda : clear error after buffer allocation failure ( #7376 )
2024-05-19 14:19:37 +02:00
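The pattern behind this fix: a failed cudaMalloc leaves a sticky error that later, unrelated calls would report. Clearing it lets the caller fall back (e.g. to host memory) cleanly. A minimal sketch, not the upstream code.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

void * try_device_alloc(size_t size) {
    void * ptr = nullptr;
    cudaError_t err = cudaMalloc(&ptr, size);
    if (err != cudaSuccess) {
        (void) cudaGetLastError(); // reset the sticky error after the failure
        fprintf(stderr, "cudaMalloc of %zu bytes failed: %s\n",
                size, cudaGetErrorString(err));
        return nullptr;
    }
    return ptr;
}
```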
Brian
e23b974f4c
labeler.yml: Use settings from ggerganov/llama.cpp [no ci] ( #7363 )
...
https://github.com/actions/labeler#using-configuration-path-input-together-with-the-actionscheckout-action
It recommends using the checkout action so the labeler runs with the correct
repo context when applying PR label settings, e.g.:

steps:
  - uses: actions/checkout@v4 # Uploads repository content to the runner
    with:
      repository: "owner/repositoryName" # One of the available inputs; visit https://github.com/actions/checkout#readme to find more
  - uses: actions/labeler@v5
    with:
      configuration-path: 'path/to/the/uploaded/configuration/file'
2024-05-19 20:51:03 +10:00
Georgi Gerganov
854d365aba
cmake : update android comments ( #7341 )
2024-05-19 11:01:01 +03:00
fraxy-v
f5bf761747
Capture CUDA logging output ( #7298 )
...
* logging: output capture in cuda module
* fix compile error
* fix: vsnprintf terminates with 0, string use not correct
* post review
* Update llama.cpp
Co-authored-by: slaren <slarengh@gmail.com>
* Update llama.cpp
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-05-19 00:44:42 +02:00
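A sketch of the capture pattern, including the pitfall the "vsnprintf terminates with 0" fix addresses: vsnprintf returns the length excluding the terminating '\0', so the buffer must be one byte larger. The plumbing names are illustrative, not the upstream callback API.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Format a log message into a std::string so it can be routed to a callback.
std::string vformat_log(const char * fmt, va_list args) {
    va_list args_copy;
    va_copy(args_copy, args);                          // args is consumed below
    const int len = vsnprintf(nullptr, 0, fmt, args);  // measure only
    std::vector<char> buf(len + 1);                    // +1 for the '\0'
    vsnprintf(buf.data(), buf.size(), fmt, args_copy);
    va_end(args_copy);
    return std::string(buf.data(), len);
}
```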