Commit graph

3424 commits

Author SHA1 Message Date
Georgi Gerganov
20fc3804bf
convert : fix gemma v1 tokenizer convert (#8248)
ggml-ci
2024-07-04 10:41:03 +03:00
AidanBeltonS
f619024764
[SYCL] Remove unneeded semicolons (#8280) 2024-07-04 09:07:19 +08:00
Daniele
d23287f122
Define and optimize RDNA1 (#8085) 2024-07-04 01:02:58 +02:00
slaren
5f2d4e60e2
ppl : fix n_seq_max for perplexity (#8277)
* ppl : fix n_seq_max for perplexity

* use 1 seq for kl_divergence
2024-07-03 20:33:31 +03:00
Mason M
613a3c6a53 Remove trailing whitespace 2024-07-03 12:11:38 -03:00
bandoti
4226103400
Update README.md 2024-07-03 12:02:21 -03:00
Xuan Son Nguyen
916248af1f
fix phi 3 conversion (#8262) 2024-07-03 16:01:54 +02:00
Judd
f8d6a23804
fix typo (#8267)
Co-authored-by: Judd <foldl@boxvest.com>
2024-07-03 14:40:16 +02:00
AidanBeltonS
fadde67135
Dequant improvements rebase (#8255)
* Single load for half2

* Store scales in local mem

* Vec load quantized values
2024-07-03 09:55:34 +08:00
Mason M
dafcaf1dd3 Remove trailing whitespace 2024-07-02 20:21:14 -03:00
bandoti
3a554ae63e
Update README.md 2024-07-02 18:38:22 -03:00
Mason M
019e4a3c7e Add cflags from pkg-config to fix w64devkit build 2024-07-02 18:10:00 -03:00
MistApproach
a27152b602
fix: add missing short command line argument -mli for multiline-input (#8261) 2024-07-02 22:56:46 +02:00
Mason M
a85b5d8ee0 Merge branch 'master' into vulkan-build-integration 2024-07-02 16:00:26 -03:00
Mason M
2f5a0e8e69 Remove trailing newline 2024-07-02 15:51:56 -03:00
Clint Herron
3e2618bc7b
Adding step to clean target to remove legacy binary names to reduce upgrade / migration confusion arising from #7809. (#8257) 2024-07-02 13:19:56 -04:00
Mason M
22323d50a3 Merge branch 'master' into vulkan-build-integration 2024-07-02 13:36:04 -03:00
Clint Herron
07a3fc0608
Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) 2024-07-02 12:18:10 -04:00
Faisal Zaghloul
968967376d
Add JAIS model(s) (#8118)
* Add `JAIS` model(s)

* cleanup

* address review comments

* remove hack

* un-hardcode max-alibi-bias

* minor tweaks

---------

Co-authored-by: fmz <quic_fzaghlou@quic.com>
2024-07-02 16:36:00 +02:00
Daniel Bevenius
023b8807e1
convert-hf : print output file name when completed (#8181)
* convert-hf : print output file name when completed

This commit adds the output file name to the log message when the
conversion is completed.

The motivation for this change is that when `--outfile` option is not
specified it migth not be obvious where the output file is written.

With this change the output of running the script will be something like
the following:
```console
INFO:hf-to-gguf:Model successfully exported to models/gemma-2-9b-it.gguf.
```

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! convert-hf : print output file name when completed

Updates the output of to support printing the directory if the output is
split into multiple files. Also the output file name is now retrieved
from the model_instance object.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! convert-hf : print output file name when completed

Use parent attribute of Path object and string interpolation.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! convert-hf : print output file name when completed

Use os.sep instead of hardcoding the path separator.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-07-02 09:40:49 +03:00
slaren
0e0590adab
cuda : update supports_op for matrix multiplication (#8245) 2024-07-02 09:39:38 +03:00
luoyu-intel
a9f3b10215
[SYCL] Fix win build conflict of math library (#8230)
* fix win build conflict of math library

* fix the condition: !(win32 & SYCL)

* revert warp_size=16
2024-07-02 12:50:07 +08:00
luoyu-intel
d08c20edde
[SYCL] Fix the sub group size of Intel (#8106)
* use warp_size macro for all sycl kernels

* fix mask of permute_sub_group_by_xor

* fix rms_norm with correct warp number

* fix rms_norm_f32/group_norm_f32

* move norm to norm.cpp file

* fix quantize bug

* fix mmvq's batch size
2024-07-02 10:16:00 +08:00
Xuan Son Nguyen
5fac350b9c
Fix gemma2 tokenizer convert (#8244)
* fix gemma2 tokenizer convert

* remove scores

* improve code, fix new line issue
2024-07-02 01:07:23 +02:00
Johannes Gäßler
cb5fad4c6c
CUDA: refactor and optimize IQ MMVQ (#8215)
* CUDA: refactor and optimize IQ MMVQ

* uint -> uint32_t

* __dp4a -> ggml_cuda_dp4a

* remove MIN_CC_DP4A checks

* change default

* try CI fix
2024-07-01 20:39:06 +02:00
Mateusz Charytoniuk
dae57a1ebc
readme: add Paddler to the list of projects (#8239) 2024-07-01 20:13:22 +03:00
Xuan Son Nguyen
49122a873f
gemma2: add sliding window mask (#8227)
* gemma2: add sliding window mask

* fix data_swa uninitialized

* better naming

* add co-author

Co-authored-by: Arlo Phoenix <arlo-phoenix@users.noreply.github.com>

* replace list with single tensor

* update

* llama : minor styling

* convert : add sanity check for query_pre_attn_scalar

* fix small typo in README

---------

Co-authored-by: Arlo Phoenix <arlo-phoenix@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-01 18:48:34 +02:00
Mason M
422bfb3e88 Merge branch 'master' into vulkan-build-integration 2024-07-01 11:52:49 -03:00
Mason M
9bca872be0 code review changes 2024-07-01 11:43:58 -03:00
Roni
0ddeff1023
readme : update tool list (#8209)
* Added gppm to Tool list in README

* Update README.md

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-01 15:48:16 +03:00
Michael Francis
3840b6f593
nix : enable curl (#8043)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-01 14:47:04 +03:00
Georgi Gerganov
257f8e41e2
nix : remove OpenCL remnants (#8235)
* nix : remove OpenCL remnants

* minor : remove parentheses
2024-07-01 14:46:18 +03:00
iacore
694c59cb42
Document BERT support. (#8205)
* Update README.md

document BERT support

* Update README.md
2024-07-01 13:40:58 +02:00
zhentaoyu
197fe6c1d7
[SYCL] Update SYCL-Rope op and Refactor (#8157)
* align with rope.cu and move sycl-op to a single file
2024-07-01 19:39:06 +08:00
Georgi Gerganov
d0a7145ba9
flake.lock: Update (#8218) 2024-06-30 16:09:34 -07:00
Xuan Son Nguyen
9ef0780062
Fix new line issue with chat template, disable template when in-prefix/suffix is set (#8203)
* preserve new line llama_chat_format_single

* disable chat template if in-prefix/suffix is set

* remove redundant change
2024-06-30 20:27:13 +02:00
Andrei
1c5eba6f8e
llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197)
* Add attention and final logit softcapping.

* fix

* Add custom add_ functions

* Disable flash attention for Gemma2

* Update src/llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Add default value for attention and final logit softcap value

* Add custom kq scaling from Gemma2Attention

* Remove custom pre attention scaling and use computed value instead.

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-29 23:44:08 -04:00
bandoti
4eab311ed0
Merge branch 'ggerganov:master' into vulkan-build-integration 2024-06-30 00:14:14 -03:00
Mason M
ac9a065d31 Remove Python dependency from Vulkan build 2024-06-30 00:02:41 -03:00
Xuan Son Nguyen
72272b83a3
fix code typo in llama-cli (#8198) 2024-06-29 00:14:20 +02:00
Olivier Chafik
8748d8ac6f
json: attempt to skip slow tests when running under emulator (#8189) 2024-06-28 18:02:05 +01:00
Xuan Son Nguyen
26a39bbd6b
Add MiniCPM, Deepseek V2 chat template + clean up llama_chat_apply_template_internal (#8172)
* tmp_contains

* minicpm chat template

* add DeepSeek Lite template

* change deepseek-lite to deepseek2

* correct code comment

* correct code from master branch
2024-06-28 15:11:44 +02:00
Sigbjørn Skjæret
38373cfbab
Add SPM infill support (#8016)
* add --spm-infill option

* support --spm-infill

* support --spm-infill
2024-06-28 12:53:43 +02:00
slaren
b851b3fba0
cmake : allow user to override default options (#8178) 2024-06-28 12:37:45 +02:00
Olivier Chafik
139cc621e9
json: restore default additionalProperties to false, fix some pattern escapes (#8180)
* json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset

* json: revert default of additionalProperties to false

* Update README.md
2024-06-28 09:26:45 +01:00
pculliton
e57dc62057
llama: Add support for Gemma2ForCausalLM (#8156)
* Inference support for Gemma 2 model family

* Update convert-hf-to-gguf.py, constants, and tensor mappings

* cleanup

* format fix

* Fix special token vocab bug

* Don't add space prefix

* fix deleted lines

* Update src/llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Add model type names

* Add control vector

* Fix model type identification

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-27 21:00:43 -07:00
Xuan Son Nguyen
a27aa50ab7
Add missing items in makefile (#8177) 2024-06-28 02:19:11 +02:00
Olivier Chafik
cb0b06a8a6
json: update grammars/README w/ examples & note about additionalProperties (#8132)
* json: update grammars/README

* mention broken prefixItems

* add mention to llama-gbnf-validator

* json: explicit type: object for nested items object in cli example
2024-06-27 22:08:42 +01:00
loonerin
558f44bf83
CI: fix release build (Ubuntu+Mac) (#8170)
* CI: fix release build (Ubuntu)

PR #8006 changes defaults to build shared libs. However, CI for releases
expects static builds.

* CI: fix release build (Mac)

---------

Co-authored-by: loonerin <loonerin@users.noreply.github.com>
2024-06-27 21:01:23 +02:00
slaren
8172ee9da9
cmake : fix deprecated option names not working (#8171)
* cmake : fix deprecated option names not working

* remove LlAMA_OPENMP
2024-06-27 20:04:39 +02:00