Cebtenzzre
ad3c2f3b23
comment out n_past instead of marking it unused
2023-10-09 10:16:24 -04:00
Cebtenzzre
292363e556
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into pull-3417
2023-10-09 09:26:36 -04:00
Jan Ploski
7d6a24aad4
mpt : updated convert-mpt-hf-to-gguf.py to reflect changes made to convert-gptneox-hf-to-gguf.py in pr:3252
2023-10-09 09:19:10 -04:00
slaren
95bd60a0a6
ggml-alloc : fix assert in debug builds ( #3555 )
2023-10-09 15:44:58 +03:00
Georgi Gerganov
fcca0a7004
refact : fix convert script + zero out KV cache to avoid nans ( #3523 )
...
* refact : fix convert script + zero out KV cache to avoid nans
* ggml : silu(-inf) should never happen
* metal : assert various kernel requirements
2023-10-09 14:32:17 +03:00
Georgi Gerganov
dcc09d2596
metal : do not use mul_mm kernels when ne00 < 64 ( #3542 )
2023-10-09 14:28:27 +03:00
Georgi Gerganov
db3abcc114
sync : ggml (ggml-backend) ( #3548 )
...
* sync : ggml (ggml-backend)
ggml-ci
* zig : add ggml-backend to the build
2023-10-08 20:19:14 +03:00
Matheus C. França
eee42c670e
ci : add Zig CI/CD and fix build ( #2996 )
...
* zig CI/CD and fix build
Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com>
* fix build_compiler
* ci : remove trailing whitespace
---------
Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-08 16:59:20 +03:00
Ryder Wishart
8e6716a102
api_like_OAI.py : compat with Microsoft Guidance ( #2746 )
...
Check for None in addition to empty string check in all request params
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-08 13:55:58 +03:00
arcrank
9c38d181d4
api_like_OAI.py : simplify function ( #2796 )
...
Simplify function
2023-10-08 13:52:57 +03:00
Johannes Rudolph
a1202a31ed
k-quants : fix comments about block sizing ( #3499 )
2023-10-08 13:21:19 +03:00
Georgi Gerganov
94e502dfb7
ci : enable on obj-c changes + fix metal build ( #3540 )
2023-10-08 11:24:50 +03:00
Luo Tian
7d8b24932f
zig : fix build by introducing train.cpp ( #3539 )
2023-10-08 11:24:01 +03:00
Georgi Gerganov
b0ec5218c3
metal : support MTLGPUFamily < Apple7, formatting, style ( #3524 )
...
* metal : improve decoding speed for batches of 2-16
* metal : rename kernels mul_mat_ to mul_mv_
* metal : indentations
* minor
* metal : print more GPU info + disable mul_mm for MTLGPUFamily < Apple7
2023-10-08 10:01:53 +03:00
Kerfuffle
63d3b06a43
llama : fix missing break in Persimmon arch case statements ( #3535 )
2023-10-08 08:22:17 +03:00
Kerfuffle
a16e89cec8
Fix trying to strip newline from empty prompt and cfg prompt file content ( #3534 )
2023-10-07 15:31:41 -06:00
M. Yusuf Sarıgöz
4d03833211
gguf.py : fix CI for publishing GGUF package ( #3532 )
...
* Fix CI for publishing GGUF package
* Bump version
* fix
* bump version
* bump version
* bump version
2023-10-07 22:14:10 +03:00
Tom C
c47066d833
py : change version of numpy requirement to 1.24.4 ( #3515 )
...
Co-authored-by: Lyjia <me@lyjia.us>
2023-10-07 12:56:15 +03:00
cebtenzzre
f1782c68de
quantize : fail fast on write errors ( #3521 )
2023-10-07 11:41:52 +03:00
Jhen-Jie Hong
c26765a0a1
metal : support default.metallib load & reuse code for swift package ( #3522 )
...
* metal : support load default.metallib & reuse code for swift package
* metal : use SWIFT_PACKAGE def instead of define GGML_SWIFT
2023-10-07 11:40:27 +03:00
Phillip Kravtsov
0e797c2fc5
llm : support Adept Persimmon 8B ( #3410 )
...
* Produces garbage output
* wip: correct tensors up to RoPE
* correct tensors thru RoPE
* Correct outputs through masked & softmax'd KQ
* fp32 works
* Rename adept->persimmon
* Produces correct outputs
* clean up convert scripts
* remove printing logic from ggml.c
* remove prints from llama.cpp & fix merge
* trivial cleanups
* Add offload funcs
* update conversion script to directly take adept artifacts rather than .safetensors file
* Fix norm eps bug
* Support sqr and concat on metal, persimmon-8b-q4 runs correctly
* Small changes from review
* Formatting changes
* Minor changes to conversion script
* Remove old script
* Fix editorconfig formatting
* Fix build
* add overlooked offload code ggml-ci
2023-10-07 10:12:43 +03:00
goerch
3a716b4dae
Fix for #3454 ( #3455 )
...
Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion
2023-10-07 06:57:01 +02:00
BarfingLemurs
1faaae8c2b
readme : update models, cuda + ppl instructions ( #3510 )
2023-10-06 22:13:36 +03:00
Mihai
cb13d73a72
server : docs fix default values and add n_probs ( #3506 )
2023-10-06 21:39:33 +03:00
Kerfuffle
9ca79d5cbb
kv cache slot search improvements ( #3493 )
...
* kv cache slot search improvements
* Use n_ctx in kv find slot for consistency
* Ensure kv cache head points to a valid slot in llama_decode internal
* Add some comments to prevent dumb people (like me) from getting confused.
2023-10-06 10:10:13 -06:00
Georgi Gerganov
0c731ca403
prompts : fix editorconfig checks after #3416
2023-10-06 16:36:32 +03:00
pudepiedj
a8777ad84e
parallel : add option to load external prompt file ( #3416 )
...
* Enable external file and add datestamp
* Add name of external file at end
* Upload ToK2024
* Delete ToK2024.txt
* Experiments with jeopardy
* Move ParallelQuestions to /prompts and rename
* Interim commit
* Interim commit
* Final revision
* Remove trailing whitespace
* remove cmake_all.sh
* Remove cmake_all.sh
* Changed .gitignore
* Improved reporting and new question files.
* Corrected typo
* More LLM questions
* Update LLM-questions.txt
* Yet more LLM-questions
* Remove jeopardy results file
* Reinstate original jeopardy.sh
* Update examples/parallel/parallel.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-06 16:16:38 +03:00
Jhen-Jie Hong
97af49fa39
server : reuse llama_sample_token common util ( #3494 )
...
* server : reuse llama_sample_token common function
* common : use n_probs for temperature sampling
2023-10-06 15:44:24 +03:00
l3utterfly
16820a5a0d
llama : correct hparams comparison ( #3446 )
...
* fixed floating point comparison issues
* updated implementation for hparam comparison to handle inf and NaN
* fixed code review comments
* minor simplification
* rename is_float_eq -> is_float_close
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-10-06 13:47:59 +03:00
Jhen-Jie Hong
04b2f4386e
ci : fix xcodebuild destinations ( #3491 )
...
* ci : fix xcodebuild destinations
* ci : add .swift to paths
2023-10-06 13:36:43 +03:00
cebtenzzre
48edda30ee
convert : update Falcon script for new HF config ( #3448 )
...
Also adds Falcon-180B support.
Closes #3049
Co-authored-by: jb <jonathan.t.barnard@gmail.com>
2023-10-05 15:00:34 -04:00
Kenvix ⭐
45eba9369f
build : use std::make_tuple() for compatibility with older GCC versions ( #3488 )
2023-10-05 20:16:39 +03:00
staviq
acec9eaaa9
common : process escape sequences in reverse prompts ( #3461 )
2023-10-05 19:17:29 +03:00
shibe2
e2583cbc29
CLBlast: Fix handling of on-device tensor data
...
Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors.
Use correct offsets into data that is already in VRAM.
Correct handling of OpenCL events when multiple commands are queued.
2023-10-05 18:25:23 +04:00
Jhen-Jie Hong
e8b8d32e86
server : fix incorrect num_tokens_predicted ( #3480 )
2023-10-05 17:02:55 +03:00
Jhen-Jie Hong
8f3a642ec1
swift : disable ACCELERATE_NEW_LAPACK ( #3481 )
2023-10-05 17:00:07 +03:00
Jhen-Jie Hong
0745384449
ci : add swift build via xcodebuild ( #3482 )
2023-10-05 16:56:21 +03:00
Kerfuffle
019ba1dcd0
convert : fix Baichuan2 models by using vocab size in config.json ( #3299 )
...
Use local GGUF package when possible in Baichuan converter
2023-10-04 17:20:28 +03:00
Georgi Gerganov
beabc8cfb0
readme : add project status link
2023-10-04 16:50:44 +03:00
Georgi Gerganov
0d152b37fe
ggml : fix build after #3329
2023-10-04 16:25:41 +03:00
ds5t5
f8c90cdbaa
llm : add Refact model ( #3329 )
...
* add refact model
* resolve comments
* rebase to the latest
* solve alibi cpu error
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-04 16:23:39 +03:00
Georgi Gerganov
f93af02488
sync : ggml (conv 1d + 2d updates, UB fixes) ( #3468 )
...
* sync : ggml (conv 1d + 2d updates)
ggml-ci
* ggml : fix UB in q5_0 and q5_1 quantize code
ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
ggml-ci
* tests : fix UB in test-quantize-perf
2023-10-04 15:29:58 +03:00
Merrick Christensen
f72f8f22c9
finetune : readme fix typo ( #3465 )
...
Fix small typo
2023-10-04 09:33:13 +03:00
Jan Ploski
1364bcd712
mpt : removed ne01 + n_past == ne00 assertion from alibi (cuda/f32) and rope_shift from build_mpt
2023-10-03 21:53:31 +02:00
Tameem
79f34abddb
ggml : add RISC-V Vector support for k-quants and improve the existing intrinsics ( #3453 )
...
* Added RVV intrinsics support for Q8 quantize row and also improved the existing dot product functions for RISC-V.
The RVV intrinsics are added for the following quantize row functions
quantize_row_q8_0
quantize_row_q8_1
The following dot product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1
ggml_vec_dot_q4_0_q8_0
ggml_vec_dot_q4_1_q8_1
ggml_vec_dot_q5_0_q8_0
ggml_vec_dot_q5_1_q8_1
And vector initialization in Q5 via a temporary array is also replaced by the vid intrinsic
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
* Added RVV intrinsics support for k_quants
This adds RISC-V Vector intrinsics support for the following K_quants functions for both QKK = 256 and QKK = 64
ggml_vec_dot_q2_K_q8_K
ggml_vec_dot_q3_K_q8_K
ggml_vec_dot_q4_K_q8_K
ggml_vec_dot_q5_K_q8_K
ggml_vec_dot_q6_K_q8_K
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
---------
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
2023-10-03 21:38:19 +03:00
h-h-h-h
8186242b6d
main : consistent prefix/suffix coloring ( #3425 )
...
* Typo
* No `--in-prefix` coloring
The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.
2023-10-03 21:16:15 +03:00
Georgi Gerganov
ac2219fef3
llama : fix session saving/loading ( #3400 )
...
* llama : fix session saving/loading
* llama : temp fix for clearing "future" tokens from the KV cache
* llama : fix handling of "future" tokens when loading sessions
* llama : fix comments for llama_kv_cache API
2023-10-03 21:04:01 +03:00
Alex Klinkhamer
48be797ffb
llama : expose model's rope_freq_scale in the API ( #3418 )
...
so it can be scaled further before creating a context.
2023-10-03 20:09:28 +03:00
Jiahao Li
f56e1baec3
metal : alibi for arbitrary number of heads ( #3426 )
2023-10-03 19:55:21 +03:00
Eve
017efe899d
cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor ( #3273 )
...
* fix LLAMA_NATIVE
* syntax
* alternate implementation
* my eyes must be getting bad...
* set cmake LLAMA_NATIVE=ON by default
* march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc
* revert 8283237
and only allow LLAMA_NATIVE on x86 like the Makefile
* remove -DLLAMA_MPI=ON
---------
Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>
2023-10-03 19:53:15 +03:00