Commit graph

2517 commits

Author SHA1 Message Date
Concedo
c7c3f3d9ab updated lite 2023-11-02 22:46:54 +08:00
Concedo
b0c7b88eac try fix cloudflare tunnel (+2 squashed commits)
Squashed commit:

[87d96bf2] update remote option

[c30bc909] updated fixed colab (+1 squashed commit)

Squashed commits:

[97b77563] updated fixed colab (+2 squashed commits)

Squashed commit:

[d851b04c] replaced cloudflare manual dl with remotetunnel in colab

[90ff1790] updated lite
2023-11-02 22:27:35 +08:00
Concedo
6dbb8d82b0 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	models/ggml-vocab-llama.gguf
2023-11-02 20:51:45 +08:00
Concedo
42eabf2f2f rope fixes 2023-11-02 20:41:16 +08:00
slaren
21958bb393
cmake : disable LLAMA_NATIVE by default (#3906) 2023-11-02 14:10:33 +02:00
Concedo
bc4ff72317 not working merge 2023-11-02 17:52:40 +08:00
Georgi Gerganov
2756c4fbff
gguf : remove special-case code for GGUFv1 (#3901)
ggml-ci
2023-11-02 11:20:21 +02:00
Georgi Gerganov
1efae9b7dc
llm : prevent 1-D tensors from being GPU split (#3697) 2023-11-02 09:54:44 +02:00
Concedo
fca7a4c054 added noavx2 mode for clblast (+1 squashed commit)
Squashed commits:

[291ecae6] added noavx2 mode for clblast (+1 squashed commit)

Squashed commits:

[562bc872] wip adding noavx2 cl
2023-11-02 15:22:34 +08:00
cebtenzzre
b12fa0d1c1
build : link against build info instead of compiling against it (#3879)
* cmake : fix build when .git does not exist

* cmake : simplify BUILD_INFO target

* cmake : add missing dependencies on BUILD_INFO

* build : link against build info instead of compiling against it

* zig : make build info a .cpp source instead of a header

Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>

* cmake : revert change to CMP0115

---------

Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>
2023-11-02 08:50:16 +02:00
Georgi Gerganov
4d719a6d4e
cuda : check if this fixes Pascal card regression (#3882) 2023-11-02 08:35:10 +02:00
Georgi Gerganov
183b3fac6c
metal : fix build errors and kernel sig after #2268 (#3898) 2023-11-02 08:33:37 +02:00
Concedo
82267e5e69 switched back to clinfo since it's possibly more cross-platform and can get memory values easily 2023-11-02 14:12:05 +08:00
cebtenzzre
2fffa0d61f
cuda : fix RoPE after #2268 (#3897) 2023-11-02 07:49:44 +02:00
slaren
d480d2c204 ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891)
* ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel

* fix warnings

(cherry picked from commit d02e98cde0)
2023-11-02 11:19:53 +08:00
Concedo
1ab18ecb53 Merge commit 'c43c2da8af' into concedo_experimental
# Conflicts:
#	llama.cpp
2023-11-02 11:17:59 +08:00
cebtenzzre
0eb332a10f
llama : fix llama_context_default_params after #2268 (#3893) 2023-11-01 19:29:14 -04:00
slaren
d02e98cde0
ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891)
* ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel

* fix warnings
2023-11-01 23:10:09 +01:00
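As a hedged illustration of what "compute ptrs in a kernel" means here (a sketch under assumptions, not the actual ggml-cuda code; the kernel name, parameters, and strides below are illustrative CUDA C++): cublasGemmBatchedEx consumes arrays of per-batch device pointers, and computing those arrays on the GPU avoids filling them on the host and copying them over before every batched GEMM.

```cpp
#include <cuda_fp16.h>

// Build the per-batch pointer arrays for cublasGemmBatchedEx on the device.
__global__ void compute_batched_ptrs_sketch(
        const half * src0, const half * src1, half * dst,
        const void ** ptrs_src0, const void ** ptrs_src1, void ** ptrs_dst,
        size_t nb_src0, size_t nb_src1, size_t nb_dst, int n_batch) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_batch) {
        return;
    }
    // each batch entry starts nb_* bytes after the previous one
    ptrs_src0[i] = (const char *) src0 + i * nb_src0;
    ptrs_src1[i] = (const char *) src1 + i * nb_src1;
    ptrs_dst [i] = (char *)       dst  + i * nb_dst;
}
```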
cebtenzzre
898aeca90a
llama : implement YaRN RoPE scaling (#2268)
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: Jeffrey Quesnelle <jquesnelle@gmail.com>
2023-11-01 18:04:33 -04:00
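For readers following along, a rough sketch of the YaRN idea from PR #2268 (per the YaRN paper; the function names here are illustrative, not llama.cpp's): each RoPE dimension blends plain position interpolation with the original extrapolated frequency via a per-dimension ramp, and attention logits are slightly re-tempered as the context scale grows.

```cpp
#include <cmath>

// ramp = 0 -> pure position interpolation (theta / scale)
// ramp = 1 -> pure extrapolation (theta unchanged)
float yarn_blend_freq(float theta, float scale, float ramp) {
    return (theta / scale) * (1.0f - ramp) + theta * ramp;
}

// attention magnitude correction suggested by the YaRN paper
float yarn_attn_scale(float scale) {
    return 0.1f * logf(scale) + 1.0f;
}
```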
Georgi Gerganov
c43c2da8af
llm : fix llm_build_kqv taking unused tensor (benign, #3837) 2023-11-01 23:08:30 +02:00
Georgi Gerganov
523e49b111
llm : fix falcon norm after refactoring (#3837) 2023-11-01 23:00:50 +02:00
Georgi Gerganov
e16b9fa4ba
metal : multi-simd softmax (#3710)
ggml-ci
2023-11-01 21:25:00 +02:00
Georgi Gerganov
ff8f9a88da
common : minor (#3715) 2023-11-01 21:15:55 +02:00
Georgi Gerganov
50337961a6
llm : add llm_build_context (#3881)
* llm : add llm_build_context

* llm : deduce norm eps based on type + explicit max_alibi_bias, clamp_kqv

* llm : restore the non-graph llm_build_ functional API

ggml-ci

* llm : cleanup + comments
2023-11-01 20:11:02 +02:00
bandoti
0e40806c1c
common : allow caller to handle help/argument exceptions (#3715)
* Allow caller to handle help/argument exceptions

* Prepend newline to usage output

* Add new gpt_params_parse_ex function to hide arg-parse impl (see the sketch after this entry)

* Fix issue blocking success case

* exit instead of returning false

* Update common/common.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update common/common.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-01 19:42:01 +02:00
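A hedged sketch of the pattern this commit describes (signatures simplified; the real declarations live in common/common.h, and the flag table below is a stand-in): the _ex variant surfaces help and argument problems as exceptions so callers can handle them, while the plain wrapper keeps the old print-usage-and-exit behavior.

```cpp
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <string>

struct gpt_params { /* parsed options */ };

// stand-in for the real flag handling (assumption, not the actual impl)
static bool parse_known_flag(const std::string & arg, gpt_params & params) {
    (void) params;
    return arg == "--verbose";
}

// _ex variant: throw on --help or a bad argument instead of exiting
static void gpt_params_parse_ex(int argc, char ** argv, gpt_params & params) {
    for (int i = 1; i < argc; i++) {
        const std::string arg = argv[i];
        if (arg == "--help") {
            throw std::invalid_argument("usage: ...");
        }
        if (!parse_known_flag(arg, params)) {
            throw std::invalid_argument("error: unknown argument: " + arg);
        }
    }
}

// old-style entry point: catch, print, exit instead of returning false
static bool gpt_params_parse(int argc, char ** argv, gpt_params & params) {
    try {
        gpt_params_parse_ex(argc, argv, params);
    } catch (const std::invalid_argument & ex) {
        fprintf(stderr, "\n%s\n", ex.what()); // newline prepended to output
        exit(1);
    }
    return true;
}
```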
Concedo
21588cefd4 tunnel code done (+1 squashed commit)
Squashed commits:

[b4bc7d20] wip integration of trycloudflare
2023-11-01 23:28:23 +08:00
staviq
a2758d08e4
log : make generating separate log files optional (#3787)
* impl --log-new, --log-append

* Update common/log.h

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>

* Update common/log.h

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>

* Apply suggestions from code review

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>

---------

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
2023-11-01 16:18:27 +02:00
l3utterfly
e75dfdd31b
sampling : null grammar field after reset (#3885) 2023-11-01 15:40:43 +02:00
Concedo
3b227fc704 automatic gpu layer detection 2023-11-01 20:55:26 +08:00
Concedo
b395dbf6f5 wip layer calculator 2023-11-01 20:04:10 +08:00
Georgi Gerganov
9a3b4f6c86
ggml : fix UNUSED macro (#3762) 2023-11-01 13:50:45 +02:00
Andrew Godfrey
73bdcb395e
finetune : add -ngl parameter (#3762)
* Add '-ngl' support to finetune.cpp

* Add fprintf in ggml_cuda_op_add

When I tried CUDA offloading during finetuning following the readme, I got an assert here.
This probably isn't an important case, because inference later gives a warning saying you should use f16 or f32 instead when using LoRA.

* Add 'finetune.sh', which currently fails when using GPU

"error: operator (): Finetuning on tensors with type 'f16' is not yet supported"

* tweak finetune.sh

* Suppress some warnings in ggml.c

* Add f16 implementation to ggml_compute_forward_add_f16_f32 (see the sketch after this entry)

* Add an f16 case to ggml_add_cast_impl and llama_build_lora_finetune_graphs

* finetune.sh: Edit comments

* Add "add_f16_f32_f32_cuda"

* Tweak an error message

* finetune.sh: Add an optional LLAMA_MODEL_DIR variable

* finetune.sh: Add an optional LLAMA_TRAINING_DIR variable

* train : minor

* tabs to spaces

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
2023-11-01 13:49:04 +02:00
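A hedged sketch of the mixed-type add mentioned in the list above (roughly what ggml_compute_forward_add_f16_f32 boils down to per row; the row helper name is made up, though ggml_fp16_to_fp32/ggml_fp32_to_fp16 are real ggml API): each f16 element is widened to f32, the f32 addend is applied, and the sum is narrowed back to f16.

```cpp
#include "ggml.h"

// Per-row work of an f16 += f32 style add: widen, add, narrow.
static void add_f16_f32_row_sketch(ggml_fp16_t * dst, const ggml_fp16_t * a,
                                   const float * b, int n) {
    for (int i = 0; i < n; i++) {
        dst[i] = ggml_fp32_to_fp16(ggml_fp16_to_fp32(a[i]) + b[i]);
    }
}
```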
Concedo
ae2cd56de8 kobold integration of min_p sampler (+1 squashed commit)
Squashed commits:

[8ad2e349] kobold integration for min_p sampler
2023-11-01 19:08:45 +08:00
Concedo
bcb397953f Merge remote-tracking branch 'llama.cpp/try-fix-3869' into concedo_experimental 2023-11-01 18:29:08 +08:00
Concedo
92d80b94b3 bundle simpleclinfo into pyinstaller except for linux 2023-11-01 18:26:15 +08:00
Concedo
9342636408 Merge branch 'master' into concedo_experimental
# Conflicts:
#	flake.lock
#	flake.nix
2023-11-01 18:24:36 +08:00
Concedo
df7e757d40 windows: added simpleclinfo, which helps determine clblast platform and device on windows 2023-11-01 18:10:35 +08:00
Georgi Gerganov
f0e209324a
scripts : add server-llm.sh (#3868)
* scripts : add deploy-server.sh

* scripts : rename to server-llm.sh

* scripts : working curl pipe
2023-11-01 11:29:07 +02:00
Adrian Hesketh
ca190bca8e
server : re-enable completion and embedding at the same time (#3876) 2023-11-01 11:28:28 +02:00
Georgi Gerganov
71e3718abd
llama : refactor graph build code (#3837)
* llama : factor out ggml-alloc from graph build functions

ggml-ci

* metal : disable kernel load log

* llama : factor out tensor offloading outside the build call (wip)

ggml-ci

* llama : offload rest of the models

ggml-ci

* llama : update offload log messages to print node index

* llama : comments

* llama : support offloading result_norm + comments

* llama : factor graph input into a function

* llama : do tensor offload only with CUDA

* llama : fix res_norm offloading

* llama : try to optimize offloading code

* llama : fix non-CUDA build

* llama : try to fix build

* llama : move refact in correct place + optimize graph input

* llama : refactor tensor offloading as callback

* llama : add layer index to all tensor names

* llama : add functional header

* llama : comment

ggml-ci

* llama : remove obsolete map for layer counting

* llama : add llm_build helper functions (#3848) (a sketch follows this list)

* llama : add llm_build_norm helper function

ggml-ci

* llama : add llm_build_ffn helper function (#3849)

ggml-ci

* llama : add llm_build_k_shift helper

ggml-ci

* llama : fix offloading after recent changes

* llama : add llm_build_kv_store helper

ggml-ci

* llama : remove obsolete offload names

* llama : fix llm_build_k_shift to use n_head_kv instead of n_head

* llama : simplify falcon Q, K, V computation

* llama : remove obsolete comments in build graphs

* llama : add llm_build_kqv helper

ggml-ci

* llama : minor

* llama : add LLAMA_OFFLOAD_DEBUG + fix starcoder offloading

* llama : fix input allocation logic

* llama : update offload functions for KQ tensors

* llama : normalize tensor names

ggml-ci

* llama : enable warning about not offloaded tensors

* llama : remove extra ; + deduplicate gate_b logic

* llama : add llm_build_inp_embd helper
2023-11-01 08:04:02 +02:00
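To make the helper-function idea concrete, a hedged sketch in the spirit of llm_build_norm (heavily simplified: the real helper also handles non-RMS LayerNorm, an optional bias, and the offload callback mentioned above; the name below is illustrative): shared graph-building steps are factored into small functions so each architecture's build function stays readable.

```cpp
#include "ggml.h"

// Shared RMS-norm building block: normalize, then apply the learned scale.
static struct ggml_tensor * build_rms_norm_sketch(
        struct ggml_context * ctx,
        struct ggml_tensor  * cur,
        struct ggml_tensor  * norm_weight,
        float                 eps) {
    cur = ggml_rms_norm(ctx, cur, eps);
    cur = ggml_mul(ctx, cur, norm_weight);
    return cur;
}
```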
kalomaze
238657db23
samplers : Min-P sampler implementation [alternative to Top P/Top K] (#3841)
* Introduce the new Min-P sampler by @kalomaze
   The Min-P sampling method was designed as an alternative to Top-P, and aims to ensure a balance of quality and variety. The parameter *p* represents the minimum probability for a token to be considered, relative to the probability of the most likely token.

* Min-P enabled and set to 0.05 default

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
2023-10-31 20:44:49 +01:00
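The Min-P rule described above fits in a few lines; a minimal sketch in plain C++ (illustrative only, not the llama.cpp implementation): keep only tokens whose probability is at least p times the top token's probability, then renormalize the survivors.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

std::vector<float> min_p_filter(std::vector<float> probs, float p) {
    const float cutoff = p * *std::max_element(probs.begin(), probs.end());
    float kept = 0.0f;
    for (float & pr : probs) {
        if (pr < cutoff) pr = 0.0f; // below the relative threshold: drop
        kept += pr;
    }
    for (float & pr : probs) pr /= kept; // renormalize what remains
    return probs;
}

int main() {
    // with the default p = 0.05 the cutoff is 0.05 * 0.60 = 0.03,
    // so only the 0.01 token is dropped here
    for (float pr : min_p_filter({0.60f, 0.25f, 0.10f, 0.04f, 0.01f}, 0.05f)) {
        printf("%.3f ", pr);
    }
    printf("\n");
}
```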
Georgi Gerganov
22cc9bef09
cuda : check if this fixes Pascal card regression 2023-10-31 20:01:47 +02:00
Tungsten842
07178c98e1
flake.nix: fix for rocm 5.7 (#3853) 2023-10-31 19:24:03 +02:00
Concedo
43a5143450 added clinfo binary, cleaned up unused stuff 2023-10-31 22:25:25 +08:00
Concedo
f3690ba6d2 shifting enabled by default 2023-10-31 21:41:57 +08:00
Concedo
e62f38abd1 Merge branch 'master' into concedo_experimental
# Conflicts:
#	tests/test-double-float.cpp
#	tests/test-quantize-fns.cpp
2023-10-31 21:09:49 +08:00
Concedo
cc5b282350 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	build.zig
#	flake.lock
#	flake.nix
#	ggml.c
2023-10-31 20:44:04 +08:00
Georgi Gerganov
207b51900e
ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861)
* ggml : move FP16 <-> FP32 stuff to ggml-impl.h

ggml-ci

* tests : fix ARM build

* ggml : explicitly initialize deprecated type traits

* ggml : add math.h to ggml-impl.h

* ggml : remove duplicate static assert macros

* ggml : prefix lookup tables with ggml_

ggml-ci

* ggml-impl : move extern "C" to start of file
2023-10-30 19:19:15 +02:00
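Among the pieces moved here is ggml's FP16 <-> FP32 conversion machinery. A hedged sketch of the lookup-table technique behind the newly ggml_-prefixed tables (the table and function names below are illustrative): since a half-precision value has only 65536 possible bit patterns, a table filled once turns each FP16 -> FP32 conversion into a single indexed load.

```cpp
#include "ggml.h"

// Illustrative LUT: one float per possible 16-bit pattern, filled once.
static float f16_to_f32_table[1 << 16];

static void init_f16_table(void) {
    for (int i = 0; i < (1 << 16); i++) {
        f16_to_f32_table[i] = ggml_fp16_to_fp32((ggml_fp16_t) i);
    }
}

// After init, conversion is a single load instead of bit arithmetic.
static inline float f16_to_f32_fast(ggml_fp16_t h) {
    return f16_to_f32_table[h];
}
```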
Concedo
9eba77c6a0 finally got something workable 2023-10-30 23:30:21 +08:00
Concedo
61c395833d context shifting is still buggy 2023-10-30 16:25:01 +08:00