slaren
|
d41cef9326
|
minor
|
2024-01-08 13:42:20 +01:00 |
|
slaren
|
444b975edd
|
llama : rewrite session kv load/set without graphs
|
2024-01-08 12:56:31 +01:00 |
|
Georgi Gerganov
|
52531fdff8
|
main : add self-extend support (#4815)
* examples : add passkey test
* passkey : better prints
* passkey : select pass key pos from CLI
* passkey : simplify n_past logic
* llama : "self-extend"-like context extension
* passkey : add comment
* main : add Self-Extend support
* llama : add comment about llama_kv_cache_seq_div
|
2024-01-08 11:18:32 +02:00 |
|
Georgi Gerganov
|
b0034d93ce
|
examples : add passkey test (#3856)
* examples : add passkey test
* passkey : better prints
* passkey : select pass key pos from CLI
* passkey : simplify n_past logic
* make : add passkey target
* passkey : add "self-extend"-like context extension (#4810)
* llama : "self-extend"-like context extension
* passkey : add comment
* passkey : add readme
|
2024-01-08 11:14:04 +02:00 |
|
slaren
|
ac145fd2e3
|
ggml : fix mul_mat_id work size
|
2024-01-08 03:51:15 +01:00 |
|
slaren
|
5e879c9977
|
llama : add cparam (split_mode) and command line argument (--split-mode, -sm) to configure the split mode (none, layer or row)
|
2024-01-07 23:26:49 +01:00 |
|
Lars Grammel
|
b7e7982953
|
readme : add lgrammel/modelfusion JS/TS client for llama.cpp (#4814)
|
2024-01-07 22:24:11 +02:00 |
|
slaren
|
87c8207a04
|
Merge remote-tracking branch 'origin/master' into sl/backend-sched
|
2024-01-07 17:59:26 +01:00 |
|
slaren
|
226460cc0d
|
llama-bench : add no-kv-offload parameter (#4812)
|
2024-01-07 17:59:01 +01:00 |
|
Johannes Gäßler
|
d5a410e855
|
CUDA: fixed redundant value dequantization (#4809)
|
2024-01-07 17:24:08 +01:00 |
|
slaren
|
7c16cf106d
|
test-backend-ops : check buffer allocation failures
|
2024-01-07 13:50:02 +01:00 |
|
Georgi Gerganov
|
9dede37d81
|
llama : remove unused vars (#4796)
|
2024-01-07 14:29:36 +02:00 |
|
Georgi Gerganov
|
f77c72f371
|
ggml : fix null backend dereference (#4807)
* ggml : fix null backend dereference
* ggml : also check ggml_backend_is_cpu
|
2024-01-07 12:06:57 +01:00 |
|
Georgi Gerganov
|
3c36213df8
|
llama : remove redundant GQA check (#4796)
|
2024-01-07 11:21:53 +02:00 |
|
Alex Azarov
|
72d8407b36
|
llama.swiftui : use llama.cpp as SPM package (#4804)
|
2024-01-07 10:20:50 +02:00 |
|
Georgi Gerganov
|
d117d4dc5d
|
llama : print tensor meta for debugging
|
2024-01-07 09:51:12 +02:00 |
|
Alex Azarov
|
3418c03ecc
|
llama.swiftui : add visionOS target (#4805)
|
2024-01-07 09:46:55 +02:00 |
|
Konstantin Zhuravlyov
|
63ee677efd
|
ggml : use __builtin_amdgcn_sudot4 in __dp4a for gfx11 (#4787)
|
2024-01-07 08:52:42 +02:00 |
|
Georgi Gerganov
|
67984921a7
|
server : fix n_predict check (#4798)
|
2024-01-07 08:45:26 +02:00 |
|
slaren
|
72b74f364b
|
cuda : do not create buffer types for devices that don't exist (fixes usage without CUDA devices available)
|
2024-01-07 00:33:51 +01:00 |
|
slaren
|
2f2c36799d
|
cuda : add ggml-backend split buffer support
|
2024-01-07 00:09:26 +01:00 |
|
Daniel Illescas Romero
|
c75ca5d96f
|
llama.swiftui : use correct pointer for llama_token_eos (#4797)
|
2024-01-06 17:12:59 +02:00 |
|
Georgi Gerganov
|
96e80dabc6
|
examples : improve base-translate.sh script (#4783)
|
2024-01-06 11:40:24 +02:00 |
|
slaren
|
ece0b0d855
|
improve graph splitting, partial fix for --no-kv-offload
|
2024-01-06 05:17:15 +01:00 |
|
slaren
|
d107459321
|
ggml-backend : increase GGML_MAX_BACKENDS
|
2024-01-06 01:02:24 +01:00 |
|
slaren
|
863ef45539
|
llama : check for null tensor_split
|
2024-01-06 01:02:24 +01:00 |
|
Georgi Gerganov
|
1fa7ee2e51
|
batched-bench : add tensor_split param
|
2024-01-06 01:02:24 +01:00 |
|
slaren
|
a1ab35c682
|
fix unmap after loading
|
2024-01-06 01:02:24 +01:00 |
|
slaren
|
6483328fa9
|
ggml-backend : add names to buffers
|
2024-01-06 01:02:24 +01:00 |
|
slaren
|
33f0761e9b
|
llama : ggml-backend integration
|
2024-01-06 01:02:24 +01:00 |
|
a-n-n-a-l-e-e
|
eec22a1c63
|
cmake : check for openblas64 (#4134)
openblas v0.3.22 64-bit pkg-config file is named openblas64.pc
https://github.com/OpenMathLib/OpenBLAS/issues/3790
|
2024-01-05 18:04:40 +02:00 |
|
Ikko Eltociear Ashimine
|
be36bb946a
|
flake.nix : fix typo (#4700)
betwen -> between
|
2024-01-05 18:02:44 +02:00 |
|
Georgi Gerganov
|
91d38876df
|
metal : switch back to default.metallib (ggml/681)
ggml-ci
|
2024-01-05 18:02:06 +02:00 |
|
Georgi Gerganov
|
d061bf9405
|
ggml : fix q2_k bpw in comments (ggml/680)
|
2024-01-05 18:02:06 +02:00 |
|
Finn Voorhees
|
1bf681f90e
|
ggml : add error handling to graph_compute (whisper/1714)
|
2024-01-05 18:02:06 +02:00 |
|
Georgi Gerganov
|
c1d7cb28d3
|
ggml : do not sched_yield when calling BLAS (#4761)
* ggml : do not sched_yield when calling BLAS
ggml-ci
* ggml : fix do_yield logic
ggml-ci
* ggml : simplify do_yield logic
ggml-ci
|
2024-01-05 15:18:21 +02:00 |
|
Georgi Gerganov
|
3681f22443
|
examples : add few-shot translation example (#4783)
|
2024-01-05 15:11:10 +02:00 |
|
Daniel Bevenius
|
b3a7c20b5c
|
finetune : remove unused includes (#4756)
This commit removes unused includes from finetune.cpp.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
|
2024-01-04 21:45:37 +02:00 |
|
Georgi Gerganov
|
012cf349ae
|
server : send token probs for "stream == false" (#4714)
|
2024-01-04 19:56:33 +02:00 |
|
Johannes Gäßler
|
a91928014f
|
Print backend name on test-backend-ops failure (#4751)
|
2024-01-04 09:43:23 +01:00 |
|
singularity
|
3c0b585561
|
llama.swiftui : support loading custom model from file picker (#4767)
* swiftui: support load model from file picker
* swiftui: remove trailing whitespace
|
2024-01-04 10:22:38 +02:00 |
|
Michael Coppola
|
e5804313a1
|
server : fix options in README.md (#4765)
* fix examples/server/README.md
* minor : fix whitespace
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
2024-01-04 10:17:09 +02:00 |
|
Georgi Gerganov
|
dc891b7f7a
|
ggml : include stdlib.h before intrin.h (#4736)
|
2024-01-04 10:12:26 +02:00 |
|
singularity
|
46cea79e1f
|
llama.swiftui : fix build of ggml.metallib (#4754)
* metal: fix metal backend init failure in swiftui
* metal: build ggml.metallib instead of copy src
* llama.swift : remove debug flags from metallib build
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
2024-01-04 09:58:16 +02:00 |
|
Daniel Bevenius
|
cb1e2818e0
|
train : fix typo in overlapping-samples help msg (#4758)
This commit fixes a typo in the help message for the
--overlapping-samples option.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
|
2024-01-03 19:53:40 +02:00 |
|
Ashraful Islam
|
ece9a45e8f
|
swift : update Package.swift to use ggml as dependency (#4691)
* updates the package.swift to use ggml as dependency
* changes the ggml package url src to ggerganov
|
2024-01-03 19:30:02 +02:00 |
|
Georgi Gerganov
|
7bed7eba35
|
cuda : simplify expression
Co-authored-by: slaren <slarengh@gmail.com>
|
2024-01-03 14:38:38 +02:00 |
|
Georgi Gerganov
|
d55356d3ba
|
cuda : mark I16 and I32 ops as unsupported
ggml-ci
|
2024-01-03 14:38:38 +02:00 |
|
Georgi Gerganov
|
75e3fd8581
|
sync : ggml
ggml-ci
|
2024-01-03 14:38:38 +02:00 |
|
Georgi Gerganov
|
289313716f
|
metal : add kernel_get_rows_i32
ggml-ci
|
2024-01-03 14:38:38 +02:00 |
|