Concedo
5cc64ebb52
dynatemp wizard
2024-01-09 15:51:32 +08:00
Concedo
550829ed98
dont get stuck if cloudflared failed to download correctly
2024-01-08 21:11:17 +08:00
Concedo
f04b6e7287
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# CMakeLists.txt
# README.md
# ggml-metal.m
# ggml.c
2024-01-08 14:18:49 +08:00
Lars Grammel
b7e7982953
readme : add lgrammel/modelfusion JS/TS client for llama.cpp ( #4814 )
2024-01-07 22:24:11 +02:00
slaren
226460cc0d
llama-bench : add no-kv-offload parameter ( #4812 )
2024-01-07 17:59:01 +01:00
Johannes Gäßler
d5a410e855
CUDA: fixed redundant value dequantization ( #4809 )
2024-01-07 17:24:08 +01:00
Georgi Gerganov
9dede37d81
llama : remove unused vars ( #4796 )
2024-01-07 14:29:36 +02:00
Georgi Gerganov
3c36213df8
llama : remove redundant GQA check ( #4796 )
2024-01-07 11:21:53 +02:00
Alex Azarov
72d8407b36
llama.swiftui : use llama.cpp as SPM package ( #4804 )
2024-01-07 10:20:50 +02:00
Georgi Gerganov
d117d4dc5d
llama : print tensor meta for debugging
2024-01-07 09:51:12 +02:00
Alex Azarov
3418c03ecc
llama.swiftui : add visionOS target ( #4805 )
2024-01-07 09:46:55 +02:00
Konstantin Zhuravlyov
63ee677efd
ggml : use __builtin_amdgcn_sudot4 in __dp4a for gfx11 ( #4787 )
2024-01-07 08:52:42 +02:00
Georgi Gerganov
67984921a7
server : fix n_predict check ( #4798 )
2024-01-07 08:45:26 +02:00
Daniel Illescas Romero
c75ca5d96f
llama.swiftui : use correct pointer for llama_token_eos ( #4797 )
2024-01-06 17:12:59 +02:00
Georgi Gerganov
96e80dabc6
examples : improve base-translate.sh script ( #4783 )
2024-01-06 11:40:24 +02:00
Concedo
b614a86dd9
disable print statements for dynatemp
2024-01-06 11:14:58 +08:00
kalomaze
123bff9a0f
Full DynaTemp implementation + UI ( #600 )
...
* move Dynatemp changes to new branch
* fix float header
* Properly reintroduce variable expert count
Controllable through experts.txt
* first pass at DynaTemp UI
Checkbox partial implemented, Min and Max Temp implemented
* DynaTemp UI Checkbox
Trigger DynaTemp on checkbox
* DynaTemp UI checkbox edition
Hell Yeah! DynaTemp!
* Remove greedy dynatemp
* Fix race condition caused by debug print
* Fixed broken presets and miro
Fixes broken presets and mirostat
* Remove debug function + HHI temp
Also removed unnecessary softmax double precision
* Fix whitespace (?) for generate function
* epic upstream renaming scheme fix
* fix stupid indents
* Other cleanup
Reintroduce unused rep pen function, move temp functions first before entropy dynamic temp
* Slight indent fix
* revert batch pyinstaller maker to mainline
and also delete experts.txt since adjustable routing is also being removed for the PR
* compact dynatemp into a single value, dynatemp_range. This is a float giving the allowed deviation above and below the base temperature when using dynatemp. For example, to get dynatemp_min=0.3 and dynatemp_max=0.5, set temperature=0.4 and dynatemp_range=0.1. Functionally dynatemp operates the same, but usage is simpler because there is a single easy-to-adjust value.
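A minimal sketch of the arithmetic described in the dynatemp_range note above, not the code from this commit. The function name dynatemp_bounds and the clamping of the lower bound at 0 are assumptions made for illustration only.

```python
def dynatemp_bounds(temperature: float, dynatemp_range: float) -> tuple[float, float]:
    """Return the (min, max) temperature implied by a base temperature and a range.

    Hypothetical helper: the name and the clamp at 0.0 are assumptions,
    not taken from the commit itself.
    """
    if dynatemp_range < 0:
        raise ValueError("dynatemp_range must be non-negative")
    # The range is a symmetric deviation around the base temperature.
    min_temp = max(0.0, temperature - dynatemp_range)  # assumed: never below zero
    max_temp = temperature + dynatemp_range
    return min_temp, max_temp


if __name__ == "__main__":
    # The example from the commit message: temperature=0.4, dynatemp_range=0.1
    low, high = dynatemp_bounds(0.4, 0.1)
    print(f"min={low:.2f} max={high:.2f}")
```

With temperature=0.4 and dynatemp_range=0.1 this gives bounds of about 0.3 and 0.5, matching the example in the commit message. A range of 0 collapses both bounds to the base temperature.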
---------
Co-authored-by: Alexander Abushady <aabushady214@gmail.com>
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-01-06 11:13:16 +08:00
a-n-n-a-l-e-e
eec22a1c63
cmake : check for openblas64 ( #4134 )
...
openblas v0.3.22 64-bit pkg-config file is named openblas64.pc
https://github.com/OpenMathLib/OpenBLAS/issues/3790
2024-01-05 18:04:40 +02:00
Ikko Eltociear Ashimine
be36bb946a
flake.nix : fix typo ( #4700 )
...
betwen -> between
2024-01-05 18:02:44 +02:00
Georgi Gerganov
91d38876df
metal : switch back to default.metallib (ggml/681)
...
ggml-ci
2024-01-05 18:02:06 +02:00
Georgi Gerganov
d061bf9405
ggml : fix q2_k bpw in comments (ggml/680)
2024-01-05 18:02:06 +02:00
Finn Voorhees
1bf681f90e
ggml : add error handling to graph_compute (whisper/1714)
2024-01-05 18:02:06 +02:00
Concedo
427ba21e62
add stub values for usage, revert cuda malloc pool implementation (+1 squashed commits)
...
Squashed commits:
[fd4cfb44] add stub values for usage, revert cuda malloc pool implementation
2024-01-05 21:58:16 +08:00
Georgi Gerganov
c1d7cb28d3
ggml : do not sched_yield when calling BLAS ( #4761 )
...
* ggml : do not sched_yield when calling BLAS
ggml-ci
* ggml : fix do_yield logic
ggml-ci
* ggml : simplify do_yield logic
ggml-ci
2024-01-05 15:18:21 +02:00
Georgi Gerganov
3681f22443
examples : add few-shot translation example ( #4783 )
2024-01-05 15:11:10 +02:00
Concedo
c9fdd42da2
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# Package.swift
2024-01-05 18:32:54 +08:00
Concedo
20261049c9
try to reuse cloudflared file
2024-01-05 18:04:09 +08:00
Daniel Bevenius
b3a7c20b5c
finetune : remove unused includes ( #4756 )
...
This commit removes unused includes from finetune.cpp.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-04 21:45:37 +02:00
Georgi Gerganov
012cf349ae
server : send token probs for "stream == false" ( #4714 )
2024-01-04 19:56:33 +02:00
Johannes Gäßler
a91928014f
Print backend name on test-backend-ops failure ( #4751 )
2024-01-04 09:43:23 +01:00
singularity
3c0b585561
llama.swiftui : support loading custom model from file picker ( #4767 )
...
* swiftui: support load model from file picker
* swiftui: remove trailing whitespace
2024-01-04 10:22:38 +02:00
Michael Coppola
e5804313a1
server : fix options in README.md ( #4765 )
...
* fix examples/server/README.md
* minor : fix whitespace
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-04 10:17:09 +02:00
Georgi Gerganov
dc891b7f7a
ggml : include stdlib.h before intrin.h ( #4736 )
2024-01-04 10:12:26 +02:00
singularity
46cea79e1f
llama.swiftui : fix build of ggml.metallib ( #4754 )
...
* metal: fix metal backend init failure in swiftui
* metal: build ggml.metallib instead of copy src
* llama.swift : remove debug flags from metallib build
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-04 09:58:16 +02:00
Daniel Bevenius
cb1e2818e0
train : fix typo in overlapping-samples help msg ( #4758 )
...
This commit fixes a typo in the help message for the
--overlapping-samples option.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-03 19:53:40 +02:00
Ashraful Islam
ece9a45e8f
swift : update Package.swift to use ggml as dependency ( #4691 )
...
* updates the package.swift to use ggml as dependency
* changes the ggml package url src to ggerganov
2024-01-03 19:30:02 +02:00
Concedo
d37c94bcd9
Merge branch 'master' into concedo_experimental
2024-01-03 22:46:49 +08:00
Concedo
234f79fe9d
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# ci/run.sh
# llama.cpp
2024-01-03 22:33:38 +08:00
Georgi Gerganov
7bed7eba35
cuda : simplify expression
...
Co-authored-by: slaren <slarengh@gmail.com>
2024-01-03 14:38:38 +02:00
Georgi Gerganov
d55356d3ba
cuda : mark I16 and I32 ops as unsupported
...
ggml-ci
2024-01-03 14:38:38 +02:00
Georgi Gerganov
75e3fd8581
sync : ggml
...
ggml-ci
2024-01-03 14:38:38 +02:00
Georgi Gerganov
289313716f
metal : add kernel_get_rows_i32
...
ggml-ci
2024-01-03 14:38:38 +02:00
Georgi Gerganov
ab62fc3e55
scripts : fix sync order + metal sed
2024-01-03 14:38:38 +02:00
Guillaume Wenzek
5f66ebca9c
ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639)
...
* add more int ops
* ggml_compute_forward_dup_bytes
* add tests
* PR comments
* tests : minor indentations
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-03 14:38:38 +02:00
Justin Parker
f2eb19bd8b
server : throw an error when slot unavailable ( #4741 )
2024-01-03 10:43:19 +02:00
Concedo
e49d398f73
use same struct size for cuda and non cuda (+1 squashed commits)
...
Squashed commits:
[6eee8e2f] use same struct size for cuda and non cuda
2024-01-03 16:05:54 +08:00
Georgi Gerganov
f3f62f0d83
metal : optimize ggml_mul_mat_id (faster Mixtral PP) ( #4725 )
...
* ggml : disable fast-math for Metal (cmake build only)
ggml-ci
* metal : fix Metal API debug warnings
* cmake : add -fno-inline for Metal build (#4545 )
* metal : fix API debug warnings
* metal : fix compile warnings
* metal : use uint64_t for strides
* cmake : rename option to LLAMA_METAL_SHADER_DEBUG
* metal : fix mat-vec Q8_0 kernel for BS > 1
* metal : normalize mat-vec kernel signatures
* cmake : respect LLAMA_QKK_64 option
* metal : fix mat-vec Q4_K kernel for QK_K == 64
* metal : optimizing ggml_mul_mat_id (wip)
* metal : minor fix
* metal : opt mul_mm_id
2024-01-02 21:07:47 +02:00
Phil H
0ef3ca2ac6
server : add token counts to html footer ( #4738 )
...
* server: add token counts to stats
* server: generate hpp
---------
Co-authored-by: phiharri <ph@got-root.co.uk>
2024-01-02 17:48:49 +02:00
Georgi Gerganov
540938f890
llama : llama_model_desc print number of experts
2024-01-02 16:26:45 +02:00
Marcus Dunn
0040d42eeb
llama : replace all API facing int's with int32_t ( #4577 )
...
* replaced all API facing `int`'s with `int32_t`
* formatting and missed `int` in `llama_token_to_piece`
2024-01-02 16:15:16 +02:00