Georgi Gerganov
58b16695e1
sync : ggml
2024-10-05 15:53:49 +03:00
Georgi Gerganov
905f5485b2
metal : zero-init buffer contexts (whisper/0)
2024-10-05 15:53:00 +03:00
Viet-Anh NGUYEN (Andrew)
71967c2a6d
Add Llama Assistant (#9744)
2024-10-04 20:29:35 +02:00
Georgi Gerganov
17880771ad
sync : ggml
2024-10-04 18:50:25 +03:00
Daniel Bevenius
55951c018d
ggml : fix typo in example usage ggml_gallocr_new (ggml/984)
2024-10-04 18:50:05 +03:00
Diego Devesa
ff565769f2
ggml : fixes after sync (ggml/983)
ggml : remove test-backend-buffer
ggml : fix CUDA build warnings
2024-10-04 18:50:04 +03:00
Xuan Son Nguyen
f3fdcfaa79
ci : fine-grained permission (#9710)
2024-10-04 11:47:19 +02:00
Daniel Kleine
133c7b46b3
Fixed RNG seed docs (#9723)
* Update README.md
fixed RNG seed info
* changed print format to unsigned
2024-10-04 10:54:44 +02:00
ochafik
a151ddcd5a
agent: handle function errors and don't stringify str outputs
2024-10-04 04:06:00 +01:00
Olivier Chafik
21a3c90a1c
agent: tool tweaks (remove ansi escapes from python output, update env keys + provider docs)
2024-10-03 22:20:34 +01:00
Olivier Chafik
366efc8a18
tool-call: fix llama 3.x tc parsing when there are spaces before "name"
2024-10-03 21:46:41 +01:00
Olivier Chafik
da02397f7f
agent: support more providers (+ extract serve_tools_inside_docker.sh)
update readme
2024-10-03 19:23:32 +01:00
Olivier Chafik
b4fc1e8ba7
tool-call: adjust triggers to most common tool call variations from Llama-3.1-8B and Llama-3.2-3B
2024-10-03 19:23:08 +01:00
Olivier Chafik
ece12b074f
antiprompts: ensure partial match is at end of string (or else server stops sending replies)
2024-10-03 19:23:08 +01:00
Georgi Gerganov
d5ed2b929d
metal : remove abort (skip) (ggml/0)
2024-10-03 21:18:19 +03:00
Georgi Gerganov
1bb8a64ebf
sync : ggml
2024-10-03 21:17:49 +03:00
Johannes Gäßler
fabdc3bda3
ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980)
2024-10-03 21:17:26 +03:00
Johannes Gäßler
eee39bdc96
ggml: refactor cross entropy loss CPU impl. (ggml/976)
2024-10-03 21:17:26 +03:00
Jack Mousseau
5d5ab1e5cc
metal : fix compute pass descriptor autorelease crash (#9718)
2024-10-03 21:01:46 +03:00
Diego Devesa
a7ad553513
ggml-backend : add device description to CPU backend (#9720)
2024-10-03 17:39:18 +02:00
bandoti
d6fe7abf04
ggml: unify backend logging mechanism (#9709)
* Add scaffolding for ggml logging macros
* Metal backend now uses GGML logging
* Cuda backend now uses GGML logging
* Cann backend now uses GGML logging
* Add enum tag to parameters
* Use C memory allocation funcs
* Fix compile error
* Use GGML_LOG instead of GGML_PRINT
* Rename llama_state to llama_logger_state
* Prevent null format string
* Fix whitespace
* Remove log callbacks from ggml backends
* Remove cuda log statement
2024-10-03 17:39:03 +02:00
compilade
e3c355ba65
convert : handle tokenizer merges format from transformers 4.45 (#9696)
2024-10-03 17:22:15 +03:00
Radoslav Gerganov
841713e1e4
rpc : enable vulkan (#9714)
closes #8536
2024-10-03 13:00:52 +03:00
Ouadie EL FAROUKI
5639971466
Fixed dequant precision issues in Q4_1 and Q5_1 (#9711)
2024-10-03 07:50:44 +01:00
Diego Devesa
c83ad6d01e
ggml-backend : add device and backend reg interfaces (#9707)
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-10-03 01:49:47 +02:00
Olivier Chafik
fa8df0c350
agent: drop fastify.py -> simpler serve_tools.py, and expose other tools to python interpreter
2024-10-02 19:51:23 +01:00
Olivier Chafik
6b4a454735
agent: hard-code max_results=10 in brave_search
2024-10-02 19:13:28 +01:00
Olivier Chafik
26e76f9704
agent: allow interactive chat by default, and don't reuse sessions
2024-10-02 19:12:57 +01:00
Olivier Chafik
6f2191d99e
agent: remove *lots* of cruft from tool definitions derived from FastAPI catalog (and remove wait* tools which can be implemented in Python anyway)
2024-10-02 17:54:20 +01:00
Olivier Chafik
e2a9ab68a3
agent: --openai flag (auto-fetches OPENAI_API_KEY), improved logging
2024-10-02 17:15:55 +01:00
Olivier Chafik
2428b73853
agent: ditch openai dependency, use cache_prompt and expose seed
2024-10-02 16:26:45 +01:00
Olivier Chafik
b559d64ecc
Update README.md
2024-10-02 15:19:27 +01:00
Olivier Chafik
9e502e89a5
tool-call: promote getting chat templates w/ dedicated script rather than rely on test resources
2024-10-02 15:03:08 +01:00
Olivier Chafik
f3538e755b
update tools
2024-10-02 14:57:25 +01:00
Xuan Son Nguyen
a39ab216aa
llama : reduce compile time and binary size (#9712)
* llama : speed up compile time
* fix build
* fix build (2)
2024-10-02 15:49:55 +02:00
Olivier Chafik
5b01402655
agent: add brave_search & fetch_page tools + move to examples/agent/tools/
2024-10-02 14:29:45 +01:00
Alberto Cabrera Pérez
f536f4c439
[SYCL] Initial cmake support of SYCL for AMD GPUs (#9658)
sycl: initial cmake support of SYCL for AMD GPUs
2024-10-02 13:57:18 +01:00
Radoslav Gerganov
00b7317e63
vulkan : do not use tensor->extra (#9407)
* vulkan : do not use tensor->extra
This patch allows using the Vulkan backend with the RPC backend as
tensor->extra is no longer used.
Ref: #8536
* Adapt GGML_VULKAN_CHECK_RESULTS to extra removal (#2)
---------
Co-authored-by: 0cc4m <picard12@live.de>
2024-10-02 13:49:16 +03:00
Zhenwei Jin
76b37d1541
gguf-split : improve --split and --merge logic (#9619)
* make sure params --split and --merge are not specified at same time
* update gguf-split params parse logic
* Update examples/gguf-split/gguf-split.cpp
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2024-10-02 10:21:57 +03:00
Georgi Gerganov
148844fe97
examples : remove benchmark (#9704)
ggml-ci
2024-10-02 10:14:44 +03:00
Olivier Chafik
c76b14501e
tool-call: fix Makefile
2024-10-02 00:06:42 +01:00
Olivier Chafik
c36a196f53
tool-call: prepare possible externalization of minja + factor tool call style out of template
2024-10-01 23:12:24 +01:00
Paweł Wodnicki
3f1ae2e32c
Update README.md (#9591)
Add Bielik model.
2024-10-01 19:18:46 +02:00
Georgi Gerganov
f1b8c42711
sync : ggml
2024-10-01 16:09:42 +03:00
Johannes Gäßler
e98c1c188e
test: fix OPT_STEP_ADAMW for test-backend-ops (ggml/974)
2024-10-01 16:07:40 +03:00
Salvatore Mesoraca
cb00020504
vulkan : mul_mat: fix UB with small warps (ggml/952)
When the device's warp size is less than 16,
it is possible for loadstride_a (mul_mm.comp:114)
and loadstride_b (mul_mm.comp:115) to be set to 0.
This is because they are calculated as the workgroup size
multiplied by LOAD_VEC_* (which can be 1) and divided by 16,
and the workgroup size is set to be the same as the
warp/subgroup size.
The loadstride_* variables are used as increments in the
loops that populate the buffers used for the multiplication.
When they are 0 they cause an infinite loop.
But infinite loops without side-effects are UB and the
values of loadstride_* are known at compile time.
So, the compiler quietly optimizes all the loops away.
As a consequence, the buffers are not populated and
the multiplication result is just a matrix with all elements
set to 0.
We prevent the UB by making sure that the workgroup size
will never be less than 16, even if our device has a
smaller warp size (e.g. 8).
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
2024-10-01 16:07:39 +03:00
Borislav Stanimirov
6c5322481a
ggml : fix ggml_cast (ggml/973)
2024-10-01 16:07:39 +03:00
Johannes Gäßler
7254cdf7e8
ggml: fix gradient allocation logic (ggml/966)
* ggml: fix gradient allocation logic
* gradient allocation in ggml_build_backward_expand
* fixup
* fix test-backend-ops grad
* suggestions by slaren
* fix test1.c
* fix legacy opt API
* fix test-grad0
* remove keep arg
2024-10-01 16:07:38 +03:00
Georgi Gerganov
cad341d889
metal : reduce command encoding overhead (#9698)
* metal : reduce command encoding overhead
ggml-ci
* metal : add comments
2024-10-01 16:00:25 +03:00
Georgi Gerganov
a90484c6d9
llama : print correct model type for Llama 3.2 1B and 3B
2024-10-01 11:42:01 +03:00