Julia Longtin
a015d8485e
allow using code from ggml-phi-knc-dot_q5_K_q8_K.c
2024-06-09 18:01:48 +00:00
Julia Longtin
aee550af6c
force to compile.
2024-06-09 18:01:48 +00:00
Julia Longtin
a7f8abeb9b
tell ggml-common.h to export what we want.
2024-06-09 18:01:48 +00:00
Julia Longtin
8703abe225
pull in ggml specific types.
2024-06-09 18:01:48 +00:00
Julia Longtin
62e354354c
import stdio.h for size_t.
2024-06-09 18:01:48 +00:00
Julia Longtin
3edaaca993
import stdint.h for size_t.
2024-06-09 18:01:48 +00:00
Julia Longtin
669ce9b720
begin work on targeting dot_q5_K_q8_K.
2024-06-09 18:01:48 +00:00
Julia Longtin
c9730c0e04
be more specific about the length of our list of run amounts.
2024-06-09 18:01:48 +00:00
Julia Longtin
a48d3b96d7
spacing changes.
2024-06-09 18:01:48 +00:00
Julia Longtin
bb73cb319c
formatting changes.
2024-06-09 18:01:48 +00:00
Julia Longtin
a06fa4b1b5
use the same header as ggml.c, and remove some warnings.
2024-06-09 18:01:48 +00:00
Julia Longtin
5a9d2f5f71
remove intrinsics import, and use upConv to save 12 bytes of memory transit.
2024-06-09 18:01:48 +00:00
Julia Longtin
d095d8e9c7
Update ggml-phi-knc.c
2024-06-09 18:01:48 +00:00
Julia Longtin
a56a6f31fa
add a benchmark / test binary.
2024-06-09 18:01:48 +00:00
Julia Longtin
d7d679e41a
merge from upstream
2024-06-09 18:01:48 +00:00
Julia Longtin
c70b5f211b
Update ggml.c
2024-06-09 18:01:48 +00:00
Julia Longtin
114e7dd762
Update ggml.c
2024-06-09 18:01:48 +00:00
Julia Longtin
83be3dbab7
Update ggml.c
2024-06-09 18:01:48 +00:00
Julia Longtin
192e4ad857
implement F32 dot products.
2024-06-09 18:01:48 +00:00
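The F32 dot-product commit above targets Knights Corner vector intrinsics; as a hedged illustration only, a scalar reference version of the operation being vectorized (not the actual ggml-phi-knc code) might look like:

```c
#include <stddef.h>

/* Scalar reference for an F32 dot product. The commit above replaces this
 * loop pattern with 512-bit Knights Corner vector intrinsics; this sketch
 * only shows the semantics being implemented. */
static float dot_f32(const float *x, const float *y, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```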
Julia Longtin
7fce3f6b67
import intrinsics.
2024-06-09 18:01:48 +00:00
Julia Longtin
b5ea05f003
use right type, and define GGML_F32_VEC_ZERO.
2024-06-09 18:01:48 +00:00
Julia Longtin
429d69fd22
try to implement one intrinsic
2024-06-09 18:01:48 +00:00
Julia Longtin
7fb8d477ca
try to detect the PHI cross compiler in make.
2024-06-09 18:01:48 +00:00
Julia Longtin
366279e09e
try to detect the PHI cross compiler in make.
2024-06-09 18:01:48 +00:00
Julia Longtin
5c0d49cde4
instead of checking on glibc, check on SYS_getcpu
2024-06-09 18:01:48 +00:00
Julia Longtin
a83e2cadc0
handle the case that we have no glibc on the PHI.
2024-06-09 18:01:48 +00:00
Julia Longtin
9ec8635a06
add detection of Xeon PHI: Knights Corner.
2024-06-09 18:01:47 +00:00
compilade
132f55795e
llama : fix restoring the number of outputs from state files ( #6687 )
2024-04-15 15:56:55 +03:00
Pierrick Hymbert
3272896d79
server : revert "minor layout improvements" ( #6684 )
This reverts commit b3a96f27f0.
2024-04-15 15:18:47 +03:00
Steven Prichard
7fc16a2c32
swift : linux support ( #6590 )
- Package.swift now supports conditional compilation based on OS
- Allows for package to be used by SPM on Non-Apple platforms
Co-authored-by: Steven Prichard <steven.prichard@justeattakeaway.com>
2024-04-15 13:14:46 +03:00
Neo Zhang Jianyu
17e98d4c96
fix mul_mat_id() for new input, make the ut pass ( #6682 )
2024-04-15 17:12:26 +08:00
David Renshaw
1958f7e06c
llama : add missing kv clear in llama_beam_search ( #6664 )
2024-04-14 15:24:15 -04:00
Chao Jiang
04fbc5f23e
Add Command R chat template ( #6650 )
* Add chat template for command-r model series
* Fix indentation
* Add chat template test for command-r models and update the implementation to trim whitespaces
* Remove debug print
2024-04-14 18:16:34 +02:00
Georgi Gerganov
f184dd9208
flake.lock: Update ( #6669 )
2024-04-14 06:55:30 -07:00
Dave
422c2aff1c
Added support for GGML_OP_CLAMP in Metal ( #6662 )
* Added support for GGML_OP_CLAMP in Metal
* Corrected size
---------
Co-authored-by: dave-fl <dave@Davids-MacBook-Pro.local>
2024-04-14 13:14:19 +02:00
Sigbjørn Skjæret
8800226d65
Fix --split-max-size ( #6655 )
* Fix --split-max-size
Byte size calculation was done on int and overflowed.
* add tests.sh
* add examples test scripts to ci run
Will autodiscover examples/*/tests.sh scripts and run them.
* move WORK_PATH to a subdirectory
* clean up before and after test
* explicitly define which scripts to run
* add --split-max-size to readme
2024-04-14 13:12:59 +02:00
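The `--split-max-size` fix above notes that the byte-size calculation was done on `int` and overflowed. A minimal sketch of the failure mode, with hypothetical names (not the actual llama.cpp code): any split limit of 2 GiB or more exceeds `INT_MAX`, so the arithmetic must be widened to a 64-bit type.

```c
#include <stdint.h>

/* Convert a split limit given in GiB to bytes. Doing this multiplication
 * in 32-bit int overflows for limits >= 2 GiB (INT_MAX is ~2.1e9);
 * performing it in int64_t, as the fix does, stays exact. */
static int64_t split_max_bytes(int64_t n_gib) {
    return n_gib * 1024 * 1024 * 1024;  /* n_gib is int64_t, so the whole expression is 64-bit */
}
```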
Jaemin Son
e689fc4e91
[bug fix] convert github repository_owner to lowercase ( #6673 )
2024-04-14 13:12:36 +02:00
James A Capozzoli
a4ec34e1cd
convert : enable the --use-temp-file cli flag ( #6645 )
2024-04-14 11:40:18 +03:00
Neo Zhang Jianyu
de17e3f745
fix memcpy() crash, add missed cmd in guide, fix softmax ( #6622 )
* disable mmap to fix memcpy crash, add missed cmd in guide, fix softmax
* refactor to disable mmap for SYCL backend
* fix compile error in other os
* refactor the solution, use host buf to fix it, instead of disable mmap
* keep to support mmap()
* use host buff to reduce malloc times
* revert to malloc/free solution, for thread safety
2024-04-14 10:42:29 +08:00
Johannes Gäßler
b5e7285baf
CUDA: fix matrix multiplication logic for tests ( #6667 )
2024-04-14 00:21:55 +02:00
Pierrick Hymbert
4bd0f93e4a
model: support arch DbrxForCausalLM ( #6515 )
* model: dbrx convert to gguf #6344
* llama: support dbrx #6344
* doc: dbrx: add the model as supported
* scripts: get-wikitext-2 add unzip
* llama: increase maximum experts allowed
* llama: factorize moe graph implementation between grok, mixtral and dbrx
---------
Co-authored-by: Megha Agarwal <16129366+megha95@users.noreply.github.com>
2024-04-13 11:33:52 +02:00
Olivier Chafik
ab9a3240a9
JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length ( #6555 )
* json: rename python schema converter to make import easier
* server: skip null json_schema / grammar fields
* json: deps management for primitive rules (+ allow null values)
* json: optimize repetitions for minItems/maxItems and regexps: `a{,3}` goes from `"a"? "a"? "a"?` (explosive combos) to `(a (a (a)?)?)?`
* grammars: add troubleshooting section to readme
* json: cap length of numbers to 15 digits before/after decimal point
(avoids infinite gen, e.g. "one third" -> `0.333333333333...`)
* json: unify all repetition code (w/ or w/o sep)
* json: support string minLength/maxLength
* server+json: update server/README w/ result_format
* nits
* json: fix type error w/ python 3.8
* json: fix server/README (json_schema in /completion vs. result_format in /v1/chat/completions)
* json: simplify DOT `{"type": "string", "pattern": "^.$"}`
* json: remove recursion in opt_repetitions (avoids Python stack overflow)
* json: rm dead code
* json: rm useless assert & ggml.h import
2024-04-12 19:43:38 +01:00
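The repetition optimization in the commit above rewrites `a{,3}` from `"a"? "a"? "a"?` (which explodes combinatorially) into the nested form `(a (a (a)?)?)?`. The real converter is a Python script; purely as an illustrative sketch of the nesting idea, in C with invented names:

```c
#include <stdio.h>
#include <string.h>

/* Build the nested-optional grammar form for sym{0,max}:
 * max = 3 -> "(a (a (a)?)?)?". Nesting the optionals avoids the
 * explosive combinations of a flat "a"? "a"? "a"? during matching. */
static void repeat_opt(const char *sym, int max, char *out, size_t cap) {
    out[0] = '\0';
    for (int i = 0; i < max; ++i) {
        /* open one group per allowed repetition */
        snprintf(out + strlen(out), cap - strlen(out),
                 i == 0 ? "(%s" : " (%s", sym);
    }
    for (int i = 0; i < max; ++i) {
        /* close each group and mark it optional */
        snprintf(out + strlen(out), cap - strlen(out), ")?");
    }
}
```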
slaren
fbbc030ba9
metal : unify mul_mv_id kernels ( #6556 )
2024-04-12 18:13:20 +02:00
Daniel Bevenius
4cc120c744
infill : add download instructions for model ( #6626 )
* infill : add download instructions for model
This commit adds instructions on how to download a CodeLlama model
using the `hf.sh` script. This will download the model and place it
in the `models` directory; this is the same model used later by the
infill example.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* squash! infill : add download instructions for model
Clarify the reason for using CodeLlama.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
---------
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-12 15:11:46 +03:00
Pierrick Hymbert
24ee66ed0d
server : coherent log output for KV cache full ( #6637 )
2024-04-12 14:49:21 +03:00
jiez
91c736015b
llama : add gguf_remove_key + remove split meta during quantize ( #6591 )
* Remove split metadata when quantize model shards
* Find metadata key by enum
* Correct loop range for gguf_remove_key and code format
* Free kv memory
---------
Co-authored-by: z5269887 <z5269887@unsw.edu.au>
2024-04-12 13:45:06 +03:00
Rene Leonhardt
5c4d767ac0
chore: Fix markdown warnings ( #6625 )
2024-04-12 10:52:36 +02:00
Georgi Gerganov
ef21ce4ccb
imatrix : remove invalid assert ( #6632 )
2024-04-12 11:49:58 +03:00
MasterYi1024
dee7f8d692
Correct free memory and total memory. ( #6630 )
Co-authored-by: MasterYi <zouxiaoyi@kylinos.cn>
2024-04-12 10:28:12 +02:00
Pierrick Hymbert
81da18e71c
eval-callback: use ggml_op_desc to pretty print unary operator name ( #6631 )
2024-04-12 10:26:47 +02:00