Commit graph

2708 commits

Author SHA1 Message Date
Julia Longtin
ff29b659c8 formatting improvement. 2024-06-09 18:01:48 +00:00
Julia Longtin
b3ec86e59c first fixes. 2024-06-09 18:01:48 +00:00
Julia Longtin
7f5adf3b5c attempt to speed up float clearing. 2024-06-09 18:01:48 +00:00
Julia Longtin
a015d8485e allow using code from ggml-phi-knc-dot_q5_K_q8_K.c 2024-06-09 18:01:48 +00:00
Julia Longtin
aee550af6c force to compile. 2024-06-09 18:01:48 +00:00
Julia Longtin
a7f8abeb9b tell ggml-common.h to export what we want. 2024-06-09 18:01:48 +00:00
Julia Longtin
8703abe225 pull in ggml specific types. 2024-06-09 18:01:48 +00:00
Julia Longtin
62e354354c import stdio.h for size_t. 2024-06-09 18:01:48 +00:00
Julia Longtin
3edaaca993 import stdint.h for size_t. 2024-06-09 18:01:48 +00:00
Julia Longtin
669ce9b720 begin work on targeting dot_q5_K_q8_K. 2024-06-09 18:01:48 +00:00
Julia Longtin
c9730c0e04 be more specific about the length of our list of run amounts. 2024-06-09 18:01:48 +00:00
Julia Longtin
a48d3b96d7 spacing changes. 2024-06-09 18:01:48 +00:00
Julia Longtin
bb73cb319c formatting changes. 2024-06-09 18:01:48 +00:00
Julia Longtin
a06fa4b1b5 use the same header as ggml.c, and remove some warnings. 2024-06-09 18:01:48 +00:00
Julia Longtin
5a9d2f5f71 remove intrinsics import, and use upConv to save 12 bytes of memory transit. 2024-06-09 18:01:48 +00:00
Julia Longtin
d095d8e9c7 Update ggml-phi-knc.c 2024-06-09 18:01:48 +00:00
Julia Longtin
a56a6f31fa add a benchmark / test binary. 2024-06-09 18:01:48 +00:00
Julia Longtin
d7d679e41a merge from upstream 2024-06-09 18:01:48 +00:00
Julia Longtin
c70b5f211b Update ggml.c 2024-06-09 18:01:48 +00:00
Julia Longtin
114e7dd762 Update ggml.c 2024-06-09 18:01:48 +00:00
Julia Longtin
83be3dbab7 Update ggml.c 2024-06-09 18:01:48 +00:00
Julia Longtin
192e4ad857 implement F32 dot products. 2024-06-09 18:01:48 +00:00
Julia Longtin
7fce3f6b67 import intrinsics. 2024-06-09 18:01:48 +00:00
Julia Longtin
b5ea05f003 use right type, and define GGML_F32_VEC_ZERO. 2024-06-09 18:01:48 +00:00
Julia Longtin
429d69fd22 try to implement one intrinsic 2024-06-09 18:01:48 +00:00
Julia Longtin
7fb8d477ca try to detect the PHI cross compiler in make. 2024-06-09 18:01:48 +00:00
Julia Longtin
366279e09e try to detect the PHI cross compiler in make. 2024-06-09 18:01:48 +00:00
Julia Longtin
5c0d49cde4 instead of checking on glibc, check on SYS_getcpu 2024-06-09 18:01:48 +00:00
Julia Longtin
a83e2cadc0 handle the case that we have no glibc on the PHI. 2024-06-09 18:01:48 +00:00
Julia Longtin
9ec8635a06 add detection of Xeon PHI: Knights Corner. 2024-06-09 18:01:47 +00:00
compilade
132f55795e llama : fix restoring the number of outputs from state files (#6687) 2024-04-15 15:56:55 +03:00
Pierrick Hymbert
3272896d79 server : revert "minor layout improvements" (#6684)
This reverts commit b3a96f27f0.
2024-04-15 15:18:47 +03:00
Steven Prichard
7fc16a2c32 swift : linux support (#6590)
- Package.swift now supports conditional compilation based on OS
- Allows for package to be used by SPM on Non-Apple platforms

Co-authored-by: Steven Prichard <steven.prichard@justeattakeaway.com>
2024-04-15 13:14:46 +03:00
Neo Zhang Jianyu
17e98d4c96 fix mul_mat_id() for new input, make the ut pass (#6682) 2024-04-15 17:12:26 +08:00
David Renshaw
1958f7e06c llama : add missing kv clear in llama_beam_search (#6664) 2024-04-14 15:24:15 -04:00
Chao Jiang
04fbc5f23e Add Command R chat template (#6650)
* Add chat template for command-r model series

* Fix indentation

* Add chat template test for command-r models and update the implementation to trim whitespaces

* Remove debug print
2024-04-14 18:16:34 +02:00
Georgi Gerganov
f184dd9208 flake.lock: Update (#6669) 2024-04-14 06:55:30 -07:00
Dave
422c2aff1c Added support for GGML_OP_CLAMP in Metal (#6662)
* Added support for GGML_OP_CLAMP in Metal

* Corrected size

---------

Co-authored-by: dave-fl <dave@Davids-MacBook-Pro.local>
2024-04-14 13:14:19 +02:00
Sigbjørn Skjæret
8800226d65 Fix --split-max-size (#6655)
* Fix --split-max-size

Byte size calculation was done on int and overflowed.

* add tests.sh

* add examples test scripts to ci run

Will autodiscover examples/*/tests.sh scripts and run them.

* move WORK_PATH to a subdirectory

* clean up before and after test

* explicitly define which scripts to run

* add --split-max-size to readme
2024-04-14 13:12:59 +02:00
Jaemin Son
e689fc4e91 [bug fix] convert github repository_owner to lowercase (#6673) 2024-04-14 13:12:36 +02:00
James A Capozzoli
a4ec34e1cd convert : enable the --use-temp-file cli flag (#6645) 2024-04-14 11:40:18 +03:00
Neo Zhang Jianyu
de17e3f745 fix memcpy() crash, add missed cmd in guide, fix softmax (#6622)
* disable mmap to fix memcpy crash, add missed cmd in guide, fix softmax

* refactor to disable mmap for SYCL backend

* fix compile error in other os

* refactor the solution, use host buf to fix it, instead of disable mmap

* keep to support mmap()

* use host buff to reduce malloc times

* revert to malloc/free solution, for thread safety
2024-04-14 10:42:29 +08:00
Johannes Gäßler
b5e7285baf CUDA: fix matrix multiplication logic for tests (#6667) 2024-04-14 00:21:55 +02:00
Pierrick Hymbert
4bd0f93e4a model: support arch DbrxForCausalLM (#6515)
* model: dbrx convert to gguf
#6344

* llama: support dbrx
#6344

* doc: dbrx: add the model as supported

* scripts: get-wikitext-2 add unzip

* llama: increase maximum experts allowed

* llama: factorize moe graph implementation between grok, mixtral and dbrx


---------

Co-authored-by: Megha Agarwal <16129366+megha95@users.noreply.github.com>
2024-04-13 11:33:52 +02:00
Olivier Chafik
ab9a3240a9 JSON schema conversion: faster repetitions, min/maxLength for strings, cap number length (#6555)
* json: rename python schema converter to make import easier

* server: skip null json_schema / grammar fields

* json: deps management for primitive rules (+ allow null values)

* json: optimize repetitions for minItems/maxItems and regexps: `a{,3}` goes from `"a"? "a"? "a"?` (explosive combos) to `(a (a (a)?)?)?`

* grammars: add troubleshooting section to readme

* json: cap length of numbers to 15 digits before/after decimal point

(avoids infinite gen, e.g. "one third" -> `0.333333333333...`)

* json: unify all repetition code (w/ or w/o sep)

* json: support string minLength/maxLength

* server+json: update server/README w/ result_format

* nits

* json: fix type error w/ python 3.8

* json: fix server/README (json_schema in /completion vs. result_format in /v1/chat/completions)

* json: simplify DOT `{"type": "string", "pattern": "^.$"}`

* json: remove recursion in opt_repetitions (avoids Python stack overflow)

* json: rm dead code

* json: rm useless assert & ggml.h import
2024-04-12 19:43:38 +01:00
slaren
fbbc030ba9 metal : unify mul_mv_id kernels (#6556) 2024-04-12 18:13:20 +02:00
Daniel Bevenius
4cc120c744 infill : add download instructions for model (#6626)
* infill : add download instructions for model

This commit adds instructions on how to download a CodeLlama model
using the `hf.sh` script. This will download the model and place it
in the `models` directory which is the same model use later by the
infill example.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! infill : add download instructions for model

Clarify the reason for using CodeLlama.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-12 15:11:46 +03:00
Pierrick Hymbert
24ee66ed0d server : coherent log output for KV cache full (#6637) 2024-04-12 14:49:21 +03:00
jiez
91c736015b llama : add gguf_remove_key + remove split meta during quantize (#6591)
* Remove split metadata when quantize model shards

* Find metadata key by enum

* Correct loop range for gguf_remove_key and code format

* Free kv memory

---------

Co-authored-by: z5269887 <z5269887@unsw.edu.au>
2024-04-12 13:45:06 +03:00
Rene Leonhardt
5c4d767ac0 chore: Fix markdown warnings (#6625) 2024-04-12 10:52:36 +02:00