Julia Longtin
a015d8485e
allow using code from ggml-phi-knc-dot_q5_K_q8_K.c
2024-06-09 18:01:48 +00:00
Julia Longtin
aee550af6c
force to compile.
2024-06-09 18:01:48 +00:00
Julia Longtin
a7f8abeb9b
tell ggml-common.h to export what we want.
2024-06-09 18:01:48 +00:00
Julia Longtin
8703abe225
pull in ggml specific types.
2024-06-09 18:01:48 +00:00
Julia Longtin
62e354354c
import stdio.h for size_t.
2024-06-09 18:01:48 +00:00
Julia Longtin
3edaaca993
import stdint.h for size_t.
2024-06-09 18:01:48 +00:00
Julia Longtin
669ce9b720
begin work on targeting dot_q5_K_q8_K.
2024-06-09 18:01:48 +00:00
Julia Longtin
c9730c0e04
be more specific about the length of our list of run amounts.
2024-06-09 18:01:48 +00:00
Julia Longtin
a48d3b96d7
spacing changes.
2024-06-09 18:01:48 +00:00
Julia Longtin
bb73cb319c
formatting changes.
2024-06-09 18:01:48 +00:00
Julia Longtin
a06fa4b1b5
use the same header as ggml.c, and remove some warnings.
2024-06-09 18:01:48 +00:00
Julia Longtin
5a9d2f5f71
remove intrinsics import, and use upConv to save 12 bytes of memory transit.
2024-06-09 18:01:48 +00:00
Julia Longtin
d095d8e9c7
Update ggml-phi-knc.c
2024-06-09 18:01:48 +00:00
Julia Longtin
a56a6f31fa
add a benchmark / test binary.
2024-06-09 18:01:48 +00:00
Julia Longtin
d7d679e41a
merge from upstream
2024-06-09 18:01:48 +00:00
Julia Longtin
c70b5f211b
Update ggml.c
2024-06-09 18:01:48 +00:00
Julia Longtin
114e7dd762
Update ggml.c
2024-06-09 18:01:48 +00:00
Julia Longtin
83be3dbab7
Update ggml.c
2024-06-09 18:01:48 +00:00
Julia Longtin
192e4ad857
implement F32 dot products.
2024-06-09 18:01:48 +00:00
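The F32 dot-product commit above targets Knights Corner vector intrinsics; as a hedged illustration only, a scalar reference version of the operation being vectorized (not the actual ggml-phi-knc code) might look like:

```c
#include <stddef.h>

/* Scalar reference for an F32 dot product. The commit above replaces this
 * loop pattern with 512-bit Knights Corner vector intrinsics; this sketch
 * only shows the semantics being implemented. */
static float dot_f32(const float *x, const float *y, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```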
Julia Longtin
7fce3f6b67
import intrinsics.
2024-06-09 18:01:48 +00:00
Julia Longtin
b5ea05f003
use right type, and define GGML_F32_VEC_ZERO.
2024-06-09 18:01:48 +00:00
Julia Longtin
429d69fd22
try to implement one intrinsic
2024-06-09 18:01:48 +00:00
Julia Longtin
7fb8d477ca
try to detect the PHI cross compiler in make.
2024-06-09 18:01:48 +00:00
Julia Longtin
366279e09e
try to detect the PHI cross compiler in make.
2024-06-09 18:01:48 +00:00
Julia Longtin
5c0d49cde4
instead of checking on glibc, check on SYS_getcpu
2024-06-09 18:01:48 +00:00
Julia Longtin
a83e2cadc0
handle the case that we have no glibc on the PHI.
2024-06-09 18:01:48 +00:00
Julia Longtin
9ec8635a06
add detection of Xeon PHI: Knights Corner.
2024-06-09 18:01:47 +00:00
compilade
132f55795e
llama : fix restoring the number of outputs from state files ( #6687 )
2024-04-15 15:56:55 +03:00
Pierrick Hymbert
3272896d79
server : revert "minor layout improvements" ( #6684 )
This reverts commit b3a96f27f0.
2024-04-15 15:18:47 +03:00
Steven Prichard
7fc16a2c32
swift : linux support ( #6590 )
- Package.swift now supports conditional compilation based on OS
- Allows for package to be used by SPM on Non-Apple platforms
Co-authored-by: Steven Prichard <steven.prichard@justeattakeaway.com>
2024-04-15 13:14:46 +03:00
Neo Zhang Jianyu
17e98d4c96
fix mul_mat_id() for new input, make the ut pass ( #6682 )
2024-04-15 17:12:26 +08:00
David Renshaw
1958f7e06c
llama : add missing kv clear in llama_beam_search ( #6664 )
2024-04-14 15:24:15 -04:00
Chao Jiang
04fbc5f23e
Add Command R chat template ( #6650 )
* Add chat template for command-r model series
* Fix indentation
* Add chat template test for command-r models and update the implementation to trim whitespaces
* Remove debug print
2024-04-14 18:16:34 +02:00
Georgi Gerganov
f184dd9208
flake.lock: Update ( #6669 )
2024-04-14 06:55:30 -07:00
Dave
422c2aff1c
Added support for GGML_OP_CLAMP in Metal ( #6662 )
* Added support for GGML_OP_CLAMP in Metal
* Corrected size
---------
Co-authored-by: dave-fl <dave@Davids-MacBook-Pro.local>
2024-04-14 13:14:19 +02:00
Sigbjørn Skjæret
8800226d65
Fix --split-max-size ( #6655 )
* Fix --split-max-size
Byte size calculation was done on int and overflowed.
* add tests.sh
* add examples test scripts to ci run
Will autodiscover examples/*/tests.sh scripts and run them.
* move WORK_PATH to a subdirectory
* clean up before and after test
* explicitly define which scripts to run
* add --split-max-size to readme
2024-04-14 13:12:59 +02:00
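The `--split-max-size` fix above notes that the byte-size calculation was done on `int` and overflowed. A minimal sketch of the failure mode, with hypothetical names (not the actual llama.cpp code): any split limit of 2 GiB or more exceeds `INT_MAX`, so the arithmetic must be widened to a 64-bit type.

```c
#include <stdint.h>

/* Convert a split limit given in GiB to bytes. Doing this multiplication
 * in 32-bit int overflows for limits >= 2 GiB (INT_MAX is ~2.1e9);
 * performing it in int64_t, as the fix does, stays exact. */
static int64_t split_max_bytes(int64_t n_gib) {
    return n_gib * 1024 * 1024 * 1024;  /* n_gib is int64_t, so the whole expression is 64-bit */
}
```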
Jaemin Son
e689fc4e91
[bug fix] convert github repository_owner to lowercase ( #6673 )
2024-04-14 13:12:36 +02:00
James A Capozzoli
a4ec34e1cd
convert : enable the --use-temp-file cli flag ( #6645 )
2024-04-14 11:40:18 +03:00
Neo Zhang Jianyu
de17e3f745
fix memcpy() crash, add missed cmd in guide, fix softmax ( #6622 )
* disable mmap to fix memcpy crash, add missed cmd in guide, fix softmax
* refactor to disable mmap for SYCL backend
* fix compile error in other os
* refactor the solution, use host buf to fix it, instead of disable mmap
* keep to support mmap()
* use host buff to reduce malloc times
* revert to malloc/free solution, for thread safety
2024-04-14 10:42:29 +08:00
Johannes Gäßler
b5e7285baf
CUDA: fix matrix multiplication logic for tests ( #6667 )
2024-04-14 00:21:55 +02:00
Pierrick Hymbert
4bd0f93e4a
model: support arch DbrxForCausalLM ( #6515 )
* model: dbrx convert to gguf #6344
* llama: support dbrx #6344
* doc: dbrx: add the model as supported
* scripts: get-wikitext-2 add unzip
* llama: increase maximum experts allowed
* llama: factorize moe graph implementation between grok, mixtral and dbrx
---------
Co-authored-by: Megha Agarwal <16129366+megha95@users.noreply.github.com>
2024-04-13 11:33:52 +02:00
Olivier Chafik
ab9a3240a9
JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length ( #6555 )
* json: rename python schema converter to make import easier
* server: skip null json_schema / grammar fields
* json: deps management for primitive rules (+ allow null values)
* json: optimize repetitions for minItems/maxItems and regexps: `a{,3}` goes from `"a"? "a"? "a"?` (explosive combos) to `(a (a (a)?)?)?`
* grammars: add troubleshooting section to readme
* json: cap length of numbers to 15 digits before/after decimal point
(avoids infinite gen, e.g. "one third" -> `0.333333333333...`)
* json: unify all repetition code (w/ or w/o sep)
* json: support string minLength/maxLength
* server+json: update server/README w/ result_format
* nits
* json: fix type error w/ python 3.8
* json: fix server/README (json_schema in /completion vs. result_format in /v1/chat/completions)
* json: simplify DOT `{"type": "string", "pattern": "^.$"}`
* json: remove recursion in opt_repetitions (avoids Python stack overflow)
* json: rm dead code
* json: rm useless assert & ggml.h import
2024-04-12 19:43:38 +01:00
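The repetition optimization in the commit above rewrites `a{,3}` from `"a"? "a"? "a"?` (which explodes combinatorially) into the nested form `(a (a (a)?)?)?`. The real converter is a Python script; purely as an illustrative sketch of the nesting idea, in C with invented names:

```c
#include <stdio.h>
#include <string.h>

/* Build the nested-optional grammar form for sym{0,max}:
 * max = 3 -> "(a (a (a)?)?)?". Nesting the optionals avoids the
 * explosive combinations of a flat "a"? "a"? "a"? during matching. */
static void repeat_opt(const char *sym, int max, char *out, size_t cap) {
    out[0] = '\0';
    for (int i = 0; i < max; ++i) {
        /* open one group per allowed repetition */
        snprintf(out + strlen(out), cap - strlen(out),
                 i == 0 ? "(%s" : " (%s", sym);
    }
    for (int i = 0; i < max; ++i) {
        /* close each group and mark it optional */
        snprintf(out + strlen(out), cap - strlen(out), ")?");
    }
}
```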
slaren
fbbc030ba9
metal : unify mul_mv_id kernels ( #6556 )
2024-04-12 18:13:20 +02:00
Daniel Bevenius
4cc120c744
infill : add download instructions for model ( #6626 )
* infill : add download instructions for model
This commit adds instructions on how to download a CodeLlama model
using the `hf.sh` script. This will download the model and place it
in the `models` directory; this is the same model used later by the
infill example.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* squash! infill : add download instructions for model
Clarify the reason for using CodeLlama.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
---------
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-12 15:11:46 +03:00
Pierrick Hymbert
24ee66ed0d
server : coherent log output for KV cache full ( #6637 )
2024-04-12 14:49:21 +03:00
jiez
91c736015b
llama : add gguf_remove_key + remove split meta during quantize ( #6591 )
* Remove split metadata when quantize model shards
* Find metadata key by enum
* Correct loop range for gguf_remove_key and code format
* Free kv memory
---------
Co-authored-by: z5269887 <z5269887@unsw.edu.au>
2024-04-12 13:45:06 +03:00
Rene Leonhardt
5c4d767ac0
chore: Fix markdown warnings ( #6625 )
2024-04-12 10:52:36 +02:00
Georgi Gerganov
ef21ce4ccb
imatrix : remove invalid assert ( #6632 )
2024-04-12 11:49:58 +03:00
MasterYi1024
dee7f8d692
Correct free memory and total memory. ( #6630 )
Co-authored-by: MasterYi <zouxiaoyi@kylinos.cn>
2024-04-12 10:28:12 +02:00
Pierrick Hymbert
81da18e71c
eval-callback: use ggml_op_desc to pretty print unary operator name ( #6631 )
2024-04-12 10:26:47 +02:00