llama.cpp

Author	SHA1	Message	Date
Brian	60278e4f4d	Update convert_hf_to_gguf.py Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>	2024-07-16 06:42:38 +10:00
brian khuu	ad217d7249	convert-*.py: remove autogenerated uuid	2024-07-16 06:42:38 +10:00
brian khuu	f2b425c59c	convert-*.py: import cast from typing and other refactor	2024-07-16 06:42:38 +10:00
brian khuu	04c4fffdcc	convert-*.py: prepare_tensors_for_writing() --> prepare_tensors() > Especially since it can be used for other purposes than "for writing", like preparing the tensors to then count and sum all their sizes. Co-authored-by: compilade <git@compilade.net>	2024-07-16 06:42:38 +10:00
brian khuu	64707b625c	convert-*.py: remove redundant gguf_writer.add_name() calls	2024-07-16 06:42:38 +10:00
brian khuu	f8b5931180	convert-*.py: parameter_class_attribute --> size_label	2024-07-16 06:42:38 +10:00
brian khuu	6eb08ac868	convert-*.py: Removing the redundant metadata is not None from all conditions, and indenting them. Co-authored-by: compilade <git@compilade.net>	2024-07-16 06:42:38 +10:00
brian khuu	4c91d077d2	convert-*.py: cast not required if Metadata.load_metadata_override returned a dict[str, Any] instead of a dict[str, object] Co-authored-by: compilade <git@compilade.net>	2024-07-16 06:42:38 +10:00
Brian	74383ba6d2	Apply suggestions from code review Co-authored-by: compilade <git@compilade.net>	2024-07-16 06:42:38 +10:00
brian khuu	dd14b8fdb1	convert-*.py: pyright type fixes	2024-07-16 06:42:38 +10:00
brian khuu	59a01df784	convert-*.py: refactor per model weight count estimation	2024-07-16 06:42:38 +10:00
brian khuu	2a976e1211	convert-*.py: write_tensors() --> prepare_tensors_for_writing()	2024-07-16 06:42:38 +10:00
brian khuu	fdc5a3fc80	convert-*.py: autogenerate general.uuid if missing	2024-07-16 06:42:35 +10:00
brian khuu	7ecb8f00a0	test: remove test_gguf.py and remove test_generate_any_missing_uuid()	2024-07-16 06:38:40 +10:00
brian khuu	007708e32d	gguf_writer.py: generate tensor uuid if missing	2024-07-16 06:38:40 +10:00
brian khuu	4dc8ddd35a	convert_hf_to_gguf.py: Remove code that is already in fill_templated_filename() and GGUFWriter()	2024-07-16 06:38:40 +10:00
brian khuu	2f23927d37	convert_hf_to_gguf.py: rebase error correction	2024-07-16 06:38:40 +10:00
brian khuu	5011eefeaf	convert_hf_to_gguf.py: optional, dataclass removed from type as it was unused	2024-07-16 06:38:40 +10:00
brian khuu	e9734434bd	convert-*.py: Remove self.model_name that was left in since last rebase	2024-07-16 06:38:40 +10:00
brian khuu	eaa47f5546	convert-*.py: separated unit test, hf_repo to repo_url	2024-07-16 06:38:40 +10:00
brian khuu	d060fcdbe2	convert-*.py: adjusted authorship KV store	2024-07-16 06:38:40 +10:00
brian khuu	91e65d9485	convert-*.py: add unittest to metadata class	2024-07-16 06:38:38 +10:00
brian khuu	3625a42061	convert-*.py: add heuristic to directory name fallback Also add source_url for huggingface url	2024-07-16 06:37:42 +10:00
brian khuu	39472a09da	convert-*.py: need to include self in per_model_weight_count_estimation()	2024-07-16 06:37:42 +10:00
brian khuu	54918ad14e	convert-*.py: refactor parameter weight class	2024-07-16 06:37:42 +10:00
brian khuu	32e80e094c	convert-*.py: base_model is actually in spec for model cards	2024-07-16 06:37:42 +10:00
brian khuu	4d5cd0670a	convert-*.py: use heuristics to parse _name_or_path	2024-07-16 06:37:42 +10:00
brian khuu	b0553f42da	convert-*.py: adjust help message	2024-07-16 06:37:42 +10:00
brian khuu	dd1571211e	convert-*.py: add quantized_by and enhance heuristics	2024-07-16 06:37:38 +10:00
brian khuu	5a86dfaa1c	convert-*.py: add general.organization to kv store	2024-07-16 06:36:03 +10:00
brian khuu	f7c20793b9	convert-*.py: enable --model-name direct metadata override	2024-07-16 06:36:03 +10:00
brian khuu	b1927eed82	convert-*.py: move per model weight estimation away from util back to main script plus some refactoring	2024-07-16 06:36:03 +10:00
brian khuu	684c604eca	convert-*.py: add datasets and language to KV store	2024-07-16 06:36:03 +10:00
brian khuu	0f1d50fab7	convert-*.py: add parameter size class	2024-07-16 06:36:03 +10:00
brian khuu	8f734083dd	convert-*.py: add base_version and add tags	2024-07-16 06:36:03 +10:00
brian khuu	b36e391b87	convert-*.py: parse model card in metadata util. Add license_link and license_name to kv store	2024-07-16 06:36:03 +10:00
brian khuu	5c263cb257	convert-*.py: encoding_scheme --> output_type	2024-07-16 06:36:03 +10:00
brian khuu	4d5f18a0e6	convert-*.py: metadata class moved to utility	2024-07-16 06:36:03 +10:00
brian khuu	916872f72f	convert-*.py: model card metadata	2024-07-16 06:36:03 +10:00
brian khuu	a42c2b7efc	convert-*.py: add basename and finetune metadata	2024-07-16 06:36:03 +10:00
brian khuu	dbb1b471e4	convert-*.py: add --get-outfile command and refactor	2024-07-16 06:36:03 +10:00
brian khuu	d3a936fd0e	convert-*.py: licence -> license	2024-07-16 06:36:03 +10:00
Xuan Son Nguyen	97bdd26eee	Refactor lora adapter support (#8332) * lora: load to devide buft * add patch tensor function * correct tensor patch * llama_lora_adapter_apply * correct ggml_backend_tensor_copy * add llm_build_mm * fix auto merge * update based on review comments * add convert script * no more transpose A * add f16 convert * add metadata check * add sanity check * fix ftype * add requirements * fix requirements * fix outfile * conversion: only allow selected models * fix types * cuda : do not use dmmv if the tensor does not have enough cols * llama : lora fixes * do not disable mmap with lora Co-authored-by: slaren <slarengh@gmail.com> * llm_build_lora_mm_id * convert_lora : MoE LoRA conversion support * convert_lora : prefer safetensors, similarly to convert_hf * convert_hf : simplify modify_tensors for InternLM2 * convert_lora : lazy conversion * llama : load and use alpha from LoRA adapters * llama : use llm_build_lora_mm in most model graphs * auto scale * Revert "auto scale" This reverts commit `42415a4874`. * remove redundant params * Apply suggestions from code review Co-authored-by: slaren <slarengh@gmail.com> * change kv metadata * move add_type to __init__ * convert_hf : move add_type to main() * convert_lora : use the GGUFWriter from Model instead of overwriting it --------- Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Francis Couture-Harpin <git@compilade.net>	2024-07-15 20:50:47 +02:00
Xuan Son Nguyen	4db8f60fe7	fix ci (#8494)	2024-07-15 19:23:10 +02:00
Daniel Bevenius	8fac431b06	ggml : suppress unknown pragma 'GCC' on windows (#8460) This commit adds a macro guard to pragma GCC to avoid the following warning on windows: ```console C:\llama.cpp\ggml\src\ggml-aarch64.c(17,9): warning C4068: unknown pragma 'GCC' [C:\lama.cpp\build\ggml\src\ggml.vcxproj] ```	2024-07-15 15:48:17 +03:00
M-A	f17f39ff9c	server: update README.md with llama-server --help output [no ci] (#8472) The README.md had a stale information. In particular, the --ctx-size "defaults to 512" confused me and I had to check the code to confirm this was false. This the server is evolving rapidly, it's probably better to keep the source of truth at a single place (in the source) and generate the README.md based on that. Did: make llama-server ./llama-server --help > t.txt vimdiff t.txt examples/server/README.md I copied the content inside a backquote block. I would have preferred proper text but it would require a fair amount of surgery to make the current output compatible with markdown. A follow up could be to automate this process with a script. No functional change.	2024-07-15 15:04:56 +03:00
Georgi Gerganov	9104bc20ed	common : add --no-cont-batching arg (#6358)	2024-07-15 14:54:58 +03:00
NikolaiLyssogor	fc690b018e	docs: fix links in development docs [no ci] (#8481) Fixes a few links to within the repo that were broken in the reorganization of the documentation in #8325.	2024-07-15 14:46:39 +03:00
Meng, Hengyu	16bdfa42ac	[SYCL] add concat through dim 1/2 (#8483) * add concat through dim 1/2	2024-07-15 19:32:15 +08:00
Georgi Gerganov	3dfda05956	llama : de-duplicate deepseek2 norm	2024-07-15 14:10:39 +03:00

3442 commits