Commit graph

3435 commits

Author SHA1 Message Date
brian khuu
4c91d077d2 convert-*.py: cast not required if Metadata.load_metadata_override returned a dict[str, Any] instead of a dict[str, object]
Co-authored-by: compilade <git@compilade.net>
2024-07-16 06:42:38 +10:00
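For context, a minimal sketch of the typing nuance behind this commit (the key and values here are hypothetical; the real signature lives in gguf-py's Metadata class):

```python
from typing import Any, cast

# With dict[str, object], pyright requires narrowing (e.g. a cast)
# before any value can be used as a concrete type:
override_obj: dict[str, object] = {"general.name": "MyModel"}
name = cast(str, override_obj["general.name"]).upper()

# With dict[str, Any], the same access type-checks without a cast:
override_any: dict[str, Any] = {"general.name": "MyModel"}
name = override_any["general.name"].upper()
```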
Brian
74383ba6d2 Apply suggestions from code review
Co-authored-by: compilade <git@compilade.net>
2024-07-16 06:42:38 +10:00
brian khuu
dd14b8fdb1 convert-*.py: pyright type fixes 2024-07-16 06:42:38 +10:00
brian khuu
59a01df784 convert-*.py: refactor per model weight count estimation 2024-07-16 06:42:38 +10:00
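A rough sketch of what such an estimation boils down to (hypothetical helper; the real code iterates over the model's actual tensors):

```python
import math

def per_model_weight_count_estimation(tensor_shapes: list[tuple[int, ...]]) -> int:
    # A model's parameter count is the sum over all weight tensors
    # of the product of each tensor's dimensions.
    return sum(math.prod(shape) for shape in tensor_shapes)

# e.g. one 4096x4096 attention matrix plus a 4096-element bias:
print(per_model_weight_count_estimation([(4096, 4096), (4096,)]))  # 16781312
```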
brian khuu
2a976e1211 convert-*.py: write_tensors() --> prepare_tensors_for_writing() 2024-07-16 06:42:38 +10:00
brian khuu
fdc5a3fc80 convert-*.py: autogenerate general.uuid if missing 2024-07-16 06:42:35 +10:00
brian khuu
7ecb8f00a0 test: remove test_gguf.py and remove test_generate_any_missing_uuid() 2024-07-16 06:38:40 +10:00
brian khuu
007708e32d gguf_writer.py: generate tensor uuid if missing 2024-07-16 06:38:40 +10:00
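A hedged sketch of the idea (the namespace constant and helper name are illustrative, not the values gguf-py actually uses):

```python
import uuid

# Illustrative namespace; the real constant in gguf-py may differ.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "llama.cpp")

def generate_tensor_uuid(tensor_bytes: bytes) -> str:
    # UUIDv5 is a SHA-1 hash of namespace + name, so identical tensor
    # data always yields the same identifier.
    return str(uuid.uuid5(NAMESPACE, tensor_bytes.hex()))

print(generate_tensor_uuid(b"\x00\x01\x02\x03"))
```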
brian khuu
4dc8ddd35a convert_hf_to_gguf.py: Remove code that is already in fill_templated_filename() and GGUFWriter() 2024-07-16 06:38:40 +10:00
brian khuu
2f23927d37 convert_hf_to_gguf.py: rebase error correction 2024-07-16 06:38:40 +10:00
brian khuu
5011eefeaf convert_hf_to_gguf.py: Optional and dataclass removed from type hints as they were unused 2024-07-16 06:38:40 +10:00
brian khuu
e9734434bd convert-*.py: Remove self.model_name that was left in since last rebase 2024-07-16 06:38:40 +10:00
brian khuu
eaa47f5546 convert-*.py: separated unit test, hf_repo to repo_url 2024-07-16 06:38:40 +10:00
brian khuu
d060fcdbe2 convert-*.py: adjusted authorship KV store 2024-07-16 06:38:40 +10:00
brian khuu
91e65d9485 convert-*.py: add unittest to metadata class 2024-07-16 06:38:38 +10:00
brian khuu
3625a42061 convert-*.py: add heuristic to directory name fallback
Also add source_url for the Hugging Face URL
2024-07-16 06:37:42 +10:00
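To illustrate what such a heuristic might look like (a hypothetical sketch, not the actual implementation), a Hugging Face-style directory name can be decomposed with a regex:

```python
import re

def parse_model_dir(name: str) -> dict:
    # Split names like "Mixtral-8x7B-Instruct-v0.1" into
    # basename / parameter size / finetune / version components.
    m = re.match(r"^(?P<basename>[A-Za-z0-9]+)"
                 r"(?:-(?P<size>(?:\d+x)?\d+(?:\.\d+)?[KMB]))?"
                 r"(?:-(?P<finetune>[A-Za-z0-9]+))?"
                 r"(?:-(?P<version>v\d+(?:\.\d+)*))?$", name)
    return m.groupdict() if m else {}

print(parse_model_dir("Mixtral-8x7B-Instruct-v0.1"))
# {'basename': 'Mixtral', 'size': '8x7B', 'finetune': 'Instruct', 'version': 'v0.1'}
```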
brian khuu
39472a09da convert-*.py: need to include self in per_model_weight_count_estimation() 2024-07-16 06:37:42 +10:00
brian khuu
54918ad14e convert-*.py: refactor parameter weight class 2024-07-16 06:37:42 +10:00
brian khuu
32e80e094c convert-*.py: base_model is actually in spec for model cards 2024-07-16 06:37:42 +10:00
brian khuu
4d5cd0670a convert-*.py: use heuristics to parse _name_or_path 2024-07-16 06:37:42 +10:00
brian khuu
b0553f42da convert-*.py: adjust help message 2024-07-16 06:37:42 +10:00
brian khuu
dd1571211e convert-*.py: add quantized_by and enhance heuristics 2024-07-16 06:37:38 +10:00
brian khuu
5a86dfaa1c convert-*.py: add general.organization to kv store 2024-07-16 06:36:03 +10:00
brian khuu
f7c20793b9 convert-*.py: enable --model-name direct metadata override 2024-07-16 06:36:03 +10:00
brian khuu
b1927eed82 convert-*.py: move per model weight estimation away from util back to main script
plus some refactoring
2024-07-16 06:36:03 +10:00
brian khuu
684c604eca convert-*.py: add datasets and language to KV store 2024-07-16 06:36:03 +10:00
brian khuu
0f1d50fab7 convert-*.py: add parameter size class 2024-07-16 06:36:03 +10:00
brian khuu
8f734083dd convert-*.py: add base_version and add tags 2024-07-16 06:36:03 +10:00
brian khuu
b36e391b87 convert-*.py: parse model card in metadata util. Add license_link and license_name to kv store 2024-07-16 06:36:03 +10:00
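Hugging Face model cards keep this metadata in a YAML front-matter block at the top of README.md; a minimal parsing sketch (assuming PyYAML; the helper name is hypothetical):

```python
import yaml  # PyYAML

def parse_model_card(readme_text: str) -> dict:
    # Front matter is delimited by "---" lines before the Markdown body.
    if readme_text.startswith("---"):
        _, front_matter, _ = readme_text.split("---", 2)
        return yaml.safe_load(front_matter) or {}
    return {}

card = parse_model_card("---\nlicense: apache-2.0\nlicense_name: apache\n---\n# Model")
print(card["license"], card["license_name"])  # apache-2.0 apache
```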
brian khuu
5c263cb257 convert-*.py: encoding_scheme --> output_type 2024-07-16 06:36:03 +10:00
brian khuu
4d5f18a0e6 convert-*.py: metadata class moved to utility 2024-07-16 06:36:03 +10:00
brian khuu
916872f72f convert-*.py: model card metadata 2024-07-16 06:36:03 +10:00
brian khuu
a42c2b7efc convert-*.py: add basename and finetune metadata 2024-07-16 06:36:03 +10:00
brian khuu
dbb1b471e4 convert-*.py: add --get-outfile command and refactor 2024-07-16 06:36:03 +10:00
brian khuu
d3a936fd0e convert-*.py: licence -> license 2024-07-16 06:36:03 +10:00
Xuan Son Nguyen
97bdd26eee
Refactor lora adapter support (#8332)
* lora: load to device buft

* add patch tensor function

* correct tensor patch

* llama_lora_adapter_apply

* correct ggml_backend_tensor_copy

* add llm_build_mm

* fix auto merge

* update based on review comments

* add convert script

* no more transpose A

* add f16 convert

* add metadata check

* add sanity check

* fix ftype

* add requirements

* fix requirements

* fix outfile

* conversion: only allow selected models

* fix types

* cuda : do not use dmmv if the tensor does not have enough cols

* llama : lora fixes

* do not disable mmap with lora

Co-authored-by: slaren <slarengh@gmail.com>

* llm_build_lora_mm_id

* convert_lora : MoE LoRA conversion support

* convert_lora : prefer safetensors, similarly to convert_hf

* convert_hf : simplify modify_tensors for InternLM2

* convert_lora : lazy conversion

* llama : load and use alpha from LoRA adapters

* llama : use llm_build_lora_mm in most model graphs

* auto scale

* Revert "auto scale"

This reverts commit 42415a4874.

* remove redundant params

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* change kv metadata

* move add_type to __init__

* convert_hf : move add_type to main()

* convert_lora : use the GGUFWriter from Model instead of overwriting it

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-07-15 20:50:47 +02:00
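For reference, the math that llm_build_lora_mm wraps: a LoRA adapter adds a scaled low-rank product on top of the frozen weight. A numpy sketch with illustrative shapes (the real code builds ggml graph ops, not numpy):

```python
import numpy as np

# y = W @ x + scale * (B @ (A @ x)), with scale = alpha / rank
d_out, d_in, rank, alpha = 64, 64, 8, 16.0
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))  # frozen base weight
A = rng.normal(size=(rank, d_in))   # low-rank down-projection
B = rng.normal(size=(d_out, rank))  # low-rank up-projection
x = rng.normal(size=(d_in,))

scale = alpha / rank
y = W @ x + scale * (B @ (A @ x))   # adapter applied at matmul time
print(y.shape)  # (64,)
```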
Xuan Son Nguyen
4db8f60fe7
fix ci (#8494) 2024-07-15 19:23:10 +02:00
Daniel Bevenius
8fac431b06
ggml : suppress unknown pragma 'GCC' on windows (#8460)
This commit adds a macro guard to pragma GCC to avoid the following
warning on windows:

```console
C:\llama.cpp\ggml\src\ggml-aarch64.c(17,9): warning C4068:
unknown pragma 'GCC' [C:\llama.cpp\build\ggml\src\ggml.vcxproj]
```
2024-07-15 15:48:17 +03:00
M-A
f17f39ff9c
server: update README.md with llama-server --help output [no ci] (#8472)
The README.md had stale information. In particular, the --ctx-size
"defaults to 512" claim confused me and I had to check the code to confirm
it was false. Since the server is evolving rapidly, it's probably
better to keep the source of truth in a single place (the source) and
generate the README.md based on that.

Did:

    make llama-server
    ./llama-server --help > t.txt
    vimdiff t.txt examples/server/README.md

I copied the content inside a backquote block. I would have preferred
proper text but it would require a fair amount of surgery to make the
current output compatible with markdown. A follow-up could be to
automate this process with a script, as in the sketch below.

No functional change.
2024-07-15 15:04:56 +03:00
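The suggested follow-up could start as small as this (a hypothetical sketch; splicing the output into examples/server/README.md automatically would still need agreed-on markers):

```python
import subprocess

# Capture the current --help output, mirroring the manual workflow above;
# wrapping it in a Markdown code fence is left to the splicing step.
help_text = subprocess.run(
    ["./llama-server", "--help"], capture_output=True, text=True, check=True
).stdout
with open("t.txt", "w", encoding="utf-8") as f:
    f.write(help_text)
```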
Georgi Gerganov
9104bc20ed
common : add --no-cont-batching arg (#6358) 2024-07-15 14:54:58 +03:00
NikolaiLyssogor
fc690b018e
docs: fix links in development docs [no ci] (#8481)
Fixes a few links within the repo that were broken in the reorganization of the
documentation in #8325.
2024-07-15 14:46:39 +03:00
Meng, Hengyu
16bdfa42ac
[SYCL] add concat through dim 1/2 (#8483)
* add concat through dim 1/2
2024-07-15 19:32:15 +08:00
Georgi Gerganov
3dfda05956
llama : de-duplicate deepseek2 norm 2024-07-15 14:10:39 +03:00
0cc4m
bda62d7999
Vulkan MMQ Fix (#8479)
* Fix incoherence by adding missing LOAD_VEC_A parameter

* Fix Vulkan op result checker build error
2024-07-15 09:38:52 +02:00
compilade
090fca7a07
pydantic : replace uses of __annotations__ with get_type_hints (#8474)
* pydantic : replace uses of __annotations__ with get_type_hints

* pydantic : fix Python 3.9 and 3.10 support
2024-07-14 19:51:21 -04:00
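The difference matters because __annotations__ neither resolves string annotations (mandatory under `from __future__ import annotations`) nor walks base classes; a small demonstration:

```python
from __future__ import annotations  # all annotations become plain strings

from typing import get_type_hints

class Base:
    x: int

class Child(Base):
    y: str

print(Child.__annotations__)  # {'y': 'str'} -- unresolved, inherited 'x' missing
print(get_type_hints(Child))  # {'x': <class 'int'>, 'y': <class 'str'>}
```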
Georgi Gerganov
aaab2419ea
flake.lock: Update (#8475)
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/9f4128e00b0ae8ec65918efeba59db998750ead6?narHash=sha256-rwz8NJZV%2B387rnWpTYcXaRNvzUSnnF9aHONoJIYmiUQ%3D' (2024-07-03)
  → 'github:NixOS/nixpkgs/7e7c39ea35c5cdd002cd4588b03a3fb9ece6fad9?narHash=sha256-EYekUHJE2gxeo2pM/zM9Wlqw1Uw2XTJXOSAO79ksc4Y%3D' (2024-07-12)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-07-14 08:54:02 -07:00
Georgi Gerganov
73cf442e7b
llama : fix Gemma-2 Query scaling factors (#8473)
* 9B - query_pre_attn_scalar = 256 not 224

See 03e657582d

Gemma 9B should use 256, not 224 (the value given by self.config.hidden_size // self.config.num_attention_heads)

* llama : fix Gemma-2 Query scaling factor

ggml-ci

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-07-14 14:05:09 +03:00
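The arithmetic behind the fix, assuming the Gemma-2 9B config values hidden_size = 3584 and 16 attention heads (the commit itself confirms the 224-vs-256 mismatch):

```python
import math

hidden_size = 3584          # Gemma-2 9B (assumed config value)
num_attention_heads = 16
head_dim = 256              # set explicitly in the config, not derived

print(hidden_size // num_attention_heads)  # 224 -- the buggy derivation
print(1 / math.sqrt(head_dim))             # 0.0625 -- correct query scaling
```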
Brian
e236528e76
gguf_hash.py: Add sha256 (#8470)
* gguf_hash.py: Add sha256

* gguf_hash.py: rename string UUIDv5 --> uuid

* Apply suggestions from code review

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
2024-07-14 16:47:14 +10:00
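A minimal sketch of what a whole-model sha256 amounts to (the actual gguf_hash.py layout and per-tensor reporting are not shown; the helper name is hypothetical):

```python
import hashlib

def sha256_all_tensors(tensor_blobs: list[bytes]) -> str:
    # One running digest over each tensor's raw bytes, in order,
    # yields a single fingerprint for the full set of weights.
    h = hashlib.sha256()
    for blob in tensor_blobs:
        h.update(blob)
    return h.hexdigest()

print(sha256_all_tensors([b"layer0", b"layer1"]))
```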
compilade
fa79495bb4
llama : fix pre-tokenization of non-special added tokens (#8228)
* llama : fix mpt and olmo pre-tokenizer

* llama : pre-tokenize non-special user-defined tokens first

* llama : fix detection of control-like user-defined tokens

* convert_hf : identify which user-defined tokens are control tokens

Only used in _set_vocab_gpt2() for now.

* convert_hf : identify more added control tokens for SPM tokenizers

This makes Gemma and Gemma-2 tokenize pretty much EVERYTHING correctly,
including HTML tags and consecutive spaces,
but it unfortunately requires model re-conversion.

There seems to be a weird behavior of the HF tokenizer for Gemma,
which prefers to use the 16-space token over more lengthy space tokens,
while using the SentencePiece tokenizer does not do this.
(the implementation in llama.cpp has the same behavior as SentencePiece)

* llama : fix wrong pre-tokenization of byte tokens

* llama : fix Viking pre-tokenizer regex

The order was previously wrong, which caused errors in some tests.

* llama : fix command-r detokenization

* convert_hf : reduce usages of the UNKNOWN token type

* llama : add UNKNOWN tokens in the special tokens cache

* convert_hf : reduce usages of UNKNOWN for InternLM2

This makes the changes from #8321 more consistent
with the other changes made here.

* test-tokenizer-random : reduce potential conflicts with #8379

* test-tokenizer-random : add a failing edge case for falcon
2024-07-13 23:35:10 -04:00
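The core idea of pre-tokenizing user-defined tokens first, as a hypothetical sketch (the real implementation works on the vocabulary's token attributes, and the whitespace split below stands in for the actual regex pre-tokenizer):

```python
import re

added_tokens = ["<|im_start|>", "<|im_end|>"]  # hypothetical user-defined tokens
pattern = "|".join(re.escape(t) for t in added_tokens)

def pre_tokenize(text: str) -> list[str]:
    pieces = []
    # Split on added tokens first so they stay atomic, then pre-tokenize
    # only the plain-text fragments in between.
    for fragment in re.split(f"({pattern})", text):
        if fragment in added_tokens:
            pieces.append(fragment)
        elif fragment:
            pieces.extend(fragment.split())
    return pieces

print(pre_tokenize("<|im_start|>user hello<|im_end|>"))
# ['<|im_start|>', 'user', 'hello', '<|im_end|>']
```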
bandoti
17eb6aa8a9
vulkan : cmake integration (#8119)
* Add Vulkan to CMake pkg

* Add Sycl to CMake pkg

* Add OpenMP to CMake pkg

* Split generated shader file into separate translation unit

* Add CMake target for Vulkan shaders

* Update README.md

* Add make target for Vulkan shaders

* Use pkg-config to locate vulkan library

* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow

* Clean up tabs

* Move sudo to apt-key invocation

* Forward GGML_EXTRA_LIBS to CMake config pkg

* Update vulkan obj file paths

* Add shaderc to nix pkg

* Add python3 to Vulkan nix build

* Link against ggml in cmake pkg

* Remove Python dependency from Vulkan build

* code review changes

* Remove trailing newline

* Add cflags from pkg-config to fix w64devkit build

* Update README.md

* Remove trailing whitespace

* Update README.md

* Remove trailing whitespace

* Fix doc heading

* Make glslc required Vulkan component

* remove clblast from nix pkg
2024-07-13 18:12:39 +02:00