goerch 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ff5a3f0c09 
								
							 
						 
						
							
							
								
								Work on the BPE tokenizer ( #3252 )  
							
							
							
							
							* Work on the BPE tokenizer
Tokenizer tests work for Falcon-7B
* Try to fix build problem
* Fix debug assertion failure
* Fix MSVC Unicode BOM problem
* Cleanup and an improvement
* Fix compiler warning
* Cleanup
* Test doesn't work over the full range of Unicodes
* Update .gitignore and Makefile
* Another Makefile rule
* Testing Aquila
* Moving byte decoding back to `token_to_piece` ...
... because everyone is using it.
* Guarding some unusable code paths
* Streamlining code and adding some more assertions
Important change: I'm classifying added tokens as control tokens now for BPE.
* Adding a comment
* Adding another assertion
* Fixed vocabulary guarding assertions
* Fix PR for recent change
* Fix PR for recent change
* Fix for compiler warning
* Fix PR for recent change
* Fix PR for recent change
* Fix PR for recent change
* Fix for compiler warning
* Fixes for more compiler warnings
* Remove unused code
* Fix initialization of static maps
* Add scores and token types back, adapt gptneox
* Update llama.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update unicode.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update unicode.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Ported Starcoder and added some assertions
* Fix coding style
* Apply @jploski 's fix for missing tokens
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2023-10-03 09:16:26 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									cebtenzzre 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0fe321031a 
								
							 
						 
						
							
							
								
								gguf : general usability improvements ( #3409 )  
							
							
							
						 
						
							2023-10-02 14:58:46 -04:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Zhang Peiyuan 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e519621010 
								
							 
						 
						
							
							
								
								convert : remove bug in convert.py permute function ( #3364 )  
							
							
							
						 
						
							2023-09-27 20:45:20 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Erik Scholz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6eeb4d9083 
								
							 
						 
						
							
							
								
								convert: remove most of the n_mult usage in convert.py ( #3098 )  
							
							
							
						 
						
							2023-09-10 11:06:53 -04:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Cebtenzzre 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6336d834ec 
								
							 
						 
						
							
							
								
								convert : fix F32 ftype not being saved ( #3048 )  
							
							
							
						 
						
							2023-09-07 14:27:42 -04:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Erik Scholz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c9c3220c48 
								
							 
						 
						
							
							
								
								convert: fix convert.py not working with int filename_stem ( #3028 )  
							
							
							
							
							* fix implicit int to string conversion
* convert : remove an obsolete pyright comment
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com> 
							
						 
						
							2023-09-05 19:41:00 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Kerfuffle 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								cff7b0bf07 
								
							 
						 
						
							
							
								
								convert.py : BPE fixes ( #2938 )  
							
							
							
							
							* convert.py: BPE fixes?
* Remove unnecessary conditional in addl token error handling 
							
						 
						
							2023-09-03 08:52:13 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Cebtenzzre 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								bce1fef328 
								
							 
						 
						
							
							
								
								convert : fix another python 3.8 issue ( #2949 )  
							
							
							
						 
						
							2023-08-31 22:13:51 -04:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Kerfuffle 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								aeefac4ff7 
								
							 
						 
						
							
							
								
								scripts: Use local gguf package when running from repo ( #2927 )  
							
							
							
							
							* scripts: Use local gguf when running from repo 
							
						 
						
							2023-08-31 16:49:24 -06:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Cebtenzzre 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								92d0b751a7 
								
							 
						 
						
							
							
								
								convert : fix python 3.8 support, modernize type annotations ( #2916 )  
							
							
							
							
							* convert : fix python 3.8 support
* convert : sort imports
* convert : fix required parameters in convert-llama-ggmlv3-to-gguf
* convert : fix mypy errors in convert-llama-ggmlv3-to-gguf
* convert : use PEP 585 generics and PEP 604 unions
Now that we have `from __future__ import annotations`, we can use this
modern syntax in Python 3.7 instead of restricting support to Python 3.9
or 3.10 respectively.
* gguf.py : a tuple is already a tuple
* add mypy.ini
* convert : add necessary `type: ignore` comments
* gguf-py: bump version 
							
						 
						
							2023-08-31 08:02:23 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b532a69b2f 
								
							 
						 
						
							
							
								
								convert.py : use dir name to name the llama  
							
							
							
						 
						
							2023-08-30 13:29:40 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Kerfuffle 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								dc07dc492e 
								
							 
						 
						
							
							
								
								convert : various script cleanups/fixes + merges and special token handling ( #2842 )  
							
							
							
							
							* convert: Fix permute calls and method/func definitions
* Cleanups for gguf-py
* Minor types cleanups.
* Initial implementation of handling merges and special tokens
* convert: Handle special tokens and merges in vocab only mode
convert: Vocab only mode no longer requires loading model tensors
* gguf: Refactor tensor name mapping
* convert: Fix type hint for special_token_types in SpecialVocab
* Use common special vocab handling in various conversion scripts
* First pass at implementing suggested changes
* Second pass
* gguf: SpecialVocab: Fix issue with special token content not in a dict
gguf: SpecialVocab: Allow skipping handling of merges
* convert-falcon-hf-to-gguf: Support --vocab-only option, bail out if no tokenizer.json
* convert-gptneox-hf-to-gguf and convert: Only handle merges for BPE tokenizer
* gguf: SpecialVocab: Actually set load_merges in object
* Uniform args parsing and vocab only mode for convert examples
* convert.py: Set gpt2 as tokenizer model when using BPE
* Squish last type warning in gguf.py - yay! 
							
						 
						
							2023-08-30 11:25:50 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									jameswu2014 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								bcce96ba4d 
								
							 
						 
						
							
							
								
								convert.py : fix baichuan7B support ( #2870 )  
							
							
							
							
							* [Fix]: convert.py support baichuan7B
* convert.py : fix trailing whitespaces
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2023-08-29 12:48:41 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Kerfuffle 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								730d9c681e 
								
							 
						 
						
							
							
								
								convert.py : advanced option ( #2753 )  
							
							
							
							
							* Allow convert.py to convert to q8_0
Fix issue with bounded_parallel_map and greedy consuming iterator
Display elapsed time during conversion
* Add --concurrency option
Minor improvements to help text
Clean up bounded_parallel_map function a bit
* Massive speed improvement thanks to Cebtenzzre
* Refactor types 
							
						 
						
							2023-08-26 23:13:36 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Nigel Bosch 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a2ca4e9de9 
								
							 
						 
						
							
							
								
								Handle null rope scaling value ( #2793 )  
							
							
							
						 
						
							2023-08-26 14:11:17 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Nigel Bosch 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								28b2c996ca 
								
							 
						 
						
							
							
								
								convert.py : Get rope scale from HuggingFace models ( #2772 )  
							
							
							
							
							* Get rope scale from HF models
* Save rope scale only for linear scaling
* Rewrite for clarity 
							
						 
						
							2023-08-25 16:41:52 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								12e2e33a97 
								
							 
						 
						
							
							
								
								convert.py : export rope freq_base when converting CodeLlama from an HF model ( #2773 )  
							
							
							
						 
						
							2023-08-25 14:08:53 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d0f77b1353 
								
							 
						 
						
							
							
								
								convert.py : try to determine n_ctx automatically for CodeLlama ( #2770 )  
							
							
							
						 
						
							2023-08-24 21:10:39 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0d3094f0c7 
								
							 
						 
						
							
							
								
								gguf : add rope_freq_base parameter for CodeLlama ( #2769 )  
							
							
							
						 
						
							2023-08-24 21:04:05 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8f8c28e89c 
								
							 
						 
						
							
							
								
								convert : auto-determine model name based on dir + scripts update  
							
							
							
						 
						
							2023-08-24 19:26:47 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								fea95c682d 
								
							 
						 
						
							
							
								
								fix convert.py for codellama, add llama 34B to the list of recognized models ( #2768 )  
							
							
							
						 
						
							2023-08-24 17:44:11 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								cf658adc83 
								
							 
						 
						
							
							
								
								llm : add Falcon support ( #2717 )  
							
							
							
							
							* llama : refactor GGUF constants into static maps
* llama : check if model architecture is known
* llama : refactor llama_model_load_internal()
* gguf : add KV constant maps
* llm : read arch-specific KVs
* convert : add dummy scores + types
* falcon : load tensor data (CPU only)
* llama : fix loading progress bar
* llama : add arch member to llama_model
* falcon : CPU inference working
* falcon : support non-40B models
* falcon : minor
* llama : minor updates
ggml-ci
* convert-falcon-hf-to-gguf.py : fix special token mapping
* llama.cpp : llama default UNK token = id 0
* llama.cpp : fix bpe tokenizer
* llama.cpp : fix the fix of bpe tokenizer
* ggml : pass eps to ggml_norm
* metal : implement RoPE (mode = 2) + avoid ggml_repeat
* ggml : ggml_repeat always creates new tensor
* falcon : copy-paste self-attention from LLaMA
* metal : print extra compute pipeline info
* falcon : minor changes (still chasing the Metal problem)
* llama.cpp : fix linefeed token
* metal : fix GELU kernel numerical stability by using precise::tanh
* metal : temporary workaround for the concurrency optimization bug
* falcon : add CUDA offloading (#2739 )
* llama : better model naming and size reporting
* llama : prep new tokenizer support
* llama : advanced BPE tokenizer based on ggllm.cpp implementation
* llama : remove obsolete comment
ggml-ci
* common : remove obsolete BPE API + disable test-tokenizer-1
* llama : revert BPE special-case in llama_byte_to_token()
* cuda : add TODOs for RoPE NeoX implementation
* llama : default special tokens based on vocab type
* perplexity : add log for start of tokenization
---------
Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>
Co-authored-by: slaren <slarengh@gmail.com> 
							
						 
						
							2023-08-23 23:08:04 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Cebtenzzre 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7c2227a197 
								
							 
						 
						
							
							
								
								chmod : make scripts executable ( #2675 )  
							
							
							
						 
						
							2023-08-23 17:29:09 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Alex Petenchea 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3b6cfe7c92 
								
							 
						 
						
							
							
								
								convert.py : clarifying error message ( #2718 )  
							
							
							
						 
						
							2023-08-22 21:58:16 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								deb7dfca4b 
								
							 
						 
						
							
							
								
								gguf : add ftype meta info to the model ( #2710 )  
							
							
							
							
							* llama : add ftype meta info to the model
ggml-ci
* convert.py : add ftype when converting (does not work)
* convert.py : fix Enum to IntEnum
ggml-ci 
							
						 
						
							2023-08-22 20:05:59 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6381d4e110 
								
							 
						 
						
							
							
								
								gguf : new file format with flexible meta data (beta) ( #2398 )  
							
							
							
							
							* gguf : first API pass
* gguf : read header + meta data
* gguf : read tensor info
* gguf : initial model loading - not tested
* gguf : add gguf_get_tensor_name()
* gguf : do not support passing existing ggml_context to gguf_init
* gguf : simplify gguf_get_val
* gguf : gguf.c is now part of ggml.c
* gguf : read / write sample models
* gguf : add comments
* refactor : reduce code duplication and better API (#2415 )
* gguf : expose the gguf_type enum through the API for now
* gguf : add array support
* gguf.py : some code style changes
* convert.py : start a new simplified implementation by removing old stuff
* convert.py : remove GGML vocab + other obsolete stuff
* GGUF : write tensor (#2426 )
* WIP: Write tensor
* GGUF : Support writing tensors in Python
* refactor : rm unused import and upd todos
* fix : fix errors upd writing example
* rm example.gguf
* gitignore *.gguf
* undo formatting
* gguf : add gguf_find_key (#2438 )
* gguf.cpp : find key example
* ggml.h : add gguf_find_key
* ggml.c : add gguf_find_key
* gguf : fix writing tensors
* gguf : do not hardcode tensor names to read
* gguf : write sample tensors to read
* gguf : add tokenization constants
* quick and dirty conversion example
* gguf : fix writing gguf arrays
* gguf : write tensors one by one and code reuse
* gguf : fix writing gguf arrays
* gguf : write tensors one by one
* gguf : write tensors one by one
* gguf : write tokenizer data
* gguf : upd gguf conversion script
* Update convert-llama-h5-to-gguf.py
* gguf : handle already encoded string
* ggml.h : get array str and f32
* ggml.c : get arr str and f32
* gguf.py : support any type
* Update convert-llama-h5-to-gguf.py
* gguf : fix set is not subscriptable
* gguf : update convert-llama-h5-to-gguf.py
* constants.py : add layer norm eps
* gguf.py : add layer norm eps and merges
* ggml.h : increase GGML_MAX_NAME to 64
* ggml.c : add gguf_get_arr_n
* Update convert-llama-h5-to-gguf.py
* add gptneox gguf example
* Makefile : add gptneox gguf example
* Update convert-llama-h5-to-gguf.py
* add gptneox gguf example
* Update convert-llama-h5-to-gguf.py
* Update convert-gptneox-h5-to-gguf.py
* Update convert-gptneox-h5-to-gguf.py
* Update convert-llama-h5-to-gguf.py
* gguf : support custom alignment value
* gguf : fix typo in function call
* gguf : mmap tensor data example
* fix : update convert-llama-h5-to-gguf.py
* Update convert-llama-h5-to-gguf.py
* convert-gptneox-h5-to-gguf.py : Special tokens
* gptneox-main.cpp : special tokens
* Update gptneox-main.cpp
* constants.py : special tokens
* gguf.py : accumulate kv and tensor info data + special tokens
* convert-gptneox-h5-to-gguf.py : accumulate kv and ti + special tokens
* gguf : gguf counterpart of llama-util.h
* gguf-util.h : update note
* convert-llama-h5-to-gguf.py : accumulate kv / ti + special tokens
* convert-llama-h5-to-gguf.py : special tokens
* Delete gptneox-common.cpp
* Delete gptneox-common.h
* convert-gptneox-h5-to-gguf.py : gpt2bpe tokenizer
* gptneox-main.cpp : gpt2 bpe tokenizer
* gpt2 bpe tokenizer (handles merges and unicode)
* Makefile : remove gptneox-common
* gguf.py : bytesarray for gpt2bpe tokenizer
* cmpnct_gpt2bpe.hpp : comments
* gguf.py : use custom alignment if present
* gguf : minor stuff
* Update gptneox-main.cpp
* map tensor names
* convert-gptneox-h5-to-gguf.py : map tensor names
* convert-llama-h5-to-gguf.py : map tensor names
* gptneox-main.cpp : map tensor names
* gguf : start implementing libllama in GGUF (WIP)
* gguf : start implementing libllama in GGUF (WIP)
* rm binary committed by mistake
* upd .gitignore
* gguf : calculate n_mult
* gguf :  inference with 7B model working (WIP)
* gguf : rm deprecated function
* gguf : start implementing gguf_file_saver (WIP)
* gguf : start implementing gguf_file_saver (WIP)
* gguf : start implementing gguf_file_saver (WIP)
* gguf : add gguf_get_kv_type
* gguf : add gguf_get_kv_type
* gguf : write metadata in gguf_file_saver (WIP)
* gguf : write metadata in gguf_file_saver (WIP)
* gguf : write metadata in gguf_file_saver
* gguf : rm references to old file formats
* gguf : shorter name for member variable
* gguf : rm redundant method
* gguf : get rid of n_mult, read n_ff from file
* Update gguf_tensor_map.py
* Update gptneox-main.cpp
* gguf : rm references to old file magics
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : quantization is working
* gguf : proper closing of file
* gguf.py : no need to convert tensors twice
* convert-gptneox-h5-to-gguf.py : no need to convert tensors twice
* convert-llama-h5-to-gguf.py : no need to convert tensors twice
* convert-gptneox-h5-to-gguf.py : simplify nbytes
* convert-llama-h5-to-gguf.py : simplify nbytes
* gptneox-main.cpp : n_layer --> n_block
* constants.py : n_layer --> n_block
* gguf.py : n_layer --> n_block
* convert-gptneox-h5-to-gguf.py : n_layer --> n_block
* convert-llama-h5-to-gguf.py : n_layer --> n_block
* gptneox-main.cpp : n_layer --> n_block
* Update gguf_tensor_map.py
* convert-gptneox-h5-to-gguf.py : load model in parts to save memory
* convert-llama-h5-to-gguf.py : load model in parts to save memory
* convert : write more metadata for LLaMA
* convert : rm quantization version
* convert-gptneox-h5-to-gguf.py : add file_type key
* gptneox-main.cpp : add file_type key
* fix conflicts
* gguf : add todos and comments
* convert-gptneox-h5-to-gguf.py : tensor name map changes
* Create gguf_namemap.py : tensor name map changes
* Delete gguf_tensor_map.py
* gptneox-main.cpp : tensor name map changes
* convert-llama-h5-to-gguf.py : fixes
* gguf.py : dont add empty strings
* simple : minor style changes
* gguf : use UNIX line ending
* Create convert-llama-7b-pth-to-gguf.py
* llama : sync gguf-llama.cpp with latest llama.cpp (#2608 )
* llama : sync gguf-llama.cpp with latest llama.cpp
* minor : indentation + assert
* llama : refactor gguf_buffer and gguf_ctx_buffer
* llama : minor
* gitignore : add gptneox-main
* llama : tokenizer fixes (#2549 )
* Merge tokenizer fixes into the gguf branch.
* Add test vocabularies
* convert : update convert-new.py with tokenizer fixes (#2614 )
* Merge tokenizer fixes into the gguf branch.
* Add test vocabularies
* Adapt convert-new.py (and fix a clang-cl compiler error on windows)
* llama : sync gguf-llama with llama (#2613 )
* llama : sync gguf-llama with llama
* tests : fix build + warnings (test-tokenizer-1 still fails)
* tests : fix wstring_convert
* convert : fix layer names
* llama : sync gguf-llama.cpp
* convert : update HF converter to new tokenizer voodoo magics
* llama : update tokenizer style
* convert-llama-h5-to-gguf.py : add token types
* constants.py : add token types
* gguf.py : add token types
* convert-llama-7b-pth-to-gguf.py : add token types
* gguf-llama.cpp :  fix n_head_kv
* convert-llama-h5-to-gguf.py : add 70b gqa support
* gguf.py : add tensor data layout
* convert-llama-h5-to-gguf.py : add tensor data layout
* convert-llama-7b-pth-to-gguf.py : add tensor data layout
* gptneox-main.cpp : add tensor data layout
* convert-llama-h5-to-gguf.py : clarify the reverse permute
* llama : refactor model loading code (#2620 )
* llama : style formatting + remove helper methods
* llama : fix quantization using gguf tool
* llama : simplify gguf_file_saver
* llama : fix method names
* llama : simplify write_header()
* llama : no need to pass full file loader to the file saver
just gguf_ctx
* llama : gguf_file_saver write I32
* llama : refactor tensor names (#2622 )
* gguf: update tensor names searched in quantization
* gguf : define tensor names as constants
* gguf : initial write API (not tested yet)
* gguf : write to file API (not tested)
* gguf : initial write API ready + example
* gguf : fix header write
* gguf : fixes + simplify example + add ggml_nbytes_pad()
* gguf : minor
* llama : replace gguf_file_saver with new gguf write API
* gguf : streaming support when writing files
* gguf : remove obsolete write methods
* gguf : remove obsolete gguf_get_arr_xxx API
* llama : simplify gguf_file_loader
* llama : move hparams and vocab from gguf_file_loader to llama_model_loader
* llama : merge gguf-util.h in llama.cpp
* llama : reorder definitions in .cpp to match .h
* llama : minor simplifications
* llama : refactor llama_model_loader (WIP)
wip : remove ggml_ctx from llama_model_loader
wip : merge gguf_file_loader in llama_model_loader
* llama : fix shape prints
* llama : fix Windows build + fix norm_rms_eps key
* llama : throw error on missing KV pairs in model meta data
* llama : improve printing + log meta data
* llama : switch print order of meta data
---------
Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
* gguf : deduplicate (#2629 )
* gguf : better type names
* dedup : CPU + Metal is working
* ggml : fix warnings about unused results
* llama.cpp : fix line feed and compiler warning
* llama : fix strncpy warning + note token_to_str does not write null
* llama : restore the original load/save session implementation
Will migrate this to GGUF in the future
* convert-llama-h5-to-gguf.py : support alt ctx param name
* ggml : assert when using ggml_mul with non-F32 src1
* examples : dedup simple
---------
Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>
* gguf.py : merge all files in gguf.py
* convert-new.py : pick #2427  for HF 70B support
* examples/gguf : no need to keep q option for quantization any more
* llama.cpp : print actual model size
* llama.cpp : use ggml_elements()
* convert-new.py : output gguf (#2635 )
* convert-new.py : output gguf (WIP)
* convert-new.py : add gguf key-value pairs
* llama : add hparams.ctx_train + no longer print ftype
* convert-new.py : minor fixes
* convert-new.py : vocab-only option should work now
* llama : fix tokenizer to use llama_char_to_byte
* tests : add new ggml-vocab-llama.gguf
* convert-new.py : tensor name mapping
* convert-new.py : add map for skipping tensor serialization
* convert-new.py : convert script now works
* gguf.py : pick some of the refactoring from #2644 
* convert-new.py : minor fixes
* convert.py : update to support GGUF output
* Revert "ci : disable CI temporary to not waste energy"
This reverts commit 7e82d25f40#2644 )
* gguf : single pass for writing tensors + refactoring writer
* gguf : single pass for writing tensors + refactoring writer
* gguf : single pass for writing tensors + refactoring writer
* gguf : style fixes in simple conversion script
* gguf : refactor gptneox conversion script
* gguf : rename h5 to hf (for HuggingFace)
* gguf : refactor pth to gguf conversion script
* gguf : rm file_type key and method
* gguf.py : fix vertical alignment
* gguf.py : indentation
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* convert-gptneox-hf-to-gguf.py : fixes
* gguf.py : gptneox mapping
* convert-llama-hf-to-gguf.py : fixes
* convert-llama-7b-pth-to-gguf.py : fixes
* ggml.h : reverse GGUF_MAGIC
* gguf.py : reverse GGUF_MAGIC
* test-tokenizer-0.cpp : fix warning
* llama.cpp : print kv general.name
* llama.cpp : get special token kv and linefeed token id
* llama : print number of tensors per type + print arch + style
* tests : update vocab file with new magic
* editorconfig : fix whitespaces
* llama : re-order functions
* llama : remove C++ API + reorganize common source in /common dir
* llama : minor API updates
* llama : avoid hardcoded special tokens
* llama : fix MPI build
ggml-ci
* llama : introduce enum llama_vocab_type + remove hardcoded string constants
* convert-falcon-hf-to-gguf.py : falcon HF --> gguf conversion, not tested
* falcon-main.cpp : falcon inference example
* convert-falcon-hf-to-gguf.py : remove extra kv
* convert-gptneox-hf-to-gguf.py : remove extra kv
* convert-llama-7b-pth-to-gguf.py : remove extra kv
* convert-llama-hf-to-gguf.py : remove extra kv
* gguf.py : fix for falcon 40b
* falcon-main.cpp : fix for falcon 40b
* convert-falcon-hf-to-gguf.py : update ref
* convert-falcon-hf-to-gguf.py : add tensor data layout
* cmpnct_gpt2bpe.hpp : fixes
* falcon-main.cpp : fixes
* gptneox-main.cpp : fixes
* cmpnct_gpt2bpe.hpp : remove non-general stuff
* Update examples/server/README.md
Co-authored-by: slaren <slarengh@gmail.com>
* cmpnct_gpt2bpe.hpp : cleanup
* convert-llama-hf-to-gguf.py : special tokens
* convert-llama-7b-pth-to-gguf.py : special tokens
* convert-permute-debug.py : permute debug print
* convert-permute-debug-master.py : permute debug for master
* convert-permute-debug.py : change permute type of attn_q
* convert.py : 70b model working (change attn_q permute)
* Delete convert-permute-debug-master.py
* Delete convert-permute-debug.py
* convert-llama-hf-to-gguf.py : fix attn_q permute
* gguf.py : fix rope scale kv
* convert-llama-hf-to-gguf.py : rope scale and added tokens
* convert-llama-7b-pth-to-gguf.py : rope scale and added tokens
* llama.cpp : use rope scale kv
* convert-llama-7b-pth-to-gguf.py : rope scale fix
* convert-llama-hf-to-gguf.py : rope scale fix
* py : fix whitespace
* gguf : add Python script to convert GGMLv3 LLaMA models to GGUF (#2682 )
* First pass at converting GGMLv3 LLaMA models to GGUF
* Cleanups, better output during conversion
* Fix vocab space conversion logic
* More vocab conversion fixes
* Add description to converted GGUF files
* Improve help text, expand warning
* Allow specifying name and description for output GGUF
* Allow overriding vocab and hyperparams from original model metadata
* Use correct params override var name
* Fix wrong type size for Q8_K
Better handling of original style metadata
* Set default value for gguf add_tensor raw_shape KW arg
* llama : improve token type support (#2668 )
* Merge tokenizer fixes into the gguf branch.
* Add test vocabularies
* Adapt convert-new.py (and fix a clang-cl compiler error on windows)
* Improved tokenizer test
But does it work on MacOS?
* Improve token type support
- Added @klosax code to convert.py
- Improved token type support in vocabulary
* Exclude platform dependent tests
* More sentencepiece compatibility by eliminating magic numbers
* Restored accidentally removed comment
* llama : add API for token type
ggml-ci
* tests : use new tokenizer type API (#2692)
* Merge tokenizer fixes into the gguf branch.
* Add test vocabularies
* Adapt convert-new.py (and fix a clang-cl compiler error on windows)
* Improved tokenizer test
But does it work on MacOS?
* Improve token type support
- Added @klosax code to convert.py
- Improved token type support in vocabulary
* Exclude platform dependent tests
* More sentencepiece compatibility by eliminating magic numbers
* Restored accidentally removed comment
* Improve commentary
* Use token type API in test-tokenizer-1.cpp
* py : cosmetics
* readme : add notice about new file format
ggml-ci
---------
Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>
Co-authored-by: goerch <jhr.walter@t-online.de>
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com> 
							
						 
						
							2023-08-21 23:07:43 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Keiichi Tabata 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2e8265ae17 
								
							 
						 
						
							
							
								
								convert.py : add missing abstract methods for quantized data ( #2491 )  
							
							
							
						 
						
							2023-08-06 09:34:05 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									mj-shifu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7c529cede6 
								
							 
						 
						
							
							
								
								convert.py : Update to support 70B HF format model files ( #2427 )  
							
							
							
							
							* convert.py : fix llama 2 70b conversion from Huggingface 
							
						 
						
							2023-07-27 14:39:17 -06:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ldwang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								fce48caf9a 
								
							 
						 
						
							
							
								
								convert.py : support bpe tokenizer ( #2228 )  
							
							
							
							
							* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <ftgreat@gmail.com>
---------
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com> 
							
						 
						
							2023-07-25 16:22:09 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e76d630df1 
								
							 
						 
						
							
							
								
								llama : grouped-query attention + LLaMAv2 70B support ( #2276 )  
							
							
							
							
							* CUDA: GQA implementation
* llama : support for GQA and LLaMAv2 70B
ggml-ci
* py : fix hparams parsing (if-else blocks)
ggml-ci
* py : oh boy ..
ggml-ci
* help : fix gqa value for 70B
ggml-ci
---------
Co-authored-by: JohannesGaessler <johannesg@5d6.de> 
							
						 
						
							2023-07-23 15:09:47 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									wzy 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b1f4290953 
								
							 
						 
						
							
							
								
								cmake : install targets ( #2256 )  
							
							
							
							
							fix  #2252  
						
							2023-07-19 10:01:11 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Aarni Koskela 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3e08ae99ce 
								
							 
						 
						
							
							
								
								convert.py: add mapping for safetensors bf16 ( #1598 )  
							
							
							
							
							Fixes  #1473  
						
							2023-07-07 09:12:49 -04:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Judd 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								36680f6e40 
								
							 
						 
						
							
							
								
								convert : update for baichuan ( #2081 )  
							
							
							
							
							1. guess n_layers;
2. relax warnings on context size;
3. add a note that its derivatives are also supported.
Co-authored-by: Judd <foldl@boxvest.com> 
							
						 
						
							2023-07-06 19:23:49 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Judd 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								471aab6e4c 
								
							 
						 
						
							
							
								
								convert : add support of baichuan-7b ( #2055 )  
							
							
							
							
							Co-authored-by: Judd <foldl@boxvest.com> 
							
						 
						
							2023-07-01 20:00:25 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									AN Long 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c943d823c1 
								
							 
						 
						
							
							
								
								convert : fix invalid params in write_vocab_only ( #1975 )  
							
							
							
						 
						
							2023-06-24 14:02:06 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Erik Scholz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7487137227 
								
							 
						 
						
							
							
								
								rework convert.py to read hyper-parameters from config.json ( #1958 )  
							
							
							
							
							* Read hyper-parameters from the HuggingFace transformers config.json if it exists; otherwise fall back to guessing, as before.
  This allows converting open_llama 3B and other non-standard model designs. 
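The read-then-fall-back behaviour described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from convert.py: `load_hparams` and the `guess` callback are hypothetical names, and the three keys read are the ones commonly found in a HuggingFace LLaMA-style config.json.

```python
import json
from pathlib import Path
from typing import Callable, Dict


def load_hparams(model_dir: Path, guess: Callable[[], Dict[str, int]]) -> Dict[str, int]:
    """Prefer config.json when present; otherwise defer to a guessing routine.

    `guess` is a hypothetical stand-in for whatever shape-based guessing
    the script did before this change.
    """
    config_path = model_dir / "config.json"
    if not config_path.exists():
        return guess()  # no config.json: fall back to guessing, as before
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Key names assume a HuggingFace LLaMA-style config.json.
    return {
        "n_embd": config["hidden_size"],
        "n_layer": config["num_hidden_layers"],
        "n_head": config["num_attention_heads"],
    }
```

Passing the guessing routine in as a callback keeps the sketch independent of how the guess is actually computed.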
							
						 
						
							2023-06-22 14:20:47 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Jiří Podivín 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5ddf7ea1fb 
								
							 
						 
						
							
							
								
								hooks : setting up flake8 and pre-commit hooks ( #1681 )  
							
							
							
							
							Small, non-functional changes were made to non-compliant files.
These include breaking up long lines, whitespace sanitation and
unused import removal.
Maximum line length in python files was set to a generous 125 chars,
in order to minimize number of changes needed in scripts and general
annoyance. The "txt" prompts directory is excluded from the checks
as it may contain oddly formatted files and strings for a good reason.
Signed-off-by: Jiri Podivin <jpodivin@gmail.com> 
							
						 
						
							2023-06-17 13:32:48 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Tom Jobbins 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2b2646931b 
								
							 
						 
						
							
							
								
								convert.py: Support models which are stored in a single pytorch_model.bin ( #1469 )  
							
							
							
							
							* Support models in a single pytorch_model.bin
* Remove spurious line with typo 
							
						 
						
							2023-05-17 00:04:35 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ubik2 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								95078cc554 
								
							 
						 
						
							
							
								
								convert: add ability to convert safetensors files ( #1276 )  
							
							
							
							
							* when loading a safetensors file, ignore the metadata header
* check for safetensors files first, and only use PyTorch versions when safetensors aren't available 
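Both points can be illustrated with a short sketch. It assumes the published safetensors layout (an 8-byte little-endian header length, then a JSON header whose optional `__metadata__` key holds free-form metadata rather than a tensor). The function names and the `*.bin` glob are hypothetical and not taken from convert.py.

```python
import json
import struct
from pathlib import Path
from typing import Any, BinaryIO, Dict, List


def read_safetensors_header(fp: BinaryIO) -> Dict[str, Any]:
    """Return the tensor entries of a safetensors header, without metadata."""
    (header_len,) = struct.unpack("<Q", fp.read(8))  # u64, little-endian
    header = json.loads(fp.read(header_len))
    header.pop("__metadata__", None)  # not a tensor entry, so ignore it
    return header


def pick_model_files(model_dir: Path) -> List[Path]:
    """Prefer safetensors files; use PyTorch files only when none exist."""
    safetensors = sorted(model_dir.glob("*.safetensors"))
    if safetensors:
        return safetensors
    return sorted(model_dir.glob("*.bin"))
```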
							
						 
						
							2023-05-08 13:54:26 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Benjamin Lecaillon 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a90e96b266 
								
							 
						 
						
							
							
								
								Convert.py @staticmethod ( #1327 )  
							
							
							
							
							* Line 698 has a `@staticmethod` decorator that should not be there;
otherwise `unpickle.load()` throws an error because the object is not callable
* Update convert.py
---------
Co-authored-by: Ivan Stepanov <ivanstepanovftw@gmail.com> 
							
						 
						
							2023-05-05 03:17:07 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ivan Stepanov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d3e8093e9b 
								
							 
						 
						
							
							
								
								convert: support DT_BF16 tensors ( #1309 )  
							
							
							
							
							Co-authored-by: Pavol Rusnak <pavol@rusnak.io> 
							
						 
						
							2023-05-04 18:54:37 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Cameron 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4ad73137a1 
								
							 
						 
						
							
							
								
								add 4_0 to default outfile namestr dict ( #1031 )  
							
							
							
							
							this came up when trying to convert the gpt4all-lora-unfiltered-quantized.bin file 
							
						 
						
							2023-04-17 20:26:23 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3173a62eb9 
								
							 
						 
						
							
							
								
								stdout : vertical align outputs for better readability
							
							
							
						 
						
							2023-04-16 13:59:27 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									comex 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								74f5899df4 
								
							 
						 
						
							
							
								
								convert.py: Fix loading safetensors and ggml format on Windows ( #991 )  
							
							
							
							
							Calling `mmap.mmap` on Windows apparently resets the file offset of the
raw file object (and makes the BufferedReader return a *negative* file
offset).  For safetensors, avoid using the file offset after calling
mmap.  For GGML format, explicitly save and restore the offset.
Fixes  #966 . 
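The "save and restore the offset" approach mentioned for the GGML path might look like the sketch below. The helper name `mmap_preserving_offset` is hypothetical, and the sketch takes the Windows behaviour as described in this message rather than reproducing it.

```python
import mmap
from typing import BinaryIO


def mmap_preserving_offset(fp: BinaryIO) -> mmap.mmap:
    """Map the whole file read-only without disturbing fp's read position."""
    saved = fp.tell()  # remember where the reader currently is
    mapped = mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ)
    fp.seek(saved)     # restore explicitly, in case mmap moved the offset
    return mapped
```

On platforms where `mmap.mmap` leaves the offset alone, the extra `seek` is harmless, so the same code path can be used everywhere.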
							
						 
						
							2023-04-15 23:53:21 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Pavol Rusnak 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								43ffdefb74 
								
							 
						 
						
							
							
								
								py : fix flake8 and isort nitpicks ( #960 )  
							
							
							
						 
						
							2023-04-14 14:23:21 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									comex 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								723dac55fa 
								
							 
						 
						
							
							
								
								py : new conversion script ( #545 )  
							
							
							
							
							Current status: Working, except for the latest GPTQ-for-LLaMa format
  that includes `g_idx`.  This turns out to require changes to GGML, so
  for now it only works if you use the `--outtype` option to dequantize it
  back to f16 (which is pointless except for debugging).
  I also included some cleanup for the C++ code.
  This script is meant to replace all the existing conversion scripts
  (including the ones that convert from older GGML formats), while also
  adding support for some new formats.  Specifically, I've tested with:
  - [x] `LLaMA` (original)
  - [x] `llama-65b-4bit`
  - [x] `alpaca-native`
  - [x] `alpaca-native-4bit`
  - [x] LLaMA converted to 'transformers' format using
        `convert_llama_weights_to_hf.py`
  - [x] `alpaca-native` quantized with `--true-sequential --act-order
        --groupsize 128` (dequantized only)
  - [x] same as above plus `--save_safetensors`
  - [x] GPT4All
  - [x] stock unversioned ggml
  - [x] ggmh
  There's enough overlap in the logic needed to handle these different
  cases that it seemed best to move to a single script.
  I haven't tried this with Alpaca-LoRA because I don't know where to find
  it.
  Useful features:
  - Uses multiple threads for a speedup in some cases (though the Python
    GIL limits the gain, and sometimes it's disk-bound anyway).
  - Combines split models into a single file (both the intra-tensor split
    of the original and the inter-tensor split of 'transformers' format
    files).  Single files are more convenient to work with and more
    friendly to future changes to use memory mapping on the C++ side.  To
    accomplish this without increasing memory requirements, it has some
    custom loading code which avoids loading whole input files into memory
    at once.
  - Because of the custom loading code, it no longer depends on PyTorch,
    which might make installing dependencies slightly easier or faster...
    although it still depends on NumPy and sentencepiece, so I don't know
    if there's any meaningful difference.  In any case, I also added a
    requirements.txt file to lock the dependency versions in case of any
    future breaking changes.
  - Type annotations checked with mypy.
  - Some attempts to be extra user-friendly:
      - The script tries to be forgiving with arguments, e.g. you can
        specify either the model file itself or the directory containing
        it.
      - The script doesn't depend on config.json / params.json, just in
        case the user downloaded files individually and doesn't have those
        handy.  But you still need tokenizer.model and, for Alpaca,
        added_tokens.json.
      - The script tries to give a helpful error message if
        added_tokens.json is missing. 
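The "forgiving with arguments" behaviour (accepting either the model file or its directory, and failing with a helpful message) could be sketched as below. This is only an illustration: `resolve_model_path` and the `CANDIDATES` list are hypothetical and do not reflect the actual lookup order in the script.

```python
from pathlib import Path
from typing import Sequence

# Hypothetical candidate file names, for illustration only.
CANDIDATES = ("consolidated.00.pth", "pytorch_model.bin", "model.safetensors")


def resolve_model_path(path: Path, candidates: Sequence[str] = CANDIDATES) -> Path:
    """Accept a model file or a directory that contains one."""
    if path.is_file():
        return path  # the user pointed at the file itself
    if path.is_dir():
        for name in candidates:
            if (path / name).is_file():
                return path / name  # first known file name found in the directory
        raise FileNotFoundError(
            f"no model file found in {path}; expected one of: {', '.join(candidates)}"
        )
    raise FileNotFoundError(f"{path} does not exist")
```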
							
						 
						
							2023-04-14 10:03:03 +03:00