Matt Pulver
c82742ac9c
llama : add llama_beam_search() (#2267)

* Add llama_beam_search().
* Add '// Beam search' heading to llama.{h,cpp} after llama_grammar_accept_token().
* Add space around * pointers and & references.
* Add spaces around comparison and assignment operators.
* Prefer west const.
* Use llama_ prefix for structs in global namespace.
* Delete obsolete comment from an earlier revision.
* Change eos to eob in llama_beam and llama_beam_view structs.
2023-08-25 18:18:48 +03:00

slaren
154725c543
llama-bench : add model sizes (#2771)

* llama-bench : add model sizes
* more compact markdown output
* back to GiB
* adjust column sizes
2023-08-25 15:16:19 +02:00

Jhen-Jie Hong
29674ab4e8
server : display token probabilities in the UI (#2489)

* server : add n_probs param in chat UI
* server : keep message data array & show in probabilities component
* server : add simple popover component
* server : fix completion_probabilities undefined if not set n_probs
* server : implement Probabilities
* server : handle bytes
* server : make n_probs max to 10 for easy scroll
* server : adjust for dark/light mode
* server : Fix regenerated prompt
* server : update index.html.hpp
* server : convert prob to percentage + show original value as div title
* server : fix Probabilities not used if included empty str
* server : skip byte pair in display probabilities
* server : remove array check of completion_probabilities in messages
* skip empty array or byte pair (> 1) in Probabilities
* generate index.html.hpp
* fix incorrect prob convert if the str is already a known token
* use final response to show probabilities on stop
* revert unnecessary change
* correct probabilities usage
* remove unused function
* always send partial response to get correct probs of last to_send
* fix typo
* fix content of format_final_response
* refactor probs render & make pColor transparent if not found
* send empty string when got stop_pos in partial
* avoid unnecessary empty data event & send rest of partial tokens on stop
* use <br /> for new line
* skip -1 tok in loop to avoid send '' on end
* trim last new lines on stop
* revert unnecessary change
2023-08-25 18:32:45 +08:00

Henri Vasserman
6bbc598a63
ROCm Port (#1087)

* use hipblas based on cublas
* Update Makefile for the Cuda kernels
* Expand arch list and make it overrideable
* Fix multi GPU on multiple AMD architectures with rocblas_initialize() (#5)
* add hipBLAS to README
* new build arg LLAMA_CUDA_MMQ_Y
* fix half2 decomposition
* Add intrinsics polyfills for AMD
* AMD assembly optimized __dp4a
* Allow overriding CC_TURING
* use "ROCm" instead of "CUDA"
* ignore all build dirs
* Add Dockerfiles
* fix llama-bench
* fix -nommq help for non CUDA/HIP
---------
Co-authored-by: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com>
Co-authored-by: funnbot <22226942+funnbot@users.noreply.github.com>
Co-authored-by: Engininja2 <139037756+Engininja2@users.noreply.github.com>
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
Co-authored-by: jammm <2500920+jammm@users.noreply.github.com>
Co-authored-by: jdecourval <7315817+jdecourval@users.noreply.github.com>
2023-08-25 12:09:42 +03:00

Kerfuffle
7694adda8d
Fix for main example getting stuck when -n -2 and --interactive (#2767)

* Fix for main example getting stuck when -n -2 and --interactive
* Add a comment so future generations may suffer less.
2023-08-24 10:11:13 -06:00

Georgi Gerganov
cf658adc83
llm : add Falcon support (#2717)

* llama : refactor GGUF constants into static maps
* llama : check if model architecture is known
* llama : refactor llama_model_load_internal()
* gguf : add KV constant maps
* llm : read arch-specific KVs
* convert : add dummy scores + types
* falcon : load tensor data (CPU only)
* llama : fix loading progress bar
* llama : add arch member to llama_model
* falcon : CPU inference working
* falcon : support non-40B models
* falcon : minor
* llama : minor updates
ggml-ci
* convert-falcon-hf-to-gguf.py : fix special token mapping
* llama.cpp : llama default UNK token = id 0
* llama.cpp : fix bpe tokenizer
* llama.cpp : fix the fix of bpe tokenizer
* ggml : pass eps to ggml_norm
* metal : implement RoPE (mode = 2) + avoid ggml_repeat
* ggml : ggml_repeat always creates new tensor
* falcon : copy-paste self-attention from LLaMA
* metal : print extra compute pipeline info
* falcon : minor changes (still chasing the Metal problem)
* llama.cpp : fix linefeed token
* metal : fix GELU kernel numerical stability by using precise::tanh
* metal : temporary workaround for the concurrency optimization bug
* falcon : add CUDA offloading (#2739)
* llama : better model naming and size reporting
* llama : prep new tokenizer support
* llama : advanced BPE tokenizer based on ggllm.cpp implementation
* llama : remove obsolete comment
ggml-ci
* common : remove obsolete BPE API + disable test-tokenizer-1
* llama : revert BPE special-case in llama_byte_to_token()
* cuda : add TODOs for RoPE NeoX implementation
* llama : default special tokens based on vocab type
* perplexity : add log for start of tokenization
---------
Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>
Co-authored-by: slaren <slarengh@gmail.com>
2023-08-23 23:08:04 +03:00

Georgi Gerganov
a192860cfe
minor : fix trailing whitespace
2023-08-23 22:37:39 +03:00

Olivier Chafik
95385241a9
examples : restore the functionality to import llama2.c models (#2685)

* Fix import of llama2.c models that don't share weights between embedding layers
* llama2c: reinstate ggmlv3 conversion output + update readme w/ gguf conv
* llama2.c: comment out legacy "load from ggml model" logic
* llama2.c: convert special-cased "<0xXX>" single byte tokens from tokenizer.bin
2023-08-23 22:33:05 +03:00

klosax
5290c38e6e
main : insert bos if no tokens (#2727)

* main.cpp : insert bos if no tokens
* Update examples/main/main.cpp
* Update examples/main/main.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-08-23 16:46:03 +02:00

Cebtenzzre
7c2227a197
chmod : make scripts executable (#2675)
2023-08-23 17:29:09 +03:00

Kawrakow
8207214b6a
Fix values shown in the quantize tool help (#2735)

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-23 12:57:12 +03:00

Kawrakow
62959e740e
Strided perplexity (#2714)

* Implementing strided computation of perplexity
* Alternative way to output PPL results
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-23 12:56:42 +03:00

Xiao-Yong Jin
b8ad1b66b2
server : allow json array in prompt or content for direct token input (#2306)

* server: allow json array in prompt or content
We accept an array of strings and numbers representing tokens,
in addition to the current string-valued prompt or content.
This allows direct token input, so that any special tokens
can be processed and used at the frontend during the construction
of the json data, before sending to the server. The server
then does not need to know or parse special tokens from textual input.
With this, we can use the EOS and BOS tokens used in llama-2-chat models.
* server: use tokenizePrompt(json) and default "" if empty prompt
* server: fix prompt check
* server: tokenize endpoint no longer adds BOS 
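The array-valued prompt described in this commit flattens a mixed list of text pieces and token ids into one token sequence. Below is a minimal C++ sketch of that step. It assumes the JSON has already been parsed into one `std::variant` per element. `PromptPart`, `toy_tokenize`, and `tokenize_prompt` are hypothetical names, not the server's actual code. `toy_tokenize` is a stand-in for a real tokenizer, and treating 1 as BOS and 2 as EOS is an assumption for illustration.

```cpp
#include <sstream>
#include <string>
#include <variant>
#include <vector>

// One element of a prompt array: either a piece of text or a raw token id.
using PromptPart = std::variant<std::string, int>;

// Hypothetical placeholder tokenizer: splits on whitespace and derives a
// fake id from each word's length. A real server would call its model's
// tokenizer here instead.
std::vector<int> toy_tokenize(const std::string & text) {
    std::vector<int> out;
    std::istringstream iss(text);
    std::string word;
    while (iss >> word) {
        out.push_back(1000 + static_cast<int>(word.size()));
    }
    return out;
}

// Flatten a mixed prompt array into a single token sequence. Strings go
// through the tokenizer; integers are appended unchanged, so a client can
// place special tokens (e.g. BOS/EOS) exactly where it wants them without
// the server having to parse special-token text.
std::vector<int> tokenize_prompt(const std::vector<PromptPart> & parts) {
    std::vector<int> tokens;
    for (const auto & part : parts) {
        if (const auto * text = std::get_if<std::string>(&part)) {
            const std::vector<int> piece = toy_tokenize(*text);
            tokens.insert(tokens.end(), piece.begin(), piece.end());
        } else {
            tokens.push_back(std::get<int>(part));
        }
    }
    return tokens;
}
```

With this toy tokenizer, `tokenize_prompt({1, std::string("hi there"), 2})` yields `{1, 1002, 1005, 2}`. The two ids pass through untouched, and only the text between them is tokenized.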
							
						 
						
2023-08-23 15:12:12 +08:00

Evan Jones
f5fe98d11b
docs : add grammar docs (#2701)

* docs : add grammar docs
* tweaks to grammar guide
* rework GBNF example to be a commented grammar
2023-08-22 21:01:57 -04:00

Johannes Gäßler
c63bb1d16a
CUDA: use mul_mat_q kernels by default (#2683)
2023-08-22 22:47:05 +02:00

slaren
519c981f8b
embedding : evaluate prompt in batches (#2713)
2023-08-22 16:03:12 +02:00

Georgi Gerganov
ef3f333d37
ggml : sync latest (SAM + SD operators, CUDA alibi) (#2709)

* ggml : sync latest (SAM + SD operators, CUDA alibi)
ggml-ci
* ggml : fix tabs
2023-08-22 14:22:08 +03:00

slaren
8e4364f2af
llama-bench : minor fixes (#2695)
2023-08-22 10:56:03 +03:00

Jhen-Jie Hong
226255b44e
server : fallback to default if client param is null (#2688)

* server : fallback to default if client param is null
* server : do not overwrite 404 if status is 500 from exception_handler
2023-08-22 08:32:00 +08:00

Georgi Gerganov
6381d4e110
gguf : new file format with flexible meta data (beta) (#2398)

* gguf : first API pass
* gguf : read header + meta data
* gguf : read tensor info
* gguf : initial model loading - not tested
* gguf : add gguf_get_tensor_name()
* gguf : do not support passing existing ggml_context to gguf_init
* gguf : simplify gguf_get_val
* gguf : gguf.c is now part of ggml.c
* gguf : read / write sample models
* gguf : add comments
* refactor : reduce code duplication and better API (#2415)
* gguf : expose the gguf_type enum through the API for now
* gguf : add array support
* gguf.py : some code style changes
* convert.py : start a new simplified implementation by removing old stuff
* convert.py : remove GGML vocab + other obsolete stuff
* GGUF : write tensor (#2426)
* WIP: Write tensor
* GGUF : Support writing tensors in Python
* refactor : rm unused import and upd todos
* fix : fix errors upd writing example
* rm example.gguf
* gitignore *.gguf
* undo formatting
* gguf : add gguf_find_key (#2438)
* gguf.cpp : find key example
* ggml.h : add gguf_find_key
* ggml.c : add gguf_find_key
* gguf : fix writing tensors
* gguf : do not hardcode tensor names to read
* gguf : write sample tensors to read
* gguf : add tokenization constants
* quick and dirty conversion example
* gguf : fix writing gguf arrays
* gguf : write tensors one by one and code reuse
* gguf : fix writing gguf arrays
* gguf : write tensors one by one
* gguf : write tensors one by one
* gguf : write tokenizer data
* gguf : upd gguf conversion script
* Update convert-llama-h5-to-gguf.py
* gguf : handle already encoded string
* ggml.h : get array str and f32
* ggml.c : get arr str and f32
* gguf.py : support any type
* Update convert-llama-h5-to-gguf.py
* gguf : fix set is not subscriptable
* gguf : update convert-llama-h5-to-gguf.py
* constants.py : add layer norm eps
* gguf.py : add layer norm eps and merges
* ggml.h : increase GGML_MAX_NAME to 64
* ggml.c : add gguf_get_arr_n
* Update convert-llama-h5-to-gguf.py
* add gptneox gguf example
* Makefile : add gptneox gguf example
* Update convert-llama-h5-to-gguf.py
* add gptneox gguf example
* Update convert-llama-h5-to-gguf.py
* Update convert-gptneox-h5-to-gguf.py
* Update convert-gptneox-h5-to-gguf.py
* Update convert-llama-h5-to-gguf.py
* gguf : support custom alignment value
* gguf : fix typo in function call
* gguf : mmap tensor data example
* fix : update convert-llama-h5-to-gguf.py
* Update convert-llama-h5-to-gguf.py
* convert-gptneox-h5-to-gguf.py : Special tokens
* gptneox-main.cpp : special tokens
* Update gptneox-main.cpp
* constants.py : special tokens
* gguf.py : accumulate kv and tensor info data + special tokens
* convert-gptneox-h5-to-gguf.py : accumulate kv and ti + special tokens
* gguf : gguf counterpart of llama-util.h
* gguf-util.h : update note
* convert-llama-h5-to-gguf.py : accumulate kv / ti + special tokens
* convert-llama-h5-to-gguf.py : special tokens
* Delete gptneox-common.cpp
* Delete gptneox-common.h
* convert-gptneox-h5-to-gguf.py : gpt2bpe tokenizer
* gptneox-main.cpp : gpt2 bpe tokenizer
* gpt2 bpe tokenizer (handles merges and unicode)
* Makefile : remove gptneox-common
* gguf.py : bytesarray for gpt2bpe tokenizer
* cmpnct_gpt2bpe.hpp : comments
* gguf.py : use custom alignment if present
* gguf : minor stuff
* Update gptneox-main.cpp
* map tensor names
* convert-gptneox-h5-to-gguf.py : map tensor names
* convert-llama-h5-to-gguf.py : map tensor names
* gptneox-main.cpp : map tensor names
* gguf : start implementing libllama in GGUF (WIP)
* gguf : start implementing libllama in GGUF (WIP)
* rm binary committed by mistake
* upd .gitignore
* gguf : calculate n_mult
* gguf : inference with 7B model working (WIP)
* gguf : rm deprecated function
* gguf : start implementing gguf_file_saver (WIP)
* gguf : start implementing gguf_file_saver (WIP)
* gguf : start implementing gguf_file_saver (WIP)
* gguf : add gguf_get_kv_type
* gguf : add gguf_get_kv_type
* gguf : write metadata in gguf_file_saver (WIP)
* gguf : write metadata in gguf_file_saver (WIP)
* gguf : write metadata in gguf_file_saver
* gguf : rm references to old file formats
* gguf : shorter name for member variable
* gguf : rm redundant method
* gguf : get rid of n_mult, read n_ff from file
* Update gguf_tensor_map.py
* Update gptneox-main.cpp
* gguf : rm references to old file magics
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : quantization is working
* gguf : proper closing of file
* gguf.py : no need to convert tensors twice
* convert-gptneox-h5-to-gguf.py : no need to convert tensors twice
* convert-llama-h5-to-gguf.py : no need to convert tensors twice
* convert-gptneox-h5-to-gguf.py : simplify nbytes
* convert-llama-h5-to-gguf.py : simplify nbytes
* gptneox-main.cpp : n_layer --> n_block
* constants.py : n_layer --> n_block
* gguf.py : n_layer --> n_block
* convert-gptneox-h5-to-gguf.py : n_layer --> n_block
* convert-llama-h5-to-gguf.py : n_layer --> n_block
* gptneox-main.cpp : n_layer --> n_block
* Update gguf_tensor_map.py
* convert-gptneox-h5-to-gguf.py : load model in parts to save memory
* convert-llama-h5-to-gguf.py : load model in parts to save memory
* convert : write more metadata for LLaMA
* convert : rm quantization version
* convert-gptneox-h5-to-gguf.py : add file_type key
* gptneox-main.cpp : add file_type key
* fix conflicts
* gguf : add todos and comments
* convert-gptneox-h5-to-gguf.py : tensor name map changes
* Create gguf_namemap.py : tensor name map changes
* Delete gguf_tensor_map.py
* gptneox-main.cpp : tensor name map changes
* convert-llama-h5-to-gguf.py : fixes
* gguf.py : don't add empty strings
* simple : minor style changes
* gguf : use UNIX line ending
* Create convert-llama-7b-pth-to-gguf.py
* llama : sync gguf-llama.cpp with latest llama.cpp (#2608)
* llama : sync gguf-llama.cpp with latest llama.cpp
* minor : indentation + assert
* llama : refactor gguf_buffer and gguf_ctx_buffer
* llama : minor
* gitignore : add gptneox-main
* llama : tokenizer fixes (#2549)
* Merge tokenizer fixes into the gguf branch.
* Add test vocabularies
* convert : update convert-new.py with tokenizer fixes (#2614)
* Merge tokenizer fixes into the gguf branch.
* Add test vocabularies
* Adapt convert-new.py (and fix a clang-cl compiler error on windows)
* llama : sync gguf-llama with llama (#2613)
* llama : sync gguf-llama with llama
* tests : fix build + warnings (test-tokenizer-1 still fails)
* tests : fix wstring_convert
* convert : fix layer names
* llama : sync gguf-llama.cpp
* convert : update HF converter to new tokenizer voodoo magics
* llama : update tokenizer style
* convert-llama-h5-to-gguf.py : add token types
* constants.py : add token types
* gguf.py : add token types
* convert-llama-7b-pth-to-gguf.py : add token types
* gguf-llama.cpp : fix n_head_kv
* convert-llama-h5-to-gguf.py : add 70b gqa support
* gguf.py : add tensor data layout
* convert-llama-h5-to-gguf.py : add tensor data layout
* convert-llama-7b-pth-to-gguf.py : add tensor data layout
* gptneox-main.cpp : add tensor data layout
* convert-llama-h5-to-gguf.py : clarify the reverse permute
* llama : refactor model loading code (#2620)
* llama : style formatting + remove helper methods
* llama : fix quantization using gguf tool
* llama : simplify gguf_file_saver
* llama : fix method names
* llama : simplify write_header()
* llama : no need to pass full file loader to the file saver
just gguf_ctx
* llama : gguf_file_saver write I32
* llama : refactor tensor names (#2622)
* gguf: update tensor names searched in quantization
* gguf : define tensor names as constants
* gguf : initial write API (not tested yet)
* gguf : write to file API (not tested)
* gguf : initial write API ready + example
* gguf : fix header write
* gguf : fixes + simplify example + add ggml_nbytes_pad()
* gguf : minor
* llama : replace gguf_file_saver with new gguf write API
* gguf : streaming support when writing files
* gguf : remove obsolete write methods
* gguf : remove obsolete gguf_get_arr_xxx API
* llama : simplify gguf_file_loader
* llama : move hparams and vocab from gguf_file_loader to llama_model_loader
* llama : merge gguf-util.h in llama.cpp
* llama : reorder definitions in .cpp to match .h
* llama : minor simplifications
* llama : refactor llama_model_loader (WIP)
wip : remove ggml_ctx from llama_model_loader
wip : merge gguf_file_loader in llama_model_loader
* llama : fix shape prints
* llama : fix Windows build + fix norm_rms_eps key
* llama : throw error on missing KV pairs in model meta data
* llama : improve printing + log meta data
* llama : switch print order of meta data
---------
Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
* gguf : deduplicate (#2629)
* gguf : better type names
* dedup : CPU + Metal is working
* ggml : fix warnings about unused results
* llama.cpp : fix line feed and compiler warning
* llama : fix strncpy warning + note token_to_str does not write null
* llama : restore the original load/save session implementation
Will migrate this to GGUF in the future
* convert-llama-h5-to-gguf.py : support alt ctx param name
* ggml : assert when using ggml_mul with non-F32 src1
* examples : dedup simple
---------
Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>
* gguf.py : merge all files in gguf.py
* convert-new.py : pick #2427 for HF 70B support
* examples/gguf : no need to keep q option for quantization any more
* llama.cpp : print actual model size
* llama.cpp : use ggml_elements()
* convert-new.py : output gguf (#2635)
* convert-new.py : output gguf (WIP)
* convert-new.py : add gguf key-value pairs
* llama : add hparams.ctx_train + no longer print ftype
* convert-new.py : minor fixes
* convert-new.py : vocab-only option should work now
* llama : fix tokenizer to use llama_char_to_byte
* tests : add new ggml-vocab-llama.gguf
* convert-new.py : tensor name mapping
* convert-new.py : add map for skipping tensor serialization
* convert-new.py : convert script now works
* gguf.py : pick some of the refactoring from #2644 
* convert-new.py : minor fixes
* convert.py : update to support GGUF output
* Revert "ci : disable CI temporary to not waste energy"
This reverts commit 7e82d25f40#2644 )
* gguf : single pass for writing tensors + refactoring writer
* gguf : single pass for writing tensors + refactoring writer
* gguf : single pass for writing tensors + refactoring writer
* gguf : style fixes in simple conversion script
* gguf : refactor gptneox conversion script
* gguf : rename h5 to hf (for HuggingFace)
* gguf : refactor pth to gguf conversion script
* gguf : rm file_type key and method
* gguf.py : fix vertical alignment
* gguf.py : indentation
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* convert-gptneox-hf-to-gguf.py : fixes
* gguf.py : gptneox mapping
* convert-llama-hf-to-gguf.py : fixes
* convert-llama-7b-pth-to-gguf.py : fixes
* ggml.h : reverse GGUF_MAGIC
* gguf.py : reverse GGUF_MAGIC
* test-tokenizer-0.cpp : fix warning
* llama.cpp : print kv general.name
* llama.cpp : get special token kv and linefeed token id
* llama : print number of tensors per type + print arch + style
* tests : update vocab file with new magic
* editorconfig : fix whitespaces
* llama : re-order functions
* llama : remove C++ API + reorganize common source in /common dir
* llama : minor API updates
* llama : avoid hardcoded special tokens
* llama : fix MPI build
ggml-ci
* llama : introduce enum llama_vocab_type + remove hardcoded string constants
* convert-falcon-hf-to-gguf.py : falcon HF --> gguf conversion, not tested
* falcon-main.cpp : falcon inference example
* convert-falcon-hf-to-gguf.py : remove extra kv
* convert-gptneox-hf-to-gguf.py : remove extra kv
* convert-llama-7b-pth-to-gguf.py : remove extra kv
* convert-llama-hf-to-gguf.py : remove extra kv
* gguf.py : fix for falcon 40b
* falcon-main.cpp : fix for falcon 40b
* convert-falcon-hf-to-gguf.py : update ref
* convert-falcon-hf-to-gguf.py : add tensor data layout
* cmpnct_gpt2bpe.hpp : fixes
* falcon-main.cpp : fixes
* gptneox-main.cpp : fixes
* cmpnct_gpt2bpe.hpp : remove non-general stuff
* Update examples/server/README.md
Co-authored-by: slaren <slarengh@gmail.com>
* cmpnct_gpt2bpe.hpp : cleanup
* convert-llama-hf-to-gguf.py : special tokens
* convert-llama-7b-pth-to-gguf.py : special tokens
* convert-permute-debug.py : permute debug print
* convert-permute-debug-master.py : permute debug for master
* convert-permute-debug.py : change permute type of attn_q
* convert.py : 70b model working (change attn_q permute)
* Delete convert-permute-debug-master.py
* Delete convert-permute-debug.py
* convert-llama-hf-to-gguf.py : fix attn_q permute
* gguf.py : fix rope scale kv
* convert-llama-hf-to-gguf.py : rope scale and added tokens
* convert-llama-7b-pth-to-gguf.py : rope scale and added tokens
* llama.cpp : use rope scale kv
* convert-llama-7b-pth-to-gguf.py : rope scale fix
* convert-llama-hf-to-gguf.py : rope scale fix
* py : fix whitespace
* gguf : add Python script to convert GGMLv3 LLaMA models to GGUF (#2682)
* First pass at converting GGMLv3 LLaMA models to GGUF
* Cleanups, better output during conversion
* Fix vocab space conversion logic
* More vocab conversion fixes
* Add description to converted GGUF files
* Improve help text, expand warning
* Allow specifying name and description for output GGUF
* Allow overriding vocab and hyperparams from original model metadata
* Use correct params override var name
* Fix wrong type size for Q8_K
Better handling of original style metadata
* Set default value for gguf add_tensor raw_shape KW arg
* llama : improve token type support (#2668)
* Merge tokenizer fixes into the gguf branch.
* Add test vocabularies
* Adapt convert-new.py (and fix a clang-cl compiler error on windows)
* Improved tokenizer test
But does it work on MacOS?
* Improve token type support
- Added @klosax code to convert.py
- Improved token type support in vocabulary
* Exclude platform dependent tests
* More sentencepiece compatibility by eliminating magic numbers
* Restored accidentally removed comment
* llama : add API for token type
ggml-ci
* tests : use new tokenizer type API (#2692)
* Merge tokenizer fixes into the gguf branch.
* Add test vocabularies
* Adapt convert-new.py (and fix a clang-cl compiler error on windows)
* Improved tokenizer test
But does it work on MacOS?
* Improve token type support
- Added @klosax code to convert.py
- Improved token type support in vocabulary
* Exclude platform dependent tests
* More sentencepiece compatibility by eliminating magic numbers
* Restored accidentally removed comment
* Improve commentary
* Use token type API in test-tokenizer-1.cpp
* py : cosmetics
* readme : add notice about new file format
ggml-ci
---------
Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>
Co-authored-by: goerch <jhr.walter@t-online.de>
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com> 
							
						 
						
							2023-08-21 23:07:43 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Kawrakow 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								cb1c0727bd 
								
							 
						 
						
							
							
								
								HellaSwag: split token evaluation into batches if needed ( #2681 )  
							
							... 
							
							
							
							Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com> 
							
						 
						
							2023-08-21 11:11:31 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
Kawrakow
5e9ff54a67
More efficient Hellaswag implementation (#2677)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-20 16:44:46 +03:00

Georgi Gerganov
1f0bccb279
server : better default prompt (#2646)
2023-08-19 05:45:36 +08:00

Jhen-Jie Hong
f63564adfa
server : update xxd usage for older versions compatibility (#2649)
* server : update xxd usage for older versions compatibility
* remove unused $func
2023-08-19 05:41:32 +08:00

slaren
097e121e2f
llama : add benchmark example (#2626)
* llama : add benchmark example
* add to examples CMakeLists.txt
* fix msvc build
* add missing include
* add Bessel's correction to stdev calculation
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* improve markdown formatting
* add missing include
* print warning if NDEBUG is not defined
* remove n_prompt and n_gen from the matrix, use each value separately instead
* better checks for non-optimized builds
* llama.cpp : fix MEM_REQ_SCRATCH0 reusing the value of n_ctx of the first call
* fix json formatting
* add sql output
* add basic cpu and gpu info (linux/cuda only)
* markdown: also show values that differ from the default
* markdown: add build id
* cleanup
* improve formatting
* formatting
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2023-08-18 12:44:58 +02:00

Georgi Gerganov
e9b12c332e
perplexity : more meaningful ETA number - 2 decimal points
2023-08-18 12:48:55 +03:00

staviq
10151bee2e
server : support for saving templates in browser LocalStorage (#2486)
* support for templates in browser LocalStorage
* sync accepted #2409 fix from upstream
* convert autosave invocation to useEffect
* Apply suggestions from code review
Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com>
* Regen index.html.cpp, suggested from code review
---------
Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com>
2023-08-18 07:34:01 +08:00

Kerfuffle
8dae7ce684
Add --cfg-negative-prompt-file option for examples (#2591)
2023-08-17 07:29:44 -06:00

Jhen-Jie Hong
3ebb00935f
server : add missing /json-schema-to-grammar.mjs (#2616)
fixes #2611
2023-08-15 06:14:14 +08:00

Cheng Shao
d75561df20
server : add --numa support (#2524)
2023-08-14 16:36:42 +03:00

Jhen-Jie Hong
2feb8934eb
server : fix default grammar by use empty string in the UI (#2604)
2023-08-14 16:20:17 +08:00

Jhen-Jie Hong
5517d6e692
server : implement json-schema-to-grammar.mjs & add grammar param in the UI (#2588)
* server : implement json-schema-to-grammar.mjs following the Python impl
* server : add grammar support in chat.mjs
* server : implement grammar param in the UI
* server : generate .hpp
* server : remove trailing whitespaces
* server : generate .hpp
* server : fix sort of prop pairs
* server : optimize regex & iteration
2023-08-14 15:16:54 +08:00

byte-6174
b19edd54d5
Adding support for llama2.c models (#2559)
2023-08-12 01:17:25 +02:00

Equim
53dc399472
server: fixed wrong variable name in timing json (#2579)
* server: fixed wrong variable name in timing json
* remove redundant entry
2023-08-12 00:35:14 +02:00

DannyDaemonic
9ca4abed89
Handle ENABLE_VIRTUAL_TERMINAL_PROCESSING more gracefully on earlier versions of Windows.
2023-08-10 13:11:36 -07:00

Christian Demsar
e59fcb2bc1
Add --n-predict -2 for stopping generation on full context (#2565)
2023-08-10 16:28:27 +02:00

Martin Krasser
1638757767
Fix grammar-based sampling issue in server (#2566)
2023-08-10 13:16:38 +03:00

Martin Krasser
f5bfea0580
Allow passing grammar to completion endpoint (#2532)
2023-08-08 16:29:19 +03:00

chaihahaha
7ed8d1fe7f
llm.vim : multiline autocompletion, get rid of "^@" (#2543)
2023-08-08 15:07:02 +03:00

Georgi Gerganov
e7f94d6fdc
vim : bring back simple llm.vim example
2023-08-08 15:06:18 +03:00

AustinMroz
2d7baaf50f
vim : streaming and more (#2495)
* Update Vim plugin
* Remove getbufoneline usage, Add input bind example.
getbufoneline() appears to be a recently added function and has been
replaced with getbufline for compatibility.
An additional example that explains how to add a keybind that works in
insert mode was added.
2023-08-08 14:44:48 +03:00

klosax
f3c3b4b167
Add --rope-scale parameter (#2544)
* common.cpp : Add --rope-scale parameter
* README.md : Add info about using linear rope scaling
2023-08-07 19:07:19 +02:00

DannyDaemonic
86c3219895
console : fix issue related to Windows 11 PowerShell console mode persistence (#2521)
2023-08-06 09:49:34 +03:00

Jonas Wunderlich
332311234a
fix firefox autoscroll (#2519)
2023-08-04 22:16:11 +02:00

Cebtenzzre
182af739c4
server: regenerate completion.js.hpp (#2515)
2023-08-04 21:00:57 +02:00

DannyDaemonic
3498588e0f
Add --simple-io option for subprocesses and break out console.h and cpp (#1558)
2023-08-04 08:20:12 -07:00

Stephen Nichols
5f631c2679
Fixing race condition in server and partial stream handling in frontend. (#2391)
* Fixing race condition in server.cpp and partial stream handling in completion.js
* Reverting assert edits.
* Adding newline to eof
2023-08-04 13:37:24 +02:00

Borislav Stanimirov
ff966e7ca6
build : fix several cast and printf warnings (#2499)
2023-08-04 13:07:21 +03:00

Evan Jones
8183159cf3
examples : generate JSON according to schema (#1887)
* examples : add JSON schema grammars
* complete JSON grammar
* ensure primitive types can be used as root of schema
* support integer type and adjust usage text
2023-08-02 22:05:44 -04:00

Eve
81844fbcfd
tests : Fix compilation warnings (Linux/GCC) (#2451)
* fix hellaswag print format, cast away warning in test-double-float
* c++11 cannot use designated initializers
* add static to test-grad0.c internal functions
* use memcpy in test-double-float.c
* port c tests to c++
* use initializer list for ggml_init_params
2023-08-02 11:06:19 +03:00