llama : add support for Tekken pre-tokenizer (#8579)
* llama : Added support for Tekken pre-tokenizer (#8577)

  Removed unneeded `vocab.tokenizer_clean_spaces` assignment

* llama : fix order of pre-tokenizers

* Tekken pre-tokenizer no longer uses clean_up_tokenization_spaces

* Updated chkhsh for Tekken tokenizer

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
parent 69b9945b44
commit 940362224d

4 changed files with 18 additions and 0 deletions
@@ -92,6 +92,7 @@ extern "C" {
         LLAMA_VOCAB_PRE_TYPE_CHATGLM4       = 17,
         LLAMA_VOCAB_PRE_TYPE_VIKING         = 18,
         LLAMA_VOCAB_PRE_TYPE_JAIS           = 19,
+        LLAMA_VOCAB_PRE_TYPE_TEKKEN         = 20,
     };
 
     // note: these values should be synchronized with ggml_rope
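For readers unfamiliar with how a pre-tokenizer enum value like this is typically consumed, here is a minimal, self-contained sketch. It is not the code from this commit: the struct `pre_tokenizer_config`, the helper `select_pre_tokenizer`, and the lookup string `"tekken"` are all hypothetical names chosen for illustration. Only the enum values and the fact that Tekken does not use space clean-up come from the commit itself.

```cpp
#include <string>

// Subset of the enum touched by this hunk; the numeric values are copied from the diff.
enum llama_vocab_pre_type {
    LLAMA_VOCAB_PRE_TYPE_CHATGLM4 = 17,
    LLAMA_VOCAB_PRE_TYPE_VIKING   = 18,
    LLAMA_VOCAB_PRE_TYPE_JAIS     = 19,
    LLAMA_VOCAB_PRE_TYPE_TEKKEN   = 20,
};

// Hypothetical settings bundle (an assumption for this sketch, not a llama.cpp type).
struct pre_tokenizer_config {
    llama_vocab_pre_type type;
    bool                 clean_spaces; // whether tokenization spaces are cleaned up
};

// Hypothetical helper: maps a pre-tokenizer name to its configuration.
// Returns false and leaves `out` untouched when the name is not recognized.
// The string "tekken" is assumed here; the commit page does not show the actual key.
static bool select_pre_tokenizer(const std::string & name, pre_tokenizer_config & out) {
    if (name == "tekken") {
        out.type         = LLAMA_VOCAB_PRE_TYPE_TEKKEN;
        out.clean_spaces = false; // per the commit message, Tekken does not clean up spaces
        return true;
    }
    return false;
}
```

Calling `select_pre_tokenizer("tekken", cfg)` on a default-initialized `cfg` fills in `LLAMA_VOCAB_PRE_TYPE_TEKKEN` and disables space clean-up, while an unrecognized name returns `false` so the caller can fall back to a default or report an error.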