server : Add option to return token pieces in /tokenize endpoint (#9108)
* server : added with_pieces functionality to /tokenize endpoint
* server : Add tokenize with pieces tests to server.feature
* Handle case if tokenizer splits along utf8 continuation bytes
* Add example of token splitting
* Remove trailing ws
* Fix trailing ws
* Maybe fix ci
* maybe this fix windows ci?

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
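The "splits along utf8 continuation bytes" case can be illustrated with a minimal Python sketch. It is not the server's code: the helper name `piece_to_json` is hypothetical, and the fallback to a list of byte values for a piece that is not valid UTF-8 is an assumption about the intended behavior.

```python
from typing import List, Union


def piece_to_json(piece: bytes) -> Union[str, List[int]]:
    """Hypothetical helper: text when the piece is valid UTF-8, else byte values.

    Assumption: a piece that cannot be decoded on its own is reported as a
    list of integers so that it can still be serialized as JSON.
    """
    try:
        return piece.decode("utf-8")
    except UnicodeDecodeError:
        return list(piece)


# "媽" is three bytes in UTF-8 (E5 AA BD); a tokenizer may split them
# across several tokens, leaving pieces that are not valid text alone.
raw = "媽".encode("utf-8")
whole = piece_to_json(raw)        # the complete character decodes to text
head = piece_to_json(raw[:2])     # truncated sequence falls back to byte values
tail = piece_to_json(raw[2:])     # lone continuation byte falls back as well
```

Returning the bytes rather than a replacement character keeps the output lossless: a client can concatenate the byte values of adjacent pieces and decode them together.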
parent e6b7801bd1
commit 78203641fe
6 changed files with 139 additions and 6 deletions
@@ -105,6 +105,14 @@ Feature: llama.cpp server
     Given first token is removed
     Then  tokens can be detokenized
 
+  Scenario: Tokenize with pieces
+    When  tokenizing with pieces:
+    """
+    What is the capital of Germany?
+    媽
+    """
+    Then  tokens are given with pieces
+
   Scenario: Models available
     Given available models
     Then  1 models are supported
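As a rough illustration of what a "tokens are given with pieces" check might look like, here is a Python sketch. The response shape (a list of objects with `id` and `piece` fields), both helper names, and the token ids in the sample are assumptions made for this example, not taken from the commit.

```python
from typing import Any, Dict, List


def has_ids_and_pieces(tokens: List[Dict[str, Any]]) -> bool:
    """Check that every entry carries both a token id and a piece (assumed field names)."""
    return all(isinstance(t, dict) and "id" in t and "piece" in t for t in tokens)


def join_pieces(tokens: List[Dict[str, Any]]) -> str:
    """Rebuild the text from pieces that are either strings or lists of byte values."""
    buf = bytearray()
    for t in tokens:
        piece = t["piece"]
        if isinstance(piece, str):
            buf += piece.encode("utf-8")
        else:
            buf += bytes(piece)  # assumed byte-list fallback for partial UTF-8
    return buf.decode("utf-8")


# Hypothetical response for "媽" split across two tokens; the ids are made up.
sample = [
    {"id": 101, "piece": [229, 170]},
    {"id": 102, "piece": [189]},
]
ok = has_ids_and_pieces(sample)
text = join_pieces(sample)
```

Joining at the byte level before decoding is what makes the split character recoverable; decoding each piece separately would fail on both halves.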