# llama.cpp/examples/speculative

Demonstration of speculative decoding and tree-based speculative decoding techniques

More info:

- https://github.com/ggerganov/llama.cpp/pull/2926
- https://github.com/ggerganov/llama.cpp/pull/3624
- https://github.com/ggerganov/llama.cpp/pull/5625
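
As a rough illustration of the acceptance step commonly used in stochastic speculative decoding, the sketch below accepts a drafted token with probability `min(1, p/q)` and otherwise resamples from the residual `max(0, p - q)`, where `p` is the target model's distribution and `q` is the draft model's. This is a minimal standalone sketch and is not taken from this example's source: the function `speculative_accept` and its signature are hypothetical, and both distributions are assumed to be plain probability vectors over the same vocabulary.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper, not part of llama.cpp.
// p: target model probabilities, q: draft model probabilities (same vocabulary).
// Returns draft_token if it is accepted, otherwise a token drawn from the residual.
int speculative_accept(const std::vector<double> & p,
                       const std::vector<double> & q,
                       int draft_token,
                       std::mt19937 & rng) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    // Accept the drafted token with probability min(1, p/q).
    const double p_tok = p[draft_token];
    const double q_tok = q[draft_token];
    if (q_tok > 0.0 && unif(rng) < std::min(1.0, p_tok / q_tok)) {
        return draft_token;
    }

    // Rejected: build the residual weights max(0, p - q).
    std::vector<double> residual(p.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        residual[i] = std::max(0.0, p[i] - q[i]);
        sum += residual[i];
    }
    if (sum <= 0.0) {
        // p and q coincide up to rounding; fall back to sampling from p.
        residual = p;
    }

    // std::discrete_distribution normalizes the weights internally.
    std::discrete_distribution<int> dist(residual.begin(), residual.end());
    return dist(rng);
}
```

A caller would invoke this once per drafted token, in order, and stop verifying the draft at the first token that is not accepted. Passing a seeded `std::mt19937` keeps runs reproducible.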