* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew
* server: update refs -> llama-server
gitignore llama-server
* server: simplify nix package
* main: update refs -> llama
fix examples/main ref
* main/server: fix targets
* update more names
* Update build.yml
* rm accidentally checked in bins
* update straggling refs
* Update .gitignore
* Update server-llm.sh
* main: target name -> llama-cli
* Prefix all example bins w/ llama-
* fix main refs
* rename {main->llama}-cmake-pkg binary
* prefix more cmake targets w/ llama-
* add/fix gbnf-validator subfolder to cmake
* sort cmake example subdirs
* rm bin files
* fix llama-lookup-* Makefile rules
* gitignore /llama-*
* rename Dockerfiles
* rename llama|main -> llama-cli; consistent RPM bin prefixes
* fix some missing -cli suffixes
* rename dockerfile w/ llama-cli
* rename(make): llama-baby-llama
* update dockerfile refs
* more llama-cli(.exe)
* fix test-eval-callback
* rename: llama-cli-cmake-pkg(.exe)
* address gbnf-validator unused fread warning (switched to C++ / ifstream)
* add two missing llama- prefixes
* Updating docs for eval-callback binary to use new `llama-` prefix.
* Updating a few lingering doc references for rename of main to llama-cli
* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.
* Updating documentation references for lookup-merge and export-lora
* Updating two small `main` references missed earlier in the finetune docs.
* Update apps.nix
* update grammar/README.md w/ new llama-* names
* update llama-rpc-server bin name + doc
* Revert "update llama-rpc-server bin name + doc"
This reverts commit e474ef1df4.
* add hot topic notice to README.md
* Update README.md
* Update README.md
* rename gguf-split & quantize bins refs in **/tests.sh
---------
Co-authored-by: HanClinto <hanclinto@gmail.com>
# llama.cpp/example/batched

The example demonstrates batched generation from a given prompt.

```bash
./llama-batched -m ./models/llama-7b-v2/ggml-model-f16.gguf -p "Hello my name is" -np 4

...

main: n_len = 32, n_ctx = 2048, n_parallel = 4, n_kv_req = 113

 Hello my name is

main: generating 4 sequences ...

main: stream 0 finished
main: stream 1 finished
main: stream 2 finished
main: stream 3 finished

sequence 0:

Hello my name is Shirley. I am a 25-year-old female who has been working for over 5 years as a b

sequence 1:

Hello my name is Renee and I'm a 32 year old female from the United States. I'm looking for a man between

sequence 2:

Hello my name is Diana. I am looking for a housekeeping job. I have experience with children and have my own transportation. I am

sequence 3:

Hello my name is Cody. I am a 3 year old neutered male. I am a very friendly cat. I am very playful and

main: decoded 108 tokens in 3.57 s, speed: 30.26 t/s

llama_print_timings:        load time =   587.00 ms
llama_print_timings:      sample time =     2.56 ms /   112 runs   (    0.02 ms per token, 43664.72 tokens per second)
llama_print_timings: prompt eval time =  4089.11 ms /   118 tokens (   34.65 ms per token,    28.86 tokens per second)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =  4156.04 ms
```
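
The `n_kv_req` and `decoded` figures in the log can be cross-checked with a little arithmetic. The sketch below rests on two assumptions that this README does not state: the prompt is stored once in the KV cache and shared by all `-np` sequences, each of which then generates up to `n_len` total tokens, and the prompt tokenizes to 5 tokens. Both are consistent with the reported `n_kv_req = 113` and `decoded 108 tokens`. The function names are hypothetical helpers, not part of llama.cpp.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking the log; not part of llama.cpp.

# KV cache cells needed: the prompt is stored once, plus the generated
# tokens for every parallel sequence (assumed formula).
kv_cells_required() {
    local n_prompt=$1 n_len=$2 n_parallel=$3
    echo $(( n_prompt + (n_len - n_prompt) * n_parallel ))
}

# Upper bound on the number of generated tokens across all sequences.
max_decoded_tokens() {
    local n_prompt=$1 n_len=$2 n_parallel=$3
    echo $(( (n_len - n_prompt) * n_parallel ))
}

# Values taken from the run above; n_prompt=5 is an assumption.
kv_cells_required 5 32 4    # compare with n_kv_req in the log
max_decoded_tokens 5 32 4   # compare with "decoded ... tokens" in the log
```

Under these assumptions the required cell count grows linearly with `-np`, so it is worth checking against `n_ctx` before raising the number of parallel sequences. A sequence that ends early would decode fewer tokens than the upper bound.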