GBNF Guide
GBNF (GGML BNF) is a format for defining formal grammars to constrain model outputs in llama.cpp. For example, you can use it to force the model to generate valid JSON, or speak only in emojis. GBNF grammars are supported in various ways in examples/main and examples/server.
Background
Backus-Naur Form (BNF) is a notation for describing the syntax of formal languages like programming languages, file formats, and protocols. GBNF is an extension of BNF that primarily adds a few modern regex-like features.
Basics
In GBNF, we define production rules that specify how a non-terminal (rule name) can be replaced with sequences of terminals (characters, specifically Unicode code points) and other non-terminals. The basic format of a production rule is nonterminal ::= sequence....
Example
Before going deeper, let's look at some of the features demonstrated in grammars/chess.gbnf, a small chess notation grammar:
# `root` specifies the pattern for the overall output
root ::= (
    # it must start with the characters "1. " followed by a sequence
    # of characters that match the `move` rule, followed by a space, followed
    # by another move, and then a newline
    "1. " move " " move "\n"
    # it's followed by one or more subsequent moves, numbered with one or two digits
    ([1-9] [0-9]? ". " move " " move "\n")+
)
# `move` is an abstract representation, which can be a pawn, nonpawn, or castle.
# The `[+#]?` denotes the possibility of checking or mate signs after moves
move ::= (pawn | nonpawn | castle) [+#]?
pawn ::= ...
nonpawn ::= ...
castle ::= ...
Non-Terminals and Terminals
Non-terminal symbols (rule names) stand for a pattern of terminals and other non-terminals. They are required to be a dashed lowercase word, like move, castle, or check-mate.
Terminals are actual characters (code points). They can be specified as a sequence like "1" or "O-O" or as ranges like [1-9] or [NBKQR].
Characters and character ranges
Terminals support the full range of Unicode. Unicode characters can be specified directly in the grammar, for example hiragana ::= [ぁ-ゟ], or with escapes: 8-bit (\xXX), 16-bit (\uXXXX) or 32-bit (\UXXXXXXXX).
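As a rough illustration of the three escape widths, here is a minimal Python sketch that picks the shortest escape for a given character. The helper name `gbnf_escape` is hypothetical and not part of llama.cpp, and emitting uppercase hex digits is simply a choice made for this example.

```python
def gbnf_escape(ch: str) -> str:
    """Return an escape of the form \\xXX, \\uXXXX or \\UXXXXXXXX for one character."""
    cp = ord(ch)  # Unicode code point of the character
    if cp <= 0xFF:
        return f"\\x{cp:02X}"   # 8-bit form
    if cp <= 0xFFFF:
        return f"\\u{cp:04X}"   # 16-bit form
    return f"\\U{cp:08X}"       # 32-bit form


print(gbnf_escape("A"))   # \x41
print(gbnf_escape("ぁ"))  # \u3041
print(gbnf_escape("😀"))  # \U0001F600
```

With this helper, the hiragana range above could also be spelled with escapes for its two endpoints instead of the literal characters.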
Character ranges can be negated with ^:
single-line ::= [^\n]+ "\n"
Sequences and Alternatives
The order of symbols in a sequence matters. For example, in "1. " move " " move "\n", the "1. " must come before the first move, etc.
Alternatives, denoted by |, give different sequences that are acceptable. For example, in move ::= pawn | nonpawn | castle, move can be a pawn move, a nonpawn move, or a castle.
Parentheses () can be used to group sequences, which allows for embedding alternatives in a larger rule or applying repetition and optional symbols (below) to a sequence.
Repetition and Optional Symbols
- * after a symbol or sequence means that it can be repeated zero or more times (equivalent to {0,}).
- + denotes that the symbol or sequence should appear one or more times (equivalent to {1,}).
- ? makes the preceding symbol or sequence optional (equivalent to {0,1}).
- {m} repeats the preceding symbol or sequence exactly m times.
- {m,} repeats the preceding symbol or sequence at least m times.
- {m,n} repeats the preceding symbol or sequence between m and n times (inclusive).
- {0,n} repeats the preceding symbol or sequence at most n times (inclusive).
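These operators use the same notation as regex quantifiers, so the equivalences listed above can be checked by analogy with Python's `re` module. This is only an analogy: it exercises Python's regex engine, not llama.cpp's grammar parser, and the helper `same_language` is a hypothetical name used for illustration.

```python
import re


def same_language(pattern_a: str, pattern_b: str, samples: list) -> bool:
    """True if both patterns accept exactly the same strings among `samples`."""
    return all(
        bool(re.fullmatch(pattern_a, s)) == bool(re.fullmatch(pattern_b, s))
        for s in samples
    )


samples = ["", "a", "aa", "aaa", "aaaa"]

print(same_language("a*", "a{0,}", samples))   # shorthand vs. brace form
print(same_language("a+", "a{1,}", samples))
print(same_language("a?", "a{0,1}", samples))

# {m,n} is inclusive on both ends: two or three repetitions, nothing else.
print([s for s in samples if re.fullmatch("a{2,3}", s)])
```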
Comments and newlines
Comments can be specified with #:
# defines optional whitespace
ws ::= [ \t\n]+
Newlines are allowed between rules and between symbols or sequences nested inside parentheses. Additionally, a newline after an alternate marker | will continue the current rule, even outside of parentheses.
The root rule
In a full grammar, the root rule always defines the starting point of the grammar. In other words, it specifies what the entire output must match.
# a grammar for lists
root ::= ("- " item)+
item ::= [^\n]+ "\n"
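To get a feel for what "the entire output must match" means, the list grammar above can be approximated with a regular expression and checked in Python. This is a sketch by analogy only: the regex is a hand-written stand-in for the two rules, not something llama.cpp generates, and `matches_list_grammar` is a hypothetical helper.

```python
import re

# Regex stand-in for:  root ::= ("- " item)+   item ::= [^\n]+ "\n"
LIST_RE = re.compile(r"(?:- [^\n]+\n)+")


def matches_list_grammar(text: str) -> bool:
    """True if the whole string is one or more '- item' lines, each ending in a newline."""
    return LIST_RE.fullmatch(text) is not None


print(matches_list_grammar("- apples\n- pears\n"))  # every line is an item
print(matches_list_grammar("- apples"))             # missing the trailing newline
print(matches_list_grammar("apples\n"))             # missing the "- " prefix
```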
Next steps
This guide provides a brief overview. Check out the GBNF files in this directory (grammars/) for examples of full grammars. You can try them out with:
./llama-cli -m <model> --grammar-file grammars/some-grammar.gbnf -p 'Some prompt'
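If you drive the command from a script, the same invocation can be assembled in Python. The sketch below only builds and prints the argument list. The model path, grammar file, and prompt are placeholder values, and `llama_cli_command` is a hypothetical helper; the flags themselves are the ones shown in the command above.

```python
import shlex


def llama_cli_command(model_path: str, grammar_file: str, prompt: str) -> list:
    """Build the argv for a grammar-constrained llama-cli run."""
    return [
        "./llama-cli",
        "-m", model_path,
        "--grammar-file", grammar_file,
        "-p", prompt,
    ]


cmd = llama_cli_command("models/model.gguf", "grammars/json.gbnf", "Describe a cat in JSON:")
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute it, assuming the binary and files exist at those paths.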
llama.cpp can also convert JSON schemas to grammars either ahead of time or at each request, see below.
Troubleshooting
Grammars currently have performance gotchas (see https://github.com/ggerganov/llama.cpp/issues/4218).
Efficient optional repetitions
A common pattern is to allow repetitions of a pattern x up to N times.
While semantically correct, the syntax x? x? x? ... x? (with N repetitions) may result in extremely slow sampling. Instead, you can write x{0,N} (or (x (x (x ... (x)?...)?)?)? with N-deep nesting in earlier llama.cpp versions).
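When grammars are generated programmatically, both forms are easy to emit. The following Python sketch builds the compact `x{0,N}` form and the nested fallback. The helper names are hypothetical and nothing here is part of llama.cpp.

```python
def bounded_repeat(symbol: str, n: int) -> str:
    """Compact form: symbol{0,n}."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return f"{symbol}{{0,{n}}}"


def nested_optional(symbol: str, n: int) -> str:
    """Fallback form with n-deep nesting: (x (x (x)?)?)?"""
    if n < 1:
        raise ValueError("n must be at least 1")
    out = f"({symbol})?"          # innermost optional occurrence
    for _ in range(n - 1):
        out = f"({symbol} {out})?"  # wrap one more optional level around it
    return out


print(bounded_repeat("x", 3))   # x{0,3}
print(nested_optional("x", 3))  # (x (x (x)?)?)?
```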
Using GBNF grammars
You can use GBNF grammars:
- In llama-server's completion endpoints, passed as the grammar body field
- In llama-cli, passed as the --grammar & --grammar-file flags
- With the llama-gbnf-validator tool, to test them against strings.
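For the server case, a request carrying a grammar might be built as in the Python sketch below. Several details are assumptions not stated above: the `/completion` path, the `prompt` field, and the `http://localhost:8080` address should be checked against your llama-server version and configuration. Only the `grammar` body field comes from the list above. The request is constructed but not sent.

```python
import json
import urllib.request

# The list grammar from "The root rule", as a single string.
LIST_GRAMMAR = 'root ::= ("- " item)+\nitem ::= [^\\n]+ "\\n"\n'


def completion_request(base_url: str, prompt: str, grammar: str) -> urllib.request.Request:
    """Build (but do not send) a POST request with the grammar in the JSON body."""
    body = {"prompt": prompt, "grammar": grammar}  # "prompt" is an assumed field name
    return urllib.request.Request(
        f"{base_url}/completion",  # assumed endpoint path
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = completion_request("http://localhost:8080", "A shopping list:\n", LIST_GRAMMAR)
print(req.full_url)
print(json.loads(req.data)["grammar"])
```

Against a running server, `urllib.request.urlopen(req)` would send it.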
JSON Schemas → GBNF
llama.cpp supports converting a subset of https://json-schema.org/ to GBNF grammars:
- In llama-server:
  - For any completion endpoints, passed as the json_schema body field
  - For the /chat/completions endpoint, passed inside the response_format body field (e.g. {"type": "json_object", "schema": {"items": {}}})
- In llama-cli, passed as the --json / -j flag
- To convert to a grammar ahead of time:
  - in CLI, with examples/json_schema_to_grammar.py
  - in JavaScript with json-schema-to-grammar.mjs (this is used by the server's Web UI)
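As a small sketch of the per-request route, the Python snippet below builds a completion body that carries a schema in the json_schema field. The schema and the `prompt` field name are illustrative assumptions, and `completion_body` is a hypothetical helper. The schema avoids the features listed as unsupported in this guide.

```python
import json

# Illustrative schema: an object with a string and an integer property.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}


def completion_body(prompt: str, json_schema: dict) -> str:
    """Serialize a request body with the schema in the json_schema field."""
    return json.dumps({"prompt": prompt, "json_schema": json_schema})


body = completion_body("Describe a person as JSON:", PERSON_SCHEMA)
print(body)
```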
 
Take a look at tests to see which features are likely supported (you'll also find usage examples in https://github.com/ggerganov/llama.cpp/pull/5978, https://github.com/ggerganov/llama.cpp/pull/6659 & https://github.com/ggerganov/llama.cpp/pull/6555).
Here is also a non-exhaustive list of unsupported features:
- additionalProperties: to be fixed in https://github.com/ggerganov/llama.cpp/pull/7840
- minimum, exclusiveMinimum, maximum, exclusiveMaximum integer constraints: to be implemented in https://github.com/ggerganov/llama.cpp/pull/7797
- Remote $refs in the C++ version (Python & JavaScript versions fetch https refs)
- Mixing properties w/ anyOf / oneOf in the same type (https://github.com/ggerganov/llama.cpp/issues/7703)
- string formats uri, email
- contains / minContains
- uniqueItems
- $anchor (cf. dereferencing)
- not
- Conditionals if / then / else / dependentSchemas
- patternProperties