ochafik
d6483a9c07
add min/max constrained int field to pydantic json schema example
2024-06-10 02:00:04 +01:00
ochafik
cad377d3a1
add C++11-compatible replacement for std::string_view
2024-06-09 19:35:36 +01:00
ochafik
d1f679125f
Update test-grammar-integration.cpp
2024-06-09 13:22:41 +01:00
Olivier Chafik
dcc27d1a93
fix min in [1, 9]
2024-06-09 09:42:19 +01:00
Olivier Chafik
a0f19047af
nit: move + rename _build_min_max_int
2024-06-08 21:46:18 +01:00
Olivier Chafik
e93368076b
json: port min/max integer support to Python & JS
2024-06-08 21:33:24 +01:00
Olivier Chafik
3549702da7
json: nit: move string rules together
2024-06-08 21:21:35 +01:00
Olivier Chafik
ac2a8f8930
Update test-grammar-integration.cpp
2024-06-08 20:39:42 +01:00
Olivier Chafik
4c1c29361e
json: fix negative min (w/ more than 1 digit)
2024-06-08 20:33:40 +01:00
Olivier Chafik
931b543607
json: fix negative max
2024-06-08 20:33:12 +01:00
Olivier Chafik
a786c0381b
Merge remote-tracking branch 'origin/master' into json-bounds2
2024-06-08 20:05:17 +01:00
sasha0552
7a16ce7db2
server : smart slot selection using Longest Common Prefix ( #7728 )
...
* server : Smart selection of available slot using Longest Common Substring
* add usage
* remove trailing whitespaces
* Use Longest Common Prefix (LCP) instead of LCS
* Rename argument
2024-06-08 10:50:31 +03:00
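Note on the slot selection commit above: the idea of choosing a slot by Longest Common Prefix can be sketched as follows. This is a minimal illustration, not the server's code. The names `common_prefix_len` and `pick_slot`, and the representation of a slot as a list of cached token ids (or `None` when busy), are assumptions made for the example.

```python
from typing import List, Optional, Sequence


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading tokens a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_slot(slot_caches: List[Optional[List[int]]], prompt: List[int]) -> int:
    """Pick the free slot whose cached tokens share the longest prefix with prompt.

    slot_caches[i] holds the cached token ids of slot i, or None if the slot
    is busy (hypothetical representation). Returns -1 when no slot is free.
    """
    best, best_len = -1, -1
    for i, cache in enumerate(slot_caches):
        if cache is None:  # skip busy slots
            continue
        n = common_prefix_len(cache, prompt)
        if n > best_len:   # ties keep the earliest slot
            best, best_len = i, n
    return best
```

A prefix match is the natural measure if cached state can only be reused from the start of a sequence, which is presumably why the commit switched from Longest Common Substring to LCP.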
slaren
da799b4189
vulkan : reuse parent extra for views ( #7806 )
...
* vulkan : reuse parent extra for views
* Fix validation error when multiple compute contexts are used in a graph
---------
Co-authored-by: 0cc4m <picard12@live.de>
2024-06-07 19:47:49 +02:00
Christian Zhou-Zheng
c00fad71e5
gguf-split : change binary multi-byte units to decimal ( #7803 )
2024-06-07 15:56:01 +03:00
intelmatt
27615f5ab2
cmake : fix BUILD_SHARED_LIBS=ON build ( #7784 )
...
common depends on pthreads in Linux
2024-06-07 15:15:07 +03:00
Johannes Gäßler
7027b27d76
server: update cache_prompt documentation [no ci] ( #7745 )
2024-06-07 11:15:49 +02:00
woodx
a5cabd7649
server : do not get prompt in infill mode ( #7286 )
...
* avoid getting the prompt in infill mode and embedding mode
* remove embedding mode
* refactor format
---------
Co-authored-by: wudexiang <wudexiang@bytedance.com>
2024-06-07 10:09:45 +03:00
pengxin99
d5c938cd77
[SYCL] fix wrong softmax r2r result ( #7811 )
2024-06-07 14:28:26 +08:00
slaren
c9ee7118d5
check for nans in imatrix and quantize ( #7807 )
...
* imatrix : detect nan/inf values
* quantize : check imatrix for nan/inf values
2024-06-07 09:01:29 +03:00
Georgi Gerganov
ee459f40f6
server : fix --threads-http arg ( #7801 )
2024-06-06 19:19:59 +03:00
Georgi Gerganov
f83351f9a6
imatrix : migrate to gpt_params ( #7771 )
...
* imatrix : migrate to gpt_params
ggml-ci
* imatrix : add --save-frequency cli arg
* common : fix --no-ppl
2024-06-06 16:30:58 +03:00
Clint Herron
ad675e1c67
Added support for . (any character) token in grammar engine. ( #6467 )
...
* Added support for . (any character) token in grammar engine.
* Add integration tests for any-character symbol.
2024-06-06 06:08:52 -07:00
Mattheus Chediak
a143c04375
README minor fixes ( #7798 ) [no ci]
...
derievatives --> derivatives
2024-06-06 22:17:54 +10:00
ochafik
b6b6a6caee
Update json-schema-to-grammar.cpp
2024-06-06 10:14:28 +01:00
ochafik
431edb8e7b
json: fix bounds tests
2024-06-06 10:14:28 +01:00
ochafik
5a86c6f0e2
json: integration test for schemas
2024-06-06 10:14:28 +01:00
ochafik
f8db47814b
json: proper paren fix
2024-06-06 10:14:28 +01:00
ochafik
a381deb1b6
json: fix missing paren min/max bug
2024-06-06 10:14:28 +01:00
ochafik
af63f4fb27
json: handle negative min / max integer bounds
2024-06-06 10:14:28 +01:00
ochafik
c37c484029
json: min + max integer constraints
2024-06-06 10:14:28 +01:00
ochafik
d69ccb06a4
json: fix min 0
2024-06-06 10:14:28 +01:00
ochafik
057bbdc1f3
json: support minimum for positive integer values
2024-06-06 10:14:28 +01:00
Olivier Chafik
55b2d0849d
grammars: x{min,max} repetition operator ( #6640 )
...
* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates
* grammars: handle `x{n}` and fix `x{n,n}`
* grammars: document new repetition operators
* grammars: uniform use of int for min & max
* grammars: refactor parser test
* grammar: parsing tests w/ natural pretty print of updated expectations
* grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all)
* grammars: improve test pretty print again
* grammars: pretty print rules and chars
* grammars: fix copy rule skipping
* grammars: disallow `a{,}` (not allowed in regexps)
* Update common/grammar-parser.cpp
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* grammars: fix copy rule skipping (again) & display of expectations
* grammars: more test cases
* grammars: update reps parsing to bring ? / * / + closer to before
* json: use new GBNF repetitions{m,n} syntax
* grammars: update performance gotchas w/ repetition advice
* Update examples/json_schema_to_grammar.py
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* Update examples/server/public/json-schema-to-grammar.mjs
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* grammars: comment on rule repetitions
* grammars: ensure unambiguous number alternatives
* grammar: nit typo switched error msgs
* grammar: nit numbering in comment
* json: update numeric rule to be unambiguous
* Apply suggestions from code review
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* Update examples/server/public/json-schema-to-grammar.mjs
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* json: fix integral-part
* grammar: add repetition tests
---------
Co-authored-by: Clint Herron <hanclinto@gmail.com>
2024-06-06 10:07:06 +01:00
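Note on the repetition operator commit above: one way to read `x{min,max}` is as `min` mandatory copies of `x` followed by nested optional copies up to `max`. The sketch below prints that expansion as text. It is an illustration under assumptions, not the parser's actual rewrite: `expand_repetition` is a hypothetical helper, `item` is assumed to be a single symbol or an already parenthesized group, and `max_count=None` stands for an open upper bound.

```python
from typing import Optional


def expand_repetition(item: str, min_count: int, max_count: Optional[int]) -> str:
    """Expand item{min_count,max_count} using only sequence, ?, and * operators."""
    if min_count < 0 or (max_count is not None and max_count < min_count):
        raise ValueError("invalid repetition bounds")

    # Mandatory part: min_count plain copies of the item.
    parts = [item] * min_count

    if max_count is None:
        # Open upper bound: any number of further copies.
        parts.append(f"{item}*")
    else:
        # Bounded part: nest the optionals so each extra copy is only
        # reachable after the previous one was matched.
        optional = ""
        for _ in range(max_count - min_count):
            optional = f"({item} {optional})?" if optional else f"({item})?"
        if optional:
            parts.append(optional)

    return " ".join(parts)


print(expand_repetition("x", 2, 4))     # x x (x (x)?)?
print(expand_repetition("x", 1, None))  # x x*
```

Nesting the optionals, rather than writing `x? x?` side by side, leaves exactly one way to match each repetition count.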
Joan Fontanals
f5d7b268ec
llama : add jina v2 base code ( #7596 )
...
* feat: add changes to handle jina v2 base code
* fix: do not complicate things
* fix: fix the usage of the code model
* fix: fix comments
* fix: fix linting issues
* fix: remove ollama patches
* style : minor
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-06 10:22:41 +03:00
slaren
2d08b7fbb4
docker : build only main and server in their images ( #7782 )
...
* add openmp lib to dockerfiles
* build only main and server in their docker images
2024-06-06 08:19:49 +03:00
slaren
d67caea0d6
docker : add openmp lib ( #7780 )
2024-06-06 08:17:21 +03:00
Galunid
7672adeec7
Fix encoding in python scripts ( #7733 )
2024-06-06 03:07:24 +10:00
Johannes Gäßler
7d1a378b8f
CUDA: refactor mmq, dmmv, mmvq ( #7716 )
...
* CUDA: refactor mmq, dmmv, mmvq
* fix out-of-bounds write
* struct for qk, qr, qi
* fix cmake build
* mmq_type_traits
2024-06-05 16:53:00 +02:00
Georgi Gerganov
2b3389677a
ggml : refactor rope norm/neox ( #7634 )
...
* ggml : unify rope norm/neox (CPU)
* ggml : fix compile warning
* ggml : remove GLM rope mode
ggml-ci
* metal : better rope implementation
ggml-ci
* cuda : better rope implementation
ggml-ci
* naming : n_orig_ctx -> n_ctx_orig
ggml-ci
* dev : add reminders to update backends
ggml-ci
* vulkan : fix ggml_rope_ext() usage
* cuda : fix array size + indents
ggml-ci
2024-06-05 11:29:20 +03:00
arch-btw
9973e81c5c
readme : remove -ins ( #7759 )
...
-ins and --instruct were moved in https://github.com/ggerganov/llama.cpp/pull/7675
I have adjusted the README accordingly.
There was no trace of --chatml in the README.
2024-06-05 09:40:49 +03:00
jaime-m-p
c90dbe026b
Fix per token attributes bits ( #7749 )
2024-06-05 01:26:14 +02:00
agray3
b90dc566c1
Allow number of nodes in CUDA graph to change ( #7738 )
...
Previously the code would have failed to cope in the case that the
number of nodes changes in an existing CUDA graph. This fixes the
issue by removing an unnecessary conditional.
2024-06-04 22:06:49 +02:00
Georgi Gerganov
1442677f92
common : refactor cli arg parsing ( #7675 )
...
* common : gpt_params_parse do not print usage
* common : rework usage print (wip)
* common : valign
* common : rework print_usage
* infill : remove cfg support
* common : reorder args
* server : deduplicate parameters
ggml-ci
* common : add missing header
ggml-ci
* common : remove --random-prompt usages
ggml-ci
* examples : migrate to gpt_params
ggml-ci
* batched-bench : migrate to gpt_params
* retrieval : migrate to gpt_params
* common : change defaults for escape and n_ctx
* common : remove chatml and instruct params
ggml-ci
* common : passkey use gpt_params
2024-06-04 21:23:39 +03:00
Georgi Gerganov
554c247caf
ggml : remove OpenCL ( #7735 )
...
ggml-ci
2024-06-04 21:23:20 +03:00
Georgi Gerganov
0cd6bd3483
llama : remove beam search ( #7736 )
2024-06-04 21:23:05 +03:00
Georgi Gerganov
5ca0944a15
readme : remove obsolete Zig instructions ( #7471 )
2024-06-04 19:43:01 +03:00
slaren
adc9ff3841
llama-bench : allow using a different printer for stderr with -oe ( #7722 )
...
compare-commits.sh : hide stdout, use -oe to print markdown
2024-06-04 14:32:42 +02:00
Daniele
987d743d6b
Improve hipBLAS support in CMake ( #7696 )
...
* Improve hipBLAS support in CMake
This improves the detection of the correct CMAKE_PREFIX_PATH when using different distributions or a self-built ROCm SDK.
* Set ROCM_PATH correctly
2024-06-04 14:09:15 +02:00
zhouwg
b226c1227b
refine .gitignore ( #7688 )
...
This adds tags and android ndk into the git ignore list
2024-06-04 21:21:26 +10:00
jaime-m-p
3b38d48609
Per token attributes ( #7685 )
...
* Add per token attributes enum
* Using phi-3 for testing 'rstrip'
* Using jina-v2 for testing 'lstrip'
* Brute force test for 'lstrip' and 'rstrip'
* Implement 'rstrip' and 'lstrip'
* Update phi-3 GGUF file (obsolete since 917dc8c)
* Replace llama_token_type with llama_token_attribs
2024-06-04 09:17:17 +02:00