Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								17880771ad 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-10-04 18:50:25 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1bb8a64ebf 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-10-03 21:17:49 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c83ad6d01e 
								
							 
						 
						
							
							
								
								ggml-backend : add device and backend reg interfaces (#9707)
							
							
							
							
							Co-authored-by: Johannes Gäßler <johannesg@5d6.de> 
							
						 
						
							2024-10-03 01:49:47 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f1b8c42711 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-10-01 16:09:42 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d0b1d663e4 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-09-29 21:16:07 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								bb5f819975 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-09-24 11:01:18 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
							
							
								
							
							
								4301535326 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-09-20 21:15:05 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0d2f22e45c 
								
							 
						 
						
							
							
								
								scripts : verify py deps at the start of compare (#9520)
							
							
							
						 
						
							2024-09-18 18:34:32 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
							
							
								
							
							
								385decbd63 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-09-08 11:05:55 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
							
							
								
							
							
								60a3107ccd 
								
							 
						 
						
							
							
								
								scripts : option to increase git patch context  
							
							
							
						 
						
							2024-09-08 11:05:55 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
							
							
								
							
							
								231cff5f6f 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-08-27 22:41:27 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4305b57c80 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-08-09 10:03:48 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								afd27f01fe 
								
							 
						 
						
							
							
								
								scripts : sync cann files (#0)
							
							
							
						 
						
							2024-08-08 14:56:52 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								366d486c16 
								
							 
						 
						
							
							
								
								scripts : fix sync filenames (#0)
							
							
							
						 
						
							2024-08-08 14:40:12 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e44a561ab0 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-08-08 13:19:47 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
							
							
								
							
							
								5587e57a76 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-08-05 08:50:57 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5e2727fe03 
								
							 
						 
						
							
							
								
								scripts : sync vulkan-shaders (#0)
							
							
							
						 
						
							2024-07-27 18:08:47 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								56f20aa25d 
								
							 
						 
						
							
							
								
								scripts : sync ggml-aarch64 sources  
							
							
							
						 
						
							2024-07-27 18:07:33 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
							
							
								
							
							
								ae7985cd7b 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-07-27 17:43:44 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3f2d538b81 
								
							 
						 
						
							
							
								
								scripts : fix sync for sycl  
							
							
							
						 
						
							2024-07-08 13:51:31 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
							
							
								
							
							
								2ee44c9a18 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-07-08 12:23:00 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									compilade 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3fd62a6b1c 
								
							 
						 
						
							
							
								
								py : type-check all Python scripts with Pyright (#8341)
							
							
							
							
							* py : type-check all Python scripts with Pyright
* server-tests : use trailing slash in openai base_url
* server-tests : add more type annotations
* server-tests : strip "chat" from base_url in oai_chat_completions
* server-tests : model metadata is a dict
* ci : disable pip cache in type-check workflow
The cache is not shared between branches, and it's 250MB in size,
so it would become quite a big part of the 10GB cache limit of the repo.
* py : fix new type errors from master branch
* tests : fix test-tokenizer-random.py
Apparently, gcc applies optimisations even when pre-processing,
which confuses pycparser.
* ci : only show warnings and errors in python type-check
The "information" level otherwise has entries
from 'examples/pydantic_models_to_grammar.py',
which could be confusing for someone trying to figure out what failed,
considering that these messages can safely be ignored
even though they look like errors. 
							
						 
						
							2024-07-07 15:04:39 -04:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e235b267a2 
								
							 
						 
						
							
							
								
								py : switch to snake_case (#8305)
							
							
							
							
							* py : switch to snake_case
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* cont : fix link
* gguf-py : use snake_case in scripts entrypoint export
* py : rename requirements for convert_legacy_llama.py
Needed for scripts/check-requirements.sh
---------
Co-authored-by: Francis Couture-Harpin <git@compilade.net> 
							
						 
						
							2024-07-05 07:53:33 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									ditsuke 
								
							 
						 
						
							
							
							
							
								
							
							
								821922916f 
								
							 
						 
						
							
							
								
								fix: Update script paths in CI scripts  
							
							
							
						 
						
							2024-07-04 15:39:13 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Clint Herron 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								07a3fc0608 
								
							 
						 
						
							
							
								
								Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258)
							
							
							
						 
						
							2024-07-02 12:18:10 -04:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c70d117c37 
								
							 
						 
						
							
							
								
								scripts : fix filename sync  
							
							
							
						 
						
							2024-06-26 23:25:22 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f2d48fffde 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-06-26 19:39:19 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f3f65429c4 
								
							 
						 
						
							
							
								
								llama : reorganize source code + improve CMake (#8006)
							
							
							
							
							* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122)
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com> 
							
						 
						
							2024-06-26 18:33:02 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									jaime-m-p 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								37bef89433 
								
							 
						 
						
							
							
								
								tokenizer : BPE fixes (#7530)
							
							
							
							
							* Random test: add_bos_token, add_eos_token
* Random test: add BPE models for testing
* Custom regex split fails with codepoint 0
* Fix falcon punctuation regex
* Refactor llm_tokenizer_bpe: move code to constructor
* Move 'add_special_bos/eos' logic to llm_tokenizer_bpe
* Move tokenizer flags to vocab structure.
* Default values for special_add_bos/eos
* Build vocab.special_tokens_cache using vocab token types
* Generalize 'jina-v2' per token attributes
* Fix unicode whitespaces (deepseek-coder, deepseek-llm)
* Skip missing byte tokens (falcon)
* Better unicode data generation
* Replace char32_t with uint32_t 
							
						 
						
							2024-06-18 18:40:52 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5326bcceeb 
								
							 
						 
						
							
							
								
								ggml : sync  
							
							
							
						 
						
							2024-06-18 09:50:45 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1c641e6aac 
								
							 
						 
						
							
							
								
								build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)
							
							
							
							
							* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew
* server: update refs -> llama-server
gitignore llama-server
* server: simplify nix package
* main: update refs -> llama
fix examples/main ref
* main/server: fix targets
* update more names
* Update build.yml
* rm accidentally checked in bins
* update straggling refs
* Update .gitignore
* Update server-llm.sh
* main: target name -> llama-cli
* Prefix all example bins w/ llama-
* fix main refs
* rename {main->llama}-cmake-pkg binary
* prefix more cmake targets w/ llama-
* add/fix gbnf-validator subfolder to cmake
* sort cmake example subdirs
* rm bin files
* fix llama-lookup-* Makefile rules
* gitignore /llama-*
* rename Dockerfiles
* rename llama|main -> llama-cli; consistent RPM bin prefixes
* fix some missing -cli suffixes
* rename dockerfile w/ llama-cli
* rename(make): llama-baby-llama
* update dockerfile refs
* more llama-cli(.exe)
* fix test-eval-callback
* rename: llama-cli-cmake-pkg(.exe)
* address gbnf-validator unused fread warning (switched to C++ / ifstream)
* add two missing llama- prefixes
* Updating docs for eval-callback binary to use new `llama-` prefix.
* Updating a few lingering doc references for rename of main to llama-cli
* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.
* Updating documentation references for lookup-merge and export-lora
* Updating two small `main` references missed earlier in the finetune docs.
* Update apps.nix
* update grammar/README.md w/ new llama-* names
* update llama-rpc-server bin name + doc
* Revert "update llama-rpc-server bin name + doc"
This reverts commit e474ef1df4 
							
						 
						
							2024-06-13 00:41:52 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1442677f92 
								
							 
						 
						
							
							
								
								common : refactor cli arg parsing (#7675)
							
							
							
							
							* common : gpt_params_parse do not print usage
* common : rework usage print (wip)
* common : valign
* common : rework print_usage
* infill : remove cfg support
* common : reorder args
* server : deduplicate parameters
ggml-ci
* common : add missing header
ggml-ci
* common : remove --random-prompt usages
ggml-ci
* examples : migrate to gpt_params
ggml-ci
* batched-bench : migrate to gpt_params
* retrieval : migrate to gpt_params
* common : change defaults for escape and n_ctx
* common : remove chatml and instruct params
ggml-ci
* common : passkey use gpt_params 
							
						 
						
							2024-06-04 21:23:39 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								554c247caf 
								
							 
						 
						
							
							
								
								ggml : remove OpenCL (#7735)
							
							
							
							
							ggml-ci 
							
						 
						
							2024-06-04 21:23:20 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								adc9ff3841 
								
							 
						 
						
							
							
								
								llama-bench : allow using a different printer for stderr with -oe (#7722)
							
							
							
							
							compare-commits.sh : hide stdout, use -oe to print markdown 
							
						 
						
							2024-06-04 14:32:42 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c8047d538f 
								
							 
						 
						
							
							
								
								scripts: update compare_llama_bench.py [no ci] (#7673)
							
							
							
						 
						
							2024-05-31 16:26:21 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Galunid 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9c4c9cc83f 
								
							 
						 
						
							
							
								
								Move convert.py to examples/convert-legacy-llama.py (#7430)
							
							
							
							
							* Move convert.py to examples/convert-no-torch.py
* Fix CI, scripts, readme files
* convert-no-torch -> convert-legacy-llama
* Move vocab thing to vocab.py
* Fix convert-no-torch -> convert-legacy-llama
* Fix lost convert.py in ci/run.sh
* Fix imports
* Fix gguf not imported correctly
* Fix flake8 complaints
* Fix check-requirements.sh
* Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE
* Review fixes 
							
						 
						
							2024-05-30 21:40:00 +10:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								00281b7be3 
								
							 
						 
						
							
							
								
								scripts : remove mpi remnants  
							
							
							
						 
						
							2024-05-29 14:31:18 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2ab977282b 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-05-29 14:29:52 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d359f30921 
								
							 
						 
						
							
							
								
								llama : remove MPI backend (#7395)
							
							
							
						 
						
							2024-05-20 01:17:03 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									jaime-m-p 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b43272afa2 
								
							 
						 
						
							
							
								
								Unicode codepoint flags for custom regexs (#7245)
							
							
							
							
							* Replace CODEPOINT_TYPE_* with codepoint_flags
* Update and bugfix brute force random test
* Deterministic brute force random test
* Unicode normalization NFD
* Get rid of BOM 
							
						 
						
							2024-05-18 01:09:13 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Brian 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								51e9d02599 
								
							 
						 
						
							
							
								
								Added a single test function script and fix debug-test.sh to be more robust (#7279)
							
							
							
							
							* run-single-test.sh: added a single test function script and fix debug-test.sh to be more robust
* debug-test.sh: combined execute and gdb test mode via -g flag
* debug-test.sh: refactor
* debug-test: refactor for clarity
* debug-test.sh: comment style changes
* debug-test.sh: fix gdb 
							
						 
						
							2024-05-17 22:40:14 +10:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								29499bb593 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-05-15 13:23:41 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9f773486ab 
								
							 
						 
						
							
							
								
								script : sync ggml-rpc  
							
							
							
						 
						
							2024-05-14 19:14:38 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
							
							
								
							
							
								a5e3fde857 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-05-14 19:08:09 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7bd4ffb780 
								
							 
						 
						
							
							
								
								metal : fix warnings (skipme) (#0)
							
							
							
						 
						
							2024-05-11 21:38:13 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1622ac023f 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-05-11 21:35:05 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Josh Ramer 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								fed0108491 
								
							 
						 
						
							
							
								
								Scripting & documenting debugging one test without anything else in the loop. (#7096)
							
							
							
							
							* A little documentation that shares my quick tips for working in the repository.
* Update startup-testing-debugging.md
* script that shows a menu of tests to pick from & run the debugger on
* debug-test.sh: Refactor CLI help message
* debug-test.sh: documentation update
* debug-test.sh: CLI Help output corrections
* debug-test.sh: minor doc fix
---------
authored-by: Josh Ramer <ubuntu@ip-172-31-32-53.ec2.internal>
Assisted-by: brian khuu <mofosyne@gmail.com> 
							
						 
						
							2024-05-12 03:26:35 +10:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
							
							
								
							
							
								fae9d234b6 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-05-11 15:38:34 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e849648888 
								
							 
						 
						
							
							
								
								llama-bench : add pp+tg test type (#7199)
							
							
							
						 
						
							2024-05-10 18:03:54 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									jaime-m-p 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								43248e5594 
								
							 
						 
						
							
							
								
								llama3 custom regex split (#6965)
							
							
							
							
							* merged the changes from deepseeker models to main branch
* Moved regex patterns to unicode.cpp and updated unicode.h
* Moved header files
* Resolved issues
* added and refactored unicode_regex_split and related functions
* Updated/merged the deepseek coder pr
* Refactored code
* Adding unicode regex mappings
* Adding unicode regex function
* Added needed functionality, testing remains
* Fixed issues
* Fixed issue with gpt2 regex custom preprocessor
* unicode : fix? unicode_wstring_to_utf8
* lint : fix whitespaces
* tests : add tokenizer tests for numbers
* unicode : remove redundant headers
* tests : remove and rename tokenizer test scripts
* tests : add sample usage
* gguf-py : reader prints warnings on duplicate keys
* llama : towards llama3 tokenization support (wip)
* unicode : shot in the dark to fix tests on Windows
* unicode : first try custom implementations
* convert : add "tokenizer.ggml.pre" GGUF KV (wip)
* llama : use new pre-tokenizer type
* convert : fix pre-tokenizer type writing
* lint : fix
* make : add test-tokenizer-0-llama-v3
* wip
* models : add llama v3 vocab file
* llama : adapt punctuation regex + add llama 3 regex
* minor
* unicode : set bomb
* unicode : set bomb
* unicode : always use std::wregex
* unicode : support \p{N}, \p{L} and \p{P} natively
* unicode : try fix windows
* unicode : category support via std::regex
* unicode : clean-up
* unicode : simplify
* llama3 custom regex split
* convert : add convert-hf-to-gguf-update.py
ggml-ci
* lint : update
* convert : add falcon
ggml-ci
* unicode : normalize signatures
* lint : fix
* lint : fix
* convert : remove unused functions
* convert : add comments
* convert : exercise contractions
ggml-ci
* Using char32_t for codepoints
* lint : fix
* already exists unicode_tolower()
* Typing
* Restore BOM
* cmake : refactor test targets
* tests : refactor vocab tests
ggml-ci
* tests : add more vocabs and tests
ggml-ci
* unicode : cleanup
* scripts : ignore new update script in check-requirements.sh
* Fix merge
* models : add phi-3, mpt, gpt-2, starcoder
* tests : disable obsolete
ggml-ci
* tests : use faster bpe test
ggml-ci
* llama : more prominent warning for old BPE models
* tests : disable test-tokenizer-1-bpe due to slowness
ggml-ci
* Move unused variable value
* GPT2 custom regex split
* Add alternative regex for custom split llama3
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Style
* Add bruteforce random tests for token encoding
* wip: fixing unicode codepoint ranges
* Fix merge
* Unicode tables: separator, lowercase, uppercase and whitespace
* llama3 custom regex split: fix \s
* Restore BOM
* Style
* wip: generate NDF table
* Ignore special tokens for testing
* Clean gen-unicode-data.py
* Refactor random tokenizer test
* lint : fix
* tests : add fail test for llama-bpe
---------
Co-authored-by: Jaggzh <jaggz.h@gmail.com>
Co-authored-by: Kazim Abrar Mahi <kazimabrarmahi135@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: jaime-m-p <> 
							
						 
						
							2024-05-09 23:30:44 +10:00