ctx->kv and ctx->infos was reallocated using not-aligned realloc, but freed with aligned free.
to fix this a GGML_ALIGNED_REALLOC was added, but there is no posix_memalign_realloc function.
so on non-windows and non-mingw32 platforms we fall back to aligned malloc, followed by copying
and freeing the old data.
* llama2.c: direct gguf output (WIP)
* Simplify vector building logic
* llama2.c gguf conversion: fix token types in converter
* llama2.c: support copying vocab from a llama gguf model file
* llama2.c: update default path for vocab model + readme
* llama2.c: use defines for gguf keys
* llama2.c: escape whitespaces w/ U+2581 in vocab converter the llama.cpp way
* llama2.c converter: cleanups + take n_ff from config
* Speedup tokenization
On current master it takes ~3.2 seconds to tokenize
Wikitext. With this change it becomes ~525 ms.
* Fixit: it was missing the piece after the last found occurence
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
* Make ggml-cuda.cu build with QK_K = 64
Using LLAMA_CUDA_FORCE_DMMV = ON and -nommq it runs and produces
a meaningful result.
* k_quants tuning for Falcon-7b
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
* tests : write a Python tokenizer test (wip)
* llama : prefix input text for tokenization with whitespace
* llama : distinguish pieces from decoded text + fix detokenization
* common : add comments
* examples : no longer manually add leading space when tokenizing
* tests : use Python to generate tokenizer tests for C++
* tests : add option to tokenize text files
ggml-ci
* tests : add test-tokenizer-1.py
* llama.cpp : fix LF token
* hellaswag : move the concat space for clarity
* tests : add falcon tests (py + cpp, currently do not pass Unicode)
ggml-ci
* common : temporary separate llama_detokenize calls for SPM and BPE
---------
Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>
* ci : add lora test
ggml-ci
* move lora summary to the top, add lora logs
ggml-ci
* ci : decrease CPU ppl runs to 2 to avoide 20 min timeout
ggml-ci
* add 7b lora test
use 1 thread for CUDA generation tests
ggml-ci
* add test with q8_0 (cpu only)
ggml-ci
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Allow convert.py to convert to q8_0
Fix issue with bounded_parallel_map and greedy consuming iterator
Display elapsed time during conversion
* Add --concurrency option
Minor improvements to help text
Clean up bounded_parallel_map function a bit
* Massive speed improvement thanks to Cebtenzzre
* Refactor types
The use of special characters within source files can break compiling on some computers with different region and language settings. Using Unicode escape sequences should allow for the code to be compiled on all setups without needing to change your computers settings or switch regions.
* Fix bug in main.cpp where penalize_nl=false has no effect. It modifies the underlying logits array, but at this point we are already working on the candidates copy.
* Suppress redefinition warning for NOMINMAX on mingw. In my installation, this macro is already defined by /usr/lib/gcc/x86_64-w64-mingw32/11/include/c++/x86_64-w64-mingw32/bits/os_defines.h:45.
* main : fix indentation
* main : pass ctx to llama_token_nl()
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Problem
-------
`nix build` fails with missing `Accelerate.h`.
Changes
-------
- Fix build of the llama.cpp with nix for Intel: add the same SDK frameworks as
for ARM
- Add `quantize` app to the output of nix flake
- Extend nix devShell with llama-python so we can use convertScript
Testing
-------
Testing the steps with nix:
1. `nix build`
Get the model and then
2. `nix develop` and then `python convert.py models/llama-2-7b.ggmlv3.q4_0.bin`
3. `nix run llama.cpp#quantize -- open_llama_7b/ggml-model-f16.gguf ./models/ggml-model-q4_0.bin 2`
4. `nix run llama.cpp#llama -- -m models/ggml-model-q4_0.bin -p "What is nix?" -n 400 --temp 0.8 -e -t 8`
Co-authored-by: Volodymyr Vitvitskyi <volodymyrvitvitskyi@SamsungPro.local>
* llama.cpp : fix spm whitespace escaping + clean up
* main.cpp : spm - add whitespace in front of prompt
* test-tokenizer-0.cpp : spm - add whitespace in front of prompt