llama.cpp

Author	SHA1	Message	Date
Concedo	8dd8ab1659	Various enhancement and integration pygmalion.cpp	2023-04-03 00:04:43 +08:00
Concedo	3f4967b827	added new binaries	2023-04-02 17:14:38 +08:00
Concedo	bb965cc120	Merge branch 'master' into concedo # Conflicts: # README.md	2023-04-02 17:13:28 +08:00
Concedo	9aabb0d9db	massive refactor completed, GPT-J integrated	2023-04-02 17:03:30 +08:00
Leonardo Neumann	6e7801d08d	examples : add gpt4all script (#658)	2023-04-02 10:56:20 +03:00
Stephan Walter	81040f10aa	llama : do not allocate KV cache for "vocab_only == true" (#682) Fixes sanitizer CI	2023-04-02 10:18:53 +03:00
Fabian	c4f89d8d73	make : use -march=native -mtune=native on x86 (#609)	2023-04-02 10:17:05 +03:00
Murilo Santana	5b70e7de4c	fix default params for examples/main (#697)	2023-04-02 04:41:12 +02:00
Concedo	b1f08813e3	added support for gpt4all original format	2023-04-02 00:53:46 +08:00
Ikko Eltociear Ashimine	a717cba844	py: huggingface -> Hugging Face (#686)	2023-04-01 18:38:18 +02:00
rimoliga	d0a7f742e7	readme: replace termux links with homepage, play store is deprecated (#680)	2023-04-01 16:57:30 +02:00
Slaren	0d054e292e	Show error message when -f fails	2023-04-01 16:08:40 +02:00
Concedo	085a9f90a7	still refactoring	2023-04-01 11:56:34 +08:00
Concedo	6e6125ebdb	updated pyinstaller to clean temp dir,removed warning flags from makefile because they are just clutter.	2023-04-01 09:25:41 +08:00
Concedo	9ab6e87b58	Merge branch 'master' into concedo # Conflicts: # CMakeLists.txt	2023-04-01 09:05:45 +08:00
Concedo	801b178f2a	still refactoring, but need a checkpoint to prepare build for 1.0.7	2023-04-01 08:55:14 +08:00
Stephan Walter	3525899277	Enable -std= for cmake builds, fix warnings (#598)	2023-03-31 19:19:16 +00:00
Concedo	6b86f5ea22	halfway refactoring, wip adding other model types	2023-04-01 01:13:05 +08:00
slaren	1d08882afa	Optimize AVX2 ggml_vec_dot_q4_0 (#642)	2023-03-31 15:55:52 +00:00
perserk	02c5b27e91	Add AVX acceleration (#617) * ggml : add AVX quantize_row_q4_0() * ggml : add AVX ggml_vec_dot_q4_0() * ggml : refactor AVX part of ggml_vec_dot_q4_0() https://github.com/ggerganov/llama.cpp/pull/617#issuecomment-1489985645	2023-03-31 13:55:44 +02:00
Concedo	56949197fe	added HF converter base	2023-03-31 19:10:21 +08:00
Concedo	17044257a0	Merge branch 'master' into concedo	2023-03-31 19:04:47 +08:00
Concedo	559a1967f7	Backwards compatibility formats all done Merge branch 'master' into concedo # Conflicts: # CMakeLists.txt # README.md # llama.cpp	2023-03-31 19:01:33 +08:00
Concedo	9eab39fe6d	prepare legacy functions (+1 squashed commits) Squashed commits: [8bc8d0d] prepare for big merge	2023-03-31 17:45:49 +08:00
Pavol Rusnak	cbef542879	py : cleanup the code - use f-strings where possible - drop first param of encode/decode functions since "utf-8" is the default	2023-03-31 10:32:01 +02:00
Concedo	79f9743347	improved console info, fixed utf encoding bugs	2023-03-31 15:38:38 +08:00
Pavol Rusnak	9733104be5	drop quantize.py (now that models are using a single file)	2023-03-31 01:07:32 +02:00
Georgi Gerganov	3df890aef4	readme : update supported models	2023-03-30 22:31:54 +03:00
Justine Tunney	ee0c40dd6d	Introduce GGML migration tool for new file format If you deleted your old Meta LLaMA .pth files, then the migrate-ggml-2023-03-30-pr613.py script will allow you to convert your old ggml files into the new mmap()'able format. See #613	2023-03-30 12:28:25 -07:00
Justine Tunney	6f23ba5ee2	Ensure --mlock works properly with mmap() support	2023-03-30 12:28:25 -07:00
Justine Tunney	78ca9838ee	Make loading weights 10-100x faster This is a breaking change that's going to give you three benefits: 1. Your inference commands should load 100x faster 2. You may be able to safely load models 2x larger 3. You can run many concurrent inference processes This was accomplished by changing the file format so we can mmap() weights directly into memory without having to read() or copy them thereby ensuring the kernel can make its file cache pages directly accessible to our inference processes; and secondly, that the file cache pages are much less likely to get evicted (which would force loads to hit disk) because they're no longer competing with memory pages that were needlessly created by gigabytes of standard i/o. The new file format supports single-file models like LLaMA 7b, and it also supports multi-file models like LLaMA 13B. Our Python tool now merges the foo.1, foo.2, etc. files back into a single file so that the C++ code which maps it doesn't need to reshape data every time. That's made llama.cpp so much simpler. Much of its load code has now been deleted. Furthermore, this change ensures that tensors are aligned properly on a 32-byte boundary. That opens the door to seeing if we can get additional performance gains on some microprocessors, by using ops that require memory alignment. Lastly note that both POSIX and the Windows platform are supported Fixes #91	2023-03-30 12:28:25 -07:00
Slaren	a017390358	Initial windows support (untested)	2023-03-30 12:28:25 -07:00
Slaren	ac184d5147	Always initialize mm_addr and mm_length in llama_model	2023-03-30 12:28:25 -07:00
Slaren	276e5b7811	Unmap the file in llama_free	2023-03-30 12:28:25 -07:00
Slaren	d68c5dc435	Make mmap_file static	2023-03-30 12:28:25 -07:00
Slaren	64bde3ffd4	Fix ggml_init_params in quantize	2023-03-30 12:28:25 -07:00
Slaren	c03ae8dca1	Add mmap support for model files	2023-03-30 12:28:25 -07:00
Stephan Walter	3bcc129ba8	cmake : properly invoke CTest (#629)	2023-03-30 20:56:59 +03:00
Casey Primozic	a4755cf288	Remove unused variable (#607) * It seems some new warning were added recently that exposed this. I wrote the code that included this unused variable originally and it is indeed not needed.	2023-03-30 17:53:35 +00:00
david raistrick	1f0414feec	make : fix darwin f16c flags check (#615) ...there was no check. ported upstream from https://github.com/zanussbaum/gpt4all.cpp/pull/2 (I dont see any clean path for upstream patches)	2023-03-30 20:34:45 +03:00
Georgi Gerganov	77efdf5a50	ggml : fix NEON signs (close #620, #622)	2023-03-30 20:27:32 +03:00
slaren	ed3c680bcd	Fix GGML_F32Cx8_STORE in AVX without F16C path (#619)	2023-03-30 11:16:30 +02:00
Concedo	354d4f232f	fixed linux openblas build errors	2023-03-30 11:55:35 +08:00
Concedo	977a9a246f	Merge remote-tracking branch 'origin/master' into concedo # Conflicts: # .github/workflows/build.yml # README.md	2023-03-30 09:42:51 +08:00
Concedo	0f5b470c04	more library checks	2023-03-30 09:28:04 +08:00
anzz1	9cbc404ba6	ci : re-enable AVX512 testing (Windows-MSVC) (#584) * CI: Re-enable AVX512 testing (Windows-MSVC) Now with 100% less base64 encoding * plain __cpuid is enough here	2023-03-29 23:44:39 +03:00
Georgi Gerganov	b51c717d5c	ggml : init time on first ggml_init() call	2023-03-29 22:15:34 +03:00
Georgi Gerganov	0ba76c1e73	llama : fix compile warnings when reading the vocab	2023-03-29 22:13:12 +03:00
Georgi Gerganov	cea1c85948	ggml : add ARM_NEON dequantize_row_q4_1()	2023-03-29 22:10:01 +03:00
Georgi Gerganov	f202ada131	ggml : add ARM_NEON quantize_row_q4_1()	2023-03-29 22:03:07 +03:00

355 commits
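
The mmap()-based loading described in commit 78ca9838ee (map the weights instead of read()-ing and copying them, with tensor data aligned to a 32-byte boundary) can be illustrated with a minimal Python sketch. This is not the llama.cpp loader or its file format: the file layout (an opaque header, zero padding, then float32 values), the helper names `align_up`, `write_aligned_tensor` and `map_tensor`, and the use of Python's `mmap` module are all assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

ALIGNMENT = 32  # the commit message mentions a 32-byte tensor boundary


def align_up(offset, alignment=ALIGNMENT):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def write_aligned_tensor(path, header, values):
    """Write a hypothetical file: header bytes, zero padding, float32 data.

    Returns the byte offset at which the tensor data starts.
    """
    data_offset = align_up(len(header))
    with open(path, "wb") as f:
        f.write(header)
        f.write(b"\x00" * (data_offset - len(header)))
        # "=" selects native byte order with standard sizes (4-byte floats).
        f.write(struct.pack(f"={len(values)}f", *values))
    return data_offset


def map_tensor(path, data_offset, count):
    """Map the file read-only and expose the tensor as a float view.

    No read() into a separate buffer happens here; the view refers to the
    mapped pages directly. The caller must release the view before closing
    the mapping.
    """
    with open(path, "rb") as f:
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    view = memoryview(mapped)[data_offset:data_offset + 4 * count].cast("f")
    return mapped, view


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy-model.bin")  # hypothetical file name
        offset = write_aligned_tensor(path, b"toy-header", [1.0, 2.5, -3.0])
        mapped, view = map_tensor(path, offset, 3)
        print(offset, view.tolist())
        view.release()   # drop the buffer export first
        mapped.close()   # then unmap the file
```

Because the mapping is read-only and backed by the file, several processes mapping the same file can share the same page-cache pages, which is the effect the commit message attributes to the new format.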