314 commits

Author SHA1 Message Date
Concedo
79f9743347 improved console info, fixed utf encoding bugs 2023-03-31 15:38:38 +08:00
Concedo
354d4f232f fixed linux openblas build errors 2023-03-30 11:55:35 +08:00
Concedo
977a9a246f Merge remote-tracking branch 'origin/master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	README.md
2023-03-30 09:42:51 +08:00
Concedo
0f5b470c04 more library checks 2023-03-30 09:28:04 +08:00
anzz1
9cbc404ba6
ci : re-enable AVX512 testing (Windows-MSVC) (#584)
* CI: Re-enable AVX512 testing (Windows-MSVC)

Now with 100% less base64 encoding

* plain __cpuid is enough here
2023-03-29 23:44:39 +03:00
Georgi Gerganov
b51c717d5c
ggml : init time on first ggml_init() call 2023-03-29 22:15:34 +03:00
Georgi Gerganov
0ba76c1e73
llama : fix compile warnings when reading the vocab 2023-03-29 22:13:12 +03:00
Georgi Gerganov
cea1c85948
ggml : add ARM_NEON dequantize_row_q4_1() 2023-03-29 22:10:01 +03:00
Georgi Gerganov
f202ada131
ggml : add ARM_NEON quantize_row_q4_1() 2023-03-29 22:03:07 +03:00
Georgi Gerganov
3b44d30d9b
ggml : add ARM_NEON ggml_vec_dot_q4_1() 2023-03-29 22:03:07 +03:00
Pavol Rusnak
61cbfff5c9
rename convert_ggml_to_pth.py -> convert-ggml-to-pth.py (#600)
to match filenames of other converters
2023-03-29 20:09:25 +02:00
Thérence
d9ad104440
Create chat-13B.bat (#592)
* Create chat-13B.bat

Same script as chat-13B.sh, but for Windows users.
Tested and working on Windows 10/11 v 22H2

* Apply suggestions from code review

---------

Co-authored-by: anzz1 <anzz1@live.com>
2023-03-29 20:21:09 +03:00
Concedo
d8febc8653 renamed main python script 2023-03-30 00:48:44 +08:00
Concedo
664b277c27 integrated libopenblas for greatly accelerated prompt processing. Windows binaries are included - feel free to build your own or to build for other platforms, but that is beyond the scope of this repo. Will fall back to non-blas if libopenblas is removed. 2023-03-30 00:43:52 +08:00
Georgi Gerganov
b467702b87
readme : fix typos 2023-03-29 19:38:31 +03:00
Georgi Gerganov
516d88e75c
readme : add GPT4All instructions (close #588) 2023-03-29 19:37:20 +03:00
Georgi Gerganov
53635c081c
py : add GPT4All conversion script
For now: copy-paste
Too much time for me to deduplicate the python code
2023-03-29 19:29:52 +03:00
Maël Kerbiriou
41318d708e
llama : use the same threshold for OpenBLAS and ggml thread limiting (#577) 2023-03-29 19:10:07 +03:00
Tobias Lütke
a6956b25a1
add example of re-act pattern (#583)
* add example of re-act pattern

* spelling...

* fixed whitespace in reverse prompt issue
2023-03-29 10:10:24 -05:00
anzz1
83df5639eb
Fix GCC warning about binary literal (#595)
0b10101010 -> 0xAA /* 0b10101010 */
2023-03-29 13:20:07 +00:00
anzz1
a5c42c4b13
Fix typo in llama.h (#593) 2023-03-29 13:19:29 +00:00
Concedo
49c4c225b5 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	.gitignore
#	CMakeLists.txt
#	Makefile
2023-03-29 21:08:03 +08:00
Concedo
271307232c Merged PR with a few changes:
- Thread count is set equal to cpu_count() if it's < 6, otherwise to cpu_count()-2. This can be overridden with the --threads parameter. Setting threads=cpu_count() chokes my own PC and slows it down badly, so I'd rather make it optional.

- Added localmodehost as a URL parameter in Kobold Lite instead, to avoid monkeypatching the embedded kobold lite directly. It should be parsed via ?localmodehost=(host). Also your updated klite file has the wrong encoding, it should be UTF-8, some of the symbols are incorrect such as the palette icon in settings. Repackaged the new version of Kobold Lite correctly with changes.

- Reverting the TK GUI filedialog if no model is provided, because I want to keep it noob friendly for those who don't know how to use command line args. The file dialog only loads if there are no command line args. If command line args are present, the GUI will not trigger.

- Modified the argparser to also take positional arguments for backwards compatibility, in addition to the optional argparse flags specified.

- Your code does not work if embedded kobold is removed. The embedded KAI variable was not declared in the correct scope, and also Python f-string formatted variables cannot work with raw byte strings. You also have incorrect indentation when returning the response body - have corrected all the above but please do test all codepaths if possible.

- There is a good reason to bind to "" (0.0.0.0) instead of a specific IP. It allows receiving requests from all routable interfaces. I don't know why you need an explicitly defined --host flag, but I will leave it there as an optional parameter, though the default should still be to accept from all interfaces. In that way, even if the displayed url is localhost, connecting via 192.168.x.x will also work, for example.
2023-03-29 20:38:57 +08:00
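The thread-count rule and the backwards-compatible argument parsing described in the commit above could look roughly like the sketch below. It is not the repository's actual code: only `--threads` and `cpu_count()` are named in the commit message, while `default_thread_count`, `build_parser`, and the positional names `model_file` and `port` are assumptions made for illustration.

```python
import argparse
import os


def default_thread_count(cpu_count):
    """Use every core on small machines, otherwise leave two cores free."""
    if cpu_count < 6:
        return cpu_count
    return cpu_count - 2


def build_parser():
    """Build a parser that accepts both positional and flag-style arguments."""
    parser = argparse.ArgumentParser()
    # Hypothetical positional arguments, kept optional for backwards compatibility.
    parser.add_argument("model_file", nargs="?", default=None)
    parser.add_argument("port", nargs="?", type=int, default=None)
    # --threads is named in the commit message; an explicit value overrides the default.
    parser.add_argument(
        "--threads",
        type=int,
        default=default_thread_count(os.cpu_count() or 1),
    )
    return parser
```

With this layout, running with no arguments leaves `model_file` as `None`, which is the point where a script could fall back to a file dialog, as the commit message describes.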
InconsolableCellist
13b4c05d66 Some more code cleanup 2023-03-28 16:59:27 -06:00
anzz1
5a5f8b1501
Enable Fused-Multiply-Add (FMA) and F16C/CVT16 vector extensions on MSVC (#375)
* Enable Fused-Multiply-Add (FMA) instructions on MSVC

__FMA__ macro does not exist in MSVC

* Enable F16C/CVT16 vector extensions on MSVC

__F16C__ macro does not exist in MSVC, but is implied with AVX2/AVX512

* MSVC cvt intrinsics

* Add __SSE3__ macro for MSVC too because why not

even though it's not currently used for anything when AVX is defined
2023-03-28 22:44:29 +03:00
anzz1
f1217055ea
CI: fix subdirectory path globbing (#546)
- Changes in subdirectories will now be detected properly
- (Windows-MSVC) AVX512 tests temporarily disabled
2023-03-28 22:43:25 +03:00
InconsolableCellist
13addf2a78 Merge branch 'concedo' of github.com:InconsolableCellist/llamacpp-for-kobold into concedo 2023-03-28 13:43:19 -06:00
InconsolableCellist
f7c905b0d0 Minor overhaul of code:
* Set number of utilized llama.cpp threads back to os.cpu_count, which
  had better performance on my machine (20 threads vs. 6, 3m12s vs.
  4m42s on 65B)

* Using argparse for command line args

* Supports binding to a specific interface, for use on LANs/WANs (no
  longer limited to just 127.0.0.1). Requires modified klite.embd

* General code cleanup and passing some parameters around without
  globals
2023-03-28 13:39:34 -06:00
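The difference between binding to one interface and binding to all of them, which this commit and the later merge discuss, can be illustrated with a minimal standard-library sketch. The helper name `open_listener` is hypothetical and is not taken from the repository; port 0 simply asks the operating system for any free port.

```python
import socket


def open_listener(host="", port=0):
    """Open a listening TCP socket.

    An empty host string binds to all IPv4 interfaces (0.0.0.0);
    a specific address such as "127.0.0.1" restricts it to that interface.
    """
    return socket.create_server((host, port))
```

A listener opened with the default empty host accepts connections addressed to any of the machine's IPv4 addresses, while one opened with "127.0.0.1" is reachable only from the local machine.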
InconsolableCellist
003365907d updating to version 17 of embedded koboldAI, and adding host address support 2023-03-28 13:39:10 -06:00
anzz1
7f4c5c6651
llama : fix linkage with mingw (#551)
* Revert 7e53955 (#542)

Still needs to be fixed properly

* Fix linking on mingw32
2023-03-28 21:23:09 +03:00
slaren
2a98bc18ea
ggml : add AVX2 implementation of quantize_row_q4_1 (#515)
* Add AVX2 implementation of quantize_row_q4_1

* Actually use AVX2

* Make quantize_row_q4_1 static

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28 21:06:03 +03:00
thement
d0aaff571c
py : add temporary script to convert old ggml files to newer version (#539)
Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>
2023-03-28 20:55:42 +03:00
Tai Duc Nguyen
d0330fd783
py : add capability to convert from ggml back to torch or hf format for further consumption/training/finetuning (#403) 2023-03-28 20:51:29 +03:00
Stephan Walter
99c5b27654
ggml : refactor quantized processing functions (#509)
* Refactor quantized processing functions

* ggml : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28 20:13:01 +03:00
DooWoong Lee (David)
692ce3164e
py : removed unused model variable and verified that the code functions correctly with vocab_only setting. Also confirmed that the code works as expected after running with reduced memory usage due to deletion of no-longer-needed variable. (#547) 2023-03-28 20:02:34 +03:00
Georgi Gerganov
96f9c0506f
ci : make ctest verbose, hopefully we see what is wrong with the sanitizer 2023-03-28 20:01:09 +03:00
Georgi Gerganov
d502bc7c9d
tests : free llama context at the end of the test 2023-03-28 19:51:55 +03:00
Stephan Walter
436e561931
all : be more strict about converting float to double (#458)
* Be more strict about converting float to double

* Test equivalence of round, SILU implementations

Test module is commented out in CMakeLists.txt because the tests may
take a long time, depending on how much the compiler optimizes.

* Fix softmax in perplexity.cpp

* all : prefer float over double where appropriate

* perplexity : add <cmath>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28 19:48:20 +03:00
Jed Fox
20e1e84884
deploy : add a Package.swift for SwiftPM support (#393)
* Add a Package.swift for SwiftPM support

* Swap from exclusions to allowlist
2023-03-28 19:39:01 +03:00
Stephan Walter
c1f885067c
ggml : introduce structs for the q4 data blocks (#356)
* Introduce structs for the q4 data blocks

* ggml : rename quant struct variables + fix ARM_NEON

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28 18:56:03 +03:00
Georgi Gerganov
e0670260fb
gitignore : add "embedding" 2023-03-28 18:34:35 +03:00
dotpy314
28ba975aea
Check the existence of f16_model_path_base in quantize.py (#574)
Co-authored-by: Jincheng Miao <jincheng.miao@gmail.com>
2023-03-28 18:06:28 +03:00
slaren
a6bdc47cba
Fix usage of F16C intrinsics in AVX code (#563)
* Fix usage of F16C intrinsics in AVX code when F16C is not defined
2023-03-28 17:26:55 +03:00
anzz1
7b8dbcb78b
main.cpp fixes, refactoring (#571)
- main: entering empty line passes back control without new input in interactive/instruct modes
- instruct mode: keep prompt fix
- instruct mode: duplicate instruct prompt fix
- refactor: move common console code from main->common
2023-03-28 17:09:55 +03:00
Concedo
bf30406f50 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	Makefile
#	README.md
2023-03-28 17:13:38 +08:00
RJ Adriaansen
4b8efff0e3
Add embedding example to Makefile (#540) 2023-03-28 09:11:09 +03:00
Concedo
46ddbb22bf allow url params 2023-03-27 17:40:05 +08:00
Marco Matthies
7e5395575a
Fix missing ggml link in cmake for examples/* on w64-mingw32 (#542) 2023-03-27 07:55:26 +03:00
Erik Scholz
34c1072e49
ci: add debug build to sanitizer build matrix (#527) 2023-03-26 15:48:40 +00:00
Stephan Walter
939ad2d3a5
Fix undefined variables in debug build, remove unused variables (#531) 2023-03-26 15:34:02 +00:00