Commit graph

296 commits

Author SHA1 Message Date
Concedo
0f5b470c04 more library checks 2023-03-30 09:28:04 +08:00
Concedo
d8febc8653 renamed main python script 2023-03-30 00:48:44 +08:00
Concedo
664b277c27 integrated libopenblas for greatly accelerated prompt processing. Windows binaries are included - feel free to build your own or to build for other platforms, but that is beyond the scope of this repo. Will fall back to non-blas if libopenblas is removed. 2023-03-30 00:43:52 +08:00
Concedo
49c4c225b5 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	.gitignore
#	CMakeLists.txt
#	Makefile
2023-03-29 21:08:03 +08:00
Concedo
271307232c Merged PR with a few changes:
- Thread count set equal to cpu_count() if it's < 6, otherwise set to cpu_count()-2 instead. This can be forcibly overwritten by the --threads parameter. Setting all threads=cpu_count() chokes my own PC and slows it down badly, so I'd rather make it optional.

- Added localmodehost as a URL parameter in Kobold Lite instead, to avoid monkeypatching the embedded kobold lite directly. It should be parsed via ?localmodehost=(host). Also your updated klite file has the wrong encoding, it should be UTF-8, some of the symbols are incorrect such as the palette icon in settings. Repackaged the new version of Kobold Lite correctly with changes.

- Reverting the TK GUI filedialog if no model is provided, because I want to keep it noob friendly for those who don't know how to use command line args. The file dialog only loads if there are no command line args. If command line args are present, the GUI will not trigger.

- Modified the argparser to also take positional arguments for backwards compatibility, in addition to the optional argparse flags specified.

- Your code does not work if embedded kobold is removed. The embedded KAI variable was not declared in the correct scope, and also Python f-string formatted variables cannot work with raw byte strings. You also have incorrect indentation when returning the response body - have corrected all the above but please do test all codepaths if possible.

- There is a good reason to bind to "" (0.0.0.0) instead of a specific IP. It allows receiving requests from all routable interfaces. I don't know why you need an explicitly defined --host flag, but I will leave it there as an optional parameter, though the default should still be to accept from all interfaces. In that way, even if the displayed url is localhost, connecting via 192.168.x.x will also work, for example.
2023-03-29 20:38:57 +08:00
InconsolableCellist
13b4c05d66 Some more code cleanup 2023-03-28 16:59:27 -06:00
anzz1
5a5f8b1501
Enable Fused-Multiply-Add (FMA) and F16C/CVT16 vector extensions on MSVC (#375)
* Enable Fused-Multiply-Add (FMA) instructions on MSVC

__FMA__ macro does not exist in MSVC

* Enable F16C/CVT16 vector extensions on MSVC

__F16C__ macro does not exist in MSVC, but is implied with AVX2/AVX512

* MSVC cvt intrinsics

* Add __SSE3__ macro for MSVC too because why not

even though it's not currently used for anything when AVX is defined
2023-03-28 22:44:29 +03:00
anzz1
f1217055ea
CI: fix subdirectory path globbing (#546)
- Changes in subdirectories will now be detecter properly
- (Windows-MSVC) AVX512 tests temporarily disabled
2023-03-28 22:43:25 +03:00
InconsolableCellist
13addf2a78 Merge branch 'concedo' of github.com:InconsolableCellist/llamacpp-for-kobold into concedo 2023-03-28 13:43:19 -06:00
InconsolableCellist
f7c905b0d0 Minor overhaul of code:
* Set number of utilized llama.cpp threads back to os.cpu_count, which
  had better performance on my machine (20 threads vs. 6, 3m12s vs.
  4m42s on 65B)

* Using argparse for command line args

* Supports binding to a specific interface, for use on LANs/WANs (no
  longer limited to just 127.0.0.1). Requires modified klite.embd

* General code cleanup and passing some parameters around without
  globals
2023-03-28 13:39:34 -06:00
InconsolableCellist
003365907d updating to version 17 of embedded koboldAI, and adding host address support 2023-03-28 13:39:10 -06:00
anzz1
7f4c5c6651
llama : fix linkage with mingw (#551)
* Revert 7e53955 (#542)

Still needs to be fixed properly

* Fix linking on mingw32
2023-03-28 21:23:09 +03:00
slaren
2a98bc18ea
ggml : add AVX2 implementation of quantize_row_q4_1 (#515)
* Add AVX2 implementation of quantize_row_q4_1

* Actually use AVX2

* Make quantize_row_q4_1 static

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28 21:06:03 +03:00
thement
d0aaff571c
py : add temporary script to convert old ggml files to newer version (#539)
Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>
2023-03-28 20:55:42 +03:00
Tai Duc Nguyen
d0330fd783
py : add capabiliy to convert from ggml back to torch or hf format for further consumption/training/finetuning (#403) 2023-03-28 20:51:29 +03:00
Stephan Walter
99c5b27654
ggml : refactor quantized processing functions (#509)
* Refactor quantized processing functions

* ggml : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28 20:13:01 +03:00
DooWoong Lee (David)
692ce3164e
py : removed unused model variable and verified that the code functions correctly with vocab_only setting. Also confirmed that the code works as expected after running with reduced memory usage due to deletion of no-longer-needed variable. (#547) 2023-03-28 20:02:34 +03:00
Georgi Gerganov
96f9c0506f
ci : make ctest verbose, hopefully we see what is wrong with the sanitizer 2023-03-28 20:01:09 +03:00
Georgi Gerganov
d502bc7c9d
tests : free llama context at the end of the test 2023-03-28 19:51:55 +03:00
Stephan Walter
436e561931
all : be more strict about converting float to double (#458)
* Be more strict about converting float to double

* Test equivalence of round, SILU implementations

Test module is commented out in CMakeLists.txt because the tests may
take a long time, depending on how much the compiler optimizes.

* Fix softmax in perplexity.cpp

* all : prefer float over double where appropriate

* perplexity : add <cmath>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28 19:48:20 +03:00
Jed Fox
20e1e84884
deploy : add a Package.swift for SwiftPM support (#393)
* Add a Package.swift for SwiftPM support

* Swap from exclusions to allowlist
2023-03-28 19:39:01 +03:00
Stephan Walter
c1f885067c
ggml : introduce structs for the q4 data blocks (#356)
* Introduce structs for the q4 data blocks

* ggml : rename quant struct variables + fix ARM_NEON

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28 18:56:03 +03:00
Georgi Gerganov
e0670260fb
gitignore : add "embedding" 2023-03-28 18:34:35 +03:00
dotpy314
28ba975aea
Check the existence of f16_model_path_base in quantize.py (#574)
Co-authored-by: Jincheng Miao <jincheng.miao@gmail.com>
2023-03-28 18:06:28 +03:00
slaren
a6bdc47cba
Fix usage of F16C intrinsics in AVX code (#563)
* Fix usage of F16C intrinsics in AVX code when F16C is not defined
2023-03-28 17:26:55 +03:00
anzz1
7b8dbcb78b
main.cpp fixes, refactoring (#571)
- main: entering empty line passes back control without new input in interactive/instruct modes
- instruct mode: keep prompt fix
- instruct mode: duplicate instruct prompt fix
- refactor: move common console code from main->common
2023-03-28 17:09:55 +03:00
Concedo
bf30406f50 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	Makefile
#	README.md
2023-03-28 17:13:38 +08:00
RJ Adriaansen
4b8efff0e3
Add embedding example to Makefile (#540) 2023-03-28 09:11:09 +03:00
Concedo
46ddbb22bf allow url params 2023-03-27 17:40:05 +08:00
Marco Matthies
7e5395575a
Fix missing ggml link in cmake for examples/* on w64-mingw32 (#542) 2023-03-27 07:55:26 +03:00
Erik Scholz
34c1072e49
ci: add debug build to sanitizer build matrix (#527) 2023-03-26 15:48:40 +00:00
Stephan Walter
939ad2d3a5
Fix undefined variables in debug build, remove unused variables (#531) 2023-03-26 15:34:02 +00:00
Juan Calderon-Perez
8c2ec5e21d
Add support for linux/arm64 platform during Docker Builds (#514)
* Add support for linux/arm64 platform

* Add platform to versioned builds
2023-03-26 14:48:42 +00:00
Stephan Walter
b391579db9
Update README and comments for standalone perplexity tool (#525) 2023-03-26 16:14:01 +03:00
anzz1
7a87d31f4f
[main] fix infinite generation (-n == -1) (#523) 2023-03-26 16:06:10 +03:00
Georgi Gerganov
348d6926ee
Add logo to README.md 2023-03-26 10:20:49 +03:00
Concedo
053b20c8ca merged complete 2023-03-26 14:55:43 +08:00
Concedo
33b5d2c376 Merge branch 'master' into concedo 2023-03-26 14:52:14 +08:00
Concedo
57474944d6 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	README.md
2023-03-26 14:52:08 +08:00
Harald Fernengel
33e35b8fe8
Exit from interactive mode if input stream is bad (#491)
Allow exiting the interactive prompt also with CTRL-D on Unix and CTRL-Z
on Windows.
2023-03-26 08:25:46 +03:00
anzz1
19726169b3
CI: Run other sanitizer builds even if one fails (#511)
applies only to sanitizer builds so they wont be cancelled
2023-03-26 00:13:28 +02:00
jp-x-g
f732695cd5
Clarify console output in convert-pth-to-ggml.py (#512)
"Processing part 1 of 3" instead of "Processing part 0"
2023-03-25 23:53:55 +02:00
anzz1
2f7bf7dd7c
CMake / CI additions (#497)
* CMake: Add AVX512 option

* CI: Add AVX/AVX512 builds (Windows)
(AVX512 tests can only be run when the worker happens to support it, building works anyway)

* CMake: Fix sanitizer linkage ( merged #468 )

* CI: Add sanitizer builds (Ubuntu)

* CI: Fix release tagging
(change @zendesk/action-create-release to @anzz1/action-create-release until upstream PR Added commitish as input zendesk/action-create-release#32 is merged)
2023-03-25 23:38:11 +02:00
anzz1
34ab526843
(Windows) Set console to UTF-8 on init (#420)
Sets console codepage to 65001 (CP_UTF8) on start for both input and output, should fix problems with UTF-8 characters.
2023-03-25 22:29:22 +02:00
Georgi Gerganov
c2b25b6912
Fix colors enabling on WIN32 2023-03-25 21:53:39 +02:00
Georgi Gerganov
79b2b266db
If n_predict == -1, generate forever 2023-03-25 21:51:41 +02:00
Georgi Gerganov
e2d490dafd
Inifinite generation via context swapping (#71) 2023-03-25 21:36:22 +02:00
Georgi Gerganov
03f7e33560
Cleanup STL headers + fix embedding examples + minor stuff 2023-03-25 20:51:14 +02:00
Georgi Gerganov
55ad42af84
Move chat scripts into "./examples" 2023-03-25 20:37:09 +02:00
slaren
459e93cce0
Add AVX2 implementation of dequantize_row_q4_1 (#505) 2023-03-25 20:31:48 +02:00