Commit graph

275 commits

Author SHA1 Message Date
Concedo
271307232c Merged PR with a few changes:
- Thread count set equal to cpu_count() if it's < 6, otherwise set to cpu_count()-2 instead. This can be forcibly overwritten by the --threads parameter. Setting all threads=cpu_count() chokes my own PC and slows it down badly, so I'd rather make it optional.

- Added localmodehost as a URL parameter in Kobold Lite instead, to avoid monkeypatching the embedded kobold lite directly. It should be parsed via ?localmodehost=(host). Also your updated klite file has the wrong encoding, it should be UTF-8, some of the symbols are incorrect such as the palette icon in settings. Repackaged the new version of Kobold Lite correctly with changes.

- Reverting the TK GUI filedialog if no model is provided, because I want to keep it noob friendly for those who don't know how to use command line args. The file dialog only loads if there are no command line args. If command line args are present, the GUI will not trigger.

- Modified the argparser to also take positional arguments for backwards compatibility, in addition to the optional argparse flags specified.

- Your code does not work if embedded kobold is removed. The embedded KAI variable was not declared in the correct scope, and also Python f-string formatted variables cannot work with raw byte strings. You also have incorrect indentation when returning the response body - have corrected all the above but please do test all codepaths if possible.

- There is a good reason to bind to "" (0.0.0.0) instead of a specific IP. It allows receiving requests from all routable interfaces. I don't know why you need an explicitly defined --host flag, but I will leave it there as an optional parameter, though the default should still be to accept from all interfaces. In that way, even if the displayed url is localhost, connecting via 192.168.x.x will also work, for example.
2023-03-29 20:38:57 +08:00
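
A minimal sketch of the defaults described in the commit message above, not the project's actual code. Only `--threads`, `--host`, `cpu_count()` and the "bind to all interfaces" default come from the message. The positional names `model_file` and `port`, the port default of 5001, and the helper names `default_threads` and `build_parser` are assumptions made for illustration.

```python
import argparse
import os


def default_threads(cpu_count):
    """Use every core on small machines, otherwise leave two cores free."""
    if cpu_count < 6:
        return cpu_count
    return cpu_count - 2


def build_parser():
    # Hypothetical parser: positional names and the port default are assumptions.
    parser = argparse.ArgumentParser(description="Launcher sketch")
    # Optional positionals keep older "script.py model port" invocations working.
    parser.add_argument("model_file", nargs="?", default=None)
    parser.add_argument("port", nargs="?", type=int, default=5001)
    # --threads overrides the computed default when given explicitly.
    parser.add_argument(
        "--threads", type=int, default=default_threads(os.cpu_count() or 1)
    )
    # An empty host string means "accept requests on all interfaces" (0.0.0.0).
    parser.add_argument("--host", default="")
    return parser
```

With this shape, `build_parser().parse_args(["model.bin", "8080", "--threads", "3"])` accepts the old positional style and the newer flags in the same invocation, and omitting `--host` leaves the bind address empty.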
InconsolableCellist
13b4c05d66 Some more code cleanup 2023-03-28 16:59:27 -06:00
InconsolableCellist
13addf2a78 Merge branch 'concedo' of github.com:InconsolableCellist/llamacpp-for-kobold into concedo 2023-03-28 13:43:19 -06:00
InconsolableCellist
f7c905b0d0 Minor overhaul of code:
* Set number of utilized llama.cpp threads back to os.cpu_count, which
  had better performance on my machine (20 threads vs. 6, 3m12s vs.
  4m42s on 65B)

* Using argparse for command line args

* Supports binding to a specific interface, for use on LANs/WANs (no
  longer limited to just 127.0.0.1). Requires modified klite.embd

* General code cleanup and passing some parameters around without
  globals
2023-03-28 13:39:34 -06:00
InconsolableCellist
003365907d updating to version 17 of embedded koboldAI, and adding host address support 2023-03-28 13:39:10 -06:00
Concedo
bf30406f50 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	Makefile
#	README.md
2023-03-28 17:13:38 +08:00
RJ Adriaansen
4b8efff0e3
Add embedding example to Makefile (#540) 2023-03-28 09:11:09 +03:00
Concedo
46ddbb22bf allow url params 2023-03-27 17:40:05 +08:00
Marco Matthies
7e5395575a
Fix missing ggml link in cmake for examples/* on w64-mingw32 (#542) 2023-03-27 07:55:26 +03:00
Erik Scholz
34c1072e49
ci: add debug build to sanitizer build matrix (#527) 2023-03-26 15:48:40 +00:00
Stephan Walter
939ad2d3a5
Fix undefined variables in debug build, remove unused variables (#531) 2023-03-26 15:34:02 +00:00
Juan Calderon-Perez
8c2ec5e21d
Add support for linux/arm64 platform during Docker Builds (#514)
* Add support for linux/arm64 platform

* Add platform to versioned builds
2023-03-26 14:48:42 +00:00
Stephan Walter
b391579db9
Update README and comments for standalone perplexity tool (#525) 2023-03-26 16:14:01 +03:00
anzz1
7a87d31f4f
[main] fix infinite generation (-n == -1) (#523) 2023-03-26 16:06:10 +03:00
Georgi Gerganov
348d6926ee
Add logo to README.md 2023-03-26 10:20:49 +03:00
Concedo
053b20c8ca merged complete 2023-03-26 14:55:43 +08:00
Concedo
33b5d2c376 Merge branch 'master' into concedo 2023-03-26 14:52:14 +08:00
Concedo
57474944d6 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	README.md
2023-03-26 14:52:08 +08:00
Harald Fernengel
33e35b8fe8
Exit from interactive mode if input stream is bad (#491)
Allow exiting the interactive prompt also with CTRL-D on Unix and CTRL-Z
on Windows.
2023-03-26 08:25:46 +03:00
anzz1
19726169b3
CI: Run other sanitizer builds even if one fails (#511)
Applies only to sanitizer builds, so they won't be cancelled
2023-03-26 00:13:28 +02:00
jp-x-g
f732695cd5
Clarify console output in convert-pth-to-ggml.py (#512)
"Processing part 1 of 3" instead of "Processing part 0"
2023-03-25 23:53:55 +02:00
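
The change above amounts to reporting a zero-based loop index as a one-based count together with the total. A minimal sketch, assuming a hypothetical helper named `part_message` (the real script's variable names are not shown in this log):

```python
def part_message(index, total):
    """Format a zero-based part index as a one-based progress line."""
    return f"Processing part {index + 1} of {total}"


if __name__ == "__main__":
    for i in range(3):
        print(part_message(i, 3))
```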
anzz1
2f7bf7dd7c
CMake / CI additions (#497)
* CMake: Add AVX512 option

* CI: Add AVX/AVX512 builds (Windows)
(AVX512 tests can only be run when the worker happens to support it, building works anyway)

* CMake: Fix sanitizer linkage ( merged #468 )

* CI: Add sanitizer builds (Ubuntu)

* CI: Fix release tagging
(change @zendesk/action-create-release to @anzz1/action-create-release until upstream PR "Added commitish as input" (zendesk/action-create-release#32) is merged)
2023-03-25 23:38:11 +02:00
anzz1
34ab526843
(Windows) Set console to UTF-8 on init (#420)
Sets console codepage to 65001 (CP_UTF8) on start for both input and output, should fix problems with UTF-8 characters.
2023-03-25 22:29:22 +02:00
Georgi Gerganov
c2b25b6912
Fix colors enabling on WIN32 2023-03-25 21:53:39 +02:00
Georgi Gerganov
79b2b266db
If n_predict == -1, generate forever 2023-03-25 21:51:41 +02:00
Georgi Gerganov
e2d490dafd
Infinite generation via context swapping (#71) 2023-03-25 21:36:22 +02:00
Georgi Gerganov
03f7e33560
Cleanup STL headers + fix embedding examples + minor stuff 2023-03-25 20:51:14 +02:00
Georgi Gerganov
55ad42af84
Move chat scripts into "./examples" 2023-03-25 20:37:09 +02:00
slaren
459e93cce0
Add AVX2 implementation of dequantize_row_q4_1 (#505) 2023-03-25 20:31:48 +02:00
Georgi Gerganov
a316a425d0
Overhaul the examples structure
- main -> examples
- utils -> examples (renamed to "common")
- quantize -> examples
- separate tools for "perplexity" and "embedding"

Hope I didn't break something !
2023-03-25 20:26:40 +02:00
Georgi Gerganov
ecbe466a36
Retire the ggml_mul_mat() branch for transposed src0 (#500)
* Retire the ggml_mul_mat() for transposed src0

- It can always be made contiguous with ggml_cpy()
- The code is now simplified
- The results are deterministic with respect to the number of threads

* SIMD-ify dequantize_row_q4_0() for ARM_NEON (#502)

* Attempt to SIMD-ify dequantize_row_q4_0() for ARM_NEON

* Fix dequantization - forgot to interleave the quants
2023-03-25 19:47:21 +02:00
Georgi Gerganov
502a400192
Disable prompt verbosity by default and add option to enable (#480) 2023-03-25 17:17:16 +02:00
slaren
09aecbf628
Add AVX2 implementation of dequantize_row_q4_0 (#467) 2023-03-25 17:06:49 +02:00
Georgi Gerganov
4640eff23d
Don't interfere with BLAS for large prompts by running only 1 thread 2023-03-25 17:03:10 +02:00
Georgi Gerganov
ab77d76312
Add longer DAN prompt for testing big batch numbers 2023-03-25 16:49:09 +02:00
slaren
29b7baab67
Add timings for the prompt evaluation (#478) 2023-03-25 16:34:23 +02:00
Georgi Gerganov
4a7129acd2
Remove obsolete information from README 2023-03-25 16:30:32 +02:00
Georgi Gerganov
6b6dbc8910
Remove obsolete assert and fix compiler warning 2023-03-25 16:22:05 +02:00
Georgi Gerganov
2a2e63ce05
Fix nasty bug in ggml_compute_forward_mul_mat_f32() and reenable BLAS 2023-03-25 16:10:14 +02:00
anzz1
e899bf54b2
bounds checking for input prefix (#492) 2023-03-25 14:42:09 +02:00
anzz1
fbd4d38c64
feat: '--in-prefix STRING' option (#426)
Prefix user inputs with a string
2023-03-25 14:03:19 +02:00
Jed Fox
58e6c9f36f
Add support for file load progress reporting callbacks (#434)
* File load progress reporting

* Move llama_progress_handler into llama_context_params

* Renames

* Use seekg to find file size instead

* More correct load progress

* Call progress callback more frequently

* Fix typo
2023-03-25 07:26:28 +02:00
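
The idea behind a load-progress callback can be sketched in Python as follows. This is an analogue for illustration only, not the C API added by the commit: the function name `load_with_progress`, its parameters and the chunk size are all assumptions. It mirrors two points from the message, namely finding the file size by seeking to the end and invoking the callback frequently during the read.

```python
import io


def load_with_progress(stream, on_progress, chunk_size=1 << 20):
    """Read a seekable binary stream, reporting progress as a 0.0-1.0 fraction."""
    # Find the total size by seeking to the end, then rewind.
    stream.seek(0, io.SEEK_END)
    total = stream.tell()
    stream.seek(0)

    if total == 0:
        # Nothing to read: report completion once and return.
        on_progress(1.0)
        return b""

    data = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data.extend(chunk)
        # Call the callback after every chunk so the caller sees steady updates.
        on_progress(len(data) / total)
    return bytes(data)
```

A caller might pass `print` or a function that updates a progress bar. A smaller `chunk_size` gives more frequent updates at the cost of more read calls.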
Doomsdayrs
36d07532ef
Add missing struct annotation (#483)
`llama_sample_top_p_top_k` was missing the struct annotation on line 126.

This causes a compiler issue when being parsed by the Kotlin C interop generator.

This commit fixes the above issue by adding the struct annotation.
2023-03-25 07:21:24 +02:00
Chris Kuehl
6f1ee4b640
Fix crash for 65B model with pre-allocated memory (#485) 2023-03-25 06:38:14 +02:00
Concedo
8a339bd75c update gitignore 2023-03-25 11:23:40 +08:00
Concedo
3c78124aac Merge branch 'master' into concedo
# Conflicts:
#	README.md
2023-03-25 11:20:04 +08:00
Concedo
119392f6f2 defaulting to f32 kv, and 4 threads seem to produce better results 2023-03-25 11:11:40 +08:00
Concedo
506cd62638 changed some defaults to hopefully increase compatibility 2023-03-25 10:40:11 +08:00
Concedo
b13a768813 added softprompt endpoint 2023-03-25 10:12:47 +08:00
Georgi Gerganov
8520fc310e
Disable BLAS altogether - the bug is not just for quantized mat mul 2023-03-24 23:47:06 +02:00