Commit graph

273 commits

Author SHA1 Message Date
InconsolableCellist
13addf2a78 Merge branch 'concedo' of github.com:InconsolableCellist/llamacpp-for-kobold into concedo 2023-03-28 13:43:19 -06:00
InconsolableCellist
f7c905b0d0 Minor overhaul of code:
* Set number of utilized llama.cpp threads back to os.cpu_count, which
  had better performance on my machine (20 threads vs. 6, 3m12s vs.
  4m42s on 65B)

* Using argparse for command line args

* Supports binding to a specific interface, for use on LANs/WANs (no
  longer limited to just 127.0.0.1). Requires modified klite.embd

* General code cleanup and passing some parameters around without
  globals
2023-03-28 13:39:34 -06:00
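The argument handling described in this commit could look roughly like the sketch below. The commit does not list its option names, so every flag here (`--host`, `--port`, `--threads`), the positional `model_file`, and the port default are hypothetical; only the use of `argparse`, the `os.cpu_count` thread default, and the move away from a hard-coded 127.0.0.1 come from the message.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    # All option names and the port default are assumptions for illustration;
    # the commit message does not spell them out.
    parser = argparse.ArgumentParser(description="Illustrative launcher options")
    parser.add_argument("model_file", help="path to the model weights")
    parser.add_argument("--host", default="127.0.0.1",
                        help="interface to bind; 0.0.0.0 listens on all interfaces")
    parser.add_argument("--port", type=int, default=5001,
                        help="TCP port to listen on")
    # os.cpu_count() can return None, so fall back to a single thread.
    parser.add_argument("--threads", type=int, default=os.cpu_count() or 1,
                        help="worker threads; defaults to os.cpu_count()")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    # Returning the namespace lets callers pass settings around explicitly
    # instead of reading module-level globals.
    return build_parser().parse_args(argv)
```

Passing the parsed namespace into the server and model-loading functions is one way to achieve the "without globals" cleanup the message mentions.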
InconsolableCellist
003365907d updating to version 17 of embedded koboldAI, and adding host address support 2023-03-28 13:39:10 -06:00
Concedo
bf30406f50 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	Makefile
#	README.md
2023-03-28 17:13:38 +08:00
RJ Adriaansen
4b8efff0e3
Add embedding example to Makefile (#540) 2023-03-28 09:11:09 +03:00
Concedo
46ddbb22bf allow url params 2023-03-27 17:40:05 +08:00
Marco Matthies
7e5395575a
Fix missing ggml link in cmake for examples/* on w64-mingw32 (#542) 2023-03-27 07:55:26 +03:00
Erik Scholz
34c1072e49
ci: add debug build to sanitizer build matrix (#527) 2023-03-26 15:48:40 +00:00
Stephan Walter
939ad2d3a5
Fix undefined variables in debug build, remove unused variables (#531) 2023-03-26 15:34:02 +00:00
Juan Calderon-Perez
8c2ec5e21d
Add support for linux/arm64 platform during Docker Builds (#514)
* Add support for linux/arm64 platform

* Add platform to versioned builds
2023-03-26 14:48:42 +00:00
Stephan Walter
b391579db9
Update README and comments for standalone perplexity tool (#525) 2023-03-26 16:14:01 +03:00
anzz1
7a87d31f4f
[main] fix infinite generation (-n == -1) (#523) 2023-03-26 16:06:10 +03:00
Georgi Gerganov
348d6926ee
Add logo to README.md 2023-03-26 10:20:49 +03:00
Concedo
053b20c8ca merged complete 2023-03-26 14:55:43 +08:00
Concedo
33b5d2c376 Merge branch 'master' into concedo 2023-03-26 14:52:14 +08:00
Concedo
57474944d6 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	README.md
2023-03-26 14:52:08 +08:00
Harald Fernengel
33e35b8fe8
Exit from interactive mode if input stream is bad (#491)
Allow exiting the interactive prompt also with CTRL-D on Unix and CTRL-Z
on Windows.
2023-03-26 08:25:46 +03:00
anzz1
19726169b3
CI: Run other sanitizer builds even if one fails (#511)
Applies only to sanitizer builds, so they won't be cancelled.
2023-03-26 00:13:28 +02:00
jp-x-g
f732695cd5
Clarify console output in convert-pth-to-ggml.py (#512)
"Processing part 1 of 3" instead of "Processing part 0"
2023-03-25 23:53:55 +02:00
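The change amounts to showing a one-based counter together with the total while keeping the zero-based index internally. A minimal sketch, assuming a hypothetical helper name (the actual code in convert-pth-to-ggml.py is not shown here):

```python
def part_progress_message(part_index: int, total_parts: int) -> str:
    # part_index stays zero-based for file naming and loops;
    # 1 is added only for the human-readable message.
    return f"Processing part {part_index + 1} of {total_parts}"
```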
anzz1
2f7bf7dd7c
CMake / CI additions (#497)
* CMake: Add AVX512 option

* CI: Add AVX/AVX512 builds (Windows)
(AVX512 tests can only be run when the worker happens to support it, building works anyway)

* CMake: Fix sanitizer linkage ( merged #468 )

* CI: Add sanitizer builds (Ubuntu)

* CI: Fix release tagging
(change @zendesk/action-create-release to @anzz1/action-create-release until upstream PR "Added commitish as input" (zendesk/action-create-release#32) is merged)
2023-03-25 23:38:11 +02:00
anzz1
34ab526843
(Windows) Set console to UTF-8 on init (#420)
Sets console codepage to 65001 (CP_UTF8) on start for both input and output, should fix problems with UTF-8 characters.
2023-03-25 22:29:22 +02:00
Georgi Gerganov
c2b25b6912
Fix colors enabling on WIN32 2023-03-25 21:53:39 +02:00
Georgi Gerganov
79b2b266db
If n_predict == -1, generate forever 2023-03-25 21:51:41 +02:00
Georgi Gerganov
e2d490dafd
Infinite generation via context swapping (#71) 2023-03-25 21:36:22 +02:00
Georgi Gerganov
03f7e33560
Cleanup STL headers + fix embedding examples + minor stuff 2023-03-25 20:51:14 +02:00
Georgi Gerganov
55ad42af84
Move chat scripts into "./examples" 2023-03-25 20:37:09 +02:00
slaren
459e93cce0
Add AVX2 implementation of dequantize_row_q4_1 (#505) 2023-03-25 20:31:48 +02:00
Georgi Gerganov
a316a425d0
Overhaul the examples structure
- main -> examples
- utils -> examples (renamed to "common")
- quantize -> examples
- separate tools for "perplexity" and "embedding"

Hope I didn't break something !
2023-03-25 20:26:40 +02:00
Georgi Gerganov
ecbe466a36
Retire the ggml_mul_mat() branch for transposed src0 (#500)
* Retire the ggml_mul_mat() for transposed src0

- It can always be made contiguous with ggml_cpy()
- The code is now simplified
- The results are deterministic with respect to the number of threads

* SIMD-ify dequantize_row_q4_0() for ARM_NEON (#502)

* Attempt to SIMD-ify dequantize_row_q4_0() for ARM_NEON

* Fix dequantization - forgot to interleave the quants
2023-03-25 19:47:21 +02:00
Georgi Gerganov
502a400192
Disable prompt verbosity by default and add option to enable (#480) 2023-03-25 17:17:16 +02:00
slaren
09aecbf628
Add AVX2 implementation of dequantize_row_q4_0 (#467) 2023-03-25 17:06:49 +02:00
Georgi Gerganov
4640eff23d
Don't interfere with BLAS for large prompts by running only 1 thread 2023-03-25 17:03:10 +02:00
Georgi Gerganov
ab77d76312
Add longer DAN prompt for testing big batch numbers 2023-03-25 16:49:09 +02:00
slaren
29b7baab67
Add timings for the prompt evaluation (#478) 2023-03-25 16:34:23 +02:00
Georgi Gerganov
4a7129acd2
Remove obsolete information from README 2023-03-25 16:30:32 +02:00
Georgi Gerganov
6b6dbc8910
Remove obsolete assert and fix compiler warning 2023-03-25 16:22:05 +02:00
Georgi Gerganov
2a2e63ce05
Fix nasty bug in ggml_compute_forward_mul_mat_f32() and reenable BLAS 2023-03-25 16:10:14 +02:00
anzz1
e899bf54b2
bounds checking for input prefix (#492) 2023-03-25 14:42:09 +02:00
anzz1
fbd4d38c64
feat: '--in-prefix STRING' option (#426)
Prefix user inputs with a string
2023-03-25 14:03:19 +02:00
Jed Fox
58e6c9f36f
Add support for file load progress reporting callbacks (#434)
* File load progress reporting

* Move llama_progress_handler into llama_context_params

* Renames

* Use seekg to find file size instead

* More correct load progress

* Call progress callback more frequently

* Fix typo
2023-03-25 07:26:28 +02:00
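The idea behind this commit, finding the file size by seeking to the end and then reporting a fraction to a caller-supplied callback as data is read, can be sketched in Python. This is not the llama.cpp C API: `load_with_progress`, `on_progress`, and `chunk_size` are hypothetical names chosen for illustration.

```python
import os
from typing import Callable


def load_with_progress(path: str,
                       on_progress: Callable[[float], None],
                       chunk_size: int = 1 << 20) -> bytes:
    """Read a file in chunks, reporting progress in the range 0.0 to 1.0."""
    with open(path, "rb") as f:
        # Seek to the end to learn the total size, then rewind.
        f.seek(0, os.SEEK_END)
        total = f.tell()
        f.seek(0)

        data = bytearray()
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            data.extend(chunk)
            # A smaller chunk_size means the callback fires more often.
            on_progress(len(data) / total)

    if total == 0:
        # An empty file is trivially complete.
        on_progress(1.0)
    return bytes(data)
```

A caller might pass `lambda p: print(f"{p:.0%}")` to print a percentage while loading.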
Doomsdayrs
36d07532ef
Add missing struct annotation (#483)
`llama_sample_top_p_top_k` was missing the struct annotation on line 126.

This causes a compiler issue when being parsed by the Kotlin C interop generator.

This commit fixes the above issue by adding the struct annotation.
2023-03-25 07:21:24 +02:00
Chris Kuehl
6f1ee4b640
Fix crash for 65B model with pre-allocated memory (#485) 2023-03-25 06:38:14 +02:00
Concedo
8a339bd75c update gitignore 2023-03-25 11:23:40 +08:00
Concedo
3c78124aac Merge branch 'master' into concedo
# Conflicts:
#	README.md
2023-03-25 11:20:04 +08:00
Concedo
119392f6f2 defaulting to f32 kv, and 4 threads seem to produce better results 2023-03-25 11:11:40 +08:00
Concedo
506cd62638 changed some defaults to hopefully increase compatibility 2023-03-25 10:40:11 +08:00
Concedo
b13a768813 added softprompt endpoint 2023-03-25 10:12:47 +08:00
Georgi Gerganov
8520fc310e
Disable BLAS altogether - the bug is not just for quantized mat mul 2023-03-24 23:47:06 +02:00
Georgi Gerganov
b3f460e941
Disable BLAS branch in mul_mat - seems there is a bug 2023-03-24 23:39:17 +02:00
Georgi Gerganov
04c6f5ed6f
Immediately start processing the prompt before user input has been provided (#476) 2023-03-24 23:17:58 +02:00