Commit graph

446 commits

Author SHA1 Message Date
Concedo
c1b75f38d0 try to fix noavx2 for really old devices by 2023-04-13 14:36:00 +08:00
Concedo
2ff91b5570 Merge remote-tracking branch 'occam/clblast-1' into concedo 2023-04-13 11:39:35 +08:00
Concedo
5c22f7e4c4 reduce batch sizes and skip all intrinsic flags except AVX when building in compatibility mode. 2023-04-13 11:32:05 +08:00
0cc4m
67d220210f Revert buffer changes, no improvements in benchmarks 2023-04-12 23:10:35 +02:00
0cc4m
c7e5c4f7b2 Improve ClBlast implementation, avoid recreating buffers, remove redundant transfers 2023-04-12 23:10:33 +02:00
Concedo
f4257a8eef Merge branch 'master' into concedo 2023-04-12 23:25:45 +08:00
Concedo
1bd5992da4 clean and refactor handling of flags 2023-04-12 23:25:31 +08:00
Stephan Walter
e7f6997f89 Don't crash on ftype (formerly f16) == 4 (#917) 2023-04-12 15:06:16 +00:00
Concedo
636f8e5a8e updated the quantize files and makefile 2023-04-12 21:40:25 +08:00
Georgi Gerganov
f76cb3a34d readme : change "GPU support" link to discussion 2023-04-12 14:48:57 +03:00
Georgi Gerganov
782438070f readme : update hot topics with link to "GPU support" issue 2023-04-12 14:31:12 +03:00
Concedo
4faae0afa9 Merged upstream, fixed OSX compile errors, integrated noavx2 build into main 2023-04-12 18:08:55 +08:00
rabidcopy
2444a99db5 Fix make compile error in expose.cpp(?) (#44)
* fix compile error?

* Update expose.cpp
2023-04-12 16:19:38 +08:00
Nicolai Weitkemper
4dbbd40750 readme: link to sha256sums file (#902)
This is to emphasize that these do not need to be obtained from elsewhere.
2023-04-12 08:46:20 +02:00
Pavol Rusnak
8b679987cd Fix whitespace, add .editorconfig, add GitHub workflow (#883) 2023-04-11 19:45:44 +00:00
Concedo
ca69e05d1f update readme and fixed typos 2023-04-11 23:53:21 +08:00
Concedo
9245c7d7d0 Merge branch 'master' into concedo 2023-04-11 23:38:15 +08:00
Concedo
23c675b2e6 integrated optional (experimental) CLBlast support 2023-04-11 23:33:44 +08:00
Stephan Walter
3e6e70d8e8 Add enum llama_ftype, sync ggml_type to model files (#709) 2023-04-11 15:03:51 +00:00
comex
2663d2c678 Windows fixes (#890)
Mostly for msys2 and mingw64 builds, which are different from each other
and different from standard Visual Studio builds.  Isn't Windows fun?

- Define _GNU_SOURCE in more files (it's already used in ggml.c for
  Linux's sake).

- Don't use PrefetchVirtualMemory if not building for Windows 8 or later
  (mingw64 doesn't by default).  But warn the user about this situation
  since it's probably not intended.

- Check for NOMINMAX already being defined, which it is on mingw64.

- Actually use the `increment` variable (bug in my `pizza` PR).

- Suppress unused variable warnings in the fake pthread_create and
  pthread_join implementations for Windows.

- (not Windows-related) Remove mention of `asprintf` from comment;
  `asprintf` is no longer used.

Fixes #871.
2023-04-11 15:19:54 +02:00
Concedo
c9f18082fd Merge remote-tracking branch 'occam/clblast' into concedo 2023-04-11 17:01:31 +08:00
Concedo
1f6aa47b6e Merge branch 'master' into concedo
# Conflicts:
#	README.md
2023-04-11 16:53:41 +08:00
qouoq
a0caa34b16 Add BAIR's Koala to supported models (#877) 2023-04-10 22:41:53 +02:00
Georgi Gerganov
461ba9e66e ggml : fix WASM build 2023-04-10 23:20:01 +03:00
Georgi Gerganov
c3ac702e5e ggml : add ggml_cont() + optimize ggml_cpy() for contiguous dst 2023-04-10 22:42:28 +03:00
Georgi Gerganov
9d634ef452 ggml : remove trailing whitespaces 2023-04-10 22:42:28 +03:00
Marco Matthies
d9a239c410 Simplify to include lower-case windows.h always, fix compile on mingw32 (#747) 2023-04-10 19:57:59 +02:00
Georgi Gerganov
684da25926 ggml : fix quantize_row_q4_1() ARM_NEON (close #876) 2023-04-10 19:29:48 +03:00
0cc4m
c3db99ea32 Allow use of OpenCL GPU-based BLAS using ClBlast instead of OpenBLAS for context processing 2023-04-10 18:20:40 +02:00
Concedo
69b85f5b61 fixed a few OOM errors with larger contexts - I cannot figure out why they happen, so I am forced to increase the buffer size. 2023-04-11 00:14:57 +08:00
Concedo
f53238f570 Merged the upstream updates for model loading code, and ditched the legacy llama loaders since they were no longer needed. 2023-04-10 12:00:34 +08:00
comex
180b693a47 Print model version.
Also improve model type printing, and fix indentation of an unrelated
switch statement.
2023-04-10 01:10:46 +02:00
comex
f963b63afa Rewrite loading code to try to satisfy everyone:
- Support all three formats (ggml, ggmf, ggjt).  (However, I didn't
  include the hack needed to support GPT4All files without conversion.
  Those can still be used after converting them with convert.py from my
  other PR.)

- Support both mmap and read (mmap is used by default, but can be
  disabled with `--no-mmap`, and is automatically disabled for pre-ggjt
  files or on platforms where mmap is not supported).

- Support multi-file models like before, but automatically determine the
  number of parts rather than requiring `--n_parts`.

- Improve validation and error checking.

- Stop using the per-file type field (f16) entirely in favor of just
  relying on the per-tensor type/size fields.  This has no immediate
  benefit, but makes it easier to experiment with different formats, and
  should make it easier to support the new GPTQ-for-LLaMa models in the
  future (I have some work in progress on that front).

- Support VirtualLock on Windows (using the same `--mlock` option as on
  Unix).

    - Indicate loading progress when using mmap + mlock.  (Which led me
      to the interesting observation that on my Linux machine, with a
      warm file cache, mlock actually takes some time, whereas mmap
      without mlock starts almost instantly...)

      - To help implement this, move mlock support from ggml to the
        loading code.

- madvise/PrefetchVirtualMemory support (based on #740)

- Switch from ifstream to the `fopen` family of functions to avoid
  unnecessary copying and, when mmap is enabled, allow reusing the same
  file descriptor for both metadata reads and mmap (whereas the existing
  implementation opens the file a second time to mmap).

- Quantization now produces a single-file output even with multi-file
  inputs (not really a feature as much as 'it was easier this way').

Implementation notes:

I tried to factor the code into more discrete pieces than before.

Regarding code style: I tried to follow the code style, but I'm naughty
and used a few advanced C++ features repeatedly:

- Destructors to make it easier to ensure everything gets cleaned up.

- Exceptions.  I don't even usually use exceptions when writing C++, and
  I can remove them if desired... but here they make the loading code
  much more succinct while still properly handling a variety of errors,
  ranging from API calls failing to integer overflow and allocation
  failure.  The exceptions are converted to error codes at the
  API boundary.

Co-authored-by: Pavol Rusnak <pavol@rusnak.io> (for the bit I copied from #740)
2023-04-10 01:10:46 +02:00
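The format detection and mmap rules described in the commit above can be sketched roughly as follows. This is an illustration only, not the loader's actual code: the function names `sniff_format` and `should_use_mmap` are hypothetical, and the magic values are an assumption (each format name read as a four-character code stored as a little-endian 32-bit integer) that should be checked against the real loader.

```python
import struct

# Assumed magic values: each format name read as a four-character code and
# stored as a little-endian uint32. Verify against the loader before relying on them.
MAGIC_TO_FORMAT = {
    0x67676D6C: "ggml",
    0x67676D66: "ggmf",
    0x67676A74: "ggjt",
}


def sniff_format(header: bytes) -> str:
    """Return the format name for the first four bytes of a model file."""
    if len(header) < 4:
        raise ValueError("header too short to contain a magic value")
    (magic,) = struct.unpack("<I", header[:4])
    if magic not in MAGIC_TO_FORMAT:
        raise ValueError(f"unknown magic 0x{magic:08x}")
    return MAGIC_TO_FORMAT[magic]


def should_use_mmap(fmt: str, no_mmap: bool = False, platform_has_mmap: bool = True) -> bool:
    """mmap is the default, but is turned off by --no-mmap, for pre-ggjt
    files, or when the platform does not support mmap."""
    return (not no_mmap) and platform_has_mmap and fmt == "ggjt"
```

For example, `should_use_mmap(sniff_format(header))` would be false for a `ggmf` file even without `--no-mmap`, matching the "automatically disabled for pre-ggjt files" rule in the message.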
Concedo
18a154715e added version label, improved file type checks 2023-04-10 01:03:09 +08:00
Concedo
1543c700d8 added a missing endpoint for tavern 2023-04-09 17:41:33 +08:00
Concedo
b91abc3316 increase default blas batch size 2023-04-09 15:27:43 +08:00
Concedo
4d1825263b Merge branch 'master' into concedo
# Conflicts:
#	CMakeLists.txt
#	flake.nix
2023-04-09 13:22:40 +08:00
Concedo
26a7933084 hide the tiny tkinter window 2023-04-09 01:01:34 +08:00
Tomáš Pazdiora
aaf3b23deb fix for windows utf-8 input (#840)
Use UTF-16 as input on Windows, since UTF-8 does not work and reads multibyte characters as zeros
2023-04-08 17:49:39 +02:00
eiery
f2d1c47294 cmake should link openblas properly with -lopenblas like how it's done in the makefile (#839) 2023-04-08 11:15:17 +00:00
lon
317fb12fbd Add new binaries to flake.nix (#847) 2023-04-08 12:04:23 +02:00
Concedo
d335fae7c4 missed a print statement 2023-04-08 17:59:53 +08:00
Concedo
0b904e12db Merge branch 'master' into concedo
# Conflicts:
#	Makefile
2023-04-08 17:42:09 +08:00
LostRuins
5dd610032e Merge pull request #27 from ariez-xyz/patch-1
add more precise instructions for arch
2023-04-08 17:37:39 +08:00
Concedo
d8e37bfe75 new gpt2 format supported 2023-04-08 17:35:36 +08:00
ariez-xyz
b48255db19 add more precise instructions for arch 2023-04-08 10:41:57 +02:00
Concedo
1369b46bb7 notice about false positives 2023-04-08 12:20:48 +08:00
unbounded
62cfc54f77 Add quantize-stats command for testing quantization (#728)
Command that calculates some statistics over the errors introduced by
quantization, like mean square error, max error and some percentile errors for layer
weights. Should be useful for testing quantization improvements.

Exposes some internal state from ggml and llama for testing
2023-04-08 00:09:18 +02:00
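The statistics named in the commit above (mean square error, max error, percentile errors) can be illustrated with a small sketch. This is not the quantize-stats implementation: the function name `quantization_error_stats` is hypothetical, the inputs are assumed to be plain lists of original and dequantized weights, and the nearest-rank percentile definition is an assumption.

```python
import math


def quantization_error_stats(original, reconstructed, percentiles=(50, 95)):
    """Summarise the error between original and dequantized weights."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Absolute per-element errors, sorted so percentiles can be read by rank.
    abs_errors = sorted(abs(a - b) for a, b in zip(original, reconstructed))
    n = len(abs_errors)
    stats = {
        "mse": sum(e * e for e in abs_errors) / n,
        "max_error": abs_errors[-1],
    }
    for p in percentiles:
        # Nearest-rank percentile: smallest error with at least p% of values at or below it.
        rank = max(1, math.ceil(p / 100 * n))
        stats[f"p{p}"] = abs_errors[rank - 1]
    return stats
```

Comparing these numbers before and after a change to a quantization routine is one way to check whether the change reduces error, which is the use the commit message suggests.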
Concedo
d1c957ee64 strip symbols 2023-04-08 00:59:34 +08:00
bhubbb
698f7b5d63 make : add libllama.so target for llama-cpp-python (#797)
I was able to get llama-cpp-python working but only when I build libllama.so with make.
2023-04-07 19:11:58 +03:00