convert-hf : support bfloat16 conversion (#7158)

* convert-hf : support bfloat16 conversion

* gguf-py : flake8 fixes

* convert-hf : add missing space after comma

* convert-hf : get bit-exact same output as ./quantize

The quantization version was missing.
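
For bit-exact parity, the missing piece was the quantization version in the GGUF metadata. The constant and writer method below do exist in gguf-py; the surrounding scaffolding is only illustrative:

```python
import gguf

# Writing the quantization version is what makes the converter's output
# match ./quantize bit-for-bit; the path and arch here are placeholders.
writer = gguf.GGUFWriter("model-f16.gguf", arch="llama")
writer.add_quantization_version(gguf.GGML_QUANT_VERSION)
```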

* convert-hf : don't round bf16 NaNs

* convert-hf : save some memory with np.int16 intermediate bf16 weights
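
The two bullets above boil down to a NaN-aware f32-to-bf16 conversion whose result stays in a 16-bit integer array instead of round-tripping through float32. A minimal NumPy sketch of the idea, not the PR's exact code (the PR's version additionally flushes subnormals to zero to match ggml's bf16 semantics):

```python
import numpy as np

def fp32_to_bf16_bits(arr: np.ndarray) -> np.ndarray:
    """Round float32 down to bfloat16, returned as raw bits in int16."""
    n = arr.astype(np.float32, copy=False).view(np.uint32)
    # NaN: exponent all ones and a non-zero mantissa.
    is_nan = (n & 0x7fffffff) > 0x7f800000
    # Round to nearest even: add 0x7fff plus the lowest kept mantissa bit.
    rounded = (n + ((n >> 16) & 1) + 0x7fff) >> 16
    # NaNs must skip the rounding: a carry into the exponent would turn
    # them into +/-inf.  Setting the quiet bit keeps the top half a valid
    # NaN even when the payload sits entirely in the discarded low bits.
    quiet_nan = (n | (1 << 22)) >> 16
    out = np.where(is_nan, quiet_nan, rounded)
    # np.int16 storage: NumPy has no bfloat16 dtype, and keeping the raw
    # 16-bit pattern halves the memory of a float32 intermediate.
    return out.astype(np.uint16).view(np.int16)
```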

* convert-hf : more closely match llama.cpp with which weights to keep in f32
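
The rule llama.cpp itself applies is roughly that 1-dimensional tensors (norm weights, biases) stay in f32, since they are small and precision-sensitive. A hedged sketch of such a predicate; the function name and the exact name check are illustrative, not lifted from the PR:

```python
import numpy as np

def should_stay_f32(name: str, data: np.ndarray) -> bool:
    # Mirroring ./quantize keeps the converter's output comparable:
    # 1D tensors are left in f32, as are norm weights matched by name
    # (the suffix check here is an illustrative stand-in).
    return data.ndim == 1 or name.endswith("_norm.weight")
```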

* convert-hf : add --outtype auto-f16

A reason for this option is that model quantizers may want an initial
GGUF with the most fidelity to the original model while still using
a 16-bit float type instead of 32-bit floats.
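
A hedged sketch of what the automatic choice can look like: keep bf16 for bfloat16 checkpoints (squeezing them into f16 would sacrifice exponent range), and fall back to f16 otherwise. The function is illustrative, not the PR's code:

```python
def pick_16bit_outtype(source_dtype: str) -> str:
    # e.g. source_dtype comes from the checkpoint's torch dtype string.
    return "bf16" if source_dtype == "bfloat16" else "f16"
```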

* convert-hf : remove a semicolon because flake8 doesn't like it

It's a reflex from programming in C/C++, I guess.

* convert-hf : support outtype templating in outfile name
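
The idea is that a placeholder in --outfile is substituted with the resolved output type, which matters now that the type may only be settled at conversion time. A minimal sketch assuming a {ftype} placeholder; the helper name is made up for illustration:

```python
from pathlib import Path

def fill_outfile_template(outfile: Path, ftype_name: str) -> Path:
    # "model-{ftype}.gguf" + "bf16" -> "model-bf16.gguf";
    # names without a placeholder pass through unchanged.
    return outfile.with_name(outfile.name.format(ftype=ftype_name))

print(fill_outfile_template(Path("model-{ftype}.gguf"), "bf16"))
```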

* convert-hf : rename --outtype auto-f16 to --outtype auto

Author: compilade
Date: 2024-05-11 11:06:26 -04:00 (committed by GitHub)
Commit: 5a419926b0 (parent: fae9d234b6)
5 changed files with 404 additions and 182 deletions

gguf-py/gguf/__init__.py:

```diff
@@ -1,4 +1,5 @@
 from .constants import *
+from .lazy import *
 from .gguf_reader import *
 from .gguf_writer import *
 from .tensor_mapping import *
```
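
The new .lazy module is what lets the converter defer tensor computation instead of materializing every weight up front, and the wildcard re-export makes it reachable from the package root. A usage sketch; the class and method come from the lazy.py added here, but treat the exact API as an assumption:

```python
import numpy as np
import gguf

# Assumption: lazy.py exports LazyNumpyTensor with a from_eager()
# constructor.  Operations on the wrapper are recorded rather than
# executed, and only run when the data is finally needed (e.g. when
# the tensor bytes are written out to the GGUF file).
lazy = gguf.LazyNumpyTensor.from_eager(np.ones((4, 4), dtype=np.float32))
deferred = lazy.astype(np.float16)  # deferred, not yet computed
```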