Implement prototype for instant mmap() loading

This change uses a custom malloc() implementation to transactionally
capture, in a file, the dynamic memory allocated during the loading
process. That includes (1) the malloc() allocation for mem_buffer and
(2) all the C++ STL objects. On my $1000 personal computer, this change
lets me run ./main to generate a single token (-n 1) using the float16
7B model (~12 GB in size) in one second. To do that, there's a one-time
cost where a 13 GB file needs to be generated. This change rocks, but it
shouldn't be necessary to do something this heroic. We should instead
change the file format so that tensors don't need reshaping and
realignment in order to be loaded.
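
The idea described above can be illustrated with a minimal sketch. This is not the code from this commit. It is a simplified stand-in that assumes 64-bit Linux, a free virtual address range at the chosen base, and a plain bump allocator in place of a full malloc() replacement. Every name in it (ARENA_ADDR, arena_open, arena_alloc, arena_close, tensor_stub, load_model) is hypothetical. The point it shows is that a heap captured in a file can be reused without fix-ups if it is mapped back at the same address, and that a flag written last lets a later run tell a finished capture from an interrupted one.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical fixed base address and size; assumed free on 64-bit Linux. */
#define ARENA_ADDR ((void *)0x4000000000ull)
#define ARENA_SIZE ((size_t)1 << 20)

struct arena_header {
    uint64_t committed; /* set to 1 only after the build has finished */
    size_t used;        /* bump-allocator offset from the arena base */
    void *root;         /* absolute pointer to the root object */
};

static struct arena_header *arena;

/* Map the backing file at ARENA_ADDR so stored pointers stay valid. */
static int arena_open(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)ARENA_SIZE) != 0) { close(fd); return -1; }
    void *p = mmap(ARENA_ADDR, ARENA_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED) return -1;
    if (p != ARENA_ADDR) { munmap(p, ARENA_SIZE); return -1; }
    arena = p;
    /* An uncommitted file is treated as empty, even if partly written. */
    if (!arena->committed) arena->used = sizeof(*arena);
    return 0;
}

static void arena_close(void) {
    munmap(arena, ARENA_SIZE);
    arena = NULL;
}

/* 16-byte-aligned bump allocation inside the mapped file. */
static void *arena_alloc(size_t n) {
    assert(arena != NULL);
    size_t start = (arena->used + 15) & ~(size_t)15;
    if (n > ARENA_SIZE - start) return NULL;
    arena->used = start + n;
    return (char *)arena + start;
}

/* Stand-in for loaded model data that holds an internal pointer. */
struct tensor_stub {
    size_t n;
    float *data;
};

/* Slow path builds the object in the arena; fast path just maps it. */
static struct tensor_stub *load_model(const char *path, int *was_cached) {
    if (arena_open(path) != 0) return NULL;
    if (arena->committed) {
        *was_cached = 1;
        return arena->root;
    }
    *was_cached = 0;
    struct tensor_stub *t = arena_alloc(sizeof(*t));
    if (t == NULL) return NULL;
    t->n = 4;
    t->data = arena_alloc(t->n * sizeof(float));
    if (t->data == NULL) return NULL;
    for (size_t i = 0; i < t->n; i++) t->data[i] = (float)i * 0.5f;
    arena->root = t;
    msync(arena, ARENA_SIZE, MS_SYNC); /* flush data before the flag */
    arena->committed = 1;              /* the commit point */
    msync(arena, ARENA_SIZE, MS_SYNC);
    return t;
}
```

Calling load_model() on a fresh path takes the slow path and writes the file. Calling it again after arena_close(), or from a later process, returns the stored root pointer without rebuilding anything. The sketch refuses to continue when the kernel cannot place the mapping at ARENA_ADDR, since every pointer stored in the file would then be wrong.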

Justine Tunney

2023-03-16 22:03:09 -07:00

parent 2788f373be

commit 5b8023d935

No known key found for this signature in database

GPG key ID: BE714B4575D6E328

2 changed files with 207 additions and 19 deletions

.gitignore vendored

@@ -18,6 +18,7 @@ models/*
 /main
 /quantize
+/magic.dat
 arm_neon.h
 compile_commands.json
