Implement prototype for instant mmap() loading

This change uses a custom malloc() implementation to transactionally
capture, in a file, the dynamic memory allocated during model loading.
That includes (1) the malloc() allocation for mem_buffer and (2) all
the C++ STL objects. On my $1000 personal computer, this change lets
me run ./main to generate a single token (-n 1) using the float16 7B
model (~12 GB) in one second. To get there, there's a one-time cost:
a 13 GB file needs to be generated. This change rocks, but something
this heroic shouldn't be necessary. We should instead change the file
format so that tensors don't need reshaping and realignment in order
to be loaded.
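
The idea described above can be illustrated with a minimal sketch. This is not the code from this commit: every name here (ArenaHeader, arena_open, arena_alloc, arena_commit, kArenaAddr, kMagic) is hypothetical, the fixed address is assumed to be unused on a typical 64-bit Linux process, and routing malloc() and STL allocations into the arena is left out. It only shows why a heap captured at a fixed address can be reloaded with a single mmap(), and how writing a magic value last makes the capture all-or-nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// All names below are hypothetical; this is a sketch, not the commit's code.
constexpr std::uintptr_t kArenaAddr = 0x4000000000;  // assumed-free address
constexpr std::size_t kArenaSize = 1 << 20;          // 1 MiB for the demo
constexpr std::uint64_t kMagic = 0x4d4d4150474f4f44; // arbitrary commit mark

struct ArenaHeader {
    std::uint64_t magic;  // equals kMagic only after a completed capture
    std::size_t used;     // bump-allocator offset from the arena base
    void *root;           // absolute pointer to the captured root object
};

// Map `path` at kArenaAddr so that absolute pointers stored inside the
// file stay valid across runs. Returns nullptr on any failure.
ArenaHeader *arena_open(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, kArenaSize) != 0) { close(fd); return nullptr; }
    void *want = reinterpret_cast<void *>(kArenaAddr);
    // The address is passed as a hint (no MAP_FIXED) so an existing
    // mapping is never clobbered; a different result is treated as failure.
    void *p = mmap(want, kArenaSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED) return nullptr;
    if (p != want) { munmap(p, kArenaSize); return nullptr; }
    auto *h = static_cast<ArenaHeader *>(p);
    if (h->magic != kMagic) {  // fresh or partially written file: start over
        h->used = sizeof(ArenaHeader);
        h->root = nullptr;
    }
    return h;
}

// Bump-allocate `n` bytes (16-byte aligned) inside the mapped file.
void *arena_alloc(ArenaHeader *h, std::size_t n) {
    assert(h != nullptr);
    std::size_t start = (h->used + 15) & ~std::size_t{15};
    if (start > kArenaSize || n > kArenaSize - start) return nullptr;
    h->used = start + n;
    return reinterpret_cast<char *>(h) + start;
}

// Flush the data first, then write the magic and flush again, so a crash
// mid-capture leaves a file that arena_open() will discard.
bool arena_commit(ArenaHeader *h, void *root) {
    h->root = root;
    if (msync(h, kArenaSize, MS_SYNC) != 0) return false;
    h->magic = kMagic;
    return msync(h, kArenaSize, MS_SYNC) == 0;
}

void arena_close(ArenaHeader *h) { munmap(h, kArenaSize); }
```

On a first run, a loader built this way would allocate from the arena and call arena_commit(); on later runs it would find magic == kMagic and use root directly, without parsing or copying anything.
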
Justine Tunney 2023-03-16 22:03:09 -07:00
parent 2788f373be
commit 5b8023d935
GPG key ID: BE714B4575D6E328
2 changed files with 207 additions and 19 deletions

.gitignore (vendored): 1 changed line

@@ -18,6 +18,7 @@ models/*
/main
/quantize
/magic.dat
arm_neon.h
compile_commands.json