Implement prototype for instant mmap() loading
This change uses a custom malloc() implementation to transactionally capture, to a file, the dynamic memory allocated during the loading process. That includes (1) the malloc() allocation for mem_buffer and (2) all the C++ STL objects. On my $1000 personal computer, this change lets me run ./main to generate a single token (-n 1) using the float16 7B model (~12 GB) in one second. To do that, there's a one-time cost: a 13 GB file needs to be generated. This change rocks, but it shouldn't be necessary to do something this heroic. We should instead change the file format so that tensors don't need reshaping and realignment in order to be loaded.
commit 5b8023d935
parent 2788f373be
2 changed files with 207 additions and 19 deletions
.gitignore
@@ -18,6 +18,7 @@ models/*
 /main
 /quantize
+/magic.dat
 
 arm_neon.h
 compile_commands.json