Direct I/O and Transparent HugePages

--direct-io for bypassing page cache (and using THP on Linux)

Up to 3-6x faster uncached loading, fewer pageouts, no page cache pollution.
This commit is contained in:
Pavel Fatin 2024-05-20 21:55:33 +02:00
parent 917dc8cfa6
commit 1b17ed7ab6
10 changed files with 297 additions and 30 deletions

View file

@ -38,6 +38,7 @@ options:
-nkvo, --no-kv-offload <0|1> (default: 0)
-fa, --flash-attn <0|1> (default: 0)
-mmp, --mmap <0|1> (default: 1)
-dio, --direct-io <0|1> (default: 0)
--numa <distribute|isolate|numactl> (default: disabled)
-embd, --embeddings <0|1> (default: 0)
-ts, --tensor-split <ts0/ts1/..> (default: 0)