diff --git a/README.md b/README.md
index 5517bf093..6cd05be6a 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,7 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
 
 ### Recent API changes
 
+- [2024 Mar 30] State and session file functions reorganized under `llama_state_*` https://github.com/ggerganov/llama.cpp/pull/6341
 - [2024 Mar 26] Logits and embeddings API updated for compactness https://github.com/ggerganov/llama.cpp/pull/6122
 - [2024 Mar 13] Add `llama_synchronize()` + `llama_context_params.n_ubatch` https://github.com/ggerganov/llama.cpp/pull/6017
 - [2024 Mar 8] `llama_kv_cache_seq_rm()` returns a `bool` instead of `void`, and new `llama_n_seq_max()` returns the upper limit of acceptable `seq_id` in batches (relevant when dealing with multiple sequences) https://github.com/ggerganov/llama.cpp/pull/5328
diff --git a/llama.h b/llama.h
index f3e0c0022..3c313b884 100644
--- a/llama.h
+++ b/llama.h
@@ -646,16 +646,18 @@ extern "C" {
                       size_t   n_token_count),
         "use llama_state_save_file instead");
 
+    // Get the exact size needed to copy the KV cache of a single sequence
     LLAMA_API size_t llama_state_seq_get_size(
             struct llama_context * ctx,
                     llama_seq_id   seq_id);
 
+    // Copy the KV cache of a single sequence into the specified buffer
     LLAMA_API size_t llama_state_seq_get_data(
             struct llama_context * ctx,
                          uint8_t * dst,
                     llama_seq_id   seq_id);
 
-    // Copy the sequence data (originally copied with `llama_state_seq_get_data`) into a sequence.
+    // Copy the sequence data (originally copied with `llama_state_seq_get_data`) into the specified sequence
     // Returns:
     //  - Positive: Ok
     //  - Zero: Failed to load
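
For reference, here is a minimal sketch of how the new `llama_state_seq_*` calls might be combined to copy one sequence's KV cache between two contexts. Note that this diff only shows the doc comment for the setter, so the `llama_state_seq_set_data` signature used below, along with the `copy_seq_state` helper itself, is an assumption based on that comment rather than part of this patch:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#include "llama.h"

// Hypothetical helper: copy the KV cache of `seq_id` from one context
// into another. Returns true on success.
static bool copy_seq_state(struct llama_context * src_ctx,
                           struct llama_context * dst_ctx,
                           llama_seq_id seq_id) {
    // query the exact buffer size needed for this sequence's KV cache
    const size_t n_bytes = llama_state_seq_get_size(src_ctx, seq_id);
    if (n_bytes == 0) {
        return false;
    }

    uint8_t * buf = malloc(n_bytes);
    if (buf == NULL) {
        return false;
    }

    // serialize the sequence state into the buffer
    const size_t n_copied = llama_state_seq_get_data(src_ctx, buf, seq_id);

    // restore it into the destination context; per the doc comment in the
    // diff, a positive return means Ok and zero means the load failed
    // (signature assumed - only the comment appears in this hunk)
    const bool ok = n_copied == n_bytes &&
                    llama_state_seq_set_data(dst_ctx, buf, seq_id) > 0;

    free(buf);
    return ok;
}
```

The size-then-fill pattern mirrors the whole-context `llama_state_get_size()` / `llama_state_get_data()` pair introduced by the same reorganization, so callers can reuse the same buffer-management code for both full-context and per-sequence state.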