This change fixes a bug where replacing text in a very long string could
cause llama.cpp to hang indefinitely. This is because the algorithm used
was quadratic, due to memmove() when s.replace() is called in a loop. It
seems most search results and LLM responses actually provide the O(n**2)
algorithm, which is a great tragedy. Using a builder string fixes things
* llama : advanced batch splits
This includes equal-sequence-length batch splits which are useful
to simplify recurrent model operators.
* llama : always make recurrent state slots contiguous
* ggml : simplify mamba operators
* llama : fix integer signedness mixing
* llama : logits_all has priority over batch->logits
Otherwise, the server embeddings tests failed.
This was likely an existing problem but was only detected here
because of an additional assertion.
* llama : apply suggestions
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* llama : fix t5 segfault
* llama : fix Mamba session save and restore
* llama : minor cosmetic changes
* llama : rename llama_reorder_outputs to llama_output_reorder
Also move it closer to llama_output_reserve.
* llama : fix pooled embeddings when using batches with equal_seqs
* minor : add struct members for clarity
ggml-ci
* llama : fix T5 segfault again
* llama : fix Mamba pooled embeddings with multiple sequences
Until the pooled embeddings are refactored to allow splitting
across ubatches for causal embeddings,
recurrent models can only process a single sequence per ubatch
when calculating pooled embeddings.
* llama : add llama_model_is_recurrent to simplify figuring that out
This will make it easier to more cleanly support RWKV-v6 and Mamba-2.
* llama : fix simple splits when the batch contains embeddings
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* llama : std::move llm_bigram_bpe from work_queue
This commit updates the retrieval of llm_bigram_bpe objects from
work_queue.top() by using std::move.
The motivation for this is to avoid the copying of the std::string
`text` member of the llm_bigram_bpe struct.
* squash! llama : std::move llm_bigram_bpe from work_queue
Introduced a MovablePriorityQueue class to allow moving elements
out of the priority queue for llm_bigram_bpe.
* squash! llama : std::move llm_bigram_bpe from work_queue
Rename MovablePriorityQueue to lama_priority_queue.
* squash! llama : std::move llm_bigram_bpe from work_queue
Rename lama_priority_queue -> llama_priority_queue.