finetune : add -ngl parameter (#3762)
* Add '-ngl' support to finetune.cpp

* Add fprintf in ggml_cuda_op_add

  When I tried CUDA offloading during finetuning following the readme, I hit an assert here. This probably isn't an important case, because inference later warns that you should use f16 or f32 instead when using LoRA.

* Add 'finetune.sh', which currently fails when using GPU:
  "error: operator (): Finetuning on tensors with type 'f16' is not yet supported"

* Tweak finetune.sh

* Suppress some warnings in ggml.c

* Add f16 implementation to ggml_compute_forward_add_f16_f32

* Add an f16 case to ggml_add_cast_impl and llama_build_lora_finetune_graphs

* finetune.sh: Edit comments

* Add "add_f16_f32_f32_cuda"

* Tweak an error message

* finetune.sh: Add an optional LLAMA_MODEL_DIR variable

* finetune.sh: Add an optional LLAMA_TRAINING_DIR variable

* train : minor

* tabs to spaces

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
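For context, here is a rough sketch (not the actual patch) of how an -ngl flag is typically threaded through to model loading in llama.cpp examples of this era: parse the argument into an n_gpu_layers count and pass it via llama_model_params. The helper name and the finetune-specific plumbing are assumptions, not taken from the commit.

// Hypothetical sketch: wire an -ngl/--n-gpu-layers argument into
// llama_model_params.n_gpu_layers before loading the model.
#include <stdlib.h>
#include <string.h>
#include "llama.h"

static struct llama_model * load_model_with_ngl(const char * fn_model, int argc, char ** argv) {
    int n_gpu_layers = 0;                      // default: keep everything on the CPU
    for (int i = 1; i < argc - 1; ++i) {
        if (strcmp(argv[i], "-ngl") == 0 || strcmp(argv[i], "--n-gpu-layers") == 0) {
            n_gpu_layers = atoi(argv[i + 1]);  // number of layers to offload to the GPU
        }
    }
    struct llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = n_gpu_layers;       // forwarded to the GPU backend
    return llama_load_model_from_file(fn_model, mparams);
}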
parent f0e209324a
commit 73bdcb395e
8 changed files with 106 additions and 15 deletions
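The commit message also mentions a new mixed-precision CUDA kernel, "add_f16_f32_f32_cuda". A hedged sketch of what such an element-wise kernel plausibly looks like (src0 in f16, src1 and dst in f32) is below; the signature and launch logic are assumptions modeled on ggml-cuda's other add kernels, not copied from the diff.

// Hypothetical sketch: element-wise add with an f16 source and f32
// destination, upcasting each half value to float before the addition.
#include <cuda_fp16.h>

static __global__ void add_f16_f32_f32(const half * x, const float * y, float * dst, const int k) {
    const int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i >= k) {
        return;
    }
    dst[i] = __half2float(x[i]) + y[i];  // f16 -> f32, then add in f32
}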
@@ -716,6 +716,7 @@ void quantize_row_q8_0(const float * restrict x, void * restrict vy, int k) {
         __riscv_vse8_v_i8m1(y[i].qs , vs, vl);
     }
 #else
+    UNUSED(nb);
     // scalar
     quantize_row_q8_0_reference(x, y, k);
 #endif
@@ -969,6 +970,7 @@ void quantize_row_q8_1(const float * restrict x, void * restrict vy, int k) {
         y[i].s = sum*d;
     }
 #else
+    UNUSED(nb);
     // scalar
     quantize_row_q8_1_reference(x, y, k);
 #endif
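Both hunks above add UNUSED(nb); to the scalar fallback branch, where nb is otherwise left unreferenced and triggers -Wunused-variable on builds without SIMD support. A minimal sketch of the idiom, assuming ggml's usual macro definition:

// Assumed definition (ggml defines UNUSED in terms of GGML_UNUSED):
// a void cast that marks the variable as used without generating code.
#define GGML_UNUSED(x) (void)(x)
#define UNUSED GGML_UNUSED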