Commit graph

9 commits

Author SHA1 Message Date
Johannes Gäßler
864a0b67a6
CUDA: use mma PTX instructions for FlashAttention (#11583)
* CUDA: use mma PTX instructions for FlashAttention

* __shfl_sync workaround for movmatrix

* add __shfl_sync to HIP

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-02 19:31:09 +01:00
Georgi Gerganov
ab96610b1e
cmake : enable warnings in llama (#10474)
* cmake : enable warnings in llama

ggml-ci

* cmake : add llama_get_flags and respect LLAMA_FATAL_WARNINGS

* cmake : get_flags -> ggml_get_flags

* speculative-simple : fix warnings

* cmake : reuse ggml_get_flags

ggml-ci

* speculative-simple : fix compile warning

ggml-ci
2024-11-26 14:18:08 +02:00
Diego Devesa
5931c1f233
ggml : add support for dynamic loading of backends (#10469)
* ggml : add support for dynamic loading of backends

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-25 15:13:39 +01:00
Diego Devesa
3ee6382d48
cuda : fix CUDA_FLAGS not being applied (#10403) 2024-11-19 14:29:38 +01:00
Diego Devesa
d3481e6316
cuda : only use native when supported by cmake (#10389) 2024-11-18 18:43:40 +01:00
Johannes Gäßler
ce2e59ba10
CMake: fix typo in comment [no ci] (#10360) 2024-11-17 12:59:38 +01:00
Johannes Gäßler
c3ea58aca4
CUDA: remove DMMV, consolidate F16 mult mat vec (#10318) 2024-11-17 09:09:55 +01:00
Johannes Gäßler
467576b6cc
CMake: default to -arch=native for CUDA build (#10320) 2024-11-17 09:06:34 +01:00
Diego Devesa
ae8de6d50a
ggml : build backends as libraries (#10256)
* ggml : build backends as libraries

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-14 18:04:35 +01:00