Commit graph

  • 2ff2d16131 ggml-kompute.h : remove anything that doesn't need to be public Jared Van Bortel 2024-01-26 14:57:58 -05:00
  • f3b2d22240 Add OpenCL add kernel 0cc4m 2024-01-26 20:50:05 +01:00
  • 6af02b19d1 kompute : init device automatically and remove an unnecessary free Jared Van Bortel 2024-01-26 14:42:11 -05:00
  • 8ca33dec7d test-backend-ops : check all the ops in the test for support in the backends slaren 2024-01-26 20:01:36 +01:00
  • 2512799cfe test-backend-ops : comment out Llama and Falcon tests Jared Van Bortel 2024-01-26 13:55:10 -05:00
  • aea84989f7 Merge branch 'master' of https://github.com/ggerganov/llama.cpp into ceb/nomic-vulkan Jared Van Bortel 2024-01-26 13:46:49 -05:00
  • f94c02ec84 Tests for min_p, sampling queue JohannesGaessler 2024-01-26 15:54:55 +01:00
  • e6ce5f21a1 llama : revert unintended whitespace change Jared Van Bortel 2024-01-26 13:10:49 -05:00
  • a5cca6cd8c Move queue into context 0cc4m 2024-01-26 19:01:33 +01:00
  • 62fead3ea0 cuda : fix tensor size calculation for non-split buffer (#5145) b1981 slaren 2024-01-26 18:59:43 +01:00
  • 61a5cf88dc kompute : remove unnecessary use_mmap=false Jared Van Bortel 2024-01-26 12:58:50 -05:00
  • 15b4538ff2 ggml-alloc : add 10% margin to the buffer sizes (#5149) b1980 slaren 2024-01-26 18:18:26 +01:00
  • 7032f4f634 ggml : update softmax n_task calculation (#5126) b1979 snadampal 2024-01-26 11:17:59 -06:00
  • 0c979ca345 ggml-alloc : add 10% margin to the buffer sizes slaren 2024-01-26 18:16:47 +01:00
  • c2d3fcf5ee fixed link name marcus 2024-01-26 09:02:00 -08:00
  • 142acffe60 added link to another set of rust bindings with brief note on differences. marcus 2024-01-26 08:59:35 -08:00
  • 5a8a07ec60 Simplify gpu_extras by removing events and putting staging memcpys into contexts 0cc4m 2024-01-26 17:50:29 +01:00
  • bc0192f654 fix appending to CUDA_FLAGS Jared Van Bortel 2024-01-26 11:37:30 -05:00
  • 7105287f0d cmake : pass CPU architecture flags to nvcc Jared Van Bortel 2024-01-23 17:42:25 -05:00
  • 209e41316a ggml: softmax op: update the n_task calculation Sunita Nadampalli 2024-01-26 16:03:59 +00:00
  • 2ab97157c9 add copyright and MIT license declare jianyuzh 2024-01-26 23:44:18 +08:00
  • b9ffaab188 Update CMakeLists.txt Abhilash Majumder 2024-01-26 20:50:17 +05:30
  • 553175402b Update CMakeLists.txt Abhilash Majumder 2024-01-26 20:49:48 +05:30
  • 45b0618078 Update CMakeLists.txt Abhilash Majumder 2024-01-26 20:49:25 +05:30
  • f707051066 Update CMakeLists.txt Abhilash Majumder 2024-01-26 20:49:07 +05:30
  • 2cba564b49 Update examples/sycl/run-llama2.sh Abhilash Majumder 2024-01-26 20:48:49 +05:30
  • c08fec2a38 Update examples/sycl/run-llama2.sh Abhilash Majumder 2024-01-26 20:43:17 +05:30
  • 174c9a0ed6 Update ci/run.sh Abhilash Majumder 2024-01-26 20:40:50 +05:30
  • 5f1925a8ce scripts : move run-with-preset.py from root to scripts folder Georgi Gerganov 2024-01-26 17:09:44 +02:00
  • fbe204539e cuda : fix tensor size calculation for non-split buffer slaren 2024-01-26 15:47:21 +01:00
  • 3b7c914de2 tests : gitignore test-c.o Georgi Gerganov 2024-01-26 14:48:15 +02:00
  • 48c857aa10 server : refactored the task processing logic (#5065) b1976 Xuan Son Nguyen 2024-01-26 13:42:20 +01:00
  • 413e7b0559 ci : add model tests + script wrapper (#4586) b1975 crasm 2024-01-26 07:18:00 -05:00
  • 6dd3c28c9c metal : remove unused n_buffers and buffers (#5129) b1974 Paul Tsochantaris 2024-01-26 12:16:07 +00:00
  • d6a6505018 implement std::isinf for icpx with fast math. luoyu-intel 2024-01-26 18:31:04 +08:00
  • 38b431de23 gguf : fix "general.alignment" type in gguf_reader.py (#5136) Riceball LEE 2024-01-26 17:10:28 +08:00
  • aad0b01d73 readme : update hot topics Georgi Gerganov 2024-01-26 10:52:33 +02:00
  • c25052b72e fix(gguf_reader): the "general.alignment" should be UINT32 type Riceball LEE 2024-01-26 15:48:19 +08:00
  • c29a855453 code clean zhangjidong 2024-01-26 15:38:37 +08:00
  • 1182cf4d4f Another bucket sort (#5109) b1971 Kawrakow 2024-01-26 09:14:39 +02:00
  • b08c6b1ad8 fix tabs instead of spaces zhangjidong 2024-01-26 10:45:34 +08:00
  • 2d481ad2ea Removing unused n_buffers and buffers fields from ggml_metal_context Paul Tsochantaris 2024-01-25 23:22:16 +00:00
  • 91654ff042 kompute : fix a -Wstrict-aliasing warning Jared Van Bortel 2024-01-25 17:03:06 -05:00
  • bc287047fb kompute : remove unused immintrin.h #include Jared Van Bortel 2024-01-25 10:13:09 -05:00
  • 3915194232 test-backend-ops : make Falcon test faster with a smaller model Jared Van Bortel 2024-01-25 15:56:42 -05:00
  • 3fbf0529ef kompute : mark last few failing ops as unsupported Jared Van Bortel 2024-01-25 15:47:43 -05:00
  • 445a3734b7 kompute : fix basic Q6_K get_rows, 26 -> 24 failures Jared Van Bortel 2024-01-25 15:38:39 -05:00
  • de9fba0d39 kompute : fix basic f16 get_rows, 28 -> 26 failures Jared Van Bortel 2024-01-25 15:22:11 -05:00
  • fe54033b69 readme : add MobileVLM 1.7B/3B to the supported models list (#5107) XiaotaoChen 2024-01-26 04:14:32 +08:00
  • 5eaf9964fc llama : dynamic temperature sampling (#4972) b1969 l3utterfly 2024-01-26 05:06:22 +09:00
  • 11b305082b test-backend-ops : restore softmax tests Jared Van Bortel 2024-01-25 15:05:55 -05:00
  • 38d1f0c7a0 kompute : fix op_gelu -> Falcon is working on AMDVLK Jared Van Bortel 2024-01-25 14:35:40 -05:00
  • 6fc99a6e66 test-backend-ops : test larger GELU range Jared Van Bortel 2024-01-25 15:01:21 -05:00
  • d292f4f204 examples : make pydantic scripts pass mypy and support py3.8 (#5099) Jared Van Bortel 2024-01-25 14:51:24 -05:00
  • 1849b85473 test-backend-ops : add Falcon test Jared Van Bortel 2024-01-25 13:55:49 -05:00
  • 82ce1c4da2 Cleanup header and other files 0cc4m 2024-01-25 19:43:08 +01:00
  • 256d1bb0dd android : use release cmake build type by default (#5123) Valentin Konovalov 2024-01-25 12:05:51 -05:00
  • f5ac635473 kompute : fix q8_0 mmv, 41 -> 28 failures Jared Van Bortel 2024-01-25 11:27:11 -05:00
  • 6fea843b24 metal : add parallel reduce version (disabled) Georgi Gerganov 2024-01-25 17:59:41 +02:00
  • 987335ea0a kompute : fix algorithm names Jared Van Bortel 2024-01-25 11:09:18 -05:00
  • 6e7cb0eeaf update implementation FSSRepo 2024-01-25 11:04:51 -05:00
  • faa3526a1e Fix Q3_K_XS for MoE models (#5113) b1966 Kawrakow 2024-01-25 17:58:53 +02:00
  • f9ca5dcbe8 llama : avoid ggml_cast, use F32 query Georgi Gerganov 2024-01-25 17:46:07 +02:00
  • 78da3387a8 Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda FSSRepo 2024-01-25 09:48:37 -05:00
  • 5e15ce1658 android : use release cmake build type by default Valentin Konovalov 2024-01-25 09:42:42 -05:00
  • 40ea8cd1ac metal : fix comment Georgi Gerganov 2024-01-25 16:31:39 +02:00
  • 432ad04ffa metal : scale and mask in matrix form Georgi Gerganov 2024-01-25 15:47:52 +02:00
  • d917746ddb metal : avoid redundant loads of the attention Georgi Gerganov 2024-01-25 15:00:49 +02:00
  • 1446a12b29 metal : efficient flash_attn_f16 implementation Georgi Gerganov 2024-01-23 18:27:54 +02:00
  • 2bf91c5306 metal : clean up gg/flash-attn-simd Georgi Gerganov 2024-01-25 13:29:45 +02:00
  • f6416d4493 wip : good version 8x32 Georgi Gerganov 2024-01-25 12:59:59 +02:00
  • 05b7f9be6f revert setting n_threads in sycl Meng, Hengyu 2024-01-25 09:44:19 +00:00
  • b06dca678b remove extra blank line in test-sampling Meng, Hengyu 2024-01-25 09:38:36 +00:00
  • ddc5a5033f metal : show compile log messages b1965 Georgi Gerganov 2024-01-25 11:26:17 +02:00
  • 66e24c2468 pass void as arguments of ggml_backend_sycl_print_sycl_devices Meng, Hengyu 2024-01-25 09:26:02 +00:00
  • eb12e3c391 wip : disable skip Georgi Gerganov 2024-01-25 11:25:07 +02:00
  • f1bab50100 revert sycl checking in test-sampling Meng, Hengyu 2024-01-25 09:21:16 +00:00
  • 806382a3a6 wip : simdify ms, vs Georgi Gerganov 2024-01-25 09:39:22 +02:00
  • 0bd6d42954 Merge branch 'ggerganov:master' into Orion-14B-support sharpHL 2024-01-25 08:55:54 +08:00
  • 154319c42d flake8 support lixiaopu 2024-01-25 08:52:44 +08:00
  • ec68a9657f test-backend-ops : increase max_nmse_err so Llama passes Jared Van Bortel 2024-01-24 17:31:34 -05:00
  • cd4fddb29f cuda : fix 2-bit quants on amd hip (#5105) b1964 Engininja2 2024-01-24 16:18:15 -06:00
  • ebb5f7e968 test-backend-ops : test llama with different batch sizes Jared Van Bortel 2024-01-24 16:55:27 -05:00
  • df687b10ab kompute : support mask parameter of softmax Jared Van Bortel 2024-01-24 16:51:27 -05:00
  • 0fc36d872c match to metal impl FSSRepo 2024-01-24 16:45:30 -05:00
  • 972c2adc15 use half2 instead half4 FSSRepo 2024-01-24 16:41:57 -05:00
  • 249dfc005c use __low2float intrinsic function for new quants Engininja2 2024-01-24 15:29:57 -06:00
  • 8bd38fe32d test-backend-ops : test mask parameter of ggml_soft_max_ext Jared Van Bortel 2024-01-24 16:28:41 -05:00
  • 308f279622 kompute : support scale parameter of softmax Jared Van Bortel 2024-01-24 16:16:58 -05:00
  • 1450966071 test-backend-ops : test scale parameter of ggml_soft_max_ext Jared Van Bortel 2024-01-24 16:12:42 -05:00
  • 2852902eda test-backend-ops : add llama test Jared Van Bortel 2024-01-24 14:55:41 -05:00
  • f69ab89520 Merge remote-tracking branch 'origin/master' into sl/micro-batching slaren 2024-01-24 21:12:13 +01:00
  • cad465253d ggml : add tensor flags slaren 2024-01-24 02:44:33 +01:00
  • f2efa6cd98 wip : simd Georgi Gerganov 2024-01-24 17:06:48 +02:00
  • 2b0f642fec fix f16 mmv, 49 -> 41 failures Jared Van Bortel 2024-01-24 12:47:41 -05:00
  • 1a14099c43 fix q4_0/q4_1 mmv, 65 -> 49 failures Jared Van Bortel 2024-01-24 11:56:43 -05:00
  • 0787b80db8 kompute : remove broken mulrow kernel -> 1 less test failure Jared Van Bortel 2024-01-22 17:42:05 -05:00
  • 2755ae3d10 kompute : fix more dispatch ambiguity -> 12 less failures Jared Van Bortel 2024-01-22 17:04:10 -05:00
  • 08e23fd78c kompute : fix op_mul kernel -> 13 less test failures Jared Van Bortel 2024-01-22 16:08:16 -05:00
  • 0899adf86e kompute : fix get_rows dispatch -> 4 less failures Jared Van Bortel 2024-01-22 14:16:10 -05:00