Commit graph

  • cb9ceff966 minor cleanup Jared Van Bortel 2024-01-19 15:05:01 -05:00
  • 33e8d6abe1 kompute : fix ggml_add kernel (#5027) Georgi Gerganov 2024-01-19 00:22:13 +02:00
  • 2f6a279e29 fix supported ops for kompute backend Jared Van Bortel 2024-01-18 15:32:55 -05:00
  • 07530731ba never try to evaluate an empty command buffer Jared Van Bortel 2024-01-18 16:11:00 -05:00
  • 729e1a4cc1 sync op_rope_f16 with recent op_rope_f32 changes Jared Van Bortel 2024-01-18 11:56:00 -05:00
  • e9d5223da3 actually fix this assertion Jared Van Bortel 2024-01-18 11:48:27 -05:00
  • 9431026a84 clean up old backend code Jared Van Bortel 2024-01-18 11:48:12 -05:00
  • d6bd471693 kompute : fix rope_f32 and scale ops (#5008) Georgi Gerganov 2024-01-18 18:49:39 +02:00
  • 76474a7c0d kompute : ignore exceptions in ggml_vk_available_devices (#12) Jared Van Bortel 2024-01-17 13:47:03 -05:00
  • cad72e1252 add sanity check and fix kompute teardown order Jared Van Bortel 2024-01-17 10:09:27 -05:00
  • 070919dbf7 attempt to get test-backend-ops working Jared Van Bortel 2024-01-10 16:14:03 -05:00
  • 5f660dada8 fix assertion failure Jared Van Bortel 2024-01-10 13:44:34 -05:00
  • 298d6eec09 kompute : initial attempt at ggml-backend v2 support Jared Van Bortel 2024-01-09 16:24:10 -05:00
  • 7c527eb568 Merge commit 'e7e4df031b' into HEAD Jared Van Bortel 2024-01-24 13:39:17 -05:00
  • 3a15a01330 Implement rope_neox op 0cc4m 2024-01-24 19:27:49 +01:00
  • d64bb815e8 add support for Orion-14B(https://huggingface.co/OrionStarAI/Orion-14B-Chat) lixiaopu 2024-01-25 01:58:35 +08:00
  • 6416821499 fix equivalent fp16 math functions, compiler error 'undefined' FSSRepo 2024-01-24 10:57:05 -05:00
  • baa70cd76b Fix Q3_K_XS for MoE models Iwan Kawrakow 2024-01-24 17:01:32 +02:00
  • 0635f844c9 fix wrong code luoyu-intel 2024-01-24 22:00:24 +08:00
  • 5bb93d41b7 fix batch error luoyu-intel 2024-01-24 21:56:47 +08:00
  • eef5faae18 pass batch offset for F16 src1 luoyu-intel 2024-01-24 21:46:48 +08:00
  • 6ccbd1777a wip gg/flash-attn-wip3 Georgi Gerganov 2024-01-24 15:45:04 +02:00
  • 5600118221 fix src1->type==F16 bug. luoyu-intel 2024-01-24 21:29:21 +08:00
  • 0e235fb816 fix no_new_line Meng, Hengyu 2024-01-24 13:16:38 +00:00
  • 26dde91aa6 Bucket sort: another minor improvement Iwan Kawrakow 2024-01-24 15:03:07 +02:00
  • e7c1f64c50 Bucket sort: slightly better version Iwan Kawrakow 2024-01-24 14:49:13 +02:00
  • 18742f7afc rm cpu blas duplicate Abhilash Majumder 2024-01-24 18:14:45 +05:30
  • 3aabd8a274 revert no mmq Meng, Hengyu 2024-01-24 12:37:29 +00:00
  • c9b316c78f nix-shell: use addToSearchPath b1963 Michael Hueschen 2024-01-22 16:44:10 -07:00
  • bf63d695b8 nix: add cc to devShell LD_LIBRARY_PATH Michael Hueschen 2024-01-22 03:17:05 -07:00
  • 8dd1b60ac6 add prefix in func name jianyuzh 2024-01-24 20:34:44 +08:00
  • d07a88d448 fix indent Meng, Hengyu 2024-01-24 12:29:17 +00:00
  • 90c1db82cd Initial bucket sort Iwan Kawrakow 2024-01-24 14:26:01 +02:00
  • 96186a742a revert hip cmake changes Meng, Hengyu 2024-01-24 12:23:23 +00:00
  • fb15de38ef revert unrelated changes in cuda cmake; remove useless nommq; fix typo of GGML_USE_CLBLAS_SYCL Meng, Hengyu 2024-01-24 12:18:30 +00:00
  • 67de350e79 fix ci cases for unsupported data type jianyuzh 2024-01-24 19:51:08 +08:00
  • 1387ea2117 llama : pre-allocate input tensors in a separate buffer (#5100) b1961 slaren 2024-01-24 12:48:14 +01:00
  • da23b56f25 wip : no ic 8 step gg/flash-attn-wip4 Georgi Gerganov 2024-01-24 13:25:34 +02:00
  • d89e0e09d7 New Feature: 1. Sum_Rows: fix cuda kernel overflow; fix block shape error when nrows too big 2. Im2Col: Support Batch in cuda; Support f32 to f32 both in cpu && cuda 3. DepthWiseConv: Support by Im2Col && MulMat 4. Pool_2d: Support avg pooling in cuda 5. HardSigmoid: Imp in cuda 6. HardSwish: Imp in cuda zhangjidong 2024-01-24 18:57:09 +08:00
  • 238ec31aeb Merge branch 'master' into sycl Abhilash Majumder 2024-01-24 15:14:03 +05:30
  • 22e1b45c02 fix build break on macOS, because the macOS CI depends on external ggml instead of internal ggml jianyuzh 2024-01-24 17:24:24 +08:00
  • af3eda9c77 wip Georgi Gerganov 2024-01-24 11:18:24 +02:00
  • ec5c8bc0c9 fix conflict jianyuzh 2024-01-24 16:34:41 +08:00
  • 5cbdba693d wip Georgi Gerganov 2024-01-24 10:16:05 +02:00
  • 799af05619 enable llama_f16 in ci Meng, Hengyu 2024-01-24 07:45:36 +00:00
  • 83c310e00c add MobileVLM 1.7B/3B to the supported models list Chenxiaotao03 2024-01-24 15:14:17 +08:00
  • 04a46c46f8 rm unused strategy jianyuzh 2024-01-24 14:55:59 +08:00
  • 7babd76903 fix action ID format issue jianyuzh 2024-01-24 14:53:04 +08:00
  • 7a44a95b08 update CI/action for sycl code, fix CI error of repeat/dup jianyuzh 2024-01-24 14:39:46 +08:00
  • 5a69780ded cuda : fix 2-bit quants on amd hip Engininja2 2024-01-23 20:23:03 -06:00
  • 816f480e98 export function print_sycl_devices(), mv class dpct definition to source file jianyuzh 2024-01-24 10:10:45 +08:00
  • 963a122398 backend : add event API slaren 2024-01-21 17:55:12 +01:00
  • 91b1461030 mv internal function to .cpp file jianyuzh 2024-01-24 09:41:25 +08:00
  • 498121b11f ren as review comments jianyuzh 2024-01-24 09:20:42 +08:00
  • fccab82f5e Update clip.cpp John 2024-01-24 01:43:00 +01:00
  • 0978977e56 Update server.cpp Maximilian Winter 2024-01-24 01:32:33 +01:00
  • d7ac0d3d06 Ported self extension to server example Maximilian Winter 2024-01-24 01:24:57 +01:00
  • 66477a4fed check for one or zero candidates case in llama_sample_entropy l3utterfly 2024-01-24 09:09:37 +09:00
  • 36103411f3 reformat 't' case in llama_sample_queue l3utterfly 2024-01-24 09:05:51 +09:00
  • 3a3552761b Use bucket sort for token logits JohannesGaessler 2024-01-23 22:31:37 +01:00
  • eaa7722abe llama : pre-allocate input tensors in a separate buffer slaren 2024-01-23 23:35:00 +01:00
  • 035c4f01e6 wip Georgi Gerganov 2024-01-24 00:01:54 +02:00
  • 6374bc5779 cuda: port metal version flash_attn_ext FSSRepo 2024-01-23 16:42:53 -05:00
  • 62ca49e8c9 scripts : update usage text for ci-run.sh crasm 2024-01-23 16:22:27 -05:00
  • 25f1d79fb2 Fixes crasm 2024-01-23 16:01:03 -05:00
  • 06c2d0d117 wip gg/flash-attn-wip2 Georgi Gerganov 2024-01-23 18:27:54 +02:00
  • bc5e64b1bf propagate buffer usage in multi buffers slaren 2024-01-23 14:47:52 +01:00
  • bdf770b3fe examples : make pydantic scripts pass mypy and support py3.8 Jared Van Bortel 2024-01-23 14:44:37 -05:00
  • a689b02ad3 Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda FSSRepo 2024-01-23 13:51:59 -05:00
  • 8f36df8fc9 server: fix a race condition cause by "request_completion" ngxson 2024-01-23 18:13:38 +01:00
  • 3742b6c706 Fix single queue logic 0cc4m 2024-01-23 17:45:50 +01:00
  • 566a178c8f Handle devices with only a single queue 0cc4m 2024-01-23 17:34:45 +01:00
  • 3bfb846d6a fix conflict jianyuzh 2024-01-23 23:45:56 +08:00
  • c0cfcaf66c Update llama.cpp John 2024-01-23 16:39:16 +01:00
  • c7e745e6f3 restore hip dependency abhilash1910 2024-01-23 07:35:31 -08:00
  • 0bd702984c Merge branch 'shangyu' of https://github.com/luffy06/llama.cpp into shangyu luffy06 2024-01-23 23:25:25 +08:00
  • d6fc1a0309 fix mac build abhilash1910 2024-01-23 07:19:30 -08:00
  • 0dbd295e39 Update llava-cli.cpp John 2024-01-23 15:28:16 +01:00
  • 51462f1f23 Update examples/llava/clip.cpp John 2024-01-23 15:26:26 +01:00
  • 0da77c8d2a add print functions and analyzing codes luffy06 2024-01-23 22:03:56 +08:00
  • 26d607608d metal : disable support for MUL_MAT F32 x F16 b1960 Georgi Gerganov 2024-01-23 15:50:56 +02:00
  • 979a9bf1be CUDA: added int8 tensor core matrix multiplication, 4279 t/s JohannesGaessler 2024-01-03 20:58:58 +01:00
  • 44879ee885 Additional KL-divergence statistics (#5081) b1959 Kawrakow 2024-01-23 15:17:20 +02:00
  • 9ecdd12e95 CUDA: more info when no device code (#5088) b1958 Johannes Gäßler 2024-01-23 13:31:56 +01:00
  • 5f83a12382 fix blas matmul function Abhilash Majumder 2024-01-23 17:56:37 +05:30
  • b42a32d31a replace tab by space jianyuzh 2024-01-23 20:20:16 +08:00
  • 89758723c7 minor : clean-up some warnings and style (#5094) b1957 Georgi Gerganov 2024-01-23 14:12:57 +02:00
  • 1d650ce6b4 ggml : add comment Georgi Gerganov 2024-01-23 14:12:32 +02:00
  • 756c4accaf skip build sycl tool for other code path jianyuzh 2024-01-23 20:06:08 +08:00
  • 88f64b7d3d Remove unused headers Abhilash Majumder 2024-01-23 17:34:57 +05:30
  • d097e2a4ef editor format fix Abhilash Majumder 2024-01-23 17:32:42 +05:30
  • 067ef868e9 convert : fix byte tokens for --vocab-type hfft Romain “Artefact2” Dal Maso 2024-01-22 19:14:26 +01:00
  • be31379ef8 format fixes Abhilash Majumder 2024-01-23 14:47:30 +05:30
  • bd716b2594 format fixes Abhilash Majumder 2024-01-23 14:45:36 +05:30
  • 1ddaf44c30 editor config format abhilash1910 2024-01-23 01:03:34 -08:00
  • ea88e2a497 minor : clean-up some warnings and style Georgi Gerganov 2024-01-23 09:35:46 +02:00
  • 2bed4aa3f3 devops : add intel oneapi dockerfile (#5068) b1956 Xuan Son Nguyen 2024-01-23 08:11:39 +01:00
  • 0b59931c84 perplexity: a better organized KL-divergence statistics output Iwan Kawrakow 2024-01-23 09:02:40 +02:00
  • 1c953c10a0 Check for maintenance4 support before using it 0cc4m 2024-01-23 08:00:39 +01:00
  • 125d03a503 llama.vim : added api key support (#5090) Michael Coppola 2024-01-23 01:51:27 -05:00