Commit graph

  • 4fb52843bb
    ci : rearrange output Georgi Gerganov 2024-01-17 15:27:34 +02:00
  • 200dcaf799
    simple : restore examples, imatrix will serve as a demo Georgi Gerganov 2024-01-17 15:24:18 +02:00
  • 10b25e0388
    ci : add imatrix test Georgi Gerganov 2024-01-17 15:10:38 +02:00
  • a722d05a87
    imatrix : fix ggml_mul_mat_id hanlding Georgi Gerganov 2024-01-17 14:43:35 +02:00
  • 2b3a665d39
    llama : use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 (#4996) Kawrakow 2024-01-17 12:36:37 +02:00
  • 9fd1e83f6d Use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 ik/better_q2_k_s Iwan Kawrakow 2024-01-17 12:14:19 +02:00
  • 49bafe0986
    tests : avoid creating RNGs for each tensor gg/iq2-refactor-and-tests Georgi Gerganov 2024-01-17 10:40:55 +02:00
  • 7563293665
    metal : remove unnecessary nil check (#4986) Paul Tsochantaris 2024-01-17 08:07:24 +00:00
  • f46c0c1b0e
    llama : fix copy/paste error in llama_sampling_params comment (#4994) David Renshaw 2024-01-17 02:17:50 -05:00
  • 02d2e38949 Fix missing event cast 0cc4m 2024-01-17 06:27:40 +01:00
  • 654c3f9591 update analyze codes luffy06 2024-01-17 13:26:06 +08:00
  • 8af5c65f0d fix copy/paste error in llama_sampling_params doc comment David Renshaw 2024-01-16 23:31:56 -05:00
  • 80350daeda
    Update llama.cpp John 2024-01-17 00:03:58 +01:00
  • 196348de53
    Update llama.cpp John 2024-01-16 23:16:45 +01:00
  • 8eb8fd94e2
    tests : avoid creating RNGs for each Q tensor Georgi Gerganov 2024-01-16 23:24:05 +02:00
  • b7ddc8bf12
    cuda : fix out-of-bounds-access in mul_mat_vec_q Georgi Gerganov 2024-01-16 23:06:18 +02:00
  • 36feaeb401
    ci : enable LLAMA_CUBLAS=1 for CUDA nodes Georgi Gerganov 2024-01-16 22:32:22 +02:00
  • c3290d29e0 Switch from semaphore-synchronized multiple command buffers per op to single command buffer for multiple ops, whole graph if possible 0cc4m 2024-01-16 21:30:14 +01:00
  • e9a5d54b7d
    cuda : update supports_op for IQ2 Georgi Gerganov 2024-01-16 22:13:17 +02:00
  • bc0bb3009c
    ggml : add IQ2 to test-backend-ops + refactoring Georgi Gerganov 2024-01-14 13:15:30 +02:00
  • 5c99960901
    py : remove unnecessary hasattr (#4903) Georgi Gerganov 2024-01-16 20:59:31 +02:00
  • 7434324414 Merge branch 'master' into removing-extraneous-nil-check Paul Tsochantaris 2024-01-16 18:14:51 +00:00
  • 5137bc0052 Removing unnessecary nil check Paul Tsochantaris 2024-01-16 18:09:25 +00:00
  • bee938da74
    nix: remove nixConfig from flake.nix (#4984) b1893 Philip Taron 2024-01-16 09:56:21 -08:00
  • cec8a48470
    finetune : add training data file to log message (#4979) b1892 Daniel Bevenius 2024-01-16 18:54:24 +01:00
  • 334a835a1c
    ggml : importance matrix support for legacy quants (#4969) b1891 Kawrakow 2024-01-16 19:51:26 +02:00
  • 4feb4b33ee
    examples : add complete parallel function calling example (#4974) Maximilian Winter 2024-01-16 18:41:42 +01:00
  • 959ef0c0df
    perplexity : fix kv cache handling for hellaswag (#4981) b1889 Georgi Gerganov 2024-01-16 19:34:54 +02:00
  • a41c736c2f
    nix: remove nixConfig from flake.nix Philip Taron 2024-01-16 09:23:53 -08:00
  • c37b3474e6
    flake.lock: update flake-parts, flake-parts/nixpkgs-lib, and nixpkgs (#4920) Georgi Gerganov 2024-01-16 19:13:54 +02:00
  • 7c39b02e11
    flake.lock: update flake-parts, flake-parts/nixpkgs-lib, and nixpkgs github-actions[bot] 2024-01-14 00:18:19 +00:00
  • 158f8c9e21
    metal : localized logic in ggml_metal_graph_compute (#4924) b1887 Paul Tsochantaris 2024-01-16 17:05:19 +00:00
  • ae275f2d0a Reduce diff noise Paul Tsochantaris 2024-01-16 17:01:34 +00:00
  • 84a2b83849 Merge branch 'master' into localised-metal-graph-setup-logic Paul Tsochantaris 2024-01-16 16:56:49 +00:00
  • 27f5fc6d01
    perplexity : fix kv cache handling for hellaswag Georgi Gerganov 2024-01-16 18:49:55 +02:00
  • 0df8b949ce
    finetune: add training data file to log message Daniel Bevenius 2024-01-16 15:34:48 +01:00
  • 94f330ad45
    falcon arch fix for tied output embeddings John 2024-01-16 15:21:13 +01:00
  • 862f5e41ab
    android : introduce starter project example (#4926) b1886 Neuman Vong 2024-01-17 00:47:34 +11:00
  • 3a48d558a6
    metal : replace loop of dispatch_async with dispatch_apply (#4934) b1885 Alex Azarov 2024-01-16 14:41:27 +01:00
  • ef834b0070
    minor : fix build Georgi Gerganov 2024-01-16 15:40:22 +02:00
  • 48d74f4394
    Update ggml-metal.m Georgi Gerganov 2024-01-16 15:36:26 +02:00
  • 7c8d3abd1a
    metal : log recommendedMaxWorkingSetSize on iOS 16+ (#4936) b1884 Alex Azarov 2024-01-16 14:33:02 +01:00
  • d84f5c2e92
    Merge branch 'master' into azarovalex/recommendedMaxWorkingSetSize Georgi Gerganov 2024-01-16 15:28:09 +02:00
  • 26abc17010 cuda: fix compile error in jetson platform KyL0N 2024-01-16 22:06:27 +09:00
  • d92351e23d
    py : fix BPE vocab conversion Georgi Gerganov 2024-01-16 14:47:07 +02:00
  • 2a7d26bb12 Update pydantic-models-to-grammar-examples.py Maximilian Winter 2024-01-16 13:46:53 +01:00
  • 5020d12e38 removed trailing whitespace l3utterfly 2024-01-16 21:36:59 +09:00
  • da13e5566d Merge branch 'master' into localised-metal-graph-setup-logic Paul Tsochantaris 2024-01-16 12:26:28 +00:00
  • 122ed4840c
    examples : fix and improv docs for the grammar generator (#4909) Maximilian Winter 2024-01-16 13:10:48 +01:00
  • a1372737e0
    py : pad with unknown tokens when data is missing Georgi Gerganov 2024-01-16 14:03:57 +02:00
  • ea15108462 implemented dynamic temperature sampling from koboldcpp l3utterfly 2024-01-16 20:58:41 +09:00
  • 9b464b4e81
    py : fix missing added_tokens_dict for SPM vocab Georgi Gerganov 2024-01-16 13:38:54 +02:00
  • 874d9919e3 Merge remote-tracking branch 'upstream/master' into pydantic-grammar-generator Maximilian Winter 2024-01-16 12:41:18 +01:00
  • 4810d55f7d Update pydantic_models_to_grammar.py Maximilian Winter 2024-01-16 12:38:17 +01:00
  • a0b3ac8c48
    ggml : introduce GGML_CALL function annotation (#4850) b1882 Justine Tunney 2024-01-16 03:16:33 -08:00
  • d75c232e1d
    finetune : use LLAMA_FILE_MAGIC_GGLA (#4961) b1881 Daniel Bevenius 2024-01-16 12:14:19 +01:00
  • 58356e6bba
    metal : create autorelease pool during library build Georgi Gerganov 2024-01-16 12:46:32 +02:00
  • e0324285a5
    speculative : threading options (#4959) b1880 stduhpf 2024-01-16 12:04:32 +01:00
  • 448b995a84 update analyze codes luffy06 2024-01-16 17:33:34 +08:00
  • 45c292cf28 update analyze codes luffy06 2024-01-16 17:31:18 +08:00
  • 012ecec506
    backend : avoid double-ask callback calls Georgi Gerganov 2024-01-16 11:02:51 +02:00
  • 0c96c72150
    llama : fix callback placement in llama_context_params Georgi Gerganov 2024-01-16 10:52:38 +02:00
  • aa16b5445f
    simple : no need for ggml_is_contiguous + fix bool parse Georgi Gerganov 2024-01-16 10:52:08 +02:00
  • bb9abb5cd8 imatrix: guard Q4_0/Q5_0 against ffn_down craziness ik/imatrix_legacy_quants Iwan Kawrakow 2024-01-16 09:56:05 +02:00
  • 9a1d0c8930
    Merge bd482a8b92 into 3e5ca7931c MaggotHATE 2024-01-16 15:27:33 +08:00
  • 6f9ec42a27 imatrix: adding support for legacy quants Iwan Kawrakow 2024-01-16 08:37:56 +02:00
  • 523fc3ec67 Remove unused tests Neuman Vong 2024-01-16 15:15:31 +11:00
  • 87015f013a Rename CI prop to skip-armeabi-v7a Neuman Vong 2024-01-16 15:15:00 +11:00
  • abfc5188e9 Sync bench code Neuman Vong 2024-01-16 11:10:35 +11:00
  • 943bba2e5d Only build arm64-v8a in CI Neuman Vong 2024-01-15 18:56:30 +11:00
  • 9a049fc771 Set NDK version Neuman Vong 2024-01-15 16:09:42 +11:00
  • 6014a7104f Add github workflow Neuman Vong 2024-01-15 12:29:32 +11:00
  • 0f9ee09f1a Introduce starter project for Android Neuman Vong 2024-01-14 18:29:04 +11:00
  • 6215c33a2b add print codes luffy06 2024-01-16 10:29:42 +08:00
  • b986462a4e Fix linting Josh XT 2024-01-15 19:41:32 -05:00
  • d73960721a Add self extend support to server Josh XT 2024-01-15 19:36:21 -05:00
  • 11a5d56407
    Introduce GGML_CALL function annotation Justine Tunney 2024-01-12 19:00:22 -08:00
  • 3e5ca7931c
    pass cpu-architecture arguments only to host code (C;C++) (#4943) b1879 ngc92 2024-01-15 20:40:48 +02:00
  • 2028ec02fc
    finetune : use LLAMA_FILE_MAGIC_GGLA Daniel Bevenius 2024-01-15 19:08:10 +01:00
  • 78d9bd6a7a fix trailing whitespace Stéphane du Hamel 2024-01-15 18:33:37 +01:00
  • f4fe6333d8 speculative: revert default behavior when -td is unspecified Stéphane du Hamel 2024-01-15 18:11:54 +01:00
  • 49cbf3ec4d accept -td and -tbd args Stéphane du Hamel 2024-01-15 18:05:39 +01:00
  • fd8090b71f fix usage format Stéphane du Hamel 2024-01-15 18:02:24 +01:00
  • 92edbe48b8 speculative: expose draft threading Stéphane du Hamel 2024-01-15 18:00:01 +01:00
  • 0b2fca9a9f
    imatrix : offload to GPU support Georgi Gerganov 2024-01-15 16:18:11 +02:00
  • e0493800ce
    simple : fix Georgi Gerganov 2024-01-15 16:43:46 +02:00
  • e1b1db9f09
    simple : do not perform tensor data copy if not needed Georgi Gerganov 2024-01-15 16:42:16 +02:00
  • 83f3d7a83c
    backend : clean-up the implementation Georgi Gerganov 2024-01-15 15:52:41 +02:00
  • 01b6f68a00
    backend : group nodes in a single compute when user don't need them Georgi Gerganov 2024-01-14 17:30:22 +02:00
  • 65648b341f
    backend : add eval callback Georgi Gerganov 2024-01-14 16:48:16 +02:00
  • 4483396751
    llama : apply classifier-free guidance to logits directly (#4951) b1878 David Friehs 2024-01-15 14:06:52 +01:00
  • d9aa4ffa6e
    awq-py : fix typo in awq-py/README.md (#4947) Victor Z. Peng 2024-01-15 04:41:46 -08:00
  • 1804238e3f update colab Concedo 2024-01-15 20:32:50 +08:00
  • ddb008d845
    cuda : fix dequantize kernel names (#4938) b1876 Georgi Gerganov 2024-01-15 13:27:00 +02:00
  • ea6cdccea1 MobileVLM native implementation Chenxiaotao03 2024-01-15 15:30:54 +08:00
  • 2faaef3979
    llama : check for 256 divisibility for IQ2_XS, IQ2_XXS (#4950) b1875 Kawrakow 2024-01-15 10:09:38 +02:00
  • 3a38ee62ed llama : apply classifier-free guidance to logits directly David Friehs 2024-01-15 07:42:09 +01:00
  • dccaec76ab The check for 256 divisibility was missing for IQ2_XS, IQ2_XXS Iwan Kawrakow 2024-01-15 07:55:55 +02:00
  • 4a3156de2f
    CUDA: faster dequantize kernels for Q4_0 and Q4_1 (#4938) b1874 Kawrakow 2024-01-15 07:48:06 +02:00
  • b6406691c9
    Merge eda5614f41 into a836c8f534 Jay 2024-01-15 13:11:03 +08:00