Commit graph

  • ef49f1bccc sampling_temperature_fix John 2024-01-15 02:30:59 +01:00
  • f30978e670 fix typo in awq-py/README.md Victor Z. Peng 2024-01-14 17:00:11 -08:00
  • d53867e939 pass cpu-architecture arguments only to host code (C;C++) Erik Schultheis 2024-01-14 21:38:12 +02:00
  • e264f2239e perplexity : ignore n_batch, submit whole chunk in one call slaren 2024-01-14 19:49:21 +01:00
  • bd5d473bd4 Temporary change to trigger CI jobs Alex Azarov 2024-01-14 19:45:42 +01:00
  • 53b1348863 Whitespace Paul Tsochantaris 2024-01-14 18:39:31 +00:00
  • 3f787a4d5a Collecting command buffer completions on single thread Paul Tsochantaris 2024-01-14 18:36:55 +00:00
  • 0068da7fef make llama_decode async, sync on get_logits slaren 2024-01-14 05:20:24 +01:00
  • a4239ae82f Try running Metal tests on macOS 13 arm64 Alex Azarov 2024-01-14 18:34:15 +01:00
  • ed36ef8b43 Check for Xcode version before using recommendedMaxWorkingSetSize Alex Azarov 2024-01-14 18:02:00 +01:00
  • 3c4d2a1b78 Only log on iOS and macOS, ignoring tvOS and other platforms Alex Azarov 2024-01-14 17:34:22 +01:00
  • 08b89f7ea6 CUDA: faster dequantize kernels for Q4_0 and Q4_1 Iwan Kawrakow 2024-01-14 17:48:30 +02:00
  • a836c8f534 llama : fix missing quotes (#4937) b1873 David Pflug 2024-01-14 10:46:00 -05:00
  • c7997bbebe llama : fix missing quotes David Pflug 2024-01-14 10:40:17 -05:00
  • e26d2242e0 metal: Log recommendedMaxWorkingSetSize on iOS 16+ Alex Azarov 2024-01-14 16:07:02 +01:00
  • 0308db37a4 Attempt a fix kalomaze 2024-01-14 08:55:00 -06:00
  • b95842ae4e Merge branch 'master' into localised-metal-graph-setup-logic Paul Tsochantaris 2024-01-14 14:32:22 +00:00
  • 467a882fd2 Add ability to use importance matrix for all k-quants (#4930) b1872 Kawrakow 2024-01-14 16:21:12 +02:00
  • 4176a92cad Replace loop of dispatch_async with dispatch_apply Alex Azarov 2024-01-14 14:54:30 +01:00
  • c4e8eb728a Round away from zero test kalomaze 2024-01-14 07:43:32 -06:00
  • fb5398c8aa Merge 2659a180ee into bb0c139247 Abhilash Majumder 2024-01-14 21:07:55 +08:00
  • bb0c139247 llama : check LLAMA_TRACE env for extra logging (#4929) b1871 Georgi Gerganov 2024-01-14 13:26:53 +02:00
  • 90096a5f6f Add ability to use importance matrix for all k-quants Iwan Kawrakow 2024-01-14 11:58:56 +02:00
  • 0abbe2fcd3 llama : check LLAMA_TRACE env for extra logging Georgi Gerganov 2024-01-14 11:31:44 +02:00
  • 4c8b870d4e llama : minor fix indent Georgi Gerganov 2024-01-14 11:26:47 +02:00
  • 9408cfdad6 scripts : sync-ggml-am.sh option to skip commits Georgi Gerganov 2024-01-14 11:08:09 +02:00
  • 03c5267490 llama : use LLAMA_LOG_ macros for logging b1869 Georgi Gerganov 2024-01-14 11:03:19 +02:00
  • a128c38de8 Fix ffn_down quantization mix for MoE models (#4927) b1868 Kawrakow 2024-01-14 10:53:39 +02:00
  • 00cc67e2e4 Review suggestion Iwan Kawrakow 2024-01-14 10:52:55 +02:00
  • 5f5fe1bd60 metal : correctly set SIMD support flags on iOS (#4923) b1867 Alex Azarov 2024-01-14 09:44:39 +01:00
  • ac32902a87 llama : support WinXP build with MinGW 8.1.0 (#3419) b1866 Karthik Kumar Viswanathan 2024-01-14 00:41:44 -08:00
  • dc7bc0cb50 Merge commit '584d674be6' into concedo_experimental Concedo 2024-01-14 16:29:44 +08:00
  • b7a28dfee2 Merge f81d2058ae into 147b17ac94 Bingxuan Wang 2024-01-14 16:18:12 +08:00
  • 147b17ac94 2-bit quantizations (#4897) b1865 Kawrakow 2024-01-14 09:45:56 +02:00
  • 807179ec58 Make Q3_K_S be the same as olf Q3_K_L for Mixtral-8x7B (#4906) b1864 Kawrakow 2024-01-14 09:44:30 +02:00
  • 121eb06640 Fix the fix Iwan Kawrakow 2024-01-14 09:39:56 +02:00
  • 998b635a17 Fix ffn_down quantization mix for MoE models Iwan Kawrakow 2024-01-14 08:39:10 +02:00
  • e256bfdfff Whitespace Paul Tsochantaris 2024-01-13 23:48:39 +00:00
  • d7c078c416 log a little bit more info on iOS Alex Azarov 2024-01-14 00:22:39 +01:00
  • 04c99e882c Correctly set support_simdgroup_reduction and support_simdgroup_mm on iPhone/iPad Alex Azarov 2024-01-13 23:31:11 +01:00
  • b5f795f326 Metal: Localized logic in ggml_metal_graph_compute, minor performance improvement Paul Tsochantaris 2024-01-13 23:15:28 +00:00
  • 7aa63a2d16 Move _WIN32_WINNT to CMake Option. Bump to CMake 3.14. Add _WIN32_WINNT Guard. Karthik Kumar Viswanathan 2024-01-13 15:05:38 -08:00
  • 76484fbfd3 sync : ggml b1863 Georgi Gerganov 2024-01-14 00:14:46 +02:00
  • c71d608ce7 ggml: cache sin/cos for RoPE (#4908) b1862 Johannes Gäßler 2024-01-13 21:41:37 +01:00
  • 81ab469a4c ggml: cache sin/cos for RoPE JohannesGaessler 2024-01-12 23:58:41 +01:00
  • af789e7e93 fix async copy between backends slaren 2024-01-13 20:49:59 +01:00
  • 4be5ef556d metal : remove old API (#4919) b1861 Georgi Gerganov 2024-01-13 20:45:45 +02:00
  • 96cf0282cb metal : remove old API Georgi Gerganov 2024-01-13 20:18:18 +02:00
  • 0ea069b87b server : fix prompt caching with system prompt (#4914) b1860 Georgi Gerganov 2024-01-13 19:31:26 +02:00
  • dbbaf82758 pipeline parallelism demo slaren 2024-01-13 04:13:31 +01:00
  • f172de03f1 llama : fix detokenization of non-special added-tokens (#4916) b1859 Georgi Gerganov 2024-01-13 18:47:38 +02:00
  • 2d57de5255 metal : disable log for loaded kernels (#4794) b1858 Georgi Gerganov 2024-01-13 18:46:37 +02:00
  • df845cc982 llama : minimize size used for state save/load (#4820) b1857 David Friehs 2024-01-13 17:29:43 +01:00
  • 6b48ed0893 workflows: unbreak nix-build-aarch64, and split it out (#4915) b1856 Someone 2024-01-13 16:29:16 +00:00
  • f6185f9bba Fix detokenization of non-special added-tokens goerch 2023-10-24 13:46:44 +02:00
  • 722d33f34e main : add parameter --no-display-prompt (#4541) b1855 Yann Follet 2024-01-14 00:09:08 +08:00
  • 86618ff80e Merge branch 'master' into no-display-prompt Georgi Gerganov 2024-01-13 18:08:36 +02:00
  • 70af9fad4a workflows: unbreak nix-build-aarch64, and split it out Someone Serge 2024-01-13 16:08:35 +00:00
  • c30b1ef39a gguf : fix potential infinite for-loop (#4600) b1854 texmex76 2024-01-13 17:06:20 +01:00
  • b38b5e93ae metal : refactor kernel loading code (#4794) b1853 Georgi Gerganov 2024-01-13 18:03:45 +02:00
  • f81e467a47 Merge branch 'master' into gg/metal-feature-set Georgi Gerganov 2024-01-13 17:49:27 +02:00
  • 9ec53ba04e server : fix prompt caching with system prompt Georgi Gerganov 2024-01-13 17:45:21 +02:00
  • 7dc78764e2 compare-llama-bench: tweak output format (#4910) Johannes Gäßler 2024-01-13 15:52:53 +01:00
  • 356327feb3 server : fix deadlock that occurs in multi-prompt scenarios (#4905) b1851 Ziad Ben Hadj-Alouane 2024-01-13 09:20:46 -05:00
  • bd77a48037 Do not default to Repetition Penalty 1.1 (#615) kalomaze 2024-01-13 08:20:02 -06:00
  • ee8243adaa server : fix crash with multimodal models without BOS token (#4904) b1850 makomk 2024-01-13 14:16:11 +00:00
  • 9998ecd191 llama : add phixtral support (wip) gg/add-phixtral Georgi Gerganov 2024-01-13 14:19:13 +02:00
  • 15ebe59210 convert : update phi-2 to latest HF repo (#4903) b1849 Georgi Gerganov 2024-01-13 13:44:37 +02:00
  • 1fb563ebdc py : try to fix flake stuff gg/update-phi2-convert Georgi Gerganov 2024-01-13 13:34:08 +02:00
  • a7b399432e compare-llama-bench: tweak output format JohannesGaessler 2024-01-13 11:51:40 +01:00
  • f5205f85b5 Make Q3_K_S be the same as olf Q3_K_L for Mixtral-8x7B Iwan Kawrakow 2024-01-13 08:51:55 +02:00
  • 0fd29f8929 Fixed some issues and bugs of the grammar generator. Imporved Documentation Maximilian Winter 2024-01-13 05:52:30 +01:00
  • f26c51b0d1 * dont ruint all whitespace ziadb 2024-01-12 20:34:33 -05:00
  • 5805fdaae2 * fix deadlock ziadb 2024-01-12 20:31:48 -05:00
  • 3b36f2068e server: fix crash in multimodal models with add_bos_token = false Aidan Thornton 2024-01-12 23:38:56 +00:00
  • fe252237a3 convert : update phi-2 to latest HF repo Georgi Gerganov 2024-01-12 22:48:47 +02:00
  • 51d3f485cd Merge c9c4e1f077 into de473f5f8e ct-clmsn 2024-01-13 07:16:00 +11:00
  • de473f5f8e sync : ggml b1848 Georgi Gerganov 2024-01-12 22:02:43 +02:00
  • f238461236 ggml : fix 32-bit ARM compat for IQ2_XS (whisper/1758) Georgi Gerganov 2024-01-12 14:02:30 +02:00
  • fa5c1fb44a backend_sched : fix assignments slaren 2024-01-12 20:38:34 +01:00
  • 52ee4540c0 examples : add pydantic models to GBNF grammar generator (#4883) Maximilian Winter 2024-01-12 20:46:45 +01:00
  • 3fe81781e3 CUDA: faster q8_0 -> f16 dequantization (#4895) b1844 Johannes Gäßler 2024-01-12 20:38:54 +01:00
  • e7e4df031b llama : ggml-backend integration (#4766) b1843 slaren 2024-01-12 20:07:38 +01:00
  • 584d674be6 llama : remove redundant assert for StableLM (#4901) b1842 Georgi Gerganov 2024-01-12 20:54:12 +02:00
  • 5f719de77c Renamed file and fixed grammar generator issue. Maximilian Winter 2024-01-12 19:50:24 +01:00
  • f600d0d8d9 CUDA: faster q8_0 -> f16 dequantization JohannesGaessler 2024-01-07 12:12:03 +01:00
  • 149d00ffd0 Update pydantic-models-to-grammar.py Maximilian Winter 2024-01-12 19:26:28 +01:00
  • 930f907d3e export-lora : use LLAMA_FILE_MAGIC_GGLA (#4894) b1841 Daniel Bevenius 2024-01-12 18:54:53 +01:00
  • 3507238cca Renamed module and imported it. Maximilian Winter 2024-01-12 18:39:48 +01:00
  • 641214b8e8 Update pydantic-models-to-grammar-examples.py Maximilian Winter 2024-01-12 18:28:35 +01:00
  • 53ae0dd862 use a host buffer for the cpu compute buffer for faster copies to the gpu slaren 2024-01-12 17:40:55 +01:00
  • 458674c022 Merge remote-tracking branch 'origin/master' into sl/backend-sched slaren 2024-01-12 17:25:54 +01:00
  • aa57604ce2 Update pydantic_models_to_grammar.py Maximilian Winter 2024-01-12 17:24:49 +01:00
  • 2b42f5254c Refactored Grammar Generator Maximilian Winter 2024-01-12 17:13:26 +01:00
  • f342143e92 imatrix: guard even more against low-bit quantization misuse Iwan Kawrakow 2024-01-12 17:41:07 +02:00
  • d5598f7ea2 imatrix: also guard against Q2_K_S quantization without importance matrix Iwan Kawrakow 2024-01-12 17:09:56 +02:00
  • 04bd3ef801 Merge 8dd90131d3 into e790eef21c Johannes Gäßler 2024-01-12 16:40:14 +02:00
  • 75f4cbf232 imatrix: Add Q2_K quantization Iwan Kawrakow 2024-01-12 16:13:54 +02:00
  • 5d33d3cd60 Merge branch 'master' into gg/metal-feature-set Georgi Gerganov 2024-01-12 15:17:01 +02:00
  • cd104ff688 export-lora: use LLAMA_FILE_MAGIC_GGLA Daniel Bevenius 2024-01-12 14:07:01 +01:00