Commit graph

  • e790eef21c llama.swiftui : update models layout (#4826) b1840 Zay 2024-01-12 05:48:00 -07:00
  • 5537d9d36b gitignore : imatrix Georgi Gerganov 2024-01-12 14:33:21 +02:00
  • 1b280c9fff CUDA: fix softmax compile for old CUDA versions (#4862) b1838 Johannes Gäßler 2024-01-12 12:30:41 +01:00
  • 3cabe80630 llama : fix typo "imp_embd" -> "inp_embd" b1837 Georgi Gerganov 2024-01-12 13:10:19 +02:00
  • 4315a94366 common : streamline the formatting of help (#4890) b1836 howlger 2024-01-12 12:05:32 +01:00
  • f28bc25ede Update common/common.cpp Georgi Gerganov 2024-01-12 13:05:05 +02:00
  • 2d00741e12 py : fix lint (#4889) Georgi Gerganov 2024-01-12 13:03:38 +02:00
  • f445c0e68c llama : fix llm_build_k_shift to use correct n_rot (#4889) b1834 Georgi Gerganov 2024-01-12 13:01:56 +02:00
  • eea19039fc convert : fix persimmon conversion to write correct n_rot Georgi Gerganov 2024-01-12 13:00:51 +02:00
  • 8da2b25b0b imatrix: WIP Iwan Kawrakow 2024-01-12 12:53:16 +02:00
  • 39b468a411 common : streamline the formatting of help howlger 2024-01-12 10:35:34 +01:00
  • 0cb764e4ab llama : always use hparams.n_rot for ggml_rope_custom Georgi Gerganov 2024-01-12 11:21:45 +02:00
  • f8e08e6c59 CUDA: fix softmax compile for old CUDA versions JohannesGaessler 2024-01-10 18:27:00 +01:00
  • ff0899c9b3 llama : fix llm_build_k_shift to use correct n_rot Georgi Gerganov 2024-01-12 10:46:39 +02:00
  • e9372e4098 imatrix: load Iwan Kawrakow 2024-01-12 09:41:44 +02:00
  • 3a8e6d07c3 Merge 4f8d62e444 into 326b418b59 akawrykow 2024-01-12 00:46:53 -06:00
  • 326b418b59 Importance Matrix calculation (#4861) b1833 Kawrakow 2024-01-12 06:59:57 +01:00
  • b87effd261 Update examples/imatrix/imatrix.cpp Kawrakow 2024-01-12 06:59:43 +01:00
  • 1e7694eee7 fix opencl slaren 2024-01-12 04:16:15 +01:00
  • e73009ea51 use async memcpys to copy the graph outputs to the CPU slaren 2024-01-12 03:53:39 +01:00
  • 23c14ef53e use async copy and compute to improve multi-gpu performance slaren 2024-01-12 00:50:00 +01:00
  • ebca51be4f Added some comments for usage Maximilian Winter 2024-01-12 01:52:45 +01:00
  • 1d118386fe server : fix infill when prompt is empty (#4833) b1832 Georgi Gerganov 2024-01-11 23:23:49 +02:00
  • 5becef98a0 Create pydantic-models-to-grammar.py Maximilian Winter 2024-01-11 22:09:48 +01:00
  • 7edefbd79c main : better name for variable n_print (#4874) b1831 Georgi Gerganov 2024-01-11 22:46:26 +02:00
  • 3ca63b4538 main : disable token count by default (#4874) b1830 Georgi Gerganov 2024-01-11 22:43:05 +02:00
  • b037787548 swift : track ggml release branch (#4867) b1829 Georgi Gerganov 2024-01-11 21:58:28 +02:00
  • 469e75d0a3 llama : restore intended k-quants mixes for MoE models (#4872) b1828 Kawrakow 2024-01-11 20:43:15 +01:00
  • 31fb4d8ebe Merge branch 'master' into ik/restore_k-quants_for_MoE Georgi Gerganov 2024-01-11 21:41:22 +02:00
  • 49662cbed3 ggml : SOTA 2-bit quants (add IQ2_XS) (#4856) b1827 Kawrakow 2024-01-11 20:39:39 +01:00
  • 3ba5b8ca8e swift : pin ggml commit + remove ggml.h from spm-headers (#4878) b1826 Georgi Gerganov 2024-01-11 21:31:31 +02:00
  • 15bba69bf6 Merge e692c2d887 into 4330bd83fe Johannes Gäßler 2024-01-12 01:28:40 +06:00
  • 4330bd83fe server : implement credentialed CORS (#4514) b1825 Laura 2024-01-11 19:02:48 +01:00
  • 1c0399bdbf Merge branch 'master' into server-credentialed-cors-2 Laura 2024-01-11 18:56:11 +01:00
  • af9bf475e2 Move validate_api_key up so it is defined before its first usage Laura 2024-01-11 18:54:19 +01:00
  • 27379455c3 server : support for multiple api keys (#4864) b1824 Michael Coppola 2024-01-11 12:51:17 -05:00
  • fd0b505037 Merge branch 'master' into server-credentialed-cors-2 Laura 2024-01-11 18:48:24 +01:00
  • a5849f2e1b Add link to Dart binding for llama.cpp adel boussaken 2024-01-11 18:43:26 +01:00
  • eab6795006 server : add LOG_INFO when model is successfully loaded (#4881) b1823 Behnam M 2024-01-11 12:41:39 -05:00
  • d8d90aa343 ci: nix-flake-update: new token with pr permissions (#4879) b1822 Someone 2024-01-11 17:22:34 +00:00
  • 1eebbd6d0c used LOG_INFO after successful model loading Behnam M 2024-01-11 12:15:59 -05:00
  • d0375a2ea6 Merge branch 'ggerganov:master' into master Behnam M 2024-01-11 12:12:39 -05:00
  • bf1fc25d2b ci : fix token ID Georgi Gerganov 2024-01-11 18:25:51 +02:00
  • 9bfcb16fd3 Add llama enum for IQ2_XS ik/iq2_2.31bpw Iwan Kawrakow 2024-01-11 18:24:12 +02:00
  • 5626cdd42f ci: nix-flake-update: new token with pr permissions Someone Serge 2024-01-11 16:21:13 +00:00
  • 43f76bf1c3 main : print total token count and tokens consumed so far (#4874) b1821 pudepiedj 2024-01-11 16:14:52 +00:00
  • f65f575e4b Move param def posn pudepiedj 2024-01-11 16:07:59 +00:00
  • add6fc0bed Merge branch 'master' of https://github.com/ggerganov/llama.cpp Michael Coppola 2024-01-11 10:58:00 -05:00
  • f35acb84eb swift : pin ggml commit + remove ggml.h from spm-headers Georgi Gerganov 2024-01-11 17:57:28 +02:00
  • 81be5d2c51 Two requested changes pudepiedj 2024-01-11 15:55:06 +00:00
  • 2f043328e3 server : fix typo in model name (#4876) b1820 Isaac McFadyen 2024-01-11 09:33:26 -05:00
  • 2a7c94db5f metal : put encoder debug group behind a define (#4873) b1819 Paul Tsochantaris 2024-01-11 14:31:52 +00:00
  • 39dd8da68f Fix typo in model name in server.cpp Isaac McFadyen 2024-01-11 09:05:55 -05:00
  • c4867196b4 server : add --split-mode parameter slaren 2024-01-11 14:46:53 +01:00
  • c3681af783 Merge remote-tracking branch 'origin/master' into sl/backend-sched slaren 2024-01-11 12:16:53 +01:00
  • 42aa835c58 opencl : fix double initialization slaren 2024-01-11 12:00:18 +01:00
  • 74c469dbdb Updating before PR pudepiedj 2024-01-11 10:55:13 +00:00
  • 17869c87a9 Merge remote-tracking branch 'origin/master' into add_token_count pudepiedj 2024-01-11 10:28:23 +00:00
  • c22c70454c Add show token count pudepiedj 2024-01-11 10:08:58 +00:00
  • 92206c7121 Placing Metal encoder debug group behind a define Paul Tsochantaris 2024-01-11 09:40:04 +00:00
  • 6616772ffa finetune: sort includes according to iwyu Daniel Bevenius 2024-01-11 09:02:42 +01:00
  • 90732e824c finetune: add comments to all includes Daniel Bevenius 2024-01-11 08:47:12 +01:00
  • ec2ff06029 finetune: add missing includes Daniel Bevenius 2024-01-11 08:37:22 +01:00
  • 64802ec00d sync : ggml b1818 Georgi Gerganov 2024-01-11 09:39:08 +02:00
  • 3267c2abc7 metal : fix deprecation warning (ggml/690) Georgi Gerganov 2024-01-11 09:34:59 +02:00
  • f85a973aa1 ggml : remove ggml_cpy_inplace and ggml_cont_inplace (ggml/693) Timothy Cronin 2024-01-11 02:27:48 -05:00
  • 5362e43962 metal : wrap each operation in debug group (ggml/690) Jack Mousseau 2024-01-10 06:19:19 -08:00
  • e739de7909 ggml : change GGML_MAX_NAME at compile time (ggml/682) leejet 2024-01-10 21:13:42 +08:00
  • c910e3c28a Fix execlp call (ggml/689) Halalaluyafail3 2024-01-09 11:16:37 -05:00
  • f34432ca1e fix : cuda order of synchronization when setting a buffer (ggml/679) Erik Scholz 2024-01-05 16:00:00 +01:00
  • 6e60a5c69c Update Q2_K_S values in the quantize tool Iwan Kawrakow 2024-01-11 09:36:48 +02:00
  • a38378d2ff Restore intended k-quants quantization mixes for MoE models Iwan Kawrakow 2024-01-11 09:24:28 +02:00
  • 7a9f75c38b server : update readme to document the new /health endpoint (#4866) Behnam M 2024-01-11 02:12:05 -05:00
  • 5c1980d8d4 server : fix build + rename enums (#4870) b1810 Georgi Gerganov 2024-01-11 09:10:34 +02:00
  • e80c61240d server : fix build + rename enums Georgi Gerganov 2024-01-11 09:06:52 +02:00
  • b9ad08af19 improved dynatemp wizard Concedo 2024-01-11 11:26:14 +08:00
  • edb2651e55 server: update README.md for --api-key-file Michael Coppola 2024-01-10 22:12:27 -05:00
  • ddc06843f1 CUDA: fix softmax compile for old CUDA versions JohannesGaessler 2024-01-10 18:27:00 +01:00
  • 941e70db14 Revert "Revert "CUDA: faster softmax via shared memory + fp16 math (#4742)"" Concedo 2024-01-11 10:59:27 +08:00
  • 8b2c774ced Revert "test setting arch to all major" Concedo 2024-01-11 10:57:32 +08:00
  • 6dcc42bd6b fix whitespace slaren 2024-01-11 03:28:07 +01:00
  • d83c084020 llama-bench : add split-mode parameter slaren 2024-01-11 03:25:36 +01:00
  • 9d4ba6ed07 address review comments slaren 2024-01-11 03:13:09 +01:00
  • 2196fa64c4 updated server readme to document the /health endpoint too Behnam M 2024-01-10 19:50:58 -05:00
  • 3b9a0e8c17 added file error handling to --api-key-file, changed code to better reflect current style Michael Coppola 2024-01-10 17:40:38 -05:00
  • 96fc0cd5a7 Merge branch 'master' of https://github.com/ggerganov/llama.cpp Michael Coppola 2024-01-10 17:27:29 -05:00
  • 50579f27e9 attempt to get test-backend-ops working Jared Van Bortel 2024-01-10 16:14:03 -05:00
  • c522c112b3 cuda : fix split buffer free slaren 2024-01-10 21:31:28 +01:00
  • 8c67fb26ba Merge branch 'ggerganov:master' into master Behnam M 2024-01-10 15:05:36 -05:00
  • cd108e641d server : add a /health endpoint (#4860) Behnam M 2024-01-10 14:56:05 -05:00
  • efd36b0270 made ServerState atomic and turned two-line spaces into one-line Behnam M 2024-01-10 14:39:37 -05:00
  • df2de87193 minor: fix whitespace Michael Coppola 2024-01-10 14:27:28 -05:00
  • df7ab297b8 server: added support for multiple api keys, added loading api keys from file Michael Coppola 2024-01-10 14:17:17 -05:00
  • 8a99f69895 fix assertion failure Jared Van Bortel 2024-01-10 13:44:34 -05:00
  • 9289306e7a Token count changes pudepiedj 2024-01-10 18:07:36 +00:00
  • d5670d6e46 kompute : initial attempt at ggml-backend v2 support Jared Van Bortel 2024-01-09 16:24:10 -05:00
  • 1eb8804c18 PR #4766 Jared Van Bortel 2024-01-10 11:29:04 -05:00
  • 3773e1afe7 Merge branch 'master' of https://github.com/ggerganov/llama.cpp into ceb/nomic-vulkan Jared Van Bortel 2024-01-09 16:37:08 -05:00
  • ae6d6824b7 Merge commit 'd232aca5a7' into ceb/nomic-vulkan Jared Van Bortel 2024-01-09 16:34:46 -05:00
  • 904c563dbc sync xxd commands with GPT4All llama.cpp.cmake Jared Van Bortel 2024-01-10 12:12:59 -05:00