Commit graph

  • 961e293833 convert-hf : simplify BitNet pre-quantization Francis Couture-Harpin 2024-06-26 16:24:40 -04:00
  • 89dc3b254c ggml-quants : use ceiling division when quantizing q1_3 Francis Couture-Harpin 2024-06-26 15:31:48 -04:00
  • 9465ec6e12 ggml-quants : ARM NEON vec_dot for q2_2 and q1_3 Francis Couture-Harpin 2024-06-25 01:32:14 -04:00
  • 638ad52f87 ggml-quants : cleanup Q1_3 code formatting Francis Couture-Harpin 2024-06-23 19:44:09 -04:00
  • ef1e345c85 ggml-quants : Q2_2 now faster than Q4_K on with AVX2 Francis Couture-Harpin 2024-06-19 22:12:43 -04:00
  • 48b73b8498 ggml-quants : substract 1 when back in epi8 Francis Couture-Harpin 2024-06-19 17:50:34 -04:00
  • 7ef4254a92 ggml-quants : faster 1.625 bpw AVX2 vec_dot Francis Couture-Harpin 2024-06-19 14:34:32 -04:00
  • bd807499f7 ggml-quants : 1.625 bpw ternary packing for BitNet 1.58b Francis Couture-Harpin 2024-06-19 12:21:08 -04:00
  • 09a31e2adc llama : suppress unref var in Windows MSVC Daniel Bevenius 2024-06-27 06:04:42 +01:00
  • 1dc8e91081 Merge remote-tracking branch 'offical/master' toyer 2024-06-27 02:56:56 +00:00
  • 7357273e08 add comment to glm prefix and suffix toyer 2024-06-27 02:55:52 +00:00
  • d053004046 Update vulkan obj file paths Mason M 2024-06-26 23:14:47 -03:00
  • ac146628e4 Fix llama-android.cpp for error - "common/common.h not found" (#8145) b3246 Raj Hammeer Singh Hada 2024-06-27 07:27:57 +05:30
  • 9b31a40c6d clip : suppress unused variable warnings (#8105) b3245 Daniel Bevenius 2024-06-27 01:50:09 +02:00
  • 6571046097 Forward GGML_EXTRA_LIBS to CMake config pkg Mason M 2024-06-26 20:18:07 -03:00
  • a1495e709c Merge branch 'master' into vulkan-build-integration Mason M 2024-06-26 19:59:39 -03:00
  • 93fe7b7ea2 Merge branch 'master' into spm-infill-support Sigbjørn Skjæret 2024-06-27 00:18:40 +02:00
  • dcf1c3a1a9 use find instead Sigbjørn Skjæret 2024-06-26 23:52:28 +02:00
  • 3500c1481f Fix llama-android.cpp for error - "common/common.h not found" Raj Hammeer Singh Hada 2024-06-27 02:57:25 +05:30
  • b2b9bd8cbf account for space prefix character Sigbjørn Skjæret 2024-06-26 23:21:39 +02:00
  • c70d117c37 scripts : fix filename sync Georgi Gerganov 2024-06-26 23:25:22 +03:00
  • 0262d0153f json: explicit type: object for nested items object in cli example ochafik 2024-06-26 21:23:44 +01:00
  • 0b816fa60f add mention to llama-gbnf-validator Olivier Chafik 2024-06-26 21:08:55 +01:00
  • a4ab2a7920 Merge branch 'ggerganov:master' into fix-control-vector-loading jukofyork 2024-06-26 21:01:29 +01:00
  • ae5d0f4b89 ci : publish new docker images only when the files change (#8142) b3243 slaren 2024-06-26 21:59:28 +02:00
  • cb4bf40299 allow multiple directions for same layer in same file jukofyork 2024-06-26 20:58:49 +01:00
  • 31ec3993f6 ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (#8140) b3242 slaren 2024-06-26 21:34:14 +02:00
  • c7ab7b612c make : fix missing -O3 (#8143) b3241 slaren 2024-06-26 20:20:22 +02:00
  • 8501007c5f make : fix missing -O3 slaren 2024-06-26 20:13:30 +02:00
  • c786478dba ci : publish new docker images only when the files change slaren 2024-06-26 19:25:57 +02:00
  • 674a1c199b ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) slaren 2024-06-26 19:15:09 +02:00
  • f2d48fffde sync : ggml b3240 Georgi Gerganov 2024-06-26 19:39:19 +03:00
  • 4713bf3093 authors : regen Georgi Gerganov 2024-06-26 19:36:44 +03:00
  • 0e814dfc42 devops : remove clblast + LLAMA_CUDA -> GGML_CUDA (#8139) Georgi Gerganov 2024-06-26 19:32:07 +03:00
  • a95631ee97 readme : update API notes Georgi Gerganov 2024-06-26 19:26:13 +03:00
  • 65f9293d14 devops : remove clblast + LLAMA_CUDA -> GGML_CUDA gg/fix-devops Georgi Gerganov 2024-06-26 18:37:55 +03:00
  • c4ded1a8fb llama : make pos_bias contiguous for CUDA Stanisław Szymczyk 2024-06-26 17:46:39 +02:00
  • 7b4d87a17a refactored llama_control_vector_load_one() jukofyork 2024-06-26 16:45:21 +01:00
  • bad0cafee9 llama : updated llm_build_ffn() calls to new API in build_t5() Stanisław Szymczyk 2024-06-26 17:38:13 +02:00
  • f3f65429c4 llama : reorganize source code + improve CMake (#8006) Georgi Gerganov 2024-06-26 18:33:02 +03:00
  • 1c8d37a267 Merge branch 'ggerganov:master' into t5-clean-3 fairydreaming 2024-06-26 17:31:15 +02:00
  • 1e6e363d7f test zero max buffer size sl/zero-max-size slaren 2024-06-26 17:11:09 +02:00
  • 2b276756ee Fixed leak in llama_control_vector_load_one() and allow llama_control_vector_load() to grow jukofyork 2024-06-26 14:45:00 +01:00
  • f0f9a1244c Merge 50732c0698 into 8854044561 Luca 2024-06-26 15:38:32 +02:00
  • 777eb3bb0d Merge 005cf2e662 into 8854044561 Radoslav Gerganov 2024-06-26 06:34:39 -07:00
  • 50732c0698 Add 128 byte read and 128 threads per block Luca 2024-06-26 15:32:52 +02:00
  • b699344543 Update README.md Olivier Chafik 2024-06-26 14:12:23 +01:00
  • 45681a57dd llama : add inference support and model types for T5 and FLAN-T5 model families Stanisław Szymczyk 2024-06-26 15:03:01 +02:00
  • e4895957f7 Merge 0ba20ed97a into 8854044561 Georgi Gerganov 2024-06-26 13:58:42 +02:00
  • 70ef817068 Merge b30565e0c8 into 8854044561 Radoslav Gerganov 2024-06-26 13:58:42 +02:00
  • 2318cadf0c Move sudo to apt-key invocation Mason M 2024-06-26 08:58:42 -03:00
  • c61cd05611 Clean up tabs Mason M 2024-06-26 08:49:16 -03:00
  • 4dd9017027 Merge 2ef86e7213 into 8854044561 Justine Tunney 2024-06-26 14:18:37 +03:00
  • 90a7f8d02d Update README.md Olivier Chafik 2024-06-26 11:43:35 +01:00
  • 61086efe71 Update README.md Olivier Chafik 2024-06-26 11:42:40 +01:00
  • 018dce50a6 mention broken prefixItems Olivier Chafik 2024-06-26 11:40:40 +01:00
  • c11dc91553 scripts : sync ggml-blas.h [no ci] Georgi Gerganov 2024-06-26 11:31:38 +03:00
  • efcc4acec3 Update README.md ochafik 2024-06-26 09:30:03 +01:00
  • fd1d48a2eb scripts : fix sync paths [no ci] Georgi Gerganov 2024-06-26 11:22:32 +03:00
  • ddf43fcb64 Update README.md ochafik 2024-06-26 09:19:37 +01:00
  • eb4c669095 Update README.md ochafik 2024-06-26 09:19:02 +01:00
  • c02e9e10a5 Update README.md ochafik 2024-06-26 09:14:13 +01:00
  • 034c3c4111 Update README.md ochafik 2024-06-26 09:12:32 +01:00
  • dfa7b21ec7 Merge 3b22ea0594 into 8854044561 ZeusXuan 2024-06-26 07:58:28 +00:00
  • 3594986a4d Update README.md ochafik 2024-06-26 08:53:27 +01:00
  • 7fdaa821e3 Merge remote-tracking branch 'origin/master' into json-doc3 ochafik 2024-06-26 08:35:09 +01:00
  • 3b22ea0594 Merge branch 'ggerganov:master' into master ZeusXuan 2024-06-26 14:58:16 +08:00
  • 7af279105e change some details of command help info zzx 2024-06-26 14:57:33 +08:00
  • 22495bfe3a move public backend headers to the public include directory (#8122) slaren 2024-06-26 08:50:42 +02:00
  • 0d93f02748 make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE Georgi Gerganov 2024-06-25 22:33:47 +03:00
  • 63b75fc481 cmake : fix kompute build Georgi Gerganov 2024-06-25 15:13:03 +03:00
  • 723739f042 cmake : build normal ggml library (not object library) [no ci] Georgi Gerganov 2024-06-25 15:09:18 +03:00
  • c7c18f356f cmake : link math library [no ci] Georgi Gerganov 2024-06-25 12:02:26 +03:00
  • 3978393047 cmake : minor [no ci] Georgi Gerganov 2024-06-25 11:30:47 +03:00
  • 070031df6b server : fix mingw build Georgi Gerganov 2024-06-25 09:56:47 +03:00
  • 888d790d22 cmake : fixes [no ci] Georgi Gerganov 2024-06-22 16:51:59 +03:00
  • 62795ad02f ci : disable kompute build [no ci] Georgi Gerganov 2024-06-24 10:07:11 +03:00
  • 9802972238 files : relocate [no ci] Georgi Gerganov 2024-06-26 09:53:31 +03:00
  • ef74eb13a7 scripts : update sync [no ci] Georgi Gerganov 2024-06-21 12:29:55 +03:00
  • 30ffc765e0 spm : fix metal header Georgi Gerganov 2024-06-26 09:32:43 +03:00
  • 8854044561 Clarify default MMQ for CUDA and LLAMA_CUDA_FORCE_MMQ flag (#8115) Isaac McFadyen 2024-06-26 02:29:28 -04:00
  • c8771ab5f8 CUDA: fix misaligned shared memory read (#8123) Johannes Gäßler 2024-06-26 08:28:02 +02:00
  • 494165f3b6 llama : extend llm_build_ffn() to support _scale tensors (#8103) b3233 Eddie-Wang 2024-06-26 14:27:46 +08:00
  • 0595f03dd1 fix chat template bug toyer 2024-06-26 05:58:13 +00:00
  • e18a5365c3 Merge remote-tracking branch 'offical/master' toyer 2024-06-26 02:48:07 +00:00
  • f7d9410cd9 nix test slaren 2024-06-26 03:21:21 +02:00
  • 9396c7bbaf set <|endoftext|> as eos and <|user|> as eot toyer 2024-06-26 02:16:12 +00:00
  • 885954646e Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow Mason M 2024-06-25 23:01:28 -03:00
  • 99c3027298 Use pkg-config to locate vulkan library Mason M 2024-06-25 22:18:56 -03:00
  • 37ff709898 Add suggestions from review Isaac McFadyen 2024-06-25 21:07:37 -04:00
  • 9b2f16f805 json: better support for "type" unions (e.g. nullable arrays w/ typed items) (#7863) b3232 Olivier Chafik 2024-06-26 01:46:35 +01:00
  • 6777c544bd json: fix additionalProperties, allow space after enum/const (#7840) b3231 Olivier Chafik 2024-06-26 01:45:58 +01:00
  • 165beb07b2 move public backend headers to the public include directory slaren 2024-06-25 22:36:24 +02:00
  • 491a967455 Add make target for Vulkan shaders Mason M 2024-06-25 19:38:32 -03:00
  • dd198ceaaa Merge branch 'ggerganov:master' into vulkan-build-integration bandoti 2024-06-25 19:40:15 -03:00
  • 9bf8ec5788 CUDA: fix misaligned shared memory read Johannes Gäßler 2024-06-26 00:09:56 +02:00
  • 7caa7b9e83 Merge remote-tracking branch 'origin/master' into json-type ochafik 2024-06-25 23:15:41 +01:00
  • 078e7f4260 Update README.md ochafik 2024-06-25 23:14:10 +01:00
  • f0a37296dd Merge remote-tracking branch 'origin/master' into json-doc3 ochafik 2024-06-25 23:07:51 +01:00
  • 23beed22a3 update # tokens in server test: consts can now have trailing space ochafik 2024-06-25 21:59:23 +01:00