Commit graph

  • 3878b397a9 Merge branch 'ggerganov:master' into handle-eom-token fairydreaming 2024-08-04 20:33:43 +02:00
  • f10b0e2c39 gguf-py : add constants and method related to <|eom_id|> token Stanisław Szymczyk 2024-08-04 20:22:48 +02:00
  • 0d6fb52be0 Install curl in runtime layer (#8693) b3511 Brandon Squizzato 2024-08-04 14:17:16 -04:00
  • 978ba3d83d Server: Don't ignore llama.cpp params (#8754) b3510 ardfork 2024-08-04 18:16:23 +00:00
  • e7d9abf388 fix build ngxson 2024-08-04 20:00:09 +02:00
  • 42960ec78e handle lora_no_apply ngxson 2024-08-04 19:48:17 +02:00
  • e5c2d16739 server : add lora hotswap endpoint ngxson 2024-08-04 19:46:13 +02:00
  • 5b7f62642a sync : ggml Georgi Gerganov 2024-08-04 19:13:25 +03:00
  • 393df556a7 vulkan : implement Stable Diffusion operators (ggml/904) 0cc4m 2024-08-04 17:28:08 +02:00
  • eaf56f2029 ggml : move c parameter comment to ggml_rope_ext (ggml/901) Daniel Bevenius 2024-07-29 15:06:06 +02:00
  • 6c75cb952a Only run backend ops mul mat vec block size test if block size not already covered 0cc4m 2024-08-04 17:48:35 +02:00
  • ecabd54d89 Fix Vulkan mul mat vec invalid results when ncols < warp size 0cc4m 2024-08-04 17:44:18 +02:00
  • ecf6b7f23e batched-bench : handle empty -npl (#8839) b3509 Brian Cunnie 2024-08-04 03:55:03 -07:00
  • adb79aa9f7 Update examples/batched-bench/batched-bench.cpp Georgi Gerganov 2024-08-04 13:54:20 +03:00
  • bddcc5f985 llama : better replace_all gg/replace-all Georgi Gerganov 2024-08-04 13:42:08 +03:00
  • 8ce8d57cdc Update README.md BarfingLemurs 2024-08-04 06:27:35 -04:00
  • 01aae2b497 baby-llama : remove duplicate vector include b3508 Daniel Bevenius 2024-08-03 15:07:47 +02:00
  • 59c5d479de attn_qkv.weight in IQ4_XS for FTYPE IQ3_M Nexes the Old 2024-08-04 12:06:06 +02:00
  • 93c35f86a9 attn.output.tensor of FYPE IQ3_M in IQ4_XS Nexes the Old 2024-08-04 11:59:52 +02:00
  • 409406dcc8 Add pre-tokenizer regexes for BLOOM and gpt3-finnish Esko Toivonen 2024-08-04 11:52:09 +03:00
  • 4b77ea95f5 flake.lock: Update (#8847) Georgi Gerganov 2024-08-04 05:53:20 +03:00
  • 229c35cb59 gguf-py : remove LlamaFileTypeMap compilade/gguf-py-quants-class Francis Couture-Harpin 2024-08-03 21:22:37 -04:00
  • d7ed6f1816 flake.lock: Update github-actions[bot] 2024-08-04 00:20:03 +00:00
  • f034aa1bb1 ggml-quants : rename fields of TQ1_0 and TQ2_0 structs for consistency Francis Couture-Harpin 2024-08-03 16:22:04 -04:00
  • 76614f352e ggml : reading the runtime sve config of the cpu (#8709) b3506 jdomke 2024-08-04 01:34:41 +09:00
  • 74746ae7f5 fixup! py: add more authorship metadata from model card brian khuu 2024-08-04 00:43:46 +10:00
  • d3271d6cfa baby-llama : remove duplicate vector include Daniel Bevenius 2024-08-03 15:07:47 +02:00
  • 256707309f enforce UTF-8 when using MSVC LIU Xiao 2024-07-27 11:15:55 +08:00
  • 35ee170413 add typedef of WIN32_MEMORY_RANGE_ENTRY and PWIN32_MEMORY_RANGE_ENTRY to make it work under MinGW LIU Xiao 2024-07-05 17:43:08 +08:00
  • ead607e013 0x601 not works, use _WIN32_WINNT_WIN7 instead LIU Xiao 2024-07-05 17:42:17 +08:00
  • 12c5612774 correct _WIN32_WINNT and WINVER settings LIU Xiao 2024-07-05 17:41:44 +08:00
  • 3a987b00fc set _WIN32_WINNT=0x601 (WIN7) under mingw in Makefile LIU Xiao 2024-07-03 12:56:08 +08:00
  • d3cd16601e set _WIN32_WINNT and WINVER to GGML_WIN_VER under WIN32 instead of MINGW LIU Xiao 2024-07-03 12:54:56 +08:00
  • 0ad8c88d3f set GGML_WIN_VER to 0x601 (WIN7) LIU Xiao 2024-07-03 12:52:35 +08:00
  • e8119ef670 PrefetchVirtualMemory is called dynamically so remove the preprocessor LIU Xiao 2024-07-03 12:49:53 +08:00
  • b1b2f75473 server: update cpp-httplib to version that support win7 by setting _WIN32_WINNT<_WIN32_WINNT_WIN8 LIU Xiao 2024-07-03 12:46:31 +08:00
  • 8b127f0cc5 server: Windows 7 compatibility LIU Xiao 2024-06-29 19:30:55 +08:00
  • 2bdcb7b15a ggml: Add epsilon as a parameter for group_norm Molly Sophia 2024-08-02 13:56:51 +08:00
  • d5779c27ba More occurences of n_experts == 8 changed to >= in quant strategies Nexes the Old 2024-08-03 03:04:25 +02:00
  • 8eca9b526d [example] batched-bench "segmentation fault" Brian Cunnie 2024-08-02 17:21:43 -07:00
  • 04eec58112 ggml : remove q1_3 and q2_2 Francis Couture-Harpin 2024-08-02 19:52:19 -04:00
  • 7d337d0f89 Slight reorder of the attn.weight tree Nexes the Old 2024-08-03 01:35:08 +02:00
  • 63986631a3 Apply the GQA2/Expert2 conditionality to the IQ3 quants Nexes the Old 2024-08-02 23:49:03 +02:00
  • e82ff5a346 gguf-py : fix BF16 numpy view type Francis Couture-Harpin 2024-08-02 17:42:46 -04:00
  • 861265b91e gguf-py : fix flake8 lint Francis Couture-Harpin 2024-08-02 16:23:30 -04:00
  • 5e27e7e11c convert_hf : simplify internal quantization type selection Francis Couture-Harpin 2024-08-02 16:14:49 -04:00
  • 1ac1a79161 gguf-py : use classes for quants Francis Couture-Harpin 2024-07-27 16:01:50 -04:00
  • b72c20b85c Fix conversion of unnormalized BF16->BF16 weights (#7843) b3505 Sigbjørn Skjæret 2024-08-02 21:11:39 +02:00
  • 02c75452c1 vulkan: fix storageBuffer16BitAccess detection on some adreno driver rhjdvsgsgks 2024-08-02 18:46:31 +00:00
  • b77cdd83ff Small changes for IQ2 quant strategies (notably IQ2_S and IQ2_M) Nexes the Old 2024-08-02 20:40:04 +02:00
  • 9cdd1dfa16 cmake : Link vulkan-shaders-gen with pthreads Jaeden Amero 2024-08-02 17:45:36 +00:00
  • e09a800f9a cann: Fix ggml_cann_im2col for 1D im2col (#8819) b3504 Mengqing Cao 2024-08-02 16:50:53 +08:00
  • 8513bb8ae7 common : Changed tuple to struct (TODO fix) Jia Liu 2024-08-02 16:25:51 +08:00
  • c1111cc096 fix build warning MengqingCao 2024-08-02 07:43:48 +00:00
  • 77c580deeb modify convert script of minicpmv caitianchi 2024-08-02 15:15:02 +08:00
  • 6da5130bdb support minicpmv2.6 caitianchi 2024-08-02 15:13:54 +08:00
  • c9f65bfa76 fix ggml_cann_im2col for 1D im2col MengqingCao 2024-08-02 07:01:14 +00:00
  • e868aa928e revert xxhash fix and add brackets domke 2024-08-02 12:36:46 +09:00
  • da52ed164d readme: add GPUStack to the UI list in README [no ci] Yinlin Li 2024-08-02 11:08:24 +08:00
  • 0fbbd88458 [SYCL] Fixing wrong VDR iq4nl value (#8812) b3503 Ouadie EL FAROUKI 2024-08-02 01:55:17 +01:00
  • 39ae18444f added dir-assistant to UI projects Chase Adams 2024-08-01 17:05:53 -05:00
  • afbb4c1322 ggml-cuda: Adding support for unified memory (#8035) b3502 matteo 2024-08-01 23:28:28 +02:00
  • 5f87eac04d Fixing wront VDR iq4nl value OuadiElfarouki 2024-08-01 19:02:25 +01:00
  • b7a08fd5e0 Build: Only include execinfo.h on linux systems that support it (#8783) b3501 Alex O'Connell 2024-08-01 12:53:46 -04:00
  • 23c7587806 README: add ramalama to the availables UI Eric Curtin 2024-08-01 17:47:11 +01:00
  • 7a11eb3a26 cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (#8800) b3500 slaren 2024-08-01 15:26:22 +02:00
  • f059434f05 add test slaren 2024-08-01 14:26:24 +02:00
  • 8e68836639 only use dmmv for supported types slaren 2024-08-01 14:26:09 +02:00
  • f2596b3409 update asserts slaren 2024-08-01 14:08:58 +02:00
  • 7a70fcd85e py: add more authorship metadata from model card brian khuu 2024-08-01 22:03:22 +10:00
  • 4a03d0de27 prefix variable to avoid possible conflicts domke 2024-08-01 20:20:10 +09:00
  • 3a3a7528cd merge cleanup Sigbjørn Skjæret 2024-08-01 10:24:14 +02:00
  • dc051541ff missed prototype update in merge Sigbjørn Skjæret 2024-08-01 10:00:21 +02:00
  • 2b7464888f Merge branch 'master' of github.com:ggerganov/llama.cpp into convert-bf16-fix Sigbjørn Skjæret 2024-08-01 09:51:06 +02:00
  • 45719a2472 ggml : avoid directly using vmlal_high_s8, for 32-bit ARM compat Francis Couture-Harpin 2024-08-01 01:11:30 -04:00
  • 4d71c98544 mv dpct/helper.hpp to dpct.hpp arthw 2024-08-01 12:52:06 +08:00
  • 254a750249 rename device_infos to infos arthw 2024-08-01 12:48:18 +08:00
  • 6211ac0408 simple code for loop arthw 2024-08-01 12:42:11 +08:00
  • 5417089aeb ggml : add NEON vec_dot implementation for TQ1_0 and TQ2_0 Francis Couture-Harpin 2024-07-31 23:35:04 -04:00
  • 271240e020 use glibc macro instead of defining a custom one Alex O'Connell 2024-07-31 23:33:19 -04:00
  • 1947c1200e support set main gpu arthw 2024-08-01 11:21:16 +08:00
  • a6dd6994a5 ggml : fix build issues in certain environments Francis Couture-Harpin 2024-07-31 23:14:36 -04:00
  • c8a0090922 cann: support q8_0 for Ascend backend (#8805) b3499 wangshuai09 2024-08-01 10:39:05 +08:00
  • 5af1609e10 cann: fix q8_0 wangshuai09 2024-08-01 01:05:41 +00:00
  • 75e46ca22d Add files via upload hatgrey 2024-08-01 06:13:34 +05:30
  • 8fa112b38d mpi-cli/ hatgrey 2024-08-01 06:01:58 +05:30
  • d78bb46c21 Delete examples/parallel/parallel.cpp hatgrey 2024-08-01 05:38:13 +05:30
  • afbbcf3c04 server : update llama-server embedding flag documentation (#8779) b3498 Igor Okulist 2024-07-31 18:59:09 -05:00
  • 1ca16dd935 cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X slaren 2024-07-31 22:27:18 +02:00
  • ed9d2854c9 Build: Fix potential race condition (#8781) b3497 Clint Herron 2024-07-31 15:51:06 -04:00
  • f38c6f3c19 adding one more case where the PR should not be enabled matteo serva 2024-07-31 18:12:41 +02:00
  • 7444bad4f2 Updating the documentation matteo 2024-07-31 18:10:36 +02:00
  • 47f6e02eda fix: try fix the tensor rank of mul mat hongruichen 2024-07-31 22:44:21 +08:00
  • 398ede5efe Adding Gemma 2 2B configs (#8784) b3496 pculliton 2024-07-31 11:12:10 -04:00
  • 36e6685d93 Update src/llama.cpp pculliton 2024-07-31 11:00:16 -04:00
  • 390f54eb74 cleaning up the documentation matteo serva 2024-07-31 16:43:53 +02:00
  • 44d28ddd5c cmake : fix use of external ggml (#8787) b3495 Borislav Stanimirov 2024-07-31 16:40:08 +03:00
  • 6cc7432b37 Merge remote-tracking branch 'origin/master' into dev-refactoring hongruichen 2024-07-31 20:25:28 +08:00
  • 74eb05a13b feat: add ggml_qnn_op_config for handle different op hongruichen 2024-07-29 23:12:51 +08:00
  • 098ba711f9 cmake : fix use of external ggml Borislav Stanimirov 2024-07-31 10:40:05 +03:00