Commit graph

  • 2d4de517bb un-hardcode max-alibi-bias fmz 2024-07-01 09:26:56 -07:00
  • e24328ea1a fix small typo in README ngxson 2024-07-01 17:54:34 +02:00
  • 80bdc38e45 Merge branch 'master' into xsn/gemma2_mask_swa ngxson 2024-07-01 17:54:17 +02:00
  • becc1d8a90 readme: add Paddler to the list of related projects Mateusz Charytoniuk 2024-07-01 17:52:13 +02:00
  • 7dc9cbf03f convert : add sanity check for query_pre_attn_scalar Georgi Gerganov 2024-07-01 18:38:24 +03:00
  • ce711f6eae llama : minor styling Georgi Gerganov 2024-07-01 18:26:24 +03:00
  • 422bfb3e88 Merge branch 'master' into vulkan-build-integration Mason M 2024-07-01 11:52:49 -03:00
  • 3808a4c1e0 Merge branch 'master' into dev-refactoring hongruichen 2024-07-01 22:52:08 +08:00
  • 9bca872be0 code review changes Mason M 2024-07-01 11:42:47 -03:00
  • 0ddeff1023 readme : update tool list (#8209) b3273 Roni 2024-07-01 14:48:16 +02:00
  • 4422f2a162 Update README.md Georgi Gerganov 2024-07-01 15:47:58 +03:00
  • a1189ef3ca Merge branch 'master' into mixed_types_gemm OuadiElfarouki 2024-07-01 13:31:05 +01:00
  • 3840b6f593 nix : enable curl (#8043) Michael Francis 2024-07-01 07:47:04 -04:00
  • f2ea920e42 Merge branch 'master' into chore/enable-curl-option Georgi Gerganov 2024-07-01 14:46:51 +03:00
  • 257f8e41e2 nix : remove OpenCL remnants (#8235) Georgi Gerganov 2024-07-01 14:46:18 +03:00
  • d4a1923d4e minor : remove parentheses gg/nix-remove-opencl Georgi Gerganov 2024-07-01 14:45:55 +03:00
  • 694c59cb42 Document BERT support. (#8205) iacore 2024-07-01 11:40:58 +00:00
  • 197fe6c1d7 [SYCL] Update SYCL-Rope op and Refactor (#8157) b3269 zhentaoyu 2024-07-01 19:39:06 +08:00
  • 30f85eba85 try CI fix Johannes Gäßler 2024-07-01 13:30:28 +02:00
  • 5e5d89844b Enabled more data types for oneMKL gemm_batch OuadiElfarouki 2024-07-01 12:02:37 +01:00
  • 32cd6f5748 nix : remove OpenCL remnants Georgi Gerganov 2024-07-01 13:49:44 +03:00
  • 3f89b58c48 Merge branch 'master' into chore/enable-curl-option Georgi Gerganov 2024-07-01 13:47:50 +03:00
  • ed5496fb32 update ngxson 2024-07-01 12:35:47 +02:00
  • e9441510f8 tests : add _CRT_SECURE_NO_WARNINGS for WIN32 Daniel Bevenius 2024-01-01 06:38:48 +01:00
  • 865dd03f43 modified the general name of glm model toyer 2024-07-01 03:31:50 +00:00
  • c8cdb48d10 llama : support all OpenELM models Francis Couture-Harpin 2024-06-30 23:13:48 -04:00
  • 5e9dba664a fix conflicts toyer 2024-07-01 02:50:33 +00:00
  • 0d3a94a6b8 merge master toyer 2024-07-01 02:38:23 +00:00
  • d07f0a90c3 fix codestyle toyer 2024-07-01 02:23:19 +00:00
  • 43aa0d32c6 rebase and fix compile Yu Zhentao 2024-06-28 02:52:15 +00:00
  • ec55dc5098 fall back rope when src0 is not contiguous Yu Zhentao 2024-06-27 08:23:54 +00:00
  • 6514f176a4 align with rope.cu and move sycl-op to a single file Yu Zhentao 2024-06-27 07:30:04 +00:00
  • d0a7145ba9 flake.lock: Update (#8218) b3268 Georgi Gerganov 2024-07-01 02:09:34 +03:00
  • d09ecb84c8 replace list with single tensor ngxson 2024-06-30 23:40:25 +02:00
  • 231dae4f68 add co-author ngxson 2024-06-30 23:11:04 +02:00
  • 46b56e6768 better naming ngxson 2024-06-30 22:27:47 +02:00
  • 51b2577dd4 Merge branch 'master' into openelm Francis Couture-Harpin 2024-06-30 16:22:07 -04:00
  • 10c3c419e9 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-06-30 15:31:25 -04:00
  • df9e9c9fcf change default Johannes Gäßler 2024-06-30 20:39:19 +02:00
  • 78754008df remove MIN_CC_DP4A checks Johannes Gäßler 2024-06-30 20:36:38 +02:00
  • db2ffd519d llama : fix mpt and olmo pre-tokenizer Francis Couture-Harpin 2024-06-30 14:34:55 -04:00
  • 9ef0780062 Fix new line issue with chat template, disable template when in-prefix/suffix is set (#8203) b3267 Xuan Son Nguyen 2024-06-30 20:27:13 +02:00
  • a92595aa93 __dp4a -> ggml_cuda_dp4a Johannes Gäßler 2024-06-30 20:20:54 +02:00
  • ab2c3de9b3 fix data_swa uninitialized ngxson 2024-06-30 20:18:53 +02:00
  • 7df7530b8f gemma2: add sliding window mask ngxson 2024-06-30 19:26:13 +02:00
  • 0480dab44a uint -> uint32_t Johannes Gäßler 2024-06-30 12:05:44 +02:00
  • 4f4e1661ed Merge branch 'ggerganov-master' Aliebc 2024-06-30 14:09:10 +08:00
  • d9b5678b5b Merge with conflict Aliebc 2024-06-15 10:45:01 +08:00
  • 725ba0b352 Add YX UI for llama-server Aliebc 2024-06-15 17:50:00 +08:00
  • 32bf2296a2 Add YX simple filter for llama-server Aliebc 2024-06-15 10:45:01 +08:00
  • 49d31f7a62 squash! convert-hf : print output file name when completed Daniel Bevenius 2024-06-30 06:49:30 +02:00
  • 06e169f872 fix formatting Andy Tai 2024-06-29 21:24:01 -07:00
  • 1cd9886620 fix formatting Andy Tai 2024-06-29 21:23:19 -07:00
  • 462cbc6cfb adding guile_llama_cpp to binding list Andy Tai 2024-06-29 21:19:46 -07:00
  • 1c5eba6f8e llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197) b3266 Andrei 2024-06-29 20:44:08 -07:00
  • 4eab311ed0 Merge branch 'ggerganov:master' into vulkan-build-integration bandoti 2024-06-30 00:14:14 -03:00
  • 51f0bd50a1 Remove custom pre attention scaling and use computed value instead. add-gemma2-soft-capping Andrei Betlen 2024-06-29 23:02:50 -04:00
  • ac9a065d31 Remove Python dependency from Vulkan build Mason M 2024-06-30 00:02:41 -03:00
  • 3fd4adfab8 flake.lock: Update github-actions[bot] 2024-06-30 00:19:47 +00:00
  • ec15f4d520 CUDA: refactor and optimize IQ MMVQ Johannes Gäßler 2024-06-27 16:55:43 +02:00
  • 6dc9eb4040 llama : quantization-related fixes for T5 Stanisław Szymczyk 2024-06-29 18:09:22 +02:00
  • e18d43c668 clip: don't throw exceptions from llava functions compiled as extern "C" Aleksandr Erofeev 2024-06-29 15:50:33 +01:00
  • bef4b29509 Added gppm to Tool list in README sulpher 2024-06-29 16:22:06 +02:00
  • a89427908d Add custom kq scaling from Gemma2Attention Andrei Betlen 2024-06-29 10:17:33 -04:00
  • 70f63b8fa1 Update README.md iacore 2024-06-29 11:11:31 +00:00
  • 285b317bfc Update README.md iacore 2024-06-29 11:00:29 +00:00
  • 7d29b095be remove redundant change ngxson 2024-06-29 11:10:33 +02:00
  • 0040a53f63 disable chat template if in-prefix/suffix is set ngxson 2024-06-29 10:58:37 +02:00
  • e926a060db preserve new line llama_chat_format_single ngxson 2024-06-29 10:58:04 +02:00
  • cd8c573ede squash! convert-hf : print output file name when completed Daniel Bevenius 2024-06-29 09:14:56 +02:00
  • 6f2464e3dd Merge branch 'add-gemma2-soft-capping' of github.com:ggerganov/llama.cpp into add-gemma2-soft-capping Andrei Betlen 2024-06-29 01:11:17 -04:00
  • bb7159927d Add default value for attention and final logit softcap value Andrei Betlen 2024-06-29 01:10:55 -04:00
  • 8edf73a729 Merge branch 'master' of github.com:ggerganov/llama.cpp into add-gemma2-soft-capping Andrei Betlen 2024-06-29 00:59:58 -04:00
  • 6879fb501f squash! convert-hf : print output file name when completed Daniel Bevenius 2024-06-29 06:26:29 +02:00
  • 8fbd59308b ggml-quants : attempt to fix Arm 32-bit support Francis Couture-Harpin 2024-06-28 22:52:57 -04:00
  • ec50944bf6 ggml-quants : fix build failure on Windows Francis Couture-Harpin 2024-06-28 20:41:13 -04:00
  • bfd2f21fb4 bitnet : replace 1.58b with b1.58, as in the paper Francis Couture-Harpin 2024-06-28 20:38:12 -04:00
  • 2dd5d1f4b3 llamafile : improve moe prompt eval speed on cpu Justine Tunney 2024-06-28 16:18:33 -07:00
  • 06a37d98b1 Removed erroneous whitespace alex.tuddenham 2024-06-28 23:38:58 +01:00
  • a5870ba3ef Added checks for cmake,make and ctest alex.tuddenham 2024-06-28 23:30:57 +01:00
  • 72272b83a3 fix code typo in llama-cli (#8198) b3265 Xuan Son Nguyen 2024-06-29 00:14:20 +02:00
  • c59bee0249 tweak tests ochafik 2024-06-28 23:00:35 +01:00
  • f33ad2a099 fix integ test ochafik 2024-06-28 22:54:33 +01:00
  • f286589a32 Merge remote-tracking branch 'origin/master' into json-order ochafik 2024-06-28 22:37:05 +01:00
  • 9e5f17c7fe fix typos ochafik 2024-06-28 22:34:39 +01:00
  • b386363957 fix code typo in llama-cli ngxson 2024-06-28 23:29:25 +02:00
  • 7f227d279d Update test-grammar-integration.cpp ochafik 2024-06-28 22:28:43 +01:00
  • 9c05bd26fb Update json_schema_to_grammar.mjs ochafik 2024-06-28 22:13:58 +01:00
  • 60b2df61cd json: reshuffle C++ converter ochafik 2024-06-28 22:06:59 +01:00
  • 675a7410b1 Merge branch 'master' into convert-bf16-fix Francis Couture-Harpin 2024-06-28 17:06:20 -04:00
  • 886ffadbaa json: bring JS & Python clis closer in space and spirit ochafik 2024-06-28 21:55:28 +01:00
  • 5b67a6cfbf ggml-impl : do not flush bf16 subnormals to zero Francis Couture-Harpin 2024-06-28 16:47:55 -04:00
  • b6abfdb5fe json: restrict external refs to https, remove allowFetch options ochafik 2024-06-28 21:39:02 +01:00
  • 9ba101313e Update README.md ochafik 2024-06-28 21:23:35 +01:00
  • 3b24739071 json: cache externally fetched refs (forever for now) ochafik 2024-06-28 21:21:47 +01:00
  • ae058954c7 Merge remote-tracking branch 'origin/master' into json-refs ochafik 2024-06-28 21:15:28 +01:00
  • 3a2471811f Update src/llama.cpp Andrei 2024-06-28 16:07:47 -04:00
  • f4424c150f Disable flash attention for Gemma2 Andrei Betlen 2024-06-28 16:00:20 -04:00
  • d1137c20f1 Add custom add_ functions Andrei Betlen 2024-06-28 15:58:02 -04:00
  • d3d3c4eb35 fix Andrei Betlen 2024-06-28 15:46:45 -04:00