Commit graph

  • 9cb317f77e ggml : full ALiBi support (#7192) b2844 Georgi Gerganov 2024-05-11 10:32:41 +03:00
  • 03e940cdec convert : fix convert for refact models gg/refactor-alibi-2 Georgi Gerganov 2024-05-11 10:31:52 +03:00
  • 7f03dd0d4b GroupKV: Add int32_t to variant list, to simplify int use HanishKVC 2024-05-11 12:45:58 +05:30
  • 0342124946 GroupKV: Add to_str wrt vectors, help avoid compiler confusion HanishKVC 2024-05-11 12:27:42 +05:30
  • 7d7c59ec50 GroupKV:Simplify:P2: Rename tags, Make debug logs conditional HanishKVC 2024-05-11 11:57:27 +05:30
  • d764a9d395 GroupKV: Simplify code to the minimal needed for GroupKV - P1 HanishKVC 2024-05-11 11:32:06 +05:30
  • 86b842b172 GroupKV: Duplicate SimpCfg to chop down into GroupKV HanishKVC 2024-05-11 10:57:32 +05:30
  • 820fe7e667 make tab into spaces Steve Grubb 2024-05-10 22:45:48 -04:00
  • ce9c5aebea [server] Cleanup a memory leak on exit Steve Grubb 2024-05-10 22:41:41 -04:00
  • 0c9a0aef4c fix: add ignore merges tests to cmake Tony Fettes 2024-05-10 21:26:07 +08:00
  • c3d0f41d50 fix: change ignore_merges to bool Haoxiang Fei 2024-05-10 19:19:29 +08:00
  • 1fb5b55894 fix: copy to fix fallthrough Haoxiang Fei 2024-05-10 18:18:01 +08:00
  • 5d30a6ddd0 fix: test-tokenizer-1-bpe --ingore-merges detection Haoxiang Fei 2024-05-10 16:31:01 +08:00
  • 8a51d3b12d fix: set ignore_merges only for llama-3 Haoxiang Fei 2024-05-10 16:12:20 +08:00
  • c7614930f3 test: add test for llama-3 bpe ignore_merges Haoxiang Fei 2024-05-10 15:49:31 +08:00
  • c21d5e13fe fix: llama-3 ignore_merges Haoxiang Fei 2024-05-10 15:24:35 +08:00
  • bd80601ea8 Updating comments with what we've learned. Kunnis 2024-05-10 17:30:37 -05:00
  • c05d947fbe remove convert-lora-to-ggml.py slaren 2024-05-10 18:17:24 +02:00
  • a82ada7dcd comment clarification. Julia Longtin 2024-05-10 21:57:16 +00:00
  • 525a1cda7f error capturing if repo not reachable due to lack of licensed access CrispStrobe 2024-05-10 23:12:13 +02:00
  • 1c68ea8d9f disable resizing if numa is enabled. Kunnis 2024-05-10 16:05:07 -05:00
  • 4a3c42c82c correct a comment, and use jz when comparing to zero. Julia Longtin 2024-05-10 20:30:56 +00:00
  • 974e43be25 Making it much more likely to rechunk. Kunnis 2024-05-10 15:17:33 -05:00
  • 806472787d use values inside of the loop as soon as we have them. Julia Longtin 2024-05-10 19:33:58 +00:00
  • c0506f94bf SimpCfg: Allow for direct initialization lists based init HanishKVC 2024-05-11 00:17:08 +05:30
  • 8342de6478 Add the example that shows all tests. Josh Ramer 2024-05-10 14:02:47 -05:00
  • e0af2df690 convert-hf : support outtype templating in outfile name compilade/lazy-bfloat16-convert-hf Francis Couture-Harpin 2024-05-10 13:35:37 -04:00
  • 3010850d92 make the test debugging script more robust Ubuntu 2024-05-10 17:27:36 +00:00
  • 21a1e740c2 fix loop. Julia Longtin 2024-05-10 17:07:27 +00:00
  • 7e44eabe0f move sub earlier, and move the compare of iterations to outside, and at the end of the loop. Julia Longtin 2024-05-10 17:03:41 +00:00
  • fe27902964 SimpCfg: Avoid iostream/cout and format for direct library use HanishKVC 2024-05-10 21:32:19 +05:30
  • 7966c8e443 spacing and comment changes. Julia Longtin 2024-05-10 16:50:39 +00:00
  • 67e60c0da4 Basic fix to cors check StrangeBytesDev 2024-05-10 09:39:14 -07:00
  • 650094e17b remove useless prefetches. Julia Longtin 2024-05-10 16:28:53 +00:00
  • 0ff7d5dd1a perform better prefetches, and invert the test of our clear flag for clarity. Julia Longtin 2024-05-10 16:14:28 +00:00
  • e849648888 llama-bench : add pp+tg test type (#7199) b2843 slaren 2024-05-10 18:03:54 +02:00
  • b00607d1ab use vbroadcastss in place of vbroadcast32x4. Julia Longtin 2024-05-10 15:52:35 +00:00
  • fe47a6dcba server: fix reported top tokens for temperature 0 Johannes Gäßler 2024-05-10 17:40:26 +02:00
  • 395df0cc7c server: do not populate probs array when temperature is 0 Leon Knauer 2024-05-10 17:42:36 +02:00
  • 1f9a0eb8ce ChatON: Remove unneeded iostream HanishKVC 2024-05-10 21:10:44 +05:30
  • 18e437665c metal : fix flash attention kernel requirements (#7169) b2842 Georgi Gerganov 2024-05-10 18:20:10 +03:00
  • 362f71e454 Merge branch 'master' into feat-simple-main-example Brian 2024-05-11 01:07:11 +10:00
  • aa6f4c280b CRLF -> LF Paulo de Castro 2024-05-10 12:04:45 -03:00
  • 8c660242d7 convert : print "ignore_merges" field Georgi Gerganov 2024-05-10 17:53:04 +03:00
  • f6edcc4061 Use a vectorized assembly function to handle remaining chunks less than vector wide. Julia Longtin 2024-05-10 14:52:46 +00:00
  • 18d452a863 Merge branch 'master' into activation-fusion Brian 2024-05-11 00:49:57 +10:00
  • 0faf92e74c ggml : require mask when using ALiBi Georgi Gerganov 2024-05-10 17:13:11 +03:00
  • 2282ac4d9f broadcast a single int8, instead of 4 of them. Julia Longtin 2024-05-10 14:19:27 +00:00
  • 5037f4e9bc Merge branch 'master' into master Brian 2024-05-10 23:55:28 +10:00
  • c25d83d416 Merge branch 'master' into server-fix-num-token-eval Brian 2024-05-10 23:51:16 +10:00
  • 11dbcf02ae Merge branch 'master' into xsn/minicpm-template Brian 2024-05-10 23:44:15 +10:00
  • e9a3ba643b update llama-bench readme slaren 2024-05-10 15:33:11 +02:00
  • e4ac8ae720 Update llama.h respect current numerology Nexesenex 2024-05-10 14:57:26 +02:00
  • 9422668ed4 Merge branch 'master' into command_mode Brian 2024-05-10 22:57:00 +10:00
  • 397b1f8f9d vulkan : add dev notes Georgi Gerganov 2024-05-10 15:56:25 +03:00
  • 536983b1ad ggml : fix assert message Georgi Gerganov 2024-05-10 15:45:18 +03:00
  • a1278f13da minor : clean-up Georgi Gerganov 2024-05-10 15:31:05 +03:00
  • 25c6e82e7a llama : use n_vocab to differentiate between mistral 7B and llama3 8B (#7200) b2840 slaren 2024-05-10 14:28:01 +02:00
  • d9adb8832b Merge remote-tracking branch 'origin/gg/refactor-alibi-2' into HEAD Georgi Gerganov 2024-05-10 15:27:21 +03:00
  • ae3305391f llama : use n_vocab to differentiate between mistral 7B and llama3 8B slaren 2024-05-10 14:16:57 +02:00
  • 865af990cc ggml : ggml_flash_attn_ext() support ALiBi (CUDA) Georgi Gerganov 2024-05-10 14:50:28 +03:00
  • 3658dd0a7f llama-bench : add pp+tg test type slaren 2024-05-10 13:47:32 +02:00
  • cc3df3f388 Fix(server): stopped_word always true Kuriko Moe 2024-05-10 19:43:41 +08:00
  • f7055d31c5 ggml : fix warning Georgi Gerganov 2024-05-10 14:03:00 +03:00
  • 4e3880978f Fix memory bug in grammar parser (#7194) b2839 Justine Tunney 2024-05-10 07:01:08 -04:00
  • 97c27f59f6 ggml : ggml_flash_attn_ext() support ALiBi (Metal) Georgi Gerganov 2024-05-10 13:51:00 +03:00
  • 364c375e5d rm wait() arthw 2024-05-10 18:51:36 +08:00
  • f89fe2732c Main+: optionally allow special tokens from user in interactive mode (#7097) b2838 HanishKVC 2024-05-10 15:51:58 +05:30
  • 880ff7739d Fix memory bug in grammar parser Justine Tunney 2024-05-10 01:48:43 -07:00
  • 166e60bf9b ggml : ggml_flash_attn_ext() support ALiBi (CPU) Georgi Gerganov 2024-05-10 11:33:34 +03:00
  • d0592d495d ggml : update ggml_soft_max_ext() CUDA, SYCL Georgi Gerganov 2024-05-10 11:12:19 +03:00
  • 7fdca3348c ggml : full ALiBi support Georgi Gerganov 2024-05-10 10:40:33 +03:00
  • abb406b888 Merge branch 'master' into hkvc_chaton_v3 HanishKVC 2024-05-10 13:14:26 +05:30
  • 9566de9a0d Merge branch 'master' into hkvc_chat_interactivespecials HanishKVC 2024-05-10 12:35:58 +05:30
  • 0db0192200 Merge d3286d6eca into d11afd6652 Johannes Gäßler 2024-05-10 14:48:33 +08:00
  • d11afd6652 llava : fix moondream support (#7163) b2837 Andrei 2024-05-10 02:41:10 -04:00
  • d12c57b559 Update convert-hf-to-gguf.py Georgi Gerganov 2024-05-10 09:25:24 +03:00
  • 3a8387863f Merge branch 'master' into Nexesenex-IQ1_XS-IQ1_S-quant-strategies Brian 2024-05-10 15:15:23 +10:00
  • e70fba507d Merge branch 'master' into master Brian 2024-05-10 15:12:53 +10:00
  • 160d0f0a8b Merge branch 'master' into master Brian 2024-05-10 15:07:55 +10:00
  • 68e7c2579a Merge branch 'master' into smooth-pr Brian 2024-05-10 15:06:13 +10:00
  • 807c8252ce Add in the re-chunking code. Kunnis 2024-05-09 23:50:37 -05:00
  • fc7dc515f1 adding the looping structure based on the chunk configuration. Kunnis 2024-05-09 23:29:49 -05:00
  • 4762d79d3d The yield shouldn't be necessary. Kunnis 2024-05-09 23:23:13 -05:00
  • c0557fa20a Starting the buildup of the loop Kunnis 2024-05-09 23:22:12 -05:00
  • 9acaec58bd starting to setup the chunking variables Kunnis 2024-05-09 22:31:43 -05:00
  • 891d583711 Formatting to match the orig patch Kunnis 2024-05-09 22:25:20 -05:00
  • 700c782dc1 Reorg the code. Kunnis 2024-05-09 22:18:26 -05:00
  • daa87b1813 adding the current_chunk Kunnis 2024-05-09 22:12:20 -05:00
  • bb1b1d0071 Moving row_size Kunnis 2024-05-09 21:58:36 -05:00
  • 209922f5ac moving src1_cont inside Kunnis 2024-05-09 21:53:27 -05:00
  • 086e5a8225 Moving some variables around Kunnis 2024-05-09 21:49:00 -05:00
  • f63f147471 Merge branch 'master' into new_minicpm Brian 2024-05-10 11:40:43 +10:00
  • 87a98a5b6d resolve merge + SplitArguments for easier parsing Christian Zhou-Zheng 2024-05-09 21:22:55 -04:00
  • d7359a389c ggml : rewrite silu and softmax for cpu Justine Tunney 2024-05-08 17:15:46 -07:00
  • 8c570c9496 Minor arithmetic improvement to mmvq wrapper kernel (#7172) b2836 Ouadie EL FAROUKI 2024-05-10 01:32:15 +01:00
  • 867de5edce use different restrict syntax, to make g++ happy. Julia Longtin 2024-05-09 23:08:43 +00:00
  • eaf4bd8b39 eval-callback : fix conversion to float (#7184) b2835 slaren 2024-05-10 01:04:12 +02:00
  • 17c1e23bed update the readme to reflect the scripts existence Ubuntu 2024-05-09 22:48:37 +00:00
  • 861d87c864 script that shows a menu of tests to pick from & run the debugger on Ubuntu 2024-05-09 22:27:46 +00:00