Commit graph

  • feea528add Standardize top_k sorting kalomaze 2024-01-22 05:11:50 -06:00
  • 152d9d05e0 finetune : print sample-start/include-sample-start (#5072) b1941 Daniel Bevenius 2024-01-22 12:11:01 +01:00
  • 66d575c45c llama : add Q3_K_XS (#5060) b1940 Kawrakow 2024-01-22 12:43:33 +02:00
  • c0e9d27b61 Add ability to compute KL-divergence Iwan Kawrakow 2024-01-22 12:15:36 +02:00
  • 4779d994fc tiny min p return check tweak kalomaze 2024-01-22 03:58:59 -06:00
  • 6167c263c7 Softmax exp & sum in one pass + temp returns if 1 kalomaze 2024-01-22 03:02:40 -06:00
  • 57744932c6 ci : fix Windows CI by updating Intel SDE version (#5053) b1939 bobqianic 2024-01-22 08:55:05 +00:00
  • 130b0812d0 minor fixes Reinforce-II 2024-01-22 08:12:57 +00:00
  • f909301cdd finetune: print sample-start/include-sample-start Daniel Bevenius 2024-01-22 08:22:25 +01:00
  • 3466c6ebcf llama : add more qwen2 models (#5071) Shijie 2024-01-22 15:33:19 +08:00
  • ea175bfdd2 Merge 31106553db into 504dc37be8 John 2024-01-22 08:25:51 +01:00
  • e6259b8fd1 add more qwen2 models simonJJJ 2024-01-22 14:36:09 +08:00
  • fe5e95bb02 kl-divergence: be able to save all logits to a file Iwan Kawrakow 2024-01-22 06:48:56 +02:00
  • 6d37742a3a devops: add intel oneapi dockerfile ngxson 2024-01-22 01:25:58 +01:00
  • 504dc37be8 Revert LLAMA_NATIVE to OFF in flake.nix (#5066) iSma 2024-01-21 22:37:13 +01:00
  • 17720fad66 metal : parallel reduce across heads Georgi Gerganov 2024-01-21 22:44:41 +02:00
  • 77d08f3272 metal : parallelize across KV size Georgi Gerganov 2024-01-21 21:04:15 +02:00
  • 4fd9f5fd0a Revert LLAMA_NATIVE to OFF in flake.nix iSma 2024-01-21 20:31:23 +01:00
  • a4b6341c7b wip : template for rows per warp Georgi Gerganov 2024-01-21 18:24:13 +02:00
  • 05490fad7f add safetensors support to convert-lora-to-ggml.py (#5062) kuronekosaiko 2024-01-22 00:28:14 +08:00
  • f12f4cec99 Update convert-lora-to-ggml.py kuronekosaiko 2024-01-22 00:20:35 +08:00
  • b7b53a5ccc convert : use presence of tokenizer.json to determine StableLM tokenizer loader Francis Couture-Harpin 2024-01-21 11:01:31 -05:00
  • f31955f5d1 wip : 4 rows per simd group Georgi Gerganov 2024-01-21 18:01:28 +02:00
  • 8cde449b8b wip : 8 rows per simd group Georgi Gerganov 2024-01-21 12:23:22 +02:00
  • 6c5629d4d2 add #include <string> to unicode.h (#5051) bobqianic 2024-01-21 15:17:35 +00:00
  • 906afe7810 server: add comments ngxson 2024-01-21 14:58:26 +01:00
  • 12829b2e64 server: add llama_server_response_event ngxson 2024-01-21 14:45:28 +01:00
  • 7dcbe39d36 Add ability to evaluate multiple choice tasks (#5047) Kawrakow 2024-01-21 14:42:44 +02:00
  • 7fa5ca9e62 Fix gcc warnings 0cc4m 2024-01-21 13:31:15 +01:00
  • 1f55cd20a0 Simplify barrier synchronization calls 0cc4m 2024-01-21 13:27:09 +01:00
  • 00f214c335 Fix oversized host staging buffers 0cc4m 2024-01-21 12:52:52 +01:00
  • 9c9523fd0f Make MSVC happy Iwan Kawrakow 2024-01-21 12:53:59 +02:00
  • e7a8fbc6f7 Update unicode.h bobqianic 2024-01-21 10:27:57 +00:00
  • b5bf694b23 add safetensors support to convert-lora-to-ggml.py kuronekosaiko 2024-01-21 18:27:18 +08:00
  • c37859bf21 move android script to example/llava directory Chenxiaotao03 2024-01-21 15:47:18 +08:00
  • b97325800a metal : specialize for head size Georgi Gerganov 2024-01-21 12:01:55 +02:00
  • 52ae085750 metal : reduce branches Georgi Gerganov 2024-01-21 11:38:17 +02:00
  • 92540e44c2 Rename truthful_qa to multiple_choice Iwan Kawrakow 2024-01-21 11:53:38 +02:00
  • 6e6174206f Properly implement Vulkan backend buffer handling 0cc4m 2024-01-21 10:28:46 +01:00
  • 528da7515e metal : f16 precision Georgi Gerganov 2024-01-21 11:13:24 +02:00
  • 1173f49c3b metal : initial implementation Georgi Gerganov 2024-01-20 17:32:28 +02:00
  • 29c41d49fe Q3_K_XS: quantize first 1/8 of ffn_down layers with Q4_K Iwan Kawrakow 2024-01-21 09:22:52 +02:00
  • ec4b80104f Add Q3_K_XS - intermediate size between Q2_K and Q3_K_S Iwan Kawrakow 2024-01-21 08:48:58 +02:00
  • 3974340353 typo correction Uzo Nweke 2024-01-21 01:08:27 -05:00
  • 726c0fa9a2 Slightly faster imatrix (#5050) Kawrakow 2024-01-21 08:01:20 +02:00
  • b2081a518c fix path and naming for lora Uzo Nweke 2024-01-20 23:41:50 -05:00
  • 032b34ce5b run mistral 7b for cpu only too Uzo Nweke 2024-01-20 23:15:57 -05:00
  • 32ea8780a4 workflows: nix-ci: drop the redundant "paths" filter Someone Serge 2024-01-13 17:38:32 +00:00
  • 71174ee501 workflows: nix-build-aarch64: rate limit Someone Serge 2024-01-13 17:16:54 +00:00
  • 42efc1d960 workflows: nix-ci: rebuild on flake.lock updates Someone Serge 2024-01-13 17:10:19 +00:00
  • 942c0107a7 flake.lock: Update (#5054) Georgi Gerganov 2024-01-21 05:17:27 +02:00
  • 940c01eb09 ggml : limit get_rows threads to the number of rows slaren 2024-01-21 04:09:26 +01:00
  • babb76a33a added debug check for printf statements l3utterfly 2024-01-21 09:25:27 +09:00
  • 84a185bf8d flake.lock: Update github-actions[bot] 2024-01-21 00:18:23 +00:00
  • 586d010041 Update build.yml bobqianic 2024-01-20 23:46:42 +00:00
  • b43ebde3b0 convert : partially revert PR #4818 (#5041) Jared Van Bortel 2024-01-20 18:14:18 -05:00
  • a11f1497ef llama : support StableLM 2 1.6B Francis Couture-Harpin 2024-01-20 17:47:11 -05:00
  • ac83ce3960 Update unicode.h bobqianic 2024-01-20 21:58:08 +00:00
  • 2b137c54bd add mixtral 7b v0.1 q8 lora in gguf format, so we can test perplexity Uzo Nweke 2024-01-20 16:15:47 -05:00
  • 3e4871a3e1 workflows : use -L main for all ctest crasm 2024-01-20 04:01:03 -05:00
  • 16e12ab734 also duplicate gpu compute buffers to avoid races slaren 2024-01-20 18:52:33 +01:00
  • a97198747f ggml : multi-threaded get_rows slaren 2024-01-20 18:36:50 +01:00
  • d86f80f416 TruthfulQA: prepare tasks in parallel for large test datasets Iwan Kawrakow 2024-01-20 11:44:55 +02:00
  • 21d0ce5e05 TruthfulQA: fix random sample Iwan Kawrakow 2024-01-20 11:09:18 +02:00
  • b0a4873697 TruthfulQA: works but the result is bad Iwan Kawrakow 2024-01-20 10:26:15 +02:00
  • 6ce06623fd TruthfulQA: 1st attempt, does not look like it is working Iwan Kawrakow 2024-01-19 19:58:23 +02:00
  • 3aa56562c0 imatrix: add --no-ppl option to skip PPL calculations altogether Iwan Kawrakow 2024-01-20 19:04:09 +02:00
  • cdeac23ef5 imatrix: speedup by avoiding unnecessary allocations and copies Iwan Kawrakow 2024-01-20 18:49:02 +02:00
  • bc98eda9d5 add n_ubatch (-ub) parameter slaren 2024-01-20 16:49:24 +01:00
  • 09688c771b Merge remote-tracking branch 'origin/master' into sl/micro-batching slaren 2024-01-20 16:26:28 +01:00
  • e5de370cdf minor slaren 2024-01-15 19:24:55 +01:00
  • 97c1549808 perplexity : fix MSVC build after #5020 (#5043) Jared Van Bortel 2024-01-20 10:08:08 -05:00
  • 6df465a91d llama : run all KQV ops on the CPU with no KV offload (#5049) slaren 2024-01-20 16:05:49 +01:00
  • a9681febd6 ggml : online attention (CPU) gg/flash-attn-online Georgi Gerganov 2024-01-20 12:26:49 +02:00
  • 2c0ed7a638 multithreaded dequantize in mul_mat when using blas library Reinforce-II 2024-01-20 14:01:24 +00:00
  • 20fefdfe2b allow the GGML_TASK_INIT phase to run multithreaded Reinforce-II 2024-01-20 14:01:05 +00:00
  • 16b7e83ce2 llama : run all KQV ops on the CPU with no KV offload slaren 2024-01-19 12:16:54 +01:00
  • 4549a6abb1 Fix README.md output for ctest_with_model crasm 2024-01-20 03:14:23 -05:00
  • c3cdfffa88 Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-01-20 10:12:07 +02:00
  • 77bc1bbd05 cmake : add support for ccache (#5002) Herman Semenov 2024-01-20 08:11:31 +00:00
  • 2dc7c81b61 cmake : option to disable ccache Georgi Gerganov 2024-01-20 10:10:42 +02:00
  • 48e2b13372 Add a dart/flutter binding to README.md (#4882) adel boussaken 2024-01-20 09:05:43 +01:00
  • 228f50224d Add get_model.cpp to tests/CMakeLists.txt crasm 2024-01-20 03:01:38 -05:00
  • d8b8ec6852 got stuck on CMake crasm 2024-01-20 02:49:49 -05:00
  • 6c459d6e66 Merge remote-tracking branch 'origin/master' into revamp-ci-run-sh crasm 2024-01-20 02:11:55 -05:00
  • aa4ebf6879 Fix gg_get_model function crasm 2024-01-20 02:04:58 -05:00
  • cca894f16a cuda : fix compile error in jetson platform (#4975) Kylin 2024-01-20 15:01:46 +08:00
  • 847019e39a cuda: update ggml-cuda.cu comment Kylin 2024-01-20 14:41:40 +08:00
  • beb1c932b9 cuda: update comment in ggml-cuda.cu Kylin 2024-01-20 14:10:17 +08:00
  • 0e41cc3323 ci : add ctest_with_model for debug and release crasm 2024-01-19 23:47:08 -05:00
  • d6cf33d6c5 Update test-model-load-cancel crasm 2024-01-19 22:35:05 -05:00
  • 86c3eab119 Attempt at writing ctest_with_model crasm 2023-12-18 04:45:39 -05:00
  • e7413ce3e0 ci : ctest uses -L main crasm 2023-12-18 04:23:58 -05:00
  • 0fddbcfe18 Label all ctest tests crasm 2023-12-18 04:23:20 -05:00
  • fecdccaf8e Simplify .gitignore for tests, clang-tidy fixes crasm 2023-12-17 22:33:38 -05:00
  • 63108cfb03 scripts : add wrapper script for local use of ci/run.sh crasm 2024-01-19 20:23:58 -05:00
  • fded2e6a11 apply suggestions FSSRepo 2024-01-19 20:18:18 -05:00
  • da41206bd6 adding test for mistral 7b-v0.1, need lora Uzo Nweke 2024-01-19 19:49:39 -05:00
  • 8f12dfaef3 breaking ground Uzo Nweke 2024-01-19 19:29:35 -05:00
  • 6e29f4c725 server: add llama_server_queue struct ngxson 2024-01-20 00:25:20 +01:00