Commit graph

  • efe287f026 replaced all API facing int's with int32_t marcus 2023-12-21 13:39:05 -08:00
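
    A note on this int32_t change: exact-width types pin down the C API's ABI
    where the width of a plain int could vary by platform or compiler. A hedged
    sketch, not the actual diff (llama_n_batch is borrowed from the #4540 entry
    below purely for illustration):

      // Hedged sketch, not the actual diff: exact-width types from <stdint.h>
      // make the API's ABI unambiguous across platforms and compilers.
      #include <stdint.h>
      struct llama_context;                                      // opaque handle
      int32_t llama_n_batch(const struct llama_context * ctx);   // was: int llama_n_batch(...)
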
  • afefa319f1 ggml : change ggml_scale to take a float instead of tensor (#4573) b1680 Georgi Gerganov 2023-12-21 23:20:49 +02:00
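
    The ggml_scale change (#4573) drops the one-element scale tensor in favor
    of a plain float argument. A minimal sketch of the new call shape, assuming
    ctx, a, and n_embd_head exist in the surrounding graph-building code:

      // before: the scale had to be wrapped in a 1-element ggml tensor
      //   cur = ggml_scale(ctx, a, ggml_new_f32(ctx, 1.0f/sqrtf((float) n_embd_head)));
      // after: the scale is passed directly as a float
      struct ggml_tensor * cur = ggml_scale(ctx, a, 1.0f/sqrtf((float) n_embd_head));
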
  • 769a7bc85e gguf-py : fix broken link Georgi Gerganov 2023-12-21 23:20:36 +02:00
  • 32259b2dad gguf : simplify example dependencies b1678 Georgi Gerganov 2023-12-21 23:07:58 +02:00
  • 4a5f9d629e ci : add jlumbroso/free-disk-space to docker workflow (#4150) b1677 Samuel Maynard 2023-12-21 22:36:26 +02:00
  • ab1b75166f Merge branch 'master' into gg/ggml_scale gg/ggml_scale Georgi Gerganov 2023-12-21 22:35:11 +02:00
  • d232aca5a7 llama : initial ggml-backend integration (#4520) b1676 slaren 2023-12-21 21:07:46 +01:00
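
    d232aca5a7 starts moving llama.cpp's memory management onto the
    ggml-backend interface. A rough sketch of that allocation flow, using the
    ggml_backend_alloc_ctx_tensors_from_buft helper named further down; the
    exact sequence is an assumption, not this commit's diff:

      // Rough sketch (assumptions, not the diff): pick a backend, then place
      // every tensor of a ggml context into one buffer of that backend's type.
      ggml_backend_t backend = ggml_backend_cpu_init();
      ggml_backend_buffer_type_t buft = ggml_backend_get_default_buffer_type(backend);
      ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors_from_buft(ctx, buft);
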
  • 31f27758fa llama : allow getting n_batch from llama_context in c api (#4540) b1675 Marcus Dunn 2023-12-21 11:57:48 -08:00
  • 74d109bebc Update llama.h Georgi Gerganov 2023-12-21 21:57:32 +02:00
  • 56fa50819f metal : fix ggml_metal_log vargs (#4373) Finn Voorhees 2023-12-21 14:55:02 -05:00
  • 0f630fbc92 cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449) b1673 Erik Garrison 2023-12-21 13:45:32 -06:00
  • b784f881c3 tests : fix test-grad0 Georgi Gerganov 2023-12-21 21:33:57 +02:00
  • 562cf222b5 ggml-cuda: Fix HIP build by adding define for __trap (#4569) b1672 arlo-phoenix 2023-12-21 20:13:25 +01:00
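
    The HIP build fix (#4569) works around __trap() being a CUDA intrinsic
    that ROCm does not provide. A plausible shape for such a shim; the exact
    define in the commit may differ:

      // Plausible shim (assumption): HIP device code has abort() but no
      // __trap(), so map one onto the other when building for ROCm.
      #if defined(GGML_USE_HIPBLAS)
      #define __trap() (abort())
      #endif
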
  • f4d884f47e metal : add default log function that prints to stderr, cleanup code slaren 2023-12-21 20:03:40 +01:00
  • 36c3f41f66 ggml : fix CPU implementation Georgi Gerganov 2023-12-21 21:02:23 +02:00
  • 199f6bdc46 ggml : change ggml_scale to take a float instead of tensor Georgi Gerganov 2023-12-21 20:50:24 +02:00
  • 323881ef4b remove unnecessary unmap slaren 2023-12-21 19:37:23 +01:00
  • 16582cdf4e Merge remote-tracking branch 'origin/master' into sl/ggml-backend-int slaren 2023-12-21 19:34:07 +01:00
  • cd4167b634 llama_mmap : avoid unmapping the same fragments again in the destructor slaren 2023-12-21 19:24:54 +01:00
  • a907194060 ggml-cuda: Fix HIP build by adding define for __trap arlo-phoenix 2023-12-21 19:04:27 +01:00
  • 8fe03ffdda common : remove incorrect --model-draft default (#4568) b1671 Jared Van Bortel 2023-12-21 12:55:34 -05:00
  • 9154494808 CUDA: mul_mat_id always on GPU for batches >= 32 (#4553) b1670 Johannes Gäßler 2023-12-21 18:42:59 +01:00
  • c083718c89 readme : update coding guidelines Georgi Gerganov 2023-12-21 19:27:14 +02:00
  • 7c87353e61 common : remove incorrect --model-draft default ceb/fix-draft-model-default Jared Van Bortel 2023-12-21 12:17:12 -05:00
  • 880e352277 py : open merges file as 'utf-8' (#4566) howlger 2023-12-21 18:07:34 +01:00
  • 66f35a2f48 cuda : better error message for ggml_get_rows (#4561) b1667 bobqianic 2023-12-21 17:06:44 +00:00
  • 1398823922 cuda : replace asserts in wrong architecture checks with __trap (#4556) b1666 slaren 2023-12-21 18:02:30 +01:00
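
    Together with the "make bad_arch noreturn" entry further down, #4556 turns
    wrong-architecture paths from device asserts into hard kernel aborts. A
    hedged sketch, with the function name taken from the commit messages rather
    than the diff itself:

      // Hedged sketch: unsupported-architecture paths hard-stop the kernel
      // with __trap(); noreturn lets callers omit dummy return statements.
      #include <cstdio>
      static __device__ __attribute__((noreturn)) void bad_arch(void) {
          printf("ERROR: ggml-cuda was compiled without support for this GPU architecture\n");
          __trap();
      }
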
  • 4aff73fa65 Fixed default MSVS build Alexander Krivutsenko 2023-12-21 18:01:55 +01:00
  • fcd0c2caa6 CUDA: mul_mat_id always on GPU for batches >= 32 JohannesGaessler 2023-12-20 22:38:43 +01:00
  • 6fb45b7e65 In vocab.py, make sure that merges_file is opened as 'utf-8' howlger 2023-12-21 17:41:20 +01:00
  • 61d435674f Update ggml-cuda.cu Georgi Gerganov 2023-12-21 18:40:09 +02:00
  • d3223afdad llama : disable per-tensor info prints on model load (#4562) b1665 Johannes Gäßler 2023-12-21 17:34:17 +01:00
  • 46bcbf3805 fixed formatting ct-clmsn 2023-12-21 11:17:13 -05:00
  • ecf9c7983c did some formatting ct-clmsn 2023-12-21 11:10:28 -05:00
  • 6d08baccea added explicit casting; fixed small memcpy issue ct-clmsn 2023-12-21 11:05:19 -05:00
  • fa49c150d0 fixed small segmentation bug; switched to using type-sensitive openshmem calls ct-clmsn 2023-12-21 10:52:59 -05:00
  • 0de3b02353 updated README.md, fixed small documentation issues; modified a variable name ct-clmsn 2023-12-21 10:27:46 -05:00
  • 2378a29bde better error handling, try to avoid segfault in sillytavern Concedo 2023-12-21 22:58:48 +08:00
  • 1008498463 Update ggml-cuda.cu bobqianic 2023-12-21 13:58:16 +00:00
  • 4852b47c84 Update Makefile FantasyGmm 2023-12-21 21:55:35 +08:00
  • 28681d7235 Disabled per-tensor info prints on model load JohannesGaessler 2023-12-21 14:44:45 +01:00
  • 6a72c7f2e3 Merge remote-tracking branch 'origin/master' into sl/ggml-backend-int slaren 2023-12-21 14:18:22 +01:00
  • a74b1a89b3 do not offload scales slaren 2023-12-21 14:18:21 +01:00
  • a4e191f3df cuda : fix fprintf format string (minor) Georgi Gerganov 2023-12-21 14:30:50 +02:00
  • c05d195583 Merge branch 'concedo' into concedo_experimental Concedo 2023-12-21 20:08:54 +08:00
  • ff4c2b18d7 testing workflow for windows cuda builds Concedo 2023-12-21 19:36:52 +08:00
  • 96c12cf395 Merge branch 'master' into concedo_experimental Concedo 2023-12-21 20:03:21 +08:00
  • a82f85bb2b Update ggml-cuda.cu bobqianic 2023-12-21 11:45:46 +00:00
  • e1f013bbf8 testing workflow for windows cuda builds Concedo 2023-12-21 19:36:52 +08:00
  • 1d7a1912ce Fix access violation in ggml_cuda_free_data if tensor->extra is NULL (#4554) b1664 LoganDark 2023-12-21 01:59:27 -08:00
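
    The access-violation fix (#4554) amounts to tolerating tensors that never
    received device-side metadata before freeing them. A minimal sketch of the
    guard; the real function also frees the extra itself:

      // Minimal sketch of the guard (assumption: the real function also
      // frees/zeroes the ggml_tensor_extra_gpu held in tensor->extra).
      void ggml_cuda_free_data(struct ggml_tensor * tensor) {
          if (tensor == NULL || tensor->extra == NULL) {
              return; // nothing was ever allocated on the device for this tensor
          }
          // ... free device buffers referenced by tensor->extra ...
      }
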
  • bdfe4ba85c Add nocleanup special arg crasm 2023-12-21 04:55:28 -05:00
  • a787ebe7cf Handle broken pipe error (#572) Eugene Palmoff 2023-12-21 14:51:36 +05:00
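
    "Handle broken pipe error" is a classic POSIX fix: writing to a pipe whose
    reader has exited raises SIGPIPE, which kills the process by default. A
    common shape for such a fix, offered as an assumption since #572's actual
    diff is not shown here:

      // Common POSIX pattern (assumption, not necessarily #572's diff):
      // ignore SIGPIPE so write() reports EPIPE instead of killing the process.
      #include <signal.h>
      int main(void) {
          signal(SIGPIPE, SIG_IGN);
          // ... program continues; check write()/fprintf() results for EPIPE ...
          return 0;
      }
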
  • e86b8cd93a Remove shellcheck installation step from workflow crasm 2023-12-21 04:28:58 -05:00
  • c9a6de8f8a Add check-requirements.sh script and GitHub workflow crasm 2023-12-21 04:16:41 -05:00
  • 8d1b87851d fix old jetson compile error FantasyGmm 2023-12-21 14:40:48 +08:00
  • ea1331a221 added comment ct-clmsn 2023-12-21 00:02:44 -05:00
  • 6aad7af26d improved README.md ct-clmsn 2023-12-20 23:56:45 -05:00
  • 79be614ea5 updated README.md and Makefile ct-clmsn 2023-12-20 23:52:34 -05:00
  • 4a8fd5567f add tinyllama chat prompt Yazan Agha-Schrader 2023-12-21 05:23:03 +01:00
  • 9604114da0 added baseline makefile support; fixed several compilation warnings ct-clmsn 2023-12-20 23:19:51 -05:00
  • fcfe07f829 initial import ct-clmsn 2023-12-20 22:44:08 -05:00
  • af9cd93413 Ignore local content teleprint-me 2023-12-20 21:44:41 -05:00
  • 35a1e44e44 use LOG_TEE for stderr in ggml-cuda.cu Yann Follet 2023-12-21 02:37:20 +00:00
  • 8710171486 remove empty line Yann Follet 2023-12-21 01:20:43 +00:00
  • 20171125a8 clean up & split PRs Yazan Agha-Schrader 2023-12-21 01:43:08 +01:00
  • 7775e38d58 make bad_arch noreturn, remove returns slaren 2023-12-21 01:19:38 +01:00
  • 7d9323ed0f cuda : replace asserts in wrong architecture checks with __trap slaren 2023-12-21 00:42:50 +01:00
  • 065d56c569 Fix access violation in ggml_cuda_free_data if tensor->extra is NULL LoganDark 2023-12-20 14:32:38 -08:00
  • 8ed2a8eb62 move final progress_callback call to load_all_data slaren 2023-12-20 23:23:21 +01:00
  • 98366a4047 sync gitignore Yazan Agha-Schrader 2023-12-20 23:17:54 +01:00
  • ecb23d4ac5 restore progress_callback behavior slaren 2023-12-20 23:15:15 +01:00
  • a6e9700a83 Fix syntax error Laura 2023-12-20 22:54:38 +01:00
  • 5834a25345 llama_mmap::align_offset : use pointers instead of references for out parameters slaren 2023-12-20 22:07:09 +01:00
  • 6c045a86ed ggml_backend_alloc_ctx_tensors_from_buft : remove old print slaren 2023-12-20 22:05:32 +01:00
  • f70f94dfb8 use posix_fadvise instead of posix_fadvise64 slaren 2023-12-20 21:53:52 +01:00
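
    This entry and the posix_fadvise64 one further down describe the same
    optimization: hint the kernel that the model file will be read sequentially
    before mmap-ing it, so read-ahead kicks in. posix_fadvise is the portable
    spelling, which is why the 64-suffixed variant was dropped. A small sketch
    of the pattern, with error handling elided:

      // Sketch of the pattern (error handling elided): advise sequential
      // access on the whole file (offset 0, len 0 = to EOF), then map it.
      #include <fcntl.h>
      #include <sys/mman.h>
      static void * map_model(const char * path, size_t size) {
          int fd = open(path, O_RDONLY);
          posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
          return mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
      }
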
  • 6becb1f943 Remove deprecated conversion script teleprint-me 2023-12-20 15:44:16 -05:00
  • e290792ae4 Consolidate Handling of Phi Models in llama.cpp teleprint-me 2023-12-20 15:34:56 -05:00
  • ea6ae8d04c Consolidate Phi model conversion handling in convert-hf-to-gguf.py teleprint-me 2023-12-20 15:26:25 -05:00
  • e96f40bf99 Update tensor mappings for Phi models (Phi-1, Phi-1.5, Phi-2) teleprint-me 2023-12-20 15:18:52 -05:00
  • e53d44c0bb Consolidate PHI and PHI2 architectures in gguf constants teleprint-me 2023-12-20 15:15:35 -05:00
  • 1d4bcd2044 Merge branch 'master' into phi-1 teleprint-me 2023-12-20 12:28:48 -05:00
  • 24cc321931 update session copy/set to use ggml-backend slaren 2023-12-20 18:12:29 +01:00
  • bcd87ca925 update quantize and lora slaren 2023-12-20 17:15:43 +01:00
  • 967a0146fa Update ggml-cuda.cu Johannes Gäßler 2023-12-20 14:53:09 +01:00
  • 842adecffc CUDA: make MoE tensors contiguous for batch size>1 JohannesGaessler 2023-12-19 20:25:52 +01:00
  • f618e95eba remove test gpt2 file manikbhandari 2023-12-20 07:55:36 -05:00
  • 92b6fc3db2 update formatting manikbhandari 2023-12-20 07:52:17 -05:00
  • 5241045819 use posix_fadvise64(.., POSIX_FADV_SEQUENTIAL) to improve performance with mmap slaren 2023-12-20 12:57:45 +01:00
  • 48e0767606 adds support for other models within gpt2 manikbhandari 2023-12-20 06:52:02 -05:00
  • 5d5b6088b5 resolve merge conflict manikbhandari 2023-12-20 06:28:20 -05:00
  • 799fc22689 CUDA: Faster Mixtral prompt processing (#4538) b1663 Johannes Gäßler 2023-12-20 15:41:22 +01:00
  • b853df4207 Add convert-persimmon-to-gguf.py to new requirements.txt scheme crasm 2023-12-20 03:32:22 -05:00
  • ba46057b11 Merge remote-tracking branch 'upstream/master' into cancel-model-load crasm 2023-12-20 00:15:09 -05:00
  • ca122dc9e0 Add comment crasm 2023-12-20 00:14:56 -05:00
  • a0eab1ea19 Make per-python-script requirements work alone crasm 2023-12-20 00:10:31 -05:00
  • 267cfa408b Merge commit 'c50e400163' into cancel-model-load crasm 2023-12-20 00:04:20 -05:00
  • 293d16fd40 Restructure requirements.txt crasm 2023-12-20 00:00:08 -05:00
  • 741b7fb59b Merge branch 'github' of https://gitlab.vinai.io/mlbooster/llama.cpp into feature/awq_pr Trần Đức Nam 2023-12-20 11:21:16 +07:00
  • 71c0a27fdf Fixed params count Le Hoang Anh 2023-12-20 11:10:49 +07:00
  • c02f6df7c4 Formatted other files Le Hoang Anh 2023-12-20 11:04:03 +07:00