Commit graph

  • 3b5515bbe0
    reverse order of for loop in ggml_build_backward_expand to save memory when using gradient checkpointing and allocator xaedes 2023-08-14 22:09:36 +02:00
  • cedb4870c6
    gguf.py : add token types klosax 2023-08-14 22:08:40 +02:00
  • 5d518d421f
    constants.py : add token types klosax 2023-08-14 22:07:53 +02:00
  • 7ec125b1dc
    convert-llama-h5-to-gguf.py : add token types klosax 2023-08-14 22:06:33 +02:00
  • 56228461c8
    fix memory "leak" in optimizers xaedes 2023-08-14 21:12:02 +02:00
  • 6c63550f63
    llama : update tokenizer style Georgi Gerganov 2023-08-14 22:10:19 +03:00
  • 3e6468b097
    fix test for when to create temporary backward graph xaedes 2023-08-14 20:56:03 +02:00
  • 098654c277
    only use ggml_allocr_alloc when tensor has NULL data and is not a view xaedes 2023-08-14 20:56:56 +02:00
  • faf3e21eaf
    add debug asserts in ggml_allocr_alloc to catch some common pitfalls when using this function directly xaedes 2023-08-14 20:50:09 +02:00
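    (see the allocator sketch at the end of this list)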
  • 7494c78428
    llama : sync gguf-llama with llama (#2613) Georgi Gerganov 2023-08-14 21:33:33 +03:00
  • e4b8f94d6b
    convert : update HF converter to new tokenizer voodoo magics Georgi Gerganov 2023-08-14 21:31:02 +03:00
  • 95d7593e4a
    llama : sync gguf-llama.cpp Georgi Gerganov 2023-08-14 21:18:19 +03:00
  • c35fc0bbb0
    convert : fix layer names Georgi Gerganov 2023-08-14 21:06:07 +03:00
  • 01080a5a51
    tests : fix wstring_convert Georgi Gerganov 2023-08-14 20:50:15 +03:00
  • aa0551a504
    tests : fix build + warnings (test-tokenizer-1 still fails) Georgi Gerganov 2023-08-14 20:14:55 +03:00
  • 58fdf3a07a
    llama : sync gguf-llama with llama Georgi Gerganov 2023-08-14 19:58:05 +03:00
  • afc4ca2889
    convert : update convert-new.py with tokenizer fixes (#2614) goerch 2023-08-14 19:20:04 +02:00
  • c9c3b87a9e
    Merge branch 'gguf' of https://github.com/goerch/llama.cpp into gguf goerch 2023-08-14 19:11:44 +02:00
  • cfb0e6ff16
    Adapt convert-new.py (and fix a clang-cl compiler error on Windows) goerch 2023-08-14 19:08:44 +02:00
  • 6e280b24dc
    remove unused forward_batch function xaedes 2023-08-14 19:02:12 +02:00
  • 3794dceb7f
    remove unused train params: mem_compute1_gb & mem_compute2_gb xaedes 2023-08-14 18:44:42 +02:00
  • 6f161c784b
    remove trailing whitespace xaedes 2023-08-14 18:33:27 +02:00
  • 271e4d64b5
    remove unused training parameters "use_scratch" and "use_unified" xaedes 2023-08-14 18:31:59 +02:00
  • c954f41ca4
    remove handwritten training functions xaedes 2023-08-14 18:27:01 +02:00
  • ec1b100720
    llama : tokenizer fixes (#2549) goerch 2023-08-14 18:30:28 +02:00
  • fe788a1c7a
    allocate graph on context using ggml_new_graph xaedes 2023-08-14 18:24:13 +02:00
  • 75baed230c
    set names for tensors in unified train function for easier debugging xaedes 2023-08-14 18:17:14 +02:00
  • 3e99a8d653
    format name of cloned tensors with " (clone)" suffix xaedes 2023-08-14 18:15:09 +02:00
  • 865c4cd3c1
    integrate unified training function which may use memory allocator xaedes 2023-08-14 18:12:58 +02:00
  • 4ed096c6b0
    add training options for whether to use allocator and/or unified training function xaedes 2023-08-14 18:10:02 +02:00
  • d6c5b03858
    fix ASSERT to work with zero layers xaedes 2023-08-14 18:08:19 +02:00
  • 38f4438c32
    make sure some tensors are not reallocated by inserting new temporary nodes depending on them xaedes 2023-08-14 18:07:16 +02:00
  • 9716eb8ef0
    fix variable name and add missing boolean negation xaedes 2023-08-14 17:59:19 +02:00
  • 5884b43a62
    add input tensors as checkpoints xaedes 2023-08-14 17:58:49 +02:00
  • b2f1310196
    swap arguments to commutative ops to be the same as in forward_batch_wo_cache_flash_attn xaedes 2023-08-14 17:57:13 +02:00
  • 5a11b75875
    fix variable names xaedes 2023-08-14 17:55:51 +02:00
  • 345f516f7c
    correctly clone view tensors by setting data pointers xaedes 2023-08-14 17:55:13 +02:00
  • 52c92c0a8c
    terminate recursive tensor cloning when reaching tensor without src tensors xaedes 2023-08-14 17:53:36 +02:00
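    (see the cloning sketch at the end of this list)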
  • 0dd496c5e2
    fix variable name and add missing type cast xaedes 2023-08-14 17:52:48 +02:00
  • cfddc36be2
    correctly clone reshape and permute operations by also cloning tensor->nb values xaedes 2023-08-14 17:52:15 +02:00
  • d43741540b
    don't allocate hash_map on context xaedes 2023-08-14 17:51:20 +02:00
  • fc826c8ea8
    in train function replace add_inplace by regular add xaedes 2023-08-14 17:49:22 +02:00
  • 7108448841
    Merge branch 'gguf' of https://github.com/goerch/llama.cpp into gguf goerch 2023-08-14 16:36:45 +02:00
  • fb591e1f04
    Merge branch 'gguf' of https://github.com/ggerganov/llama.cpp into gguf goerch 2023-08-14 16:35:57 +02:00
  • 712b614ad4
    Merge branch 'gguf' into gguf goerch 2023-08-14 16:22:50 +02:00
  • 8af3a99ff1
    Merge branch 'master' into gguf Georgi Gerganov 2023-08-14 16:39:18 +03:00
  • 6f14854880
    gitignore : add gptneox-main Georgi Gerganov 2023-08-14 16:39:02 +03:00
  • d783f7982e
    metal : return null instead of exit(1) (#2573) Jhen-Jie Hong 2023-08-14 21:37:39 +08:00
  • d75561df20
    server : add --numa support (#2524) Cheng Shao 2023-08-14 15:36:42 +02:00
  • 348acf188c
    llama : add missing enum keyword in function signatures (#2610) Kamil Tomšík 2023-08-14 15:35:16 +02:00
  • f00780b2ee
    llama : sync gguf-llama.cpp with latest llama.cpp (#2608) Georgi Gerganov 2023-08-14 16:28:44 +03:00
  • 18d00611e2
    llama : minor Georgi Gerganov 2023-08-14 16:26:40 +03:00
  • f85395252f
    llama : refactor gguf_buffer and gguf_ctx_buffer Georgi Gerganov 2023-08-14 14:44:55 +03:00
  • 6f64b6c0f8
    Create convert-llama-7b-pth-to-gguf.py klosax 2023-08-14 13:51:09 +02:00
  • bbfd39382e
    Zig @cImport("llama.h") requires enum keyword in function signatures Kamil Tomšík 2023-08-14 13:20:31 +02:00
  • 797088a7cd
    minor : indentation + assert Georgi Gerganov 2023-08-14 14:10:21 +03:00
  • f4a0e0ec5a
    llama : sync gguf-llama.cpp with latest llama.cpp Georgi Gerganov 2023-08-14 13:44:37 +03:00
  • 62490f1380
    gguf : use UNIX line ending Georgi Gerganov 2023-08-14 13:04:35 +03:00
  • 0c19ae70d5
    simple : minor style changes Georgi Gerganov 2023-08-14 12:56:48 +03:00
  • 5c5a95ba2d
    gguf.py : don't add empty strings klosax 2023-08-14 11:22:06 +02:00
  • a7d226f871
    convert-llama-h5-to-gguf.py : fixes klosax 2023-08-14 11:14:24 +02:00
  • c2c1690568
    Merge branch 'master' into server-probs jhen 2023-08-14 17:13:24 +08:00
  • e9be24f9ad
    Fix fp32 fallback if device doesn't support fp16, add force disable env var GGML_VULKAN_DISABLE_F16 0cc4m 2023-08-14 11:07:55 +02:00
  • d753dfbcc8
    gptneox-main.cpp : tensor name map changes klosax 2023-08-14 10:59:18 +02:00
  • 806a15749d
    Delete gguf_tensor_map.py klosax 2023-08-14 10:57:19 +02:00
  • 51939d7d1b
    Create gguf_namemap.py : tensor name map changes klosax 2023-08-14 10:56:59 +02:00
  • 5d22a9db13
    convert-gptneox-h5-to-gguf.py : tensor name map changes klosax 2023-08-14 10:55:44 +02:00
  • 236194fc3d
    add more comments Evan Jones 2023-08-14 04:41:50 -04:00
  • 1cd06fa25e
    CUDA: launch_bounds, small q4_K, q5_K mmq refactor (#2596) Johannes Gäßler 2023-08-14 10:41:22 +02:00
  • 2feb8934eb
    server : fix default grammar by using an empty string in the UI (#2604) Jhen-Jie Hong 2023-08-14 16:20:17 +08:00
  • 7f828e6b10
    Add llama_context_default_params_by_ref so it can be retrieved from Java via JNA ostix360 2023-08-14 10:07:49 +02:00
  • e950518776
    CUDA: launch_bounds, small q4_K, q5_K mmq refactor JohannesGaessler 2023-08-12 17:13:20 +02:00
  • 01d22a4a10
    Merge upstream changes, fix conflict 0cc4m 2023-08-14 09:47:43 +02:00
  • 592ebb044d
    Transfer remaining shaders to header and compile at runtime 0cc4m 2023-08-14 09:39:58 +02:00
  • d2dd8eb1d1
    server : default grammar to empty string in the UI jhen 2023-08-14 15:28:17 +08:00
  • 70e2f7ca56
    Merge 'origin/master' into hipblas Henri Vasserman 2023-08-14 10:27:18 +03:00
  • dbdb2c1353
    Merge branch 'master' of github.com:ggerganov/llama.cpp Laura 2023-08-14 09:21:10 +02:00
  • 5517d6e692
    server : implement json-schema-to-grammar.mjs & add grammar param in the UI (#2588) Jhen-Jie Hong 2023-08-14 15:16:54 +08:00
  • 56a1f32072
    Merge branch 'master' into gguf Georgi Gerganov 2023-08-14 10:14:05 +03:00
  • 196b50fee7
    gguf : add todos and comments M. Yusuf Sarıgöz 2023-08-14 08:50:47 +03:00
  • f31b539714
    Enhance Windows 7 and below compatibility. (#2592) vxiiduu 2023-08-14 13:59:16 +10:00
  • 4ae761144d
    server : optimize regex & iteration jhen 2023-08-14 09:20:06 +08:00
  • c57302162a
    server : fix sort of prop pairs jhen 2023-08-14 07:40:41 +08:00
  • f41c6254a8
    server : generate .hpp jhen 2023-08-14 06:24:12 +08:00
  • ee77efea2a
    test : add simple grammar parsing tests (#2594) drbh 2023-08-13 10:00:48 -04:00
  • 24f48833ab
    fix conflicts M. Yusuf Sarıgöz 2023-08-13 16:55:42 +03:00
  • dc29f21481
    adds cassert header drbh 2023-08-13 09:25:53 -04:00
  • 6beebf3fd9
    gptneox-main.cpp : add file_type key klosax 2023-08-13 14:11:01 +02:00
  • 2827b840e4
    convert-gptneox-h5-to-gguf.py : add file_type key klosax 2023-08-13 13:54:10 +02:00
  • bf2dad3100
    convert : rm quantization version M. Yusuf Sarıgöz 2023-08-13 14:38:53 +03:00
  • 1d60468eee
    fix conflicts M. Yusuf Sarıgöz 2023-08-13 13:35:40 +03:00
  • 91d4bfd536
    convert : write more metadata for LLaMA M. Yusuf Sarıgöz 2023-08-13 13:29:46 +03:00
  • 17800cd80f
    convert-llama-h5-to-gguf.py : load model in parts to save memory klosax 2023-08-13 12:20:02 +02:00
  • e3d1f07eb1
    convert-gptneox-h5-to-gguf.py : load model in parts to save memory klosax 2023-08-13 12:18:34 +02:00
  • 6d094cd89c
    server : remove trailing whitespaces jhen 2023-08-13 17:47:54 +08:00
  • a47ca7ae7a
    Add runtime shader compilation, start transferring shaders to this approach 0cc4m 2023-08-13 11:01:27 +02:00
  • 9bf5a7efcb
    Update gguf_tensor_map.py klosax 2023-08-13 01:27:38 +02:00
  • bffd3cde10
    server : remove array check of completion_probabilities in messages jhen 2023-08-13 07:07:40 +08:00
  • f64d44a9b9
    CUDA: Fixed OpenLLaMA 3b mmq, reduced compile time (#2590) Johannes Gäßler 2023-08-13 00:24:45 +02:00
  • c7bd8c147c
    gptneox-main.cpp : n_layer --> n_block klosax 2023-08-13 00:03:32 +02:00
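
A note on the allocator commits above (098654c277, faf3e21eaf): both revolve around one rule, that ggml_allocr_alloc may only be called on a tensor that owns no buffer yet and is not a view of another tensor. Below is a minimal sketch of that guard, not the repository's exact code; it assumes the ggml-alloc API of this period, and the helper names (is_view, alloc_if_needed) and the exact set of view ops checked are assumptions for illustration.

    #include "ggml.h"
    #include "ggml-alloc.h"

    // Sketch only: which ops count as views here is an assumption.
    static bool is_view(const struct ggml_tensor * t) {
        return t->op == GGML_OP_VIEW    || t->op == GGML_OP_RESHAPE ||
               t->op == GGML_OP_PERMUTE || t->op == GGML_OP_TRANSPOSE;
    }

    static void alloc_if_needed(struct ggml_allocr * alloc, struct ggml_tensor * t) {
        GGML_ASSERT(t != NULL);          // a debug assert in the spirit of faf3e21eaf
        if (t->data == NULL && !is_view(t)) {
            ggml_allocr_alloc(alloc, t); // only when the tensor owns no data and is no view
        }
    }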
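
Similarly, the cloning commits (52c92c0a8c, cfddc36be2, 345f516f7c) describe one recursive routine: copy the nb[] strides so cloned reshape/permute results stay valid, set data pointers when cloning view and leaf tensors, and stop recursing once a tensor has no src operands. A simplified sketch under those assumptions, reusing the is_view helper from the previous sketch; the branch itself memoizes clones in a hash map (d43741540b), which is omitted here, and clone_tensor is an illustrative name, not the repository's.

    #include "ggml.h"

    // Simplified recursive clone; no memoization, so shared subgraphs get duplicated.
    static struct ggml_tensor * clone_tensor(struct ggml_context * ctx, struct ggml_tensor * t) {
        struct ggml_tensor * c = ggml_dup_tensor(ctx, t); // new tensor, same type and shape
        c->op = t->op;
        for (int i = 0; i < GGML_MAX_DIMS; ++i) {
            c->nb[i] = t->nb[i];  // cfddc36be2: strides keep reshape/permute clones correct
        }
        bool has_src = false;
        for (int i = 0; i < GGML_MAX_SRC; ++i) {
            if (t->src[i] != NULL) {
                has_src = true;
                c->src[i] = clone_tensor(ctx, t->src[i]); // recurse into operands
            }
        }
        if (!has_src || is_view(t)) {
            c->data = t->data; // 52c92c0a8c / 345f516f7c: leaves and views keep data pointers,
                               // and a tensor without src operands terminates the recursion
        }
        return c;
    }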