Commit graph

  • cb8a4be5d0 Merge branch 'cancel-model-load' of github.com:crasm/llama.cpp into cancel-model-load crasm 2023-12-17 14:31:49 -05:00
  • 32ebd525bf Fail test if model file is missing crasm 2023-12-17 14:31:03 -05:00
  • 1160de38f6 Update llama.cpp Georgi Gerganov 2023-12-17 21:25:19 +02:00
  • bf571733fd Merge branch 'master' of github.com:ggerganov/llama.cpp Laura 2023-12-17 20:03:30 +01:00
  • b1306c4394 readme : update hot topics Georgi Gerganov 2023-12-17 20:16:23 +02:00
  • d8ed670c6c lookup : use n_draft from CLI params Georgi Gerganov 2023-12-17 20:06:41 +02:00
  • 800a489e4a llama.swiftui : add bench functionality (#4483) b1654 Georgi Gerganov 2023-12-17 19:38:41 +02:00
  • 865066621b llama.swiftui : improve bench gg/swiftui-bench Georgi Gerganov 2023-12-17 19:37:22 +02:00
  • 5c5bdba605 llama : remove "mostly" from model infos Georgi Gerganov 2023-12-17 19:36:53 +02:00
  • f7f468a97d gguf-py : fail fast on nonsensical special token IDs (#4489) Jared Van Bortel 2023-12-17 10:45:46 -05:00
  • f86b9d152c lookup : minor pr/4484 Georgi Gerganov 2023-12-17 17:25:28 +02:00
  • 919c40660f build : Check the ROCm installation location (#4485) b1652 Matheus Gabriel Alves Silva 2023-12-17 12:23:33 -03:00
  • 45668633fd finetune : keep allocs alive until all allocations are done (#4486) b1651 slaren 2023-12-17 16:05:56 +01:00
  • 8a264a5bd4 fixup! Trailing whitespace MatheusGASource 2023-12-17 12:05:40 -03:00
  • 0ffc92d2d2 server : disable llm logs if SERVER_VERBOSE is off (#3792) b1650 olexiyb 2023-12-17 17:02:16 +02:00
  • 8edd2b40fd server : fix grammar being ignored (#4494) b1649 AdithyanI 2023-12-17 15:57:56 +01:00
  • eb16dae7e7 server : fix possible ambiguity in content type charset (#4501) b1648 Alexey Parfenov 2023-12-17 14:56:09 +00:00
  • 62bd52b7bf server : allow requests larger than 8K (#4500) b1647 mzcu 2023-12-17 15:54:37 +01:00
  • 5b27975479 lookup : fix token positions in the draft batch Georgi Gerganov 2023-12-17 16:47:26 +02:00
  • 1b26d7151a Added colors to distinguish drafted tokens (--color). Updated README Leon Ericsson 2023-12-17 13:04:46 +01:00
  • 262fd466f3 llama.swiftui : remove model from project Georgi Gerganov 2023-12-17 13:49:44 +02:00
  • 5daa5f54fd Link to cublas dynamically on Windows even with LLAMA_STATIC (#4506) b1646 Bach Le 2023-12-17 18:57:33 +08:00
  • 4ed98b90bc llama.swiftui : avoid data copy via "downloadTask" Georgi Gerganov 2023-12-17 12:19:52 +02:00
  • 9629448716 llama.swiftui : UX improvements Georgi Gerganov 2023-12-17 11:46:13 +02:00
  • d36ca171b6 gitignore : xcode stuff Georgi Gerganov 2023-12-17 10:49:05 +02:00
  • 7ec5721cc7 Link to cublas dynamically on Windows even with LLAMA_STATIC Bach Le 2023-12-17 15:58:25 +08:00
  • 5fef0d6bc9 Merge remote-tracking branch 'origin/master' into vulkan 0cc4m 2023-12-17 08:44:48 +01:00
  • 42e9525884 cuda : less diff in the rope_neox kernel Georgi Gerganov 2023-12-17 09:14:29 +02:00
  • d2f1e0dacc Merge branch 'cuda-cublas-opts' into gg/phi-2 gg/phi-2 Georgi Gerganov 2023-12-17 08:41:46 +02:00
  • f703ca8a3c ggml : fix NeoX rope to rotate just first n_dims Georgi Gerganov 2023-12-17 08:39:18 +02:00
  • b672c169ca ggml : fix NeoX rope to rotate just first n_dims Georgi Gerganov 2023-12-17 08:39:18 +02:00
  • ec05230703 updated lite, up ver Concedo 2023-12-17 14:38:39 +08:00
  • e8cf7f6ed3 Merge remote-tracking branch 'origin/master' into concedo_experimental Concedo 2023-12-17 14:37:14 +08:00
  • e75889a9b8 Merge branch 'master' into cuda-cublas-opts Georgi Gerganov 2023-12-17 08:20:02 +02:00
  • ea98db46fd fixup! It was returning the path instead of the command output MatheusGASource 2023-12-17 00:47:03 -03:00
  • da44d45265 comment #Preview & fix editorconfig check jhen 2023-12-17 11:37:55 +08:00
  • a520e87ed6 update project.pbxproj jhen 2023-12-17 11:31:44 +08:00
  • 3d7628fce6 more generic approach MatheusGASource 2023-12-17 00:30:59 -03:00
  • ce1df8124a add download buttons & expose llamaState.loadModel jhen 2023-12-17 11:09:51 +08:00
  • ff87313db8 force to use n_gpu_layers on simulator jhen 2023-12-17 11:08:17 +08:00
  • 21b68f3032 fixup! Add basic chipStar support Quinten Kock 2023-12-17 01:52:36 +01:00
  • 2a86c00ffa Add basic chipStar support Quinten Kock 2023-12-16 17:40:56 +01:00
  • c6c4fc081c lora : add support for non-llama models (#3333) b1645 slaren 2023-12-16 18:58:46 +01:00
  • 07838c9124 fix style slaren 2023-12-16 17:07:41 +01:00
  • 93c9a5d12a lora : include embd and output layers in size calculation slaren 2023-12-16 17:04:39 +01:00
  • 0644c3be51 phi-2 : scale Q instead of KQ for better precision Georgi Gerganov 2023-12-16 18:01:08 +02:00
  • 70e0fa12f4 server: fix possible ambiguity in content type charset ZXED 2023-12-16 18:09:11 +03:00
  • 76a3ba42eb Merge branch 'master' into concedo_experimental Concedo 2023-12-16 22:58:53 +08:00
  • b7c7845277 Merge remote-tracking branch 'origin/master' into lora-falcon slaren 2023-12-16 15:41:31 +01:00
  • 3289eb0cb3 lora : allow 1d tensors slaren 2023-12-16 15:40:39 +01:00
  • f5c0184bf3 added support for gpt2 manikbhandari 2023-12-16 09:17:45 -05:00
  • 0b6ffa580c convert : revert "added_tokens_decoder" change Georgi Gerganov 2023-12-16 16:05:35 +02:00
  • 8972427293 Server: allow requests larger than 8K Milos Cubrilo 2023-12-16 14:57:37 +01:00
  • c05883f3a9 Free model gpu buffers on exit 0cc4m 2023-12-16 14:49:03 +01:00
  • 45b8032b9c Merge branch 'prompt-lookup' of github.com:LeonEricsson/llama.cpp into prompt-lookup Leon Ericsson 2023-12-16 12:13:50 +01:00
  • 21431197a1 kv_cache management Leon Ericsson 2023-12-16 12:12:33 +01:00
  • ed7c2cb1f9 Update server.cpp AdithyanI 2023-12-16 12:07:23 +01:00
  • a878be4cb1 convert : phi don't add BOS token Georgi Gerganov 2023-12-16 11:20:11 +02:00
  • 5469d82d5a llama : fix meta KV override bug Georgi Gerganov 2023-12-16 11:19:56 +02:00
  • 7500fa2f07 py : whitespaces Georgi Gerganov 2023-12-16 11:01:02 +02:00
  • aa5c881adb phi-2 : use layer norm eps Georgi Gerganov 2023-12-16 10:54:10 +02:00
  • a2a3d2c8d7 phi-2 : various fixes Georgi Gerganov 2023-12-16 10:46:18 +02:00
  • a81d4a5ea0 Update demo in README.md (#6) Holden X 2023-12-16 16:42:33 +08:00
  • cd34b87de0 Fix Python and shader header format 0cc4m 2023-12-16 09:35:14 +01:00
  • 2c8a156c58 Merge upstream changes, fix conflicts, adapt soft_max op 0cc4m 2023-12-16 09:30:20 +01:00
  • 64d83e1fd5 Enrich and reword README.md (squashed) Jeremy Song 2023-12-16 01:28:10 +08:00
  • 22ab495a79 fix warning in ggml.c (#5) Jeremy Song 2023-12-16 12:13:57 +08:00
  • 1557b81743 Add solver (#4) Jeremy Song 2023-12-16 02:22:53 +08:00
  • 0adf4c73bc update: benchmark results for llama2-7b Trần Đức Nam 2023-12-16 11:38:42 +07:00
  • 8a5be3bd58 llama : sanity checks for access to logits (#4274) b1644 Jared Van Bortel 2023-12-15 22:16:15 -05:00
  • e20765534d fix breaking change Ebey Abraham 2023-12-16 00:41:06 +00:00
  • b0547d2196 gguf-py : fail fast on nonsensical special token IDs ceb/fix-badspecial-silentfail Jared Van Bortel 2023-12-15 18:06:42 -05:00
  • 8072706210 kompute : always destroy Manager via the destructor Jared Van Bortel 2023-12-15 16:23:24 -05:00
  • 2d2c76acc4 vulkan : fix free of stack addr in llama_buffer Jared Van Bortel 2023-11-29 18:17:57 -05:00
  • 12cc80cb89 phi2 implementation Ebey Abraham 2023-12-15 20:56:57 +00:00
  • 4d607da918 finetune : keep allocs alive until all allocations are done slaren 2023-12-15 20:27:55 +01:00
  • f58f581ca8 refactor llama.cpp modifications Jared Van Bortel 2023-12-15 13:38:54 -05:00
  • 04ce04ae96 build : Check the ROCm installation location MatheusGASource 2023-12-15 13:57:47 -03:00
  • 9adba26a1a Polish README (#1) Holden X 2023-12-16 00:43:16 +08:00
  • 66a1bb4602 add gpu index opts and udpate doc commands (#2) Holden X 2023-12-16 00:42:08 +08:00
  • fe3bc49e81 Add our README (#7) Holden X 2023-12-15 23:49:37 +08:00
  • 15b193729b Offloading tensors based on total VRAM budget and offloading policy (#6) Holden X 2023-12-15 23:46:51 +08:00
  • 6a8680204c llama.swiftui : initial bench functionality Georgi Gerganov 2023-12-15 16:39:16 +02:00
  • b89a0b7296 Delete README copy.md Holden X 2023-12-15 21:31:34 +08:00
  • bb55e4af2c Full cpu (#5) Jeremy Song 2023-12-15 21:29:10 +08:00
  • 29e3645501 Merge remote-tracking branch 'origin' into add_gpt2_support EC2 Default User 2023-12-15 13:26:09 +00:00
  • 340484161f Merge branch 'ggerganov:master' into prompt-lookup LeonEricsson 2023-12-15 14:15:04 +01:00
  • 1665ad8bf1 BUG: generates gibberish/repeating tokens after a while Leon Ericsson 2023-12-15 14:14:17 +01:00
  • 88ae8952b6 server : add optional API Key Authentication example (#4441) b1643 ShadovvBeast 2023-12-15 13:49:01 +02:00
  • 81e67a218e server : to snake_case Georgi Gerganov 2023-12-15 13:47:14 +02:00
  • ee4725a686 ggml : group mul_mat_id rows by matrix (cpu only) (#4480) b1642 slaren 2023-12-15 12:45:50 +01:00
  • afd336f7a6 llama.swiftui : add bench button Georgi Gerganov 2023-12-15 12:38:30 +02:00
  • 7afb69b8f5 store row groups in wdata and calculate only once in GGML_TASK_INIT slaren 2023-12-15 11:30:04 +01:00
  • 3de63cf103 remove mmid parameters from mm forward slaren 2023-12-15 10:49:16 +01:00
  • 66ce753abd ggml : group mul_mat_id rows by matrix (cpu only) slaren 2023-12-15 00:15:24 +01:00
  • 4b1f70cb03 Fix bool return in llama_model_load, remove std::ignore use crasm 2023-12-14 16:29:05 -05:00
  • 6744dbe924 ggml : use ggml_row_size where possible (#4472) b1641 slaren 2023-12-14 20:05:21 +01:00
  • 87cfad3c1c Merge branch 'master' of https://github.com/ggerganov/llama.cpp into rocm-amd-uma Erik Garrison 2023-12-14 19:42:46 +01:00
  • 1e946c54a2 cmake: enable UMA-compatible allocation when LLAMA_HIP_UMA=ON Erik Garrison 2023-12-14 19:41:53 +01:00
  • c8fd4ba846 ggml : restore 'static' specifiers Jared Van Bortel 2023-12-14 13:18:14 -05:00