Commit graph

  • 3a801e2752 Update ggml-sycl.cpp Abhilash Majumder 2024-03-15 10:27:32 +05:30
  • 5e904ed8f2 Update ggml-sycl.cpp Abhilash Majumder 2024-03-15 10:27:25 +05:30
  • 2217b02c99 Change requirement of last backend being CPU to requiring its default buffer type be a host buffer, fix rebase errors Branden Butler 2024-03-14 22:24:54 -05:00
  • e8a61568e9 Use CXX and CXXFLAGS for ggml-mpi compilation in Makefile Branden Butler 2024-03-14 19:56:23 -05:00
  • 4692644ff9 Remove hard-coded layer splits and support more than 2 nodes Branden Butler 2024-03-13 01:38:38 -05:00
  • 5f156f3a0c Clean up MPI backend a tad Branden Butler 2024-03-12 18:17:42 -05:00
  • 72dcd66c0f Resize seq_ids by n_seq_max, port over sync_pipelined instead of using Bcast Branden Butler 2024-03-12 14:20:03 -05:00
  • c6280bc3f4 Update to use backend GUID and changed signatures Branden Butler 2024-03-12 12:40:23 -05:00
  • 01be58caa9 Fix simple to use new per-node thread count Branden Butler 2024-03-12 11:34:20 -05:00
  • 619bf62acf Support new MPI backend in llama.cpp and increase GGML max split inputs Branden Butler 2024-03-12 11:33:33 -05:00
  • 942ce843f8 Working MPI backend implementation Branden Butler 2024-03-12 11:32:31 -05:00
  • bc93545005 Allow MPI backend to wrap multiple backends Branden Butler 2024-02-26 19:12:16 -06:00
  • 968cefb4a9 Wrap backends with MPI backend Branden Butler 2024-02-19 12:21:48 -06:00
  • b98274c76f Begin transition to backend v2 Branden Butler 2024-02-05 17:19:45 -06:00
  • aa166462f1 Fix draft thread args and remove grads from mpi eval_init Branden Butler 2024-02-03 13:57:00 -06:00
  • c9d18263b3 Allow per-node threads to be set in command-line args, add mpi support to main Branden Butler 2023-11-01 14:55:32 -05:00
  • 32078d6fe1 Fix missing layer_inp_i names Branden Butler 2023-11-01 12:23:30 -05:00
  • b7599f7a56 Fix some mpi mem leaks, add mpi-layer-split to help when using mpi Branden Butler 2023-10-31 15:55:15 -05:00
  • 888d4f591b Update MPI code to new KV seq rm and bos/eos model APIs Branden Butler 2023-10-30 10:50:20 -05:00
  • bcfb190c28 Synchronize batch sequence info, fixing MPI for llama_decode() Branden Butler 2023-10-29 15:16:16 -05:00
  • ede7ff0c66 Fix MPI compilation errors Branden Butler 2023-10-25 17:15:11 -05:00
  • 50a63eb5f9 Fix minor rebase errors Branden Butler 2023-10-24 12:00:52 -05:00
  • fda60ead35 Replace vector with C-style array and length in llama_split_layers_weighted Branden Butler 2023-09-28 12:39:34 -05:00
  • 364b707130 Remove unrelated sections from mpi readme Branden Butler 2023-09-25 18:59:14 -05:00
  • 6c07d6cfa1 Remove fprintf logs from mpi main Branden Butler 2023-09-25 17:52:57 -05:00
  • 8fe813130a Update MPI example to follow main changes Branden Butler 2023-09-25 17:42:41 -05:00
  • 16eff5af69 Disable warmup under MPI Branden Butler 2023-09-25 17:41:57 -05:00
  • 4829c6224e Revert accidental removal of ggml_mpi_backend_init Branden Butler 2023-09-25 10:15:30 -05:00
  • 78112ab5c2 Remove mtest (#3177) Branden Butler 2023-09-25 10:05:42 -05:00
  • 1e78fa4f91 Add code comments in MPI Branden Butler 2023-09-25 09:58:44 -05:00
  • 40a810923a Add documentation for ggml-mpi functions Branden Butler 2023-09-24 23:34:00 -05:00
  • 3ca1ca0182 Refactor MPI for heterogenous cluster support. Branden Butler 2023-09-24 10:05:34 -05:00
  • 3b3ad949f5 json: fix top-level $refs ochafik 2024-03-15 00:52:36 +00:00
  • 5a7deb27d5 json: pass static command to std::system in tests (fixed temp files) ochafik 2024-03-15 00:03:06 +00:00
  • f2165502c9 json: fix zig build ochafik 2024-03-14 23:51:44 +00:00
  • 3feac66d0f Merge remote-tracking branch 'origin/master' into json-fixes ochafik 2024-03-14 23:37:13 +00:00
  • 82813891b7 llama-bench : use random tokens to improve accuracy with mixtral slaren 2024-03-14 23:49:15 +01:00
  • 52837f03d5 Fix compiler warnings Ondřej Čertík 2024-03-14 16:49:43 -06:00
  • fc6f042b30 Merge pull request #1 from ggerganov/gg/repeng Theia Vogel 2024-03-14 14:56:06 -07:00
  • 51d5aa12b3 add Orion chat template ngxson 2024-03-14 22:32:17 +01:00
  • 4755afd1cb llama : fix integer overflow during quantization (#6063) b2431 Georgi Gerganov 2024-03-14 22:58:41 +02:00
  • 5c32737f64 llama : fix integer overflow during quantization Georgi Gerganov 2024-03-14 21:40:20 +02:00
  • 792c39a0ae gguf : add support for I64 and F64 arrays Ondřej Čertík 2024-03-14 11:57:10 -06:00
  • 6e0438da3c gguf : fix resource leaks (#6061) b2430 Steve Grubb 2024-03-14 14:29:32 -04:00
  • 68d1f2611c Fix resource leaks Steve Grubb 2024-03-14 14:04:15 -04:00
  • 727107707a gguf-py : bump version to 0.8.0 (#6060) Ondřej Čertík 2024-03-14 11:57:31 -06:00
  • d97fb71f5c gguf-py : bump version to 0.8.0 Ondřej Čertík 2024-03-14 11:52:36 -06:00
  • 67022f14f2 feat: add jina v2 support bwanglzu 2024-03-14 18:20:09 +01:00
  • 260b069013 Merge 0f7495469c into 69ff61397d Philipp Emanuel Weidmann 2024-03-14 22:22:45 +05:30
  • 059b93f465 Fix typo layernorm_epsilon should be layer_norm_epsilon Steve Grubb 2024-03-14 12:38:50 -04:00
  • 69ff61397d llama : support models without vocabulary (#5798) b2428 Michael Podvitskiy 2024-03-14 17:21:56 +01:00
  • 0a9bc301ac control-vectors : minor code style updates gg/repeng Georgi Gerganov 2024-03-14 16:43:37 +02:00
  • 044ec4b2a5 embedding : add EOS token if not present (#899) b2427 Georgi Gerganov 2024-03-14 15:14:14 +02:00
  • 31277a1a24 fix format Neo Zhang Jianyu 2024-03-14 20:52:27 +08:00
  • dd519ea996 mark comment for improvement in the future Neo Zhang Jianyu 2024-03-14 20:48:05 +08:00
  • e8d77ab896 fix format Neo Zhang Jianyu 2024-03-14 20:46:57 +08:00
  • 3b672ca8fe fix format Neo Zhang Jianyu 2024-03-14 20:44:38 +08:00
  • e36c5c06cc issues: ci - close inactive issue with workflow Pierrick HYMBERT 2024-03-14 13:31:41 +01:00
  • 42abb46c1f Merge branch 'master' into vgel/repeng Georgi Gerganov 2024-03-14 14:26:23 +02:00
  • 0af3ed733f bug fix abhilash1910 2024-03-14 05:06:22 -07:00
  • 81b6139f4c bug fix abhilash1910 2024-03-14 04:57:58 -07:00
  • 9b030b98a6 iq2_s abhilash1910 2024-03-14 04:47:03 -07:00
  • 08d3b40190 iq2_s abhilash1910 2024-03-14 04:42:29 -07:00
  • 77178eedc8 gguf-py : fix dtype check (#6045) Georgi Gerganov 2024-03-14 13:32:14 +02:00
  • 15a333260a readme : improve readme for Llava-1.6 example (#6044) Jian Liao 2024-03-14 04:18:23 -07:00
  • 43241adf22 server: disable debug release type sanitizer, simplify trigger (#6047) b2424 Pierrick Hymbert 2024-03-14 12:15:39 +01:00
  • a44bc969e4 llama : fix typo b2423 Georgi Gerganov 2024-03-14 13:13:06 +02:00
  • 2c4fb69246 llama : optimize defrag moves + fix fragmentation calculation (#6037) b2422 Michael Podvitskiy 2024-03-14 11:56:48 +01:00
  • b88cd9f6ca Update llama.cpp Georgi Gerganov 2024-03-14 12:44:22 +02:00
  • 3ca23481dd gguf-py : add support for I8, I16 and I32 (#6045) Ondřej Čertík 2024-03-14 04:40:14 -06:00
  • 3fe8d7a17f ggml : designate enum vals for integer types (#6050) b2420 Georgi Gerganov 2024-03-14 12:38:37 +02:00
  • 68265ebfc6 embedding : print all resulting embeddings (#899) b2419 Georgi Gerganov 2024-03-14 12:37:20 +02:00
  • 381da2d9f0 metal : build metallib + fix embed path (#6015) b2418 Georgi Gerganov 2024-03-14 11:55:23 +02:00
  • 5e9f459f30 ggml : designate enum vals for integer types Georgi Gerganov 2024-03-14 11:39:25 +02:00
  • abf0afd0d6 ci : fix iOS builds to use embedded library gg/metal-embed Georgi Gerganov 2024-03-14 11:34:22 +02:00
  • ed0f77b177 metal : fix embeded library build Georgi Gerganov 2024-03-14 11:16:51 +02:00
  • 9cb1554fb0 one more NoVocab assert Michael Podvitskiy 2024-03-14 09:29:13 +01:00
  • 0fd6c1f015 embedding : print cosine similarity (#899) b2417 Georgi Gerganov 2024-03-14 10:12:29 +02:00
  • 3da43c4b94 fix grammar issue Neo Zhang Jianyu 2024-03-14 09:57:11 +08:00
  • a469431cab fix grammar issue Neo Zhang Jianyu 2024-03-14 09:56:48 +08:00
  • 0fa925c0c2 server: test: disable debug release type sanitizer thread on PR, simplify triggers and matrix - increase time out for server - do not fail fast Pierrick HYMBERT 2024-03-13 22:00:54 +01:00
  • 38328bb599 fragmentation calculation fix Michael Podvitskiy 2024-03-13 21:48:37 +01:00
  • 97ad402539 attempt to reduce the impact of a worst-case scenario Michael Podvitskiy 2024-03-13 11:24:15 +01:00
  • fc5d6e6513 Add support for I8, I16, I32 to gguf_reader Ondřej Čertík 2024-03-13 14:12:41 -06:00
  • c5423753f7 Add support for I8, I16 and I32 to gguf_writer Ondřej Čertík 2024-03-13 14:06:33 -06:00
  • dc0e4d8e74 Add support for I8, I16 and I32 Ondřej Čertík 2024-03-13 13:59:27 -06:00
  • 7a57feba0c import intrinsics. Julia Longtin 2024-03-13 19:26:54 +00:00
  • a1ae649662 use right type, and define GGML_F32_VEC_ZERO. Julia Longtin 2024-03-13 19:23:53 +00:00
  • f346a41deb try to implement one intrinsic Julia Longtin 2024-03-13 19:18:10 +00:00
  • aff1471582 readme: improve readme content for llava 1.6 Jian Liao 2024-03-13 11:44:26 -07:00
  • 19885d205e readme : update details about running llama in Termux on Android (#6039) b2416 Linwei Wang 2024-03-14 02:34:40 +08:00
  • 76a936c893 readme : update API changes and hot topics Georgi Gerganov 2024-03-13 20:33:56 +02:00
  • b7e9d5c8d4 Refactor dtype handling to be extensible Ondřej Čertík 2024-03-13 12:21:24 -06:00
  • 463628372d grammar : handle missing "root" node (#6004) b2414 Clint Herron 2024-03-13 14:10:40 -04:00
  • f30ea47a87 llama : add pipeline parallelism support (#6017) b2413 slaren 2024-03-13 18:54:21 +01:00
  • 976176d0dd reduce default n_batch to 2048 slaren 2024-03-13 18:37:17 +01:00
  • 1f564815a3 small fix slaren 2024-03-13 18:33:22 +01:00
  • cb580a6493 fix merge slaren 2024-03-13 18:22:22 +01:00
  • 9092883d58 llama : better n_batch and n_ubatch comment slaren 2024-03-13 18:09:56 +01:00
  • 3c38789f5b ggml_backend_cpu_graph_compute : fix return value when alloc fails slaren 2024-03-13 17:59:55 +01:00