Commit graph

  • 15f5d96037 build : fix build info generation and cleanup Makefile (#3920) b1589 Jared Van Bortel 2023-11-30 17:23:08 -05:00
  • 73df0c4364 * merge with base ziadb 2023-11-30 17:16:48 -05:00
  • f595b69798 Revert "fix includes with help from include-what-you-use" Jared Van Bortel 2023-11-30 17:11:31 -05:00
  • 33c9892af5 llava : ShareGPT4V compatibility (vision encoder only loading) (#4172) John 2023-11-30 23:11:14 +01:00
  • 0f175a6084 * change to set ziadb 2023-11-30 17:09:59 -05:00
  • 8efa0f6ebe main : pass LOG_TEE callback to llama.cpp log (#4033) b1587 Andrew Godfrey 2023-11-30 13:56:19 -08:00
  • 524907aa76 readme : fix (#4135) vodkaslime 2023-12-01 05:49:21 +08:00
  • 3bd2c7ce1b docker : add finetune option (#4211) Juraj Bednar 2023-11-30 22:46:01 +01:00
  • bde629bb53 batched.swift : update README.md (#4214) Miwa / Ensan 2023-12-01 06:45:17 +09:00
  • f7f9e06212 cmake : fix the metal file folder path (#4217) b1583 Li Tan 2023-11-30 13:44:11 -08:00
  • 74daabae69 readme : fix typo (#4253) Dawid Wysocki 2023-11-30 22:43:32 +01:00
  • b18c66ca6e llama : fix alignment of general.name in print meta (#4254) b1581 Daniel Bevenius 2023-11-30 22:43:08 +01:00
  • f4d973cecb convert.py : fix llama/llama2 conversion due to vocab_size=-1 (#4258) slaren 2023-11-30 22:42:23 +01:00
  • 954e22858c llama : fix typical sampling (#4261) b1579 tarcey 2023-11-30 22:40:23 +01:00
  • e2bd725f4b py : fix oai proxy (#3972) rhjdvsgsgks 2023-11-30 20:50:40 +00:00
  • f91707bbe1 llama : sanity checks for access to logits Jared Van Bortel 2023-11-30 14:36:02 -05:00
  • ec8e74ebc8 use stop as separator to replace hardcoded \n rhjdvsgsgks 2023-11-30 20:25:01 +00:00
  • c4db59230d metal : warp-based reduce for rms_norm Georgi Gerganov 2023-11-30 22:21:30 +02:00
  • 4183d1e931 Fix Apple clang determination bug. Will Findley 2023-11-30 14:59:07 -05:00
  • 55717c98c4 metal : warp-based reduction for soft max kernel Georgi Gerganov 2023-11-30 21:52:32 +02:00
  • 68e02c0d58 cuda : fix warp reduction initialization of shared mem Georgi Gerganov 2023-11-30 21:39:48 +02:00
  • 6b86bcffac cuda : increase max block size to 1024 Georgi Gerganov 2023-11-30 20:40:47 +02:00
  • 62532c05aa cuda : do warp-based block reduce Georgi Gerganov 2023-11-30 20:36:08 +02:00
  • c7c8dabcf7 ggml : update soft max cpu Georgi Gerganov 2023-11-30 20:05:41 +02:00
  • 66aecf596c starting to build the graph mike dupont 2023-11-30 12:44:44 -05:00
  • ebd062bc19 cuda : use 512 threads for soft_max instead of 32 Georgi Gerganov 2023-11-30 17:19:29 +02:00
  • 23987729aa Starting point kalomaze 2023-11-30 07:01:05 -06:00
  • 9ed1a9fd16 update mike dupont 2023-11-30 07:28:19 -05:00
  • cd5e1901d9 removed llama from common mike dupont 2023-11-30 06:59:10 -05:00
  • a195cdeec8 fixed chub ai imports (+1 squashed commits) Concedo 2023-11-30 18:00:39 +08:00
  • e9724cdc9d Merge branch 'master' into concedo_experimental Concedo 2023-11-30 14:31:53 +08:00
  • a012342a77 updated docs, shifted kv extra space to be subtracted from user's ctx value instead of added on load. Concedo 2023-11-30 14:19:40 +08:00
  • 5596f78935 update vocab Bingxuan Wang 2023-11-30 11:05:28 +08:00
  • 3b371e10c4 Update examples/server/server.cpp Ziad Ben Hadj-Alouane 2023-11-29 22:04:08 -05:00
  • 14785e1148 Update examples/server/server.cpp Ziad Ben Hadj-Alouane 2023-11-29 22:04:00 -05:00
  • 0e1a5aa5fa Update examples/server/server.cpp Ziad Ben Hadj-Alouane 2023-11-29 22:03:55 -05:00
  • 09da4b14f9 Update examples/server/server.cpp Ziad Ben Hadj-Alouane 2023-11-29 22:03:50 -05:00
  • 68db4d597e update name format Bingxuan Wang 2023-11-30 10:09:41 +08:00
  • 277c64db68 Merge branch 'deepseek-llm' into regex_gpt2_preprocess Bingxuan Wang 2023-11-30 09:53:27 +08:00
  • 1293b46ef9 Fix typical sampling. tarcey 2023-11-30 01:01:12 +01:00
  • 770fc6123f * typo fix ziadb 2023-11-29 17:25:24 -05:00
  • e72bb2403f * add --log-disable to disable logging to file in the server example ziadb 2023-11-29 17:24:27 -05:00
  • 38ce5d02e0 * remove all references to mutex_multitasks ziadb 2023-11-29 16:56:43 -05:00
  • b0024a6be2 linking mike dupont 2023-11-29 16:21:40 -05:00
  • 46d9bec698 adding in docs and notes mike dupont 2023-11-29 15:50:55 -05:00
  • f3ed3c00f5 convert.py : fix llama/llama2 conversion due to vocab_size=-1 slaren 2023-11-29 19:28:07 +01:00
  • 04f199e512 Merge 167f9b20fc into 1f5cd83275 varon 2023-11-29 10:38:25 -06:00
  • 580fe2064c metal : simplify soft_max encoding Georgi Gerganov 2023-11-29 17:30:19 +02:00
  • 390a445906 batched-bench : print threads Georgi Gerganov 2023-11-29 17:26:12 +02:00
  • 6a66f69f9f ggml : implement soft_max_ext (CPU) Georgi Gerganov 2023-11-29 17:07:07 +02:00
  • 486833214f test optimization supermy 2023-11-29 22:49:35 +08:00
  • 88519fbf97 cuda : implement soft_max_ext Georgi Gerganov 2023-11-29 15:34:20 +02:00
  • 1e03cdf3cd llama: fix alignment of special tokens Daniel Bevenius 2023-11-29 14:56:06 +01:00
  • dc68997915 llama: fix alignment of general.name in print meta Daniel Bevenius 2023-11-29 14:48:11 +01:00
  • 7da781ff26 Fix typo in README.md Dawid Wysocki 2023-11-29 14:21:17 +01:00
  • e89597c062 metal : implement soft_max_ext Georgi Gerganov 2023-11-29 12:44:47 +02:00
  • 1f5cd83275 examples : add readme files Georgi Gerganov 2023-11-29 11:00:17 +02:00
  • 479af49955 Merge pull request #1 from ggerganov/master Shijie 2023-11-29 16:10:59 +08:00
  • fecb61b193 clean-up : warnings, names Georgi Gerganov 2023-11-29 10:01:36 +02:00
  • 4fea3420ee readme : add FreeChat (#4248) Peter Sugihara 2023-11-28 23:16:34 -08:00
  • 66ef4a20e2 refined multiuser mode Concedo 2023-11-29 14:29:45 +08:00
  • 1807a6e280 now faster and smaller mike dupont 2023-11-28 21:50:31 -05:00
  • 5615953b77 Merge branch 'master' into speedup-persimmon Galunid 2023-11-28 22:23:43 +01:00
  • b274940002 readme: add FreeChat Peter Sugihara 2023-11-28 13:20:51 -08:00
  • 3e28686d7f persimmon : use rope over whole Qcur/Kcur Galunid 2023-11-28 22:08:50 +01:00
  • a1f9699645 Merge 38b01ba136 into 64e64aa255 xaedes 2023-11-28 16:35:34 +01:00
  • 1504532b16 Merge 128562dc83 into 64e64aa255 kaaid 2023-11-28 16:31:28 +01:00
  • b75152e3e9 added a proper quiet mode Concedo 2023-11-28 21:20:51 +08:00
  • 581021ab93 Merge branch 'master' into concedo_experimental Concedo 2023-11-28 20:57:56 +08:00
  • ba5c33319b Allocate a small amount of extra context for GGUF to deal with KV fragmentation causing issues in some scenarios. Concedo 2023-11-28 20:55:14 +08:00
  • c8d847d57e Merge branch 'master' into server-ui-improvements Yazan Agha-Schrader 2023-11-28 12:57:03 +01:00
  • 3a15b28ce6 fix typo Yazan Agha-Schrader 2023-11-28 12:39:29 +01:00
  • 64e64aa255 ggml : restore abort() in GGML_ASSERT (#4242) b1575 Jared Van Bortel 2023-11-28 04:51:11 -05:00
  • d2ef458b02 show more info about available APIs Concedo 2023-11-28 17:17:47 +08:00
  • cb33629441 update for deepseek-llm Bingxuan Wang 2023-11-28 16:58:58 +08:00
  • c96c458fe5 Merge branch 'ggerganov:master' into master Yazan Agha-Schrader 2023-11-28 09:43:39 +01:00
  • 5ac0f300a9 improve error handling Yazan Agha-Schrader 2023-11-28 09:43:01 +01:00
  • 8406b0924b ggml : re-enable BLAS for CPU when src0 != F32 + remove redundant full offload checks in llama.cpp (#4240) b1574 Georgi Gerganov 2023-11-28 10:32:03 +02:00
  • 1f5357cbcf fix flake8 warnings wonjun Jang 2023-11-28 16:46:54 +09:00
  • 61edd1bc59 Remove unused variable/functions, add types to class variable and methods, delete blank lines wonjun Jang 2023-11-28 16:23:27 +09:00
  • 116fc90e9a Update promptFormats.js Yazan Agha-Schrader 2023-11-28 07:17:45 +01:00
  • 2e4c05e00a Update promptFormats.js Yazan Agha-Schrader 2023-11-28 07:15:03 +01:00
  • 9dcb514b1d update start server scripts Yazan Agha-Schrader 2023-11-28 06:57:29 +01:00
  • 57f8edd016 fix start-server.sh Yazan Agha-Schrader 2023-11-28 05:32:34 +01:00
  • bb39b87964 ggml : restore abort() in GGML_ASSERT assert-restore-abort Jared Van Bortel 2023-11-27 19:27:09 -05:00
  • d1d1cceda7 notebook mike dupont 2023-11-27 18:53:10 -05:00
  • e2ee37761e * remove atomicity of id_gen, and change lock_guard to unique_lock on completion requests ziadb 2023-11-27 18:12:58 -05:00
  • e056b06fbd error handling for missing dialog Yazan Agha-Schrader 2023-11-27 22:12:15 +01:00
  • 4fa32ad0e3 update Yazan Agha-Schrader 2023-11-27 21:45:12 +01:00
  • 1b6d4226b8 add start scripts to root path Yazan Agha-Schrader 2023-11-27 21:35:31 +01:00
  • 87f4102a70 llama : revert n_threads_batch logic gg/fix-cpu-blas Georgi Gerganov 2023-11-27 21:21:23 +02:00
  • b38a16dfcf cmake : fix issue with version info not getting baked into LlamaConfig.cmake (#3970) b1573 bandoti 2023-11-27 15:25:42 -04:00
  • ae096d0a92 Merge branch 'ggerganov:master' into master Yazan Agha-Schrader 2023-11-27 20:10:11 +01:00
  • 33ee60fc2a Add BUILD_SHARED_LIBS option Mason M 2023-11-27 15:02:10 -04:00
  • e9b7a5cbd0 llama : use n_threads_batch only when n_tokens >= 32 Georgi Gerganov 2023-11-27 20:48:44 +02:00
  • f815fe43d3 ggml : use blas even if src0 is not F32 Georgi Gerganov 2023-11-27 20:48:27 +02:00
  • 6272b6764a use stride=128 if built for tensor cores ceb/perf-faster-multigpu Jared Van Bortel 2023-11-27 13:09:14 -05:00
  • dd71a35cc8 make MUL_MAT_SRC1_COL_STRIDE conditional on runtime mmq Jared Van Bortel 2023-11-27 13:05:55 -05:00
  • 0dab8cd7cc readme : add Amica to UI list (#4230) Kasumi 2023-11-28 01:39:42 +08:00
  • 229ee215b1 Merge branch 'ggerganov:master' into cmake-fix-missing-build-info bandoti 2023-11-27 13:33:32 -04:00