Commit graph

  • fb14faf6b0 clang-tidy Evan Jones 2023-06-03 07:00:42 -04:00
  • 50ce29667f add interface for float input ningshanwutuobang 2023-06-03 18:51:58 +08:00
  • c1b293d31a fixed MPT ooms Concedo 2023-06-03 18:37:13 +08:00
  • 8bd9a3a48b updated readme, improved simple launcher Concedo 2023-06-03 17:17:15 +08:00
  • 6f82e17b7a added MPT support Concedo 2023-06-03 16:14:08 +08:00
  • 4df2ef3161 mtl : make it work with main example Georgi Gerganov 2023-06-03 09:11:15 +03:00
  • c812ff2b8a Fix prompt cache saving and chat-persistent rollover (fixes #1670) Evan Jones 2023-06-03 00:41:16 -04:00
  • df2ecc942a Merge pull request #18 from anon998/update-readme Randall Fitzgerald 2023-06-02 17:04:25 -04:00
  • 98ae2de017 parse --mlock and --no-mmap + format anon 2023-06-02 17:54:46 -03:00
  • 05a5a485b8 make help text load faster anon 2023-06-02 17:52:04 -03:00
  • a6ed390cc6 update readme anon 2023-06-02 17:48:29 -03:00
  • e1e2be2146 remove --keep from help text anon 2023-06-02 17:47:42 -03:00
  • 2f4e9d19cc mtl : plug Metal inference into llama.cpp (very quick-n-dirty) Georgi Gerganov 2023-06-02 21:52:11 +03:00
  • 640a889632 mtl : add save/load vocab to ggml file Georgi Gerganov 2023-06-02 21:00:30 +03:00
  • 03c2d72867 mtl : simplify implementation Georgi Gerganov 2023-06-02 20:36:26 +03:00
  • 627605732c mtl : remove printfs from inner loop Georgi Gerganov 2023-06-02 19:58:08 +03:00
  • 9839259b63 allow specifying the horde limit as well Concedo 2023-06-03 00:55:44 +08:00
  • b088e14a7e mtl : more threads for rms_norm + better timing Georgi Gerganov 2023-06-02 19:26:58 +03:00
  • 70c3387726 mtl : fix kernel signature + roll inner loop Georgi Gerganov 2023-06-02 19:11:39 +03:00
  • b58d73ca8c ci : disable temporary Georgi Gerganov 2023-05-29 20:57:24 +03:00
  • 847bbfe9e6 mtl : faster mul_mat_q4_0_f32 kernel Georgi Gerganov 2023-06-02 18:28:31 +03:00
  • 5758e9f09b Removed embedding from flags. Randall Fitzgerald 2023-06-02 08:31:12 -07:00
  • 310bf61496 Merge pull request #17 from SlyEcho/server_refactor Randall Fitzgerald 2023-06-02 11:25:01 -04:00
  • de6df486e9 Removed embedding from README Randall Fitzgerald 2023-06-02 08:24:46 -07:00
  • 33671460b0 mtl : fix bug in f16 x f32 mul mat + speed-up computation Georgi Gerganov 2023-06-02 18:23:51 +03:00
  • bcd616700e improve docs and example Henri Vasserman 2023-06-02 18:04:46 +03:00
  • 96b0e536b7 Merge branch 'opencl-dev-concedo' into concedo_experimental Concedo 2023-06-02 22:12:14 +08:00
  • 59fe16877d Clblast fixes + enhancements to save VRAM: Concedo 2023-06-02 22:10:49 +08:00
  • 7cebe2eaf8 Merge branch 'master' of https://github.com/digiwombat/llama.cpp digiwombat 2023-06-02 10:06:04 -04:00
  • 16e1c9813a Removed the embedding api endpoint and associated code. digiwombat 2023-06-02 10:05:52 -04:00
  • 4dd72fc6e4 Merge pull request #16 from anon998/fix-log-json Randall Fitzgerald 2023-06-02 09:43:29 -04:00
  • 41bb71bde7 replace invalid characters instead of crashing anon 2023-06-02 10:37:13 -03:00
  • 3ff27d30e3 Fixed up a few things in embedding mode. digiwombat 2023-06-02 09:20:53 -04:00
  • 28cc0cdc50 Merge pull request #15 from SlyEcho/server_refactor Randall Fitzgerald 2023-06-02 08:47:54 -04:00
  • 782120f9ae remove unneeded line Yuval Peled 2023-06-02 15:35:44 +03:00
  • 0ac1dd8df7 add benchmarks Yuval Peled 2023-06-02 15:33:41 +03:00
  • 1a76dbe00d Merge remote-tracking branch 'origin/master' Yuval Peled 2023-06-02 15:28:48 +03:00
  • 65823520c6 Add performance troubleshoot & benchmark Yuval Peled 2023-06-02 15:28:31 +03:00
  • 9291f4c606 add benchmarks Yuval Peled 2023-06-02 15:27:08 +03:00
  • 3df0192804 improve long input truncation Henri Vasserman 2023-06-02 15:18:51 +03:00
  • 83cd67dda6 test table Yuval Peled 2023-06-02 15:09:31 +03:00
  • 89b377d7f8 test anchor link Yuval Peled 2023-06-02 14:47:00 +03:00
  • 1bd52c8627 Merge branch 'ggerganov:master' into master Randall Fitzgerald 2023-06-02 07:31:55 -04:00
  • f5d5e7020d Merge pull request #14 from anon998/do-completion-update Randall Fitzgerald 2023-06-02 07:30:53 -04:00
  • f820740dad move multibyte check to doCompletion anon 2023-06-02 08:16:39 -03:00
  • 8f9e546b51 trim partial stopping strings when not streaming anon 2023-06-02 08:14:28 -03:00
  • bebea657cb Merge pull request #13 from anon998/small-fixes Randall Fitzgerald 2023-06-02 06:53:10 -04:00
  • abb7782745 Merge branch 'master' into small-fixes anon998 2023-06-02 10:35:06 +00:00
  • 88cc7bb6f7 Stuff with logits Henri Vasserman 2023-06-02 13:29:57 +03:00
  • 47efbb5cf3 use std::isinf to check if ignore_eos is active anon 2023-06-02 07:19:21 -03:00
  • 2932db15a3 avoid creating element in logit_bias accidentally anon 2023-06-02 06:55:38 -03:00
  • d38ad41528 Add llama.cpp:full image support for Chinese, and related documents(#1649) qingfengfenga 2023-06-02 17:40:54 +08:00
  • a8a9f19689 small fixes anon 2023-06-02 05:57:20 -03:00
  • 49dce94885 make types match gpt_params exactly anon 2023-06-02 05:51:34 -03:00
  • 1488a0f528 make functions that never return false void anon 2023-06-02 05:47:00 -03:00
  • ebfead6e5a remove unused variables anon 2023-06-02 05:45:57 -03:00
  • 731ecc0d1b fix typo anon 2023-06-02 05:45:16 -03:00
  • 0bc047730f Apply suggestions from code review Henri Vasserman 2023-06-02 10:29:09 +03:00
  • 8d0c81e7cc Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental Concedo 2023-06-02 12:19:59 +08:00
  • 144d8a8312 updated lite Concedo 2023-06-02 12:19:51 +08:00
  • e55f7b0bdb mtl : add f16 mat x f32 vec multiplication kernel Georgi Gerganov 2023-06-01 23:37:49 +03:00
  • f0196a7e7a mtl : optimize rms_norm and soft_max kernels Georgi Gerganov 2023-06-01 22:51:42 +03:00
  • d9626743ac add option to use scratch buffers in training or not xaedes 2023-06-01 20:59:19 +02:00
  • 9665429e94 mtl : full GPU inference of the computation graph Georgi Gerganov 2023-06-01 21:50:01 +03:00
  • fbd3f6258d mtl : add non-broadcast mul kernel Georgi Gerganov 2023-06-01 21:40:53 +03:00
  • 42dca4004c mtl : add silu kernel Georgi Gerganov 2023-06-01 21:35:11 +03:00
  • a0cc3de59a mtl : add f32 -> f32 cpy kernel Georgi Gerganov 2023-06-01 21:30:33 +03:00
  • a266c26de2 mtl : verify V tensor contents Georgi Gerganov 2023-06-01 21:27:24 +03:00
  • f67c2d8cab ggml : update ggml_nbytes() to handle non-contiguous tensors Georgi Gerganov 2023-06-01 21:27:03 +03:00
  • 0d4b87de3d improve training memory usage with scratch buffers xaedes 2023-06-01 19:50:48 +02:00
  • 17930fbcb7 mtl : fix soft_max kernel Georgi Gerganov 2023-06-01 20:48:24 +03:00
  • 765b290010 bug fix for ggml_compute_forward_get_rows_back_f32 xaedes 2023-06-01 19:42:51 +02:00
  • 3164f93381 fix formulas in comments xaedes 2023-06-01 19:41:55 +02:00
  • 17a70362a6 mtl : add diag_mask_inf kernel Georgi Gerganov 2023-06-01 20:41:54 +03:00
  • 0e269665cd add ggml_opt_resume_g which accepts forward and backward cgraphs xaedes 2023-06-01 19:41:28 +02:00
  • 24239f0df7 Improve implementation 0cc4m 2023-06-01 18:57:08 +02:00
  • 0f1c580860 mtl : add scale kernel Georgi Gerganov 2023-06-01 19:52:32 +03:00
  • 51efb59437 mtl : confirm f16 x f32 attention mul mat Georgi Gerganov 2023-06-01 19:45:36 +03:00
  • 948fcfde7e mtl : add cpy kernel + handle view ops Georgi Gerganov 2023-06-01 19:21:28 +03:00
  • 94ea9e7bfe ggml : store offset as opt arg for ggml_view_xd() operators Georgi Gerganov 2023-06-01 19:21:08 +03:00
  • a8a22ff93f Build locally will detect CPU Howard Su 2023-06-01 23:00:12 +08:00
  • ac072d7c91 Modify per code review sugguestions Howard Su 2023-04-16 22:42:06 +08:00
  • 7adce4f64c Only check hardware when option is ON Howard Su 2023-04-07 21:04:47 +08:00
  • 5f50d15120 Add detection code for avx Howard Su 2023-04-01 16:32:14 +08:00
  • 98552d1e5d cleanup and simplify the code Howard Su 2023-06-01 22:44:36 +08:00
  • 37659d2c4e allow blasbatchsize -1 which disables blas, but keeps benefits like gpu offloads. Concedo 2023-06-01 22:33:50 +08:00
  • ab0a7d1531 Add normalfloat4 as Q4_2 Howard Su 2023-06-01 22:17:53 +08:00
  • d29b6d5f55 Merge pull request #12 from anon998/clear-logit-bias Randall Fitzgerald 2023-06-01 08:58:35 -04:00
  • 8cbc4be6c2 clear logit_bias between requests + print anon 2023-06-01 09:49:50 -03:00
  • 6025476e39 default penalize_nl back to true anon 2023-06-01 09:49:16 -03:00
  • 49a18bdd14 remove unused parameter warning anon 2023-06-01 09:41:35 -03:00
  • af711263ae Merge pull request #11 from SlyEcho/server_refactor Randall Fitzgerald 2023-06-01 08:10:55 -04:00
  • 797155a0d1 Merge pull request #10 from cirk2/master Randall Fitzgerald 2023-06-01 08:10:26 -04:00
  • 49272e3c53 adjusted defaults Concedo 2023-06-01 20:03:44 +08:00
  • 9531ae60db Add logit bias support Henri Vasserman 2023-06-01 13:57:47 +03:00
  • 8c6a5fc92b last tokens fixes Henri Vasserman 2023-06-01 13:18:12 +03:00
  • 5bbc030338 Add Options enpoints and Access-Control-Allow-Headers to satisfy CORS rules Felix Hellmann 2023-06-01 10:47:53 +02:00
  • 457aaf5bad Reduce code duplication between cuda and opencl branches 0cc4m 2023-06-01 07:33:32 +02:00
  • 93278f84cf low_level_api_chat_cpp.py: fix default path_prefix arg value to match class default value Don Mahurin 2023-05-23 06:21:31 -07:00
  • 62ddbc6cd9 Update Makefile to add SSSE3 compilation use cases rankaiyx 2023-06-01 08:46:07 +08:00