Commit graph

  • 19097c97a8 Add f16 implementation to ggml_compute_forward_add_f16_f32 Andrew Godfrey 2023-10-23 13:21:17 -07:00
  • 9ea91ceaf2 Suppress some warnings in ggml.c Andrew Godfrey 2023-10-23 13:01:31 -07:00
  • 1758d0abef tweak finetune.sh Andrew Godfrey 2023-10-23 09:59:43 -07:00
  • e1ebce03d6 Add 'finetune.sh', which currently fails when using GPU Andrew Godfrey 2023-10-21 22:42:37 -07:00
  • 4d452dbc10 Add fprintf in ggml_cuda_op_add Andrew Godfrey 2023-10-21 22:40:40 -07:00
  • facb1a3e0f Add '-ngl' support to finetune.cpp Andrew Godfrey 2023-10-21 22:41:24 -07:00
  • 2b4ea35e56 cuda : add batched cuBLAS GEMM for faster attention (#3749) b1422 Georgi Gerganov 2023-10-24 16:48:37 +03:00
  • d798a17c34 cuda : add TODO for calling cublas from kernel + using mem pool cuda-batched-gemm Georgi Gerganov 2023-10-24 16:33:24 +03:00
  • 51b3b56c08 Prevent offloading of more than 33 layers Galunid 2023-10-24 15:05:58 +02:00
  • 27c34c0112 cuda : reduce mallocs in cublasGemmBatchedEx branch Georgi Gerganov 2023-10-24 15:06:02 +03:00
  • d2f8c9d51b Fix detokenization of non-special added-tokens goerch 2023-10-24 13:46:44 +02:00
  • 27d0c11897 Merge branch 'master' into stablelm-support Galunid 2023-10-24 12:52:48 +02:00
  • 3d297c1a30 cuda : add cublasGemmStridedBatchedEx for non-broadcasted cases Georgi Gerganov 2023-10-24 13:34:54 +03:00
  • fa2cd7e7b9 Add special token handling to convert script Galunid 2023-10-24 12:47:00 +02:00
  • 6a4d9c26e1 readme: add AUR instructions and clean up preview (#494) AlpinDale 2023-10-24 09:13:56 +00:00
  • 7d120f2794 Add context size parameter to google colab notebook (#489) teddybear082 2023-10-24 05:13:01 -04:00
  • 7744aa6a9c updated colab Concedo 2023-10-24 15:37:47 +08:00
  • 6966474928 cuda : play with faster Q4_0 dequantization cuda-batched-gemm-deq Georgi Gerganov 2023-10-24 10:29:40 +03:00
  • daab3d7f45 Add more tokenizer tests (#3742) b1421 Galunid 2023-10-24 09:17:17 +02:00
  • 469c9addef metal : handle ggml_scale for n%4 != 0 (close #3754) b1420 Georgi Gerganov 2023-10-24 09:46:50 +03:00
  • 09ff755ecc Added special token support to llama_tokenize() calls in server.cpp Michael Coppola 2023-10-23 17:30:28 -04:00
  • d415669087 cuda : add ROCm / hipBLAS cublasGemmBatchedEx define Georgi Gerganov 2023-10-24 00:18:49 +03:00
  • 878aa4f209 Apply suggestions from code review Kerfuffle 2023-10-23 15:09:50 -06:00
  • e3932593d4 Revert "make : add optional CUDA_NATIVE_ARCH (#2482)" b1419 Georgi Gerganov 2023-10-23 23:46:05 +03:00
  • 9d02956443 issues : separate bug and enhancement template + no default title (#3748) M. Yusuf Sarıgöz 2023-10-23 22:57:16 +03:00
  • a01ddfbcaa Remove bloom vocab/test Galunid 2023-10-23 21:52:05 +02:00
  • 69a6735087 Update special token handling in conversion scripts for gpt2 derived tokenizers (#3746) Galunid 2023-10-23 21:46:00 +02:00
  • 5be6c803fa llama : remove token functions with context args in favor of model (#3720) b1416 Marcus Dunn 2023-10-23 12:40:03 -07:00
  • 38de5d0458 Comment cosmetics Galunid 2023-10-23 19:58:35 +02:00
  • c13fcfbfc0 cuda : batched cuBLAS GEMMs for src0 F16 and src1 F32 (attention ops) Georgi Gerganov 2023-10-23 20:37:04 +03:00
  • 84d4ca0e47 cuda : minor indentation Georgi Gerganov 2023-10-23 20:36:50 +03:00
  • 8d8d54f834 ggml : skip nops in compute_forward Georgi Gerganov 2023-10-23 20:36:32 +03:00
  • 6a30bf3e51 batched : add NGL arg Georgi Gerganov 2023-10-23 20:36:12 +03:00
  • 8fb1be642e cmake : add helper for faster CUDA builds Georgi Gerganov 2023-10-23 20:35:19 +03:00
  • fc5bb85545 changed token functions to use new model variants Marcus Dunn 2023-10-23 09:30:00 -07:00
  • 2df3801706 changed token functions to use new model variants Marcus Dunn 2023-10-23 09:29:11 -07:00
  • 38cdb82235 fixed main.cpp Marcus Dunn 2023-10-23 09:28:07 -07:00
  • 4646c9dadd added back docs Marcus Dunn 2023-10-23 09:25:05 -07:00
  • d7ef0be063 Merge branch 'ggerganov:master' into master Marcus Dunn 2023-10-23 09:19:54 -07:00
  • a550b23e3a changed 3 more functions to take in model Marcus Dunn 2023-10-23 09:17:55 -07:00
  • 22d5eb41bb removed old llama_token functions Marcus Dunn 2023-10-23 09:15:48 -07:00
  • bc8395d5c4 Merge branch 'master' of https://github.com/ggerganov/llama.cpp into ntkv2 cebtenzzre 2023-10-23 12:10:36 -04:00
  • 353f4ef717 formatting Marcus Dunn 2023-10-23 09:08:46 -07:00
  • 1244b0060b Update comment Galunid 2023-10-23 18:03:19 +02:00
  • b9bb4cbe86 Separate bug and enhancement template + no default title upd-issue-templates M. Yusuf Sarıgöz 2023-10-23 18:59:11 +03:00
  • a06e82bbfd Restrict bpe tokenizer tests to unicode planes Galunid 2023-10-23 17:52:35 +02:00
  • 4150e74d04 Eliminate tokenizes post, add option "special" to tokenize. Eliminate tab compression from modified files. Troy Beukema 2023-10-23 11:48:37 -04:00
  • 6336701c93 Fix baichuan convert script not detecting model (#3739) Galunid 2023-10-23 17:47:03 +02:00
  • 226f0a857d Update test vocab files Galunid 2023-10-23 17:40:23 +02:00
  • c04ddb6990 Add mpt Galunid 2023-10-23 16:57:50 +02:00
  • 217f82e734 Update special token handling Galunid 2023-10-23 16:53:34 +02:00
  • 769135b6be Add starcoder Galunid 2023-10-23 14:45:06 +02:00
  • 6919b8ce44 Add more tokenizer tests Galunid 2023-10-23 13:12:47 +02:00
  • f26df62651 Fix baichuan convert script not detecting model Galunid 2023-10-23 10:47:01 +02:00
  • 13e08d0efa Sync latest changes KerfuffleV2 2023-10-23 02:40:37 -06:00
  • 8a569cfee5 perplexity anti-mode improvements KerfuffleV2 2023-10-21 04:26:47 -06:00
  • d6b44fb3ae Force measure to allocate more memory for 70Bs KerfuffleV2 2023-10-19 21:14:23 -06:00
  • fae6d9c70d Fix pushing in wrong halflayer idx KerfuffleV2 2023-10-19 18:11:15 -06:00
  • 0abf0064ca What if we do something crazy like add layers instead of removing them? KerfuffleV2 2023-10-19 18:00:15 -06:00
  • d6f35c7ca5 Layer skipping demo KerfuffleV2 2023-10-09 18:54:16 -06:00
  • 89611cb05a Add fast tokenizer option to BpeVocab wonjun Jang 2023-10-23 04:15:43 +00:00
  • 5872e4f4da server support for system, prefix, and suffix prompts with special tokens Wile E. Coyote 2023-10-22 21:45:30 -04:00
  • d9c0332323 Update readme with stablelm support Galunid 2023-10-22 23:21:38 +02:00
  • 98b157809f Update README.md JJ 2023-10-22 14:18:24 -07:00
  • a92fd2d752 Add tests for stablelm tokenizer Galunid 2023-10-22 22:55:29 +02:00
  • 96981f37b1 make : add optional CUDA_NATIVE_ARCH (#2482) b1414 Alex 2023-10-22 15:56:53 -04:00
  • 438c2ca830 server : parallel decoding and multimodal (#3677) b1413 Georgi Gerganov 2023-10-22 22:53:08 +03:00
  • cf5eff36ae Merge branch 'master' into stablelm-support Galunid 2023-10-22 21:33:53 +02:00
  • c0f4d54870 server : add comment about changing slot_state to bool server-rev Georgi Gerganov 2023-10-22 22:24:39 +03:00
  • 9e70cc0322 Add test for MPT tokenization (#3728) b1412 goerch 2023-10-22 21:21:42 +02:00
  • e39905050f Fix added_tokens crashes Galunid 2023-10-22 21:21:31 +02:00
  • 83e1490187 server : fix slot reuse Georgi Gerganov 2023-10-22 21:57:23 +03:00
  • 74204ccbae Clarify logic in conversion goerch 2023-10-22 20:35:50 +02:00
  • 5a42a5f8e8 readme : remove unsupported node.js library (#3703) Ian Scrivener 2023-10-23 05:16:43 +11:00
  • a5e7dbd614 llama : validate special token ids are in range when loading GGUF model (#3635) b1410 Kerfuffle 2023-10-22 12:14:56 -06:00
  • d3956aea53 main : escape prompt for cfg_negative_prompt and consecutive inputs in main with interactive (#3623) b1409 vvhg1 2023-10-22 20:09:51 +02:00
  • 8fe7ca4875 server : apply fix from #3722 Georgi Gerganov 2023-10-22 21:05:45 +03:00
  • 237f1e7912 Merge branch 'master' of https://github.com/ggerganov/llama.cpp into ntkv2 cebtenzzre 2023-10-22 14:00:33 -04:00
  • d6b93147ae Fix issue blocking success case Mason M 2023-10-22 14:36:37 -03:00
  • ae36f009ef Add new gpt_params_parse_ex function to hide arg-parse impl Mason M 2023-10-22 14:25:25 -03:00
  • 00ae55b388 server : hide ctx_sampling->prev behind API (#3696) Georgi Gerganov 2023-10-22 20:09:01 +03:00
  • 3d6a687f1d Update readme to document multimodal in server M. Yusuf Sarıgöz 2023-10-22 20:03:35 +03:00
  • 1dc13168ff Remove unnecessary restriction in test case goerch 2023-10-22 18:59:06 +02:00
  • dd1af2ed35 server : minor style Georgi Gerganov 2023-10-22 19:52:38 +03:00
  • a4d69d8b81 Merge branch 'server-rev' of https://github.com//ggerganov/llama.cpp into server-rev M. Yusuf Sarıgöz 2023-10-22 19:49:48 +03:00
  • 2679c432d5 Update readme to document multimodal in server M. Yusuf Sarıgöz 2023-10-22 19:49:33 +03:00
  • 8ee736370a Revert code motion goerch 2023-10-22 18:36:35 +02:00
  • a8063171bd server : completion requests remember slot_id Georgi Gerganov 2023-10-22 19:34:48 +03:00
  • f305d6434f editorconfig : new line in index.html Georgi Gerganov 2023-10-22 19:10:30 +03:00
  • 5359fb9267 Do not save/load image_data to localStorage M. Yusuf Sarıgöz 2023-10-22 19:08:09 +03:00
  • 6a94ae6d49 Add test for MPT tokenization goerch 2023-10-22 17:18:28 +02:00
  • f67d971344 server : bug fix for prompt caching Georgi Gerganov 2023-10-22 17:52:59 +03:00
  • 3de5ba4946 Finish broadcasting mul mat support for GQA 0cc4m 2023-10-22 16:44:53 +02:00
  • 569ebf11cf server : refactor ctx_sampling init + n_ctx + names Georgi Gerganov 2023-10-22 16:57:05 +03:00
  • ef18f4d579 server : fix crash in Debug on macOS (I have no idea why this fixes it!?) Georgi Gerganov 2023-10-22 16:55:40 +03:00
  • 197a0a9e23 server : fix switch fallthrough Georgi Gerganov 2023-10-22 16:55:05 +03:00
  • 5f1f8a5a89 adjust Concedo 2023-10-22 21:53:54 +08:00
  • 715f384a6b clip : link to ggml, not to llama Georgi Gerganov 2023-10-22 16:52:12 +03:00
  • 4b4ab722ab make : silence stb warnings Georgi Gerganov 2023-10-22 16:51:59 +03:00
  • ccf8334651 remove script (+8 squashed commit) Concedo 2023-10-22 20:26:18 +08:00