Commit graph

  • f64dea0821 added implementation of DRY sampler l3utterfly 2024-04-25 15:55:34 +09:00
  • 0e51cc38cb Fix CORS for /health endpoint cosmo 2024-04-25 04:59:29 +00:00
  • fb80f13cd4 Update sgemm.cpp Eve 2024-04-25 04:03:29 +00:00
  • e97c0fdb35 Merge branch 'ggerganov:master' into sgemm-avx Eve 2024-04-25 03:21:19 +00:00
  • 063a31f7a8 sse load netrunnereve 2024-04-24 23:00:02 -04:00
  • c806db318d improve fp16 validation performance slaren 2024-04-25 02:10:11 +02:00
  • 2ef86e7213 Clamp out of range values in K quantizer Justine Tunney 2024-04-24 16:59:30 -07:00
  • 3c02508aad Merge branch 'grammar-reps' of https://github.com/ochafik/llama.cpp into grammar-reps Olivier Chafik 2024-04-25 00:45:13 +01:00
  • eb7ccd88d1 json: fix integral-part Olivier Chafik 2024-04-25 00:44:54 +01:00
  • 46fe6483ab Update examples/server/public/json-schema-to-grammar.mjs Olivier Chafik 2024-04-25 00:40:02 +01:00
  • 28138351f4 Apply suggestions from code review Olivier Chafik 2024-04-25 00:38:27 +01:00
  • 218f41fa43 json: update numeric rule to be unambiguous Olivier Chafik 2024-04-24 23:54:35 +01:00
  • 98d9bea0fc llama : check that all the tensor data is in the model file slaren 2024-04-24 22:20:31 +02:00
  • 6aea16eb5a add basic tensor data validation function slaren 2024-04-24 21:48:16 +02:00
  • f9b42b8cd8 Added new options and some fixes mann1x 2024-04-24 21:50:01 +02:00
  • 784e11dea1 README: add graphic for matrix multiplication (#6881) Johannes Gäßler 2024-04-24 21:29:13 +02:00
  • 2d5341d196 README: add graphic for matrix multiplication Johannes Gäßler 2024-04-24 19:49:56 +02:00
  • d69cf87fce use or, instead of and. bug fix? Julia Longtin 2024-04-24 17:50:12 +00:00
  • 476d319fde correct buffer size ngxson 2024-04-24 19:41:51 +02:00
  • 8cae9a9ef6 comment and spacing fixes. Julia Longtin 2024-04-24 17:38:42 +00:00
  • 7f89803536 add enum keyword ngxson 2024-04-24 18:38:20 +02:00
  • 0d3363e4e6 llama_chat_get_typed_template ngxson 2024-04-24 18:27:39 +02:00
  • 81b5903890 adapt phi3 template ngxson 2024-04-24 18:18:12 +02:00
  • ada54292c6 Merge branch 'master' into xsn/chat_template_prefix_postfix ngxson 2024-04-24 18:12:13 +02:00
  • 3222b4b8e5 add guide for adding template ngxson 2024-04-24 18:06:41 +02:00
  • ce281b904c llama : disable FA for AMD Georgi Gerganov 2024-04-24 16:48:10 +03:00
  • c3f4b1f2d2 feat: rename Jina Bert to Jina Bert V2 Joan Martinez 2024-04-24 15:46:18 +02:00
  • f1a93548aa clean up ngxson 2024-04-24 15:31:46 +02:00
  • 408759687f further addressed comments Alan Gray 2024-04-24 06:31:08 -07:00
  • 05efa34d92 grammars: keep llama_grammar_copy non-quadratic optim for later Olivier Chafik 2024-04-24 14:28:16 +01:00
  • 0c74ad3cf1 grammar: nit numbering in comment Olivier Chafik 2024-04-24 14:24:36 +01:00
  • 21bac1e453 grammar: nit typo switched error msgs Olivier Chafik 2024-04-24 14:22:20 +01:00
  • 588b72d950 fix templates not support system message ngxson 2024-04-24 15:20:21 +02:00
  • d03c98ed9a grammars: ensure unambiguous number alternatives Olivier Chafik 2024-04-24 14:16:22 +01:00
  • d403b180a6 Addressed comments Alan Gray 2024-04-24 05:43:26 -07:00
  • a61281fef5 grammars: comment on rule repetitions Olivier Chafik 2024-04-24 14:11:42 +01:00
  • b4e4b8a935 llama : add llama_get_pooling_type function (#6862) b2724 Douglas Hanley 2024-04-24 08:10:07 -05:00
  • e87ec07937 fix argument name, move with ctx funcs Douglas Hanley 2024-04-24 08:02:49 -05:00
  • 724f879fa2 Update examples/server/public/json-schema-to-grammar.mjs Olivier Chafik 2024-04-24 12:05:12 +01:00
  • d47f537484 Update examples/json_schema_to_grammar.py Olivier Chafik 2024-04-24 12:05:04 +01:00
  • 8937ec5307 Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-04-24 14:00:32 +03:00
  • 3fe847b574 server : do not apply Markdown formatting in code sections (#6850) mgroeber9110 2024-04-24 12:54:24 +02:00
  • 37246b1031 common : revert showing control tokens by default for server (#6860) Kyle Mistele 2024-04-24 05:15:29 -05:00
  • 411137a608 common : simplify Georgi Gerganov 2024-04-24 13:14:14 +03:00
  • c3d4ead136 added missing CUDA_CHECKs Alan Gray 2024-04-24 02:37:57 -07:00
  • 6e09a26504 allow wildcards for tensor names Julia Bruckner 2024-04-24 11:17:22 +02:00
  • 28103f4832 Server: fix seed for multiple slots (#6835) Johannes Gäßler 2024-04-24 11:08:36 +02:00
  • c0d1b3e03e ggml : move 32-bit arm compat in ggml-impl.h (#6865) Georgi Gerganov 2024-04-24 12:00:07 +03:00
  • abd3314064 llama : add phi 3 chat template (#6857) Tristan Druyen 2024-04-24 10:52:37 +02:00
  • 992fff725f test : fix chat template result Georgi Gerganov 2024-04-24 11:50:35 +03:00
  • b0c3013f2e ggml: add Qualcomm QNN (Qualcomm Neural Network, aka Qualcomm AI Engine Direct) backend zhou.weiguo 2024-04-24 16:28:18 +08:00
  • dfa067631c feat: example comments in embedding Joan Martinez 2024-04-24 10:14:02 +02:00
  • dd060a2a4e feat: handle gpt2 tokenizer with Jina architecture Joan Martinez 2024-04-24 10:05:34 +02:00
  • b54f7bbaef ggml : move 32-bit arm compat in ggml-impl.h Georgi Gerganov 2024-04-24 09:22:35 +03:00
  • f2588b0b70 convert : fix set_vocab_sentencepiece Georgi Gerganov 2024-04-24 10:19:38 +03:00
  • 3fec68be4e convert : add support of codeqwen due to tokenizer (#6707) Junyang Lin 2024-04-24 15:16:21 +08:00
  • 8aa536a367 convert : fix whitespace Georgi Gerganov 2024-04-24 10:15:44 +03:00
  • c8297c6af5 llama : add phi3 support (#6852) b2717 liuwei-git 2024-04-24 15:00:37 +08:00
  • 725cf63646 convert : fix lint checks Georgi Gerganov 2024-04-24 09:45:35 +03:00
  • ae133e7fa6 llama : tabs -> spaces Georgi Gerganov 2024-04-24 09:43:41 +03:00
  • 32661ac8b4 llama : minor / style Georgi Gerganov 2024-04-24 09:39:22 +03:00
  • 1bf93ced81 llama : match EOT token <|end|> Georgi Gerganov 2024-04-24 09:39:04 +03:00
  • cef12f9e45 convert : add BOS token Georgi Gerganov 2024-04-24 09:38:40 +03:00
  • 25f378375e Update README.md ManakRaj-7 2024-04-24 11:45:12 +05:30
  • dee9566dc7 reduce 256 to 128 (and back!) conversions netrunnereve 2024-04-24 00:22:38 -04:00
  • 5ae41e9bdf add llama_get_pooling_type function Douglas Hanley 2024-04-23 11:22:46 -05:00
  • 6c081e501c feat: use the overridden declaration of llama_token_to_piece from common/common.cpp to specify "false" so that control tokens are not shown in chat completion responses Kyle Mistele 2024-04-23 22:51:01 -05:00
  • 206c974eb6 feat: revert changes to default behavior of llama_token_to_piece; provide overridden declaration to receive "bool special" param to toggle showing control tokens Kyle Mistele 2024-04-23 22:50:22 -05:00
  • 9facb0f07a combine denibble with load netrunnereve 2024-04-23 23:46:49 -04:00
  • 572960a045 fix: revert showing control tokens by default Kyle Mistele 2024-04-23 22:25:46 -05:00
  • c910886b23 Add phi 3 chat template & tests tristandruyen 2024-04-24 01:23:27 +02:00
  • 92af43eec2 Merge branch 'master' of https://github.com/ggerganov/llama.cpp Adrian Liechti 2024-04-24 00:38:27 +02:00
  • 5dcccb3a7d convert : fix tokenizer conversion gg/add-phi-3-support Georgi Gerganov 2024-04-23 22:11:09 +03:00
  • 171a73890e remove unused code Wei Liu 2024-04-24 01:55:33 +08:00
  • 725afbcf52 Merge branch 'master' of https://github.com/liuwei-git/llama.cpp Wei Liu 2024-04-24 01:48:23 +08:00
  • 9ff95625e9 add explicit phi3 support Wei Liu 2024-04-24 01:45:14 +08:00
  • 1732737232 convert : add phi-3 support Georgi Gerganov 2024-04-23 20:38:51 +03:00
  • e693add0b6 add explicit phi3 support Wei Liu 2024-04-24 00:55:30 +08:00
  • 751591d520 server : add help for --flash-attn arg Georgi Gerganov 2024-04-23 18:16:25 +03:00
  • 0485579078 Added themes support with two sample themes and a favicon. John Boero 2024-04-23 16:13:09 +01:00
  • d228bf8552 cont Georgi Gerganov 2024-04-23 17:32:11 +03:00
  • 56657e52e5 llama : fix n_batch requirements Georgi Gerganov 2024-04-23 17:30:37 +03:00
  • 19e8982f51 llama : prep ALiBi support for BERT models Georgi Gerganov 2024-04-23 17:24:28 +03:00
  • 78d363b0d4 llama : replace bool need_kq_pos with use_alibi Georgi Gerganov 2024-04-23 17:15:13 +03:00
  • cde49b7448 feat: support q_normalization and k_normalization in Jina arch Joan Martinez 2024-04-23 16:10:38 +02:00
  • df4719ec7e Disable CUDA graphs for old GPU arch and with env var Alan Gray 2024-04-23 06:27:08 -07:00
  • e0a3679aeb Fix preci failures z5269887 2024-04-23 20:50:00 +08:00
  • 2b7cff5f5a fixup! fixup! sampling: separate rng per sampling context Johannes Gäßler 2024-04-23 13:55:05 +02:00
  • 760db9ee35 fixup! sampling: separate rng per sampling context Johannes Gäßler 2024-04-23 13:53:15 +02:00
  • 054e73e021 fix spaces Julia Bruckner 2024-04-23 13:39:16 +02:00
  • dbe6483e7e custom quantization schemas Julia Bruckner 2024-04-23 13:35:03 +02:00
  • 31e2f5668c custom quantization schemas Julia Bruckner 2024-04-23 13:33:05 +02:00
  • 123eaf054f sampling: separate rng per sampling context Johannes Gäßler 2024-04-22 23:49:49 +02:00
  • 1a07f60451 Cleanup tweaks and DSC class. The file copy raid functionality is not protected by a named ifdef Markus Tavenrath 2024-04-23 12:00:10 +02:00
  • d7d6a4ed46 fix: JinaBertForMaskedLM registration Joan Martinez 2024-04-23 09:48:40 +02:00
  • b11224c5e1 add missing DirectStorageCUDA files Markus Tavenrath 2024-04-23 09:37:28 +02:00
  • 3864eea4cb ggml : add TODO's for F16/F32 mask/pos support in other backends Georgi Gerganov 2024-04-23 10:01:49 +03:00
  • c129369702 cuda : try to fix __hgt2_mask Georgi Gerganov 2024-04-22 21:42:43 +03:00
  • 94cf99ca35 Server: do not apply Markdown formatting in code sections mgroeber9110 2024-04-23 06:36:23 +02:00
  • 257391aae3 style netrunnereve 2024-04-22 23:48:07 -04:00