Commit graph

  • 0cabcbe588 fixed server 200 null response when context is exceeded VJHack 2024-09-20 14:54:08 -05:00
  • 63351143b2 quantize : improve type name parsing (#9570) b3796 slaren 2024-09-20 20:55:36 +02:00
  • 31e18fc2f8 quantize : do not ignore invalid types in arg parsing slaren 2024-09-20 20:15:26 +02:00
  • d13edb17ed ggml : fix builds (#0) b3795 Georgi Gerganov 2024-09-20 20:12:52 +03:00
  • 27609c49b9 ggml : fix trailing whitespace (#0) Georgi Gerganov 2024-09-20 19:13:02 +03:00
  • 4301535326 sync : ggml Georgi Gerganov 2024-09-20 19:06:59 +03:00
  • 424c5d00a9 ggml/examples: add backend support for numerical optimization (ggml/949) Johannes Gäßler 2024-09-20 19:04:44 +03:00
  • a6809c6a2e examples : add null threadpool args where needed (ggml/0) Georgi Gerganov 2024-09-08 11:10:43 +03:00
  • a39be1a94d ggml : fix builds (#0) Georgi Gerganov 2024-09-20 20:12:52 +03:00
  • 5cb12f6839 CUDA: fix sum.cu compilation for CUDA < 11.7 (#9562) b3790 Johannes Gäßler 2024-09-20 18:35:35 +02:00
  • ebc359c1a3 ggml : fix trailing whitespace (#0) Georgi Gerganov 2024-09-20 19:13:02 +03:00
  • bddc6c6acb sync : ggml Georgi Gerganov 2024-09-20 19:06:59 +03:00
  • fed94ad1c5 ggml/examples: add backend support for numerical optimization (ggml/949) Johannes Gäßler 2024-09-20 19:04:44 +03:00
  • bb51df5f81 examples : add null threadpool args where needed (ggml/0) Georgi Gerganov 2024-09-08 11:10:43 +03:00
  • f9c2155158 squash! baby-llama : rename llama_layer to baby_llama_layer Daniel Bevenius 2024-09-20 15:45:22 +02:00
  • 2f2e4b35a6 Fixed error message to say 'enable context shift' VJHack 2024-09-20 08:48:59 -05:00
  • c93300f02f squash! baby-llama : rename llama_layer to baby_llama_layer Daniel Bevenius 2024-09-20 15:05:00 +02:00
  • 25d4599e19 remove unused files zhenweijin 2024-09-20 19:06:39 +08:00
  • 8be5d11c0d Merge branch 'ggerganov-gg/tokenizer-cleanup' into refactor-tokenizer zhenweijin 2024-09-20 18:48:58 +08:00
  • 02629d98f1 llama : make llm_tokenizer more private Georgi Gerganov 2024-09-20 11:41:51 +03:00
  • 2ec25dbf27 refactor tokenizer zhenweijin 2024-09-11 09:42:55 +08:00
  • d653d25116 Merge branch 'gg/tokenizer-cleanup' of https://github.com/ggerganov/llama.cpp into ggerganov-gg/tokenizer-cleanup zhenweijin 2024-09-20 18:38:13 +08:00
  • 403758f93e refactor tokenizer zhenweijin 2024-09-11 09:42:55 +08:00
  • d39e26741f examples : flush log upon ctrl+c (#9559) b3789 Georgi Gerganov 2024-09-20 11:46:56 +03:00
  • 6e873e561a llama : make llm_tokenizer more private gg/tokenizer-cleanup Georgi Gerganov 2024-09-20 11:41:51 +03:00
  • b51daccd6b clear before resize Alan Gray 2024-09-20 01:05:12 -07:00
  • 7f7e684c5e CUDA: fix sum.cu compilation for CUDA < 11.7 Johannes Gäßler 2024-09-20 09:57:01 +02:00
  • e9d8ebaa2c examples : flush log upon ctrl+c Georgi Gerganov 2024-09-20 10:06:38 +03:00
  • d949c5844d refactor tokenizer zhenweijin 2024-09-11 09:42:55 +08:00
  • 722ec1eb51 perplexity : do not escape input data by default (#9548) b3788 Sigbjørn Skjæret 2024-09-20 08:38:10 +02:00
  • f557ccfd2c update oneapi to 2024.2 arthw 2024-09-20 10:47:38 +08:00
  • 0940460774 baby-llama : rename llama_layer to baby_llama_layer Daniel Bevenius 2024-09-20 04:41:29 +02:00
  • 26aac8e289 Soften the token embeddings bump for experts >= 4 Nexesenex 2024-08-25 14:42:33 +02:00
  • 5644d4ca01 Merge branch 'master' into pr/8836 Nexesenex 2024-09-20 01:38:20 +02:00
  • c6b3ea6595 Avoid using saved CUDA graph if scale changes and reset nodes/params on update Alan Gray 2024-09-17 08:47:06 -07:00
  • b276c09b6f Merge 6f9d1275a0 into 6026da52d6 Bruno Pio 2024-09-19 10:45:01 +01:00
  • 6026da52d6 server : clean-up completed tasks from waiting list (#9531) b3787 Georgi Gerganov 2024-09-19 12:44:53 +03:00
  • da00027c4b Perplexity input data should not be unescaped Sigbjørn Skjæret 2024-09-19 10:50:19 +02:00
  • eca0fab44e imatrix : disable prompt escape by default (#9543) b3786 Sigbjørn Skjæret 2024-09-19 09:58:14 +02:00
  • 5e1a23adb0 fix function params Jia Liu 2024-09-19 15:46:17 +08:00
  • 216e7d9648 fix llama_reset_model_time Jia Liu 2024-09-19 11:30:47 +08:00
  • 24bea1549b add llama_model_reset_time API Jia Liu 2024-09-19 11:06:47 +08:00
  • 279308c74a changed llama.cpp (build_phi3 to load bias for lm.head); fixed dumb eos token issues Yutong Dai 2024-09-19 01:13:25 +00:00
  • 568886416d allow disable context shift for server VJHack 2024-09-18 19:34:05 -05:00
  • a537aaa87b Imatrix input data should not be unescaped Sigbjørn Skjæret 2024-09-19 01:26:30 +02:00
  • 6f9d1275a0 Update convert_hf_to_gguf.py Bruno Pio 2024-09-18 20:00:25 -03:00
  • c42ec2f8bb add solar pro support Michael Yang 2024-09-16 15:53:16 -07:00
  • 64c6af3195 ggml : fix n_threads_cur initialization with one thread (#9538) b3785 slaren 2024-09-18 19:13:08 +02:00
  • 6b0248c29a Update ggml/src/ggml.c sl/fix-omp-one-thread Max Krasnyansky 2024-09-18 09:00:26 -07:00
  • 0d2f22e45c scripts : verify py deps at the start of compare (#9520) Georgi Gerganov 2024-09-18 18:34:32 +03:00
  • 0e601cafe9 Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2024-09-18 09:13:46 -04:00
  • f9196c9174 ggml : fix n_threads_cur initialization with one thread slaren 2024-09-18 14:58:49 +02:00
  • 6443ddd985 llama : use reserve/emplace_back in sampler_sample (#9534) b3783 Daniel Bevenius 2024-09-18 13:42:36 +02:00
  • e08b907760 Update clip.cpp Tejaakshaykumar 2024-09-18 15:57:22 +05:30
  • 87c5161b0c llama : use reserve/emplace_back in sampler_sample Daniel Bevenius 2024-09-18 11:42:07 +02:00
  • a829583c97 AVX512 version of ggml_gemm_q4_0_8x8_q8_0 Srihari-mcw 2024-09-18 00:55:41 -07:00
  • e01cdda168 server : clean-up completed tasks from waiting list Georgi Gerganov 2024-09-18 10:20:41 +03:00
  • 8a308354f6 server : match OAI structured output response (#9527) b3782 Vinesh Janarthanan 2024-09-18 01:50:34 -05:00
  • f799155ab8 server : fix OpenSSL build (remove obsolete LOG_INFO) (#9529) b3781 Eric Zhang 2024-09-18 14:28:20 +08:00
  • 1a88cff6ac server : fix openssl build by removing invalid LOG_INFO references EZForever 2024-09-18 10:52:13 +08:00
  • 556d4b6292 cleaned up pr VJHack 2024-09-17 21:13:46 -05:00
  • faf67b3de4 [SYCL]set context default value to avoid memory issue, update guide (#9476) Neo Zhang Jianyu 2024-09-18 08:30:31 +08:00
  • c713223367 Updating warning message to explicitly include reference to LLAMA_FFMPEG=1 to help users know exactly how to recompile with video support. Suggestion by @Galunid. Clint Herron 2024-09-17 17:04:25 -04:00
  • 7be099fa81 llama-bench: correct argument parsing error message (#9524) b3779 Michael Podvitskiy 2024-09-17 22:41:38 +02:00
  • 7e7f8b91d6 llama-bench: correct argument parsing error message Michael Podvitskiy 2024-09-17 20:30:01 +02:00
  • 8b836ae731 arg : add env variable for parallel (#9513) b3778 Bert Wagner 2024-09-17 09:35:38 -04:00
  • cbef812dcd Update README.md with env: LLAMA_ARG_N_PARALLEL Bert Wagner 2024-09-17 06:50:54 -04:00
  • 8344ef58f8 llama : fix n_vocab init for 'no_vocab' case (#9511) b3777 Michael Podvitskiy 2024-09-17 12:18:22 +02:00
  • cba0340871 Refactored error handling for hyperparameter validation in clip.cpp Tejaakshaykumar 2024-09-17 15:46:59 +05:30
  • 28d1c4566a Update examples/llava/clip.cpp Tejaakshaykumar 2024-09-17 15:13:48 +05:30
  • 93ef595b4b llama: correct vocab size for logging Michael Podvitskiy 2024-09-17 11:23:52 +02:00
  • a6a8f8d09c Update docs/backend/SYCL.md fix_ctx_default Neo Zhang Jianyu 2024-09-17 16:25:43 +08:00
  • 0226613853 threadpool : skip polling for unused threads (#9461) Max Krasnyansky 2024-09-17 01:19:46 -07:00
  • cbfa2fcbdc scripts : verify py deps at the start of compare Georgi Gerganov 2024-09-17 11:05:10 +03:00
  • 503147a9f9 unicode : add <algorithm> (#9508) b3775 Yuri Khrustalev 2024-09-17 02:51:15 -04:00
  • 0d2ec43833 llama : support IBM Granite architecture (#9412) b3774 Gabe Goodhart 2024-09-17 00:44:58 -06:00
  • 37f3a3810e llama : add llama_n_head() (#9512) Michael Podvitskiy 2024-09-17 08:23:30 +02:00
  • 0d083672dc Moving ffmpeg dependency to live behind a compiler flag that can be enabled optionally by setting LLAMA_FFMPEG=1 in call to make. Clint Herron 2024-09-17 01:18:41 -04:00
  • eb550592e4 test-barrier: release threadpool before releasing the context Max Krasnyansky 2024-09-16 16:41:41 -07:00
  • a8095187d8 threadpool: improve abort handling Max Krasnyansky 2024-09-16 15:25:20 -07:00
  • b9763b3301 threadpool: improve thread sync for new-graphs Max Krasnyansky 2024-09-16 14:35:09 -07:00
  • e83d2707d3 convert : adapt MiniCPM3 to separate rope_freqs insertion Francis Couture-Harpin 2024-09-16 12:05:29 -04:00
  • a25f838e53 add env variable for parallel Bert Wagner 2024-09-16 15:00:49 -04:00
  • c4411d5b5f threads: add simple barrier test Max Krasnyansky 2024-09-15 11:58:21 -07:00
  • ed094a5211 threadpool: further simplify and improve ggml_barrier Max Krasnyansky 2024-09-13 11:21:59 -07:00
  • 2bd9f47800 threadpool: skip polling for unused threads Max Krasnyansky 2024-09-12 21:28:45 -07:00
  • 9704f0e928 llama: log warning if there's no vocab_size in metadata Michael Podvitskiy 2024-09-16 19:30:07 +02:00
  • 546524de6c llama: the method for obtaining information about n_head is included in the public header Michael Podvitskiy 2024-09-16 19:16:49 +02:00
  • 30b751ef06 update to get some results; need to check vit and llm Yutong Dai 2024-09-16 17:11:12 +00:00
  • 544b26640d llama: updated error output for llama_decode_internal and llama_encode_internal Michael Podvitskiy 2024-09-16 18:35:57 +02:00
  • a5e87bf438 llama: fixed n_vocab for no_vocab models Michael Podvitskiy 2024-09-16 18:30:28 +02:00
  • ed0f2c4ab1 Merge branch 'master' into compilade/convert-separate-extra-tensors Francis Couture-Harpin 2024-09-16 12:01:12 -04:00
  • 435b5f9176 Merge cc1c017191 into 23e0d70bac Georgi Gerganov 2024-09-16 16:51:17 +01:00
  • 5d054a42f9 fix(llama.cpp): Use separate switch clause for granite in llm_load_hparams Gabe Goodhart 2024-09-16 09:15:15 -06:00
  • 65c5bb91ab fix(convert_hf_to_gguf/gguf-py): _multiplier -> _scale Gabe Goodhart 2024-09-16 08:56:56 -06:00
  • 0bdf04e7b5 fix(llama.cpp): Switch Granite param names to use _scale for consistency Gabe Goodhart 2024-09-16 08:55:58 -06:00
  • 23e0d70bac ggml : move common CPU backend impl to new header (#9509) b3772 slaren 2024-09-16 16:22:07 +02:00
  • 80863806a3 fix(convert_hf_to_gguf): Use LlamaModel as base for GraniteModel Gabe Goodhart 2024-09-10 09:36:44 -06:00
  • e73d795eff fix(llama.cpp): Determine granite language 3b instruct by vocab size Gabe Goodhart 2024-09-09 09:03:09 -06:00
  • ec13f29b73 feat(llama.cpp): First pass at full port of granite deviations from llama Gabe Goodhart 2024-09-05 16:43:01 -06:00
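
A roughly comparable multi-branch listing can be reproduced locally with git log; this is a sketch, and the extra columns above (graph bullets, build tags such as b3796, branch labels) come from the hosting UI rather than from any single git command:

    # show abbreviated hash, subject, author, and ISO date for all branches, newest first;
    # add --graph to draw the ancestry lines
    git log --all --date-order --date=iso --format='%h %s %an %ad'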