Commit graph

  • fa30cc8611 Merge branch 'master' into fix_mul_mat Neo Zhang Jianyu 2024-03-05 14:11:12 +08:00
  • 96b9179a22 rebase and rm tailing space Jianyu Zhang 2024-03-05 14:03:23 +08:00
  • 49a8477205 fix speculative decoding build on windows (#5874) Jeffrey Quesnelle 2024-03-04 19:23:06 -08:00
  • 6aac3d4267 nix: static build (#5814) hutli 2024-03-05 02:33:08 +01:00
  • 2e4e9c001d llama : fix embeddings (#5796) Georgi Gerganov 2024-03-04 22:31:20 +02:00
  • dabfd53d7e flake : fix Georgi Gerganov 2024-03-04 21:50:50 +02:00
  • 86e4a3bdfe ggml : fix unknown status (#0) Georgi Gerganov 2024-03-04 20:53:27 +02:00
  • 3a44f13b72 sync : ggml Georgi Gerganov 2024-03-04 11:06:39 +02:00
  • d87093e9f7 ggml : introduce ggml_status (ggml/750) Michael Podvitskiy 2024-03-04 10:05:42 +01:00
  • 465e411f2e cmake : handle cases where git index is not found in .git (#5844) Dane Madsen 2024-03-05 05:26:55 +11:00
  • e245d6c866 speculative : implement stochastic speculative sampling (#5625) Minsoo Cheong 2024-03-05 03:24:00 +09:00
  • 3ae5525a6c add alias for chat template (#5858) Xuan Son Nguyen 2024-03-04 12:22:08 +01:00
  • b15e7533a1 sync : ggml Georgi Gerganov 2024-03-04 10:40:04 +02:00
  • 9e4d115da4 add some new ops, fix some operators and add batch operations to certain operators. (ggml/747) leejet 2024-03-03 20:23:52 +08:00
  • edabfadc29 common : use LLAMA_DEFAULT_SEED (#5855) DAN™ 2024-03-04 03:08:19 -05:00
  • f3a6dd6cde main : support special tokens as reverse/anti prompt (#5847) DAN™ 2024-03-04 02:57:20 -05:00
  • 22dd02a6b8 cuda : fix data race in soft max (#5853) slaren 2024-03-03 14:26:18 +01:00
  • fd4a186d04 readme : add API changes section Georgi Gerganov 2024-03-03 12:44:03 +02:00
  • e55ee8a221 llama : allow for user specified embedding pooling type (#5849) Douglas Hanley 2024-03-03 04:40:27 -06:00
  • 8bb872ded5 gguf-dump : support i-quants (#5841) Nindaleth 2024-03-03 09:43:42 +01:00
  • 524864d39c llama : fix llama_copy_state_data with fragmented KV cache (#5840) compilade 2024-03-03 03:41:55 -05:00
  • 23a6275f17 ci : schedule slow server tests only on Release or on demand (#5839) Pierrick Hymbert 2024-03-03 09:35:23 +01:00
  • f72df318f1 server : init http requests thread pool with --parallel if set (#5836) Pierrick Hymbert 2024-03-03 08:48:36 +01:00
  • 756a4ac7d5 flake.lock: Update (#5842) Georgi Gerganov 2024-03-03 06:11:31 +02:00
  • 8479e7d43d server: tests: passkey challenge / self-extend with context shift demo (#5832) Pierrick Hymbert 2024-03-02 22:00:14 +01:00
  • 506177de82 llama : add abort_callback to interrupt computation (#5409) Michael Podvitskiy 2024-03-02 20:52:25 +01:00
  • 1a5ed7a290 ggml : fix IQ3_S AVX implementation (#5834) Georgi Gerganov 2024-03-02 20:00:49 +02:00
  • 0867b91ab3 convert : automatically fall back to HfVocab if tokenizer.model doesn't exist (#5821) Jared Van Bortel 2024-03-02 12:27:26 -05:00
  • 9285e71465 convert-hf : make model class definitions self-contained (#5825) Jared Van Bortel 2024-03-02 12:21:47 -05:00
  • d0c9a891ea ggml : IQ3_S improvements (#5829) Kawrakow 2024-03-02 17:00:51 +02:00
  • dcf09d3c2c scripts : add pod-llama.sh Georgi Gerganov 2024-03-02 16:54:08 +02:00
  • 9758243a16 llama : refactor internal quantization functions (#5830) Xuan Son Nguyen 2024-03-02 15:19:09 +01:00
  • 8899bdb685 llama : fix segfault from unknown model arch name (#5820) compilade 2024-03-02 08:42:56 -05:00
  • 7d455b7b16 fix editorconfig check break Minsoo Cheong 2024-03-05 14:51:44 +09:00
  • eaff11ca55 readme: add convert-hf-to-gguf.py in example hiepxanh 2024-03-04 21:12:50 -08:00
  • 4e7c26c32c json: handle pattern repetitions ochafik 2024-03-05 03:40:23 +00:00
  • d5ef412f31 json: merge lit sequences and handle negatives ochafik 2024-03-05 03:26:35 +00:00
  • a78eb4a0c3 json: fix _format_literal (json.dumps already escapes quotes) ochafik 2024-03-05 03:25:43 +00:00
  • 32bf3df076 fix format issue Jianyu Zhang 2024-03-05 11:25:34 +08:00
  • 29eee40474 fix speculative decoding build on windows (#5874) b2343 Jeffrey Quesnelle 2024-03-04 19:23:06 -08:00
  • 89524c2fd2 restore ci/run.sh, rename struct defination, fix bug in ggml_sycl_op_mul_mat_sycl Jianyu Zhang 2024-03-05 11:22:57 +08:00
  • 1d41d6f7c2 nix: static build (#5814) hutli 2024-03-05 02:33:08 +01:00
  • e24ca9cb02 convert-hf : fix flake8 warnings in CI Jared Van Bortel 2024-03-04 17:37:42 -05:00
  • fbeb5a7483 fix speculative decoding build on windows Jeffrey Quesnelle 2024-03-04 16:32:18 -05:00
  • 29ae62d2ae llama : fix embeddings (#5796) Georgi Gerganov 2024-03-04 22:31:20 +02:00
  • e0843afe1b flake : fix Georgi Gerganov 2024-03-04 21:50:50 +02:00
  • a1c6d96ed8 ggml : fix unknown status (#0) Georgi Gerganov 2024-03-04 20:53:27 +02:00
  • efd8533ef8 sync : ggml Georgi Gerganov 2024-03-04 11:06:39 +02:00
  • 9fa2627347 ggml : introduce ggml_status (ggml/750) Michael Podvitskiy 2024-03-04 10:05:42 +01:00
  • 78fb94387c ggml : fix unknown status (#0) Georgi Gerganov 2024-03-04 20:53:27 +02:00
  • 58c7f6167c ggml : fix F16 store (ARM NEON) Georgi Gerganov 2024-03-04 20:44:57 +02:00
  • e307882c34 Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-03-04 20:42:48 +02:00
  • fe52be11e3 cmake : handle cases where git index is not found in .git (#5844) Dane Madsen 2024-03-05 05:26:55 +11:00
  • 6d341ab6c5 speculative : implement stochastic speculative sampling (#5625) Minsoo Cheong 2024-03-05 03:24:00 +09:00
  • 7cafaa4724 readme : update API changes list Georgi Gerganov 2024-03-04 20:11:49 +02:00
  • a6a263b919 iq3_s_mult_shuffle: works on ARM_NEON and Metal Iwan Kawrakow 2024-03-04 20:10:36 +02:00
  • d221d91598 Merge branch 'ggerganov:master' into server_branch pudepiedj 2024-03-04 18:03:54 +00:00
  • 1af2d06139 llama : assert input batch with pooling enabled Georgi Gerganov 2024-03-04 19:56:40 +02:00
  • b587482287 iq3_s_mult_shuffle: mult + shuffle based codebook Iwan Kawrakow 2024-03-04 19:43:22 +02:00
  • c23c554744 llama : simplify causal mask condition Georgi Gerganov 2024-03-04 19:39:40 +02:00
  • fc9af156ff llama : assert pooling tensor Georgi Gerganov 2024-03-04 19:24:03 +02:00
  • 79e4eede23 llama : distinguish token vs sequence embeddings Georgi Gerganov 2024-03-04 19:14:22 +02:00
  • 4aa00f48d6 nix: update comment on effectiveStdenv Philip Taron 2024-03-04 07:29:03 -08:00
  • 2a99d1b243 ggml : implicitly pass src tensors through dst for Mamba-related ops Francis Couture-Harpin 2024-03-04 10:10:50 -05:00
  • 4ec0e9abbf wip gg/fix-embeddings-wip Georgi Gerganov 2024-03-04 17:07:12 +02:00
  • 7a2ca8bde8 removed dupulicate declaration of effectiveStdenv hutli 2024-03-04 14:02:58 +01:00
  • e66da356a4 llama : add pooling switch Georgi Gerganov 2024-03-04 14:06:33 +02:00
  • 9bbeb0f110 embeddings : fix llama_batch_init arg Georgi Gerganov 2024-03-04 14:06:00 +02:00
  • bc51e28cf4 using correct syntax for effectiveStdenv default hutli 2024-03-04 12:53:18 +01:00
  • d12f2399ac using enableStatic to determine glibc dependency hutli 2024-03-04 12:45:03 +01:00
  • eb42596277 llama : do not use KV cache for non-causal models Georgi Gerganov 2024-03-04 13:31:03 +02:00
  • 949edeb343 Silence unused variable warning pudepiedj 2024-03-04 11:28:03 +00:00
  • 4ffcdce2ff add alias for chat template (#5858) b2334 Xuan Son Nguyen 2024-03-04 12:22:08 +01:00
  • d0347840c1 llama : fix embeddings Georgi Gerganov 2024-02-29 15:39:10 +02:00
  • 0cba73b0cf sync : ggml Georgi Gerganov 2024-03-04 11:06:39 +02:00
  • 2b838cbd33 ggml : introduce ggml_status (ggml/750) Michael Podvitskiy 2024-03-04 10:05:42 +01:00
  • a0fc62661f sync : ggml b2333 Georgi Gerganov 2024-03-04 10:40:04 +02:00
  • 7d43c585dc add some new ops, fix some operators and add batch operations to certain operators. (ggml/747) leejet 2024-03-03 20:23:52 +08:00
  • 51e184fd87 Merge branch 'ggerganov:master' into server_branch pudepiedj 2024-03-04 08:19:12 +00:00
  • 82f3e668ad common : use LLAMA_DEFAULT_SEED (#5855) b2331 DAN™ 2024-03-04 03:08:19 -05:00
  • f2f002d9af Correct whitespace/nl editor config pudepiedj 2024-03-04 08:02:28 +00:00
  • 5a51cc1bb4 main : support special tokens as reverse/anti prompt (#5847) b2330 DAN™ 2024-03-04 02:57:20 -05:00
  • 98dae326b1 main : minor Georgi Gerganov 2024-03-04 09:55:39 +02:00
  • 4089657815 Remove extraneous files pudepiedj 2024-03-04 07:54:00 +00:00
  • d532d5b1f7 Remove rtf files pudepiedj 2024-03-04 07:44:37 +00:00
  • eb3da36e89 Delete rb and vca modules pudepiedj 2024-03-04 07:43:25 +00:00
  • f44e9456a2 server update pudepiedj 2024-03-04 07:18:18 +00:00
  • b48bf8b411 iq3_s_mult: scalar dot product Iwan Kawrakow 2024-03-03 11:55:31 +02:00
  • 056bdb3029 add PR link to README Minsoo Cheong 2024-03-04 15:07:40 +09:00
  • 67ad517e11 remove malloc code by utilizing vectors Minsoo Cheong 2024-03-04 14:55:35 +09:00
  • 0dce40a725 add wait() for memcpy LiangtaoJin 2024-03-04 07:58:26 +05:30
  • ddc124946f rm unused function Jianyu Zhang 2024-03-04 08:46:55 +08:00
  • dccfcfc901 Update CMakeLists.txt Dane Madsen 2024-03-04 09:48:25 +10:00
  • f4ca2301bb Allow setting multiple CORS enabled origins. StrangebytesDev 2024-03-03 15:29:32 -08:00
  • 21ac451d8e Create ts-type-to-grammar.sh ochafik 2024-03-03 22:22:21 +00:00
  • 2dc03f404d Update --http-cors-origin flag description StrangebytesDev 2024-03-03 14:09:41 -08:00
  • 06b04e93c5 json:fix typo ochafik 2024-03-03 21:42:43 +00:00
  • be1324721c Merge remote-tracking branch 'origin/master' into json-fixes ochafik 2024-03-03 21:35:17 +00:00
  • 320e89c010 Change allowed methods. Rename cors flag. StrangebytesDev 2024-03-03 13:00:33 -08:00
  • d6961d4c48 Merge branch 'ggerganov:master' into master StrangeBytesDev 2024-03-03 11:37:13 -08:00