Commit graph

  • ac8fcd1c77 quants : use MM256_SET_M128I consistently to fix gcc 7 build (#5889) Jared Van Bortel 2024-03-05 11:56:37 -05:00
  • ea06497e54 grammars : blacklists character control set (#5888) ExtReMLapin 2024-03-05 17:33:08 +01:00
  • a8b057a96b Revert "grammars : don't allow to output unescaped new line in string (#5885)" Georgi Gerganov 2024-03-05 15:56:24 +02:00
  • c5b1868d6e grammars : don't allow to output unescaped new line in string (#5885) ExtReMLapin 2024-03-05 14:44:29 +01:00
  • 6ebc420a6b Vulkan Improvements (#5835) 0cc4m 2024-03-05 13:33:42 +01:00
  • b360fdff8c [SYCL] fix mul_mat fault in CI/unit-test (#5862) Neo Zhang Jianyu 2024-03-05 16:08:35 +08:00
  • abc4095952 fix editorconfig check break (#5879) Minsoo Cheong 2024-03-05 15:12:23 +09:00
  • 10781d8382 fix speculative decoding build on windows (#5874) Jeffrey Quesnelle 2024-03-04 19:23:06 -08:00
  • 20df72d844 nix: static build (#5814) hutli 2024-03-05 02:33:08 +01:00
  • a27367e4ac llama : fix embeddings (#5796) Georgi Gerganov 2024-03-04 22:31:20 +02:00
  • 69aedef00a flake : fix Georgi Gerganov 2024-03-04 21:50:50 +02:00
  • b282f8730b ggml : fix unknown status (#0) Georgi Gerganov 2024-03-04 20:53:27 +02:00
  • c460d54032 sync : ggml Georgi Gerganov 2024-03-04 11:06:39 +02:00
  • 69bcce65db ggml : introduce ggml_status (ggml/750) Michael Podvitskiy 2024-03-04 10:05:42 +01:00
  • c752df5988 cmake : handle cases where git index is not found in .git (#5844) Dane Madsen 2024-03-05 05:26:55 +11:00
  • f93f315eeb speculative : implement stochastic speculative sampling (#5625) Minsoo Cheong 2024-03-05 03:24:00 +09:00
  • d667ada202 add alias for chat template (#5858) Xuan Son Nguyen 2024-03-04 12:22:08 +01:00
  • 7c94ffbd81 sync : ggml Georgi Gerganov 2024-03-04 10:40:04 +02:00
  • e3043a428a add some new ops, fix some operators and add batch operations to certain operators. (ggml/747) leejet 2024-03-03 20:23:52 +08:00
  • 50973c77ae common : use LLAMA_DEFAULT_SEED (#5855) DAN™ 2024-03-04 03:08:19 -05:00
  • d48c24273b main : support special tokens as reverse/anti prompt (#5847) DAN™ 2024-03-04 02:57:20 -05:00
  • a0a1ca04d8 cuda : fix data race in soft max (#5853) slaren 2024-03-03 14:26:18 +01:00
  • 5b2daffcde readme : add API changes section Georgi Gerganov 2024-03-03 12:44:03 +02:00
  • 74a0202ff3 llama : allow for user specified embedding pooling type (#5849) Douglas Hanley 2024-03-03 04:40:27 -06:00
  • c0a1a5de91 gguf-dump : support i-quants (#5841) Nindaleth 2024-03-03 09:43:42 +01:00
  • abb8e001e5 llama : fix llama_copy_state_data with fragmented KV cache (#5840) compilade 2024-03-03 03:41:55 -05:00
  • 7a5c8bd3b6 ci : schedule slow server tests only on Release or on demand (#5839) Pierrick Hymbert 2024-03-03 09:35:23 +01:00
  • 909f62ef71 server : init http requests thread pool with --parallel if set (#5836) Pierrick Hymbert 2024-03-03 08:48:36 +01:00
  • 49dab82b48 flake.lock: Update (#5842) Georgi Gerganov 2024-03-03 06:11:31 +02:00
  • 23f60349e3 server: tests: passkey challenge / self-extend with context shift demo (#5832) Pierrick Hymbert 2024-03-02 22:00:14 +01:00
  • 352c2f375f llama : add abort_callback to interrupt computation (#5409) Michael Podvitskiy 2024-03-02 20:52:25 +01:00
  • afe9525a70 ggml : fix IQ3_S AVX implementation (#5834) Georgi Gerganov 2024-03-02 20:00:49 +02:00
  • dcb8d4439a convert : automatically fall back to HfVocab if tokenizer.model doesn't exist (#5821) Jared Van Bortel 2024-03-02 12:27:26 -05:00
  • 5fd9d9e1ad convert-hf : make model class definitions self-contained (#5825) Jared Van Bortel 2024-03-02 12:21:47 -05:00
  • 2ee066ad9e ggml : IQ3_S improvements (#5829) Kawrakow 2024-03-02 17:00:51 +02:00
  • 28af53f508 scripts : add pod-llama.sh Georgi Gerganov 2024-03-02 16:54:08 +02:00
  • a6ebb7be75 llama : refactor internal quantization functions (#5830) Xuan Son Nguyen 2024-03-02 15:19:09 +01:00
  • 316f837abd llama : fix segfault from unknown model arch name (#5820) compilade 2024-03-02 08:42:56 -05:00
  • e789d25713 Support multiple GPUs (split mode) on SYCL backend (#5806) Neo Zhang Jianyu 2024-03-02 19:49:30 +08:00
  • 439467b4f6 workflows : remove nocleanup arg for check-requirements.sh (#5826) crasm 2024-03-02 00:11:06 -05:00
  • 337b5df18c build(nix): Introduce flake.formatter for nix fmt (#5687) Tushar 2024-03-02 04:48:26 +05:30
  • 2d16323c7e convert-hf-to-gguf : require einops for InternLM2ForCausalLM (#5792) nold 2024-03-01 22:51:12 +01:00
  • 81ee97614b llama : add StarCoder2 support (#5795) Sourab Mangrulkar 2024-03-02 01:00:46 +05:30
  • 3851d6bebe server : remove api_like_OAI.py proxy script (#5808) Georgi Gerganov 2024-03-01 20:00:58 +02:00
  • dba99b0778 ggml-vulkan: fix VULKAN_CHECK_RESULTS flag, which was previously broken (#5813) ddpasa 2024-03-01 18:00:00 +01:00
  • e54be67b1b gemma : fix bfloat16 -> float16 conversion issue (#5810) kunal-vaishnavi 2024-03-01 06:08:08 -08:00
  • d134b79c6a common : fix flag --logits-all to --all-logits (#5805) Miwa / Ensan 2024-03-01 22:48:56 +09:00
  • cd3dca791b llama : cleanup unused mmq flags (#5772) Pierrick Hymbert 2024-03-01 12:39:06 +01:00
  • 51a38fc0c8 unicode : switch to multimap based nfd_map (#5799) Douglas Hanley 2024-03-01 03:15:36 -06:00
  • 996a2ff344 server: allow to override threads server pool with --threads-http (#5794) Pierrick Hymbert 2024-03-01 10:08:08 +01:00
  • a4a260e57d ci : add Ubuntu 22 Vulkan CI run (#5789) Eve 2024-03-01 08:54:53 +00:00
  • 644d40a95f server : fix newlines in help (#5785) Georgi Gerganov 2024-03-01 09:59:43 +02:00
  • 058951722f [SYCL] Use batched mul_mat pathway (#5591) AidanBeltonS 2024-03-01 07:36:47 +00:00
  • 3e0058dbfd Server: normalize naming (#5779) Xuan Son Nguyen 2024-02-29 21:42:11 +01:00
  • d3085deb2a add causal_attn flag to llama_cparams Douglas Hanley 2024-03-09 22:59:30 -06:00
  • 4871722772 flake.lock: Update github-actions[bot] 2024-03-10 04:33:32 +00:00
  • d39dc65f33 Moving logic to validate GBNF rule references to the end of the parse function. Clint Herron 2024-03-09 22:35:17 -05:00
  • 64b6f42ff5 server: ci: EOF EOL Pierrick HYMBERT 2024-03-10 02:10:51 +01:00
  • 1f7f2809b5 server: ci: remove tmp push branch Pierrick HYMBERT 2024-03-10 02:08:23 +01:00
  • 89c4bd5e97 server: ci: windows build and tests Pierrick HYMBERT 2024-03-09 23:00:48 +01:00
  • 621e86b331 server: benchmark: chat/completions scenario and other llm servers comparison (#5941) b2382 Pierrick Hymbert 2024-03-09 23:41:49 +01:00
  • 6bfb80eb75 server: bench: select prompts based on the current iteration id not randomly to make the bench more reproducible Pierrick HYMBERT 2024-03-09 22:54:24 +01:00
  • f24460c4cb update_slots ngxson 2024-03-09 22:07:31 +01:00
  • e8a4c814af update docs ngxson 2024-03-09 21:22:49 +01:00
  • c240bd7026 launch_slot_with_task ngxson 2024-03-09 21:13:30 +01:00
  • 02e49353c7 update completion.js ngxson 2024-03-09 21:12:27 +01:00
  • 77d1ac7e00 server : print chat template info b2381 Georgi Gerganov 2024-03-09 22:04:00 +02:00
  • 4840c4e678 correct coding style ngxson 2024-03-09 21:03:18 +01:00
  • 2df2834df3 mostly style fixes; correct KQ_mask comment Douglas Hanley 2024-03-09 13:30:54 -06:00
  • b54afce9f4 mostly style fixes; fix KQ_mask comment gritlm-pr Douglas Hanley 2024-03-09 13:03:46 -06:00
  • d894f352bf perplexity : support using multiple sequences to allow larger batch sizes (#5946) b2380 slaren 2024-03-09 19:55:54 +01:00
  • 52c76d57a5 server : add defrag thold parameter Georgi Gerganov 2024-03-09 20:44:35 +02:00
  • 23dbcfa2c8 print tested n_ctx, add assert slaren 2024-03-09 19:43:08 +01:00
  • 5d25f74821 Merge branch 'master' into hp/server/bench/init Georgi Gerganov 2024-03-09 20:42:25 +02:00
  • b1d9c2636a Flush stdout and output ending newline if streaming. DAN™ 2024-03-09 12:57:00 -05:00
  • a86c844f3f Fix types. DAN™ 2024-03-09 12:48:02 -05:00
  • 098dbaab44 readme : update hot topics Georgi Gerganov 2024-03-09 18:14:13 +02:00
  • 8380ecfb21 ggml : fix unnecessary f32 -> f16 -> f32 casts (mmla) (#5951) b2378 Georgi Gerganov 2024-03-09 17:36:20 +02:00
  • 58308a0ecc server : fix metrics init (#5964) b2377 Georgi Gerganov 2024-03-09 17:34:15 +02:00
  • e52d1bd9f0 server : fix metrix init Georgi Gerganov 2024-03-09 17:32:13 +02:00
  • 0f641ebd63 small fix ngxson 2024-03-09 15:57:30 +01:00
  • 400d4e637f revert limit max n_predict ngxson 2024-03-09 15:52:08 +01:00
  • c15f5a6e1b fix api key test case ngxson 2024-03-09 15:48:24 +01:00
  • 8d0033ad63 common, server : add top-a sampler Romain “Artefact2” Dal Maso 2024-02-20 15:51:49 +01:00
  • d2e1411ca9 Merge branch 'master' into xsn/better_error ngxson 2024-03-09 15:31:25 +01:00
  • de81c22abd server: do not crash on grammar error ngxson 2024-03-09 15:27:42 +01:00
  • ba9c3e3192 server: format error to json ngxson 2024-03-09 15:01:19 +01:00
  • 5b09797321 ggml : remove old quantization functions (#5942) b2376 Georgi Gerganov 2024-03-09 15:53:59 +02:00
  • e5cb306f9f vulkan : remove hist and fix typo Georgi Gerganov 2024-03-09 15:53:16 +02:00
  • 97c09585d6 server : clarify some items in the readme (#5957) Georgi Gerganov 2024-03-09 15:47:47 +02:00
  • 299588048b server : fix typo Georgi Gerganov 2024-03-09 15:47:29 +02:00
  • dc5daa7759 tests : remove hist usage in test-backend-ops Georgi Gerganov 2024-03-09 15:43:00 +02:00
  • 95ea0ff2df ggml : remove hist data from the quantization API Georgi Gerganov 2024-03-09 14:57:32 +02:00
  • a62902ac8e ggml : restrict correctness Georgi Gerganov 2024-03-09 13:32:27 +02:00
  • 2005b1f136 ggml : simplify ggml_quantize_chunk Georgi Gerganov 2024-03-09 13:12:25 +02:00
  • 13c1cc6a9f ggml : remove old quantization functions Georgi Gerganov 2024-03-08 15:20:39 +02:00
  • 03acc82a85 Clean-up GritLM sample code. DAN™ 2024-03-09 07:44:25 -05:00
  • fb215c3832 server : normalize embeddings (#5956) b2374 SeungWon Jeong 2024-03-09 21:27:58 +09:00
  • 02addabbae common : better normalize impl Georgi Gerganov 2024-03-09 14:26:28 +02:00
  • 98cccf14e3 common : reuse llama_embd_normalize Georgi Gerganov 2024-03-09 14:23:50 +02:00