Commit graph

  • 4d4d2366fc convert : automatically fall back to HfVocab if tokenizer.model doesn't exist (#5821) b2318 Jared Van Bortel 2024-03-02 12:27:26 -05:00
  • c7a0ad8ec9 convert-hf : make model class definitions self-contained (#5825) Jared Van Bortel 2024-03-02 12:21:47 -05:00
  • bf90920fb2 iq3_s_mult: ARM_NEON works - 13 t/s Iwan Kawrakow 2024-03-02 19:17:27 +02:00
  • 0b673ca187 s/_MODEL_CLASSES/_model_classes/ ceb/convert-hf-refactor Jared Van Bortel 2024-03-02 12:14:37 -05:00
  • 0fe9cd488f WIP Iwan Kawrakow 2024-03-02 17:56:16 +02:00
  • bbde6eb256 ggml : IQ3_S improvements (#5829) b2316 Kawrakow 2024-03-02 17:00:51 +02:00
  • b59615fa42 Assume tied weights if lm_head/output weights is missing. Don Mahurin 2024-03-01 11:49:53 -08:00
  • ef2cd694c4 scripts : add pod-llama.sh Georgi Gerganov 2024-03-02 16:54:08 +02:00
  • 8abf8d3a08 server: tests: fix server timeout Pierrick HYMBERT 2024-03-02 15:51:27 +01:00
  • 6c32d8c7ad llama : refactor internal quantization functions (#5830) b2314 Xuan Son Nguyen 2024-03-02 15:19:09 +01:00
  • 802da0091b llama : fix segfault from unknown model arch name (#5820) b2313 compilade 2024-03-02 08:42:56 -05:00
  • a80533e276 server: tests - passkey - limit the number of max tokens to predix Pierrick HYMBERT 2024-03-02 14:42:11 +01:00
  • f8773f759e server: tests - passkey - limit the number of max tokens to predix Pierrick HYMBERT 2024-03-02 14:38:08 +01:00
  • cf4c86ee20 server: tests - passkey - first good working value of nga Pierrick HYMBERT 2024-03-02 14:31:27 +01:00
  • ed60b97434 server: tests - fix passkey not using pre/suffix Pierrick HYMBERT 2024-03-02 14:25:10 +01:00
  • 3b8242a188 server: tests - missing EOL at EOF Pierrick HYMBERT 2024-03-02 14:13:49 +01:00
  • af82fb4ad7 server: revert change on slot n_ctx Pierrick HYMBERT 2024-03-02 14:12:12 +01:00
  • 2495f7273a server: logs: do not truncate log values Pierrick HYMBERT 2024-03-02 14:01:06 +01:00
  • 616d7e9a9b server: do not truncate prompt tokens if self-extend through group attention is enabled Pierrick HYMBERT 2024-03-02 13:52:52 +01:00
  • 60113da241 server: tests: add group attention params Pierrick HYMBERT 2024-03-02 13:50:28 +01:00
  • ab5b06b2cf server: logs: do not truncate log values Pierrick HYMBERT 2024-03-02 13:49:18 +01:00
  • 18e739d61d server: tests: add passkey test Pierrick HYMBERT 2024-03-02 13:02:05 +01:00
  • 8bda1c1041 Merge branch 'ggerganov:master' into server_branch pudepiedj 2024-03-02 12:09:07 +00:00
  • 319ded7dde server: tests: download model from HF, add batch size Pierrick HYMBERT 2024-03-02 13:01:57 +01:00
  • 1780d9601d server: tests: add debug field in context before scenario Pierrick HYMBERT 2024-03-02 12:50:55 +01:00
  • 715641391d Support multiple GPUs (split mode) on SYCL backend (#5806) b2312 Neo Zhang Jianyu 2024-03-02 19:49:30 +08:00
  • a8eeab2d58 Add q4_1, q5_0, q5_1 and q8_0 dequant mat mat mul shaders 0cc4m 2024-03-02 12:33:10 +01:00
  • 52186adcbe refactor ngxson 2024-03-02 12:07:38 +01:00
  • 0f774a81cd server: /v1/models add some metadata Pierrick HYMBERT 2024-03-02 12:06:12 +01:00
  • 68814783c5 Merge remote-tracking branch 'origin/master' into server_branch pudepiedj 2024-03-02 10:28:37 +00:00
  • 5d61ae8d2a Renaming some vars pudepiedj 2024-03-02 10:24:07 +00:00
  • 8f944baa77 refactor quantize multi thread ngxson 2024-03-02 11:11:52 +01:00
  • 2acb281105 Add soft_max alibi support 0cc4m 2024-03-02 09:40:01 +01:00
  • d4dfc250cc Fix ARM_NEON ik/iq3_s_faster Iwan Kawrakow 2024-03-02 10:12:02 +02:00
  • c76135401f remove warnings from comparison between int and size_t Minsoo Cheong 2024-03-02 16:45:07 +09:00
  • 73a7e42692 server: tests: add models endpoint scenario Pierrick HYMBERT 2024-03-02 07:37:49 +01:00
  • 93bce3c909 iq3_s: use new grid everywhere Iwan Kawrakow 2024-03-02 07:57:39 +02:00
  • d8f7064cb1 llama : remove redundant nullptr check in llm_arch_from_string Francis Couture-Harpin 2024-03-02 00:31:51 -05:00
  • 9bf297a02b workflows : remove nocleanup arg for check-requirements.sh (#5826) b2311 crasm 2024-03-02 00:11:06 -05:00
  • 44e33d4f37 llama : remove redundant inner const for LLM_TENSOR_NAMES compilade 2024-03-01 16:21:15 -05:00
  • 6cf481b3ac llama : make all LLM maps const Francis Couture-Harpin 2024-03-01 14:26:20 -05:00
  • 3b257f8867 llama : fix segfault from unknown model arch name Francis Couture-Harpin 2024-03-01 10:32:38 -05:00
  • 3cfb0f01c8 "buildStatic" -> "enableStatic" hutli 2024-03-02 02:44:55 +01:00
  • 48830d706c "buildStatic" -> "enableStatic" hutli 2024-03-02 02:44:22 +01:00
  • db22ce347f using host platform isStatic as default value for "enableStatic" hutli 2024-03-02 02:42:22 +01:00
  • 886e68aee9 ok ngxson 2024-03-02 00:25:33 +01:00
  • cb5e8f7fc4 build(nix): Introduce flake.formatter for nix fmt (#5687) Tushar 2024-03-02 04:48:26 +05:30
  • f0323d21a7 workflows : remove nocleanup arg for check-requirements.sh crasm 2024-03-01 16:58:17 -05:00
  • da3b9ba2b7 convert-hf-to-gguf : require einops for InternLM2ForCausalLM (#5792) b2309 nold 2024-03-01 22:51:12 +01:00
  • 853b3d716d merge: debug ngxson 2024-03-01 22:07:50 +01:00
  • a1bf1e1b2e merge: quant input quant output ngxson 2024-03-01 22:01:27 +01:00
  • 7f0a1d66b5 convert-hf : make model class definitions self-contained Jared Van Bortel 2024-03-01 15:52:37 -05:00
  • 95845d17ec convert-hf : make actual types match annotations Jared Van Bortel 2024-03-01 15:19:59 -05:00
  • c29af7e225 llama : add StarCoder2 support (#5795) b2308 Sourab Mangrulkar 2024-03-02 01:00:46 +05:30
  • ee5b171250 address comment Sourab Mangrulkar 2024-03-02 00:43:33 +05:30
  • 5827ff401b json: add typings ochafik 2024-03-01 19:10:29 +00:00
  • c5bc1540d8 json: spaces in output and unrestricted output spaces ochafik 2024-03-01 19:10:04 +00:00
  • ed24688af8 json: temp fix for escapes ochafik 2024-03-01 19:08:59 +00:00
  • 148555cebc json: fix merge ochafik 2024-03-01 19:08:44 +00:00
  • bc0e0d9f55 json: support any ({} or {type: object}) ochafik 2024-03-01 19:06:19 +00:00
  • f6f851ba87 json: support allOf + nested anyOf ochafik 2024-03-01 19:05:38 +00:00
  • 82ade9f558 join: support union types (mostly for nullable types I think) ochafik 2024-03-01 19:04:42 +00:00
  • ea4244e1f7 json: fix $ref resolution ochafik 2024-03-01 19:04:04 +00:00
  • 12f0d7e84b json: resolve $ref (and support https schema urls) ochafik 2024-03-01 19:03:23 +00:00
  • 1428a85ff2 json: add support for pattern ochafik 2024-03-01 19:00:47 +00:00
  • a0f4cdd5d5 chore: Switch to pkgs.nixfmt-rfc-style ditsuke 2024-03-01 23:37:19 +05:30
  • 11d4e099b4 iq3_s: PPL improvement Iwan Kawrakow 2024-03-01 20:01:30 +02:00
  • 38d16b1426 server : remove api_like_OAI.py proxy script (#5808) Georgi Gerganov 2024-03-01 20:00:58 +02:00
  • f8ab539190 convert : update help string ceb/convert-vocab-fallback Jared Van Bortel 2024-03-01 12:29:34 -05:00
  • f51554180a Merge remote-tracking branch 'origin/master' into server_branch pudepiedj 2024-03-01 17:26:01 +00:00
  • 767aef90be docs : s/LLaMa/LLaMA/ Jared Van Bortel 2024-03-01 12:22:59 -05:00
  • 17d22efa40 convert : automatically fall back to HfVocab if needed Jared Van Bortel 2024-03-01 12:08:54 -05:00
  • c2224f003b ggml-vulkan: fix VULKAN_CHECK_RESULTS flag, which was previously broken (#5813) b2306 ddpasa 2024-03-01 18:00:00 +01:00
  • e43e81a5d7 WIP Iwan Kawrakow 2024-03-01 18:48:08 +02:00
  • f09188e9d8 merge: add debug msg ngxson 2024-03-01 17:43:58 +01:00
  • e8e6103f42 add support for starcoder2 Cocoa 2024-03-02 00:20:49 +08:00
  • abec8c0c3a merge: accept quant input ngxson 2024-03-01 17:19:18 +01:00
  • b47525df0a server tweak pudepiedj 2024-03-01 15:53:56 +00:00
  • 7b629c3b65 iq3_s: minor improvement on Metal Iwan Kawrakow 2024-03-01 17:46:33 +02:00
  • bf11052b9f buildStatic variable to toggle static builds hutli 2024-03-01 14:35:12 +01:00
  • 498e998cd7 buildStatic variable to toggle static builds hutli 2024-03-01 14:29:03 +01:00
  • 51ec91b93a using default LLAMA_STATIC cmake skipping build of shared libs and using pkgs.glibc.static hutli 2024-02-23 15:09:38 +01:00
  • b4ac6f0820 merge hutli 2024-02-15 14:25:04 +01:00
  • 9c5b594cde iq3_s: another small ARM_NEON improvement Iwan Kawrakow 2024-03-01 16:53:21 +02:00
  • d5a6d7a5cb Remove tiktoken package Anas Ahouzi 2024-03-01 06:45:03 -08:00
  • 6cdb9c44ae Revert back to GPT2 tokenizer Anas Ahouzi 2024-03-01 06:37:23 -08:00
  • e4cc412114 update according to review comments Jianyu Zhang 2024-03-01 22:30:32 +08:00
  • 1e94989156 iq3_s: somewhat faster ARM_NEON dot product Iwan Kawrakow 2024-03-01 16:22:33 +02:00
  • 1daaf30bde json: support required / optional properties ochafik 2024-03-01 14:15:27 +00:00
  • 3c339ce34a json: support additionalProperties ({[k: string]: [string,number][]}) ochafik 2024-03-01 14:14:45 +00:00
  • 2d9580a37b json: support tuple types ([number, string]) ochafik 2024-03-01 14:12:46 +00:00
  • 09248e0897 json: fix arrays (disallow [,1]) ochafik 2024-03-01 14:11:13 +00:00
  • e743386728 gemma : fix bfloat16 -> float16 conversion issue (#5810) b2305 kunal-vaishnavi 2024-03-01 06:08:08 -08:00
  • 39e3a429c8 iq3_s: somewhat faster AVX2 dot product Iwan Kawrakow 2024-03-01 15:58:08 +02:00
  • 2cfae6d9a8 merge: try..catch ngxson 2024-03-01 14:50:42 +01:00
  • f49a535686 common : fix flag --logits-all to --all-logits (#5805) b2304 Miwa / Ensan 2024-03-01 22:48:56 +09:00
  • 3e6e3668c9 merge: new input format ngxson 2024-03-01 14:47:00 +01:00
  • 15f233b9a1 Merge pull request #1 from ggerganov/gg/fix-starcoder2 Sourab Mangrulkar 2024-03-01 19:05:35 +05:30
  • 9862d59c05 llama : change starcoder2 rope type gg/fix-starcoder2 Georgi Gerganov 2024-03-01 15:10:31 +02:00
  • a1a42e023c ggml-vulkan: fix VULKAN_CHECK_RESULTS flag, which was previously broken ddpasa 2024-03-01 10:53:12 +01:00