Commit graph

  • 4a471b12d6 don't shift if there's no truncation Paulo 2024-05-02 21:28:47 -03:00
  • 6a54973d82 Merge branch 'master' into compilade/convert-hf-refactor Francis Couture-Harpin 2024-05-02 20:02:46 -04:00
  • 60325fa56f Remove .attention from skipped tensors to match more accurately (#7051) b2782 Bartowski 2024-05-02 19:49:09 -04:00
  • f700301a3b Remove .attention from skipped tensors to match more accurately colin 2024-05-02 16:35:18 -04:00
  • 13f4cf70db convert-hf : use a plain class for Model, and forbid direct instantiation Francis Couture-Harpin 2024-05-02 15:50:21 -04:00
  • ce067af118 convert-hf : use an ABC for Model again Francis Couture-Harpin 2024-05-02 15:00:36 -04:00
  • c7a6c32882 fix: protect against slow tokenizer Joan Martinez 2024-05-02 18:01:00 +02:00
  • 6ecf3189e0 chore: fix typo in llama.cpp (#7032) b2781 alwqx 2024-05-02 23:56:41 +08:00
  • c9fbea433c fix another typo on the same line Jared Van Bortel 2024-05-02 11:55:17 -04:00
  • 44af096490 Merge branch 'ggerganov:master' into ag_cuda_graphs agray3 2024-05-02 16:37:04 +01:00
  • 566b32f08d llama : rename ctx to user_data in progress_callback Daniel Bevenius 2024-05-02 16:42:48 +02:00
  • 0f94ff7155 fix: only do pre tokenization and normalization Joan Martinez 2024-05-02 16:39:58 +02:00
  • 0672cd8f42 use conver ids to tokens Joan Martinez 2024-05-02 15:31:00 +02:00
  • 9cbad1b2cf Add test for command-r tokenizer. DAN™ 2024-05-02 07:21:11 -04:00
  • 8242447b7b Support handling of LFS for download. DAN™ 2024-05-02 07:17:05 -04:00
  • a1aa65e069 feat: change convert hf to gguf Joan Martinez 2024-05-02 12:02:26 +02:00
  • 14cd69a87d feat: add pre tokenization Joan Martinez 2024-05-02 11:59:03 +02:00
  • f6365b82cd Merge branch 'feat-jina-embeddings' of https://github.com/JoanFM/llama.cpp into feat-jina-embeddings-v2-zh Joan Martinez 2024-05-02 11:23:49 +02:00
  • c95013d1b5 Whitespace formatting fixes. Stanisław Szymczyk 2024-05-02 09:53:59 +02:00
  • 3275e60f57 falcon : fix regex Georgi Gerganov 2024-05-02 11:52:50 +03:00
  • 3a461dbff3 tests : add test that fails with DeepSeek tokenizers Georgi Gerganov 2024-05-02 11:46:20 +03:00
  • cf00fe1ea3 starcoder : fix pre-tokenizer Georgi Gerganov 2024-05-02 11:00:15 +03:00
  • 7053b261ab unicode : add all unicode number ranges Georgi Gerganov 2024-05-02 10:59:24 +03:00
  • ce7d3a0442 tests : add test-tokenizer-0.sh Georgi Gerganov 2024-05-02 08:34:56 +03:00
  • e41b6ceee9 server: update tool calling, introduce system prompt for json schema ochafik 2024-05-02 04:54:58 +01:00
  • 08e2b7701f *.py: accidentally corrected the wrong line brian khuu 2024-05-02 13:35:54 +10:00
  • abf0ff0d2a Disable benchmark on forked repo Sigbjørn Skjæret 2024-05-02 04:52:33 +02:00
  • 2b2127c2a3 agent: url params ochafik 2024-05-02 03:20:25 +01:00
  • ca1a640da2 server: tool call grammar-constraints ochafik 2024-05-02 03:20:00 +01:00
  • a34ace9f52 Add BPE pre-tokenization for Command-R. DAN™ 2024-05-01 21:17:08 -04:00
  • e95be2907d chore: fix typo in llama.cpp alwqx 2024-05-02 09:11:33 +08:00
  • 644c2696d0 convert-hf : sort model part names Francis Couture-Harpin 2024-05-01 19:16:59 -04:00
  • 639b374b1a convert-hf : convert norms to f32 by default Francis Couture-Harpin 2024-05-01 19:02:34 -04:00
  • b0d943de17 Update LOG_IMPL and LOG_TEE_IMPL (#7029) b2780 Andrew Downing 2024-05-01 17:31:30 -04:00
  • 21068b6bdf convert-hf : display tensor shape Francis Couture-Harpin 2024-05-01 16:59:21 -04:00
  • c814c8c2b9 Update LOG_IMPL and LOG_TEE_IMPL Andrew Downing 2024-05-01 15:47:15 -04:00
  • 8d608a81b7 main : fix off by one error for context shift (#6921) b2779 l3utterfly 2024-05-02 04:27:41 +09:00
  • 88ef908c90 examples : more roll back options for token healing mare5x 2024-04-30 20:04:35 +02:00
  • c77bb3203c examples : add simple token healing example mare5x 2024-04-30 13:38:14 +02:00
  • dcd8dfa1b5 convert : use a string for the SentencePiece tokenizer path Francis Couture-Harpin 2024-05-01 13:07:10 -04:00
  • 3870164f47 convert-hf : allow unusual model part names Francis Couture-Harpin 2024-05-01 12:30:20 -04:00
  • 3ea0d36000 Server: add tests for batch size, different seeds (#6950) Johannes Gäßler 2024-05-01 17:52:55 +02:00
  • 154ad1236e convert-hf-to-gguf-update.py: use triple quoted f-string instead brian khuu 2024-05-02 01:47:41 +10:00
  • 56f60f5d69 convert-hf : flake8 linter doesn't like semicolons Francis Couture-Harpin 2024-05-01 11:36:23 -04:00
  • 6d42f3d773 revert changes to convert-hf-to-gguf.py for get_name() brian khuu 2024-05-02 01:35:33 +10:00
  • 547ed8a7ca convert.py: When --vocab-only is passed, generate false but valid params to allow vocab creation solely from tokenizer.model 20kdc 2024-05-01 16:23:51 +01:00
  • 1613ef8d8e CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019) b2777 Johannes Gäßler 2024-05-01 14:46:37 +02:00
  • 58199503a8 Fall back if graph capture fails and address other comments Alan Gray 2024-04-30 06:19:51 -07:00
  • 859734eecc CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 Johannes Gäßler 2024-05-01 10:02:20 +02:00
  • 534db8eb3e If first token generated from the server is the stop word the server will crash maor-ps 2024-05-01 15:08:10 +03:00
  • 71d8bd6480 Added support for the snowflake-arctic model. Stanisław Szymczyk 2024-05-01 09:43:19 +02:00
  • c4ec9c0d3d ci : exempt confirmed bugs from being tagged as stale (#7014) b2776 slaren 2024-05-01 07:13:59 +02:00
  • 1e46fa8dce Merge remote-tracking branch 'origin/master' into 0cc4m/vulkan-moe 0cc4m 2024-05-01 06:49:32 +02:00
  • 6c1c4b4688 move stl return out of extern C Achazwl 2024-05-01 11:17:45 +08:00
  • 8af6f9c5df Merge branch 'ggerganov:master' into sgemm-avx Eve 2024-05-01 00:56:39 +00:00
  • 80736c556b Update llama.cpp Olivier Chafik 2024-05-01 01:55:09 +01:00
  • c7032d3d7a fix typo Jeximo 2024-04-30 21:23:44 -03:00
  • d2b4e1a8b1 removed OpenBlas Jeximo 2024-04-30 21:18:44 -03:00
  • 57a37f19c2 don't assume git is installed Jeximo 2024-04-30 20:51:53 -03:00
  • b115ad432e Tidy Android Instructions README.md Jeximo 2024-04-30 20:33:45 -03:00
  • cde9ea65e8 convert-hf : simplify MoE weights stacking Francis Couture-Harpin 2024-04-30 18:12:01 -04:00
  • e355a639de ci : exempt confirmed bugs from being tagged as stale slaren 2024-04-30 23:47:27 +02:00
  • a8f9b07631 perplexity: more statistics, added documentation (#6936) b2775 Johannes Gäßler 2024-04-30 23:36:27 +02:00
  • 36f2faf0b4 remove pre-BPE fix tables Johannes Gäßler 2024-04-30 23:26:20 +02:00
  • 942e2be3ba feat: update server README with undocumented options Kyle Mistele 2024-04-30 17:06:38 -04:00
  • 9ff8d4d3ed Server: add tests for batch size, different seeds Johannes Gäßler 2024-04-27 23:30:15 +02:00
  • 698f0b3479 convert-hf : remove unused n_dims in extra_*_tensors Francis Couture-Harpin 2024-04-30 15:02:34 -04:00
  • c33775bcc7 convert : upgrade to sentencepiece v0.2.0 Francis Couture-Harpin 2024-04-30 15:01:23 -04:00
  • 990bf5711a grammar: add repetition tests ochafik 2024-04-30 19:52:51 +01:00
  • 0148661a60 Merge remote-tracking branch 'origin/master' into grammar-fast ochafik 2024-04-30 19:40:10 +01:00
  • 909e4c664b Revert "With mechanism to fall back if graph capture fails" Alan Gray 2024-04-30 11:39:59 -07:00
  • 476c97ddbd Merge remote-tracking branch 'origin/master' into grammar-reps ochafik 2024-04-30 19:39:45 +01:00
  • 0d720acb91 Merge branch 'master' into compilade/convert-hf-refactor Francis Couture-Harpin 2024-04-30 14:08:05 -04:00
  • 47e02eb7bc convert-hf : begin refactoring write_tensor Francis Couture-Harpin 2024-04-30 14:07:28 -04:00
  • 312e20b54a openai: update after merge Olivier Chafik 2024-04-30 18:29:08 +01:00
  • 7675ac6cf4 Merge remote-tracking branch 'origin/master' into agent-example Olivier Chafik 2024-04-30 18:11:40 +01:00
  • 3e560c8665 Fix flashattn Jerome 2024-04-20 11:06:03 -04:00
  • eb17c6232b update README Johannes Gäßler 2024-04-30 18:17:35 +02:00
  • 7eb14d5a6b Fix flash-attn for AMD Johannes Gäßler 2024-04-19 13:50:17 -04:00
  • f364eb6fb5 switch to using localizedDescription (#7010) b2774 Kevin Gibbons 2024-04-30 08:14:02 -07:00
  • f3b414847a switch to using localizedDescription Kevin Gibbons 2024-04-30 07:43:38 -07:00
  • eb9f15fb6f With mechanism to fall back if graph capture fails Alan Gray 2024-04-30 06:19:51 -07:00
  • f4ae2540d4 hardcode error codes on metal Kevin Gibbons 2024-04-30 07:32:01 -07:00
  • 14073a2caf feat: proper KQ_pos for Jina embeddings Joan Martinez 2024-04-30 16:22:35 +02:00
  • bd7a95e799 Merge b8aec23086 into 77e15bec62 yq-pan 2024-04-30 09:14:09 -04:00
  • 8d2dead681 Remove comment on assert that was failing joshcarp 2024-04-30 08:53:03 -04:00
  • 77e15bec62 metal : remove deprecated error code (#7008) b2773 Georgi Gerganov 2024-04-30 15:52:21 +03:00
  • 896dee5059 Update joshcarp 2024-04-30 08:51:01 -04:00
  • d44e0fb22c Added more comprehensive graph node checking Alan Gray 2024-04-30 03:29:35 -07:00
  • e73ab4bd16 Merge branch 'feat-jina-embeddings' of https://github.com/JoanFM/llama.cpp into feat-jina-embeddings-v2-zh Joan Martinez 2024-04-30 14:34:26 +02:00
  • d9b8dd667d fix: add some changes as per review Joan Martinez 2024-04-30 14:15:50 +02:00
  • 2835441a26 Merge branch 'feat-jina-embeddings' of https://github.com/JoanFM/llama.cpp into feat-jina-embeddings-v2-zh Joan Martinez 2024-04-30 14:26:03 +02:00
  • da96368535 fix: add some changes as per review Joan Martinez 2024-04-30 14:15:50 +02:00
  • 587a9ede92 metal : remove deprecated error code Georgi Gerganov 2024-04-30 14:58:02 +03:00
  • f8d1709061 Merge branch 'master' into feat-jina-embeddings Joan Fontanals 2024-04-30 13:44:44 +02:00
  • 0c6d820b89 Style jaime-m-p 2024-04-30 13:18:25 +02:00
  • 2cd1eb0daa Add alternative regex for custom aplit llama3 jaime-m-p 2024-04-30 13:02:46 +02:00
  • 281a2d899e update README Johannes Gäßler 2024-04-30 00:21:45 +02:00
  • fcdd66a7a2 add LLaMA 3 8b scoreboard Johannes Gäßler 2024-04-27 18:44:29 +02:00
  • c31c0a8b4c perplexity: more statistics, added documentation Johannes Gäßler 2024-04-26 14:28:16 +02:00