Commit graph

  • 86d1d84642 basic avx implementation netrunnereve 2024-04-22 23:35:02 -04:00
  • 4e96a812b3 [SYCL] Windows default build instructions without -DLLAMA_SYCL_F16 flag activated (#6767) b2716 Anas Ahouzi 2024-04-23 02:53:18 +02:00
  • b1a189115e Server: add tests for consistent results Johannes Gäßler 2024-04-22 22:02:37 +02:00
  • 192090bae4 llamafile : improve sgemm.cpp (#6796) b2715 Justine Tunney 2024-04-22 15:00:36 -04:00
  • 90e99eaf1c fix an offset error, and get rid of tabs. Julia Longtin 2024-04-22 18:29:31 +00:00
  • 6d16090246 fix some small errors. Julia Longtin 2024-04-22 18:22:22 +00:00
  • e298d9e65e further optimizations. 0.99 tokens per second. Julia Longtin 2024-04-22 18:16:28 +00:00
  • d22488c040 Address review comments Justine Tunney 2024-04-22 11:07:33 -07:00
  • ca37f7d2c5 Remove debug flag mann1x 2024-04-22 20:12:15 +02:00
  • ca0409fae4 Merge branch 'ggerganov:master' into mannix-server-startup ManniX-ITA 2024-04-22 20:10:50 +02:00
  • b188c9c983 CpuSet support for Windows mann1x 2024-04-22 20:08:51 +02:00
  • 309a918ed7 Change boolean_t to bool type Tevin Wang 2024-04-22 13:49:17 -04:00
  • c70bfd7bcb cuda : "constexpr dim3" -> "const dim3" Georgi Gerganov 2024-04-22 20:31:23 +03:00
  • af5af31aa0 Update boolean isShard to snake case Tevin Wang 2024-04-22 13:29:56 -04:00
  • c2691d968a disable for multi-gpu and batch size > 1 Alan Gray 2024-04-22 09:01:44 -07:00
  • 5408d55506 cuda : uint -> uint32_t Georgi Gerganov 2024-04-22 19:12:06 +03:00
  • 71ff763e0e feat: set proper vocab settings Joan Martinez 2024-04-22 18:02:48 +02:00
  • 64cd4b1339 fix: fix linting and editor Joan Martinez 2024-04-22 17:42:48 +02:00
  • 141eb5107f Update llama_model_quantize_params z5269887 2024-04-22 23:38:09 +08:00
  • d6e453eb6c Split model correctly even if tensor id is out-of-order z5269887 2024-04-22 23:16:23 +08:00
  • 6d66e609b5 Update examples/quantize/quantize.cpp jiez 2024-04-22 22:27:25 +08:00
  • e931888d50 ggml : fix calloc argument ordering. (#6820) b2714 Dave Airlie 2024-04-23 00:05:06 +10:00 (see the calloc sketch after this list)
  • 8960fe86ae llama : fix typo in <|im_end|> token text (#6745) Georgi Gerganov 2024-04-22 15:41:11 +03:00
  • 0054f3681b make github CI happy zhou.weiguo 2024-04-22 20:13:45 +08:00
  • 800f4fe48e Tidied to now only use CUDA runtime (not mixed with driver calls) Alan Gray 2024-04-22 04:50:39 -07:00
  • f725ca90fb ggml : ggml_soft_max support F16/F32 mask/pos Georgi Gerganov 2024-04-22 13:46:23 +03:00
  • c1c0f4d883 fix: fix convert formatting Joan Martinez 2024-04-22 13:45:32 +02:00
  • db7e8ce58f Merge branch 'master' into feat-jina-embeddings Joan Fontanals 2024-04-22 13:31:24 +02:00
  • d6ac931b7a fix: fix small detail Joan Martinez 2024-04-22 13:23:00 +02:00
  • c0956b09ba ci: fix jobs cancelling each other (#6781) b2712 Pierrick Hymbert 2024-04-22 13:22:54 +02:00
  • 795ff1d3d3 fix: revert some changes Joan Martinez 2024-04-22 13:20:03 +02:00
  • e2323706e4 fix: revert changes to Makefile and CMakeLists Joan Martinez 2024-04-22 13:15:34 +02:00
  • c229e48937 fix: clean up some unused vars Joan Martinez 2024-04-22 13:12:14 +02:00
  • 63a1d7c0be fix: clean prints Joan Martinez 2024-04-22 13:06:05 +02:00
  • cf1c1447e3 fix: fix usage of ALIBI Joan Martinez 2024-04-22 13:05:26 +02:00
  • c8dd0e7c1c Fix issues raised in comments Alan Gray 2024-04-22 01:32:06 -07:00
  • e9b4a1bf68 flake.lock: Update github-actions[bot] 2024-04-21 00:17:47 +00:00
  • c11d05fec0 llama : force disable flash attention for incompatible models Georgi Gerganov 2024-04-22 12:50:41 +03:00
  • cb76d747d1 ggml : fix num dimensions in ggml_flash_attn_ext Georgi Gerganov 2024-04-22 12:50:26 +03:00
  • a39217d428 common : print --flash-attn in help Georgi Gerganov 2024-04-22 12:50:10 +03:00
  • 124e4dced2 Update test-bench Aidan 2024-04-22 10:42:32 +01:00
  • a202b56127 add header ngxson 2024-04-22 09:04:24 +02:00
  • 98c46cfbfa fix llama_chat_apply_template ngxson 2024-04-22 08:49:57 +02:00
  • 4a7f9b4aee ggml : fix calloc argument ordering. Dave Airlie 2024-04-22 15:14:37 +10:00
  • d6df8ecb9c refactor chat template api ngxson 2024-04-22 06:45:01 +02:00
  • 9cba545fbf ggml: add new member in GGML's internal data structure zhou.weiguo 2024-04-22 07:14:19 +08:00
  • 0de4b6d0fb corrected style jhs-panda 2024-04-21 19:04:26 -04:00
  • 465a1a2fa5 removed print statements and cleaned up code jhs-panda 2024-04-21 18:56:24 -04:00
  • 64e2abe95b does not show progress bar if downloaded in shards jhs-panda 2024-04-21 18:08:31 -04:00
  • 6b220dca32 Help clang produce fma instructions Justine Tunney 2024-04-21 12:22:39 -07:00
  • 8360e0c960 no need to add a NUL to the std::vector; std::string can be initialized from a pair of iterators (see the sketch after this list) Pierrick Hymbert 2024-04-21 21:00:34 +02:00
  • 9d4d14c9b0 Address review comments Justine Tunney 2024-04-21 11:56:15 -07:00
  • a9a2983630 Merge remote-tracking branch 'origin/master' into grammar-reps ochafik 2024-04-21 18:52:34 +01:00
  • 5cf5e7d490 build: generate hex dump of server assets during build (#6661) b2710 Olivier Chafik 2024-04-21 18:48:53 +01:00
  • 24769f9a80 grammars: fix bad merge ochafik 2024-04-21 18:34:59 +01:00
  • 1fb300c311 Merge remote-tracking branch 'origin/master' into grammar-fast ochafik 2024-04-21 18:31:21 +01:00
  • 5cf8ccb191 llama : minor Georgi Gerganov 2024-04-21 20:06:30 +03:00
  • eb9a1ff63d add chat_get_added_part ngxson 2024-04-21 18:13:42 +02:00
  • 40f74e4d73 llama : add option to render special/control tokens (#6807) b2709 Georgi Gerganov 2024-04-21 18:36:45 +03:00
  • cbc75809be grammars: faster llama_grammar_copy ochafik 2024-04-21 15:52:25 +01:00
  • f608415de0 grammars: cache decoded tokens ochafik 2024-04-21 15:52:16 +01:00
  • 98f33bae76 grammars: early exit when no next_candidates to reject ochafik 2024-04-21 01:12:05 +01:00
  • b9cc76d87e ggml : fix ggml_backend_cpu_supports_op() for CPY (#0) b2708 Georgi Gerganov 2024-04-21 16:47:57 +03:00
  • 8ac7656bd1 examples/main: basic multimodal support ported from llava-cli Ivan Chikish 2024-04-21 16:21:09 +03:00
  • d2b7b46225 llava.cpp: allow --image from pipes/sockets Ivan Chikish 2024-02-26 01:34:39 +03:00
  • 7dbdba5690 llama : add llama-3 chat template (#6751) b2707 Wouter 2024-04-21 15:03:39 +02:00
  • ed5d273c4d swift : fix build Georgi Gerganov 2024-04-21 16:02:41 +03:00
  • c1386c936e gguf-py : add IQ1_M to GGML_QUANT_SIZES (#6761) pmysl 2024-04-21 14:49:30 +02:00
  • e8d35f47cb doc : add link to falcon (#6789) Jan Boon 2024-04-21 20:35:40 +08:00
  • 2cca09d509 readme : add Fedora instructions (#6783) Mohammadreza Hendiani 2024-04-21 16:02:05 +03:30
  • 16f8bba496 Merge branch 'master' into master Georgi Gerganov 2024-04-21 15:21:38 +03:00
  • 89b0bf0d5d llava : use logger in llava-cli (#6797) Justine Tunney 2024-04-21 08:19:04 -04:00
  • 1f45c2adc7 readme : add API change notice Georgi Gerganov 2024-04-21 15:15:39 +03:00
  • 0a37fb2357 llama : add option to render special tokens Georgi Gerganov 2024-04-21 15:12:15 +03:00
  • f53661a595 make : fix common dep on llama.h Georgi Gerganov 2024-04-21 15:11:51 +03:00
  • b97bc3966e llama : support Llama 3 HF conversion (#6745) b2702 Pedro Cuenca 2024-04-21 13:50:41 +02:00
  • ff5d21e608 switch to namedtuple, no need for a dataclass Sigbjørn Skjæret 2024-04-21 11:43:43 +02:00
  • e5956f5bbe make script executable Sigbjørn Skjæret 2024-04-21 11:42:12 +02:00
  • c971ac034c llama : fix model type string for 8B model Georgi Gerganov 2024-04-21 12:03:21 +03:00
  • 23b8dd7dd4 llama : fix codegemma EOT token + add TODOs Georgi Gerganov 2024-04-21 11:15:45 +03:00
  • 7ab0939c0d convert : replacing EOS token is a hack Georgi Gerganov 2024-04-21 11:15:18 +03:00
  • d0a4cc8ec8 llama : auto-detect more EOT tokens when missing in KV data Georgi Gerganov 2024-04-21 11:14:19 +03:00
  • ce89dad24c flake.lock: Update github-actions[bot] 2024-04-21 00:17:47 +00:00
  • d9fd65b0e3 llava : use logger in llava-cli Justine Tunney 2024-04-20 13:32:37 -07:00
  • 3e4fc41505 llamafile : improve sgemm.cpp Justine Tunney 2024-04-20 13:04:45 -07:00
  • 9037892127 ChatON: Add a note HanishKVC 2024-04-20 23:42:25 +05:30
  • e23b5c8689 ChatOn+Main: Cleanup the Requested ChatOn ReversePrompt handling HanishKVC 2024-04-20 23:26:16 +05:30
  • ca55da2b6f ChatOn+Main: ChatApplyTemplateSimple cleanup HanishKVC 2024-04-20 22:41:27 +05:30
  • aac2ee6e9d Common:ChatOn+Main:DBUG: Cleanup ChatTmplSimp, RevPrompt Llama2 HanishKVC 2024-04-20 20:08:00 +05:30
  • b8109bc013 doc : server tests require llama to be built with curl enabled (#6788) b2701 Jan Boon 2024-04-21 00:29:50 +08:00
  • bb98fc6870 doc : server tests require llama to be built with curl enabled Jan Boon 2024-04-21 00:18:56 +08:00
  • 4ba357f68c doc : add link to falcon Jan Boon 2024-04-21 00:04:40 +08:00
  • 4b47c24bf2 llama : add llama_token_is_eog() (see the sketch after this list) Georgi Gerganov 2024-04-20 16:46:46 +03:00
  • 3750706962 llama : add llama_token_is_eog() gg/llama3-support Georgi Gerganov 2024-04-20 16:46:46 +03:00
  • 2b2fd541c2 fix title Mohammadreza Hendiani 2024-04-20 17:11:33 +03:30
  • 92b6b94602 removed old instructions Mohammadreza Hendiani 2024-04-20 17:08:03 +03:30
  • cf74dd7fbf made instructions cleaner Mohammadreza Hendiani 2024-04-20 17:06:42 +03:30
  • f83cf85c85 removed rest of 'cli commands' Mohammadreza Hendiani 2024-04-20 17:02:02 +03:30
  • 4adabc6640 renamed 'cli commands' to 'numa' to be coherent with the rest of the readme Mohammadreza Hendiani 2024-04-20 16:58:05 +03:30
  • 0a8797b28e Main:Update to support chaton mode HanishKVC 2024-04-20 18:40:55 +05:30
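
Notes on three of the changes above (hedged sketches, not the exact patches):

The calloc fix (e931888d50 / 4a7f9b4aee) is about argument order: calloc takes the element count first and the element size second. Transposing them requests the same number of bytes but inverts the arguments' meaning, and recent compilers can warn about it. A minimal C++ sketch; the buffer name and count are made up:

    #include <cstdlib>

    int main() {
        const std::size_t n_items = 64; // hypothetical element count
        // calloc(count, size): element count first, element size second
        float * buf = static_cast<float *>(calloc(n_items, sizeof(float)));
        // calloc(sizeof(float), n_items) would request the same byte total,
        // but with the arguments' roles swapped
        free(buf);
        return 0;
    }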
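Commit 8360e0c960 records a standard C++ idiom: std::string tracks its own length, so a byte buffer needs no manually appended NUL before conversion; the iterator-pair constructor copies exactly the bytes given. A sketch with made-up contents:

    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        std::vector<char> buf = {'h', 'e', 'l', 'l', 'o'}; // e.g. bytes read from a file
        // No trailing '\0' required: this constructor copies exactly
        // buf.size() bytes and std::string stores the length itself
        std::string s(buf.begin(), buf.end());
        std::cout << s << "\n"; // prints "hello"
        return 0;
    }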
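llama_token_is_eog() (4b47c24bf2, part of the Llama 3 work) generalizes the stop check: Llama 3 can end a turn with <|eot_id|> rather than <|end_of_text|>, so a generation loop should test for any end-of-generation token instead of comparing against llama_token_eos() alone. A hedged call-site sketch; the loop and the sample_next() helper are assumptions, not code from the patch:

    #include "llama.h"

    llama_token sample_next(llama_context * ctx); // hypothetical sampler

    void generate(llama_model * model, llama_context * ctx) {
        for (;;) {
            const llama_token id = sample_next(ctx);
            if (llama_token_is_eog(model, id)) {
                break; // EOS, EOT, or another end-of-generation token
            }
            // ... decode/emit `id` ...
        }
    }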