Commit graph

  • fa864af945 update shared state n_threads in parallel region slaren 2024-05-30 09:47:29 +02:00
  • 972b555ab9 README: explain parallel build [no ci] (#7618) Johannes Gäßler 2024-05-30 09:52:39 +02:00
  • 7918ed7f2c ggml : Limit the number of threads used to avoid deadlock msy-kato 2024-05-30 14:05:52 +09:00
  • 0d75e07bd9 Merge branch 'ggerganov:master' into server-ui-pr Yazan Agha-Schrader 2024-05-30 08:28:26 +02:00
  • 3854c9d07f [SYCL] fix intel docker (#7630) b3042 Meng, Hengyu 2024-05-30 14:19:08 +08:00
  • 02f41853e4 add missed in server Meng, Hengyu 2024-05-30 05:26:29 +00:00
  • 73747fe8eb proof-of-concept stdlib implementation Christian Zhou-Zheng 2024-05-30 00:31:29 -04:00
  • b454089d66 reset intel docker in CI Meng, Hengyu 2024-05-30 02:52:13 +00:00
  • ff46daec13 workaround for https://github.com/intel/oneapi-containers/issues/70 Meng, Hengyu 2024-05-30 02:51:19 +00:00
  • 505d0a3346 move new ui to "/public" due to otherwise problematic CORS behaviour Yazan Agha-Schrader 2024-05-30 04:00:56 +02:00
  • 8b937a1a71 add a button to the new ui Yazan Agha-Schrader 2024-05-30 03:59:28 +02:00
  • e41e73412c Update main-intel.Dockerfile Meng, Hengyu 2024-05-30 09:55:53 +08:00
  • d3179662cd Merge branch 'ggerganov:master' into add-inferenceable Sameer Charles 2024-05-30 10:59:56 +10:00
  • 6f9718b69a Merge branch 'ggerganov:master' into master Andrew Ferruolo 2024-05-29 20:52:58 -04:00
  • eb57fee51f gguf-py : Add tokenizer.ggml.pre to gguf-new-metadata.py (#7627) Galunid 2024-05-30 02:10:40 +02:00
  • 734be4dcc9 Merge branch 'master' into server-ui-pr Yazan Agha-Schrader 2024-05-30 01:47:22 +02:00
  • d55081767c fix css path Yazan Agha-Schrader 2024-05-30 01:17:47 +02:00
  • 89b1b38144 move files, clean code Yazan Agha-Schrader 2024-05-30 01:13:10 +02:00
  • fef99155cc Build vocab.special_tokens_cache using vocab token types jaime-m-p 2024-05-29 23:57:41 +02:00
  • c8f93774c9 Fix MUL_MAT_ID matrix vector shader dispatch size 0cc4m 2024-05-29 23:38:07 +02:00
  • 8ecdda1e5e Merge remote-tracking branch 'origin/master' into 0cc4m/vulkan-moe 0cc4m 2024-05-29 22:42:59 +02:00
  • 45928e8d21 Fix MUL_MAT_ID matrix matrix shader 0cc4m 2024-05-29 22:42:07 +02:00
  • 62820eea95 Fix accidental print Galunid 2024-05-29 22:40:29 +02:00
  • 63de7201fa set default prompt to empty Yazan Agha-Schrader 2024-05-29 22:34:15 +02:00
  • dcdc11a5c4 add cmd-r prompt and reduce redundancy Yazan Agha-Schrader 2024-05-29 22:24:24 +02:00
  • 87bcbbb6c2 fix toggle state localstorage Yazan Agha-Schrader 2024-05-29 22:23:40 +02:00
  • cc7aef6829 fix AMD Johannes Gäßler 2024-05-29 21:39:57 +02:00
  • 55d62262a9 metal : remove invalid asserts (#7617) b3040 Georgi Gerganov 2024-05-29 22:20:40 +03:00
  • 8a8f8b953f llama : print a log of the total cache size gg/cache-token-to-piece Georgi Gerganov 2024-05-29 21:44:55 +03:00
  • 95a5896870 Add tokenizer.ggml.pre to gguf-new-metadata.py Galunid 2024-05-29 20:35:24 +02:00
  • 62056fa679 add autogenerated .cu files Johannes Gäßler 2024-05-29 20:14:31 +02:00
  • 2eb0f7f7e8 make generate_cu_files.py executable Johannes Gäßler 2024-05-29 20:12:16 +02:00
  • 9740ae0adc fix cmake Johannes Gäßler 2024-05-29 20:10:10 +02:00
  • 1494a1841e llama : throw on unknown tokenizer types Georgi Gerganov 2024-05-29 21:06:56 +03:00
  • c2badb4697 add hacky llama2 prompt solution, reduce redundancy in promptFormats.js Yazan Agha-Schrader 2024-05-29 20:03:20 +02:00
  • 21ccd645df llama : use vectors and avoid has_cache Georgi Gerganov 2024-05-29 20:56:52 +03:00
  • 975ec63ff2 metal : add missing asserts (#7617) b3039 Georgi Gerganov 2024-05-29 20:45:25 +03:00
  • af95ae49a3 fix metal tests Johannes Gäßler 2024-05-29 19:34:37 +02:00
  • 9964cd02f7 llama : cache llama_token_to_piece Georgi Gerganov 2024-05-28 13:15:27 +03:00
  • fb76ec31a9 ggml : fix YARN + add tests + add asserts (#7617) b3038 Georgi Gerganov 2024-05-29 20:17:31 +03:00
  • de0f0d0016 Merge branch 'master' into auto-model-support teleprint-me 2024-05-29 12:46:06 -04:00
  • aa15e7d246 tests : reduce RoPE tests Georgi Gerganov 2024-05-29 19:01:06 +03:00
  • a69935baac ggml : assert contiguousness Georgi Gerganov 2024-05-29 18:55:10 +03:00
  • 61d44b0089 fix flake8 Johannes Gäßler 2024-05-29 17:09:25 +02:00
  • 182f70648e docker.yml: reenable intel docker builds brian khuu 2024-05-30 01:06:37 +10:00
  • 5db268c9d8 cuda : add asserts for rope/norm + fix DS2 Georgi Gerganov 2024-05-29 17:44:17 +03:00
  • 1e41f2fc7e tests : add non-cont tests Georgi Gerganov 2024-05-29 15:21:02 +03:00
  • b822605abd ggml : fixes (hopefully) Georgi Gerganov 2024-05-29 14:50:22 +03:00
  • 9d5605f965 tests : add rope tests Georgi Gerganov 2024-05-29 14:31:50 +03:00
  • f0e220bacf Add brew installation instruction to README [no ci] Manuel 2024-05-29 12:58:28 +02:00
  • d193b2dcdf README: explain parallel build [no ci] Johannes Gäßler 2024-05-29 14:50:39 +02:00
  • cce3dcffc5 cuda : non-cont concat support (#7610) b3037 Georgi Gerganov 2024-05-29 15:38:26 +03:00
  • 84d9277fe2 split fattn compile via extern templates Johannes Gäßler 2024-05-28 16:45:22 +02:00
  • 1c24ab6e20 move prompt style Yazan Agha-Schrader 2024-05-29 14:09:19 +02:00
  • 210d99173d llama-bench : add support for the RPC backend (#7435) b3036 Radoslav Gerganov 2024-05-29 14:45:44 +03:00
  • 87bdf2a199 ggml : use atomic_flag for critical section (#7598) b3035 slaren 2024-05-29 13:36:39 +02:00
  • f2ef89415c do not separate with new line or comma Yazan Agha-Schrader 2024-05-29 13:36:07 +02:00
  • 39a163f76e add missing char Yazan Agha-Schrader 2024-05-29 13:32:33 +02:00
  • 00281b7be3 scripts : remove mpi remnants Georgi Gerganov 2024-05-29 14:31:18 +03:00
  • 2ab977282b sync : ggml b3033 Georgi Gerganov 2024-05-29 14:29:52 +03:00
  • 72de268bec ggml : restore ggml_rope_xpos_inplace (ggml/0) Georgi Gerganov 2024-05-26 18:35:23 +03:00
  • 513406ab60 add more common stop tokens Yazan Agha-Schrader 2024-05-29 13:29:00 +02:00
  • 80b6143f78 more prompt format fixes Yazan Agha-Schrader 2024-05-29 13:19:22 +02:00
  • ca565f4ed6 fix llama3 prompt template Yazan Agha-Schrader 2024-05-29 12:08:39 +02:00
  • 9fa0aa53f5 fix chatml & add llama3 format Yazan Agha-Schrader 2024-05-29 11:26:34 +02:00
  • e9b940e3b7 github: add contact links to issues and convert question into research [no ci] brian khuu 2024-05-29 19:02:34 +10:00
  • 5fa255edfb add user message suffix Yazan Agha-Schrader 2024-05-29 10:28:07 +02:00
  • 7291bd8323 Merge b414599a72 into 0e8d8bfd6c kunnis 2024-05-29 09:40:57 +02:00
  • 3136da6f47 Merge branch 'ggerganov:master' into add-inferenceable towardmay 2024-05-29 17:15:30 +10:00
  • 738008fbcc cuda : non-cont concat support Georgi Gerganov 2024-05-29 10:14:34 +03:00
  • eac8d739a5 update forgotten css theme Yazan Agha-Schrader 2024-05-29 08:54:04 +02:00
  • 0e8d8bfd6c Add Arc A750 and Arch linux to readme-sycl.md as verified GPU model and Linux distro (#7605) Akarshan Biswas 2024-05-29 12:23:47 +05:30
  • 3e88692988 llama-bench : add support for the RPC backend Radoslav Gerganov 2024-05-21 11:19:42 +03:00
  • 81c983a4cf tests : add non-cont concat tests Georgi Gerganov 2024-05-29 09:45:33 +03:00
  • aa493e022d add css class Yazan Agha-Schrader 2024-05-29 08:45:20 +02:00
  • 200c39cc6b batched : make n_threads and n_threads_batch configurable in batched & batched-bench msy-kato 2024-05-27 18:52:57 +09:00
  • 993c5f3389 Readme: add HyperMink/inferenceable to HTTP server nobody 2024-05-29 15:52:03 +10:00
  • 88dc99a20b Merge branch 'threadpool' of https://github.com/CodeLinaro/llama.cpp into threadpool fmz 2024-05-28 22:35:31 -07:00
  • 65c11d415d llama-bench threadpool CLI params fmz 2024-05-28 22:31:15 -07:00
  • f3bc337f43 optimize convert-hf-to-gguf.py for chatglm model XingXing Qiao 2024-05-16 11:42:53 +08:00
  • f626b7175c fix lint error XingXing Qiao 2024-05-24 14:13:36 +08:00
  • 5a914ffce0 remove .rotary_pos_emb.inv_freq and unused code for chatglm3 model XingXing Qiao 2024-05-15 11:00:04 +08:00
  • 6630a2da48 add chatglm3-6b model support (HuggingFace model: https://hf-mirror.com/THUDM/chatglm3-6b) XingXing Qiao 2024-05-29 13:30:07 +08:00
  • e300f86096 Add Arc A750 and Arch linux to readme-sycl.md as verified GPU model and Linux distro Akarshan Biswas 2024-05-29 10:58:16 +05:30
  • e9a70b10c2 ggml: Added OpenMP for multi-threaded processing msy-kato 2024-05-29 13:33:37 +09:00
  • 9bb074e1f6 add phi3 to dropdown Yazan Agha-Schrader 2024-05-29 06:28:27 +02:00
  • be675948d4 add phi-3 prompt template Yazan Agha-Schrader 2024-05-29 05:28:52 +02:00
  • 504f0c340f ggml : fix typo in ggml.c (#7603) b3030 zhouwg 2024-05-29 10:09:31 +08:00
  • d53f405727 fix typo in ggml.c zhou.weiguo 2024-05-29 09:33:35 +08:00
  • 6a725cf2d1 Merge branch 'master' into auto-model-support teleprint-me 2024-05-28 19:19:08 -04:00
  • 5c92809397 refactor: Apply updates to example script for generating the registry teleprint-me 2024-05-28 19:16:52 -04:00
  • f1d067e7a6 refactor: Simplify huggingface hub api and update to reflect changes in constants.py teleprint-me 2024-05-28 19:16:32 -04:00
  • b864b50ce5 [SYCL] Align GEMM dispatch (#7566) b3029 Meng, Hengyu 2024-05-29 07:00:24 +08:00
  • aa28cfe6ec chore: Fix import path, token comparisons, and update token type references teleprint-me 2024-05-28 18:44:57 -04:00
  • 9dbc9571a3 refactor: Simplify tokenizers implementation teleprint-me 2024-05-28 18:42:39 -04:00
  • 19db321a61 add windows shims slaren 2024-05-28 22:47:52 +02:00
  • c38d152d7d fix warnings caitianchi 2024-05-29 04:35:08 +08:00
  • 45979b05f4 ggml : use atomic_flag for critical section slaren 2024-05-28 22:11:32 +02:00
  • 07f48f9669 fix warnings caitianchi 2024-05-29 04:09:44 +08:00
  • 02c1ecad07 Tokenizer WPM fixes (#7500) b3028 jaime-m-p 2024-05-28 21:46:34 +02:00