Commit graph

  • 53dbba1ce8 server: tests: add new tokens regex on windows generated following new repeat penalties default changed in (#6127) Pierrick HYMBERT 2024-03-20 05:43:28 +01:00
  • 6c0b287748 update readme sycl for new update (#6151) Neo Zhang Jianyu 2024-03-20 11:21:41 +08:00
  • d26e8b669d increase igpu cluster limit (#6159) b2466 Abhilash Majumder 2024-03-20 08:28:49 +05:30
  • d9feb41cb2 fix hip slaren 2024-03-20 03:37:40 +01:00
  • 37db7ef9d0 Update README-sycl.md Neo Zhang Jianyu 2024-03-20 10:27:59 +08:00
  • 7befc54e82 fix leaks slaren 2024-03-20 03:20:22 +01:00
  • 4fd4344216 update for verify device id part jianyuzh 2024-03-20 09:56:46 +08:00
  • 64d6765a09 update w64devkit link jianyuzh 2024-03-20 09:50:53 +08:00
  • 9ac1008fd3 update by review comments jianyuzh 2024-03-20 09:42:07 +08:00
  • 329ae27760 Update README-sycl.md Neo Zhang Jianyu 2024-03-20 09:39:49 +08:00
  • 89c37062a4 Update README-sycl.md Neo Zhang Jianyu 2024-03-20 09:38:29 +08:00
  • 513fbf094e Merge branch 'ggerganov:master' into master Yingbei Tong 2024-03-20 00:08:13 +00:00
  • 177414d522 support tools call field. tested on both local and openai Yingbei 2024-03-19 17:05:03 -07:00
  • 5b04360420 fix hip slaren 2024-03-19 23:44:04 +01:00
  • 9c72e1dc83 cuda : refactor to remove global resources slaren 2024-03-19 02:32:33 +01:00
  • 9a424a3872 server : fix tests expecting old repeat penalty compilade/fix-server-tests-penalty Francis Couture-Harpin 2024-03-19 17:12:28 -04:00
  • e5ed3077b5 fix build ngxson 2024-03-19 21:49:10 +01:00
  • e1c7a3fb01 server: version bump for httplib and json ngxson 2024-03-19 21:34:10 +01:00
  • 9419190533 Pipeline KV operations Branden Butler 2024-03-19 15:01:39 -05:00
  • 7a3c908512 Merge branch 'master' into sycl_readme_update OuadiElfarouki 2024-03-19 19:53:52 +00:00
  • 615a3a4a50 llama : clearer error messages for invalid logits or embeddings ids Francis Couture-Harpin 2024-03-19 15:01:21 -04:00
  • 8ddd557d3e Add fallback for integrated GPUs if no dedicated GPUs are found 0cc4m 2024-03-19 20:08:14 +01:00
  • bcdd6531db Default to all dedicated GPUs 0cc4m 2024-03-19 19:50:29 +01:00
  • d2de181d95 Port transactions from mpi-speculative, fix incorrect seq_id syncing (not tested) Branden Butler 2024-03-19 13:42:49 -05:00
  • 8f70dcb0f3 perplexity : make Winogrande work as it does on master Francis Couture-Harpin 2024-03-19 14:07:48 -04:00
  • ab5f5f57ea Trimed white spaces OuadiElfarouki 2024-03-19 17:48:46 +00:00
  • 458c98bb33 Merge branch 'ggerganov:master' into cluster_igpu Abhilash Majumder 2024-03-19 21:58:13 +05:30
  • d8b009a945 Remove undeed header file. (#6158) b2465 DAN™ 2024-03-19 12:16:09 -04:00
  • 57ac2e7268 Fix all examples to use new thread vector params, remove mpi example Branden Butler 2024-03-19 11:13:38 -05:00
  • 7fca458615 pragma unroll, use_mask template parameter Johannes Gäßler 2024-03-19 12:00:51 +01:00
  • be63161d04 Fix incorrect sched hash size, refactor new cmdline params to align with new style Branden Butler 2024-03-19 11:02:18 -05:00
  • 09dd23d92b server tests: fix connect on dual-stack systems Jared Van Bortel 2024-03-19 11:37:17 -04:00
  • 4479b85b8a server tests: handle TimeoutExpired exception Jared Van Bortel 2024-03-19 11:03:11 -04:00
  • 6014a63125 Update build.yml fraxy-v 2024-03-19 16:52:01 +02:00
  • d80d415518 Update examples/llava/convert-image-encoder-to-gguf.py Ziang Wu 2024-03-19 22:47:23 +08:00
  • 3e31203b8d Update examples/llava/MobileVLM-README.md Ziang Wu 2024-03-19 22:46:47 +08:00
  • 56360a77ae llama: fix formatting of llm_load_tensors logs Daniel Bevenius 2024-03-19 15:29:43 +01:00
  • 927be9b58e add test in build action Y. Velkov 2024-03-19 16:25:23 +02:00
  • 284800b1e3 convert-llama2c-to-ggml: enable conversion of multiqueries, #5608 Y. Velkov 2024-03-19 15:23:59 +02:00
  • 0d59e213da increase igpu cluster limit Abhilash Majumder 2024-03-19 17:56:47 +05:30
  • f93089d3f7 Addressed PR comments OuadiElfarouki 2024-03-19 11:59:11 +00:00
  • 7fc759b84f json: fix date pattern Olivier Chafik 2024-03-19 11:59:06 +00:00
  • 08f24b1e06 Remove undeed header file. DAN™ 2024-03-19 07:38:40 -04:00
  • d0d5de42e5 gguf-split: split and merge gguf per batch of tensors (#6135) Pierrick Hymbert 2024-03-19 12:05:44 +01:00
  • 538db34084 Add nvidia and amd backends Aidan 2024-03-19 10:58:26 +00:00
  • 40dd802e5d feat: add chat template to existed gguf file bruce 2024-03-19 18:38:30 +08:00
  • f5fed7404e add quant types from cuda abhilash1910 2024-03-19 03:19:51 -07:00
  • 7466e4edef add quants abhilash1910 2024-03-19 03:12:36 -07:00
  • 2dc6830a27 gguf-split: remove --upload not implemented Pierrick HYMBERT 2024-03-19 10:42:33 +01:00
  • 86386e2ca7 Rework working buffer allocation, reduces vram use noticeably 0cc4m 2024-03-19 10:41:12 +01:00
  • 7f0e73b27a split : minor style + fix compile warnings Georgi Gerganov 2024-03-19 11:17:29 +02:00
  • 874599e749 json: create examples/json-schema-pydantic-example.py ochafik 2024-03-19 09:10:39 +00:00
  • 60392d78a0 Update README-sycl.md Neo Zhang Jianyu 2024-03-19 16:58:33 +08:00
  • 28b6f88ee0 Update README-sycl.md Neo Zhang Jianyu 2024-03-19 16:58:22 +08:00
  • 634e3fca41 Update README-sycl.md Neo Zhang Jianyu 2024-03-19 16:58:12 +08:00
  • 6299e8d4c3 Update README-sycl.md Neo Zhang Jianyu 2024-03-19 16:56:03 +08:00
  • 4b7aaae8f3 Merge branch 'ggerganov:master' into iq2_s Abhilash Majumder 2024-03-19 14:08:56 +05:30
  • 7f70fbe227 Merge pull request #6 from ggerganov/iq2_s Abhilash Majumder 2024-03-19 14:07:32 +05:30
  • b80cf3b2d1 common : disable repeat penalties by default (#6127) b2463 Georgi Gerganov 2024-03-19 10:21:54 +02:00
  • 970a48060a ci : exempt some labels from being tagged as stale (#6140) b2462 slaren 2024-03-19 09:06:54 +01:00
  • 9fa92aa789 fix build abhilash1910 2024-03-19 00:13:08 -07:00
  • 4c28b82529 common : print usage on '-h' and '--help' (#6145) b2461 DAN™ 2024-03-19 01:59:36 -04:00
  • a553def52e refactor logic abhilash1910 2024-03-18 22:11:19 -07:00
  • df23000795 Update MobileVLM-README.md Ziang Wu 2024-03-19 12:13:34 +08:00
  • b1215c6d2a Add MobileVLM_V2 backup ZiangWu 2024-03-19 12:09:50 +08:00
  • cc551dfdfe Fix breaks in gpt_params_find_arg Branden Butler 2024-03-18 21:56:47 -05:00
  • 1d744d8226 Merge branch 'master' into mpi-heterogenous Branden Butler 2024-03-18 21:49:31 -05:00
  • 155eeed9ae return raw content if parse failed Yingbei 2024-03-18 19:31:41 -07:00
  • 263a86e148 json: cleaner build of test ochafik 2024-03-19 02:12:15 +00:00
  • 84da0f553e update readme sycl for new update jianyuzh 2024-03-19 09:54:39 +08:00
  • 02e3bde6b4 json: don't complain about unknown format type in server if unset ochafik 2024-03-19 01:45:23 +00:00
  • d04cfaf2f5 llama : fix llama_output_reserve nullptr deref when new_size is 0 Francis Couture-Harpin 2024-03-18 21:26:08 -04:00
  • e7de6433cb json: catch schema conversion errors in server ochafik 2024-03-19 01:21:49 +00:00
  • 9bd7dbb17b a first working version integrated tree_sitter with python parser code Yingbei 2024-03-18 18:21:33 -07:00
  • 8b826c5b08 ggml : skip empty tensors in all backends Francis Couture-Harpin 2024-03-18 21:12:53 -04:00
  • 4551e7eba8 llama : use a vector for ctx->output_ids Francis Couture-Harpin 2024-03-18 20:51:32 -04:00
  • 09bb15a66a ggml : make ggml_is_empty public and work with views Francis Couture-Harpin 2024-03-18 20:21:02 -04:00
  • 05fd7e3020 json: fix json handling in server when there's no response_format ochafik 2024-03-18 20:46:57 +00:00
  • eecaf58e9a Homogenize Llama, Mistral, Mixtral under the same entry. Pedro Cuenca 2024-03-18 21:02:22 +01:00
  • 2d15886bb0 flake.lock: Update b2460 github-actions[bot] 2024-03-17 06:37:44 +00:00
  • 731134958f server tests : do not catch e.g. SystemExit; use print_exc Jared Van Bortel 2024-03-18 13:53:46 -04:00
  • 70d675580f server tests : use built-in subprocess features, not os.kill and psutil Jared Van Bortel 2024-03-18 13:45:03 -04:00
  • 4560484488 server tests : remove seemingly redundant newlines in print() Jared Van Bortel 2024-03-18 14:26:02 -04:00
  • 1cd0029edf Print usage on '-h' and '--help'. DAN™ 2024-03-18 14:20:39 -04:00
  • 5d30eb3236 removed outdated comment OuadiElfarouki 2024-03-18 18:13:24 +00:00
  • 77f313e31b Allow conversion of Mistral HF models Pedro Cuenca 2024-03-18 18:56:28 +01:00
  • 82f7512df3 Revisited & updated SYCL build documentation OuadiElfarouki 2024-03-18 17:05:14 +00:00
  • 32589a642f supress assert Abhilash Majumder 2024-03-18 22:25:19 +05:30
  • b3a94dd9e0 gguf-split: rename --split-tensors-size to --split-max-tensors. Set general.split_count KV to all split Pierrick HYMBERT 2024-03-18 17:51:20 +01:00
  • f2eb4ad472 ci : exempt some labels from being tagged as stale slaren 2024-03-18 17:50:40 +01:00
  • 33c72d02ff gguf-split: build with make toolchain Pierrick HYMBERT 2024-03-18 17:50:20 +01:00
  • d199ca79f2 mpt : implement backwards compatiblity with duped output tensor (#6139) b2459 Jared Van Bortel 2024-03-18 12:49:02 -04:00
  • 3f4b657a27 fix missing argument to create_tensor Jared Van Bortel 2024-03-18 12:38:12 -04:00
  • e5da60b978 address review comment Jared Van Bortel 2024-03-18 12:35:01 -04:00
  • e2a4cacc0c mpt : implement backwards compatiblity with duped output tensor Jared Van Bortel 2024-03-18 12:22:32 -04:00
  • 104f5e0fc1 clip : fix memory leak (#6138) b2458 Felix 2024-03-18 16:40:22 +01:00
  • 5e1b7f94a0 backend : set max split inputs to GGML_MAX_SRC (#6137) b2457 slaren 2024-03-18 16:33:44 +01:00
  • ac55ee567d Fix memory leak in clip.cpp felrock 2024-03-18 15:48:18 +01:00
  • f315402d9b Merge remote-tracking branch 'origin/master' into 0cc4m/vulkan-improvements 0cc4m 2024-03-18 15:37:18 +01:00
  • 15617b870c format Abhilash Majumder 2024-03-18 19:43:09 +05:30