Commit graph

  • 30f15760d8 Update CMakeLists.txt clibdev 2024-02-18 18:24:50 +02:00
  • 66c1968f7a server : graceful server shutdown (#5244) b2176 Daniel Hiltgen 2024-02-18 08:23:16 -08:00
  • 1dcc3fde00 common : fix ub (#5530) b2175 Georgi Gerganov 2024-02-18 18:21:52 +02:00
  • 5d3de51f97 ggml, common, examples, tests : fixed type arguments in printf (#5528) b2174 Herman Semenov 2024-02-18 16:20:12 +00:00
  • fc0c8d286a llava : update surgery script to not remove tensors (#5536) Daniel Bevenius 2024-02-18 17:19:23 +01:00
  • bd2d4e393b 1.5 bit quantization (#5453) b2172 Kawrakow 2024-02-18 18:16:55 +02:00
  • c8e0d7efeb flake.lock: Update github-actions[bot] 2024-02-18 00:17:07 +00:00
  • 25ed501ef1 server with graphics flag -skvg pudepiedj 2024-02-18 13:00:19 +00:00
  • aed7507eb8 server branch update with graphics pudepiedj 2024-02-18 12:08:59 +00:00
  • 9f70b0ed1d Merge branch 'master' into fix-cmake-cuda lindevs 2024-02-18 10:40:22 +02:00
  • eb7a979d65 harmonizing #if defined blocks for numa code to __gnu_linux__ since that's the only model that's being followed anyways root 2024-02-18 07:51:15 +00:00
  • 0f7495469c Add penalty_threshold parameter Philipp Emanuel Weidmann 2024-02-18 12:08:16 +05:30
  • fb770241e0 Changed gates on numa platform specific stuff to __gnu_linux__ to skip any platforms without glibc root 2024-02-18 00:38:59 +00:00
  • f01e8e5946 flake.lock: Update github-actions[bot] 2024-02-18 00:17:07 +00:00
  • 458bd9b7f5 added in some __ANDROID__ if def gates around numa code and forced GLIBC prior to 2.29 to use a syscall for getcpu instead of the wrapper root 2024-02-18 00:07:38 +00:00
  • 2890de4ecf #ifdef out some code NUMA blocks for Android due to lack of support root 2024-02-17 21:44:42 +00:00
  • 8f1be0d42f ggml : add ALiBi support for ggml_soft_max_ext (#5488) Georgi Gerganov 2024-02-17 23:04:16 +02:00
  • 6e4e973b26 ci : add an option to fail on compile warning (#3952) Ananta Bastola 2024-02-17 16:03:14 -05:00
  • 2f2973de78 remove commented code CJ Pais 2024-02-17 12:42:24 -08:00
  • 6d9fea6aae move clip_image to header CJ Pais 2024-02-17 12:40:34 -08:00
  • 44a80b4119 CUDA: switch tile sizes based on binary version JohannesGaessler 2024-02-17 20:09:13 +01:00
  • 4290af090b Merge 2d0826247b into d250c9d61d shibe2 2024-02-17 14:41:35 -05:00
  • 8354b7d2ad server: init working 1.6 CJ Pais 2024-02-17 11:32:04 -08:00
  • dba4337f85 Merge branch 'master' into xsn/chat_apply_template ngxson 2024-02-17 20:28:09 +01:00
  • 011af99af8 llama_chat_apply_template: correct docs ngxson 2024-02-17 20:24:05 +01:00
  • aaa20e1f10 test-backend-ops : add null pos test to soft_max slaren 2024-02-17 18:16:27 +01:00
  • b8e4267054 Merge af50604c6e into d250c9d61d Herman Semenov 2024-02-17 21:51:48 +05:00
  • d250c9d61d gitignore : update for CLion IDE (#5544) b2168 clibdev 2024-02-17 18:28:37 +02:00
  • 974e3cadff ggml : try another fix gg/fix-android Georgi Gerganov 2024-02-17 18:14:35 +02:00
  • 8214cbca22 Merge branch 'master' into HEAD Georgi Gerganov 2024-02-17 18:08:04 +02:00
  • bfa03a02a7 ci : add fatal warnings to ggml-ci Georgi Gerganov 2024-02-17 18:06:25 +02:00
  • 1c3e59229b ci : disable fatal warnings for MPI build Georgi Gerganov 2024-02-17 18:05:25 +02:00
  • ffab4152aa ggml : fix strncpy warning Georgi Gerganov 2024-02-17 18:03:56 +02:00
  • 7a3eac8cb3 llama_chat_apply_template: add zephyr template ngxson 2024-02-17 16:54:30 +01:00
  • e9caab61a2 ggml : no cpu_set_t on Android Georgi Gerganov 2024-02-17 17:50:39 +02:00
  • 6012ad651f add clarification for llama_chat_apply_template ngxson 2024-02-17 16:45:31 +01:00
  • ce8fe5d88d server: document new status no slot available in the README.md Pierrick HYMBERT 2024-02-17 15:05:14 +01:00
  • d1124575e3 server: fix print usage LF in new --n-predict option Pierrick HYMBERT 2024-02-17 14:54:33 +01:00
  • 8852de34be server: ensure client request cannot override n_predict if set Pierrick HYMBERT 2024-02-17 14:36:47 +01:00
  • cf7137e8d6 server: document --n-predict Pierrick HYMBERT 2024-02-17 13:18:00 +01:00
  • fb1c1d0fb1 server: enrich health endpoint with available slots, return 503 if not slots are available Pierrick HYMBERT 2024-02-17 12:45:17 +01:00
  • da50d12c45 update gitignore for clion ide clibdev 2024-02-17 11:35:00 +02:00
  • ce4a979c90 cmake cuda: fix clibdev 2024-02-17 11:27:31 +02:00
  • 168a68395c Commit all of update_token_print pudepiedj 2024-02-17 08:02:10 +00:00
  • 7d64a4d28d Wire up graceful server shutdown Daniel Hiltgen 2024-01-29 15:31:03 -08:00
  • 242f0e1c1f Trying to understand server.cpp pudepiedj 2024-02-16 17:23:42 +00:00
  • 5bf2b94dd4 cmake : fix VULKAN and ROCm builds (#5525) b2167 Georgi Gerganov 2024-02-16 19:05:56 +02:00
  • 9c4422fbe9 chat_template: do not use std::string for buffer ngxson 2024-02-16 17:04:01 +01:00
  • bba75c792f test-chat-template: remove dedundant vector ngxson 2024-02-16 16:33:49 +01:00
  • 4e644408a5 llama: add llama_chat_apply_template ngxson 2024-02-16 16:27:05 +01:00
  • 2ce173f5cf llava: update surgery script to not remove tensors Daniel Bevenius 2024-02-16 15:20:22 +01:00
  • f104678afc common, examples, llama : optimize using reserve if possible Herman Semenov 2024-02-16 16:58:45 +03:00
  • d2819d5577 scripts : add helpers script for bench comparing commits (#5521) Georgi Gerganov 2024-02-16 15:14:40 +02:00
  • 64dcb2835b fix make flags slaren 2024-02-16 14:08:32 +01:00
  • a148899966 set flags after checking the command line slaren 2024-02-16 14:06:25 +01:00
  • 4cb0727698 llava : removed excess free(NULL) operation (#5531) Herman Semenov 2024-02-16 12:43:23 +00:00
  • c506e6a820 ci : disable fatal warnings for windows, ios and tvos Georgi Gerganov 2024-02-16 14:41:20 +02:00
  • bb57ceba13 llava : removed excess free(NULL) operation Herman Semenov 2024-02-16 15:29:41 +03:00
  • c7d0b67a80 common : fixed critical UB inserting map size into himself Herman Semenov 2024-02-16 15:04:25 +03:00
  • c607df8b8a ggml : fix unreachable code warnings Georgi Gerganov 2024-02-16 13:46:15 +02:00
  • 65085c713e llama : minor fixed return int value (#5529) Herman Semenov 2024-02-16 11:45:48 +00:00
  • 7026c7bd53 llama : minor fixed return int value Herman Semenov 2024-02-16 14:38:10 +03:00
  • 6dcc02d244 server : add "samplers" param to control the samplers order (#5494) Alexey Parfenov 2024-02-16 11:33:25 +00:00
  • 775aab17bb ggml, common, examples, tests : fixed type arguments in printf Herman Semenov 2024-02-16 14:29:22 +03:00
  • 6039d58513 minor : fix compile warnings Georgi Gerganov 2024-02-16 13:27:05 +02:00
  • 34b7d5fb48 cmake : minor Georgi Gerganov 2024-02-16 13:22:09 +02:00
  • 8bb09b61ba cmake : fix Georgi Gerganov 2024-02-16 13:19:29 +02:00
  • 5249985578 Major changes to server plus kvcache graphic pudepiedj 2024-02-16 11:16:13 +00:00
  • 736ff7c00a vulkan : fix compile warnings Georgi Gerganov 2024-02-16 13:16:00 +02:00
  • f5c8bb42ee server: add "samplers" param to control the samplers order ZXED 2024-02-13 21:44:10 +03:00
  • 1fd716b4dd cmake : fix (cont) Georgi Gerganov 2024-02-16 13:11:31 +02:00
  • 2b70e9e257 cmake : fix VULKAN and ROCm builds Georgi Gerganov 2024-02-16 13:00:58 +02:00
  • 10d5e21a70 Merge branch 'master' into HEAD Georgi Gerganov 2024-02-16 12:42:32 +02:00
  • 5b29c6823c attempt fix vulkan build Abhilash Majumder 2024-02-16 16:08:22 +05:30
  • 5f5808ca7b server : fix system prompt cli (#5516) Rőczey Barnabás 2024-02-16 11:00:56 +01:00
  • f486f6e1e5 ggml : add numa options (#5377) bmwl 2024-02-16 01:31:07 -08:00
  • 60ed04cf82 llava : fix clip-model-is-vision flag in README.md (#5509) Daniel Bevenius 2024-02-16 10:24:39 +01:00
  • 00a4ab627d scripts : detect CUDA Georgi Gerganov 2024-02-16 10:48:25 +02:00
  • 1657f92d2f llama : init kq_pos only if needed Georgi Gerganov 2024-02-16 10:41:38 +02:00
  • 57abd79f3c scripts : add helpers script for bench comparing commits Georgi Gerganov 2024-02-16 10:19:08 +02:00
  • 833490b16f metal : pre-compute ALiBi slopes Georgi Gerganov 2024-02-16 10:17:59 +02:00
  • ac91033ccb Merge branch 'master' into gg/refactor-alibi Georgi Gerganov 2024-02-16 10:12:28 +02:00
  • 594845aab1 ci : fix BERT model download and convert Georgi Gerganov 2024-02-16 09:57:55 +02:00
  • 7e9de41d8e server : fix system prompt cli An0nie 2024-02-16 01:18:07 +01:00
  • 26ea983bf1 whitespace Jared Van Bortel 2024-02-15 17:27:18 -05:00
  • a3cf7bf60f removed redundant else from cli argument processing of --numa root 2024-02-15 22:21:09 +00:00
  • a5c9a5d52e Merge branch 'ggerganov:master' into master bmwl 2024-02-15 09:40:16 -08:00
  • 7d1f026a58 Update ggml.c bmwl 2024-02-15 09:39:32 -08:00
  • 4524290e87 Use correct type of pooling for embedding models (#5500) Douglas Hanley 2024-02-15 11:21:49 -06:00
  • c06e45d729 clip : fix wrong loop condition Georgi Gerganov 2024-02-15 18:49:08 +02:00
  • 7fd024c5e9 cuda : precompute ALiBi constants Georgi Gerganov 2024-02-15 18:25:48 +02:00
  • 34aa045de4 convert script fixes Douglas Hanley 2024-02-15 10:17:24 -06:00
  • d2b77cce91 small typing fix from linter Douglas Hanley 2024-02-15 09:51:16 -06:00
  • 9060a1e9df cuda : print message when initialization fails (#5512) slaren 2024-02-15 16:49:01 +01:00
  • 201d70b7db use CUDA_NAME both times slaren 2024-02-15 16:47:43 +01:00
  • da652113f1 unified ggml_numa_strategy enum and fixed text alignment in server.cpp example root 2024-02-15 15:35:32 +00:00
  • e4f9d9d848 cuda : print message when initialization fails slaren 2024-02-15 16:30:59 +01:00
  • 5de34f568a Merge branch 'ggerganov:master' into master bmwl 2024-02-15 07:18:20 -08:00
  • 377b58ffe0 Update common/common.cpp bmwl 2024-02-15 07:15:05 -08:00
  • c847828c89 Update examples/server/server.cpp bmwl 2024-02-15 07:13:41 -08:00
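A listing in this shape (abbreviated hash, subject line, author, ISO date) can be regenerated from a local clone of the repository with `git log`; the format string below is a sketch approximating this page's layout, not necessarily the tool that produced it:

```shell
# Print the commit graph: abbreviated hash, subject, author name, ISO date.
# --abbrev=10 matches the 10-character hashes shown above; --graph draws
# the branch/merge topology in the left margin.
git log --graph --abbrev=10 --date=iso \
    --pretty=format:'%h %s %an %ad'
```

Note that `git log --graph` marks commits with `*` rather than bullets, and follows topology across merges, so ordering can differ slightly from a strictly date-sorted view.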