Commit graph

  • 5296c96ca8
    group norm Georgi Gerganov 2024-12-10 20:33:29 +02:00
  • 6ef14091c0
    first conv Georgi Gerganov 2024-12-10 19:18:04 +02:00
  • aac7e04953
    extract features Georgi Gerganov 2024-12-10 18:23:10 +02:00
  • ff2ea75fb4
    wip Georgi Gerganov 2024-12-10 16:31:02 +02:00
  • f169965158
    llama : add OuteTTS support (wip) Georgi Gerganov 2024-12-10 14:40:03 +02:00
  • e65556f174
    server : do not normalize embeddings when there is no pooling Georgi Gerganov 2024-12-17 13:36:32 +02:00
  • 1b18b2d7b0
    server : be explicit about the pooling type in the tests Georgi Gerganov 2024-12-17 11:45:18 +02:00
  • 06e85401b0
    server : output embeddings for all tokens when pooling = none Georgi Gerganov 2024-12-17 10:56:20 +02:00
  • 89eaf5036a
    server : add "tokens" output Georgi Gerganov 2024-12-16 21:03:24 +02:00
  • c0cca53d85 Merge branch 'master' into xsn/fix_logprobs Xuan Son Nguyen 2024-12-18 12:46:50 +01:00
  • 50b3813319 rebuild Xuan Son Nguyen 2024-12-18 12:41:05 +01:00
  • 152610eda9
    server : output embeddings for all tokens when pooling = none (#10861) Georgi Gerganov 2024-12-18 13:01:41 +02:00
  • 2dcdd483d4
    server : remove rebase artifact Georgi Gerganov 2024-12-18 12:43:04 +02:00
  • 84bbd366c1 tests: disable GGUF test for bad value size Johannes Gäßler 2024-12-18 11:12:11 +01:00
  • 126883acf2
    Merge a6648b9df7 into 0e70ba686e Georgi Gerganov 2024-12-18 10:56:29 +01:00
  • 600cebc9a8
    server : update readme [no ci] Georgi Gerganov 2024-12-18 11:55:28 +02:00
  • 2a5510ed82
    tests : update server tests Georgi Gerganov 2024-12-18 11:33:46 +02:00
  • 87df60166d
    server : fixes Georgi Gerganov 2024-12-18 11:13:29 +02:00
  • 3a7c001fe3
    server : update readme Georgi Gerganov 2024-12-17 16:12:15 +02:00
  • 7e693f92d7
    server : do not normalize embeddings when there is no pooling Georgi Gerganov 2024-12-17 13:36:32 +02:00
  • abf33e2017
    server : update /embeddings and /v1/embeddings endpoints Georgi Gerganov 2024-12-17 15:59:55 +02:00
  • 2a94c33028
    server : be explicit about the pooling type in the tests Georgi Gerganov 2024-12-17 11:45:18 +02:00
  • 2dea48758e
    server : fix spacing [no ci] Georgi Gerganov 2024-12-17 11:37:08 +02:00
  • d424afac5f
    server : update readme [no ci] Georgi Gerganov 2024-12-17 11:01:29 +02:00
  • 07946a3a30
    server : output embeddings for all tokens when pooling = none Georgi Gerganov 2024-12-17 10:56:20 +02:00
  • 44eeb6a88e
    server : add "tokens" output Georgi Gerganov 2024-12-16 21:03:24 +02:00
  • 0e70ba686e
    server : add "tokens" output (#10853) b4354 Georgi Gerganov 2024-12-18 11:05:29 +02:00
  • 46828872c3
    server : (embeddings) using same format for "input" and "content" (#10872) b4353 Xuan Son Nguyen 2024-12-18 09:55:09 +01:00
  • 6b064c92b4
    docs: Fix HIP (née hipBLAS) in README (#10880) redbeard 2024-12-18 00:35:00 -08:00
  • 92e41ec4b9 Update log to only print when input and output characters are different Billel Mokeddem 2024-12-18 08:20:28 +00:00
  • 99cb6be1d3
    server : remove "tokens" from the OAI endpoint Georgi Gerganov 2024-12-18 10:16:46 +02:00
  • 1dae1d884f ggml: Show detected features with GGML_NATIVE Adrien Gallouët 2024-12-17 12:11:30 +01:00
  • 7eb81e1603 ggml: GGML_NATIVE uses -mcpu=native on ARM Adrien Gallouët 2024-12-10 11:08:02 +00:00
  • 5bf29af841
    tests : improve "tokens" type check Georgi Gerganov 2024-12-18 10:02:01 +02:00
  • fe9235d795 Force max subgroup size for coopmat shaders 0cc4m/vulkan-coopmat-amd-windows 0cc4m 2024-12-10 20:27:04 +00:00
  • d8d2f370dc Add a log message to better track the when the following line of code is triggered Billel Mokeddem 2024-12-18 07:23:35 +00:00
  • b3d022aa1a Add comment explaining the logic behind the if statement Billel Mokeddem 2024-12-18 05:46:07 +00:00
  • fc055407b7 Add fix for adding bos to added special tokens Billel Mokeddem 2024-12-18 04:58:00 +00:00
  • 36423273dc docs: Fix HIP (née hipBLAS) in README Brian 'redbeard' Harrington 2024-12-17 20:48:15 -08:00
  • a20dde36ff
    SYCL: reg_get_proc_address func, update to the current func signature Akarshan Biswas 2024-12-18 09:20:52 +05:30
  • 82ce602ee7
    SYCL: Use GGML_SYCL_DEBUG after reverting Akarshan Biswas 2024-12-18 09:19:43 +05:30
  • eeb04751d9
    Revert "SYCL: Integrate debug logs with GGML_LOG and other fixes" Akarshan Biswas 2024-12-18 09:11:17 +05:30
  • bfa0298900 server: avoid overwriting Authorization header Gaetan Bisson 2024-12-17 16:08:48 -10:00
  • 4da69d1abd
    Revert "llama : add Falcon3 support (#10864)" (#10876) b4351 Diego Devesa 2024-12-18 01:36:46 +01:00
  • e10dc009b5
    Revert "llama : add Falcon3 support (#10864)" Diego Devesa 2024-12-17 23:25:36 +01:00
  • d62b532c52
    Use model->gguf_kv for loading the template instead of using the C API. (#10868) b4350 DAN™ 2024-12-17 17:24:22 -05:00
  • 2e04ccf4e6 llama_server_response_fields nvrxq 2024-12-18 01:21:44 +03:00
  • 101e772c73 fix test Xuan Son Nguyen 2024-12-17 22:28:53 +01:00
  • d4b9ec098b handle empty input case Xuan Son Nguyen 2024-12-17 21:50:36 +01:00
  • a2d4b6fc81 llama: Ensure KV cache is fully defragmented. Jesse Gross 2024-12-13 16:11:59 -08:00
  • 9a566806f0 fix test case Xuan Son Nguyen 2024-12-17 21:36:50 +01:00
  • d4e0bad0ae server : (embeddings) using same format for "input" and "content" Xuan Son Nguyen 2024-12-17 21:33:29 +01:00
  • 8bcfc5551e
    server : return tokens ids only if requested Georgi Gerganov 2024-12-17 21:44:09 +02:00
  • 52bfa235e3 Use model->gguf_kv for efficiency. DAN™ 2024-12-17 14:00:45 -05:00
  • bf51f65a1c Improve progress bar Eric Curtin 2024-12-13 22:46:13 +00:00
  • 081b29bd2a
    tests: add tests for GGUF (#10830) b4349 Johannes Gäßler 2024-12-17 19:09:35 +01:00
  • 919fe432c3 Bump model_template to 16384 bytes to support larger chat templates. DAN™ 2024-12-17 11:02:26 -05:00
  • 5437d4aaf5
    sync : ggml b4348 Georgi Gerganov 2024-12-17 18:36:02 +02:00
  • 78f766768d
    cmake : fix "amd64" processor string (whisper/2638) Georgi Gerganov 2024-12-17 18:34:32 +02:00
  • 8dd19a4812
    vulkan : fix soft_max.comp division by zero (whisper/2633) gn64 2024-12-16 19:34:38 +09:00
  • 130d0c90bd
    ggml : remove return from ggml_gallocr_allocate_node (ggml/1048) Daniel Bevenius 2024-12-14 03:23:08 +01:00
  • 3919da8e33
    ggml : add check for grad_accs (ggml/1046) Daniel Bevenius 2024-12-13 08:19:38 +01:00
  • 0006f5a74a
    ggml : update ggml_backend_cpu_device_supports_op (#10867) b4343 Georgi Gerganov 2024-12-17 18:35:42 +02:00
  • 4fbb801a9d
    ggml : update ggml_backend_cpu_device_supports_op gg/cpu-fix-cpy-iq Georgi Gerganov 2024-12-17 18:09:02 +02:00
  • fe67caaca5 docs: update link to ramalama on readme Charlie Drage 2024-12-17 11:07:38 -05:00
  • 8cc7145cc7
    ggml : disable tests involving i-matrix quantization Georgi Gerganov 2024-12-17 18:03:47 +02:00
  • 05c3a444b8
    server : fill usage info in embeddings and rerank responses (#10852) b4342 krystiancha 2024-12-17 16:00:24 +00:00
  • b0597b1493
    ggml : fix cpy op for IQ-quants to use reference impl Georgi Gerganov 2024-12-17 17:54:04 +02:00
  • 382bc7f2e8
    llama : add Falcon3 support (#10864) b4341 Billel Mokeddem 2024-12-17 19:24:56 +04:00
  • 88cc9719c4 server : fill usage info in reranking response Krystian Chachuła 2024-12-16 14:45:06 +01:00
  • 357a7bac41 server : fill usage info in embeddings response Krystian Chachuła 2024-12-16 14:42:41 +01:00
  • 38725ef6da server : add bad input handling in embeddings Krystian Chachuła 2024-12-17 13:04:02 +01:00
  • d2b1a41a2c
    Merge 4c7195e839 into 4f51968aca Ilan F. S. Theodoro 2024-12-17 11:16:40 +01:00
  • 4f51968aca
    readme : update typos (#10863) Ruan 2024-12-17 17:47:20 +08:00
  • d146334c11 Add Falcon3 model support Billel Mokeddem 2024-12-17 09:46:19 +00:00
  • 8f1330666c
    readme : update typos Ruan 2024-12-17 17:29:50 +08:00
  • 227d7c5a7f
    server : (UI) fix missing async generator on safari (#10857) Xuan Son Nguyen 2024-12-17 09:52:09 +01:00
  • 6ad1f8dae9 fix Xuan Son Nguyen 2024-12-17 09:45:32 +01:00
  • 7b1ec53f56
    vulkan: bugfixes for small subgroup size systems + llvmpipe test (#10809) b4338 Eve 2024-12-17 05:52:55 +00:00
  • 5ba1fda5cf server : (UI) fix missing async generator on safari Xuan Son Nguyen 2024-12-17 00:38:08 +01:00
  • d5f69e8a43 fixes to position embeddings Sukriti-Sharma4 2024-12-16 15:28:09 -07:00
  • 22bea1d791 vulkan: optimize coopmat2 dequant functions Jeff Bolz 2024-12-07 13:21:10 -06:00
  • 160bc039c8
    rwkv6: add wkv6 support for Vulkan backend (#10829) b4337 Zhiyuan Li 2024-12-17 05:00:46 +08:00
  • d58f8a1b6b
    server : update readme Georgi Gerganov 2024-12-16 21:05:19 +02:00
  • 79a8176883
    server : add "tokens" output Georgi Gerganov 2024-12-16 21:03:24 +02:00
  • aa13d69905 fix erros in EditorConfig Checker Zhiyuan Li 2024-12-16 19:34:17 +08:00
  • 353c5f8c7b add uma support Zhiyuan Li 2024-12-16 18:43:22 +08:00
  • 08ea539df2
    unicode : improve naming style (#10838) Georgi Gerganov 2024-12-16 12:31:45 +02:00
  • 644fd71b44
    sampling : refactor + optimize penalties sampler (#10803) Georgi Gerganov 2024-12-16 12:31:14 +02:00
  • 6ea605ddfc add [[unroll]] and remove unnecessary conditions Zhiyuan Li 2024-12-16 17:35:44 +08:00
  • b58ebf30ae
    webui : update Georgi Gerganov 2024-12-16 11:25:17 +02:00
  • e27c711981
    llama : minor Georgi Gerganov 2024-12-15 11:36:25 +02:00
  • 60d26ded4b
    readme : restore hint about --ignore-eos flag [no ci] Georgi Gerganov 2024-12-13 14:04:09 +02:00
  • 685c84c35e
    common : move back the penalties at the front of the sampling chain Georgi Gerganov 2024-12-13 12:54:10 +02:00
  • 1ff9296253
    common : ignore all EOG tokens Georgi Gerganov 2024-12-12 22:50:34 +02:00
  • 97261aa216
    common : by default, move the penalties at the end of the sampling chain Georgi Gerganov 2024-12-12 22:29:09 +02:00
  • 9847a375f3
    params : allow penalty_last_n == -1 to be equal to context size Georgi Gerganov 2024-12-12 21:55:20 +02:00
  • a04a5b526b
    batched : remove penalties sampler Georgi Gerganov 2024-12-12 21:33:28 +02:00
  • 58a5c3bb0f
    common : apply ignore_eos as logit bias Georgi Gerganov 2024-12-12 21:22:33 +02:00
  • 0a1f7fb66d
    sampling : refactor + optimize penalties sampler Georgi Gerganov 2024-12-12 20:39:16 +02:00