Commit graph

  • a710d58d88 Try fix quantized k-cache on ROCm ik/try_fix_rocm_k_cache Iwan Kawrakow 2024-03-21 20:18:50 +02:00
  • 00381b07bb avoid copying the entire vector Pierrick HYMBERT 2024-03-21 19:18:39 +01:00
  • 1892ae7eb1 llama_model_loader: PR feedback: use only one gguf_context for metadata only; store all ggml_context in a vector as the files and mappings; store all weights in a vector along with the source tensor; rename ctx_gguf to meta; rename ctx_meta to contexts Pierrick HYMBERT 2024-03-21 19:11:37 +01:00
  • f5c758244c Changing from streams to using cstdio Clint Herron 2024-03-21 12:21:29 -04:00
  • f28bfa3876 Revising GBNF validator program to be much simpler. Clint Herron 2024-03-15 23:00:45 -04:00
  • 1fceeb9046 Fix Intel dequant issue 0cc4m 2024-03-21 18:34:19 +01:00
  • 6052e3b3a7 Fixed f_norm_rms_eps bug Julius Arkenberg 2024-03-21 16:58:51 +00:00
  • 751787e6f7 minor fixes Minsoo Cheong 2024-03-22 01:35:09 +09:00
  • cfe80d68d3 Fix params underscore convert to dash. DAN™ 2024-03-21 11:14:58 -04:00
  • 95612548a0 Revert convert-hf-to-gguf to default options Julius Arkenberg 2024-03-21 15:34:38 +01:00
  • 956b609b0d Corrected typo to wrong file semidark 2024-03-21 15:22:45 +01:00
  • 99a163d6f7 tests : disable system() calls Georgi Gerganov 2024-03-21 16:00:53 +02:00
  • 59b389f123 Add support for Grok model architecture Julius Arkenberg 2024-03-21 13:44:59 +00:00
  • a80267a110 Merge branch 'master' into sycl_readme_update Ouadie EL FAROUKI 2024-03-21 13:12:32 +00:00
  • 7e0eeaf257 added new end line Ouadie EL FAROUKI 2024-03-21 13:12:02 +00:00
  • 1d6112bace llama : pad n_ctx by 32 Georgi Gerganov 2024-03-21 14:55:15 +02:00
  • 654daa068c server : fix n_keep always showing as 0 in response Jan Boon 2024-03-21 20:49:11 +08:00
  • d0a71233fb cuda : disable host register by default (#6206) b2489 slaren 2024-03-21 19:54:28 +01:00
  • f372c49ccd Corrected typo to wrong file (#6199) semidark 2024-03-21 11:52:35 -06:00
  • 924ce1dce7 tests : disable system() calls (#6198) b2487 Georgi Gerganov 2024-03-21 16:20:05 +02:00
  • 03a8f8fafe cuda : fix LLAMA_CUDA_F16 build (#6197) slaren 2024-03-21 13:59:53 +01:00
  • cfd3be76e3 ggml : same IQ4_NL quantization for CPU/CUDA/Metal (#6196) Kawrakow 2024-03-21 13:59:38 +01:00
  • 4f7e57a23f cuda : fix LLAMA_CUDA_F16 build slaren 2024-03-21 13:41:43 +01:00
  • d1cb8fedbd Maintain previous behaviour for igpu Aidan 2024-03-21 11:56:03 +00:00
  • 5b7b0ac8df json-schema-to-grammar improvements (+ added to server) (#5978) Olivier Chafik 2024-03-21 11:50:43 +00:00
  • 68e4fed4d9 Now fix test-quantize-fns ik/fix_k_cache_backend_tests Iwan Kawrakow 2024-03-21 12:18:03 +01:00
  • 10aef3680f Fix batched impl Aidan 2024-03-19 16:07:41 +00:00
  • 60a87ae051 Merge branch 'master' into hp/split/load-model Pierrick HYMBERT 2024-03-21 11:48:58 +01:00
  • 30eef31b07 Make quantize_row_iq4_nl do the same thing as quantization on CUDA Iwan Kawrakow 2024-03-21 12:19:16 +02:00
  • 4c46aec60f json: nits ochafik 2024-03-21 10:11:26 +00:00
  • c26e7b87ce json: space before & refs for consistency ochafik 2024-03-21 10:07:47 +00:00
  • ad6c4755e0 json: iostream -> fprintf ochafik 2024-03-21 10:03:51 +00:00
  • 1943c01981 ci : fix indentation error (#6195) Vaibhav Srivastav 2024-03-21 10:30:40 +01:00
  • be6074a48d ci: fix indentation error Vaibhav Srivastav 2024-03-21 10:26:50 +01:00
  • 5e43ba8742 build : add mac pre-build binaries (#6182) Vaibhav Srivastav 2024-03-21 10:13:12 +01:00
  • cd4a7c4cb4 Make quantize_row_iq4_nl do the same thing as quantization on CUDA Iwan Kawrakow 2024-03-21 10:37:38 +02:00
  • 50f9967043 add README Minsoo Cheong 2024-03-21 17:30:30 +09:00
  • eb760a9f8f add retrieval example Minsoo Cheong 2024-03-21 16:18:27 +09:00
  • 76aa30a263 Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183) b2481 Kawrakow 2024-03-21 08:27:57 +01:00
  • 9740a227dc minor : remove trailing whitespaces Georgi Gerganov 2024-03-21 08:34:55 +02:00
  • c5b8595e3f Add nvidia and amd backends (#6157) b2480 AidanBeltonS 2024-03-21 06:10:52 +00:00
  • 18ff6ca847 split: move llama_tensor_offset to llama_model_loader Pierrick HYMBERT 2024-03-21 07:06:14 +01:00
  • b8feff411f Avoid copying the entire vector Pierrick Hymbert 2024-03-21 04:36:06 +01:00
  • 5f33a675ca perplexity : make hellaswag and multiple-choice outputs identical to master Francis Couture-Harpin 2024-03-20 22:48:19 -04:00
  • 7d8d6b589f llama : handle errors from llama_output_reserve at call sites Francis Couture-Harpin 2024-03-20 22:23:46 -04:00
  • 42e21c6882 cuda : fix conflict with std::swap (#6186) b2479 slaren 2024-03-21 01:47:46 +01:00
  • a5b2aa58cf return function call in OAI format -- tools_call field Yingbei 2024-03-20 15:52:00 -07:00
  • b8c0025c6d Update server.feature ochafik 2024-03-20 22:12:06 +00:00
  • 9260350384 json: fix zig build ochafik 2024-03-20 22:03:58 +00:00
  • ac3637142d formatting changes. Julia Longtin 2024-03-20 21:34:12 +00:00
  • 7c64fef91b split: support in llama_model_loader Pierrick HYMBERT 2024-03-19 13:42:37 +01:00
  • 76e66e77c2 use the same header as ggml.c, and remove some warnings. Julia Longtin 2024-03-20 21:12:22 +00:00
  • 24e5039f9d Add debug info which device is allocating memory 0cc4m 2024-03-20 21:44:09 +01:00
  • 2005a7c577 cuda : fix conflict with std::swap slaren 2024-03-20 21:38:31 +01:00
  • d0600d91e9 json: avoid using namespace std ochafik 2024-03-20 20:26:33 +00:00
  • ee27148629 remove intrinsics import, and use upConv to save 12 bytes of memory transit. Julia Longtin 2024-03-20 20:15:16 +00:00
  • df00efbba1 json: fix naming of top-level c++ function (+ drop unused one) ochafik 2024-03-20 20:09:10 +00:00
  • 6dcf856259 Merge remote-tracking branch 'origin/master' into json-fixes ochafik 2024-03-20 20:05:11 +00:00
  • 1c51f98adc cuda : print the returned error when CUDA initialization fails (#6185) b2478 slaren 2024-03-20 21:03:26 +01:00
  • 2f7f7c02d8 cuda : print the returned error when CUDA initialization fails slaren 2024-03-20 20:00:24 +01:00
  • 9e1bda9315 k_cache: add newly added supported types to llama-bench and CUDA supports_op Iwan Kawrakow 2024-03-20 19:39:15 +01:00
  • 131c74cf7e forward contribution credits for building the workflow. Nicolas Patry 2024-03-20 18:57:49 +01:00
  • d8a498dcbe k_cache: be able to use IQ4_NL on Metal Iwan Kawrakow 2024-03-20 18:05:09 +01:00
  • 9711e1eed2 k_cache: be able to use IQ4_NL - just CUDA for now Iwan Kawrakow 2024-03-20 18:47:18 +02:00
  • d68030b820 k_cache: be able to use Q5_1 on Metal Iwan Kawrakow 2024-03-20 17:14:25 +01:00
  • fef4a23e2c k_cache: be able to use Q5_0 on Metal Iwan Kawrakow 2024-03-20 17:06:40 +01:00
  • 5d8822b096 k_cache: be able to use Q5_1 on CUDA Iwan Kawrakow 2024-03-20 17:33:14 +02:00
  • 5e09ce41a8 k_cache: be able to use Q5_0 Iwan Kawrakow 2024-03-20 17:16:14 +02:00
  • 05642a3f33 Initial commit - add mac prebuilds. Vaibhav Srivastav 2024-03-20 17:40:46 +01:00
  • 5a6c8db97f Merge branch 'master' into patch-2 Ziang Wu 2024-03-20 23:31:28 +08:00
  • f9c7ba3447 llava : update MobileVLM-README.md (#6180) Ziang Wu 2024-03-20 23:29:51 +08:00
  • 32c03bb73b Update MobileVLM-README.md Ziang Wu 2024-03-20 23:29:37 +08:00
  • d925266ef6 Update MobileVLM-README.md Ziang Wu 2024-03-20 23:27:28 +08:00
  • 272935b281 llava : add MobileVLM_V2 backup (#6175) b2476 Ziang Wu 2024-03-20 23:02:32 +08:00
  • 10ee30f1b8 json: indent 4 spaces Olivier Chafik 2024-03-20 14:47:21 +00:00
  • 712b5d6344 metal : require ne00 >= 128 for mat-mat kernels Georgi Gerganov 2024-03-20 16:36:17 +02:00
  • 7628bd8c76 json: move json.hpp & json-schema-to-grammar.{cpp,h} to common Olivier Chafik 2024-03-20 14:35:10 +00:00
  • 82ae7f3357 fused attention kernel for batch size 1 Johannes Gäßler 2024-03-19 21:04:28 +01:00
  • ccf58aa3ec cuda : refactor to remove global resources (#6170) b2475 slaren 2024-03-20 14:42:59 +01:00
  • 79aad2ffa9 Merge branch 'ggerganov:master' into master Ziang Wu 2024-03-20 21:22:37 +08:00
  • aead12d896 fix definition mistake in clip.cpp Ziang Wu 2024-03-20 21:17:59 +08:00
  • fa7c6ddd30 Merge branch 'master' into sycl_readme_update OuadiElfarouki 2024-03-20 12:33:23 +00:00
  • 91f8ad167d Server: version bump for httplib and json (#6169) b2474 Xuan Son Nguyen 2024-03-20 13:30:36 +01:00
  • 37a30aa05f Merge remote-tracking branch 'origin/master' into sl/cuda-refactor-1 slaren 2024-03-20 13:22:35 +01:00
  • 427f9af59e minor slaren 2024-03-20 13:22:30 +01:00
  • 6b7e76d28c gitignore : ignore curl-related files Georgi Gerganov 2024-03-20 14:17:34 +02:00
  • bc0baab2ea server : allow to override -ngl in tests (#6170) Georgi Gerganov 2024-03-20 14:14:32 +02:00
  • d795988d9e Revert "llava : add a MobileVLM_V2-1.7B backup (#6152)" b2471 Georgi Gerganov 2024-03-20 13:29:49 +02:00
  • f8c4e745e1 llava : add a MobileVLM_V2-1.7B backup (#6152) Ziang Wu 2024-03-20 19:20:37 +08:00
  • fab912f56c clip : fix whitespace Georgi Gerganov 2024-03-20 13:20:19 +02:00
  • 47cc7a7bf9 Server: Handle n_keep parameter in the request (#6174) Karthick 2024-03-20 16:32:34 +05:30
  • 2fa1e045aa Merge branch 'master' into xsn/server-lib-version-bump ngxson 2024-03-20 11:08:33 +01:00
  • 66247e4cf7 bring back content_length ngxson 2024-03-20 11:08:21 +01:00
  • 3e67baab9e Server: Handle n_keep parameter in the request Karthick Jeyapal 2024-03-20 15:26:52 +05:30
  • 2605c139a6 Update build.yml fraxy-v 2024-03-20 08:59:38 +02:00
  • 3e9d3dbff9 Update build.yml fraxy-v 2024-03-20 08:50:46 +02:00
  • bd60d82d0c server tests : more pythonic process management; fix bare except: (#6146) b2468 Jared Van Bortel 2024-03-20 01:33:49 -04:00
  • 6c8c71af16 server: tests: add new tokens regex following new repeat penalties default changed in (#6127) Pierrick HYMBERT 2024-03-20 06:12:19 +01:00
  • bab4ad88ff server: tests: add new tokens regex following new repeat penalties default changed in (#6127) Pierrick HYMBERT 2024-03-20 06:02:51 +01:00
  • df03b2d20f server: tests: remove the hack on windows since now we get the good socket family Pierrick HYMBERT 2024-03-20 05:54:22 +01:00