Commit graph

  • c34239476a nix: .#widnows: init hutli 2024-02-15 14:25:04 +01:00
  • 3a0345970e make : whitespace Georgi Gerganov 2024-03-27 15:02:49 +02:00
  • 843bc30c34 clean up arg parse ngxson 2024-03-27 13:39:43 +01:00
  • 12b255487f split by max size ngxson 2024-03-27 12:57:34 +01:00
  • 1e13987fba embedding : show full embedding for single prompt (#6342) howlger 2024-03-27 12:15:44 +01:00
  • 9a8649625a Update examples/embedding/embedding.cpp Georgi Gerganov 2024-03-27 13:15:28 +02:00
  • ab1c46a7bf respond error in case there's no space in the kv cache Jan Boon 2024-03-27 19:11:47 +08:00
  • d2f63d6c5f embedding : show full embedding for single prompt howlger 2024-03-27 12:10:07 +01:00
  • 5462817851 remove trailing whitespace Jan Boon 2024-03-27 18:49:00 +08:00
  • 662aaea8c9 llama : save and restore kv cache for single seq id Jan Boon 2024-03-27 16:56:35 +08:00
  • 6be02b5969 cuda : fix build gg/flash-attn-wip Georgi Gerganov 2024-03-27 10:31:52 +02:00
  • 013721df2b Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-03-27 10:24:09 +02:00
  • 4e6df37d12 enable with rebase Abhilash Majumder 2024-03-27 13:48:40 +05:30
  • e82f9e2b83 [SYCL] Fix batched impl for NVidia GPU (#6164) b2548 AidanBeltonS 2024-03-27 08:16:40 +00:00
  • 3f11848eb7 fix set main gpu crash Jianyu Zhang 2024-03-27 16:07:06 +08:00
  • cbc8343619 Make IQ1_M work for QK_K = 64 (#6327) Kawrakow 2024-03-27 08:44:27 +01:00
  • e562b9714b common : change --no-penalize-nl to --penalize-nl (#6334) Sigbjørn Skjæret 2024-03-27 08:23:10 +01:00
  • 8c07b8f85f Merge branch 'ggerganov:master' into iq2_s Abhilash Majumder 2024-03-27 12:48:53 +05:30
  • 2ab4f00d25 llama2c : open file as binary (#6332) Georgi Gerganov 2024-03-27 09:16:02 +02:00
  • 1740d6dd4e readme : add php api bindings (#6326) Mateusz Charytoniuk 2024-03-27 08:08:59 +01:00
  • 731001f1fd readme : add link to PR Georgi Gerganov 2024-03-27 09:08:46 +02:00
  • ff4ace5e48 disable to check perf Abhilash Majumder 2024-03-27 12:36:57 +05:30
  • 62397b7757 doc: add doc for ubatch-size Ting Sun 2024-03-27 13:34:26 +07:00
  • 0642b22cd1 server: public: use relative routes for static files (#6325) b2543 Eric Zhang 2024-03-27 13:55:29 +08:00
  • cec6481ae0 retrigger CI Abhilash Majumder 2024-03-27 11:23:04 +05:30
  • e4a16f2493 llama.cpp: Include the changes from #6122 to exclude the unused outputs of the last layers. root 2024-03-27 04:22:09 +00:00
  • c8d4b6b54e doc: fix outdated default value of batch size Ting Sun 2024-03-27 10:46:07 +07:00
  • 871a135bd3 iq2s and other quant logic add abhilash1910 2024-03-26 20:42:51 -07:00
  • 19772fab9c add condition for iq2s abhilash1910 2024-03-26 20:36:02 -07:00
  • 3c0b830808 Merge branch 'ggerganov:master' into master hxer7963 2024-03-27 11:17:44 +08:00
  • 69aaa3d78b revert logic abhilash1910 2024-03-26 19:39:48 -07:00
  • a4f569e8a3 [SYCL] fix no file in win rel (#6314) b2542 Neo Zhang Jianyu 2024-03-27 09:47:06 +08:00
  • 3c49d9387a Add slash to dir options and replace slashes with backslash on windows when loading file trollkotze 2024-03-27 01:37:26 +01:00
  • f2ba723bd9 rename back Yingbei 2024-03-26 16:42:23 -07:00
  • 2a1345eb23 rename to add files Yingbei 2024-03-26 16:41:55 -07:00
  • 4b12a7e651 Update documentation too Sigbjørn Skjæret 2024-03-27 00:30:14 +01:00
  • d4897432a1 Restrict control vectors to predefined options trollkotze 2024-03-27 00:15:52 +01:00
  • 568bdfc281 Change --no-penalize-nl to --penalize-nl Sigbjørn Skjæret 2024-03-27 00:09:04 +01:00
  • eaa0f3d065 the finish reason for function calling should be Yingbei 2024-03-26 15:26:09 -07:00
  • 711fda99c6 update notebook for easier comparison Yingbei 2024-03-26 15:11:49 -07:00
  • a837649711 style fix: strlen(str) == 0 --> *str == 0 Mikko Juola 2024-03-26 14:54:18 -07:00
  • 32c8486e1f wpm : portable unicode tolower (#6305) b2541 Jared Van Bortel 2024-03-26 17:46:21 -04:00
  • cd7b5f7f78 Make tokenizer.cpp CLI tool nicer. Mikko Juola 2024-03-20 15:04:44 -07:00
  • 6eae8bf5c3 utils.hpp: make from_json utility for llama_control_vector_load_info static trollkotze 2024-03-26 21:08:23 +01:00
  • 2506fed8e8 Don't double-apply CORS header in POST /control-vectors trollkotze 2024-03-26 20:56:58 +01:00
  • 80508e1ef5 Access-Control-Allow-Origin header for GET /control-vectors trollkotze 2024-03-26 20:34:48 +01:00
  • b0d0bdd07b iq1_m: QK_K = 64 seems to work on Metal and ARM_NEON Iwan Kawrakow 2024-03-26 19:19:05 +01:00
  • 5c953a1a15 iq1_m: make it work for QK_K = 64 (scalar and AVX2) Iwan Kawrakow 2024-03-26 20:03:11 +02:00
  • d674812474 Update ggml-quants.c Sigbjørn Skjæret 2024-03-26 18:58:59 +01:00
  • 7b9e8726d1 Routes for hot-reloading and reading current vector composition trollkotze 2024-03-26 18:57:06 +01:00
  • bd9f6b9dcf log time measurements trollkotze 2024-03-26 18:52:31 +01:00
  • d0304f7656 llama_control_vector_load: free gguf_context before ggml_context Anon 2024-03-26 01:28:55 +00:00
  • 9914014e17 llama_control_vector_load: free contexts on successful exit Anon 2024-03-26 01:28:34 +00:00
  • 181879f942 llama_control_vector_load: let gguf_init_from_file allocate the ggml_context Anon 2024-03-26 01:28:18 +00:00
  • e1939bc869 iq1_m: make it work for QK_K = 64 (WIP) Iwan Kawrakow 2024-03-26 18:39:11 +01:00
  • 915be46ffc Merge branch 'master' into Nexesenex-IQ1_XS-IQ1_S-quant-strategies Nexesenex 2024-03-26 17:49:36 +01:00
  • a02c09229f add php bindings to readme Mateusz Charytoniuk 2024-03-26 17:36:25 +01:00
  • e9377baf7a add conditions abhilash1910 2024-03-26 09:19:06 -07:00
  • d4b182ccd5 refine condition abhilash1910 2024-03-26 09:05:02 -07:00
  • 87a6088ffe rename unicodedata.{cpp,h} to unicode-data.{cpp,h} ceb/wpm-portable-tolower Jared Van Bortel 2024-03-26 10:52:33 -04:00
  • 557410b8f0 llama : greatly reduce output buffer memory usage (#6122) b2540 compilade 2024-03-26 10:46:41 -04:00
  • 20248e80cd readme : update recent API changes, and warn about Vulkan Francis Couture-Harpin 2024-03-26 10:28:19 -04:00
  • 55c1b2a3bb IQ1_M: 1.75 bpw quantization (#6302) Kawrakow 2024-03-26 15:21:27 +01:00
  • 6e4cef5d0c iq1_M: PR comments Iwan Kawrakow 2024-03-26 15:46:11 +02:00
  • 40490f43cf server: public: use relative routes for static files EZForever 2024-03-26 21:31:04 +08:00
  • 599a4b2cc6 Update llama.cpp - switch from IQ4_XS to Q4_K in related cases. Nexesenex 2024-03-26 13:41:16 +01:00
  • e097633f63 convert-hf : fix exception in sentencepiece with added tokens (#6320) b2538 Pedro Cuenca 2024-03-26 13:32:19 +01:00
  • d25b1c31b0 quantize : be able to override metadata by key (#6321) Kawrakow 2024-03-26 13:09:30 +01:00
  • 9c5fd6be14 minor : spacing ik/quantize_with_kv_overrides Georgi Gerganov 2024-03-26 14:09:02 +02:00
  • 1c1f876994 typo on the step name Pierrick Hymbert 2024-03-26 11:11:32 +01:00
  • fc4c2a6fc3 quantize: be able to override metadata by key Iwan Kawrakow 2024-03-26 11:53:42 +02:00
  • 7fca39e80c [convert-hf] Fix exception in sentencepiece Pedro Cuenca 2024-03-26 10:35:49 +01:00
  • 876b70d9eb Fix tokenizer, permute tensors Pedro Cuenca 2024-03-26 10:26:12 +01:00
  • eaf9571d9b Update llama.cpp - exception for the IQ2_S token embedding error Nexesenex 2024-03-26 10:11:46 +01:00
  • deb7240100 embedding : adjust n_ubatch value (#6296) b2536 Minsoo Cheong 2024-03-26 18:11:46 +09:00
  • 3d032ece8e server : add n_discard parameter (#6300) Jan Boon 2024-03-26 16:47:43 +08:00
  • a11a72cea3 Merge branch 'ggerganov:master' into master hxer7963 2024-03-26 16:38:09 +08:00
  • 458c1d16b0 gguf-py: remove redundant logs; llama: remove the init_mapping_prefetch custom parameter root 2024-03-26 08:32:30 +00:00
  • d1839362fc Update llama.cpp - remove trailing space Nexesenex 2024-03-26 09:17:09 +01:00
  • 337c13b226 ci: bench: fix mermaid values, markdown generated Pierrick HYMBERT 2024-03-26 08:39:30 +01:00
  • bff4644f49 ci: bench: fix typo Pierrick HYMBERT 2024-03-26 08:20:28 +01:00
  • fb3b2f5eb1 ci: bench: fix duration Pierrick HYMBERT 2024-03-26 08:13:32 +01:00
  • 225f63bacc ci: bench: trigger build Pierrick HYMBERT 2024-03-26 08:10:39 +01:00
  • 5c2f8e6bfb ci: bench: more resilient, more metrics Pierrick HYMBERT 2024-03-26 08:07:08 +01:00
  • ada101ef2a explicit add conditions fp32 abhilash1910 2024-03-26 00:01:05 -07:00
  • 5027d81f0a llama : minor Georgi Gerganov 2024-03-26 08:49:49 +02:00
  • e33c6f442e llama.cpp: fix typo in source file llama.cpp zhou.weiguo 2024-03-26 13:50:10 +08:00
  • cdb2d65c8e cuda: assert -> NO_DEVICE_CODE Iwan Kawrakow 2024-03-26 06:29:48 +02:00
  • 9a5786e939 iq1_m: use common definition of iq1m_scale_t Iwan Kawrakow 2024-03-26 06:16:02 +02:00
  • b68f32b391 iq1_m: fix Windows ARM Iwan Kawrakow 2024-03-26 05:54:03 +02:00
  • 04c6a7b46b fix no file in win rel for sycl Zhang 2024-03-26 11:31:58 +08:00
  • e9095aca20 llama : allow loading state saved with a different ctx size Francis Couture-Harpin 2024-03-25 23:13:50 -04:00
  • 6e1fbf87b0 Indentation trollkotze 2024-03-26 04:09:37 +01:00
  • 2258098c49 Revert "use %ld instead of %lld" Minsoo Cheong 2024-03-26 11:13:14 +09:00
  • 544b447696 Merge branch 'embedding-assign-n_ubatch-value,-print-error-on-n_batch-overflow' of github.com:mscheong01/llama.cpp into embedding-assign-n_ubatch-value,-print-error-on-n_batch-overflow Minsoo Cheong 2024-03-26 10:56:28 +09:00
  • ea753ede90 use %ld instead of %lld Minsoo Cheong 2024-03-26 10:56:18 +09:00
  • 694dcd98a3 server: fix system_tokens being erased in kv_cache; MasterYi 2024-03-26 09:46:55 +08:00
  • 62c1f5b681 Update llama.cpp typo Nexesenex 2024-03-26 02:25:07 +01:00
  • f162b2ef3f Update llama.cpp - correction embd.weight GQA-4 & qkv.weight to K-Quants Nexesenex 2024-03-26 02:22:04 +01:00
  • eaec0b8748 some clean up Yingbei 2024-03-25 18:11:21 -07:00