Commit graph

  • e2f6704d6c fixup! lookup: evaluation tools, use corpus/previous gens Johannes Gäßler 2024-03-22 23:34:02 +01:00
  • 5070d0a0f8 fixup! lookup: evaluation tools, use corpus/previous gens Johannes Gäßler 2024-03-22 23:28:03 +01:00
  • a51a401dcd lookup: evaluation tools, use corpus/previous gens JohannesGaessler 2024-02-11 12:33:23 +01:00
  • 14eebe23fc ggml : fix missing #defines before windows.h Jared Van Bortel 2024-03-21 17:29:47 -04:00
  • a04bdfb4fa Fallback to tokenizer.json if vocab.json does not exist Sigbjørn Skjæret 2024-03-22 22:10:16 +01:00
  • abdc8ea34a llama : fix grok rope type Georgi Gerganov 2024-03-22 22:18:47 +02:00
  • 56a00f0a2f common : default --hf-file to --model (#6234) b2508 Georgi Gerganov 2024-03-22 21:10:39 +02:00
  • 92397d87a4 convert-llama2c-to-ggml : enable conversion of GQA models (#6237) fraxy-v 2024-03-22 20:49:06 +02:00
  • 1d0331c12a quantize: options for output and token embedding tensors qtype (#6239) Kawrakow 2024-03-22 19:47:14 +01:00
  • dba1af6129 llama_model_loader: support multiple split/shard GGUFs (#6187) Pierrick Hymbert 2024-03-22 19:00:01 +01:00
  • ee804f6223 ci: apply concurrency limit for github workflows (#6243) Minsoo Cheong 2024-03-23 02:15:06 +09:00
  • eb7828a3d8 ci: limit concurrency for github workflows Minsoo Cheong 2024-03-23 01:31:03 +09:00
  • 09532120e0 ggml : fix CPU soft_max Georgi Gerganov 2024-03-22 17:49:42 +02:00
  • 0a243da7d4 fixes based on review @cebtenzzre Minsoo Cheong 2024-03-23 00:24:01 +09:00
  • 27f2e85520 fix original_logits allocation Minsoo Cheong 2024-03-23 00:14:25 +09:00
  • 3a468e6f9f llama : fix type of KQ_mask and KQ_pos gg/flash-attn-rebase Georgi Gerganov 2024-03-22 17:12:17 +02:00
  • 3eb40f3c0c fix format Jianyu Zhang 2024-03-22 23:07:30 +08:00
  • fddd201942 free original_logits Minsoo Cheong 2024-03-22 23:56:50 +09:00
  • 29d877e9ed sampling: remove duplicated code for probability distribution access Minsoo Cheong 2024-03-22 23:48:37 +09:00
  • 5deee33184 fix error Jianyu Zhang 2024-03-22 22:45:35 +08:00
  • a3dbd17c58 Merge 9419190533 into 80bd33bc2c Branden Butler 2024-03-22 17:42:43 +03:00
  • 9495d3982d Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-03-22 16:34:34 +02:00
  • 0e826d12a5 quantize: be able to specify the token embedding tensor type ik/quantize_not_repeating Iwan Kawrakow 2024-03-22 16:27:34 +02:00
  • b3f3b8f076 fix error Jianyu Zhang 2024-03-22 22:25:54 +08:00
  • 7883796f71 quantize: be able to specify the output tensor type Iwan Kawrakow 2024-03-22 16:11:34 +02:00
  • 764c7afee7 fix llama_split_prefix ngxson 2024-03-22 15:10:52 +01:00
  • 0fa2dc2563 Merge branch 'master' into master fraxy-v 2024-03-22 16:09:08 +02:00
  • c6ebf055e2 convert-llama2c-to-ggml: enable conversion of multiqueries, #5608 (#1) fraxy-v 2024-03-22 16:04:29 +02:00
  • 8657af927f fix value Jianyu Zhang 2024-03-22 22:02:08 +08:00
  • 6ef692c18e fix value Jianyu Zhang 2024-03-22 22:00:34 +08:00
  • d7d8d4a876 fix value Jianyu Zhang 2024-03-22 21:59:02 +08:00
  • 0a02cfe0c5 support release win Jianyu Zhang 2024-03-22 21:54:07 +08:00
  • 1f3875995f llama_model_loader: put mapping in a unique_ptr from the moment it is allocated Pierrick HYMBERT 2024-03-22 14:44:07 +01:00
  • 80bd33bc2c common : add HF arg helpers (#6234) b2503 Georgi Gerganov 2024-03-22 15:33:38 +02:00
  • 8c3d5b5a79 common : remove defaults gg/hf-args Georgi Gerganov 2024-03-22 15:33:24 +02:00
  • e80f06d2a1 llama : correction of the attn.v.weight quantization for IQ3_XS (#6209) b2502 Nexesenex 2024-03-22 14:32:02 +01:00
  • 12aa74ba7d minor : spacing patch-1 Georgi Gerganov 2024-03-22 15:24:57 +02:00
  • f77a8ffd3b tests : conditional python & node json schema tests (#6207) b2501 Olivier Chafik 2024-03-22 13:09:07 +00:00
  • 72114edf06 json-schema-to-grammar : fix order of props + non-str const/enum (#6232) Olivier Chafik 2024-03-22 13:07:44 +00:00
  • 2f0e81e053 cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208) b2499 slaren 2024-03-22 14:05:31 +01:00
  • 1b2f0a9ee8 common : add HF arg helpers Georgi Gerganov 2024-03-22 14:32:36 +02:00
  • 29ab270e65 readme : add RecurseChat to the list of UIs (#6219) Xiaoyi Chen 2024-03-22 04:29:49 -07:00
  • f616b38b6b docs: add model shard in hot topic Pierrick HYMBERT 2024-03-22 12:12:13 +01:00
  • 6b8bb3a31d server : fix n_keep always showing as 0 in response (#6211) b2497 Jan Boon 2024-03-22 19:12:05 +08:00
  • 68e210b354 server : enable continuous batching by default (#6231) b2496 Georgi Gerganov 2024-03-22 13:08:28 +02:00
  • 49c99c5bac json: support non-string const / enums ochafik 2024-03-22 09:27:03 +00:00
  • 62d0b3d194 json: ws nits ochafik 2024-03-22 09:00:18 +00:00
  • f00b0b936a json: ordered json in server/schema converter to respect orig order ochafik 2024-03-22 08:59:45 +00:00
  • b3e94f26ba metal : proper assert for mat-mat memory alignment (#6225) b2495 Georgi Gerganov 2024-03-22 11:35:53 +02:00
  • 31f2d03f1b server : enable continuous batching by default gg/enable-cb-default Georgi Gerganov 2024-03-22 11:16:43 +02:00
  • dbc35acff0 llama : introduce some typedef helpers Georgi Gerganov 2024-03-22 10:58:42 +02:00
  • 60c75080ef json: print env vars in test ochafik 2024-03-22 08:49:28 +00:00
  • 8326607cfe llama : minor Georgi Gerganov 2024-03-22 10:17:34 +02:00
  • 072c56fcdb metal : fix the fix gg/metal-dequant-align Georgi Gerganov 2024-03-22 09:58:22 +02:00
  • b2075fd6a5 ci : add CURL flag for the mac builds (#6214) b2494 Vaibhav Srivastav 2024-03-22 08:53:43 +01:00
  • 3966d68127 readme : add notice about the bug fix Georgi Gerganov 2024-03-22 09:50:07 +02:00
  • 2f8be164ad metal : proper assert for mat-mat memory alignment Georgi Gerganov 2024-03-22 09:47:56 +02:00
  • 95d576b48e metal : pad n_ctx by 32 (#6177) b2493 Georgi Gerganov 2024-03-22 09:36:03 +02:00
  • 59c17f02de add blog link (#6222) Neo Zhang Jianyu 2024-03-22 15:19:37 +08:00
  • e474e456eb llama_split_prefix: use a clearer version, not pass split path len but dest max len. Pierrick HYMBERT 2024-03-22 07:48:50 +01:00
  • fb31d14926 add blog link Jianyu Zhang 2024-03-22 14:42:56 +08:00
  • 4c04400969 llama_model_loader: fix map -> unordered map Pierrick HYMBERT 2024-03-22 07:07:00 +01:00
  • 208e1f03c3 store similarities in separate vector Minsoo Cheong 2024-03-22 15:03:34 +09:00
  • b19af3643f llama_model_loader: be sure the model mappings has enough capacity before allocating backend buffer Pierrick HYMBERT 2024-03-22 07:03:14 +01:00
  • a9e88c6e57 llama_model_loader: immediately add the backend buffer to the model buffers in order to free them if an error occurs in the next allocation. Reserve the expected size. Pierrick HYMBERT 2024-03-22 06:59:04 +01:00
  • ec372c66a4 llama_model_loader: use at instead of operator[] if this should never add to the map. Pierrick HYMBERT 2024-03-22 06:52:00 +01:00
  • 9940df4f11 llama_model_loader: ensure mappings vector has the expected size Pierrick HYMBERT 2024-03-22 06:51:21 +01:00
  • 7cbe1eac78 llama_model_loader: if n_tensors declared not equals to loaded tensors in split, throw an exception instead of asserting Pierrick HYMBERT 2024-03-22 06:48:15 +01:00
  • d33a015749 remove use of variable sized array Minsoo Cheong 2024-03-22 14:47:37 +09:00
  • 2c77fe484b Update README.md Xiaoyi Chen 2024-03-21 21:25:48 -07:00
  • c1e55756b2 cast filepos on print Minsoo Cheong 2024-03-22 13:08:47 +09:00
  • 5e548af244 fix order Meng, Hengyu 2024-03-21 19:21:51 -07:00
  • 72eb4ba2f5 add offload_op in sycl Meng, Hengyu 2024-03-22 10:13:29 +08:00
  • fa046eafbc Fix params underscore convert to dash. (#6203) b2491 DAN™ 2024-03-21 21:32:42 -04:00
  • 4522501efa Update common/common.cpp slaren 2024-03-22 02:32:29 +01:00
  • cbb3e2f629 json: ensure py/js schema conv tested on ubuntu-focal-make ochafik 2024-03-22 01:00:42 +00:00
  • ef3565fe0d json: orange warnings when tests skipped ochafik 2024-03-22 00:53:20 +00:00
  • 1a179bfc4e fix loop over pointer Pierrick Hymbert 2024-03-22 00:38:23 +01:00
  • 0fd652eba7 spacing Pierrick Hymbert 2024-03-22 00:37:01 +01:00
  • f9a29735fc llama_model_loader: fail if any of backend buffer cannot be allocated Pierrick HYMBERT 2024-03-22 00:25:11 +01:00
  • 6df9757ad6 llama_model_loader: minor, use same variable name for consistency, fix spacing in types cast Pierrick HYMBERT 2024-03-21 23:26:45 +01:00
  • 69bdee939a llama_model_loader: only map tensors included in the context Pierrick HYMBERT 2024-03-21 21:42:30 +01:00
  • 078a1aca06 llama_model_loader: map file to backend buffer if the allocation succeeds only Pierrick HYMBERT 2024-03-21 21:33:14 +01:00
  • be07a03217 server : update readme doc from slot_id to id_slot (#6213) Jan Boon 2024-03-22 06:41:24 +08:00
  • 3cf8de4939 update notebook Yingbei 2024-03-21 15:08:29 -07:00
  • 6539c3178b Add CURL flags for the mac builds. Vaibhav Srivastav 2024-03-21 22:17:42 +01:00
  • 02020b0463 fix mmap buffer management slaren 2024-03-21 22:06:37 +01:00
  • e0849e404e server : update readme doc from slot_id to id_slot Jan Boon 2024-03-22 04:57:40 +08:00
  • 9185e14922 be more specific about the length of our list of run amounts. Julia Longtin 2024-03-21 20:38:49 +00:00
  • d8b567d254 llama_model_loader: fail if backend cannot allocate buffer Pierrick HYMBERT 2024-03-21 21:05:15 +01:00
  • 81ce9df3ee Fix whitespaces Julius Arkenberg 2024-03-21 19:59:15 +00:00
  • 01708f7b70 add LLAMA_CUDA_NO_PEER_COPY to HIP build slaren 2024-03-21 20:51:18 +01:00
  • 1c931f3d4f Handle optional tensors Pierrick Hymbert 2024-03-21 20:50:28 +01:00
  • c34a5deee8 Simplify this by making these optional, switch some layer creation tensor optional Pierrick Hymbert 2024-03-21 20:50:11 +01:00
  • 6e205252b4 Adding final newline character. Clint Herron 2024-03-21 15:35:11 -04:00
  • 54e252ea5b correction of the attn.v.weight quantization for IQ3_XS Nexesenex 2024-03-21 20:00:14 +01:00
  • cfbf76b1e8 cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy slaren 2024-03-21 19:46:24 +01:00
  • 3c8a37a05f json: only attempt python & node schema conversion tests if their bins are present ochafik 2024-03-21 18:45:46 +00:00
  • 0979522fbe spacing changes. Julia Longtin 2024-03-21 18:36:25 +00:00
  • 3d9a179275 cuda : disable host register by default slaren 2024-03-21 19:33:08 +01:00