Commit graph

  • 55ef24a341 Update common.cpp cpumaxx 2024-04-05 11:19:30 -07:00
  • 4a4f3993e7 move include, reject bom as well Jan Boon 2024-04-06 00:59:34 +08:00
  • f2a4777d4a strict filename validation Jan Boon 2024-04-06 00:44:37 +08:00
  • b57af0c9dd metal : initial FA vec kernel Georgi Gerganov 2024-04-05 17:47:01 +03:00
  • f8d709f01a metal : simplify Georgi Gerganov 2024-04-05 16:29:29 +03:00
  • 1b496a745c [SYCL] Fixed minor bug when enabling FP16 for non intel targets (#6464) b2612 Ouadie EL FAROUKI 2024-04-05 14:35:06 +01:00
  • c4dff1ec91 metal : reduce registers Georgi Gerganov 2024-04-05 16:24:10 +03:00
  • ea2b79534e ggml : group all experts in a single ggml_mul_mat_id cuda : improve mmid row copy slaren 2024-04-03 21:03:45 +02:00
  • e51778de5e metal : switch to parallel reduce Georgi Gerganov 2024-04-05 16:10:15 +03:00
  • 96f07c769e gguf.py: add licence and version to gguf writer brian khuu 2024-04-05 23:30:10 +11:00
  • 2423c29c17 Modify mat mat mul shader for mul_mat_id, modify mat vec mul shaders for single call batch operation 0cc4m 2024-04-05 14:16:53 +02:00
  • ec613b856c Add line S 2024-04-05 12:48:43 +01:00
  • 2204a6f47d Update proprietary description Hoang Nguyen 2024-04-05 18:32:51 +07:00
  • 5733b00e53 metal : opt Georgi Gerganov 2024-04-05 14:26:28 +03:00
  • 8d2a61f068 metal : opts Georgi Gerganov 2024-04-05 13:57:54 +03:00
  • 5eab7454dd metal : support more than 1 warps Georgi Gerganov 2024-04-05 13:41:00 +03:00
  • d15898481a metal : add BS=1 kernel for flash attention (wip) Georgi Gerganov 2024-04-05 13:29:48 +03:00
  • 13759014f6 Add MindMac to UI list Hoang Nguyen 2024-04-05 17:09:33 +07:00
  • 1e66c3a7b2 implement automatic max ngl detection Yui 2024-04-05 11:59:38 +02:00
  • f3532ff80c Whitespace S 2024-04-05 10:37:25 +01:00
  • 553b09ba8f Fix embedding layer based on Noeda's example S 2024-04-05 10:35:19 +01:00
  • 89961dea87 Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-04-05 09:44:12 +03:00
  • 0b1ef49fbf bench: update doc for batched bench Ting Sun 2024-04-05 09:11:41 +07:00
  • fa4802f983 bench: make n_batch and n_ubatch configurable Ting Sun 2024-04-05 09:09:32 +07:00
  • 8789e17f8f ci: bench: fix finish reason rate Pierrick HYMBERT 2024-04-05 01:30:44 +02:00
  • b6b50b11f9 ci: bench: change to the 95 percentile for pp and tg as it is closer to what the server exports in metrics Pierrick HYMBERT 2024-04-05 01:30:24 +02:00
  • c354db751e Export new tensors in 1D so they are not quantized. S 2024-04-05 00:10:07 +01:00
  • 59dc4bbb99 ci: bench: fix case when there is no token generated Pierrick HYMBERT 2024-04-05 00:53:08 +02:00
  • a37696d4f1 speculative : more robust tokenizer comparison ceb/bert-tokenizer-fixes Jared Van Bortel 2024-04-04 18:25:19 -04:00
  • 92591c125f examples : rely on new behavior of add_special Jared Van Bortel 2024-04-04 18:12:33 -04:00
  • 3694026669 ci: bench: remove total pp and tg as it is not accurate Pierrick HYMBERT 2024-04-05 00:17:48 +02:00
  • 1534d903d7 ci: bench: README.md EOL Pierrick HYMBERT 2024-04-05 00:13:24 +02:00
  • 713fa9867e ci: bench: support sse and fix prompt processing time server: add tokens usage in stream mode Pierrick HYMBERT 2024-04-05 00:01:08 +02:00
  • d1a1b614cd spm : fix special_add_bos default Jared Van Bortel 2024-04-04 17:54:46 -04:00
  • 45983e3a47 convert : remove now-unused ignore_nonllama parameter Jared Van Bortel 2024-04-04 17:44:58 -04:00
  • e4b2e2d339 Loading works up to LayerNorm2D S 2024-04-04 22:30:17 +01:00
  • 909f6be291 convert scripts : fix python 3.8 compatibility Jared Van Bortel 2024-04-04 17:14:46 -04:00
  • 6a9d3c0911 convert : fix Tensor type annotations Jared Van Bortel 2024-04-04 17:07:07 -04:00
  • 2e86228a9b Update ggml-phi-knc-dot_q5_K_q8_K.c Julia Longtin 2024-04-04 21:24:08 +01:00
  • 8422686d0a Update ggml-phi-knc.c Julia Longtin 2024-04-04 21:23:55 +01:00
  • 0d052cbe39 Merge branch 'master' into ceb/bert-tokenizer-fixes Jared Van Bortel 2024-04-04 16:02:31 -04:00
  • 8803582721 llama : handle added special tokens like HF does Jared Van Bortel 2024-03-27 16:59:49 -04:00
  • 748fc8baa3 convert-hf-to-gguf : fix BERT abuse of LlamaHfVocab Jared Van Bortel 2024-03-27 16:13:09 -04:00
  • fbab98497b Add Command R Plus GGUF S 2024-04-04 18:27:54 +01:00
  • 2efcd87b12 Add Command R Plus GGUF S 2024-04-04 18:23:23 +01:00
  • a307375c02 readme : add Dot to UI list (#6487) b2611 alexpinel 2024-04-04 18:22:50 +01:00
  • b660a5729e readme : fix typo (#6481) Jun Jie 2024-04-05 01:16:37 +08:00
  • 0a1d889e27 server: add cURL support to server Dockerfiles (#6474) Ed Lepedus 2024-04-04 17:31:22 +01:00
  • 7dda1b727e ci: exempt master branch workflows from getting cancelled (#6486) b2608 Minsoo Cheong 2024-04-05 01:30:53 +09:00
  • 3bb70c394d apply to bench.yml Minsoo Cheong 2024-04-05 01:19:54 +09:00
  • c500c03a29 Add project to README.md alexpinel 2024-04-04 17:08:41 +01:00
  • 68ac639d27 ci: exempt master branch workflows from getting cancelled Minsoo Cheong 2024-04-05 00:57:24 +09:00
  • 31b0d99598 test pr Minsoo Cheong 2024-04-05 00:37:38 +09:00
  • c666ba26c3 build CI: Name artifacts (#6482) Ewout ter Hoeven 2024-04-04 17:08:55 +02:00
  • 1e9e53e514 Merge e7552a4d78 into 2e66913e5f Mike DuPont 2024-04-04 17:05:06 +02:00
  • 2e66913e5f server: allow penalizing repetition of newlines on server webpage (#6431) Shakhar Dasgupta 2024-04-04 11:03:00 -04:00
  • 8120efee1d ci: bench fix concurrency for workflow trigger dispatch with sha1 (#6478) Pierrick Hymbert 2024-04-04 16:59:04 +02:00
  • 613a8260c6 build CI: Name artifacts Ewout ter Hoeven 2024-04-04 16:55:46 +02:00
  • 8db1e4d45f llama : use std::find for seq_nodes in llama_rs_cache Francis Couture-Harpin 2024-04-04 10:46:43 -04:00
  • 8f4b7f7edb typo error in README file junnjiee 2024-04-04 22:31:53 +08:00
  • a74401f0e5 Correct README link (#6458) limitedAtonement 2024-04-04 10:30:02 -04:00
  • 4b54e71f4d Cleaning up debug message to make a bit more sense. Clint Herron 2024-04-04 10:03:44 -04:00
  • d24dabc99f Reorganizing tests for readability. Clint Herron 2024-04-04 09:55:28 -04:00
  • 0bf5704b47 Comment cleanup. Clint Herron 2024-04-04 09:52:40 -04:00
  • b930945fba Removing hacky include to llama.cpp from grammar integration test now that needed functions are available via internal API. Clint Herron 2024-04-04 09:50:02 -04:00
  • b7264d6efd Fixing whitespace errors and cleaning error message alert to be clearer. Clint Herron 2024-04-04 09:43:14 -04:00
  • 345eae3021 Added integration tests for GBNF parser to validate correctness of parsing, as well as correctness of string matching. Intended for use to pin behavior while working on performance improvements. Clint Herron 2024-04-04 02:32:09 -04:00
  • e6bb23285e fix typo in server-vulkan.Dockerfile Ed Lepedus 2024-04-04 13:36:48 +01:00
  • 42f31ca12c server: add cURL support to server-vulkan.Dockerfile elepedus 2024-04-04 11:49:37 +01:00
  • a2a6f15d08 server: add cURL support to server-intel.Dockerfile elepedus 2024-04-04 11:47:20 +01:00
  • e55ca1bad6 server: add cURL support to full-rocm.Dockerfile and server-rocm.Dockerfile elepedus 2024-04-04 11:43:02 +01:00
  • a1a539fada ci: bench fix concurrency for workflow trigger dispatch with sha1 Pierrick HYMBERT 2024-04-04 13:41:49 +02:00
  • d9fd0d7eb8 Merge branch 'master' into feature/save-restore-seq Jan Boon 2024-04-04 19:21:40 +08:00
  • b14f96af4e server: add cURL support to full-cuda.Dockerfile and server-cuda.Dockerfile elepedus 2024-04-04 11:36:06 +01:00
  • 7a2c92637a ci: bench: add more ftype, fix triggers and bot comment (#6466) Pierrick Hymbert 2024-04-04 11:57:58 +02:00
  • 94cfcd432c server: add cURL support to full.Dockerfile elepedus 2024-04-04 10:39:48 +01:00
  • 205c44c212 readme : update API changes date Georgi Gerganov 2024-04-04 11:44:26 +03:00
  • 4bcd6b959c common: remove duplicate check for curl (#6471) Daniel Bevenius 2024-04-04 09:49:21 +02:00
  • 9b84ae1806 examples : add GBNF validator program (#5948) Clint Herron 2024-04-04 03:44:28 -04:00
  • e0f2a1bd9b common: remove duplicate check for curl Daniel Bevenius 2024-04-04 07:19:19 +02:00
  • 4399f13fb9 server : remove obsolete --memory-f32 option Georgi Gerganov 2024-04-04 09:34:58 +03:00
  • 1a43c7254e server : add option to disable KV offload (#6468) Xiao-Yong Jin 2024-04-04 01:33:48 -05:00
  • 72d73af651 convert : fix for lint error complaining of bare except (#6470) Clint Herron 2024-04-04 02:32:53 -04:00
  • ebae7a96dd Fix for lint error complaining of bare except during CI builds. Clint Herron 2024-04-04 00:33:42 -04:00
  • ecef370e75 server: add option to disable KV offload Xiao-Yong Jin 2024-04-03 22:23:27 -05:00
  • 271104c65c wip: llama : separate recurrent states from the KV cache Francis Couture-Harpin 2024-04-03 11:07:16 -04:00
  • 53773e0b4a replace tabs with spaces. Julia Longtin 2024-04-03 23:42:34 +00:00
  • 9152143fe7 reformat, and label what these files are. Julia Longtin 2024-04-03 23:21:24 +00:00
  • 9ad5efafb0 use GGML_F32_EPR, and remove some dead code. Julia Longtin 2024-04-03 22:04:45 +00:00
  • 84df774d6a whoops. missing tab. Julia Longtin 2024-04-03 21:58:29 +00:00
  • 9412572205 add Makefile rule for generation .s file, for manual inspection. Julia Longtin 2024-04-03 20:30:25 +00:00
  • 6f67ea886f formatting changes. Julia Longtin 2024-04-03 20:24:00 +00:00
  • 5fb1574c81 A few small fixes to server's README docs (#6428) Fattire 2024-04-03 13:22:57 -07:00
  • 8685c2cda2 Fix trailing spaces Pierrick HYMBERT 2024-04-03 21:59:30 +02:00
  • 64c7534b00 ci: bench: add per slot metric in the commit status Pierrick HYMBERT 2024-04-03 21:33:41 +02:00
  • 04e1ce3498 Add iteration in the commit status, reduce again the autocomment Pierrick HYMBERT 2024-04-03 21:24:06 +02:00
  • 96fdd214c8 indent headers consistently. Julia Longtin 2024-04-03 19:01:18 +00:00
  • a380b95274 ci: bench: artefact name perf job Pierrick HYMBERT 2024-04-03 20:25:50 +02:00
  • 60cdf40cc3 server : handle exception on wrong type in request (#6452) JH23X 2024-04-03 20:09:52 +02:00
  • bb43cf7e9d llama : add SEA-LION support (#6448) bryanSwk 2024-04-04 02:05:10 +08:00