Commit graph

  • a0e584defd imatrix : fix wname for mul_mat_id ops (#6271) Georgi Gerganov 2024-03-24 16:18:45 +02:00
  • bc3d6db862 separate filling aux16 from consuming aux16 by making it an array of vectors. Julia Longtin 2024-03-24 14:18:08 +00:00
  • ca0dc26704 loosen alignment requirements for zeros, add missing function, and promote aux8 to an array of vectors. Julia Longtin 2024-03-24 13:35:05 +00:00
  • 7aed0ffe68 Fixed lookup compilation issues on Windows (#6273) b2521 Johannes Gäßler 2024-03-24 14:21:17 +01:00
  • cf481cf901 promote aux8 into a vector. Julia Longtin 2024-03-24 12:50:01 +00:00
  • 169a145409 fix our reference to src in the second place, and use a more accurate comment. Julia Longtin 2024-03-24 12:41:21 +00:00
  • c28bfe4552 spacing changes, eliminate dead references to k1 or zero, and use the right type when referring to src. Julia Longtin 2024-03-24 12:37:47 +00:00
  • 901f9f58d7 also filter tensor names in mul_mat_id ops slaren 2024-03-24 13:35:28 +01:00
  • d7fa7a615e fix lint Minsoo Cheong 2024-03-24 21:35:19 +09:00
  • ecdcb67a92 remove seed setting Minsoo Cheong 2024-03-24 21:34:20 +09:00
  • 09eeea66c6 fix usage printing Minsoo Cheong 2024-03-24 21:32:43 +09:00
  • 0cacdecd13 fix merge conflict Minsoo Cheong 2024-03-24 21:28:17 +09:00
  • 3cb7567607 Merge branch 'master' into add-retrieval-example Minsoo Cheong 2024-03-24 21:27:49 +09:00
  • 0796bcbb30 add usage description in README Minsoo Cheong 2024-03-24 21:21:03 +09:00
  • ba4f4129b3 better comments, and fix some small errors. Julia Longtin 2024-03-24 12:17:06 +00:00
  • 3073b380cb use vector for query_emb Minsoo Cheong 2024-03-24 21:13:38 +09:00
  • 5d02fe94bd fix --context-file option to be provided multiple times for multiple files Minsoo Cheong 2024-03-24 21:11:10 +09:00
  • 03a3e0eb7a perform 16 operations at a time. Julia Longtin 2024-03-24 12:04:44 +00:00
  • 56b7db971e define retrieval-only parameters in retrieval.cpp Minsoo Cheong 2024-03-24 20:48:38 +09:00
  • 00efa65eb1 add minicpm chat template ngxson 2024-03-24 12:41:15 +01:00
  • e425810bb6 tests : add hs=256 Georgi Gerganov 2024-03-24 12:21:41 +02:00
  • ea279d5609 ci : close inactive issue, increase operations per run (#6270) b2520 Pierrick Hymbert 2024-03-24 09:57:06 +01:00
  • 586e7bc561 sampling : deduplicated code for probability distribution access (#6240) Minsoo Cheong 2024-03-24 17:54:07 +09:00
  • cbac91b7cf Fixed lookup compilation issues on Windows Johannes Gäßler 2024-03-24 09:38:37 +01:00
  • 04b7384b2f Fix heap corruption from wmode out-of-bound writes on windows Flipbook 2024-03-24 01:33:29 -07:00
  • d3690068de imatrix : fix wname for mul_mat_id ops Georgi Gerganov 2024-03-24 08:58:38 +02:00
  • f7fc8f0f8d ci: close inactive issue, increase operations per run Pierrick HYMBERT 2024-03-24 07:18:28 +01:00
  • ddf6568510 [SYCL] offload op (#6217) b2518 Meng, Hengyu 2024-03-24 12:04:25 +08:00
  • 0f304d9b58 cuda : refactor into multiple files slaren 2024-03-23 02:38:35 +01:00
  • 5f8a87d752 remove sycl part from common backend Meng, Hengyu 2024-03-24 02:43:01 +00:00
  • bb38278e6a change function name to llama_sampling_prepare Minsoo Cheong 2024-03-24 11:16:29 +09:00
  • d03224ac98 Support build win release for SYCL (#6241) b2517 Neo Zhang Jianyu 2024-03-24 09:44:01 +08:00
  • 038a851d85 flake.lock: Update github-actions[bot] 2024-03-24 00:17:59 +00:00
  • 5935bb34f4 use proper mov operator, and pass addresses. Julia Longtin 2024-03-23 23:46:36 +00:00
  • 94d1b3b411 use _wfopen instead of fopen on Windows (#6248) b2516 Jared Van Bortel 2024-03-23 18:48:02 -04:00
  • a5132a1507 attempt our first FMA. Julia Longtin 2024-03-23 22:16:57 +00:00
  • 4477b8e123 add I32 vector memory clearing. Julia Longtin 2024-03-23 21:16:23 +00:00
  • ea1edb0600 promote aux32 to a vector. Julia Longtin 2024-03-23 21:12:35 +00:00
  • f967690a41 add missing address of operators. Julia Longtin 2024-03-23 21:05:50 +00:00
  • 2fdd11fe3a promote aux16 to a vector. Julia Longtin 2024-03-23 21:00:51 +00:00
  • f09b3ed79e use quotes properly. Julia Longtin 2024-03-23 20:53:16 +00:00
  • bb5eb95816 use better memory save operator. Julia Longtin 2024-03-23 20:49:11 +00:00
  • 9d7ca41703 expand mask, and align memory. Julia Longtin 2024-03-23 20:48:43 +00:00
  • bd6d7e6238 try to use vectorized zeroing function. Julia Longtin 2024-03-23 19:55:12 +00:00
  • f985372e3a add missing variable. Julia Longtin 2024-03-23 19:49:16 +00:00
  • 31d4f9312b copy right block. Julia Longtin 2024-03-23 19:47:21 +00:00
  • 95562175f8 gitignore : gguf-split Georgi Gerganov 2024-03-23 21:35:23 +02:00
  • d05c13b3b9 llama : fix BPE LF token on MSVC ceb/fix-win-unicode-fpaths Jared Van Bortel 2024-03-23 14:03:16 -04:00
  • f482bb2e49 common: llama_load_model_from_url split support (#6192) b2514 Pierrick Hymbert 2024-03-23 18:07:00 +01:00
  • 1997577d5e server: docs: --threads and --threads, --ubatch-size, --log-disable (#6254) Pierrick Hymbert 2024-03-23 18:00:38 +01:00
  • 72d4eb5e5c spacing Pierrick Hymbert 2024-03-23 17:51:34 +01:00
  • 34a7665ee7 spacing Pierrick Hymbert 2024-03-23 17:51:21 +01:00
  • 4da00c1484 spacing Pierrick Hymbert 2024-03-23 17:51:06 +01:00
  • 476b0251b2 llama : add grok-1 support (#6204) Julius Arkenberg 2024-03-23 17:41:53 +01:00
  • 9a9e6cde66 llama : minor Georgi Gerganov 2024-03-23 18:41:10 +02:00
  • e43a63e7c6 fix typo. Julia Longtin 2024-03-23 16:29:30 +00:00
  • bdef0ec2cb server: tests: enable back Release test on PR Pierrick HYMBERT 2024-03-23 17:28:05 +01:00
  • f092a10dc9 promote aux16 into a vector. (part three) Julia Longtin 2024-03-23 16:27:11 +00:00
  • c72157a5a6 promote aux16 into a vector. Julia Longtin 2024-03-23 16:24:11 +00:00
  • e3503c924a promote aux16 into a vector. Julia Longtin 2024-03-23 16:21:20 +00:00
  • edb76ffddb formatting improvement. Julia Longtin 2024-03-23 16:19:17 +00:00
  • 21cad01b6e split: add gguf-split in the make build target (#6262) Pierrick Hymbert 2024-03-23 17:18:13 +01:00
  • 6face8a0be first fixes. Julia Longtin 2024-03-23 15:56:47 +00:00
  • 0a2051aa88 attempt to speed up float clearing. Julia Longtin 2024-03-23 15:55:00 +00:00
  • a5d66ad5e1 split: add gguf-split in the make build target Pierrick HYMBERT 2024-03-23 16:52:11 +01:00
  • 0b012c03ef allow using code from ggml-phi-knc-dot_q5_K_q8_K.c Julia Longtin 2024-03-23 15:02:56 +00:00
  • 0b3f17127f force to compile. Julia Longtin 2024-03-23 14:58:33 +00:00
  • 18f353987c tell ggml-common.h to export what we want. Julia Longtin 2024-03-23 14:49:35 +00:00
  • cd20404250 pull in ggml specific types. Julia Longtin 2024-03-23 14:38:15 +00:00
  • 8f57803f58 import stdio.h for size_t. Julia Longtin 2024-03-23 14:29:59 +00:00
  • 9bcb8350d5 import stdint.h for sizeSt. Julia Longtin 2024-03-23 14:28:29 +00:00
  • a7bd64c130 begin work on targeting dot_q5_K_q8_K. Julia Longtin 2024-03-23 14:19:47 +00:00
  • 9c2de3b6eb Merge 01d17ff137 into 1b26aebe4d Christian Gollwitzer 2024-03-23 21:42:35 +08:00
  • 4f0e9000fe typo xfwu 2024-03-23 20:51:40 +08:00
  • f9dc033797 fix param definitions Minsoo Cheong 2024-03-23 21:48:23 +09:00
  • e16279ed0e assign n_batch value to n_ubatch Minsoo Cheong 2024-03-23 21:41:22 +09:00
  • 62c99fd8d7 Description: [bugfix] building error with Windows+ROCm5.7+Cmake xfwu 2024-03-23 20:08:12 +08:00
  • 1b26aebe4d server: flush stdout after logging in both text and json layout (#6253) b2510 Pierrick Hymbert 2024-03-23 13:18:45 +01:00
  • 52d7f44823 common: move llama_download_hide_password_in_url inside llama_download_file as a lambda Pierrick HYMBERT 2024-03-23 12:57:08 +01:00
  • b4a2ed8585 server: tests: add split tests, and HF options params Pierrick HYMBERT 2024-03-23 12:53:30 +01:00
  • c534980d20 server: docs: --threads and --threads, --ubatch-size, --log-disable Pierrick HYMBERT 2024-03-23 11:00:22 +01:00
  • 2187f34b4a server: flush stdout after logging in both text and json layout Pierrick HYMBERT 2024-03-23 10:49:54 +01:00
  • 3ba5f2d124 common: clean up curl if file cannot be loaded in gguf Pierrick HYMBERT 2024-03-23 10:37:46 +01:00
  • 8187983e60 common: use a constant for max url length Pierrick HYMBERT 2024-03-23 09:44:20 +01:00
  • 4fa1c63bf4 llama: llama_model_loader fix log Pierrick HYMBERT 2024-03-23 09:35:07 +01:00
  • 0c2aa1a249 leave the schedule to ggml_backend_sched entirely Meng, Hengyu 2024-03-23 08:29:54 +00:00
  • 08a0c132a2 server: support HF URL options Pierrick HYMBERT 2024-03-23 09:24:25 +01:00
  • 4b9f3b432b remove no USM methods Meng, Hengyu 2024-03-23 08:18:32 +00:00
  • dc3469ee99 common: minor comment Pierrick HYMBERT 2024-03-23 09:11:32 +01:00
  • c7d4db3227 common: change max url max length Pierrick HYMBERT 2024-03-23 09:09:55 +01:00
  • 7d819d088e fix error message printing Minsoo Cheong 2024-03-23 17:05:27 +09:00
  • fbcf2ab99f common: remove redundant LLAMA_CURL_MAX_PATH_LENGTH definition Pierrick HYMBERT 2024-03-23 09:02:47 +01:00
  • 7c6364425a common: EOL EOF Pierrick HYMBERT 2024-03-23 08:59:31 +01:00
  • ddb13ed6be llama: llama_split_prefix fix strncpy does not include string termination common: llama_load_model_from_url: - fix header name case sensitive - support downloading additional split in parallel - hide password in url Pierrick HYMBERT 2024-03-23 08:55:09 +01:00
  • 789a194da1 print error on insufficient batch size Minsoo Cheong 2024-03-23 14:33:03 +09:00
  • 6f4fd8f114 use wide versions of file path functions on Windows Jared Van Bortel 2024-03-21 17:03:08 -04:00
  • 50ccaf5eac lookup: complement data from context with general text statistics (#5479) b2509 Johannes Gäßler 2024-03-23 01:24:36 +01:00
  • 63d03a9d76 fixup! lookup: evaluation tools, use corpus/previous gens Johannes Gäßler 2024-03-23 00:38:07 +01:00
  • e04cf1a24c fixup! lookup: evaluation tools, use corpus/previous gens Johannes Gäßler 2024-03-23 00:34:59 +01:00
  • 79fd89a62b minor fix to address tools_call output format Yingbei 2024-03-22 15:41:44 -07:00