Commit graph

  • 996b35a0ad remove useless backend check Meng, Hengyu 2024-06-14 09:23:48 +00:00
  • 069369f3fe fix masking in __compute_fp32_to_bf16 Sigbjørn Skjæret 2024-06-14 11:06:21 +02:00
  • 3cd9404cc0 rpc : throw an exception when the RPC endpoint is unreachable Radoslav Gerganov 2024-06-14 11:47:48 +03:00
  • b30565e0c8 rpc : enable async operations Radoslav Gerganov 2024-06-13 09:57:24 +03:00
  • 7a8961fff5 delete redundant Eddie-Wang1120 2024-06-14 12:30:27 +08:00
  • 65765c9ea9 iq2_xxs netrunnereve 2024-06-13 23:42:21 -04:00
  • f3ce371243 fix: divide 0 exception in mamba thxCode 2024-06-14 11:38:57 +08:00
  • 75370d779e iq1_s netrunnereve 2024-06-13 23:05:06 -04:00
  • 7a5d932eaf move changes from local to BertModel wheelspawn 2024-06-13 21:59:23 -05:00
  • fe59684e32 review: modify codes as review comments zhou.weiguo 2024-06-14 10:54:15 +08:00
  • a0c5a0e82f fix line wheelspawn 2024-06-13 14:48:06 -05:00
  • f5e2558f3b fix issue #7924 wheelspawn 2024-06-13 14:45:46 -05:00
  • edb1cca353 Revert "fix issue #7924" wheelspawn 2024-06-13 14:40:43 -05:00
  • 42c90d21ca fix issue #7924 wheelspawn 2024-06-13 14:37:23 -05:00
  • 88cc7d7878 some fixes Hamdoud Hakem 2024-06-13 20:20:22 +01:00
  • 07530a8dce Fix unicode whitespaces (deepseek-coder) jaime-m-p 2024-06-13 20:43:42 +02:00
  • 974d40b513 Fix 'jina-v2' per token attributes jaime-m-p 2024-06-13 20:40:56 +02:00
  • f58de3174e update brute force random test jaime-m-p 2024-06-13 20:39:55 +02:00
  • 80ba2aef4a try CI fix Johannes Gäßler 2024-06-13 20:22:47 +02:00
  • 46b4054e6e try CI fix Johannes Gäßler 2024-06-13 18:54:14 +02:00
  • f4d33f87f8 Fix issues with Windows build slaren 2024-06-13 18:52:41 +02:00
  • 45c483cedb Merge remote-tracking branch 'origin/master' into direct_io slaren 2024-06-13 18:07:38 +02:00
  • d3131ce565 Fix editorconfig and unused variable slaren 2024-06-13 18:06:41 +02:00
  • 87099452ed try CI fix Johannes Gäßler 2024-06-13 18:06:07 +02:00
  • d962a56baa CUDA: faster q2_K, q3_K MMQ + int8 tensor cores Johannes Gäßler 2024-06-11 15:43:39 +02:00
  • c39d5ecd2b Apply suggestions from code review Markus Tavenrath 2024-06-13 15:55:23 +02:00
  • 6d2464aef5 code style ngxson 2024-06-13 15:36:03 +02:00
  • f99be2c3ff disable GPU for PCA ngxson 2024-06-13 15:21:49 +02:00
  • 91f7dbfda2 typo ngxson 2024-06-13 14:55:26 +02:00
  • b7a9d40e51 examples: refine tensor dump in examples/benchmark/benchmark-matmult.cpp zhou.weiguo 2024-06-13 20:54:06 +08:00
  • 64cad20c2e change compile target to llama-cvector-generator ngxson 2024-06-13 14:51:11 +02:00
  • 2f055584cf Merge branch 'master' into xsn/control-vector-generator ngxson 2024-06-13 14:33:45 +02:00
  • 86869fbdab Change assertions to exceptions in llama_file, find correct cuda backend to create CUDA resources and respect the use_mmap flag again for CUDA. Markus Tavenrath 2024-06-13 14:32:03 +02:00
  • 172c825684 rpc : fix ggml_backend_rpc_supports_buft() (#7918) b3145 Radoslav Gerganov 2024-06-13 15:18:44 +03:00
  • 3e921a9821 rpc : fix ggml_backend_rpc_supports_buft() Radoslav Gerganov 2024-06-13 14:58:53 +03:00
  • ca86d4fd33 escape prompt by default ngxson 2024-06-13 13:29:58 +02:00
  • 25fb0a6e61 beautify help msg ngxson 2024-06-13 13:29:46 +02:00
  • 18133cab40 Revert "use the correct SYCL context for host USM allocations" codeplay/revert-host-alloc Joe Todd 2024-06-13 12:08:27 +01:00
  • abd7c7b8c2 Formatting Joe Todd 2024-06-13 10:36:05 +01:00
  • 0c0f3f0000 [SYCL] Update unsupported ops Joe Todd 2024-06-13 10:33:34 +01:00
  • 9b81b57239 [SYCL] unify rope norm/neox Joe Todd 2024-06-13 10:30:43 +01:00
  • a55eb1bf0f readme : Remove outdated instructions from README.md (#7914) [no ci] Galunid 2024-06-13 09:42:41 +02:00
  • 5598fbd15d review: make a MVP(Minimum Viable PR) style PR in upstream zhou.weiguo 2024-06-13 15:41:53 +08:00
  • 4c29bb0494 clear out all non-normals on load Sigbjørn Skjæret 2024-06-13 08:16:29 +02:00
  • 904673f262 [no ci] Remove outdated instructions from README.md Galunid 2024-06-13 08:03:09 +02:00
  • d38f1aecc5 check devices for the same SYCL platform rather than the same backend Ben Ashbaugh 2024-06-12 22:00:19 -07:00
  • 5ff64adfe4 iq1_m netrunnereve 2024-06-12 23:55:51 -04:00
  • d342abca57 trim white space Meng, Hengyu 2024-06-05 20:34:58 +08:00
  • 224273e0dd remove duplicate extra and global work group size Meng, Hengyu 2024-06-05 20:22:32 +08:00
  • 9c5476ead4 remove duplicate buft initialization Meng, Hengyu 2024-06-05 16:57:17 +08:00
  • abe11feab6 update mul_mat condition Meng, Hengyu 2024-06-04 16:47:23 +08:00
  • 2a034d2b41 remove useless extra Meng, Hengyu 2024-06-04 11:27:37 +08:00
  • d0186d381c replace global variables with context[2/2] Meng, Hengyu 2024-05-31 16:51:56 +08:00
  • c29eccc4df seperate DPCT helpers outside remove global variables and pack into context Meng, Hengyu 2024-05-30 20:41:54 +08:00
  • f578b86b21 move BLAS to a separate backend (#6210) b3143 slaren 2024-06-13 03:11:35 +02:00
  • 211fb045f1 sched : allow ops with weights on an incompatible buffer type slaren 2024-06-13 02:38:36 +02:00
  • ae9cd85698 fix metal being used in layers not offloaded slaren 2024-06-13 02:04:06 +02:00
  • fe21ef7920 correct counts on load Sigbjørn Skjæret 2024-06-13 02:10:17 +02:00
  • f03e9b935b Merge remote-tracking branch 'origin/master' into json-bounds2 Olivier Chafik 2024-06-13 00:45:11 +01:00
  • 322d611378 Merge remote-tracking branch 'origin/master' into json-additional Olivier Chafik 2024-06-13 00:44:10 +01:00
  • 4de4dc045c Merge remote-tracking branch 'origin/master' into json-type Olivier Chafik 2024-06-13 00:43:45 +01:00
  • 1c641e6aac build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) Olivier Chafik 2024-06-13 00:41:52 +01:00
  • 73d4a4ae03 Merge branch 'bins' of https://github.com/ochafik/llama.cpp into bins Olivier Chafik 2024-06-13 00:31:22 +01:00
  • 48e5009e64 rename gguf-split & quantize bins refs in **/tests.sh Olivier Chafik 2024-06-13 00:31:04 +01:00
  • 4ad3eb21bf skip imatrix entries with non-normal data Sigbjørn Skjæret 2024-06-13 01:08:44 +02:00
  • 0e36739915 save partial imatrix Sigbjørn Skjæret 2024-06-13 01:02:22 +02:00
  • 03a8b80c7d Merge pull request #4 from julialongtin/0.99-rebase Julia Longtin 2024-06-12 22:32:41 +00:00
  • ded062c87f Merge branch 'master' into 0.99-rebase Julia Longtin 2024-06-12 22:31:09 +00:00
  • 430a2b178a Merge 9422668ed4 into 963552903f wbpxre150 2024-06-12 21:46:49 +00:00
  • 3704f33389 sycl: always set the main device after initialization Ben Ashbaugh 2024-06-12 14:41:27 -07:00
  • 75840fe6a6 Fix merge: 'smaug' jaime-m-p 2024-06-12 23:01:10 +02:00
  • c863752ca7 Generalize 'jina-v2' per token attributes jaime-m-p 2024-06-12 23:00:04 +02:00
  • 33425a7e1e mamba : fix non-contiguous usage of ggml_silu Francis Couture-Harpin 2024-06-12 12:57:02 -04:00
  • ff794f5535 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-06-12 12:10:29 -04:00
  • d23ae55de7 Merge 76512cbc92 into 963552903f Andrew Ferruolo 2024-06-12 11:42:15 -04:00
  • 963552903f CUDA: fix broken oob check for FA vec f32 kernel (#7904) b3141 Johannes Gäßler 2024-06-12 17:41:51 +02:00
  • 46325233c9 Revert 7777 revert-7777-host-usm-context-fix Aidan 2024-06-12 16:21:41 +01:00
  • 60c373dd62 CUDA: fix broken oob check for FA vec f32 kernel Johannes Gäßler 2024-06-12 17:19:13 +02:00
  • 334dbaed3f shorten help msg ngxson 2024-06-12 17:13:19 +02:00
  • c59bfa6368 add print_usage ngxson 2024-06-12 17:12:02 +02:00
  • b22c8459ff clean up a bit ngxson 2024-06-12 16:08:27 +02:00
  • a2a5f1bfbd better error handling ngxson 2024-06-12 16:01:00 +02:00
  • 679f5137f8 move param parser to common ngxson 2024-06-12 15:58:20 +02:00
  • a9cae48003 tests : add non-cont unary tests (#7857) b3140 Georgi Gerganov 2024-06-12 16:00:22 +03:00
  • 8412561c4b ggml : update unary asserts and "supports_op" gg/unary-non-cont Georgi Gerganov 2024-06-10 16:17:51 +03:00
  • ebf95c2225 tests : add non-cont unary tests Georgi Gerganov 2024-06-10 15:46:54 +03:00
  • bfaa676b08 ggml : improve ggml_is_contiguous logic (#7856) b3139 Georgi Gerganov 2024-06-12 15:24:20 +03:00
  • cd026b48ef ggml : support more contiguous cases gg/ggml-cont Georgi Gerganov 2024-06-12 15:12:32 +03:00
  • 1ebe20789b Free resources except for backend. Markus Tavenrath 2024-06-12 13:45:41 +02:00
  • 704a35b183 server : restore numeric prompts (#7883) b3138 Georgi Gerganov 2024-06-12 14:42:29 +03:00
  • 06531cbaec update: convert-hf-to-gguf.py cleanup for Qwen2MoeForCausalLM stefan 2024-06-12 11:30:08 +00:00
  • f54cb8e307 reuse allocr ngxson 2024-06-12 12:53:17 +02:00
  • 8ee0c96688 fix compile warn ngxson 2024-06-12 12:50:29 +02:00
  • e683b9af60 attemp to fix compile problem on mac ngxson 2024-06-12 12:49:01 +02:00
  • 19102415ea Update README.md Olivier Chafik 2024-06-12 11:32:55 +01:00
  • ecdde745ba Update README.md Olivier Chafik 2024-06-12 11:29:31 +01:00
  • 08da184147 add hot topic notice to README.md Olivier Chafik 2024-06-12 11:27:01 +01:00
  • ceb2859eef Merge remote-tracking branch 'origin/master' into bins Olivier Chafik 2024-06-12 10:43:17 +01:00
  • 7297817d13 use ggml_backend_tensor_copy ngxson 2024-06-12 11:41:37 +02:00
  • be66f9e605 Revert "update llama-rpc-server bin name + doc" Olivier Chafik 2024-06-12 10:40:49 +01:00