Commit graph

  • 7c00fd5184 win32 support Radoslav Gerganov 2024-05-09 15:10:21 +03:00
  • 3d55181445 fix warning Radoslav Gerganov 2024-05-07 14:30:02 +03:00
  • 0b5e8a7183 add get_device_memory Radoslav Gerganov 2024-05-07 14:05:33 +03:00
  • 7a963c3087 implement get_alignment and get_max_size Radoslav Gerganov 2024-05-07 11:32:18 +03:00
  • ef9be32791 wrap sockfd into a struct Radoslav Gerganov 2024-05-07 11:02:02 +03:00
  • dfadd1a82c Address review comments Radoslav Gerganov 2024-04-30 15:27:54 +03:00
  • cddbf972c8 Address review comments Radoslav Gerganov 2024-04-30 14:48:38 +03:00
  • 654c1cc279 implement llama_max_devices() for RPC Radoslav Gerganov 2024-04-30 14:34:09 +03:00
  • 95c16c263c fix warning Radoslav Gerganov 2024-04-30 14:08:09 +03:00
  • c8546879c4 Address review comments Radoslav Gerganov 2024-04-30 13:08:53 +03:00
  • 3562c33212 add CI workflows Radoslav Gerganov 2024-04-29 16:15:57 +03:00
  • e539b45e5a set TCP_NODELAY Radoslav Gerganov 2024-04-29 14:38:45 +03:00
  • bfdd4f4045 ggml : add RPC backend Radoslav Gerganov 2024-04-18 14:07:22 +03:00
  • 541600201e llama : disable pipeline parallelism with nkvo (#7265) b2876 slaren 2024-05-14 09:33:42 +02:00
  • efc8f767c8 move ndk code to a new library (#6951) b2875 Elton Kola 2024-05-14 03:30:30 -04:00
  • e0f556186b Add left recursion check: quit early instead of going into an infinite loop (#7083) b2874 Haggai Nuchi 2024-05-13 22:25:56 -07:00
  • 27f65d6267 docs: Fix typo and update description for --embeddings flag (#7026) Ryuei 2024-05-14 14:20:47 +09:00
  • 284870c868 Merge branch 'master' into fix-convert-modelname fix-convert-modelname Brian 2024-05-14 15:05:49 +10:00
  • cdb56b14a7 docs: Fix typo and update description for --embeddings flag Ryuei 2024-05-01 23:38:22 +09:00
  • a30c3ab02c ggml-quants, llama: removed excess checks, it has already been checked before Herman Semenov 2024-05-13 21:26:01 -05:00
  • c08d69f924 grammar, json, llama: replace push on emplace if it possible Herman Semenov 2024-05-13 20:54:34 -05:00
  • ced5bfeb33 Added const reference for std::pair<> and std::tuple<> more 16 bytes: Herman Semenov 2024-05-13 20:07:53 -05:00
  • ca61d3e498 ggml, ngram-cache, log: added const and const ref for function params Herman Semenov 2024-05-13 19:29:19 -05:00
  • 2a9a84be7d ggml llama: align structs for memory optimization on 64-bit platforms: Herman Semenov 2024-05-13 18:38:48 -05:00
  • b90a41d609 spacing changes. Julia Longtin 2024-05-12 09:36:08 +00:00
  • f7b062fce9 do 2 rounds of 4, instead of 4 rounds of 2. and properly offset unalligned reads across a 64 byte boundary. Julia Longtin 2024-05-11 20:28:47 +00:00
  • 464a74f0c0 make offset available in a register. Julia Longtin 2024-05-11 19:57:45 +00:00
  • 92bd588433 load from identical addresses for low and high side. Julia Longtin 2024-05-11 19:48:53 +00:00
  • f0d4f513c6 minor comment fixes. Julia Longtin 2024-05-11 19:47:20 +00:00
  • d52607486d make the offset of q4 available. Julia Longtin 2024-05-11 19:39:53 +00:00
  • 259da93e5a add missing vector. Julia Longtin 2024-05-11 19:29:09 +00:00
  • 5c76364410 fill and increment r12 and r13. Julia Longtin 2024-05-11 19:24:11 +00:00
  • 5e7d7ab70d relabel some other labels. Julia Longtin 2024-05-11 19:02:48 +00:00
  • 939606155a rename some labels. Julia Longtin 2024-05-11 17:56:10 +00:00
  • 25fc1d669c rename label 1 to 3. Julia Longtin 2024-05-11 14:24:30 +00:00
  • 8e854f4c3f introduce r10 and r11, for vloadunpackhd. Julia Longtin 2024-05-11 14:02:36 +00:00
  • 5615c8611f spacing changes. Julia Longtin 2024-05-11 13:35:50 +00:00
  • 6ed6f2f536 spacing changes. Julia Longtin 2024-05-11 13:26:00 +00:00
  • 9372048875 add missing jump. Julia Longtin 2024-05-11 12:53:23 +00:00
  • 41a9ed02f1 look at the right final memory location. Julia Longtin 2024-05-11 11:27:52 +00:00
  • e8087c5d60 subtract the correct amount. Julia Longtin 2024-05-11 11:11:15 +00:00
  • cfe47d048d change from handling three iterations per loop to four. Julia Longtin 2024-05-11 11:07:16 +00:00
  • 7819247614 comment clarification. Julia Longtin 2024-05-10 21:57:16 +00:00
  • fc828b46df correct a comment, and use jz when comparing to zero. Julia Longtin 2024-05-10 20:30:56 +00:00
  • f1af881a23 use values inside of the loop as soon as we have them. Julia Longtin 2024-05-10 19:33:58 +00:00
  • 6bd8dcb282 fix loop. Julia Longtin 2024-05-10 17:07:27 +00:00
  • 50800b91f0 move sub earlier, and move the compare of iterations to outside, and at the end of the loop. Julia Longtin 2024-05-10 17:03:41 +00:00
  • 1d74ddb15c spacing and comment changes. Julia Longtin 2024-05-10 16:50:39 +00:00
  • 156b9b676a remove useless prefetches. Julia Longtin 2024-05-10 16:28:53 +00:00
  • cb96a48ed1 perform better prefetches, and invert the test of our clear flag for clarity. Julia Longtin 2024-05-10 16:14:28 +00:00
  • 14638be66c use vbroadcastss in place of vbroadcast32x4. Julia Longtin 2024-05-10 15:52:35 +00:00
  • 0261b3b8f8 Use a vectorized assembly function to handle remaining chunks less than vector wide. Julia Longtin 2024-05-10 14:52:46 +00:00
  • 7efdcf5b4f broadcast a single int8, instead of 4 of them. Julia Longtin 2024-05-10 14:19:27 +00:00
  • 201566c965 use different restrict syntax, to make g++ happy. Julia Longtin 2024-05-09 23:08:43 +00:00
  • bf674be34f fix typo Julia Longtin 2024-05-09 20:41:50 +00:00
  • 30e8b37f33 remove a warning. Julia Longtin 2024-05-09 20:40:50 +00:00
  • de44c6633e add batch fp16<->fp32 conversion functions. Julia Longtin 2024-05-09 19:31:28 +00:00
  • b33cd8d614 minor spacing and comment changes. Julia Longtin 2024-05-09 16:57:59 +00:00
  • e108564e2d spacing and capitalization changes. Fix the register list of GGML_5bit_Unpacked_Unaligned. Julia Longtin 2024-04-26 14:44:08 +00:00
  • 1ba6534846 spacing and capitalization changes. Julia Longtin 2024-04-25 21:23:22 +00:00
  • 93d0a0ae7a use or, instead of and. bug fix? Julia Longtin 2024-04-24 17:50:12 +00:00
  • 2cfc15b0a9 comment and spacing fixes. Julia Longtin 2024-04-24 17:38:42 +00:00
  • d27cd93d11 fix an offset error, and get rid of tabs. Julia Longtin 2024-04-22 18:29:31 +00:00
  • 5b2023bb12 fix some small errors. Julia Longtin 2024-04-22 18:22:22 +00:00
  • 934f869a51 further optimizations. 0.99 tokens per second. Julia Longtin 2024-04-22 18:16:28 +00:00
  • a33c82b6bb replace tabs with spaces. Julia Longtin 2024-04-03 23:42:34 +00:00
  • 039685d78c reformat, and label what these files are. Julia Longtin 2024-04-03 23:21:24 +00:00
  • feb8bccfab use GGML_F32_EPR, and remove some dead code. Julia Longtin 2024-04-03 22:04:45 +00:00
  • 7214391ff7 whoops. missing tab. Julia Longtin 2024-04-03 21:58:29 +00:00
  • 10f06379d7 add Makefile rule for generation .s file, for manual inspection. Julia Longtin 2024-04-03 20:30:25 +00:00
  • e544a3faa2 formatting changes. Julia Longtin 2024-04-03 20:24:00 +00:00
  • 481f1746c0 indent headers consistently. Julia Longtin 2024-04-03 19:01:18 +00:00
  • aa33f281e3 formatting. Julia Longtin 2024-04-02 17:01:53 +00:00
  • 021ae03bd6 minor changes. Julia Longtin 2024-04-02 16:55:40 +00:00
  • efcd202f0f massively rewrite assembly routines. Julia Longtin 2024-04-02 15:41:56 +00:00
  • e66a97f765 fix vector sizes. Julia Longtin 2024-03-25 19:43:37 +00:00
  • 5a6024279f separate filling aux16 from consuming aux16 by making it an array of vectors. Julia Longtin 2024-03-24 14:18:08 +00:00
  • d351d995b0 loosen alignment requirements for zeros, add missing function, and promote aux8 to an array of vectors. Julia Longtin 2024-03-24 13:35:05 +00:00
  • 185d4b8bf7 promote aux8 into a vector. Julia Longtin 2024-03-24 12:50:01 +00:00
  • a95c7b0138 fix our reference to src in the second place, and use a more accurate comment. Julia Longtin 2024-03-24 12:41:21 +00:00
  • babe051eaa spacing changes, eliminate dead references to k1 or zero, and use the right type when referring to src. Julia Longtin 2024-03-24 12:37:47 +00:00
  • b5c1135f4d better comments, and fix some small errors. Julia Longtin 2024-03-24 12:17:06 +00:00
  • 7e3eb5c01d perform 16 operations at a time. Julia Longtin 2024-03-24 12:04:44 +00:00
  • 6d4535e829 use proper mov operator, and pass addresses. Julia Longtin 2024-03-23 23:46:36 +00:00
  • e72539bcc5 attempt our first FMA. Julia Longtin 2024-03-23 22:16:57 +00:00
  • b22e3e021e add I32 vector memory clearing. Julia Longtin 2024-03-23 21:16:23 +00:00
  • 1446a724df promote aux32 to a vector. Julia Longtin 2024-03-23 21:12:35 +00:00
  • a9cc0e74d3 add missing address of operators. Julia Longtin 2024-03-23 21:05:50 +00:00
  • bff7b695b3 promote aux16 to a vector. Julia Longtin 2024-03-23 21:00:51 +00:00
  • df33835700 use quotes properly. Julia Longtin 2024-03-23 20:53:16 +00:00
  • 2dc7991809 use better memory save operator. Julia Longtin 2024-03-23 20:49:11 +00:00
  • 588a0b19cc expand mask, and align memory. Julia Longtin 2024-03-23 20:48:43 +00:00
  • 3994d81bf0 try to use vectorized zeroing function. Julia Longtin 2024-03-23 19:55:12 +00:00
  • e227717136 add missing variable. Julia Longtin 2024-03-23 19:49:16 +00:00
  • d5a27eb507 copy right block. Julia Longtin 2024-03-23 19:47:21 +00:00
  • 9f92f9730e fix typo. Julia Longtin 2024-03-23 16:29:30 +00:00
  • 484c4abf8d promote aux16 into a vector. (part three) Julia Longtin 2024-03-23 16:27:11 +00:00
  • fb0fb9ff1b promote aux16 into a vector. Julia Longtin 2024-03-23 16:24:11 +00:00
  • 405b5fa731 promote aux16 into a vector. Julia Longtin 2024-03-23 16:21:20 +00:00
  • b92e06456c formatting improvement. Julia Longtin 2024-03-23 16:19:17 +00:00