Commit graph

  • af1c9966c8 gguf : start write tensor info gguf-python M. Yusuf Sarıgöz 2023-07-27 10:32:31 +03:00
  • c85d3178b3 refactor : reduce code duplication and better API (#2415) M. Yusuf Sarıgöz 2023-07-27 10:29:29 +03:00
  • 1038d1d2bc metal : fix out-of-bounds access + style changes Georgi Gerganov 2023-07-27 10:10:51 +03:00
  • 8332d26123 refactor: reduce code duplication and better API M. Yusuf Sarıgöz 2023-07-27 09:48:08 +03:00
  • 2855ffa7f4 server : Support dark mode Ebrahim Byagowi 2023-07-27 09:44:47 +03:30
  • 5ef33fbd5c complete JSON grammar Evan Jones 2023-07-26 21:15:26 -04:00
  • ffb8c87caa Merge remote-tracking branch 'upstream/master' into json-schema Evan Jones 2023-07-26 20:46:52 -04:00
  • a0f564ff4a Merge remote-tracking branch 'origin/master' into prompt-array Xiao-Yong Jin 2023-07-26 17:43:08 -05:00
  • bb3770b3e6 server: tokenize endpoint no longer adds BOS Xiao-Yong Jin 2023-07-26 17:42:20 -05:00
  • b5472ea0ad ggml : fix assert in ggml_set_unary_op (#2410) master-b5472ea slaren 2023-07-26 23:57:23 +02:00
  • f9c3a3fd60 ggml : fix assert in ggml_set_unary_op slaren 2023-07-26 22:04:29 +02:00
  • d8491fc7e3 gguf : add comments Georgi Gerganov 2023-07-26 22:56:26 +03:00
  • 5628ec7163 gguf : read / write sample models Georgi Gerganov 2023-07-26 20:04:22 +03:00
  • e9c17039db Create example bash script for LlaMa 2 Chat lionelchg 2023-07-26 21:31:30 +02:00
  • 6df1f5940f make : build with -Wmissing-prototypes (#2394) master-6df1f59 Cebtenzzre 2023-07-26 14:00:04 -04:00
  • cddfec9ff2 Create bash script for LlaMa 2 Chat models lionelchg 2023-07-26 19:16:07 +02:00
  • 01814b6014 fix:workaround for missing _mm256_setr_m128i in GCC < 8 in new-added k_quants.c Lee 2023-07-27 00:48:51 +08:00
  • e46870f5af gguf : gguf.c is now part of ggml.c Georgi Gerganov 2023-07-26 18:55:32 +03:00
  • d313c0fa33 gguf : simplify gguf_get_val Georgi Gerganov 2023-07-26 18:53:57 +03:00
  • cb871fa022 gguf : do not support passing existing ggml_context to gguf_init Georgi Gerganov 2023-07-26 18:48:52 +03:00
  • 860c9c63ce gguf : add gguf_get_tensor_name() Georgi Gerganov 2023-07-26 16:36:03 +03:00
  • 78b226a959 gguf : initial model loading - not tested Georgi Gerganov 2023-07-26 16:32:05 +03:00
  • d91b985d2d gguf : read tensor info Georgi Gerganov 2023-07-26 14:58:35 +03:00
  • 8d6acfec12 gguf : read header + meta data Georgi Gerganov 2023-07-26 14:33:53 +03:00
  • 6873148771 gguf : first API pass Georgi Gerganov 2023-07-26 13:24:20 +03:00
  • 7e82d25f40 ci : disable CI temporary to not waste energy Georgi Gerganov 2023-07-26 11:26:14 +03:00
  • bae6b125f6 wip : implement GGUF (#2397) M. Yusuf Sarıgöz 2023-07-26 11:17:05 +03:00
  • 4d698495ea gguf : init Georgi Gerganov 2023-07-26 11:16:07 +03:00
  • 5488fb789e ggml : allocate graphs in a context (#2392) master-5488fb7 slaren 2023-07-26 15:56:53 +02:00
  • 0509a68016 Adding newline to eof Stephen Nichols 2023-07-26 07:14:44 -05:00
  • 1b4fd4e0d9 cleanup slaren 2023-07-26 13:10:41 +02:00
  • e5055f0971 fix: remove the unnecessary last \n nhamanasu 2023-07-26 19:49:51 +09:00
  • 7949dcaaf7 add GGML_PAD slaren 2023-07-26 12:49:18 +02:00
  • 5c74eb0b2e add: server chat mode with llama2 nhamanasu 2023-07-26 19:19:36 +09:00
  • 156d99abde cleanup slaren 2023-07-26 11:48:20 +02:00
  • 5c19dd3eef Update ggml.c slaren 2023-07-26 11:34:17 +02:00
  • fb9a06773c WIP: python class to write GGUF, incomplete C apı for reading M. Yusuf Sarıgöz 2023-07-26 10:57:03 +03:00
  • bee2a3d981 Add docs Make fschat and flask-cors optional Elsa 2023-07-25 18:16:31 +08:00
  • ea5a7fbc95 Use coversation template from fastchat for api proxy Fix eventsource format Elsa 2023-07-25 18:07:27 +08:00
  • de41d5ecd8 Fix static declarations goerch 2023-07-26 08:30:25 +02:00
  • 94e0a06daf updated lite, up ver (+1 squashed commits) Concedo 2023-07-26 10:35:53 +08:00
  • b184380aae Revert "a better default rms_norm_eps" Concedo 2023-07-26 10:23:45 +08:00
  • f53d2aabb4 Merge branch 'master' into concedo_experimental Concedo 2023-07-26 10:19:59 +08:00
  • 97110251b9 fix mpi slaren 2023-07-26 01:43:53 +02:00
  • 78ef528958 make : build with -Wmissing-prototypes Cebtenzzre 2023-07-25 18:42:18 -04:00
  • a9019963a1 WIP: super not working attempt atm. will update as I learn more ggml :D Aniket 2023-07-25 16:37:38 -04:00
  • 78f8e4d604 add the new example directory in gitignore Aniket 2023-07-25 16:36:39 -04:00
  • 77d662faa5 llama.cpp : allocate graph in the context slaren 2023-07-25 20:36:32 +02:00
  • 567b5e24ed allocate work buffer as a ggml_object in ggml_graph_compute_with_ctx slaren 2023-07-25 20:35:59 +02:00
  • 59e808b49b ggml : graph allocation in contexts slaren 2023-07-25 20:29:02 +02:00
  • 3811c0a505 Reverting assert edits. Stephen Nichols 2023-07-25 12:14:21 -05:00
  • 1b2ec1aa72 Move to graph function similar to CUDA implementation 0cc4m 2023-07-25 19:01:28 +02:00
  • 3e3f38af48 Fixing race condition in server.cpp and partial stream handling in completion.js Stephen Nichols 2023-07-25 11:57:29 -05:00
  • b4a5461ff8 Resolve merge conflict with grammar stuff. goerch 2023-07-25 18:14:38 +02:00
  • 3bdf106e06 Merge branch 'master' into fix-2023 goerch 2023-07-25 17:59:13 +02:00
  • 5e7a26628b Merge branch 'ggerganov:master' into hellaswag_scores klosax 2023-07-25 17:58:04 +02:00
  • fae04ddd97 perplexity.cpp : clean up klosax 2023-07-25 17:57:15 +02:00
  • e68580f993 Remove llama.cpp.h goerch 2023-07-25 17:49:24 +02:00
  • ae4d116bdf perplexity.cpp : add hellswag scores / remove perplexity-lines klosax 2023-07-25 17:43:34 +02:00
  • a40f608249 common.cpp : add hellaswag / remove perplexity-lines klosax 2023-07-25 17:37:00 +02:00
  • eb542d3932 Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384) master-eb542d3 Kawrakow 2023-07-25 18:35:53 +03:00
  • 522a29c426 common.h : add hellaswag / remove perplexity-lines klosax 2023-07-25 17:31:43 +02:00
  • 6a054b80b0 Merge branch 'master' into concedo_experimental Concedo 2023-07-25 22:55:55 +08:00
  • 0c26799e77 a better default rms_norm_eps Concedo 2023-07-25 22:51:01 +08:00
  • 07aaa0f63f ggml : fix ggml_flash_attn to use op_params (#2387) master-07aaa0f slaren 2023-07-25 16:20:12 +02:00
  • e25e15c9c5 fix slaren 2023-07-25 15:50:57 +02:00
  • 8a927cf487 ggml : fix ggml_flash_attn to use op_params slaren 2023-07-25 15:43:46 +02:00
  • fce48caf9a convert.py : support bpe tokenizer (#2228) ldwang 2023-07-25 21:22:09 +08:00
  • 875086bdb9 ggml : relax contiguous constraints in activation function (#2371) master-875086b Jiahao Li 2023-07-25 20:58:32 +08:00
  • 53c2db1685 server: fix prompt check Xiao-Yong Jin 2023-07-25 07:47:17 -05:00
  • da1889834a ggml : improve graph build time via hash table lookup (#2329) master-da18898 slaren 2023-07-25 14:32:20 +02:00
  • 82552b7f54 build : fix line breaking error in build-info.sh (#2349) Hesen Peng 2023-07-25 05:24:09 -07:00
  • 0c06204fb3 main : add --in-prefix-bos to prefix BOS to user inputs; keep EOS (#2304) master-0c06204 Xiao-Yong Jin 2023-07-25 07:19:11 -05:00
  • 1fed755b1f ci : add non-AVX scalar build/test (#2356) master-1fed755 Eve 2023-07-25 08:16:13 -04:00
  • be2301bcda k_quants : add AVX support to dot functions with QK_K as 64 (#2339) master-be2301b katsu560 2023-07-25 21:13:41 +09:00
  • 055bee91af Add LLAMA_DEFAULT_RMS_EPS so we can change the default Iwan Kawrakow 2023-07-25 15:01:17 +03:00
  • 1aa18ef994 metal : concurrently dispatch commands (#2358) master-1aa18ef Shouzheng Liu 2023-07-25 08:00:19 -04:00
  • 141d88d916 metal : code style changes Georgi Gerganov 2023-07-25 14:59:17 +03:00
  • ea02f675f9 Merge branch 'master' into HEAD Georgi Gerganov 2023-07-25 14:21:14 +03:00
  • 3e68cdd26a Merge branch 'master' into concedo_experimental Concedo 2023-07-25 18:52:48 +08:00
  • 9a08eaf3c4 Another speed gain for Q4_0 and Q4_1 on Metal (#2375) Kawrakow 2023-07-25 13:48:29 +03:00
  • 129d844c87 Fix Q4_K and Q5_K for QK_K = 64 on CUDA (#2359) master-129d844 Kawrakow 2023-07-25 13:48:04 +03:00
  • 450a7c76de ggml : mul_mat threads yield Georgi Gerganov 2023-07-25 13:26:32 +03:00
  • 66e4b5141e fix horde worker host and client agent Concedo 2023-07-25 18:18:41 +08:00
  • d5512b782b server: add rms_norm_eps parameter (#2380) master-d5512b7 slaren 2023-07-25 11:36:17 +02:00
  • 2076a9b3d9 ggml : mul_mat block tiling attempt Georgi Gerganov 2023-07-25 11:34:32 +03:00
  • c798308e3a [Server] Escape HTML in webchat (#2368) master-c798308 Henri Vasserman 2023-07-25 10:27:34 +03:00
  • 69554cee9e Add fallback for devices only supporting one DescriptorSet per DescriptorPool 0cc4m 2023-07-25 07:01:02 +02:00
  • b12859937f Merge remote-tracking branch 'origin/master' into prefix-bos Xiao-Yong Jin 2023-07-24 21:51:48 -05:00
  • 11d2405486 examples/common: move input_prefix_bos to other bools Xiao-Yong Jin 2023-07-24 21:48:44 -05:00
  • bba27edadc Merge remote-tracking branch 'origin/master' into prompt-array Xiao-Yong Jin 2023-07-24 21:40:19 -05:00
  • 97deb25398 server: use tokenizePrompt(json) and default "" if empty prompt Xiao-Yong Jin 2023-07-24 21:39:35 -05:00
  • f4519830ed first crack at lamma2.c model conversion Aniket 2023-07-24 22:29:30 -04:00
  • 010a3cbe81 added Dockerfile for server John Jones 2023-07-24 20:34:48 -04:00
  • 4e580284c0 Allow parallel execution of kernels, parallelize third and fourth dimension calls 0cc4m 2023-07-24 22:51:19 +02:00
  • 3d4359e21b server: add rms_norm_eps parameter slaren 2023-07-24 22:45:35 +02:00
  • 48c27a9ce1 hotfix for 70b broadcast issues Concedo 2023-07-25 01:32:47 +08:00
  • 9731682ad6 Update Makefile (#345) Александр Герман 2023-07-24 21:21:32 +05:00
  • 7f98561243 Have N_DST, etc., be template parameters Iwan Kawrakow 2023-07-24 19:05:44 +03:00
  • 6f489a77dd metal: don't call find_concurrency automatically. lshzh-ww 2023-07-24 11:59:12 -04:00