Commit graph

  • 93356bdb7a ggml : mul mat tweaks (#2372) master-93356bd Georgi Gerganov 2023-08-07 14:25:58 +03:00
  • 60baff7c85 ggml : pad result of ggml_nbytes() master-60baff7 Georgi Gerganov 2023-08-07 14:24:42 +03:00
  • 9082b5dfbf ggml : change params pointer (style change) (#2539) master-9082b5d Georgi Gerganov 2023-08-07 13:55:18 +03:00
  • dd50b77d37 ggml : fix params pointer Georgi Gerganov 2023-08-07 13:26:56 +03:00
  • 99d29c0094 ggml : sync (custom ops) (#2537) master-99d29c0 Georgi Gerganov 2023-08-07 13:20:09 +03:00
  • ea73dace98 Fix when stop in request is null Elsa 2023-08-07 18:09:15 +08:00
  • 6ae3702f69 Merge remote-tracking branch 'origin/master' Elsa 2023-08-07 18:08:17 +08:00
  • b6524985df Include review comments Martin Krasser 2023-08-07 11:46:26 +02:00
  • 5ddfbffbaf llama : replace (permute + reshape + view_1d) with (view_3d) Georgi Gerganov 2023-08-07 12:32:58 +03:00
  • 9133e456d2 Merge branch 'master' into concedo_experimental Concedo 2023-08-07 17:33:42 +08:00
  • cae6a847ad cuda free only for non mmq (+2 squashed commit) Concedo 2023-08-07 16:40:13 +08:00
  • 9b643601e6 ggml : sync (custom ops) Georgi Gerganov 2023-08-07 11:52:32 +03:00
  • 3d9a551816 Fixed mmap prefetch for GPU offloading (#2529) master-3d9a551 Johannes Gäßler 2023-08-07 10:09:40 +02:00
  • f6f9896ac3 metal : fix out-of-bounds access + inc concurrency nodes (#2416) Georgi Gerganov 2023-08-07 10:52:57 +03:00
  • 30ea0e1685 metal : increase concurrency nodes to 2*GGML_MAX_NODES Georgi Gerganov 2023-08-07 10:52:13 +03:00
  • 9f16a4c4ef switch to upstream implementation of pool malloc Concedo 2023-08-07 15:16:37 +08:00
  • 34a14b28ff [Makefile] Move ARM CFLAGS before compilation (#2536) master-34a14b2 GiviMAD 2023-08-06 23:21:46 -07:00
  • 7297128db8 [Zig] Rewrite build for Zig 0.11 (#2514) Henri Vasserman 2023-08-07 08:35:53 +03:00
  • e660943d3d Add further ops 0cc4m 2023-08-07 06:02:57 +02:00
  • 6659652c9f lower actual temp used when temp=0 Concedo 2023-08-07 11:05:06 +08:00
  • 0e41b94f40 improve detection for 70B. Concedo 2023-08-07 10:43:06 +08:00
  • fb44d72a78 Merge remote-tracking branch 'johannes/cuda-fix-mmap-prefetch' into concedo_experimental Concedo 2023-08-07 10:17:43 +08:00
  • 559c0e2d1f updated lite again, fix for wi Concedo 2023-08-07 10:15:20 +08:00
  • 0b8c9efe8f Refactor makefile fix build with CLBlast in arm Miguel Álvarez 2023-08-07 01:01:30 +02:00
  • 2bf422eafd add train function using automatic gradient checkpointing backward pass and allocator xaedes 2023-08-06 23:07:57 +02:00
  • d9024df759 Fixed mmap prefetch for GPU offloading JohannesGaessler 2023-08-06 10:18:05 +02:00
  • 68365e2291 one can now specify where ggml-metal.metal file is with en variable GGML_METAL_PATH Marc 2023-08-06 18:20:59 +02:00
  • d43af4b543 Merge branch 'master' into pr-train-mem-usage-improvements xaedes 2023-08-06 17:30:17 +02:00
  • d442888626 Merge branch 'master' into concedo_experimental Concedo 2023-08-06 22:47:33 +08:00
  • 198cc826fc updated lite Concedo 2023-08-06 22:19:18 +08:00
  • 5d52192f73 Remove inactive code. goerch 2023-08-06 13:51:26 +02:00
  • bb6a58d0c3 Simplifying an expression. goerch 2023-08-06 13:35:27 +02:00
  • 19e950f051 Adding support for Aquila (GPT2?) tokenizer. goerch 2023-08-06 13:24:05 +02:00
  • e99416cdfe blasbatchsize Concedo 2023-08-06 17:47:59 +08:00
  • bcfdd0e662 fixed bbs -1 and allow bbs = 2048 Concedo 2023-08-06 17:47:05 +08:00
  • 86c3219895 console : fix issue related to Windows 11 PowerShell console mode persistence (#2521) master-86c3219 DannyDaemonic 2023-08-05 23:49:34 -07:00
  • 2e8265ae17 convert.py : add missing abstract methods for quantized data (#2491) Keiichi Tabata 2023-08-06 15:34:05 +09:00
  • 1b5442923a Fix tokenizer regression in convert.py and improve CPP interface for llama_tokenize goerch 2023-08-06 07:47:55 +02:00
  • d9f75f3ccf Allow passing grammar to completion endpoint Martin Krasser 2023-08-05 14:05:15 +02:00
  • 0480362f12 remove from llama_context_params netrunnereve 2023-08-06 00:44:29 -04:00
  • ce6d86ec41 fix netrunnereve 2023-08-06 00:40:13 -04:00
  • 215e2f21d0 only activate pp_threads for main for now netrunnereve 2023-08-06 00:22:14 -04:00
  • 590feeac1d add printout of pp_threads netrunnereve 2023-08-06 00:13:02 -04:00
  • 30a0e4ccba Fixing function ordering issue goerch 2023-08-06 05:55:14 +02:00
  • 1de711d4f8 builds fine netrunnereve 2023-08-05 23:45:58 -04:00
  • ccd2592782 Add further missing barrier 0cc4m 2023-08-06 05:25:33 +02:00
  • 5f022185a1 test pp_threads netrunnereve 2023-08-05 22:39:44 -04:00
  • f514d1b306 CUDA: faster k-quant mul_mat_q kernels (#2525) master-f514d1b Johannes Gäßler 2023-08-05 18:20:44 +02:00
  • fe6a8f80ff CUDA: faster k-quant mul_mat_q kernels JohannesGaessler 2023-08-02 15:54:53 +02:00
  • b139ca4e94 server: add --numa support Cheng Shao 2023-08-05 12:36:25 +00:00
  • c760fd2452 Fix issue related to Windows 11 PowerShell console mode persistence Danny Daemonic 2023-08-04 21:27:51 -07:00
  • 04b6f2ce20 server : convert prob to percentage + show original value as div title Jhen 2023-08-05 07:11:28 +08:00
  • 3e1e86d89c Merge branch 'master' into server-probs Jhen 2023-08-05 07:10:14 +08:00
  • 332311234a fix firefox autoscroll (#2519) master-3323112 Jonas Wunderlich 2023-08-04 20:16:11 +00:00
  • dff68fd968 fix firefox autoscroll Jonas Wunderlich 2023-08-04 21:54:43 +02:00
  • 182af739c4 server: regenerate completion.js.hpp (#2515) master-182af73 Cebtenzzre 2023-08-04 15:00:57 -04:00
  • 6fc2847aaf simplify object var names Henri Vasserman 2023-08-04 21:43:15 +03:00
  • c1cb4c11be Disable LTO on Windows. Henri Vasserman 2023-08-04 21:19:37 +03:00
  • 555d132d2a server: regenerate completion.js.hpp Cebtenzzre 2023-08-04 12:16:18 -04:00
  • 4329d1acb0 CUDA: use min compute capability of GPUs actually used (#2506) master-4329d1a Cebtenzzre 2023-08-04 11:35:22 -04:00
  • 02f9d96a86 CUDA: check if event is NULL before cudaStreamWaitEvent (#2505) master-02f9d96 Cebtenzzre 2023-08-04 11:34:32 -04:00
  • 3498588e0f Add --simple-io option for subprocesses and break out console.h and cpp (#1558) master-3498588 DannyDaemonic 2023-08-04 08:20:12 -07:00
  • a36255062f zig build fixes Henri Vasserman 2023-08-04 18:16:24 +03:00
  • 18bb0ab127 up ver, support 16k ctx Concedo 2023-08-04 21:47:17 +08:00
  • 5f631c2679 Fixing race condition in server and partial stream handling in frontend. (#2391) master-5f631c2 Stephen Nichols 2023-08-04 06:37:24 -05:00
  • 415e99fec2 Stream save llama context data to file instead of allocating entire buffer upfront (#2488) master-415e99f l3utterfly 2023-08-04 19:29:52 +08:00
  • d6360ade08 Apply code review suggestions l3utterfly 2023-08-04 19:15:22 +08:00
  • e74d42dfff Apply suggestions from code review l3utterfly 2023-08-04 19:14:26 +08:00
  • ff966e7ca6 build : fix several cast and printf warnings (#2499) master-ff966e7 Borislav Stanimirov 2023-08-04 13:07:21 +03:00
  • db5618ad99 cmpnct_gpt2bpe.hpp : comments klosax 2023-08-04 04:57:51 +02:00
  • f0764c6cfb fix indentation, increase server thread count Concedo 2023-08-04 10:29:56 +08:00
  • d09e54aad1 Merge remote-tracking branch 'duncan/api-stream-double-write-fix' into concedo_experimental Concedo 2023-08-04 10:22:53 +08:00
  • 278ada9572 gguf.py : bytesarray for gpt2bpe tokenizer klosax 2023-08-04 04:07:57 +02:00
  • fb0b243705 Makefile : remove gptneox-common klosax 2023-08-04 04:02:10 +02:00
  • 5d98989cf6 gpt2 bpe tokenizer (handles merges and unicode) klosax 2023-08-04 03:58:44 +02:00
  • e6f19ba240 gptneox-main.cpp : gpt2 bpe tokenizer klosax 2023-08-04 03:56:37 +02:00
  • 2922280a1a convert-gptneox-h5-to-gguf.py : gpt2bpe tokenizer klosax 2023-08-04 03:55:23 +02:00
  • 6691aa8797 Delete gptneox-common.h klosax 2023-08-04 03:52:01 +02:00
  • 23abbe8e00 Delete gptneox-common.cpp klosax 2023-08-04 03:51:43 +02:00
  • c1320fd54a CUDA: use min compute capability of GPUs actually used Cebtenzzre 2023-08-03 16:40:34 -04:00
  • c79c66bdf3 CUDA: check if event is NULL before cudaStreamWaitEvent Cebtenzzre 2023-08-03 15:04:24 -04:00
  • d4a126cdb5 removed unused llama-util.h include l3utterfly 2023-08-03 21:03:54 +08:00
  • 142da8f9dc fixed whitepace l3utterfly 2023-08-03 21:03:29 +08:00
  • 30144f7634 restored save load state example l3utterfly 2023-08-03 21:03:15 +08:00
  • cffe923ea3 fixed function declaration order l3utterfly 2023-08-03 20:52:54 +08:00
  • 7c5b2b57b0 - restored breakage of the llama_copy_state_data API - moved new logic for copying llama state data to internal function l3utterfly 2023-08-03 20:47:35 +08:00
  • 63ec711a70 fix: still send full result after streaming duncannah 2023-08-03 14:35:43 +02:00
  • 601eef7f06 Add --simple-io option for subprocesses and break out console.h and cpp Danny Daemonic 2023-05-21 22:35:32 -07:00
  • 86ac49c5c5 build : fix several cast and printf warnings Borislav Stanimirov 2023-08-03 13:40:00 +03:00
  • 1ffa6be726 updated save load state to use public function in llama.cpp l3utterfly 2023-08-03 17:30:13 +08:00
  • 987387859b fixed save load state example l3utterfly 2023-08-03 17:21:55 +08:00
  • 81f347c19d fixed trailing whitespaces l3utterfly 2023-08-03 17:13:12 +08:00
  • 87fcdd971c added comments explaining how copy_state_data works l3utterfly 2023-08-03 17:11:17 +08:00
  • 3cebd6e4b7 generalised copying state data to file or buffer l3utterfly 2023-08-03 17:02:29 +08:00
  • 4709545c06 Merge remote-tracking branch 'duncan/api-stream-double-write-fix' into concedo_experimental Concedo 2023-08-03 12:52:43 +08:00
  • ba2040d1df compile fix for ARM NEON Concedo 2023-08-03 12:52:06 +08:00
  • 3fa6befdaf increase max free blocks Concedo 2023-08-03 10:50:16 +08:00
  • 34e60be41a compile fix Concedo 2023-08-03 10:36:14 +08:00
  • 8183159cf3 examples : generate JSON according to schema (#1887) Evan Jones 2023-08-02 22:05:44 -04:00
  • 034894f590 support integer type and adjust usage text Evan Jones 2023-08-02 21:11:15 -04:00