Commit graph

  • 1bc4dc5c15 Bump IQ3_M Nexesenex 2024-08-09 22:49:42 +02:00
  • b72942fac9 Merge commit from fork b3561 Georgi Gerganov 2024-08-09 23:03:21 +03:00
  • 788b4d199e common : call llama_decode() during warmup only if the model has decoder Stanisław Szymczyk 2024-08-09 21:58:13 +02:00
  • 9e0ac9895c Fix float32 concat f16 shader validation error 0cc4m 2024-08-09 21:47:02 +02:00
  • 5292fdb41e Fix memory leak in src/llama.cpp Micah Talkiewicz 2024-08-09 15:40:13 -04:00
  • 94597ecfe2 llama : whitespace formatting Stanisław Szymczyk 2024-08-09 21:32:10 +02:00
  • efe6aca5eb Rework and fix Vulkan descriptor set and descriptor pool handling 0cc4m 2024-08-09 21:26:25 +02:00
  • 02916f6de6 Add support for encoder-only T5 models (#8900) Stanisław Szymczyk 2024-08-09 21:10:44 +02:00
  • 88105b7f12 Reuse querybatch to reduce frequent memory allocation gtygo 2024-08-10 01:44:31 +08:00
  • fe6dc61143 retrieval gtygo 2024-08-10 01:11:51 +08:00
  • 6afd1a99dc llama : add support for lora adapters in T5 model (#8938) b3560 fairydreaming 2024-08-09 18:53:09 +02:00
  • b27f87d6da token healing : fix rebase bug mare5x 2024-07-08 16:18:19 +02:00
  • 940ab81784 readme : list possible token healing values mare5x 2024-07-08 15:53:23 +02:00
  • b317368191 token healing : change argument order mare5x 2024-07-01 12:23:21 +02:00
  • ea4abc9d82 token healing : refactor argument parsing mare5x 2024-07-01 11:51:39 +02:00
  • 3ba5c55bc4 server : token healing for infilling/FIM mare5x 2024-06-30 22:30:15 +02:00
  • d5eea13797 server : add token healing support mare5x 2024-06-26 17:12:57 +02:00
  • fc8773d309 token healing : handle more special tokens mare5x 2024-06-30 20:14:18 +02:00
  • 414fc13248 token healing : refactor to return struct mare5x 2024-06-29 13:42:00 +02:00
  • db9c018891 token healing : change dynamic rollback mare5x 2024-06-29 13:02:30 +02:00
  • 13885c747e main : add token healing mare5x 2024-06-27 16:08:24 +02:00
  • 272e3bd95e make : fix llava obj file race (#8946) b3559 Georgi Gerganov 2024-08-09 18:24:30 +03:00
  • 45a55b91aa llama : better replace_all (cont) (#8926) Georgi Gerganov 2024-08-09 18:23:52 +03:00
  • f547b52f2a Fix little mistakes Nexesenex 2024-08-09 17:09:50 +02:00
  • 23198ce844 Create a Custom Quantization Scheme (CQS) FTYPE Nexesenex 2024-08-09 16:53:23 +02:00
  • 886f44de69 gguf-py: fix examples/reader.py tarilabs 2024-08-09 16:29:45 +02:00
  • e9aad96b0b llama : add missing lora adapters in T5 model Stanisław Szymczyk 2024-08-09 16:03:47 +02:00
  • 14b549c708 ggml : move rope type enum to ggml.h Daniel Bevenius 2024-08-09 15:35:39 +02:00
  • bd575f01de Revert tensors quantization tree edits Nexesenex 2024-08-09 14:25:08 +02:00
  • 71a773522e make : fix llava obj file race Georgi Gerganov 2024-08-09 14:57:20 +03:00
  • 0596a99f09 minor : add struct members for clarity Georgi Gerganov 2024-08-09 14:36:58 +03:00
  • 3071c0a5f2 llava : support MiniCPM-V-2.5 (#7599) b3557 tc-mb 2024-08-09 18:33:53 +08:00
  • 069631ee2d try fix 1 caitianchi 2024-08-09 16:39:34 +08:00
  • af3cba04d5 Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. Markus Tavenrath 2024-08-06 12:17:25 +02:00
  • 4305b57c80 sync : ggml b3556 Georgi Gerganov 2024-08-09 10:03:48 +03:00
  • 70c0ea3560 whisper : use vulkan as gpu backend when available (whisper/2302) Matt Stephenson 2024-07-16 03:21:09 -04:00
  • 82755ed08a fix some compilation warning mingfeima 2024-08-08 23:33:32 -07:00
  • 5b2c04f492 embedding : add --pooling option to README.md [no ci] (#8934) Daniel Bevenius 2024-08-09 08:33:30 +02:00
  • 6f6496bb09 llama : fix typo in llama_tensor_get_type comment [no ci] (#8937) Daniel Bevenius 2024-08-09 08:32:23 +02:00
  • daef3ab233 server : add one level list nesting for embeddings (#8936) Mathieu Geli 2024-08-09 08:32:02 +02:00
  • 1260a2d8b3 code : deduplicate replace_all Georgi Gerganov 2024-08-09 08:59:16 +03:00
  • 345a686d82 llama : reduce useless copies when saving session (#8916) b3551 compilade 2024-08-08 23:54:00 -04:00
  • 5a9edda7ca gguf-py : Numpy dequantization for most types Francis Couture-Harpin 2024-08-08 23:11:42 -04:00
  • 78e44f080e Merge 0e51cc38cb into 3a14e00366 cosmo 2024-08-08 21:43:38 -04:00
  • 0aa883b6bf readme: introduce gguf-parser thxCode 2024-08-05 21:15:00 +08:00
  • 514fb76105 readme: introduce gpustack thxCode 2024-08-05 21:09:15 +08:00
  • 6481eb4cea Merge branch 'ggerganov-master' Aliebc 2024-08-08 13:57:10 -04:00
  • 2dab5c02f5 Add YX UI for llama-server Aliebc 2024-06-15 17:50:00 +08:00
  • 1b748743c2 Add YX simple filter for llama-server Aliebc 2024-06-15 10:45:01 +08:00
  • 3a14e00366 gguf-py : simplify support for quant types (#8838) compilade 2024-08-08 13:33:09 -04:00
  • 1118c046df correct mistake in conditionality for attn.k Nexes the Old 2024-08-08 18:56:20 +02:00
  • 8006b15fd1 Avoid to shrink attn.k.weight for IQ3_XS and XXS when GQA or MOE Nexes the Old 2024-08-08 18:50:48 +02:00
  • a22bfc6b64 llama : fix typo in llama_tensor_get_type comment Daniel Bevenius 2024-08-08 18:02:10 +02:00
  • 153f5efedc Add one level list nesting for embeddings Mathieu GELI 2024-08-08 17:24:43 +02:00
  • 60257c0c6a embedding : add --pooling option to README.md Daniel Bevenius 2024-08-08 15:44:23 +02:00
  • e69b098133 Merge 40d169874d into afd27f01fe Robert Sinclair 2024-08-08 08:28:15 -04:00
  • c8aeea500f fix Xuan Son Nguyen 2024-08-08 14:21:40 +02:00
  • fde165e968 default n_swa for phi-3 Xuan Son Nguyen 2024-08-08 14:02:14 +02:00
  • afd27f01fe scripts : sync cann files (#0) Georgi Gerganov 2024-08-08 14:56:52 +03:00
  • 366d486c16 scripts : fix sync filenames (#0) Georgi Gerganov 2024-08-08 14:40:12 +03:00
  • a29a2af943 Add error checking to return default value Jia Liu 2024-08-08 19:17:47 +08:00
  • e44a561ab0 sync : ggml b3547 Georgi Gerganov 2024-08-08 13:19:47 +03:00
  • f93d49ab1e ggml : ignore more msvc warnings (ggml/906) Borislav Stanimirov 2024-08-07 10:00:56 +03:00
  • 5b33ea1ee7 metal : fix struct name (ggml/912) Georgi Gerganov 2024-08-07 09:57:00 +03:00
  • 85fca8deb6 metal : add abort callback (ggml/905) Conrad Kramer 2024-08-07 02:55:49 -04:00
  • 59952cba13 llama : better replace_all (cont) Georgi Gerganov 2024-08-08 12:36:09 +03:00
  • 924c832461 Added perplexity metrics for llama 3.1 with different quantization settings fedric95 2024-08-08 10:55:33 +02:00
  • ebd541a570 make : clean llamafile objects (#8923) b3543 Pablo Duboue 2024-08-08 04:44:51 -04:00
  • 3380da4129 Makefile was missing an object file on clean Pablo Duboue 2024-08-08 01:39:29 -07:00
  • 02ec917284 add new macros to avoid windows+mingw64 Jia Liu 2024-08-08 16:13:20 +08:00
  • 0421009d0e update README mingfeima 2024-07-24 01:17:37 -07:00
  • 70f469a7e3 update CMakeList mingfeima 2024-07-24 01:16:06 -07:00
  • 3ff0c0e16f add amx kernel for gemm mingfeima 2024-04-06 19:57:25 -07:00
  • 190898a63f Merge pull request #30 from wwoodsTM/test-dry-sampler l3utterfly 2024-08-08 14:09:39 +09:00
  • 2d14c81931 try fix clip caitianchi 2024-08-08 11:08:00 +08:00
  • 4c796180fd add model suppoerts jiahao su 2024-08-08 10:29:27 +08:00
  • c240638374 Reimplement unicode_regex_split() jaime-m-p 2024-08-08 01:35:20 +02:00
  • 7afe6df6a2 Unicode data whitespaces as ranges jaime-m-p 2024-08-07 23:14:36 +02:00
  • fc4ed23673 correct a third party typo Nexes the Old 2024-08-07 23:09:52 +02:00
  • 80f41234e4 Update bruteforce test: fix binary search jaime-m-p 2024-08-07 23:08:04 +02:00
  • 60d11d0107 trailing whitespaces Nexes the Old 2024-08-07 22:42:29 +02:00
  • 259c5f3a92 correct ident and trailing whitespaces Nexes the Old 2024-08-07 22:41:05 +02:00
  • 867e3523f9 trailing whitespace Nexes the Old 2024-08-07 22:39:39 +02:00
  • 28a41e7bdd Merge branch 'master' into lcpp_pr_specific_quants Nexes the Old 2024-08-07 22:13:55 +02:00
  • 4a95bd5d7d Quantize: specify each major tensor quant in CLI for common LLMs Nexesenex 2024-08-07 22:08:46 +02:00
  • 9329953a61 llama : avoid double tensor copy when saving session to buffer compilade/faster-session-sizes Francis Couture-Harpin 2024-08-07 16:03:17 -04:00
  • dca7ad8627 llama : avoid useless copies in dummy session writer Francis Couture-Harpin 2024-08-07 15:42:11 -04:00
  • 96b3d411e0 ggml-quants : allow using vdotq_s32 in TQ2_0 vec_dot Francis Couture-Harpin 2024-08-07 15:04:13 -04:00
  • 47001c439c README: add llama.sh to the available UIs Michael Coppola 2024-08-07 12:29:50 -04:00
  • 15fa07a5c5 make : use C compiler to build metal embed object (#8899) b3542 slaren 2024-08-07 18:24:05 +02:00
  • 7764ab911d update guide update_sycl_doc Neo Zhang 2024-08-07 22:01:02 +08:00
  • 616f3ea532 fix ubuntu-make error caitianchi 2024-08-07 21:50:29 +08:00
  • cdf3a251fa Add loop unrolled 4xN and MX4 dimension GEMM functions with parallel delta multiplication Srihari-mcw 2024-08-07 06:15:09 -07:00
  • 8de399a3be use rm + rmdir to avoid -r flag in rm slaren 2024-08-07 15:14:25 +02:00
  • 712fd7cd89 fix makefile error caitianchi 2024-08-07 20:30:49 +08:00
  • 0eb0bfaa91 fix Type-Check error caitianchi 2024-08-07 20:16:06 +08:00
  • e3eff2aea6 fix Type-Check error caitianchi 2024-08-07 20:11:50 +08:00
  • 28230d0b13 fix Type-Check error caitianchi 2024-08-07 20:07:27 +08:00
  • 5ab95776f7 fix Type-Check error caitianchi 2024-08-07 19:50:10 +08:00
  • be55695eff ggml-backend : fix async copy from CPU (#8897) b3541 slaren 2024-08-07 13:29:02 +02:00