Commit graph

  • 2b5f49a078 add exaone model support Minsoo Cheong 2024-08-14 14:59:57 +09:00
  • 2a24c8caa6 Add Nemotron/Minitron GGUF Conversion & Inference Support (#8922) b3592 Yoshi Suhara 2024-08-15 19:23:33 -07:00
  • e3f6fd56b1 ggml : dynamic ggml_sched_max_splits based on graph_size (#9047) b3591 Nico Bosshard 2024-08-16 04:22:55 +02:00
  • fe40950b85 Fixed and readded debug code for causes Nico Bosshard 2024-08-16 02:07:06 +00:00
  • b8c85df705 Fix inference example lacks required parameters Aisuko 2024-08-16 10:43:20 +10:00
  • f84cf24925 Merge branch 'master' into vulkan Changyeon Kim 2024-08-16 07:29:17 +09:00
  • 4929de0415 gguf-py : bump version from 0.9.1 to 0.10.0 Francis Couture-Harpin 2024-08-15 18:23:38 -04:00
  • 2793b863bb Merge branch 'master' into const-ref-pair Herman Semenov 2024-08-15 21:48:17 +00:00
  • d86f10ad5f ggml : Dynamic ggml_sched_max_splits based on graph_size Nico Bosshard 2024-08-15 18:58:57 +00:00
  • efeccedaf6 llama : suppress conversion from 'size_t' to 'int' Daniel Bevenius 2024-08-15 18:53:04 +01:00
  • 12ab18bba0 [fix] Add missing parameters. Changyeon Kim 2024-08-15 22:10:41 +09:00
  • c5657d574a rpc : prevent crashes on invalid input Radoslav Gerganov 2024-08-15 14:05:19 +03:00
  • ef243b2e7d rpc : print error message when failed to connect endpoint Radoslav Gerganov 2024-08-15 14:50:00 +03:00
  • 4b9afbbe90 retrieval : fix memory leak in retrieval query handling (#8955) b3590 gtygo 2024-08-15 15:40:12 +08:00
  • 37501d9c79 server : fix duplicated n_predict key in the generation_settings (#8994) b3589 Riceball LEE 2024-08-15 15:28:05 +08:00
  • 4af8420afb common : remove duplicate function llama_should_add_bos_token (#8778) b3588 Zhenwei Jin 2024-08-15 15:23:23 +08:00
  • 6bda7ce6c3 llama : add pre-tokenizer regexes for BLOOM and gpt3-finnish (#8850) b3587 Esko Toivonen 2024-08-15 10:17:12 +03:00
  • d5492f0525 ci : disable bench workflow (#9010) Georgi Gerganov 2024-08-15 10:11:11 +03:00
  • 4adb77f7bc Merge branch 'ggerganov:master' into master Christopher 2024-08-15 14:28:08 +08:00
  • 234b30676a server : init stop and error fields of the result struct (#9026) b3585 Jiří Podivín 2024-08-15 08:21:57 +02:00
  • 7d261a9f96 add warning code when quantizing to Q4_0, Q4_1, Q5_0, or Q5_1 chentyjpm 2024-08-15 14:20:07 +08:00
  • 702e1995a1 Merge branch 'master' into compilade/batch-splits Francis Couture-Harpin 2024-08-14 20:46:28 -04:00
  • eeccd31a9c Merge branch 'master' into pr/8836 Nexesenex 2024-08-15 02:30:10 +02:00
  • 72ebbfc653 Merge branch 'ggerganov:master' into hk Henry Kroll III 2024-08-14 14:25:55 -08:00
  • 5fd89a70ea Vulkan Optimizations and Fixes (#8959) b3584 0cc4m 2024-08-14 18:32:53 +02:00
  • 12d214f4d8 Remove trailing whitespaces 0cc4m 2024-08-14 16:21:52 +02:00
  • 81fca39112 Removing flake8-no-print plugin, due to dependency conflict Jiri Podivin 2024-08-14 14:14:02 +02:00
  • 252443914e Setting stop and error fields of the result struct Jiri Podivin 2024-08-14 13:55:21 +02:00
  • 62d7b6c87f cuda : re-add q4_0 gg/hf-test Georgi Gerganov 2024-08-14 13:37:03 +03:00
  • 38f4863a24 Abstract into GGML Alan Gray 2024-08-14 01:38:00 -07:00
  • d38a928751 trigger ci nopperl 2024-08-14 10:24:04 +02:00
  • 503983a69a cuda : build only necessary templates Georgi Gerganov 2024-08-14 10:29:23 +03:00
  • ae41fd2e65 make : force CPU extensions [no ci] Georgi Gerganov 2024-08-13 16:59:12 +03:00
  • 98a532d474 server : fix segfault on long system prompt (#8987) b3583 compilade 2024-08-14 02:51:02 -04:00
  • 57b79fda88 add hf2gguf conv format of q4_0 q4_1 q5_0 q5_1 chentyjpm 2024-08-14 14:34:47 +08:00
  • 440b00ab94 Merge branch 'master' into chameleon nopperl 2024-08-14 08:15:04 +02:00
  • 43bdd3ce18 cmake : remove unused option GGML_CURL (#9011) b3582 Georgi Gerganov 2024-08-14 09:14:49 +03:00
  • 0645adc8b3 Cover corner case for role_scaling not in config.json Yoshi Suhara 2024-08-13 21:25:43 -07:00
  • 93ec58b932 server : fix typo in comment compilade/fix-server-long-system-prompt Francis Couture-Harpin 2024-08-13 22:12:26 -04:00
  • af2f84c964 Merge branch 'master' into compilade/fix-server-long-system-prompt Francis Couture-Harpin 2024-08-13 22:06:11 -04:00
  • c1b738ef43 server : fix parallel generation with very small batch sizes Francis Couture-Harpin 2024-08-13 22:03:57 -04:00
  • 35cc5567c8 ggml-quants : deduplicate TQ1_0 and TQ2_0 __ARM_FEATURE_DOTPROD support Francis Couture-Harpin 2024-08-13 18:00:06 -04:00
  • e4bb91b02e Use for bias tensors Yoshi Suhara 2024-08-13 14:45:03 -07:00
  • 82b240406d Merge branch 'master' into compilade/bitnet-ternary Francis Couture-Harpin 2024-08-13 17:36:09 -04:00
  • 69f772682e ggml-quants : allow using ARM dot product instructions for TQ1_0 Francis Couture-Harpin 2024-08-13 17:21:19 -04:00
  • 895004f3f8 convert : allow direct conversion to TQ1_0 and TQ2_0 Francis Couture-Harpin 2024-08-13 17:17:43 -04:00
  • db78320b4d Fix compiler complaints jaime-m-p 2024-08-13 21:19:18 +02:00
  • 06943a69f6 ggml : move rope type enum to ggml.h (#8949) b3581 Daniel Bevenius 2024-08-13 21:13:15 +02:00
  • 368eea3a5b remove extra line slaren 2024-08-13 21:05:58 +02:00
  • b67c81d1fa Fix previous commit jaime-m-p 2024-08-13 20:25:45 +02:00
  • dcac74792b Using 32bit wchar_t by default, uint32_t on Windows jaime-m-p 2024-08-13 19:58:36 +02:00
  • 50e1b1e36d Remove unused function jaime-m-p 2024-08-13 19:55:12 +02:00
  • 7ff916eae8 Original regex for 'tekken' jaime-m-p 2024-08-13 17:39:41 +02:00
  • 5a93d2ec50 Reimplement unicode_regex_split(): jaime-m-p 2024-08-13 17:38:46 +02:00
  • b565148cb4 Update codepoint_categ: jaime-m-p 2024-08-13 16:42:33 +02:00
  • 312c4322cc Remove invalid assert jaime-m-p 2024-08-13 16:30:30 +02:00
  • 828d6ff7d7 export-lora : throw error if lora is quantized (#9002) b3580 Xuan Son Nguyen 2024-08-13 11:41:14 +02:00
  • cc2b62fef2 src: remove duplicate function llama_should_add_bos_token zhenweijin 2024-07-31 00:18:37 +08:00
  • 3b23ea74e2 Use ring buffer to store prev in sampling zhenweijin 2024-08-06 18:01:51 +08:00
  • 1ca3f06a54 fix type-check caitianchi 2024-08-13 09:52:52 +08:00
  • 33a5c8e37c llama : prepare next graph while the current one is being evaluated slaren 2024-08-13 02:39:52 +02:00
  • bd76198618 Remove mutable variable Yoshi Suhara 2024-08-10 22:28:16 -07:00
  • ae86b5e3d9 Replace ggml_mul_mat()->llm_build_lora_mm() Yoshi Suhara 2024-08-10 22:17:55 -07:00
  • 6f369f3ffa Address comments by @compilade Yoshi Suhara 2024-08-09 02:11:43 -07:00
  • 092382fee3 Update src/llama.cpp Yoshi Suhara 2024-08-09 02:06:04 -07:00
  • b841554d0c Update convert_hf_to_gguf.py Yoshi Suhara 2024-08-09 02:05:50 -07:00
  • 147cdf641a Remove unnecessary write_tensors() Yoshi Suhara 2024-08-08 09:38:27 -07:00
  • 45e9d164ac Fix formatting issues Yoshi Suhara 2024-08-08 09:18:59 -07:00
  • aa2f4a79fe Add nemotron GGUF conversion & inference support Yoshi Suhara 2024-08-07 23:46:12 -07:00
  • 8c9017bfbe Simplify IQ4_XSR Nexesenex 2024-08-12 22:20:02 +02:00
  • 8c10533409 Merge branch 'master' into pr/8836 Nexesenex 2024-08-12 20:28:38 +02:00
  • cd92ba612f IQ4_XSR (test FTYPE) and attention_wv logic for all attn_*.weights Nexesenex 2024-08-12 19:45:46 +02:00
  • f98191b01e Merge 201559d177 into fc4ca27b25 Joan Fontanals 2024-08-13 01:46:26 +08:00
  • 48607c7a77 cont : fix save-load-state RNG seeding Georgi Gerganov 2024-08-12 19:44:44 +03:00
  • efd85365a0 cmake : remove unused option Georgi Gerganov 2024-08-12 19:39:14 +03:00
  • 65f460dbe3 ci : disable bench workflow Georgi Gerganov 2024-08-12 19:29:48 +03:00
  • fc4ca27b25 ci : fix github workflow vulnerable to script injection (#9008) b3579 Diogo Teles Sant'Anna 2024-08-12 13:28:23 -03:00
  • 6174762877 cont : store params in llama_sampling implementation Georgi Gerganov 2024-08-12 19:24:12 +03:00
  • 1f67436c5e ci : enable RPC in all of the released builds (#9006) b3578 Radoslav Gerganov 2024-08-12 19:17:03 +03:00
  • a3d48e448a Simplify and improve CUDA graphs through use of indirect copy pointers Alan Gray 2024-08-01 03:35:05 -07:00
  • 71a16147e3 fix: github workflow vulnerable to script injection Diogo Teles Sant'Anna 2024-08-08 18:06:28 +00:00
  • 9287fb6ad9 Fix a spelling mistake Liu Jia 2024-08-12 23:44:20 +08:00
  • 5d755b8e5c Merge 346f64f0d8 into 0fd93cdef5 David Heidelberg 2024-08-12 17:17:36 +02:00
  • 0fd93cdef5 llama : model-based max number of graph nodes calculation (#8970) b3577 Nico Bosshard 2024-08-12 17:13:59 +02:00
  • d352b01af9 cont : llama_sampling_init() use llama_sampling_params Georgi Gerganov 2024-08-12 17:14:56 +03:00
  • 714007088e ci : enable RPC in all of the released builds Radoslav Gerganov 2024-08-12 16:19:56 +03:00
  • e6e018dafe Fix-up the missing initial parameter to resolve the compilation warning. Changyeon Kim 2024-08-12 22:19:10 +09:00
  • 47eb0a5530 fix num in convert caitianchi 2024-08-12 21:16:01 +08:00
  • f30c5e1123 fix convert caitianchi 2024-08-12 21:14:56 +08:00
  • 1123376309 fix convert script and readme caitianchi 2024-08-12 21:11:25 +08:00
  • ef693e99d7 Add the new data types across files Srihari-mcw 2024-08-12 05:58:11 -07:00
  • f7ce132258 Add changes to fix compiler issues Srihari-mcw 2024-07-14 23:19:28 -07:00
  • db6657eeaf Fix more conflicts in quantize.cpp Srihari-mcw 2024-07-30 09:46:22 -07:00
  • 5a6a235ac7 Fix build issues in sgemm.cpp post rebase Srihari-mcw 2024-07-07 19:29:04 -07:00
  • c480818d97 Fix issues with SSE3 version for vec_dot_q4_0_b16_q8_0_b16 Srihari-mcw 2024-05-24 10:33:54 -07:00
  • 9e5174ce5d Remove additional ifdef conditions Srihari-mcw 2024-05-24 07:23:03 -07:00
  • 983b03ab6a Add additional comments Srihari-mcw 2024-05-23 08:49:41 -07:00
  • e26fd70dce Introduce Q4_0 and Q8_0 quantizations with BF16 delta values Srihari-mcw 2024-08-12 05:54:21 -07:00
  • c2e2cb99a6 Update src/llama.cpp slaren 2024-08-12 14:51:07 +02:00
  • 84eb2f4fad docs: introduce gpustack and gguf-parser (#8873) b3576 Frank Mai 2024-08-12 20:45:50 +08:00