Commit graph

  • fd905e8be1 tests : fix printfs (#8068) Georgi Gerganov 2024-07-25 18:57:44 +03:00
  • ca87ca98a7 ggml : add and use ggml_cpu_has_llamafile() (#8664) Georgi Gerganov 2024-07-25 12:37:42 +03:00
  • 43d92892c3 examples : remove finetune and train-text-from-scratch (#8669) Xuan Son Nguyen 2024-07-25 10:39:04 +02:00
  • 5a5d5d28f8 docs : Quantum -> Quantized (#8666) Ujjawal Panchal 2024-07-25 13:43:27 +05:30
  • fc5b21bf10 llama: use sliding window for phi3 (#8627) Fan Shupei 2024-07-25 15:21:09 +08:00
  • 65e54b5db4 readme : update games list (#8673) MorganRO8 2024-07-24 12:48:00 -04:00
  • 0aeae29190 Build Llama SYCL Intel with static libs (#8668) Joe Todd 2024-07-24 14:36:00 +01:00
  • 791e3e0b27 readme : update UI list [no ci] (#8505) Thorsten Sommer 2024-07-24 14:52:30 +02:00
  • dc7836c79e llama : fix llama_chat_format_single for mistral (#8657) Xuan Son Nguyen 2024-07-24 13:48:46 +02:00
  • 146da8b6c6 Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (#8667) Joe Todd 2024-07-24 11:55:26 +01:00
  • a5f1f44b6a add llama_lora_adapter_clear (#8653) Xuan Son Nguyen 2024-07-24 11:25:19 +02:00
  • a17bcdfd4c examples : Fix llama-export-lora example (#8607) Xuan Son Nguyen 2024-07-23 23:48:37 +02:00
  • b7e8bada5e server : fix URL.parse in the UI (#8646) Vali Malinoiu 2024-07-23 17:37:42 +03:00
  • e2d7ec46fc sycl : Add support for non-release DPC++ & oneMKL (#8644) Joe Todd 2024-07-23 14:58:37 +01:00
  • 96d14f4b58 llama : move vocab, grammar and sampling into separate files (#8508) Georgi Gerganov 2024-07-23 13:10:17 +03:00
  • 4b93675489 Vulkan IQ4_NL Support (#8613) 0cc4m 2024-07-23 10:56:49 +02:00
  • f54ffa8f04 Allow all RDNA2 archs to use sdot4 intrinsic (#8629) Jeroen Mostert 2024-07-23 10:50:40 +02:00
  • 6676362327 contrib : clarify PR squashing + module names (#8630) Georgi Gerganov 2024-07-23 11:28:38 +03:00
  • a9c03e4827 [SYCL] fix scratch size of softmax (#8642) luoyu-intel 2024-07-23 07:43:28 +00:00
  • 549e1c7e41 llama : fix codeshell support (#8599) Keke Han 2024-07-23 00:43:43 +08:00
  • c70ddd889f llama : add support for SmolLm pre-tokenizer (#8609) Jason Stillerman 2024-07-22 10:43:01 -04:00
  • b4c8a9c8a6 *.py: Stylistic adjustments for python (#8233) Jiří Podivín 2024-07-22 15:44:53 +02:00
  • 525b48a49c llama : allow overrides for tokenizer flags (#8614) Georgi Gerganov 2024-07-22 13:33:22 +03:00
  • 52f90c6d3c tests : re-enable tokenizer tests (#8611) Georgi Gerganov 2024-07-22 13:32:49 +03:00
  • af37c4bd7f llama : add Mistral Nemo inference support (#8604) Douglas Hanley 2024-07-22 03:06:17 -05:00
  • 9ffb78d54f server : update doc to clarify n_keep when there is bos token (#8619) Jan Boon 2024-07-22 16:02:09 +08:00
  • 6220d93595 ggml: fix compile error for RISC-V (#8623) Mark Zhuang 2024-07-22 15:56:45 +08:00
  • 6f19b8c09d examples: fix android example cannot be generated continuously (#8621) devojony 2024-07-22 14:54:42 +08:00
  • 24404ef9f3 flake.lock: Update (#8610) Georgi Gerganov 2024-07-21 16:45:10 +03:00
  • 52a7238985 examples : Rewrite pydantic_models_to_grammar_examples.py (#8493) M-A 2024-07-20 22:09:17 -04:00
  • 82478f1934 gguf-py : fix some metadata name extraction edge cases (#8591) compilade 2024-07-20 21:58:49 -04:00
  • 264c2830d8 convert_hf : fix Gemma v1 conversion (#8597) compilade 2024-07-20 21:53:01 -04:00
  • 6887f5f02a CUDA: MMQ code deduplication + iquant support (#8495) Johannes Gäßler 2024-07-20 22:25:26 +02:00
  • c3ca2aa58e gguf : handle null name during init (#8587) Georgi Gerganov 2024-07-20 17:15:42 +03:00
  • 4b895a57cf llama : add support for Tekken pre-tokenizer (#8579) Michael Coppola 2024-07-20 09:43:51 -04:00
  • 8b6f28ab31 llama.swiftui: fix end of generation bug (#8268) Huifeng Ou 2024-07-20 09:09:37 -04:00
  • 3a4d206f7b gguf_dump.py: fix markdown kv array print (#8588) Brian 2024-07-20 17:35:25 +10:00
  • 384b0cd40e ggml : fix quant dot product with odd number of blocks (#8549) slaren 2024-07-19 17:17:27 +02:00
  • c70cd4fbd1 convert-*.py: remove add_name from ChatGLMModel class (#8590) Brian 2024-07-20 00:04:38 +10:00
  • af831106a4 llama : bump max layers from 256 to 512 (#8530) Georgi Gerganov 2024-07-19 16:50:47 +03:00
  • 093ee371b0 readme : fix server badge Georgi Gerganov 2024-07-19 14:34:55 +03:00
  • 0ed406ebd2 ggml : add friendlier error message to fopen errors (#8575) Clint Herron 2024-07-19 07:05:45 -04:00
  • e5cf8b43ec fix: typo of chatglm4 chat tmpl (#8586) Frank Mai 2024-07-19 17:44:41 +08:00
  • fa91d869cf convert-*.py: add general.name kv override (#8571) Brian 2024-07-19 17:51:51 +10:00
  • a2266e0742 CUDA: fix partial offloading for ne0 % 256 != 0 (#8572) Johannes Gäßler 2024-07-18 23:48:47 +02:00
  • a65ae90d01 cmake : install all ggml public headers (#8480) 65a 2024-07-18 07:47:12 -07:00
  • 72e72d866e server: use relative routes for static files in new UI (#8552) Eric Zhang 2024-07-18 18:43:49 +08:00
  • f8b9db4941 convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499) Brian 2024-07-18 20:40:15 +10:00
  • b6abb66029 server : respect --special cli arg (#8553) RunningLeon 2024-07-18 16:06:22 +08:00
  • 8158c7d3aa lookup: fibonacci hashing, fix crashes (#8548) Johannes Gäßler 2024-07-17 23:35:44 +02:00
  • 6fee9c279f build : Fix docker build warnings (#8535) (#8537) Al Mochkin 2024-07-17 20:21:55 +02:00
  • 0fefb804a2 CONTRIBUTING.md : remove mention of noci (#8541) Brian 2024-07-18 00:57:06 +10:00
  • ab3c069d5d [CANN] Add Ascend NPU backend (#6035) hipudding 2024-07-17 19:23:50 +08:00
  • 8dad318d6f batched: fix n_predict parameter (#8527) Masaya, Kato 2024-07-17 16:34:28 +09:00
  • 050ba94e38 llama : disable context-shift for DeepSeek v2 (#8501) Georgi Gerganov 2024-07-17 10:32:59 +03:00
  • f48ea9f0d2 make/cmake: add missing force MMQ/cuBLAS for HIP (#8515) Johannes Gäßler 2024-07-16 21:20:59 +02:00
  • a6c80d929f gguf-hash : update clib.json to point to original xxhash repo (#8491) Brian 2024-07-16 17:14:16 +10:00
  • 1582cd98bd export-lora : handle help argument (#8497) Steve Bonds 2024-07-16 00:04:45 -07:00
  • 8712a2b112 llama : valign + remove unused ftype (#8502) Georgi Gerganov 2024-07-16 10:00:30 +03:00
  • 5c4bf32bd9 convert_hf : faster lazy safetensors (#8482) compilade 2024-07-15 23:13:10 -04:00
  • cbeb27324f Refactor lora adapter support (#8332) Xuan Son Nguyen 2024-07-15 20:50:47 +02:00
  • 38903b558d fix ci (#8494) Xuan Son Nguyen 2024-07-15 19:23:10 +02:00
  • e128332d8e ggml : suppress unknown pragma 'GCC' on windows (#8460) Daniel Bevenius 2024-07-15 14:48:17 +02:00
  • bc4f1c40be server: update README.md with llama-server --help output [no ci] (#8472) M-A 2024-07-15 08:04:56 -04:00
  • d45222a6ff common : add --no-cont-batching arg (#6358) Georgi Gerganov 2024-07-15 14:54:58 +03:00
  • 57da8d8305 docs: fix links in development docs [no ci] (#8481) NikolaiLyssogor 2024-07-15 04:46:39 -07:00
  • 88db0dc6be [SYCL] add concat through dim 1/2 (#8483) Meng, Hengyu 2024-07-15 19:32:15 +08:00
  • cd40dda983 llama : de-duplicate deepseek2 norm Georgi Gerganov 2024-07-15 14:10:39 +03:00
  • 2d8a56d365 Vulkan MMQ Fix (#8479) 0cc4m 2024-07-15 09:38:52 +02:00
  • 09cc6841fb pydantic : replace uses of __annotations__ with get_type_hints (#8474) compilade 2024-07-14 19:51:21 -04:00
  • ee658a34b5 flake.lock: Update (#8475) Georgi Gerganov 2024-07-14 18:54:02 +03:00
  • ee6b97b9af llama : fix Gemma-2 Query scaling factors (#8473) Georgi Gerganov 2024-07-14 14:05:09 +03:00
  • 0d34939b4f gguf_hash.py: Add sha256 (#8470) Brian 2024-07-14 16:47:14 +10:00
  • 4e3d43f66b llama : fix pre-tokenization of non-special added tokens (#8228) compilade 2024-07-13 23:35:10 -04:00
  • 08bd5616c1 vulkan : cmake integration (#8119) bandoti 2024-07-13 13:12:39 -03:00
  • 2aa671745c metal : template-ify some of the kernels (#8447) Georgi Gerganov 2024-07-13 18:32:33 +03:00
  • 742a95a2c7 ggml : add missing semicolon (#0) Georgi Gerganov 2024-07-27 15:57:09 +03:00
  • e667f09f8b sync : ggml Georgi Gerganov 2024-07-27 15:53:48 +03:00
  • 00ad8646ad ggml : loop tiling optimizations for scalar path (ggml/898) Mahesh Madhav 2024-07-25 00:54:08 -07:00
  • 2419633902 ggml: add support for float16 input tensors in pooling operations (ggml/895) Ivan Filipov 2024-07-22 14:32:02 +03:00
  • bb2c02f0a6 vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (ggml/893) Tony Wasserka 2024-07-20 20:49:44 +02:00
  • 2dac8033af cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (ggml/885) Borislav Stanimirov 2024-07-12 17:24:20 +03:00
  • a0249b269d ggml : remove unnecessary UNUSED macro call (ggml/880) Daniel Bevenius 2024-07-08 12:03:42 +02:00
  • b5e95468b1 llama : add support for llama 3.1 rope scaling factors (#8676) b3472 Jeffrey Morgan 2024-07-27 05:03:45 -07:00
  • 92090eca21 llama : add function for model-based max number of graph nodes (#8622) b3471 Georgi Gerganov 2024-07-27 14:59:29 +03:00
  • 9d03d085dd common : add --no-warmup option for main/llama-cli (#8712) b3470 Daniel Bevenius 2024-07-27 12:45:02 +02:00
  • 2d74714535 llama : disable 405B max_nodes path due to lack of complaints Georgi Gerganov 2024-07-27 13:32:44 +03:00
  • 48856e1251 update comment katsu560 2024-07-27 18:57:47 +09:00
  • 2bad597a5a rename to gguf_add_file.py katsu560 2024-07-27 18:54:45 +09:00
  • bfb4c74981 cann: Fix Multi-NPU execution error (#8710) b3469 wangshuai09 2024-07-27 16:36:44 +08:00
  • 658041d107 Update convert_hf_to_gguf.py Jeffrey Morgan 2024-07-27 00:41:30 -07:00
  • e6d5bed7d3 Update src/llama.cpp Jeffrey Morgan 2024-07-27 00:39:13 -07:00
  • aca0af79cf cann: update comment for ggml_backend_cann_supports_buft wangshuai09 2024-07-27 06:52:40 +00:00
  • 937a12c1bc cann: fix multi-npu exec error wangshuai09 2024-07-26 09:28:17 +00:00
  • e33b5c9837 refactoring: print the name of unsupported op hongruichen 2024-07-27 13:49:49 +08:00
  • 8ab1f15fe3 refactoring: remove internal functions, use op table directly hongruichen 2024-07-27 13:43:07 +08:00
  • e0c9b34016 feat: check if dims equal for add hongruichen 2024-07-27 13:31:57 +08:00
  • 5da73f8085 refactoring: move forward and supports_op into ops file hongruichen 2024-07-27 12:52:59 +08:00
  • 867c91bfaf feat: add error string for QnnOpPackage_Error_t hongruichen 2024-07-27 11:56:21 +08:00
  • ccfec70106 refactoring: remove unused get_rpcmem_from_memhandle func hongruichen 2024-07-27 11:22:29 +08:00