Commit graph

  • bd2d24aa0d halfway Yutong Dai 2024-07-24 07:12:58 +00:00
  • 19d6ad9db7 [conver_hf_to_gguf.py] add phi3 sliding window Shupei Fan 2024-07-24 15:08:01 +08:00
  • 013a08f725 clean up and simplify XLMRoberta conversion Douglas Hanley 2024-07-24 01:50:16 -05:00
  • f3db6d7553 fix format Meng, Hengyu 2024-07-24 05:38:30 +00:00
  • 22c72c5a0f fix intel mkl Chen Xi 2024-07-24 04:05:08 +00:00
  • e4b86a1295 fix the perf issue of multi-device Chen Xi 2024-07-23 06:50:13 +00:00
  • 433f7aa287 add xgen-mm surgery Yutong Dai 2024-07-23 23:21:23 +00:00
  • de280085e7 examples : Fix llama-export-lora example (#8607) b3449 Xuan Son Nguyen 2024-07-23 23:48:37 +02:00
  • 2538411955 fix typo ngxson 2024-07-23 21:48:11 +02:00
  • f70331ebd2 fix llama_chat_format_single for mistral ngxson 2024-07-23 21:43:07 +02:00
  • 78e0d8d5c7 typo ngxson 2024-07-23 20:58:59 +02:00
  • 4bd55ec02c better check ngxson 2024-07-23 20:48:44 +02:00
  • 1c7996a17c add conversion for bge-m3; small fix in unigram tokenizer Douglas Hanley 2024-07-08 13:31:23 -05:00
  • 0ec2b5818a reject merging subset ngxson 2024-07-23 20:13:01 +02:00
  • 1f965737f0 add more logging ngxson 2024-07-23 20:01:23 +02:00
  • b41cbd08f3 add llama_lora_adapter_clear ngxson 2024-07-23 19:39:06 +02:00
  • 9c0a61f8c3 Merge branch 'master' into compilade/batch-splits Francis Couture-Harpin 2024-07-23 13:37:09 -04:00
  • b6c9b539c9 Updated Swift and Android bindings to use the new llama_sampling_* refactor from #8643 HanClinto 2024-07-23 12:57:09 -04:00
  • b841d07408 server : fix URL.parse in the UI (#8646) Vali Malinoiu 2024-07-23 17:37:42 +03:00
  • dbf85440c7 llama : use struct llama_sampling in the sampling API Georgi Gerganov 2024-07-23 17:35:28 +03:00
  • 8c9784c65d lookup: single sequence -> tree of sequences Johannes Gäßler 2024-07-20 13:38:59 +02:00
  • 64cf50a0ed sycl : Add support for non-release DPC++ & oneMKL (#8644) b3447 Joe Todd 2024-07-23 14:58:37 +01:00
  • 076548d0a3 fix llama-server ui: Replace URL.parse Method in llama-server ui Vali Malinoiu 2024-07-23 15:24:49 +03:00
  • c0430176e8 Fix trailing whitespace Joe Todd 2024-07-23 13:14:05 +01:00
  • 92d66cd8ca Only get MKL library when needed Joe Todd 2024-07-23 12:58:46 +01:00
  • 60d47894f3 convert-*.py: add tensor hash general.hash.sha256 to kv store brian khuu 2024-07-23 21:51:01 +10:00
  • f2bcdb3806 Remove unused find_package(oneMKL) Joe Todd 2024-07-23 12:43:59 +01:00
  • f4a9303e49 MUSA adds support for __vsubss4 Xiaodong Ye 2024-07-23 19:37:38 +08:00
  • f866cb9342 llama : move sampling rngs from common to llama Georgi Gerganov 2024-07-23 10:23:44 +03:00
  • 938943cdbf llama : move vocab, grammar and sampling into separate files (#8508) Georgi Gerganov 2024-07-23 13:10:17 +03:00
  • 8515cfa822 Merge 3db5058dd3 into 751fcfc6c3 jaime-m-p 2024-07-23 11:55:29 +02:00
  • 751fcfc6c3 Vulkan IQ4_NL Support (#8613) b3445 0cc4m 2024-07-23 10:56:49 +02:00
  • 46e47417aa Allow all RDNA2 archs to use sdot4 intrinsic (#8629) Jeroen Mostert 2024-07-23 10:50:40 +02:00
  • 14dbd9273f Update docs/build.md Andreas (Andi) Kunar 2024-07-23 10:43:09 +02:00
  • e7e6487ba0 contrib : clarify PR squashing + module names (#8630) Georgi Gerganov 2024-07-23 11:28:38 +03:00
  • 063d99ad11 [SYCL] fix scratch size of softmax (#8642) b3442 luoyu-intel 2024-07-23 07:43:28 +00:00
  • 6fd0937e9f remove the extern "C", MINICPMV_API caitianchi 2024-07-23 15:25:32 +08:00
  • fcde997126 remove load_image_size into clip_ctx caitianchi 2024-07-23 15:24:43 +08:00
  • 3642be9937 fix KEY_HAS_MINICPMV_PROJ caitianchi 2024-07-23 14:55:55 +08:00
  • fe28a7b9d8 llama : clean-up gg/llama-reorganize Georgi Gerganov 2024-07-23 08:38:50 +03:00
  • 4a4f074aa1 fix scratch size of softmax luoyu-intel 2024-07-23 13:33:28 +08:00
  • dad4abe1bc add warn caitianchi 2024-07-23 11:57:42 +08:00
  • 62fa15bcd2 fix cmakefile caitianchi 2024-07-23 11:52:34 +08:00
  • 60e36747ca XLMRoberta support Oliver Ye 2024-07-22 17:26:11 -07:00
  • 0e2a0d4d09 add alias for lora adaptors zhhan 2024-07-22 16:15:46 -07:00
  • 8ea2402195 Merge branch 'master' of https://github.com/ggerganov/llama.cpp into hk themanyone 2024-07-22 13:56:52 -08:00
  • d8f782acc3 format batch image output according to --template themanyone 2024-07-21 16:35:31 -08:00
  • 7e492b3e0e Python (Pre-compiled CFFI module for CPU and CUDA) Marko Tasic 2024-07-22 22:52:37 +02:00
  • b4e3de6b17 fix typo, "data_swa" -> "data" Fan Shupei 2024-07-23 01:09:32 +08:00
  • dae3cae841 llama : suffix the internal APIs with "_impl" Georgi Gerganov 2024-07-22 19:59:00 +03:00
  • 39fbaf9f50 llama : redirect external API to internal APIs Georgi Gerganov 2024-07-19 16:56:20 +03:00
  • 66ac80f5b9 make : update llama.cpp deps [no ci] Georgi Gerganov 2024-07-19 16:25:53 +03:00
  • 8fef5b1897 llama : move tokenizers into llama-vocab Georgi Gerganov 2024-07-19 15:44:30 +03:00
  • e7dffa6bc7 llama : deprecate llama_sample_grammar Georgi Gerganov 2024-07-19 14:43:56 +03:00
  • 689d377916 cont Georgi Gerganov 2024-07-19 14:21:33 +03:00
  • b4b242e6bd cont : pre-fetch rules Georgi Gerganov 2024-07-16 23:12:29 +03:00
  • 5a71d1aefd cont Georgi Gerganov 2024-07-16 23:01:20 +03:00
  • 675f305f31 llama : move grammar code into llama-grammar Georgi Gerganov 2024-07-16 16:09:08 +03:00
  • 0ddc8e361c llama : move sampling code into llama-sampling Georgi Gerganov 2024-07-19 18:15:36 +03:00
  • 081fe431aa llama : fix codeshell support (#8599) b3441 Keke Han 2024-07-23 00:43:43 +08:00
  • a2ae810a38 llama : move codeshell after smollm below to respect the enum order hankeke303 2024-07-23 00:40:21 +08:00
  • 706793f078 fix: back to qnn tensor v1 to fix the create tensor error hongruichen 2024-07-22 21:34:33 +08:00
  • 3b47056c97 refactoring: change the tensor binding mode between qnn tensor and ggml tensor hongruichen 2024-07-22 12:45:26 +08:00
  • b2d898059a Remove unneeded include Joe Todd 2024-07-22 16:02:18 +01:00
  • f15ea2c928 Merge branch 'master' into merge-to-upstream-v2 Keke Han 2024-07-22 22:58:54 +08:00
  • d94c6e0ccb llama : add support for SmolLm pre-tokenizer (#8609) b3440 Jason Stillerman 2024-07-22 10:43:01 -04:00
  • 4c755832fe remove in line 33 directory in the /cmakelists.txt (not in example, in the main dir caitianchi 2024-07-22 21:44:56 +08:00
  • 566daa5a5b *.py: Stylistic adjustments for python (#8233) Jiří Podivín 2024-07-22 15:44:53 +02:00
  • be8b5b2f8d fix code review caitianchi 2024-07-22 21:34:21 +08:00
  • 98ea5e704c fix lint nopperl 2024-07-22 15:17:40 +02:00
  • fa99dc27c9 Merge branch 'master' into chameleon nopperl 2024-07-22 15:14:13 +02:00
  • 562a0c9612 contrib : fix typo + add list of modules Georgi Gerganov 2024-07-22 16:12:49 +03:00
  • 71527c9a8d contrib : clarify PR squashing Georgi Gerganov 2024-07-22 14:36:12 +03:00
  • 237b818354 Update cmake to support nvidia hardware & open-source compiler Joe Todd 2024-07-22 13:58:43 +01:00
  • 5cefb2b286 Stylistic adjustments for python Jiri Podivin 2024-07-01 11:53:47 +02:00
  • 0ee896d149 fix punctuation regex in chameleon pre-tokenizer (@compilade) nopperl 2024-07-22 11:47:35 +00:00
  • 1e1e78a324 Update src/llama.cpp nopperl 2024-07-22 11:46:18 +00:00
  • 05f138551f add comment regarding special token regex in chameleon pre-tokenizer nopperl 2024-07-22 13:44:24 +02:00
  • 6e0ded3637 move swin_norm in gguf writer nopperl 2024-07-22 13:22:21 +02:00
  • 6f11a83e4e llama : allow overrides for tokenizer flags (#8614) b3438 Georgi Gerganov 2024-07-22 13:33:22 +03:00
  • e093dd2382 tests : re-enable tokenizer tests (#8611) b3437 Georgi Gerganov 2024-07-22 13:32:49 +03:00
  • 3d0f1ede06 Allow all RDNA2 archs to use sdot4 intrinsic Jeroen Mostert 2024-07-22 12:11:30 +02:00
  • 03cb5cda6d use sliding window for phi3 Shupei Fan 2024-07-22 17:06:45 +08:00
  • 50e05353e8 llama : add Mistral Nemo inference support (#8604) b3436 Douglas Hanley 2024-07-22 03:06:17 -05:00
  • 628154492a server : update doc to clarify n_keep when there is bos token (#8619) Jan Boon 2024-07-22 16:02:09 +08:00
  • 04bab6b7da ggml: fix compile error for RISC-V (#8623) b3434 Mark Zhuang 2024-07-22 15:56:45 +08:00
  • 0250f3da0f Merge e7416df4bb into b7c11d36e6 Iaroslav Chelombitko 2024-07-22 15:55:43 +08:00
  • 2cf997fc97 ggml: fix compile error for RISC-V Mark Zhuang 2024-07-22 15:00:20 +08:00
  • b7c11d36e6 examples: fix android example cannot be generated continuously (#8621) b3433 devojony 2024-07-22 14:54:42 +08:00
  • 7e27c17572 llama : model-based max number of graph nodes Georgi Gerganov 2024-07-22 09:50:57 +03:00
  • d4c257d1f9 fix: Android example cannot be generated continuously devojony 2024-07-22 13:20:06 +08:00
  • 6f5334136c server : update doc to clarify n_keep count when there is a begin of string token Jan Boon 2024-07-22 04:28:45 +08:00
  • 525e78936a removed .inp and out .out ggufs Jason Stillerman 2024-07-21 11:54:56 -04:00
  • 45f2c19cc5 flake.lock: Update (#8610) Georgi Gerganov 2024-07-21 16:45:10 +03:00
  • 57349e1db3 llama : allow overrides for tokenizer flags gg/allow-kv-overrides Georgi Gerganov 2024-07-21 14:42:15 +03:00
  • 3252afb323 Fix Vulkan DeepSeek-Coder-V2-Lite MoE support 0cc4m 2024-07-21 10:58:05 +02:00
  • 6274b3f835 Add Vulkan IQ4_NL support 0cc4m 2024-07-21 10:57:01 +02:00
  • e18281940e docfix: server readme: quantum models -> quantized models. Ujjawal Panchal 2024-07-21 14:18:16 +05:30
  • 0ab192f500 docfix: imatrix readme, quantum models -> quantized models. Ujjawal Panchal 2024-07-21 14:15:46 +05:30
  • 38babee528 cmake : sort Georgi Gerganov 2024-07-21 10:45:25 +03:00