Commit graph

  • f30bd63252 refactor: Add function for building and parsing CLI arguments teleprint-me 2024-05-25 14:41:13 -04:00
  • e9759dee0b docs: Add revisions to hub-vocab.py module level docstring teleprint-me 2024-05-25 14:33:23 -04:00
  • 96811fdf63 duo: v2 Oleksandr Kuvshynov 2024-05-25 14:23:57 -04:00
  • 78938bc0c9 duo: v0 Oleksandr Kuvshynov 2024-05-25 13:59:28 -04:00
  • fc59407efe convert-hf : support Mini-Jamba conversion Francis Couture-Harpin 2024-05-25 13:55:11 -04:00
  • ea2e63e9d2 convert-hf : check for unprocessed Jamba experts Francis Couture-Harpin 2024-05-25 12:54:30 -04:00
  • 11f78c6a2d convert-hf : adapt ArcticModel to use yield too compilade/lazier-moe-convert-hf Francis Couture-Harpin 2024-05-25 12:52:53 -04:00
  • 96a299ff60 Merge branch 'master' into compilade/lazier-moe-convert-hf Francis Couture-Harpin 2024-05-25 12:49:41 -04:00
  • d703fa9fa5 convert-hf : fix flake8 indentation lint Francis Couture-Harpin 2024-05-25 12:47:01 -04:00
  • c755bd6223 Merge branch 'master' into master Brian 2024-05-26 01:32:48 +10:00
  • aa3fd500b1 Bug fix suggested by Georgi Alexander Komarov 2024-05-25 07:49:18 -07:00
  • 02240912ff Merge branch 'master' into codecov-badge Brian 2024-05-26 00:17:10 +10:00
  • 6d2f3d9d51 SimpleChat: Note about trying to keep things simple yet flexible HanishKVC 2024-05-25 19:25:19 +05:30
  • f5a3bbbdda Set RULE_LAUNCH_COMPILE to detected ccache absolute path. S David 2024-05-25 09:51:12 -04:00
  • 5ca93f7907 main: replace --no-special with --special brian khuu 2024-05-25 22:58:56 +10:00
  • 9588f196b1 train : change default FA argument (#7528) b2998 Georgi Gerganov 2024-05-25 15:21:30 +03:00
  • c1e7f488c3 threadpool: add persistent threadpool for llama-bench Max Krasnyansky 2024-05-25 05:15:11 -07:00
  • e77167446a threadpool: proper handling for non-specified cpumask Max Krasnyansky 2024-05-25 05:14:29 -07:00
  • 739648f3e6 Implement Q8_0 quantization fully in PyTorch. Heiner 2024-05-23 20:24:47 +02:00
  • abc958b07e Move noqa comment to where the lastest flake8 likes it. Heiner 2024-05-23 15:19:49 +02:00
  • 0a1ef1127f Write tensors in layer order. Heiner 2024-05-23 15:07:27 +02:00
  • 60b29ea6e4 More constants from gguf. Heiner 2024-05-23 11:26:35 +02:00
  • e2f13a3346 Use Q8_0 quantization from gguf module. Heiner 2024-05-23 11:10:59 +02:00
  • f177b6596c Fix layer order. Heiner 2024-05-21 23:17:14 +02:00
  • 9a0629d545 Don't multiply embeddings with embedding_multiplier_scale as it happens in llama.cpp. Heiner 2024-05-10 12:40:05 +02:00
  • ef671c693d Address review comments by foldl. Heiner 2024-05-09 22:43:38 +02:00
  • d894497a96 Move print to logging: Fixes. Heiner 2024-05-09 17:12:07 +02:00
  • 5bc4f10ee9 Update convert_grok.py to use logging module Brian 2024-05-10 01:17:56 +10:00
  • 08427630c3 Use only one list of weight names, with values from the gguf module. Heiner 2024-05-03 23:44:47 +02:00
  • 3c57743874 Don't split MoE weights. Heiner 2024-05-03 22:38:05 +02:00
  • 6ddf93b286 Script to convert Grok-1 weights from raw JAX pickle files. Heiner 2024-04-24 23:39:06 +02:00
  • 3cbd23ed88 labeler: added Apple Metal detector (+Kompute) (#7529) Brian 2024-05-25 19:30:42 +10:00
  • 00c6390793 main : don't print special tokens with --grammar (#6923) b2996 Justine Tunney 2024-05-25 05:04:03 -04:00
  • faa0e6979a ggml: aarch64: SVE kernels for q8_0_q8_0, q4_0_q8_0 vector dot (#7433) b2995 Masaya, Kato 2024-05-25 17:42:31 +09:00
  • 6c1b0111a1 refactor: Apply huggingface_hub api to CLI teleprint-me 2024-05-25 04:16:10 -04:00
  • 63c3410492 refactor: Add support for model file types teleprint-me 2024-05-25 04:15:39 -04:00
  • 2ffe6b89c8 Refactor HFubModel and HFHubTokenizer to fix reference issues teleprint-me 2024-05-25 04:15:15 -04:00
  • 9791f40258 android : module (#7502) b2994 Elton Kola 2024-05-25 04:11:33 -04:00
  • fda2319d7b refactor: Streamline method signatures and clarify method names related to downloading repo files teleprint-me 2024-05-25 03:32:27 -04:00
  • e75c5ca451 main: remove special token file descriptor feature (#5) Brian 2024-05-25 17:04:31 +10:00
  • 4438d052aa refactor: Abstract file and logger management to streamline api interface teleprint-me 2024-05-25 02:57:59 -04:00
  • 590720fa38 Merge branch 'master' into embedding-parameters Brian 2024-05-25 16:54:57 +10:00
  • 99275a1606 refactor: Simplify API and merge HFModel into HFHub teleprint-me 2024-05-25 02:10:52 -04:00
  • ef1b87d1af free batch threadpool in main fmz 2024-05-24 22:20:52 -07:00
  • 168297f11c refactor: Add remote repository listings to the bas HFHub class teleprint-me 2024-05-24 23:57:45 -04:00
  • 83aabb3fb7 readme Oleksandr Kuvshynov 2024-05-24 23:56:48 -04:00
  • 902184dd3a fix missing slash in fs_get_cache_directory() (#7503) b2993 Xuan Son Nguyen 2024-05-25 05:30:59 +02:00
  • 8bdae2192b roll back CMakePresets.json changes fmz 2024-05-24 20:20:53 -07:00
  • 61a88a1da3 llama : fix BERT inference without KV cache Francis Couture-Harpin 2024-05-24 22:41:38 -04:00
  • 51e933a962 Fix falcon punctuation regex jaime-m-p 2024-05-25 04:32:45 +02:00
  • 0794b77714 Move 'add_special_bos/eos' logic to llm_tokenizer_bpe jaime-m-p 2024-05-25 04:32:22 +02:00
  • 10d5aefed5 logging Oleksandr Kuvshynov 2024-05-24 22:21:41 -04:00
  • 6168399112 Add BPE models for testing jaime-m-p 2024-05-25 04:17:05 +02:00
  • 614d0bb874 Update random test: add_eos_token jaime-m-p 2024-05-25 04:15:22 +02:00
  • 6da2bd6fbc patch: Apply fix for paths and logging teleprint-me 2024-05-24 21:47:47 -04:00
  • b0e2c23bf9 labeler: add Kompute to detector [no ci] brian khuu 2024-05-25 11:25:44 +10:00
  • 57684331fc Make tokenize CLI tool have nicer command line arguments. (#6188) b2992 Mikko Juola 2024-05-24 18:14:42 -07:00
  • b83bab15a5 gguf-py : fix and simplify quantized shape round-trip (#7483) compilade 2024-05-24 21:11:48 -04:00
  • fd6e851881 labeler: added Apple Metal detector [no ci] brian khuu 2024-05-25 11:08:37 +10:00
  • 6f4c300bff Refactor llm_tokenizer_bpe: move code to constructor jaime-m-p 2024-05-25 02:20:55 +02:00
  • fe3c531915 bugfix: custom regex split fails with codepoint 0 jaime-m-p 2024-05-25 02:10:08 +02:00
  • 0fd13e9473 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-05-24 19:35:16 -04:00
  • cbc743e600 llama : support Jamba Francis Couture-Harpin 2024-05-24 19:27:27 -04:00
  • a64f086d72 Merge 8be06dc745 into d041d2ceaa Justine Tunney 2024-05-24 18:49:07 -04:00
  • 55e387b2d5 Add BPE models for testing jaime-m-p 2024-05-25 00:19:31 +02:00
  • e013b23102 Update random test: add_bos_token jaime-m-p 2024-05-25 00:13:08 +02:00
  • b30bea3257 add comments ngxson 2024-05-24 22:50:03 +02:00
  • 7e13f19fb5 llama : rethink recurrent state cell counts Francis Couture-Harpin 2024-05-24 16:19:25 -04:00
  • f0ffd0e347 server: do not remove whitespace at the start of a completion chunk mgroeber9110 2024-05-24 22:12:13 +02:00
  • 120f7bf527 Add optional MLP bias for Granite models Steffen Roecker 2024-05-09 10:05:47 +02:00
  • 0ce35a6712 reset cpu affinity every time for main thread fmz 2024-05-24 12:53:46 -07:00
  • 467583831c set bitmask properly on windows fmz 2024-05-24 19:28:03 -07:00
  • 26cb415267 fixed typo in previous commit Alexander Komarov 2024-05-24 12:26:11 -07:00
  • 9a4bdc8c12 Introduce ggml_threadpool fmz 2024-05-24 12:04:00 -07:00
  • b3d55bcc72 replaced call to kernel_mul_mv_f16_f32_l4 with kernel_mul_mv_f16_f32_l4_large Alexander Komarov 2024-05-24 11:52:13 -07:00
  • cd2322c996 Added kernel_mul_mv_f16_f32_l4_large which performs 32x more ops Alexander Komarov 2024-05-24 11:50:23 -07:00
  • b3afd6c86a SimpleChat:Add n_predict (equiv max_tokens) for llamacpp server HanishKVC 2024-05-24 23:16:55 +05:30
  • 8f172b9070 SimpleChat: Try make user experience better, if possible HanishKVC 2024-05-24 22:53:43 +05:30
  • 3cf7bdf1e9 Fix flake8 complaints Galunid 2024-05-24 19:24:08 +02:00
  • 87509005a8 Fix gguf not imported correctly Galunid 2024-05-24 19:20:16 +02:00
  • 66982abcb1 fixes Oleksandr Kuvshynov 2024-05-24 12:22:59 -04:00
  • d041d2ceaa flake.lock: Update (#7232) Georgi Gerganov 2024-05-24 18:59:06 +03:00
  • bb9c361802 gguf-py : re-add SCALING_YARN_LOG_MUL removed during merge by accident Stanisław Szymczyk 2024-05-24 16:37:29 +02:00
  • a54685b98a Merge remote-tracking branch 'upstream/master' into deepseek-v2 Stanisław Szymczyk 2024-05-24 16:13:00 +02:00
  • 02e2c91d01 correct split id Oleksandr Kuvshynov 2024-05-24 09:52:28 -04:00
  • 27891f6db0 docker.yml: disable light-intel and server-intel test (#7515) b2989 Brian 2024-05-24 23:47:56 +10:00
  • fbca2f27fc Add support for ArcticForCausalLM (#7020) b2988 fairydreaming 2024-05-24 14:31:13 +02:00
  • 69729a34b5 Fix imports Galunid 2024-05-24 14:30:19 +02:00
  • c48cccacf1 Fix lost convert.py in ci/run.sh Galunid 2024-05-24 14:20:54 +02:00
  • 7b042f0d81 Fix convert-no-torch -> convert-legacy-llama Galunid 2024-05-24 14:18:58 +02:00
  • 36558d9795 Merge branch 'master' into move-convert-py Galunid 2024-05-24 14:16:52 +02:00
  • 068d0793c4 Move vocab thing to vocab.py Galunid 2024-05-24 14:10:55 +02:00
  • f70e3df72b docker.yml: disable server-intel test brian khuu 2024-05-24 21:12:08 +10:00
  • 602c80d918 llama : fix whitespace formatting Stanisław Szymczyk 2024-05-24 12:44:13 +02:00
  • 3aa20e1376 Merge remote-tracking branch 'upstream/master' into snowflake-arctic-clean Stanisław Szymczyk 2024-05-24 12:41:30 +02:00
  • 8dd0008ac6 docker.yml: disable light-intel test brian khuu 2024-05-24 20:17:25 +10:00
  • ba33fea342 erase backend handle when free hongruichen 2024-05-24 18:15:36 +08:00
  • c31c118d86 calc diff ngxson 2024-05-24 11:46:47 +02:00
  • 0a46d73056 add control-vector-generator ngxson 2024-05-24 11:11:55 +02:00
  • 64096942ce refactor: Simplify the huggingface hub api to enable flexible model requests teleprint-me 2024-05-24 02:40:34 -04:00