Commit graph

  • 7377cdb506 Update README.md Bingan 2024-05-20 16:36:16 +08:00
  • 457b07f134 server : disable tests relying on parallel determinism Georgi Gerganov 2024-05-20 11:17:18 +03:00
  • 204695fd10 server : fix temperature Georgi Gerganov 2024-05-20 11:16:58 +03:00
  • 7f5255a709 Remove messages Aidan 2024-05-17 09:26:26 +01:00
  • 03344c1e78 Formatting Aidan 2024-05-16 11:10:59 +01:00
  • e79bfca781 Update SYCL upscale operation Aidan 2024-05-16 11:00:23 +01:00
  • 5d777e9c22 requirements : remove Georgi Gerganov 2024-05-20 10:55:29 +03:00
  • d08fbf9298 llama : remove Persimmon Georgi Gerganov 2024-05-20 10:53:36 +03:00
  • 213e90ed73 ggml-opencl, llama: using reserve() if count already known (#7272) b2946 Herman Semenov 2024-05-20 07:33:21 +00:00
  • 7fb66eb58c server : fix test regexes Georgi Gerganov 2024-05-20 10:15:19 +03:00
  • 65c58207ec ggml : add loongarch lsx and lasx support (#6454) b2945 junchao-loongson 2024-05-20 15:19:21 +08:00
  • 1cc0155d04 server : tuning tests (#7388) Georgi Gerganov 2024-05-20 10:16:41 +03:00
  • cc98fddcb1 tests : set explicit temperature Georgi Gerganov 2024-05-20 09:51:57 +03:00
  • dfadac7813 SimpleChat: textarea for multiline user chat, inturn shift+enter 4 enter HanishKVC 2024-05-20 11:56:41 +05:30
  • 8ed8fa9733 tests : fix the fix 0.8f -> 0.8 Georgi Gerganov 2024-05-20 08:59:26 +03:00
  • 189963283c server : increase timeout Georgi Gerganov 2024-05-19 18:48:17 +03:00
  • f159c9d2b1 server : don't pass temperature as string Georgi Gerganov 2024-05-19 18:47:38 +03:00
  • e932094d58 server : return error on too large embedding input (#7389) b2943 Georgi Gerganov 2024-05-20 08:56:05 +03:00
  • 2789baf480 tests : fix --keep_split -> --keep-split (#7374) Georgi Gerganov 2024-05-20 08:55:09 +03:00
  • bdd0286bd0 refactor: Use proper names for referenced member variables teleprint-me 2024-05-20 01:39:09 -04:00
  • a1951e27dc refactor: Add proper names for remote model references teleprint-me 2024-05-20 01:36:44 -04:00
  • c88088c7a1 SimpleChat:HtmlCss: Cleanup UI flow HanishKVC 2024-05-20 10:40:50 +05:30
  • 6fc4492b3f chore: Add english pangram to vocab tests teleprint-me 2024-05-20 00:51:35 -04:00
  • 381dad5eb3 fix: Add missing model architectures teleprint-me 2024-05-20 00:50:42 -04:00
  • 9a2834e24e fix: Use __name__ as logger name teleprint-me 2024-05-19 22:39:30 -04:00
  • a0362ea475 patch: Fix nested quotes for dict refs teleprint-me 2024-05-19 22:39:05 -04:00
  • afad05d15c Merge branch 'master' into master Herman Semenov 2024-05-20 02:36:35 +00:00
  • f2e4d92528 Added const reference for std::pair<> and std::tuple<> more 16 bytes: Herman Semenov 2024-05-19 21:34:42 -05:00
  • 89a46fe818 feat: Attempt to mirror the llama.cpp API for compatibility teleprint-me 2024-05-19 22:31:05 -04:00
  • c6f2a48af7 feat: Add prototype for identifying the vocab type teleprint-me 2024-05-19 22:30:37 -04:00
  • ce4a3904d1 Merge branch 'ggerganov:master' into const-ref-pair Herman Semenov 2024-05-20 02:27:25 +00:00
  • 4ee29e5e1c ggml-opencl, llama: using reserve() if count already known Herman Semenov 2024-05-19 21:25:12 -05:00
  • 33c8d50acc Add provisions for windows support for BF16 code including CMake provision for enabling AVX512_BF16 (#7258) b2941 Srihari-mcw 2024-05-19 19:18:39 -07:00
  • c711b028b0 Merge branch 'master' into thread_pool Brian 2024-05-20 12:16:07 +10:00
  • ee26b8ff10 update 2 junchao-loongson 2024-05-20 09:09:47 +08:00
  • 47edd35d78 OpenMP: remove repetitive thread creation using OpenMP Jean-Baptiste BESNARD 2024-05-20 00:10:54 +02:00
  • b442ab03c7 Merge branch 'ggerganov:master' into master Bartowski 2024-05-19 19:39:26 -04:00
  • d359f30921 llama : remove MPI backend (#7395) b2940 slaren 2024-05-20 01:17:03 +02:00
  • 1dd185751e q8_0 k works Johannes Gäßler 2024-05-20 00:23:43 +02:00
  • 08d8a6b528 f16 still works Johannes Gäßler 2024-05-19 23:28:06 +02:00
  • 0ae2860faa Apply suggestions from code review jaime-m-p 2024-05-19 21:13:20 +02:00
  • 3ae7235e94 Whitespace formatting fixes. Stanisław Szymczyk 2024-05-19 20:14:14 +02:00
  • f99df46f98 Replaced hardcoded mscale value with rescaling attn_factor that results in the final mscale value equal to 1.0. Stanisław Szymczyk 2024-05-19 19:59:03 +02:00
  • 78cded5394 llama : remove MPI backend slaren 2024-05-19 19:36:43 +02:00
  • 1ea2a0036e quantize : fix --keep-split check (#7374) b2939 Fred Douglas 2024-05-19 11:37:04 -05:00
  • 85137247b2 server : return error on too large embedding input Georgi Gerganov 2024-05-19 19:01:34 +03:00
  • bcd24f8974 main: use seperate stream for control characters brian khuu 2024-05-20 01:57:43 +10:00
  • 063b0d4841 Merge branch 'ggerganov:master' into master Bartowski 2024-05-19 11:45:31 -04:00
  • f030ec1f7a Vulkan Embedding Fix (#7360) b2938 0cc4m 2024-05-19 17:19:53 +02:00
  • a29c197de2 Add Smaug support Colin Kealty 2024-05-19 11:09:15 -04:00
  • e4e6f67be6 ggml : fix another case of quants nans (#7387) b2937 slaren 2024-05-19 17:08:46 +02:00
  • a84d76b40e ggml : fix another case of quants nans slaren 2024-05-19 16:50:27 +02:00
  • 5ca49cbecd ggml: implement quantized KV cache for FA (#7372) b2936 Johannes Gäßler 2024-05-19 16:46:13 +02:00
  • b7da2e86db ggml: implement quantized KV cache for FA Johannes Gäßler 2024-05-18 19:50:57 +02:00
  • 1b01f06db0 server: add test for token probs (#7347) Johannes Gäßler 2024-05-19 16:26:02 +02:00
  • 41858392e1 server: fix seed being reported back (#7382) b2934 Johannes Gäßler 2024-05-19 16:06:33 +02:00
  • 6aade19ee7 Add StableLM2 pre-tokenizer (#7349) b2933 Anas Ahouzi 2024-05-19 14:46:46 +02:00
  • ab33f7a338 cuda : clear error after buffer allocation failure (#7376) b2932 slaren 2024-05-19 14:19:37 +02:00
  • f3803dcc96 Merge remote-tracking branch 'origin/master' into sl/cudamalloc-clear-error slaren 2024-05-19 14:19:10 +02:00
  • b5e4e19387 server: fix seed being reported back Johannes Gäßler 2024-05-19 14:17:06 +02:00
  • e23b974f4c labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363) Brian 2024-05-19 20:51:03 +10:00
  • 71a742256c Temporarily hard-coded mscale value for DeepSeek-V2 (FIXME!). Stanisław Szymczyk 2024-05-19 12:22:54 +02:00
  • c191e475d5 SimpleChat:HTML: Add viewport meta for better mobile friendliness HanishKVC 2024-05-19 15:51:07 +05:30
  • 7e4786bbfb Added expert_weights_scale parameter for scaling MoE gate weights. Stanisław Szymczyk 2024-05-19 12:19:40 +02:00
  • 8a0d9a304f update junchao-loongson 2024-05-19 16:32:40 +08:00
  • ab5685e0a9 Fix Vulkan llava segfault when not offloading layers 0cc4m 2024-05-19 10:19:36 +02:00
  • 854d365aba cmake : update android comments (#7341) b2930 Georgi Gerganov 2024-05-19 11:01:01 +03:00
  • 6db8ec3a71 Fix Vulkan validation errors on embedding models with no offloaded layers 0cc4m 2024-05-19 09:55:36 +02:00
  • dcc5d4241d fix: Remove dangling if statement teleprint-me 2024-05-19 00:06:30 -04:00
  • 5840b6f0b0 refactor: Simplify the get_vocab_base_pre method teleprint-me 2024-05-18 23:59:52 -04:00
  • 316b404d94 patch: Fix CLI option for generating vocab tests teleprint-me 2024-05-18 23:59:22 -04:00
  • da5deebda1 fix: Apply fix to verbose help description and generating vocab tests option teleprint-me 2024-05-18 23:34:33 -04:00
  • ce777c8910 Merge branch 'master' into auto-model-support teleprint-me 2024-05-18 22:46:00 -04:00
  • d02a0f42f9 feat: Add vocab generation script teleprint-me 2024-05-18 22:15:12 -04:00
  • bd32266c87 feat: Add function for generating vocab script and fix CLI opts teleprint-me 2024-05-18 22:14:58 -04:00
  • 0479e9695f patch: Add exception handling for non-existent vocab related files teleprint-me 2024-05-18 22:14:19 -04:00
  • 4b3735ca50 chore: Remove cluttered vocab files teleprint-me 2024-05-18 22:13:21 -04:00
  • 1a82573126 feat: Add example script for automating generating tokenizer model checksums and tests teleprint-me 2024-05-18 20:49:22 -04:00
  • 150b835b96 flake.lock: Update github-actions[bot] 2024-05-19 00:39:00 +00:00
  • 006bb60d27 chore: Fix model path references teleprint-me 2024-05-18 19:20:19 -04:00
  • a46dfcfd5c Type fix jaime-m-p 2024-05-19 01:20:17 +02:00
  • f5bf761747 Capture CUDA logging output (#7298) b2929 fraxy-v 2024-05-19 01:44:42 +03:00
  • 69d392ae9c cuda : clear error after buffer allocation failure slaren 2024-05-19 00:39:15 +02:00
  • dd0d1590f6 Fix special tokens rtrim jaime-m-p 2024-05-19 00:08:35 +02:00
  • 5b61c04223 Fix added tokens jaime-m-p 2024-05-18 23:55:13 +02:00
  • 04aad94a60 Update brute force test: special tokens jaime-m-p 2024-05-18 23:42:05 +02:00
  • 5976126c26 SimpleChat:Readme: Note about handle_systemprompt begin/anytime HanishKVC 2024-05-19 03:42:44 +05:30
  • 40655e8990 Merge remote-tracking branch 'origin/master' into grammar-fast ochafik 2024-05-18 22:50:59 +01:00
  • 7905f2fcbe SimpleChat:JS: Allow for changing system prompt anytime for future HanishKVC 2024-05-19 03:20:30 +05:30
  • 60745acad4 grammars: remove early exit --> https://github.com/ggerganov/llama.cpp/pull/7370 ochafik 2024-05-18 22:37:58 +01:00
  • 939e143fe2 grammars: mutex-guarded lazy caching of token pieces in llama_sample_grammar ochafik 2024-05-18 22:37:14 +01:00
  • 676053fc7f SimpleChat:HTML:Group user input+btn together; Note about multichat HanishKVC 2024-05-19 02:52:33 +05:30
  • 06444bbfb7 fix inverted strcmp checking for quantize --keep-split Fred Douglas 2024-05-18 16:04:08 -05:00
  • b6f70b8a0e chore: Fix line spacing teleprint-me 2024-05-18 16:59:20 -04:00
  • b51ae5eecb Add minimal python client example for the server, streaming callback Christopher Rutherford 2024-05-18 21:48:13 +01:00
  • 5a5f6ab848 SimpleChat: Update notes a bit. Try keep browser happy HanishKVC 2024-05-18 23:12:41 +05:30
  • 6050941653 Corrected mscale calculation. Stanisław Szymczyk 2024-05-18 22:24:26 +02:00
  • 832b449cbd feat: Add pre-tokenizer CLI tooling teleprint-me 2024-05-18 14:33:56 -04:00
  • 04fb7886c5 chore: Apply isort to package gguf init teleprint-me 2024-05-18 14:33:22 -04:00
  • 2ef73ee6e4 refactor: Apply SoC for HF requests, vocab, and weights teleprint-me 2024-05-18 13:45:21 -04:00