Commit graph

  • 842ff3fed1 Added 16B and 236B model types for DeepSeek-V2. Stanisław Szymczyk 2024-05-21 17:12:14 +02:00
  • d8b373c146 Merge branch 'master' into grammar-token Justine Tunney 2024-05-21 08:11:32 -07:00
  • d93b5cad0a minor : cleanup Georgi Gerganov 2024-05-16 13:53:22 +03:00
  • 4f787ead14 backends : fix pragma semicolons Georgi Gerganov 2024-05-16 13:42:00 +03:00
  • e7c7d8ca42 tests : update to use new rope API Georgi Gerganov 2024-05-16 13:34:57 +03:00
  • f4cb482c62 minor : style Georgi Gerganov 2024-05-16 13:33:01 +03:00
  • 352c3859a7 backends : add dev messages to support rope freq. factors Georgi Gerganov 2024-05-16 13:23:30 +03:00
  • 471d8170bc ggml : update ggml_rope_ext API to support freq. factors Georgi Gerganov 2024-05-16 13:23:04 +03:00
  • 2d473a4a9a metal : support rope freq_factors Georgi Gerganov 2024-05-16 12:03:53 +03:00
  • 8a9c897fd0 add one line of comments liuwei 2024-05-11 21:42:32 +00:00
  • d05ae12e93 set to the short freq factor when context size is small than trained context size liuwei 2024-05-11 20:30:32 +00:00
  • b1f491a297 fix flint warnings on convert-hf-to-gguf.py liuwei 2024-05-11 19:16:45 +00:00
  • 5683db3bf7 remove unused rope scaling type 'su' frin gguf converter liuwei 2024-05-11 17:55:41 +00:00
  • 6333ed1a30 make freq factors only depend on ctx size liuwei 2024-05-11 17:25:08 +00:00
  • c5569311a4 add long rope support in ggml cpu backend liuwei 2024-05-11 10:44:31 +00:00
  • 9f871298b6 adjust index value in cuda long rope freq factors liuwei 2024-05-11 10:38:22 +00:00
  • cc19780a55 address build warnings on llama.cpp liuwei 2024-05-01 18:50:10 +00:00
  • 56d9fa72de add phi3 128k support in cuda Wei Liu 2024-05-02 02:06:20 +08:00
  • 8fa413d8b5 add phi3 128k support in convert-hf-to-gguf Wei Liu 2024-05-01 15:15:12 +08:00
  • 14083d157f SimpleChat:MCUI: Support for new chat sessions HanishKVC 2024-05-21 19:56:21 +05:30
  • 0e53d42614 add build shared lib in win release package arthw 2024-05-21 22:30:11 +08:00
  • fbe6bc5c99 grammars: remove todo Olivier Chafik 2024-05-21 15:27:44 +01:00
  • 11474e756d examples: cache hf model when --model not provided (#7353) b2956 Amir 2024-05-21 17:13:12 +03:00
  • d8ee902227 CUDA: deduplicate mmq code (#7397) b2955 Johannes Gäßler 2024-05-21 16:02:12 +02:00
  • 2a407192fc SimpleChat:MCUI: Ensure req-resp failure doesnt lock up things HanishKVC 2024-05-21 18:51:14 +05:30
  • d9a738cf00 CUDA: deduplicate mmq code Johannes Gäßler 2024-05-19 18:16:49 +02:00
  • 68ef7401ed SimpleChat:Cleanup corners HanishKVC 2024-05-21 18:33:27 +05:30
  • d7e852c1bc Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425) jaime-m-p 2024-05-21 14:39:48 +02:00
  • b9f9c0ec6e SimpleChat:MCUI: Allow selected chat-session btn to be highlighted HanishKVC 2024-05-21 17:48:18 +05:30
  • 356daef971 examples: cache hf model when --model not provided Amir 2024-05-21 12:24:27 +00:00
  • 3458d2f8c3 SimpleChat:GetSystemLatest, fix a oversight. HanishKVC 2024-05-21 17:39:49 +05:30
  • 7297dda376 SimpleChat: Take care of system prompt HanishKVC 2024-05-21 17:24:06 +05:30
  • d57274b2a3 SimpleChat:MCUI: Delay enabling user-input to avoid race HanishKVC 2024-05-21 16:48:32 +05:30
  • 928cc36427 SimpleChat:MCUI: Store and use current chat session id HanishKVC 2024-05-21 16:01:11 +05:30
  • 1b82f2281f SimpleChat:MCUI:Show available chat sessions, try switch btw them HanishKVC 2024-05-21 15:05:17 +05:30
  • 1cd10ae9c4 SimpleChat: Move ui elements into MultiChatUI, Update el IDs HanishKVC 2024-05-21 09:43:02 +05:30
  • 0594bfdd5e Revert "server : fix test regexes" Georgi Gerganov 2024-05-21 11:34:10 +03:00
  • a8fe1624d4 metal : handle F16 inf values Georgi Gerganov 2024-05-21 10:26:53 +03:00
  • 8f76ba54ba main: refactor ctrl_token_no_out --> no_special brian khuu 2024-05-21 16:03:18 +10:00
  • 7d52482bac main: renamed --no-special from --ctrl-token-no-out and other refactoring brian khuu 2024-05-21 16:00:59 +10:00
  • 2fe28ad4d3 chore: Rename from repo to model repo and reorder for improved readability teleprint-me 2024-05-21 01:41:35 -04:00
  • 4768650aff chore: Add formatting, set common vocab files, apply pattern to model map teleprint-me 2024-05-21 01:38:29 -04:00
  • fb32f50834 feat: Add hf model mapping descriptors for each repo teleprint-me 2024-05-21 01:07:13 -04:00
  • c1e8a6d1c0 main: must check pipe status on very top of program brian khuu 2024-05-21 14:58:34 +10:00
  • a3bdac091c chore: Remove unused enum import reference teleprint-me 2024-05-21 00:46:31 -04:00
  • 6296206392 chore: Apply deduped token type references teleprint-me 2024-05-21 00:45:06 -04:00
  • dc24e7ef67 Move convert.py to examples/convert-no-torch.py Galunid 2024-05-21 06:34:04 +02:00
  • a35b76755f Merge branch 'master' into auto-model-support teleprint-me 2024-05-21 00:16:34 -04:00
  • aed0573f68 proto: Add experimental vocab pre-tokenizer regular expressions teleprint-me 2024-05-21 00:14:26 -04:00
  • 12537fdabc chore: Add tokenizer constants for model metadata teleprint-me 2024-05-21 00:13:49 -04:00
  • 19e78d29b0 Fix non-CPU backend wrapping Branden Butler 2024-05-20 22:37:14 -05:00
  • 50048f5b45 main: rejig control token descriptor handling brian khuu 2024-05-21 11:20:00 +10:00
  • f2340b43fc Default values for add_bos_token and add_eos_token jaime-m-p 2024-05-21 02:22:31 +02:00
  • 9b21dc3aef Enable rtrim when pre-inserting BOS jaime-m-p 2024-05-21 02:11:28 +02:00
  • 3d490e8529 Update brute force test: add_special jaime-m-p 2024-05-21 02:09:11 +02:00
  • c78a53ade9 Style jaime-m-p 2024-05-21 02:07:16 +02:00
  • 1b17ed7ab6 Direct I/O and Transparent HugePages Pavel Fatin 2024-05-20 21:55:33 +02:00
  • a4556f0f2d grammars: fix resampling logic ochafik 2024-05-20 22:22:29 +01:00
  • 5978bb007d chore: Fix and update comments teleprint-me 2024-05-20 14:59:40 -04:00
  • 90456a5717 main: only merge stdout and control token if not in conversation or grammar mode brian khuu 2024-05-21 04:57:26 +10:00
  • 8b67acc8d0 Merge branch 'ggerganov:master' into gguf-model-template Austin 2024-05-20 14:53:57 -04:00
  • 2fa2c7a86c chore: Move enums and model map to constants teleprint-me 2024-05-20 14:51:03 -04:00
  • 7be6aeb6d6 SimpleChat:JS: Move to dictionary of SimpleChat, instead of array HanishKVC 2024-05-21 00:18:58 +05:30
  • 5032f18f20 common.cpp: accidentally removed --interactive-first brian khuu 2024-05-21 04:47:49 +10:00
  • c9ea9df7fb main: remove --ctrl-token-fd-out in favor for fcntl() based detection brian khuu 2024-05-21 04:44:50 +10:00
  • d9ba963cd4 refactor: Restructure tokenizer model metadata teleprint-me 2024-05-20 14:42:59 -04:00
  • 917dc8cfa6 Tokenizer SPM fixes for phi-3 and llama-spm (#7375) b2953 jaime-m-p 2024-05-20 20:15:57 +02:00
  • 8ef1aa97a6 SimpleChat:JS: Move handle submit into MultiChat, build on same HanishKVC 2024-05-20 23:43:51 +05:30
  • 18bb36e496 chore: Allow the user to config the logger teleprint-me 2024-05-20 14:06:21 -04:00
  • fcf2af9504 SimpleChat:JS:Keep MultiChatUI simple for now HanishKVC 2024-05-20 23:21:04 +05:30
  • 5c1a9f4d8b SimpleChat:JS: Move system prompt begin/anytime into SimpleChat HanishKVC 2024-05-20 22:59:32 +05:30
  • 9b97feab45 SimpleChat:JS: MultiChat initial skeleton HanishKVC 2024-05-20 22:38:36 +05:30
  • 7be56da99a Added YaRN log multiplier model header parameter corresponding to the multiplier of the ln(s) from the sqrt(1/t) = 0.1 ln(s) + 1 equation. Stanisław Szymczyk 2024-05-20 18:51:23 +02:00
  • fabf30b4c4 llama : remove Persimmon (#7408) b2952 Georgi Gerganov 2024-05-20 19:35:28 +03:00
  • 20385cebcc perplexity: update README FP16 results [no ci] (#7413) Johannes Gäßler 2024-05-20 18:15:38 +02:00
  • af621975bb SimpleChat:JS:CI: Avoid space at end of jsdoc param line HanishKVC 2024-05-20 21:34:48 +05:30
  • 3fc607f832 SimpleChat: Screen fixed view and scrolling, Printing full HanishKVC 2024-05-20 21:27:39 +05:30
  • a041ced0fd wip gg/kv-determinism Georgi Gerganov 2024-05-20 17:00:55 +03:00
  • 68a5103026 Referenced the relevant GitHub discussion instead of providing long comments. Stanisław Szymczyk 2024-05-20 17:20:18 +02:00
  • 2e70b6e374 examples: cache hf model when --model not provided Amir 2024-05-20 15:05:32 +00:00
  • 5372f9bdb0 examples: cache hf model when --model not provided Amir 2024-05-17 22:23:10 +00:00
  • ad4b6097c0 main: dprintf isn't part of the IEEE POSIX standard. Just use write(). brian khuu 2024-05-21 00:37:12 +10:00
  • db10f01310 rpc : track allocated buffers (#7411) b2950 Radoslav Gerganov 2024-05-20 16:36:55 +03:00
  • e5000cdb83 SimpleChat: Rename simplechat.html to index.html, update readme HanishKVC 2024-05-20 18:18:23 +05:30
  • 9f445a793d main: use dprintf and add --ctrl-token-no-out and --ctrl-token-fd-out brian khuu 2024-05-20 22:35:08 +10:00
  • 3bc10cb485 server : fix temperature + disable some tests (#7409) b2949 Georgi Gerganov 2024-05-20 15:10:03 +03:00
  • 299c2c6517 rpc : pack rpc_tensor tightly Radoslav Gerganov 2024-05-20 14:37:48 +03:00
  • b2c0f7f303 update HIP_UMA #7399 Djip007 2024-05-20 13:22:18 +02:00
  • 6bf9b66fa3 [SYCL] Update SYCL upscale operation (#7321) b2948 AidanBeltonS 2024-05-20 12:08:23 +01:00
  • 6597fafeae SimpleChat: Make vertical layout better responsive (flex based) HanishKVC 2024-05-20 16:13:24 +05:30
  • 01116c4030 perplexity: update README FP16 results [no ci] Johannes Gäßler 2024-05-20 12:51:42 +02:00
  • 26cd4237bc Update README.md (#7410) Bingan 2024-05-20 17:55:34 +08:00
  • 4e2bcb530b ci : change server Debug -> RelWithDebInfo Georgi Gerganov 2024-05-20 12:54:54 +03:00
  • 75ef7619a2 add q4_1 q5_0 q5_1 support Johannes Gäßler 2024-05-20 11:52:10 +02:00
  • 14e80c413b q4_0 works Johannes Gäßler 2024-05-20 11:46:02 +02:00
  • 9d3f69d917 rpc : track allocated buffers Radoslav Gerganov 2024-05-20 11:56:43 +03:00
  • 75096c6e6e q8_0 works Johannes Gäßler 2024-05-20 11:29:29 +02:00
  • 8a10e5c03c FP16 v still works Johannes Gäßler 2024-05-20 11:22:11 +02:00
  • ca6d82885c FP16 V still works Johannes Gäßler 2024-05-20 10:54:42 +02:00
  • 1b49f47c22 q4_0 works Johannes Gäßler 2024-05-20 10:47:38 +02:00