Commit graph

  • d1d1b4c585 Update vulkan rope implementation to support frequency factors 0cc4m 2024-05-22 22:16:18 +02:00
  • 3b57b55c6f Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-05-22 15:34:24 -04:00
  • a8dd83b5d8 opencl : restore QK_K=256 define Georgi Gerganov 2024-05-22 22:16:34 +03:00
  • 5234723f99 llama : remove duplicate LLM_TENSOR_NAMES map entry for LLM_ARCH_GPTNEOX - didn't notice it was already present Stanisław Szymczyk 2024-05-22 21:12:38 +02:00
  • 1aa777f823 llama : added missing model type for the smallest pythia-14m Stanisław Szymczyk 2024-05-22 20:54:20 +02:00
  • 7fd26bd802 ggml : drop support for QK_K=64 Georgi Gerganov 2024-05-22 21:39:36 +03:00
  • 43bcb50f13 squash! llama : add getters for n_threads/n_threads_batch Daniel Bevenius 2024-05-22 16:17:00 +02:00
  • 43f1d316f5 llama : add getters for n_threads/n_threads_batch Daniel Bevenius 2024-05-22 15:11:01 +02:00
  • 8334b5becb gguf-py : do not use internal numpy types Francis Couture-Harpin 2024-05-22 14:29:50 -04:00
  • 1e374365d1 SimpleChat: a simple and dumb web front end for testing /chat/completions and /completions end points and try chat (#7350) HanishKVC 2024-05-22 23:23:21 +05:30
  • 197ff91462 build : remove zig (#7471) b2970 Georgi Gerganov 2024-05-22 20:05:38 +03:00
  • 875090ddcc build : remove zig Georgi Gerganov 2024-05-22 19:53:24 +03:00
  • 6ff13987ad common : normalize naming style (#7462) b2969 Georgi Gerganov 2024-05-22 20:04:20 +03:00
  • 374a95f924 zig : try to fix build Georgi Gerganov 2024-05-22 19:23:15 +03:00
  • 38c03478a3 CUDA: fix FA out-of-bounds writes (#7465) b2968 Johannes Gäßler 2024-05-22 17:58:25 +02:00
  • 6004f6b13a Added Bunny in Supported Models Raj Hammeer Singh Hada 2024-05-22 21:18:03 +05:30
  • da9e19ff6f Merge eb9a1ff63d into b18532a4ef Xuan Son Nguyen 2024-05-22 16:13:01 +01:00
  • b18532a4ef phi3 : duplicate rope factors in each layer (#7447) b2967 slaren 2024-05-22 16:10:46 +02:00
  • f9357395bb CUDA: fix FA out-of-bounds writes Johannes Gäßler 2024-05-22 16:08:15 +02:00
  • a1a5508d67 llama : Replaced obsolete ggml_rope_custom() calls with ggml_rope_ext(). Stanisław Szymczyk 2024-05-22 15:52:10 +02:00
  • eb58c4b28d Merge remote-tracking branch 'upstream/master' into snowflake-arctic-clean Stanisław Szymczyk 2024-05-22 15:54:41 +02:00
  • fcda1128bc vulkan: add workaround for iterator boundary check to fix clang-cl debug build (#7426) b2966 k.h.lai 2024-05-22 20:53:21 +08:00
  • dfdeb8aaf5 ggml : remove ggml_flash_attn and ggml_flash_ff Georgi Gerganov 2024-05-22 15:46:00 +03:00
  • fb74a4e413 common : match declaration / definition order Georgi Gerganov 2024-05-22 15:30:47 +03:00
  • b88267566d common : normalize naming style Georgi Gerganov 2024-05-22 15:23:50 +03:00
  • ef8e9e72b4 replace bool parameters with named flags slaren 2024-05-22 14:18:43 +02:00
  • d8b8ecf885 Merge remote-tracking branch 'upstream/master' into gpt-neox Stanisław Szymczyk 2024-05-22 13:26:19 +02:00
  • 03d8900ebe llama : add missing model type names (#7445) b2965 Justine Tunney 2024-05-22 07:08:18 -04:00
  • 6816522ea7 Trailing whitespace Yann Follet 2024-05-22 10:40:04 +00:00
  • 720e88657d Added missing support for GPTNeoXForCausalLM (Pythia and GPT-NeoX base models). Stanisław Szymczyk 2024-05-22 12:09:28 +02:00
  • 749b803dbd Update README.md Yann Follet 2024-05-22 17:46:05 +08:00
  • a142933ec4 Implement review suggestions tristandruyen 2024-05-22 11:45:36 +02:00
  • 9b3d833189 cuda : fix compile warning (#7454) b2964 Georgi Gerganov 2024-05-22 12:36:37 +03:00
  • 625bdb5225 add parameters for embeddings --embd-normalize --embd-output-format --embd-separator description in the README.md Yann Follet 2024-05-22 09:29:15 +00:00
  • 95fb0aefab CUDA: remove incorrect precision check (#7454) b2963 Johannes Gäßler 2024-05-22 10:24:29 +02:00
  • 8be06dc745 Update examples/server/server.cpp Justine Tunney 2024-05-22 01:11:38 -07:00
  • 3e5faa8503 cuda : fix rope + add tests (#7452) b2962 Georgi Gerganov 2024-05-22 11:01:35 +03:00
  • 3cb42757ea Fix CI errors Justine Tunney 2024-05-22 00:57:01 -07:00
  • 66ba5d573a tests : add rope tests using frequency factors Georgi Gerganov 2024-05-22 10:51:53 +03:00
  • 092549b110 ggml : support freq_factors for f16 rope (CPU) Georgi Gerganov 2024-05-22 10:39:26 +03:00
  • ebbc728e37 Make atomic operations explicit Justine Tunney 2024-05-22 00:35:13 -07:00
  • 4aaeb42598 tokenize.cpp: iostream header no longer required brian khuu 2024-05-22 17:31:48 +10:00
  • 8435ab0ae8 Avoid INIT synchronization barrier when possible Justine Tunney 2024-05-22 00:31:28 -07:00
  • 5cf3c42412 CUDA: remove incorrect precision check Johannes Gäßler 2024-05-22 09:28:32 +02:00
  • 7deec14bd9 Make MUL_MAT initialization go fast Justine Tunney 2024-05-22 00:20:24 -07:00
  • ce89cd5573 ggml : drop mode & 1 == 1 support for ggml_rope Georgi Gerganov 2024-05-22 10:10:42 +03:00
  • f9d2b25261 cuda : fix rope pos data Georgi Gerganov 2024-05-22 10:00:29 +03:00
  • 12285b5325 chore: Map model file and vocab types teleprint-me 2024-05-22 02:58:12 -04:00
  • ae6ee0b777 Revert "ggml : use dynamic thread scheduling for matrix multiplication (#6915)" Justine Tunney 2024-05-21 23:51:05 -07:00
  • a1c4aac384 server: ultra basic tools, tool_choice, tool_calls support ochafik 2024-05-22 04:15:14 +01:00
  • 793f4ff3f5 agent: support OpenAI: --endpoint https://api.openai.com --auth "Bearer $OPENAI_API_KEY" ochafik 2024-05-22 04:11:48 +01:00
  • a39e6e0758 openai: pretty indent json response ochafik 2024-05-22 03:51:49 +01:00
  • c8458fa5f7 openai: make content optional for tool call grammar gen ochafik 2024-05-22 03:51:20 +01:00
  • 2849247c4f duo: more cleanup Oleksandr Kuvshynov 2024-05-21 22:45:59 -04:00
  • 0b43e14030 refactor: Add experimental mapping for BPE pre-tokenizers teleprint-me 2024-05-21 22:45:45 -04:00
  • 6e4865f6b4 vulkan: add workaround for iterator boundary check to fix clang-cl debug build Adriankhl 2024-05-22 10:40:18 +08:00
  • f3965704fd duo: simplify a little Oleksandr Kuvshynov 2024-05-21 22:31:52 -04:00
  • a94895217c Make sampling not throw exception Justine Tunney 2024-05-21 17:44:18 -07:00
  • 45108eccef Add regression test for new phi3 chat template tristandruyen 2024-05-22 03:41:07 +02:00
  • bde2e7605f Fix phi3 template matching vs zephyr tristandruyen 2024-05-22 03:19:01 +02:00
  • aa3094c91d Update common/sampling.cpp Justine Tunney 2024-05-21 16:20:38 -07:00
  • 6b178988f5 Update examples/perplexity/perplexity.cpp Justine Tunney 2024-05-21 16:20:25 -07:00
  • cc363da1b6 Update llama.cpp Justine Tunney 2024-05-21 16:20:05 -07:00
  • bc9a2e8e7f Update llama.cpp Justine Tunney 2024-05-21 16:19:59 -07:00
  • 5224b6534e Update llama.cpp Justine Tunney 2024-05-21 16:19:53 -07:00
  • ac6bed1e4d Update llama.cpp Justine Tunney 2024-05-21 16:19:38 -07:00
  • 6dadcd2519 Merge remote-tracking branch 'origin/master' into agent-example ochafik 2024-05-22 00:16:35 +01:00
  • 8266b7cdb0 Merge remote-tracking branch 'origin/master' into grammar-reps ochafik 2024-05-22 00:15:34 +01:00
  • 34e14ae96d refactor: Add experimental model mappings teleprint-me 2024-05-21 19:11:51 -04:00
  • e9095e6098 async direct io per tensor test sl/dio-test slaren 2024-05-22 01:08:52 +02:00
  • 99291c04f7 Check for llama_get_logits_ith() errors Justine Tunney 2024-05-21 14:13:38 -07:00
  • 477973d2e1 phi3 : duplicate rope factors in each layer slaren 2024-05-21 23:08:51 +02:00
  • 773973a2d9 Add missing model type names Justine Tunney 2024-05-21 13:41:25 -07:00
  • 201cc11afa llama : add phi3 128K model support (#7225) b2961 liuwei-git 2024-05-22 04:28:32 +08:00
  • d52d193e58 duo v0 Oleksandr Kuvshynov 2024-04-19 22:13:01 -04:00
  • b2aac685d5 docs: Fix comment teleprint-me 2024-05-21 16:07:12 -04:00
  • 83b9fcd3e4 refactor: Rename constants to reduce confusion between references teleprint-me 2024-05-21 16:06:39 -04:00
  • 6369bf0433 metal : handle F16 inf values, fix FA partial offload (#7434) Georgi Gerganov 2024-05-21 23:03:42 +03:00
  • 2477d84c97 Merge remote-tracking branch 'origin/master' into grammar-fast ochafik 2024-05-21 20:54:02 +01:00
  • e402de364b grammars: fix resampling logic regression (#7424) Olivier Chafik 2024-05-21 20:40:00 +01:00
  • 1d87c50201 SimpleChat: Cleanup the log/dialog messages a bit HanishKVC 2024-05-22 00:42:50 +05:30
  • 7528c705b0 llama : fix uninitialized tensors Georgi Gerganov 2024-05-21 22:02:00 +03:00
  • fb848b296b SimpleChat: Update readme, title, show usage if no chat to show HanishKVC 2024-05-21 23:02:47 +05:30
  • 46db3506aa address review comments Pavel Fatin 2024-05-21 20:05:26 +02:00
  • 88e06d04e5 convert-no-torch -> convert-legacy-llama Galunid 2024-05-21 19:41:09 +02:00
  • fcf6538ba6 CUDA: fix unused warning in mmq.cu (#7442) b2958 Johannes Gäßler 2024-05-21 19:27:12 +02:00
  • 5ea637e42c openai: fix merge Olivier Chafik 2024-05-21 18:12:36 +01:00
  • 85e4e2b777 Fix CI, scripts, readme files Galunid 2024-05-21 19:06:24 +02:00
  • c2e8d62707 SimpleChat:MCUI:CornerCases:Skip new chat, show only if current HanishKVC 2024-05-21 21:55:05 +05:30
  • c3f8d58356 tests : test-tokenizer-0.sh print more info (#7402) Georgi Gerganov 2024-05-21 19:53:48 +03:00
  • 92711138f9 convert : read/write n_head_kv Georgi Gerganov 2024-05-21 19:40:01 +03:00
  • 219f39b068 CUDA: fix unused warning in mmq.cu Johannes Gäßler 2024-05-21 18:33:21 +02:00
  • e9acbce624 cuda : fix compile warning Georgi Gerganov 2024-05-21 19:08:12 +03:00
  • c0cc883ae9 Added nproc for systems that don't default to nproc John Boero 2024-05-21 17:01:54 +01:00
  • c033958d7c Removed usage of output bias tensor since it's not present in DeepSeek-V2 models. Stanisław Szymczyk 2024-05-21 17:53:37 +02:00
  • 12fcea5d04 llama: rename llama_token_is_control_token() to llama_token_is_control() brian khuu 2024-05-22 01:45:07 +10:00
  • d3405ea072 SimpleChat:MCUI: NewChat btn first before existing chat sessions HanishKVC 2024-05-21 21:07:34 +05:30
  • 23b72b871c llama : remove tmp assert Georgi Gerganov 2024-05-21 18:29:12 +03:00
  • 600896b882 llama : move rope factors from KV header to tensors Georgi Gerganov 2024-05-21 18:26:55 +03:00
  • 6c71277513 SimpleChat:MCUI: CreateSessionBtn helper, use wrt NewChat HanishKVC 2024-05-21 20:19:25 +05:30