Commit graph

  • d39130a398 py : use cpu-only torch in requirements.txt (#8335) compilade 2024-07-07 07:23:38 -04:00
  • 223eb18788 merge master toyer 2024-07-07 10:39:39 +00:00
  • b81ba1f96b finetune: Rename command name in README.md (#8343) standby24x7 2024-07-07 19:38:02 +09:00
  • 210eb9ed0a finetune: Rename an old command name in finetune.sh (#8344) standby24x7 2024-07-07 19:37:47 +09:00
  • 5b760f26a4 fix rope ratio to solve incorrect answers toyer 2024-07-07 10:27:05 +00:00
  • ed54a65d10 Merge pull request #2 from Umpire2018/fix/flake8-error toyer 2024-07-07 18:13:24 +08:00
  • 41e8c733f6 Transpose after setting data Lorenzo Toniazzi 2024-07-07 10:32:53 +01:00
  • cb4d86c4d7 server: Retrieve prompt template in /props (#8337) b3328 Bjarke Viksøe 2024-07-07 11:10:38 +02:00
  • c081567304 llama : fix n_rot default Georgi Gerganov 2024-07-07 10:48:14 +03:00
  • 3e6348b8dc fix bug in clip caitianchi 2024-07-07 13:12:46 +08:00
  • 5ecc95ccb6 Merge 4f0a128e28 into 86e7299ef5 Ryan Hua 2024-07-07 06:55:44 +02:00
  • 27cea69fc8 fix web link error [no ci] b4b4o 2024-07-07 11:18:08 +08:00
  • e3297b8442 finetune: Rename an old command name in finetune.sh Masanari Iida 2024-07-07 11:57:03 +09:00
  • 1aa086ca37 finetune: Rename command name in README.md Masanari Iida 2024-07-07 11:49:33 +09:00
  • 30fdca6152 flake.lock: Update github-actions[bot] 2024-07-07 00:19:43 +00:00
  • 60c39aca43 server-tests : model metadata is a dict Francis Couture-Harpin 2024-07-06 20:18:10 -04:00
  • 959c057bd9 server-tests : strip "chat" from base_url in oai_chat_completions Francis Couture-Harpin 2024-07-06 19:40:41 -04:00
  • 71b50a148c server-tests : add more type annotations Francis Couture-Harpin 2024-07-06 19:27:38 -04:00
  • fbf4a85868 server-tests : use trailing slash in openai base_url Francis Couture-Harpin 2024-07-06 18:22:12 -04:00
  • cc2ae1b10b Update README.md Denis Spasyuk 2024-07-06 16:18:55 -06:00
  • 931134b536 transpose when loading Lorenzo Toniazzi 2024-07-06 22:59:15 +01:00
  • e29fd9634c py : type-check all Python scripts with Pyright Francis Couture-Harpin 2024-07-06 11:36:28 -04:00
  • dff9b2e105 Use intermediate vector for string assignment Bjarke Viksøe 2024-07-06 23:34:00 +02:00
  • 798cde72a1 transpose and run cont Lorenzo Toniazzi 2024-07-06 21:40:22 +01:00
  • ada3cbf8c5 Using chat_template naming convention Bjarke Viksøe 2024-07-06 23:07:26 +02:00
  • 8f0272c9d7 update branch notes Lorenzo Toniazzi 2024-07-06 21:19:52 +01:00
  • 86e7299ef5 added support for Authorization Bearer tokens when downloading model (#8307) b3327 Derrick T. Woolworth 2024-07-06 15:32:04 -05:00
  • bc3ed77ddf Add doc and better string handling Bjarke Viksøe 2024-07-06 21:08:53 +02:00
  • ff0912e323 llama-gguf-hash: verification added brian khuu 2024-07-07 04:32:37 +10:00
  • baf344b5b3 build example/main.cpp as shared library libllama-cli.so; set custom STDOUT and STDERR file descriptors; set custom fprintf and fflush functions to intercept token generation in shared library Marko Tasic 2024-07-06 19:46:50 +02:00
  • a21b89fd0e Make string buffer dynamic Bjarke Viksøe 2024-07-06 19:36:52 +02:00
  • 608b1224f6 Update common/common.cpp Xuan Son Nguyen 2024-07-06 19:16:06 +02:00
  • 60d83a0149 update main readme (#8333) Xuan Son Nguyen 2024-07-06 19:01:23 +02:00
  • 5475fc92d7 server: Retrieve prompt template in /props Bjarke Viksøe 2024-07-06 17:25:52 +02:00
  • a44f22e7d3 py : use cpu-only torch in requirements.txt compilade/requirements-cpu-torch Francis Couture-Harpin 2024-07-06 10:28:12 -04:00
  • 8830397efc update main readme ngxson 2024-07-06 15:34:47 +02:00
  • b88ce0f892 correct ggml_backend_tensor_copy ngxson 2024-07-06 15:06:32 +02:00
  • 1b4ffbac47 llama_lora_adapter_apply ngxson 2024-07-06 14:24:56 +02:00
  • 8fed75edca llama-gguf-hash: change argument from xxhash --> xxh64 and update readme brian khuu 2024-07-06 21:48:24 +10:00
  • 4e28ad40a0 correct tensor patch ngxson 2024-07-06 13:29:37 +02:00
  • e9d7b6c05f add patch tensor function ngxson 2024-07-06 12:07:29 +02:00
  • 87e25a1d1b llama : add early return for empty range (#8327) gguf-v0.9.0 b3325 Daniel Bevenius 2024-07-06 09:22:16 +02:00
  • c9d6700adc Update src/llama.cpp Georgi Gerganov 2024-07-06 10:22:09 +03:00
  • 43c78d0160 removed auth_token, removed set_ function, other small fixes Derrick T. Woolworth 2024-07-05 22:13:59 -05:00
  • 8f3749642e llama-gguf-hash: ignore maybe-uninitialized gcc-8 error brian khuu 2024-07-06 00:50:34 +10:00
  • 67c5e14d06 lora: load to devide buft ngxson 2024-07-06 02:12:53 +02:00
  • e3e86419ef goto production Wenjing Yu 2024-07-05 15:58:54 -07:00
  • 02e65ad068 Fix doc heading Mason M 2024-07-05 17:49:31 -03:00
  • f638ade774 Merge branch 'master' into vulkan-build-integration Mason M 2024-07-05 17:42:23 -03:00
  • 04b6e66622 Update README.md Denis Spasyuk 2024-07-05 12:25:38 -06:00
  • 213701b51a Detokenizer fixes (#8039) b3324 jaime-m-p 2024-07-05 19:01:35 +02:00
  • eb572f9ac6 squash! llama : add early return for empty range Daniel Bevenius 2024-07-05 18:36:55 +02:00
  • 4eb8073c54 llama : add static_cast to fix CI warning/error Daniel Bevenius 2024-07-05 15:02:47 +02:00
  • f341bd6c86 llama : add early return for empty range Daniel Bevenius 2024-07-05 14:06:03 +02:00
  • be20e7f49d Reorganize documentation pages (#8325) Xuan Son Nguyen 2024-07-05 18:08:32 +02:00
  • 81b8bab516 de-duplicate sections ngxson 2024-07-05 17:12:51 +02:00
  • 263ffa962e small opt of the qnn graph config init hongruichen 2024-07-05 23:07:27 +08:00
  • 3be4270fc8 fix: resolve Flake8 errors in convert-hf-to-gguf.py Umpire2018 2024-07-05 14:44:29 +00:00
  • 7ed03b8974 llama : fix compile warning (#8304) b3322 Georgi Gerganov 2024-07-05 17:32:09 +03:00
  • 1d894a790e cmake : add GGML_BUILD and GGML_SHARED macro definitions (#8281) Natsu 2024-07-05 22:29:35 +08:00
  • b0589e6672 Update README.md Denis Spasyuk 2024-07-05 08:20:03 -06:00
  • 0137683e11 style: spaces jaime-m-p 2024-07-05 15:58:44 +02:00
  • 4e8d3bde75 Update README.md Denis Spasyuk 2024-07-05 07:48:38 -06:00
  • 5360799e85 llama-gguf-hash: add sha256 brian khuu 2024-07-05 23:14:45 +10:00
  • 8ea02fced1 fix style ngxson 2024-07-05 14:48:51 +02:00
  • 05014d67e6 Update README.md Denis Spasyuk 2024-07-05 06:48:26 -06:00
  • 1f3e1b66e2 Enabled more data types for oneMKL gemm_batch (#8236) Ouadie EL FAROUKI 2024-07-05 13:23:25 +01:00
  • ccc6ef7b3c llama-gguf-hash: makefile sha1 and xxhash moved to it's own obj file brian khuu 2024-07-05 22:13:17 +10:00
  • 4b0f6b0cd6 add helper function to get Qnn_TensorType_t from ggml_tensor hongruichen 2024-07-05 19:34:56 +08:00
  • acdcbb8082 sycl : fix powf call in device code Alberto Cabrera 2024-07-05 12:04:13 +01:00
  • 0f2e68713c move tensor related function to utils hongruichen 2024-07-05 18:38:20 +08:00
  • ab4b1a7553 Merge branch 'master' into mixed_types_gemm OuadiElfarouki 2024-07-05 11:59:11 +01:00
  • 32c97de747 remove unused file RunningLeon 2024-07-05 18:25:06 +08:00
  • 9122ddecc9 add link to build docs ngxson 2024-07-05 12:09:00 +02:00
  • 1833af76ce add link among docs ngxson 2024-07-05 11:57:40 +02:00
  • a192757edc re-organize docs ngxson 2024-07-05 11:45:33 +02:00
  • 58cec14092 reformat hongruichen 2024-07-05 17:31:22 +08:00
  • 32b6d12938 update internlm2 RunningLeon 2024-07-04 20:19:19 +08:00
  • 148ec970b6 convert : remove AWQ remnants (#8320) Georgi Gerganov 2024-07-05 10:15:36 +03:00
  • 2cccbaa008 llama : minor indentation during tensor loading (#8304) Georgi Gerganov 2024-07-05 10:15:24 +03:00
  • ac61d7c25f llama : use int for layer iterators [no ci] Georgi Gerganov 2024-07-05 10:14:28 +03:00
  • 8e558309dc CUDA: MMQ support for iq4_nl, iq4_xs (#8278) b3317 Johannes Gäßler 2024-07-05 09:06:31 +02:00
  • 0a423800ff CUDA: revert part of the RDNA1 optimizations (#8309) b3316 Daniele 2024-07-05 07:06:09 +00:00
  • d12f781074 llama : streamline embeddings from "non-embedding" models (#8087) b3315 Douglas Hanley 2024-07-05 02:05:56 -05:00
  • bcefa03bc0 CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (#8311) b3314 Johannes Gäßler 2024-07-05 09:05:34 +02:00
  • efe9213fc5 convert : remove AWQ remnants Georgi Gerganov 2024-07-05 10:00:54 +03:00
  • 5a7447c569 readme : fix minor typos [no ci] (#8314) Pieter Ouwerkerk 2024-07-05 02:58:41 -04:00
  • 61ecafa390 passkey : add short intro to README.md [no-ci] (#8317) Daniel Bevenius 2024-07-05 08:14:24 +02:00
  • 00e6c4b3a5 Update examples/passkey/README.md Georgi Gerganov 2024-07-05 09:14:17 +03:00
  • dc91715b44 llama : minor indentation during tensor loading Georgi Gerganov 2024-07-04 19:34:04 +03:00
  • aa5898dc53 llama : prefer n_ over num_ prefix (#8308) b3311 Georgi Gerganov 2024-07-05 09:10:03 +03:00
  • 6c05752c50 contributing : update guidelines (#8316) Georgi Gerganov 2024-07-05 09:09:47 +03:00
  • 4bab92ca5b contributing : update guidelines [no ci] Georgi Gerganov 2024-07-05 08:21:57 +03:00
  • 13dc3a02c3 use qnn graph inside add and mul ops hongruichen 2024-07-05 13:08:14 +08:00
  • 0acf7f6405 passkey : add short intro to README.md [no-ci] Daniel Bevenius 2024-07-05 07:22:53 +02:00
  • a688ed324b add op param to add_nodes hongruichen 2024-07-05 13:07:48 +08:00
  • a9554e20b6 [SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266) b3309 luoyu-intel 2024-07-05 05:06:13 +00:00
  • e235b267a2 py : switch to snake_case (#8305) Georgi Gerganov 2024-07-05 07:53:33 +03:00
  • a3efa29e03 fix bug of minicpm1b,minicpm2b root 2024-07-05 12:10:51 +08:00
  • e33a69129e Update README.md Denis Spasyuk 2024-07-04 22:00:35 -06:00