Commit graph

  • 176993c871 Merge branch 'master' into server-rev Georgi Gerganov 2023-10-22 15:04:16 +03:00
  • 01533762b8 batch.seq_id[j] -> batch.seq_id[j][0] Galunid 2023-10-22 13:05:05 +02:00
  • db09c028b4 Merge branch 'master' into stablelm-support Galunid 2023-10-22 12:58:59 +02:00
  • 839a1838bd Fix formatting in llama.cpp Galunid 2023-10-22 11:12:35 +02:00
  • e167ebcb22 Fix formatting in gguf.py Galunid 2023-10-22 11:08:19 +02:00
  • 76b4495cec Fix rope parameters Galunid 2023-10-22 10:04:38 +02:00
  • a71041a05f Use ggml_norm not ggml_rms_norm Galunid 2023-10-22 09:57:52 +02:00
  • cb79f8a2d8 llama : add SKIP_KQ_KQV option perf-study Georgi Gerganov 2023-10-22 09:58:29 +03:00
  • ed9fde7a1e ggml : skip nops Georgi Gerganov 2023-10-22 09:55:37 +03:00
  • 2471d56a2e llama : profiling the attention compute Georgi Gerganov 2023-10-22 09:22:54 +03:00
  • 73cab75f41 server : fix 'terminated by signal SIGSEGV' error when suffix is empty cyc 2023-10-22 14:18:54 +08:00
  • fafe999ff9 update lite and colab (+1 squashed commits) Concedo 2023-10-22 13:55:55 +08:00
  • 22c69a2794 batched : add len CLI argument b1408 Georgi Gerganov 2023-10-22 08:37:20 +03:00
  • cff75061fe fixed some old models failing due to tokenizer changes, update lite (+1 squashed commits) Concedo 2023-10-21 11:33:32 +08:00
  • 7b127a734d added LLAMA_API Marcus Dunn 2023-10-21 16:07:26 -07:00
  • a4ab8e5d83 added llama_model_token_* variants to all the llama_token_* functions. Marcus Dunn 2023-10-21 15:59:54 -07:00
  • 2eb4c11ec5 fix image load + view image in chat FSSRepo 2023-10-21 14:34:19 -04:00
  • 108d698923 Prepend newline to usage output Mason M 2023-10-21 14:54:54 -03:00
  • d0e14e6ecd Allow caller to handle help/argument exceptions Mason M 2023-10-21 14:40:57 -03:00
  • 17b23eb9cb server : fix multibyte handle in partial response (#3706) Jhen-Jie Hong 2023-10-21 19:58:03 +08:00
  • a0db45f742 Switch to signal semaphores for flexibility 0cc4m 2023-10-21 11:50:37 +02:00
  • 4fbce39089 Fix model conversion script Galunid 2023-10-21 11:12:01 +02:00
  • a7b81c3b8d server : fix multibyte handle in partial response jhen 2023-10-21 11:17:17 +08:00
  • dd1d61ea6b colab is fixed (+1 squashed commits) Concedo 2023-10-21 09:49:19 +08:00
  • 8d3a461f8b Update README.md - remove unsupported node.js library Ian Scrivener 2023-10-21 09:56:01 +11:00
  • 1b66b8b203 Bake SPIR-V bytecode into the library instead of loading shaders from file 0cc4m 2023-10-20 23:55:01 +02:00
  • 0ec595fe38 Add q5_k support 0cc4m 2023-10-20 21:39:42 +02:00
  • 6d126d0acc fix win staviq 2023-10-20 21:26:07 +02:00
  • 226ed5f291 fix empty str vs nullptr for setlocale and getenv staviq 2023-10-20 21:19:44 +02:00
  • 465219b914 CLBlast: Add outer loops over src0 for broadcasting in mulmat b1407 shibe2 2023-10-12 16:01:23 +04:00
  • d1031cf49c sampling : refactor init to use llama_sampling_params (#3696) b1406 Georgi Gerganov 2023-10-20 21:07:23 +03:00
  • 778c070d1b server : logs + minor code style Georgi Gerganov 2023-10-20 20:44:51 +03:00
  • 5d540e80d1 server : no need for atomic int - already using mutex Georgi Gerganov 2023-10-20 20:44:29 +03:00
  • 113dd60005 server : batch has to be allocated for n_parallel sequences Georgi Gerganov 2023-10-20 20:42:45 +03:00
  • 6b2437e32d added thread safe pipeline FSSRepo 2023-10-20 12:07:32 -04:00
  • 4a97d2d1ec Add q4_k support 0cc4m 2023-10-20 17:53:28 +02:00
  • 56ba00b923 sampling : hide prev behind API and apply #3661 sampling-refactor Georgi Gerganov 2023-10-20 18:26:20 +03:00
  • 7e2b5fb1dd sampling : add llama_sampling_print helper Georgi Gerganov 2023-10-20 18:02:50 +03:00
  • b526561583 sampling : rename penalty params + reduce size of "prev" vector Georgi Gerganov 2023-10-20 17:47:13 +03:00
  • 6119a2b5b2 revert lite change Concedo 2023-10-20 22:13:56 +08:00
  • 84ed48b473 examples : remove embd-input and gptneox-wip Georgi Gerganov 2023-10-20 17:08:32 +03:00
  • 6e6587656f llama : combine repetition, frequency and presence penalties in 1 call Georgi Gerganov 2023-10-20 17:05:46 +03:00
  • 6fa681b692 fixed a race condition with SSE streaming Concedo 2023-10-20 22:01:09 +08:00
  • 14cf93b14c fix YaRN ramp, make mscale conditional, add --yarn-orig-ctx (#2) Jeffrey Quesnelle 2023-10-20 06:18:17 -07:00
  • cd1e937821 sampling : refactor init to use llama_sampling_params Georgi Gerganov 2023-10-20 14:58:20 +03:00
  • 5f5d5f1d86 quick fix Concedo 2023-10-20 19:43:56 +08:00
  • 8cf19d60dc gguf : support big endian platform (#3552) b1405 Qin Yue Chen 2023-10-20 06:19:40 -05:00
  • 4389bdac81 Merge branch 'ggerganov:master' into master Qin Yue Chen 2023-10-20 05:59:40 -05:00
  • eb5b8327f6 Compare "GGUF" with file header char by char chenqiny 2023-10-20 18:45:19 +08:00
    1. Set GGUF_MAGIC to "GGUF" string instead of int value
    2. Compare "GGUF" char by char to ensure its byte order
    3. Move bytes swap code from convert.py to gguf.py write_tensor_data
  • 012c53367d minor lite fixes Concedo 2023-10-20 18:41:17 +08:00
  • a0edf73bda server : fix uninitialized sampling context (close #3685) b1404 Georgi Gerganov 2023-10-20 13:06:10 +03:00
  • f439e506e8 ggml : fix rope + llama minor optimizations (#3560) b1403 Herman Semenov 2023-10-20 10:02:12 +00:00
  • d3c7b7cc71 colab fix Concedo 2023-10-20 16:34:45 +08:00
  • 574a6581ba Using const auto references in range-based loop C++17 German Semenov 2023-10-20 11:14:05 +03:00
  • 8bc1943efe Minor fixes and fixed memleak German Semenov 2023-10-20 11:13:06 +03:00
  • d5016fdc8f updated lite bug Concedo 2023-10-20 16:03:06 +08:00
  • ee93213218 updated lite Concedo 2023-10-20 15:44:52 +08:00
  • cd3bb3ede2 update colab link Concedo 2023-10-20 13:49:34 +08:00
  • e78f3ef24a convert : restore compat with old Falcon models (#3680) cebtenzzre 2023-10-20 01:32:08 -04:00
  • 8947142c46 updated lite and colab Concedo 2023-10-20 11:35:44 +08:00
  • 8396208c00 remove redundant C locale check staviq 2023-10-20 05:15:40 +02:00
  • a72c053f0c Update common/console.cpp staviq 2023-10-20 03:04:23 +00:00
  • 9ae10b3aee Fix YaRN inverted scaling and add "rope.scaling.type" to GGUF (#1) Jeffrey Quesnelle 2023-10-19 19:36:16 -07:00
  • 1ad5224227 fix non-utf locale 2 staviq 2023-10-20 04:16:49 +02:00
  • 1e328f43f7 fix non-utf locale staviq 2023-10-20 04:03:22 +02:00
  • 5616b439df fmt staviq 2023-10-19 21:49:09 +02:00
  • 9eab8b69dd fix getwchar failing when LC_ALL undefined staviq 2023-10-19 21:29:35 +02:00
  • f3b25e4043 multimodal : add BakLLaVA conversion support (#3682) M. Yusuf Sarıgöz 2023-10-19 19:40:41 +03:00
  • 8d31550d48 fix groupchat Concedo 2023-10-19 23:40:15 +08:00
  • 957e245285 Merge branch 'master' into concedo_experimental Concedo 2023-10-19 23:32:52 +08:00
  • ddce116ec9 Fix for Top K disabling (#480) kalomaze 2023-10-19 10:20:44 -05:00
  • 8c6001de2a updated lite Concedo 2023-10-19 23:18:14 +08:00
  • fd770bb105 patch Concedo 2023-10-19 23:04:26 +08:00
  • 4382e51719 updated lite and default horde ctx amount Concedo 2023-10-19 22:49:59 +08:00
  • cd6f21807a multimodal : add BakLLava conversion support M. Yusuf Sarıgöz 2023-10-19 17:40:19 +03:00
  • 60abea9798 llava : avoid segfault in case of non-existent mmproj file (#3674) b1400 M. Yusuf Sarıgöz 2023-10-19 16:59:11 +03:00
  • 9dba8392d8 convert : restore compat with old Falcon models Cebtenzzre 2023-10-19 08:48:28 -04:00
  • 325d1793f7 server : minor sync Georgi Gerganov 2023-10-19 15:03:24 +03:00
  • f1dd430f67 Remove random junk print Galunid 2023-10-19 13:56:00 +02:00
  • 9740824ba5 server : snake case Georgi Gerganov 2023-10-19 14:44:37 +03:00
  • e3a2c3fe32 server : use refs + use llama_batch_clear() Georgi Gerganov 2023-10-19 14:44:04 +03:00
  • 3d5929e8ee server : bug fix in ingest_images Georgi Gerganov 2023-10-19 14:43:19 +03:00
  • a8c981b734 server : remove beam-search functionality Georgi Gerganov 2023-10-19 14:10:37 +03:00
  • 654e0a1fe0 server : coding-style normalization (part 2) Georgi Gerganov 2023-10-19 14:09:45 +03:00
  • e44ed60187 server : coding-style normalization Georgi Gerganov 2023-10-19 13:37:39 +03:00
  • 97d67e8a3a Merge branch 'master' of github.com:ggerganov/llama.cpp vvhg1 2023-10-19 12:16:20 +02:00
  • f0d3971d9e Merge branch 'master' of github.com:vvhg1/llama.cpp vvhg1 2023-10-19 08:35:50 +02:00
  • 02ac367d0c removed unnecessary static string escape vvhg1 2023-10-19 08:35:27 +02:00
  • 50093895e6 Better error handling to avoid segfaults for non-existent CLIP models M. Yusuf Sarıgöz 2023-10-19 08:54:36 +03:00
  • ab2fc00224 latest changes of sampling API FSSRepo 2023-10-18 16:57:48 -04:00
  • 8540568c48 Merge branch 'master' of https://github.com/ggerganov/llama.cpp FSSRepo 2023-10-18 16:55:26 -04:00
  • 7196c4e08a new sampling API FSSRepo 2023-10-18 16:50:09 -04:00
  • 4a95d913e0 CLBlast: Add outer loops over src0 for broadcasting in mulmat shibe2 2023-10-12 16:01:23 +04:00
  • 004797f6ac readme : update hot topics Georgi Gerganov 2023-10-18 21:44:43 +03:00
  • 4e82b2ea3f speculative : bug fixes b1398 Georgi Gerganov 2023-10-18 18:49:40 +03:00
  • 0e89203b51 speculative : add tree-based sampling example (#3624) b1397 Georgi Gerganov 2023-10-18 16:21:57 +03:00
  • 1ee5cc3076 Make stablelm conversion script use .safetensors Galunid 2023-10-18 14:51:50 +02:00
  • 84b8f2b060 Merge branch 'ggerganov:master' into master Steward Garcia 2023-10-18 08:43:17 -04:00
  • c67fe68e41 metal : implement q5_0 and q5_1 kernels (#3648) b1396 Jhen-Jie Hong 2023-10-18 07:21:48 -05:00
  • 7a88522975 minor : spaces / formatting Georgi Gerganov 2023-10-18 15:20:35 +03:00