root
953bef9374
fix memory error & clean up the print statements
2024-10-07 21:36:10 +00:00
root
aa23425236
Fix ViT & patch merging
2024-10-05 00:41:54 +00:00
Yutong Dai
56e149d627
add quantize method
2024-10-04 22:38:39 +00:00
Yutong Dai
e28cdd78b0
quantization script added
2024-10-03 22:08:36 +00:00
Yutong Dai
8c179f41fa
get a new ckpt
2024-10-02 23:32:35 +00:00
Yutong Dai
279308c74a
changed llama.cpp (build_phi3 to load bias for lm.head); fixed dumb eos token issues
2024-09-19 01:13:25 +00:00
Yutong Dai
30b751ef06
update to get some results; need to check ViT and LLM
2024-09-16 17:11:12 +00:00
Yutong Dai
cc553a0ae0
running; buggy
2024-09-10 19:41:34 +00:00
root
f07cb4a73d
initial integration of ViT + projector
2024-09-09 23:28:31 +00:00
Yutong Dai
a81ba75193
patch merging + masking done
2024-09-03 18:39:46 +00:00
Yutong Dai
6c1f137ba5
another push
2024-08-28 23:36:09 +00:00
Yutong Dai
f70fdf5a86
add merge type
2024-08-28 22:50:16 +00:00
root
f645b0bc8c
ViT v_tokenizer integration
2024-08-28 19:21:01 +00:00
Yutong Dai
07baee57c9
push for Juntao
2024-08-27 00:18:21 +00:00
Yutong Dai
e353c037d8
visualize the difference
2024-08-22 17:33:53 +00:00
Yutong Dai
ba0861e384
the difference comes from the resize
2024-08-22 00:04:54 +00:00
Yutong Dai
b31dc0b5ed
Merge branch 'master' into yutong/dev
get the latest update from upstream::main
2024-08-19 00:19:16 +00:00
Georgi Gerganov
554b049068
flake.lock: Update (#9068)
2024-08-18 07:43:32 -07:00
ltoniazzi
2339a0be1c
tests : add integration test for lora adapters (#8957)
* Add printing to check weights match torch version
* minor code style changes
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-08-18 11:58:04 +02:00
Yoshi Suhara
2fb9267887
Fix incorrect use of ctx_split for bias tensors (#9063)
2024-08-17 15:34:21 +02:00
Xuan Son Nguyen
8b3befc0e2
server : refactor middleware and /health endpoint (#9056)
* server : refactor middleware and /health endpoint
* move "fail_on_no_slot" to /slots
* Update examples/server/server.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix server tests
* fix CI
* update server docs
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-16 17:19:05 +02:00
tc-mb
d565bb2fd5
llava : support MiniCPM-V-2.6 (#8967)
* init
* rename
* add run android for termux in readme
* add android readme
* add instructions in readme
* change name in readme
* Update README.md
* fixed line
* add result in readme
* random pos_embed
* add positions index
* change for ollama
* change for ollama
* better pos_embed in clip
* support ollama
* update cmakelist
* update cmakelist
* rename wrapper
* clear code
* replace and organize code
* add link
* sync master
* fix warnings
* fix warnings
* fix bug in bicubic resize when needing to resize image smaller
* address review comments and modify
* address review comments and modify
* put all code into llava dir
* fix quality problem in pr code
* change n_layer
* add space in "-1"
* imitate reshape bug of python code
* fix bug in clip
* fix issues for merging
* fix llama-minicpmv-cli in cmake file
* change pr readme
* fix code review
* remove the directory in line 33 of /cmakelists.txt (not in example, in the main dir)
* fix cmakefile
* add warn
* fix KEY_HAS_MINICPMV_PROJ
* move load_image_size into clip_ctx
* remove the extern "C", MINICPMV_API
* fix uhd code for review comment
* delete minicpmv-wrapper in pr
* remove uhd_image_embed
* Modify 2 notes
* support minicpmv2.6
* modify convert script of minicpmv
* modify convert
* modify convert
* add readme
* add resampler of v2.6
* modify clip
* modify readme
* fix type-check
* fix type-check
* fix type-check
* fix type-check
* modify convert script and readme
* fix convert script and readme
* fix convert
* fix num in convert
* fix type-check
---------
Co-authored-by: Hongji Zhu <fireyoucan@gmail.com>
Co-authored-by: harvestingmoon <leewenyeong@gmail.com>
2024-08-16 16:34:41 +03:00
Farbod Bijary
ee2984bdaf
py : fix wrong input type for raw_dtype in ggml to gguf scripts (#8928)
Co-authored-by: farbod <farbod.bjary82@gmail.com>
2024-08-16 13:36:30 +03:00
Aisuko
c8ddce8560
Fix inference example lacking required parameters (#9035)
Signed-off-by: Aisuko <urakiny@gmail.com>
2024-08-16 11:08:59 +02:00
compilade
23fd453544
gguf-py : bump version from 0.9.1 to 0.10.0 (#9051)
2024-08-16 09:36:11 +03:00
Minsoo Cheong
c679e0cb5c
llama : add EXAONE model support (#9025)
* add exaone model support
* add chat template
* fix whitespace
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add ftype
* add exaone pre-tokenizer in `llama-vocab.cpp`
Co-Authored-By: compilade <113953597+compilade@users.noreply.github.com>
* fix lint
Co-Authored-By: compilade <113953597+compilade@users.noreply.github.com>
* add `EXAONE` to supported models in `README.md`
* fix space
Co-authored-by: compilade <git@compilade.net>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <113953597+compilade@users.noreply.github.com>
Co-authored-by: compilade <git@compilade.net>
2024-08-16 09:35:18 +03:00
Liu Jia
fb487bb567
common : add support for cpu_get_num_physical_cores() on Windows (#8771)
* Add support for cpu_get_num_physical_cores() on Windows
* fix build bug on msys2-clang64 and ucrt64
* avoid adding new function
* add new macros to avoid windows+mingw64
* Add error checking to return default value
2024-08-16 09:23:12 +03:00
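Aside: the distinction this commit relies on is logical processors (hardware threads) versus physical cores, which differ on SMT/hyper-threaded machines. The commit implements the count in C++ via Windows APIs; purely as an illustrative cross-check (not the commit's code), psutil exposes the same distinction from Python:

```python
# Illustration only -- not the commit's C++ implementation. psutil's
# cpu_count() reports hardware threads with logical=True and physical
# cores with logical=False, on Windows as well as Unix.
import psutil

print("logical (threads):", psutil.cpu_count(logical=True))
print("physical cores:  ", psutil.cpu_count(logical=False))
```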
Yoshi Suhara
2a24c8caa6
Add Nemotron/Minitron GGUF Conversion & Inference Support (#8922)
* Add nemotron GGUF conversion & inference support
* Fix formatting issues
* Remove unnecessary write_tensors()
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <git@compilade.net>
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* Address comments by @compilade
* Replace ggml_mul_mat()->llm_build_lora_mm()
* Remove mutable variable
* Use for bias tensors
* Cover corner case for rope_scaling not in config.json
---------
Co-authored-by: compilade <git@compilade.net>
2024-08-16 04:23:33 +02:00
Nico Bosshard
e3f6fd56b1
ggml : dynamic ggml_sched_max_splits based on graph_size (#9047)
* ggml : Dynamic ggml_sched_max_splits based on graph_size
* Fixed and re-added debug code for causes
2024-08-16 04:22:55 +02:00
gtygo
4b9afbbe90
retrieval : fix memory leak in retrieval query handling (#8955)
* retrieval
* Reuse querybatch to reduce frequent memory allocation
* delete unused white space
2024-08-15 10:40:12 +03:00
Riceball LEE
37501d9c79
server : fix duplicated n_predict key in the generation_settings (#8994)
2024-08-15 10:28:05 +03:00
Zhenwei Jin
4af8420afb
common : remove duplicate function llama_should_add_bos_token (#8778)
2024-08-15 10:23:23 +03:00
Esko Toivonen
6bda7ce6c3
llama : add pre-tokenizer regexes for BLOOM and gpt3-finnish (#8850)
2024-08-15 10:17:12 +03:00
Georgi Gerganov
d5492f0525
ci : disable bench workflow (#9010)
2024-08-15 10:11:11 +03:00
Jiří Podivín
234b30676a
server : init stop and error fields of the result struct (#9026)
Signed-off-by: Jiri Podivin <jpodivin@redhat.com>
2024-08-15 09:21:57 +03:00
0cc4m
5fd89a70ea
Vulkan Optimizations and Fixes (#8959)
* Optimize Vulkan REPEAT performance
* Use Vulkan GLSL fused multiply-add instruction where possible
* Add GGML_VULKAN_PERF option to output performance data per operator
* Rework and fix Vulkan descriptor set and descriptor pool handling
* Fix float32 concat f16 shader validation error
* Add Vulkan GROUP_NORM eps parameter
* Fix validation error with transfer queue memory barrier flags
* Remove trailing whitespaces
2024-08-14 18:32:53 +02:00
compilade
98a532d474
server : fix segfault on long system prompt (#8987)
* server : fix segfault on long system prompt
* server : fix parallel generation with very small batch sizes
* server : fix typo in comment
2024-08-14 09:51:02 +03:00
Georgi Gerganov
43bdd3ce18
cmake : remove unused option GGML_CURL (#9011)
2024-08-14 09:14:49 +03:00
Daniel Bevenius
06943a69f6
ggml : move rope type enum to ggml.h (#8949)
* ggml : move rope type enum to ggml.h
This commit moves the `llama_rope_type` enum from `llama.h` to
`ggml.h` and changes its name to `ggml_rope_type`.
The motivation for this change is to address the TODO in `llama.h` and
use the enum in ggml.
Note: This commit does not change the `mode` parameter to be of type
`enum ggml_rope_type`. The name `mode` and its usage suggest that it
might be more generic and possibly used as a bit field for multiple
flags. Further investigation/discussion may be needed to determine
if `mode` should be restricted to RoPE types.
* squash! ggml : move rope type enum to ggml.h
This commit removes GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from
ggml.h, and brings back the llama_rope_type enum.
I've kept the assert for GGML_ROPE_TYPE_GLM as I'm not sure if it is
safe to remove it yet.
* squash! ggml : move rope type enum to ggml.h
This commit removes the enum ggml_rope_type from ggml.h and replaces it
with a define (GGML_ROPE_TYPE_NEOX). This define is used in the code to
check if the mode is set to GPT-NeoX. Also the enum llama_rope_type has
been updated to reflect this change.
* squash! ggml : move rope type enum to ggml.h
This commit contains a suggestion to enable the GGML_ROPE_TYPE_NEOX
macro/define to be passed to the shader compiler.
* squash! ggml : move rope type enum to ggml.h
This commit fixes the editorconfig-checker warnings.
* squash! ggml : move rope type enum to ggml.h
Update comment for ggml_rope function.
* Revert "squash! ggml : move rope type enum to ggml.h"
This reverts commit 6261222bd0.
* squash! ggml : move rope type enum to ggml.h
Add GGML_ROPE_TYPE_NEOX to rope_common.comp.
* remove extra line
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-08-13 21:13:15 +02:00
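The end state the squash bullets describe is a single GGML_ROPE_TYPE_NEOX define tested as a bit flag on `mode`, rather than a closed enum. A minimal sketch of that check, written here in Python for brevity (the value 2 mirrors ggml.h at the time of this commit and should be treated as an assumption):

```python
# Sketch of the flag check described above; ggml does this in C with a
# #define, which keeps `mode` usable as a bit field for future flags.
GGML_ROPE_TYPE_NEOX = 2  # assumed value, mirrors ggml.h at this commit

def is_neox_rope(mode: int) -> bool:
    # Bitwise test instead of equality: other bits of `mode` may carry
    # additional flags, which is why an open define beat a closed enum.
    return (mode & GGML_ROPE_TYPE_NEOX) != 0

assert is_neox_rope(GGML_ROPE_TYPE_NEOX)
assert not is_neox_rope(0)
```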
Xuan Son Nguyen
828d6ff7d7
export-lora : throw error if lora is quantized (#9002)
2024-08-13 11:41:14 +02:00
Diogo Teles Sant'Anna
fc4ca27b25
ci : fix github workflow vulnerable to script injection (#9008)
Signed-off-by: Diogo Teles Sant'Anna <diogoteles@google.com>
2024-08-12 19:28:23 +03:00
Radoslav Gerganov
1f67436c5e
ci : enable RPC in all of the released builds (#9006)
ref: #8912
2024-08-12 19:17:03 +03:00
Nico Bosshard
0fd93cdef5
llama : model-based max number of graph nodes calculation (#8970)
* llama : model-based max number of graph nodes calculation
* Update src/llama.cpp
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-08-12 17:13:59 +02:00
Frank Mai
84eb2f4fad
docs: introduce gpustack and gguf-parser (#8873)
* readme: introduce gpustack
GPUStack is an open-source GPU cluster manager for running large
language models, which uses llama.cpp as the backend.
Signed-off-by: thxCode <thxcode0824@gmail.com>
* readme: introduce gguf-parser
GGUF Parser is a tool to review/check the GGUF file and estimate the
memory usage without downloading the whole model.
Signed-off-by: thxCode <thxcode0824@gmail.com>
---------
Signed-off-by: thxCode <thxcode0824@gmail.com>
2024-08-12 14:45:50 +02:00
DavidKorczynski
1262e7ed13
grammar-parser : fix possible null-deref (#9004)
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70680
Signed-off-by: David Korczynski <david@adalogics.com>
2024-08-12 15:36:41 +03:00
DavidKorczynski
df5478fbea
ggml: fix div-by-zero (#9003)
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70724
In order to access the above bug you need to log in using one of the
emails in
https://github.com/google/oss-fuzz/blob/master/projects/llamacpp/project.yaml#L3-L5
Signed-off-by: David Korczynski <david@adalogics.com>
2024-08-12 14:21:41 +02:00
Liu Jia
2589292cde
Fix a spelling mistake (#9001)
2024-08-12 11:46:03 +02:00
Georgi Gerganov
d3ae0ee8d7
py : fix requirements check '==' -> '~=' (#8982)
* py : fix requirements check '==' -> '~='
* cont : fix the fix
* ci : run on all requirements.txt
2024-08-12 11:02:01 +03:00
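For context on the '==' -> '~=' change: '~=' is PEP 440's compatible-release operator, so '~=1.4.2' accepts patch releases (>=1.4.2, ==1.4.*), whereas '==' pins one exact version and rejects even bug-fix updates. A quick check with the packaging library (version numbers here are illustrative, not taken from the repo's requirements):

```python
from packaging.specifiers import SpecifierSet

exact = SpecifierSet("==1.4.2")       # the old-style exact pin
compatible = SpecifierSet("~=1.4.2")  # what the fix switches to

print("1.4.5" in exact)       # False: exact pin rejects patch bumps
print("1.4.5" in compatible)  # True:  compatible release allows 1.4.x
print("1.5.0" in compatible)  # False: minor bump is out of range
```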
Georgi Gerganov
5ef07e25ac
server : handle models with missing EOS token (#8997)
ggml-ci
2024-08-12 10:21:50 +03:00
compilade
4134999e01
gguf-py : Numpy dequantization for most types (#8939)
* gguf-py : Numpy dequantization for most types
* gguf-py : Numpy dequantization for grid-based i-quants
2024-08-11 14:45:41 -04:00
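To make the idea concrete: ggml quant formats store weights in fixed-size blocks, and dequantization rebuilds floats from each block's scale and packed integers using vectorized Numpy instead of a Python loop. Below is a minimal sketch for a Q8_0-style layout (blocks of 32 int8 values sharing one float16 scale); gguf-py's actual implementation covers many more quant types, and the function name here is illustrative:

```python
# A minimal sketch of blockwise Numpy dequantization for a Q8_0-style
# layout, the simplest of the formats the PR covers. Not gguf-py's API.
import numpy as np

BLOCK = 32  # weights per block in ggml's Q8_0

def dequantize_q8_0(raw: bytes, n_blocks: int) -> np.ndarray:
    # Each block on disk: one float16 scale `d`, then 32 int8 values `qs`.
    dtype = np.dtype([("d", "<f2"), ("qs", "i1", (BLOCK,))])
    blocks = np.frombuffer(raw, dtype=dtype, count=n_blocks)
    # Dequantized weight = d * qs, broadcast per block, flattened back out.
    return (blocks["d"].astype(np.float32)[:, None]
            * blocks["qs"].astype(np.float32)).reshape(-1)

# Round-trip demo with one synthetic block:
d = np.float16(0.05)
qs = np.arange(-16, 16, dtype=np.int8)
raw = d.tobytes() + qs.tobytes()
print(dequantize_q8_0(raw, 1)[:4])  # ~[-0.8, -0.75, -0.7, -0.65]
```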