examples: support LLaVA v1.5 (multimodal model) (#3436)

* WIP: start implementing LLaVA

* rm scratch buf for now, will revert after cleanup

* LLaVA image encoder is working. will combine with llama

* Add llava inference code, but it's buggy. debugging

* LLaVA is working e2e, needs to optimize memory allocation + cleanup

* Use ggml_allocr + rm unnecessary code

* fix: crlf -> lf

* fix: new line at EoF

* fix: trailing whitespace

* Add readme

* Update readme

* Some cleanup

* Are you happy editorconfig?

* rm unused batch image preprocessing

* rm unused import

* fix: rm designated initializers

* introduce pad-to-square mode for non-square images

* are you happy editorconfig?

* gitignore /llava

* Handle cases where image file does not exist

* add llava target to Makefile

* add support for 13b model variant

* Maybe seed is unlucky?

* Check if apples are compared to apples

* are you happy editorconfig?

* Use temperature = 0.1 by default

* command line: use gpt_params_parse()

* minor

* handle default n_predict

* fix typo

* llava : code formatting, rename files, fix compile warnings

* do not use Wno-cast-qual for MSVC

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
This commit is contained in:
M. Yusuf Sarıgöz 2023-10-12 18:23:18 +03:00 committed by GitHub
parent 9e24cc6e2e
commit 370359e5ba
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
15 changed files with 10214 additions and 2 deletions

View file

@ -384,6 +384,18 @@ bool gpt_params_parse(int argc, char ** argv, gpt_params & params) {
break;
}
params.lora_base = argv[i];
} else if (arg == "--mmproj") {
if (++i >= argc) {
invalid_param = true;
break;
}
params.mmproj = argv[i];
} else if (arg == "--image") {
if (++i >= argc) {
invalid_param = true;
break;
}
params.image = argv[i];
} else if (arg == "-i" || arg == "--interactive") {
params.interactive = true;
} else if (arg == "--embedding") {
@ -703,6 +715,8 @@ void gpt_print_usage(int /*argc*/, char ** argv, const gpt_params & params) {
printf(" -np N, --parallel N number of parallel sequences to decode (default: %d)\n", params.n_parallel);
printf(" -ns N, --sequences N number of sequences to decode (default: %d)\n", params.n_sequences);
printf(" -cb, --cont-batching enable continuous batching (a.k.a dynamic batching) (default: disabled)\n");
printf(" --mmproj MMPROJ_FILE path to a multimodal projector file for LLaVA. see examples/llava/README.md\n");
printf(" --image IMAGE_FILE path to an image file. use with multimodal models\n");
if (llama_mlock_supported()) {
printf(" --mlock force system to keep model in RAM rather than swapping or compressing\n");
}