llama.cpp

Author	SHA1	Message	Date
pudepiedj	f3bb1e55c6	Merge branch 'server_branch' of https://github.com/pudepiedj/llama.cpp into server_branch	2024-03-02 22:10:30 +00:00
pudepiedj	bf366d2d9a	add api key	2024-03-02 22:10:28 +00:00
pudepiedj	8bda1c1041	Merge branch 'ggerganov:master' into server_branch	2024-03-02 12:09:07 +00:00
Neo Zhang Jianyu	715641391d	Support multiple GPUs (split mode) on SYCL backend (#5806) * suport multiple cards: split-mode - layer|row * rm warning * rebase with master, support tow new OPs, close feature for -sm=row, fix for unit test * update news * fix merge error * update according to review comments	2024-03-02 19:49:30 +08:00
pudepiedj	68814783c5	Merge remote-tracking branch 'origin/master' into server_branch	2024-03-02 10:28:37 +00:00
pudepiedj	5d61ae8d2a	Renaming some vars	2024-03-02 10:24:07 +00:00
crasm	9bf297a02b	workflows : remove nocleanup arg for check-requirements.sh (#5826 ) Reduces peak tmpfs usage and should prevent the check from failing from running out of space. Fixes the 'No space left on device' issue mentioned in #5703.	2024-03-02 00:11:06 -05:00
Tushar	cb5e8f7fc4	build(nix): Introduce flake.formatter for `nix fmt` (#5687 ) * build(nix): Introduce flake.formatter for `nix fmt` * chore: Switch to pkgs.nixfmt-rfc-style	2024-03-01 15:18:26 -08:00
nold	da3b9ba2b7	convert-hf-to-gguf : require einops for InternLM2ForCausalLM (#5792 )	2024-03-01 16:51:12 -05:00
Sourab Mangrulkar	c29af7e225	llama : add StarCoder2 support (#5795 ) * Add support for starcoder2 * handle rope type * skip rope freq and rotary embeddings from being serialized * resolve comments * Update llama.cpp * remove redundant changes * handle `rope-theta` * llama : change starcoder2 rope type * address comment --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-01 21:30:46 +02:00
Georgi Gerganov	38d16b1426	server : remove api_like_OAI.py proxy script (#5808 )	2024-03-01 20:00:58 +02:00
pudepiedj	f51554180a	Merge remote-tracking branch 'origin/master' into server_branch	2024-03-01 17:26:01 +00:00
ddpasa	c2224f003b	ggml-vulkan: fix VULKAN_CHECK_RESULTS flag, which was previously broken (#5813 )	2024-03-01 18:00:00 +01:00
pudepiedj	b47525df0a	server tweak	2024-03-01 15:53:56 +00:00
kunal-vaishnavi	e743386728	gemma : fix bfloat16 -> float16 conversion issue (#5810 )	2024-03-01 16:08:08 +02:00
Miwa / Ensan	f49a535686	common : fix flag `--logits-all` to `--all-logits` (#5805 )	2024-03-01 15:48:56 +02:00
Pierrick Hymbert	3ab8b3a92e	llama : cleanup unused mmq flags (#5772 ) * cleanup unused --no-mul-mat-q,-nommq, -mmq, --mul-mat-q, mul_mat_q * remove: mul_mat_q in compare llama bench and usage * update llama-bench --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-03-01 13:39:06 +02:00
Douglas Hanley	9600d59e01	unicode : switch to multimap based nfd_map (#5799 ) * switch to multimap based nfd_map due to compile time issues * simplify multimap keys * dont construct new locale every time	2024-03-01 11:15:36 +02:00
Pierrick Hymbert	5cb02b4a01	server: allow to override threads server pool with --threads-http (#5794 )	2024-03-01 10:08:08 +01:00
Eve	6ea0f010ff	ci : add Ubuntu 22 Vulkan CI run (#5789 )	2024-03-01 10:54:53 +02:00
Georgi Gerganov	f105471ef6	server : fix newlines in help (#5785 )	2024-03-01 09:59:43 +02:00
AidanBeltonS	38d1521608	[SYCL] Use batched mul_mat pathway (#5591 ) * Use batched mul_mat pathway * rm extra line * Explicitly state scaled data type --------- Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>	2024-03-01 13:06:47 +05:30
Xuan Son Nguyen	052051d8ae	Server: normalize naming (#5779 ) * server: normalize naming * fix spacing	2024-02-29 21:42:11 +01:00
pudepiedj	13d0948fdc	server tweak	2024-02-29 18:14:08 +00:00
pudepiedj	71f885f2d0	Llamaserver.py changes	2024-02-29 16:56:51 +00:00
pudepiedj	a451708e90	n_ctx change	2024-02-29 12:40:53 +00:00
Marcus Dunn	d5ab29757e	llama : constified `llama_set_state_data`'s `src` (#5774 )	2024-02-29 10:17:23 +02:00
Georgi Gerganov	87c91c0766	ci : reduce 3b ppl chunks to 1 to avoid timeout (#5771 ) ggml-ci	2024-02-28 21:44:21 +02:00
Eve	317709b2a8	make portability_enumeration_ext apple only (#5757 )	2024-02-28 20:33:37 +01:00
Georgi Gerganov	08c5ee87e4	llama : remove deprecated API (#5770 ) ggml-ci	2024-02-28 18:43:38 +02:00
Georgi Gerganov	78aacf3634	awq-py : remove (#5768 )	2024-02-28 17:36:53 +02:00
pudepiedj	ee7f05b52b	Exploring stdout redirection	2024-02-28 12:22:25 +00:00
pudepiedj	b56b9895ed	std::cerr	2024-02-28 12:05:08 +00:00
pudepiedj	dade1cefd4	Merge branch 'server_branch' of https://github.com/pudepiedj/llama.cpp into server_branch	2024-02-28 12:03:04 +00:00
pudepiedj	09e087f691	Merge remote-tracking branch 'origin/master' into server_branch	2024-02-28 12:03:01 +00:00
pudepiedj	7516a5b9ee	Merge branch 'ggerganov:master' into server_branch	2024-02-28 12:01:51 +00:00
pudepiedj	9f40bb7983	LOG_VERBOSE sorted	2024-02-28 11:59:45 +00:00
Georgi Gerganov	8c0e8f4e73	sync : ggml	2024-02-28 11:17:32 +02:00
slaren	2774b0c974	add google magika inference example (ggml/748) * add magika inference example * ggml : fix unaligned accesses in custom ops * ggml : fix FP32 GELU for values that exceed the FP16 range * use ggml_pool_1d * add README * Update README.md * pad inputs if the files are too small * cleanup ggml-ci	2024-02-28 11:17:06 +02:00
UEXTM.com	5f70671856	Introduce backend GUIDs (ggml/743) * Introduce backend GUIDs Initial proposed implementation of backend GUIDs (Discussed in https://github.com/ggerganov/ggml/pull/741) Hardcoded CPU backend GUID (for now) Change ggml_backend_is_cpu logic to use GUID * Remove redundant functions Remove redundant functions `ggml_backend_i::get_name` and `ggml_backend_guid` which are not desired for future expansion * Add spaces to match style Co-authored-by: slaren <slarengh@gmail.com> * Fix brace style to match Co-authored-by: slaren <slarengh@gmail.com> * Add void to () in function signature Co-authored-by: slaren <slarengh@gmail.com> * Add back ggml_backend_guid and make CPU_GUID a local static in ggml_backend_cpu_guid * add guids to all backends ggml-ci --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-02-28 11:17:05 +02:00
Xuan Son Nguyen	a693bea1e6	server : hit Ctrl+C twice to exit (#5734 ) * server: twice ctrl+C to exit * std::atomic_flag * sigint: message * sigint: stderr * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> --------- Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>	2024-02-28 10:55:37 +02:00
compilade	adcb12a9ba	llama : fix non-quantization of expert gating tensors (#5754 ) This reverts a single line from #5475	2024-02-28 10:52:56 +02:00
Douglas Hanley	177628bfd8	llama : improve BERT tokenization (#5740 ) * implement nfd for stripping accents in wpm tokenizer * sort nfd map; reuse iterator * use builtin tolower * add locale include * Simplify to_lower cases Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> --------- Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>	2024-02-28 10:51:11 +02:00
Daniel Bevenius	6c4416868d	readme : add link to LLaVA 1.6 models (#5758 ) Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-02-28 10:39:39 +02:00
Jorge A	efc72253f7	server : add "/chat/completions" alias for "/v1/...` (#5722 ) * Add "/chat/completions" as alias for "/v1/chat/completions" * merge to upstream master * minor : fix trailing whitespace --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-28 10:39:15 +02:00
Kawrakow	7c4263d426	ggml : make i-quants work with super-blocks of 64 (CPU,Metal) (#5760 ) * WIP: make i-quants work for QK_K = 64 * iq2_xs: attempt to fix AVX dot product for QK_K = 64 Tests pass, but I get gibberish. * QK_K = 64 tests pass on ARM_NEON and Metal Sadly, that does not mean it actually works. * Make CUDA compile with QK_K = 64 Tests don't pass, plus we get misaligned access * Q2_K: fixed bug in imatrix quantization for QK_K = 64 * iq1_s: turn off SIMD implementation for QK_K = 64 (it does not work) --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-02-28 10:37:02 +02:00
pudepiedj	ebdc0d3907	Merge branch 'server_branch' of https://github.com/pudepiedj/llama.cpp into server_branch	2024-02-27 22:27:12 +00:00
pudepiedj	87d501fc10	Enable log redirection	2024-02-27 22:27:10 +00:00
Kawrakow	cb49e0f8c9	Attempt to fix android build (#5752 ) Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-02-27 19:16:49 +02:00
pudepiedj	fddedfb950	Merge branch 'ggerganov:master' into server_branch	2024-02-27 15:53:43 +00:00

2374 commits