llama.cpp

Author	SHA1	Message	Date
HanishKVC	5a5f6ab848	SimpleChat: Update notes a bit. Try keep browser happy. Avoid browser quirks mode with DOCTYPE. Help with accessibility a bit by specifying the language explicitly. Specify the char encoding explicitly, in turn utf-8 is a safe bet, even with intermixing of languages if required in future. Add a cache-control http-equiv meta tag, which in all probability will be ignored. Defer js loading and execution, just for fun and future, not that critical here as it stands now.	2024-05-19 01:59:25 +05:30
HanishKVC	6eb1e0fbde	SimpleChat:JS: bottom of element visible, Set focus to user input. As the generated text could be multiple lines and occupy more space than the full scrollable div's vertical space, make the bottom of the last element (which can be such a generated text) in the div visible by scrolling. Ensure that the user input box has focus.	2024-05-18 22:59:21 +05:30
HanishKVC	a944ce7cbe	SimpleChat:JS: Try ensure the last entry in chat is visible Needed because now only the chat div is scrollable and not the full page. In last commit the chat div size was fixed to 75% vertical height, so the full page no longer scrolls, so the old bring user-input element to view wont work, instead now the last element in the chat div should be brought into view.	2024-05-18 22:23:34 +05:30
HanishKVC	a1a2f36a45	SimpleChat:CSS: Allow for chat div to be scrollable	2024-05-18 22:11:59 +05:30
HanishKVC	ebd5e71295	SimpleChat:CSS: Move style info into its own css file. To keep it simple, clean and separate so that things are not unnecessarily cluttered.	2024-05-18 17:09:47 +05:30
HanishKVC	65a56e6fdb	SimpleChat: Update the readme file	2024-05-18 03:37:15 +05:30
HanishKVC	0d0a28b4ab	SimpleChat:HTML: Add a style for system role message	2024-05-18 03:31:37 +05:30
HanishKVC	601fedf8c1	SimpleChat: Move handling systemprompt into its own func	2024-05-18 03:19:59 +05:30
HanishKVC	72151aa634	SimpleChat:Alert user if they provide sysprompt late or change it	2024-05-18 03:16:30 +05:30
HanishKVC	884adfd739	SimpleChat: Ignore empty user input, without trimming	2024-05-18 03:07:40 +05:30
HanishKVC	ae52ad1675	SimpleChat:Allow system prompt to be set, if provided before user	2024-05-18 02:59:42 +05:30
HanishKVC	69817fe1de	SimpleChat:HTML: Cleanup/structure UI a bit, Add input for system	2024-05-18 01:40:57 +05:30
HanishKVC	668b98700c	SimpleChat: Add a simple readme file	2024-05-18 01:06:54 +05:30
HanishKVC	b3644172e0	SimpleChat:JS: Force completion mode be single message by default	2024-05-18 00:36:23 +05:30
HanishKVC	aef32d9cc0	SimpleChat:JS: Handle difference in response. Try to read the assistant response from the appropriate field in the response received. Also examples/server seems to return the response in a slightly different field, so try to account for that also.	2024-05-18 00:36:23 +05:30
HanishKVC	3e5edbacd6	SimpleChat: Dont submit if already submitted and waiting Also make chat the default selection wrt mode	2024-05-18 00:36:23 +05:30
HanishKVC	9feb58eaa5	SimpleChat: Allow user to select chat or completion mode	2024-05-18 00:36:23 +05:30
HanishKVC	e62087bf3f	SimpleChat:JS: Try trap enter key press wrt input text field So user can either press submit button or press enter key	2024-05-18 00:36:23 +05:30
HanishKVC	29d2d22c02	SimpleChat:sh: Add simple shell script to run python3 http.server So one needs to run the llm server locally then run this script and access it using a local browser	2024-05-18 00:36:23 +05:30
HanishKVC	ebe330d098	SimpleChat: Move into its own sub directory to avoid confusion	2024-05-18 00:36:23 +05:30
HanishKVC	9942851273	SimpleChat: Diff user/assistant msgs, Make input wider Also show a default message to user Also add some metas	2024-05-18 00:36:23 +05:30
HanishKVC	7d772f6b9a	SimpleChat: Try keep input element in view	2024-05-18 00:36:23 +05:30
HanishKVC	564469e4f6	SimpleChat:JS: Messages/Prompt, indicate working to end user	2024-05-18 00:36:23 +05:30
HanishKVC	c6653479fc	SimpleChat:JS: Extract model response and show to user	2024-05-18 00:36:23 +05:30
HanishKVC	33bc67baa6	SimpleChat: Try handshake with llm over its web service endpoint	2024-05-18 00:36:23 +05:30
HanishKVC	27268a6067	SimpleChat: Move handling of submit request into its own func	2024-05-18 00:36:23 +05:30
HanishKVC	ce4aaeb692	SimpleChat: Use common helper logic wrt json data	2024-05-18 00:36:23 +05:30
HanishKVC	639d647ebf	SimpleChat: Also add completions related prompt	2024-05-18 00:36:23 +05:30
HanishKVC	256e02c7c9	SimpleChat: Rather value wrt input text element	2024-05-18 00:36:23 +05:30
HanishKVC	24d348ab97	SimpleChat:HTML: Bring in the js file	2024-05-18 00:36:23 +05:30
HanishKVC	70e5860264	SimpleChatJS: Roles Class, submitClick Define Role class with static members corresponding to the roles. Update startme to * Get hold of the ui elements. * Attach a click handler to submit button, which adds the user input to xchats array and shows the chat messages till now in chat div element. Trap DOMContentLoaded to trigger startme	2024-05-18 00:36:23 +05:30
HanishKVC	1d3cc9353a	SimpleChat: request_json, globals, startme	2024-05-18 00:36:23 +05:30
HanishKVC	0402a4b60e	SimpleChat: A js skeleton with SimpleChat class Allows maintaining an array of chat message. Allows adding chat message (from any of the roles be it system, user, assistant, ...) Allows showing chat messages till now, in a given div element.	2024-05-18 00:36:23 +05:30
HanishKVC	69ecad21e7	SimpleChat: Add a skeletal html page Contains a div placeholder for showing chat messages till now a text-input for allowing user to enter next chat message/query to the model. a submit button to allow sending of the user entered message and chat till now to the model.	2024-05-18 00:36:22 +05:30
Johannes Gäßler	0fc1e820a9	CUDA: faster large batch FA without tensor cores (#7314)	2024-05-17 18:54:52 +02:00
Gavin Zhao	82ca83db3c	ROCm: use native CMake HIP support (#5966) Supersedes #4024 and #4813. CMake's native HIP support has become the recommended way to add HIP code into a project (see [here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)). This PR makes the following changes: 1. The environment variable `HIPCXX` or CMake option `CMAKE_HIP_COMPILER` should be used to specify the HIP compiler. Notably this shouldn't be `hipcc`, but ROCm's clang, which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously this was controlled by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`. Note that since native CMake HIP support is not yet available on Windows, on Windows we fall back to the old behavior. 2. CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the GPU architectures to build for. Previously this was controlled by `GPU_TARGETS`. 3. Updated the Nix recipe to account for these new changes. 4. The GPU targets to build against in the Nix recipe are now consistent with the supported GPU targets in nixpkgs. 5. Added CI checks for HIP on both Linux and Windows. On Linux, we test both the new and old behavior. The most important part about this PR is the separation of the HIP compiler and the C/C++ compiler. This allows users to choose a different C/C++ compiler if desired, compared to the current situation where when building for ROCm support, everything must be compiled with ROCm's clang. ~~Makefile is unchanged. Please let me know if we want to be consistent on variables' naming because Makefile still uses `GPU_TARGETS` to control architectures to build for, but I feel like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're calling `make`.~~ Makefile used `GPU_TARGETS` but the README says to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of `GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`.
Thanks to the suggestion of @jin-eld, to maintain backwards compatibility (and not break too many downstream users' builds), if `CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using the original behavior and emit a warning that recommends switching to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but `CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS` to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new HIP support. Signed-off-by: Gavin Zhao <git@gzgz.dev>	2024-05-17 17:03:03 +02:00
Radoslav Gerganov	f4bd8b3d26	rpc : set SO_REUSEADDR for the server socket (#7320) ref: #7293	2024-05-17 17:25:44 +03:00
Brian	51e9d02599	Added a single test function script and fix debug-test.sh to be more robust (#7279) * run-single-test.sh: added a single test function script and fix debug-test.sh to be more robust * debug-test.sh: combined execute and gdb test mode via -g flag * debug-test.sh: refactor * debug-test: refactor for clarity * debug-test.sh: comment style changes * debug-test.sh: fix gdb	2024-05-17 22:40:14 +10:00
Aarni Koskela	d273c1402b	py : convert-hf-to-gguf-update improvements (#7340) * convert-hf-to-gguf-update: automate updating * convert-hf-to-gguf-update: improve download * share requests session for performance * create directories only when needed, don't skip downloads when empty directory encountered * be more graceful about errors	2024-05-17 15:11:45 +03:00
fairydreaming	27b040691c	llama : use n_embd_head_v when reshaping kqv (#7327) * llama : use n_embd_head_v instead of n_embd_head_k when reshaping kqv * llama : use n_embd_v_gqa and n_embd_head_v instead of n_embd_k_gqa and n_embd_head_k when making a view of cached value vectors. --------- Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>	2024-05-17 14:24:38 +03:00
Johannes Gäßler	29c60d8cdd	tokenization: add warning for double BOS (#7332)	2024-05-17 09:59:57 +02:00
Herman Semenov	359cbe3f46	ggml-quants, llama : removed excess checks (#7274)	2024-05-17 10:08:49 +03:00
amd-lalithnc	e18bc6aaf3	convert : fix Qwen/Qwen-7b conversion (#7308)	2024-05-17 10:01:58 +03:00
Radoslav Gerganov	ee94172d33	server : add support for the RPC backend (#7305) ref: #7292	2024-05-17 10:00:17 +03:00
Justine Tunney	934266c0e0	ggml : rewrite silu and softmax for cpu (#7154) This change upstreams llamafile's vectorized expf() functions. This lets us compute softmax and silu more accurately than the short[65536] lookup table that GGML previously used to make this operation go faster. We can support aarch64 and sse2+ with the worst case rounding error of 2ulp. It makes `make -j8 tests && ./tests/test-backend-ops -o SOFT_MAX -b CPU perf` go 1.5x faster for SSE2+FMA, 1.9x faster for AVX2+FMA and 2.1x on AVX512	2024-05-17 09:58:52 +03:00
Leon Knauer	9c4fdcbec8	[Server] Added --verbose option to README [no ci] (#7335)	2024-05-17 10:11:03 +10:00
Pierrick Hymbert	24ecb58168	Revert "server bench: fix bench not waiting for model load (#7284)" (#7334) This reverts commit `583fd6b000`.	2024-05-16 20:43:45 +02:00
Radoslav Gerganov	9afdffe70e	rpc : get available mem for the CPU backend This can be overridden with the -m command line option ref: #7293	2024-05-16 12:04:08 +03:00
Radoslav Gerganov	3b3963c55c	rpc : add command line arg for specifying backend memory ref: #7293	2024-05-16 09:58:29 +03:00
Jared Van Bortel	dda64fc17c	convert : get general.name from model dir, not its parent (#5615) Co-authored-by: Brian <mofosyne@gmail.com>	2024-05-16 16:15:23 +10:00

2949 commits