Commit graph

2949 commits

Author SHA1 Message Date
HanishKVC
5a5f6ab848 SimpleChat: Update notes a bit. Try keep browser happy
Avoid browser quirks mode with DOCTYPE.

Help with accessibility a bit by specifying the language explicitly.

Specify the char encoding explicitly; utf-8 is a safe bet,
even if languages are intermixed in the future.

Add a cache-control http-equiv meta tag, which in all probability
will be ignored.

Defer js loading and execution, just for fun and future, not that
critical here as it stands now.
2024-05-19 01:59:25 +05:30
HanishKVC
6eb1e0fbde SimpleChat:JS: bottom of element visible, Set focus to user input
As the generated text could be multiple lines and occupy more space
than the full scrollable div's vertical space, make the bottom of
the last element (which can be such a generated text) in the div
visible by scrolling.

Ensure that the user input box has focus
2024-05-18 22:59:21 +05:30
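A minimal sketch of the behaviour this commit describes, assuming the chat div and the input box are ordinary DOM elements. The helper name `showLastAndFocus` and its parameters are hypothetical and not taken from the actual file; `scrollIntoView(false)` and `focus()` are standard DOM methods.

```javascript
// Hypothetical helper; the name and parameters are illustrative only.
// chatDiv: the scrollable div holding the chat messages.
// userInput: the text input the user types into.
function showLastAndFocus(chatDiv, userInput) {
    const last = chatDiv.lastElementChild;
    if (last) {
        // Passing false aligns the bottom of the element with the bottom
        // of the visible area of its scrollable ancestor.
        last.scrollIntoView(false);
    }
    // Return keyboard focus to the input so the user can keep typing.
    userInput.focus();
}
```

Called after each redraw of the chat div, this keeps the end of a long generated reply visible without scrolling the whole page.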
HanishKVC
a944ce7cbe SimpleChat:JS: Try ensure the last entry in chat is visible
Needed because now only the chat div is scrollable and not the full
page.

In the last commit the chat div size was fixed to 75% vertical height,
so the full page no longer scrolls, and the old approach of bringing
the user-input element into view won't work; instead, the last element
in the chat div should now be brought into view.
2024-05-18 22:23:34 +05:30
HanishKVC
a1a2f36a45 SimpleChat:CSS: Allow for chat div to be scrollable 2024-05-18 22:11:59 +05:30
HanishKVC
ebd5e71295 SimpleChat:CSS: Move style info into its own css file
To keep it simple, clean and separate so that things are not
unnecessarily cluttered.
2024-05-18 17:09:47 +05:30
HanishKVC
65a56e6fdb SimpleChat: Update the readme file 2024-05-18 03:37:15 +05:30
HanishKVC
0d0a28b4ab SimpleChat:HTML: Add a style for system role message 2024-05-18 03:31:37 +05:30
HanishKVC
601fedf8c1 SimpleChat: Move handling systemprompt into its own func 2024-05-18 03:19:59 +05:30
HanishKVC
72151aa634 SimpleChat:Alert user if they provide sysprompt late or change it 2024-05-18 03:16:30 +05:30
HanishKVC
884adfd739 SimpleChat: Ignore empty user input, without trimming 2024-05-18 03:07:40 +05:30
HanishKVC
ae52ad1675 SimpleChat:Allow system prompt to be set, if provided before user 2024-05-18 02:59:42 +05:30
HanishKVC
69817fe1de SimpleChat:HTML: Cleanup/structure UI a bit, Add input for system 2024-05-18 01:40:57 +05:30
HanishKVC
668b98700c SimpleChat: Add a simple readme file 2024-05-18 01:06:54 +05:30
HanishKVC
b3644172e0 SimpleChat:JS: Force completion mode be single message by default 2024-05-18 00:36:23 +05:30
HanishKVC
aef32d9cc0 SimpleChat:JS: Handle difference in response
Try to read the assistant response from the appropriate field in the
response received.

Also examples/server seems to return the response in a slightly
different field, so try to account for that as well.
2024-05-18 00:36:23 +05:30
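The field lookup described in this commit could look roughly like the sketch below. The commit does not name the fields, so the ones used here are assumptions: `choices[0].message.content` for an OpenAI-style chat response, `choices[0].text` for an OpenAI-style completion response, and a top-level `content` as the fallback for examples/server. The function name `extractAssistantText` is hypothetical.

```javascript
// Hypothetical helper; the field names are assumptions (see the note above).
// resp: the parsed JSON body returned by the server.
// mode: "chat" or "completion".
function extractAssistantText(resp, mode) {
    if (mode === "chat") {
        // Assumed OpenAI-style chat completion shape.
        return resp?.choices?.[0]?.message?.content ?? "";
    }
    // Completion mode: try the assumed OpenAI-style field first, then
    // fall back to a top-level `content` field.
    return resp?.choices?.[0]?.text ?? resp?.content ?? "";
}
```

Returning an empty string when no known field is present lets the caller show an empty reply instead of throwing on an unexpected response shape.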
HanishKVC
3e5edbacd6 SimpleChat: Don't submit if already submitted and waiting
Also make chat the default selection wrt mode
2024-05-18 00:36:23 +05:30
HanishKVC
9feb58eaa5 SimpleChat: Allow user to select chat or completion mode 2024-05-18 00:36:23 +05:30
HanishKVC
e62087bf3f SimpleChat:JS: Try trap enter key press wrt input text field
So user can either press submit button or press enter key
2024-05-18 00:36:23 +05:30
HanishKVC
29d2d22c02 SimpleChat:sh: Add simple shell script to run python3 http.server
So one needs to run the llm server locally
then run this script and access it using a local browser
2024-05-18 00:36:23 +05:30
HanishKVC
ebe330d098 SimpleChat: Move into its own sub directory to avoid confusion 2024-05-18 00:36:23 +05:30
HanishKVC
9942851273 SimpleChat: Diff user/assistant msgs, Make input wider
Also show a default message to user

Also add some metas
2024-05-18 00:36:23 +05:30
HanishKVC
7d772f6b9a SimpleChat: Try keep input element in view 2024-05-18 00:36:23 +05:30
HanishKVC
564469e4f6 SimpleChat:JS: Messages/Prompt, indicate working to end user 2024-05-18 00:36:23 +05:30
HanishKVC
c6653479fc SimpleChat:JS: Extract model response and show to user 2024-05-18 00:36:23 +05:30
HanishKVC
33bc67baa6 SimpleChat: Try handshake with llm over its web service endpoint 2024-05-18 00:36:23 +05:30
HanishKVC
27268a6067 SimpleChat: Move handling of submit request into its own func 2024-05-18 00:36:23 +05:30
HanishKVC
ce4aaeb692 SimpleChat: Use common helper logic wrt json data 2024-05-18 00:36:23 +05:30
HanishKVC
639d647ebf SimpleChat: Also add completions related prompt 2024-05-18 00:36:23 +05:30
HanishKVC
256e02c7c9 SimpleChat: Rather value wrt input text element 2024-05-18 00:36:23 +05:30
HanishKVC
24d348ab97 SimpleChat:HTML: Bring in the js file 2024-05-18 00:36:23 +05:30
HanishKVC
70e5860264 SimpleChatJS: Roles Class, submitClick
Define Role class with static members corresponding to the roles.

Update startme to

* Get hold of the ui elements.

* Attach a click handler to submit button, which adds the user input
  to xchats array and shows the chat messages till now in chat div
  element.

Trap DOMContentLoaded to trigger startme
2024-05-18 00:36:23 +05:30
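A rough sketch of the structure this commit describes: a role class with static members, a submit handler that appends the user input to the `xchats` array and redraws the chat div, and a `startme` function triggered on `DOMContentLoaded`. The element ids, the message shape and the `role: content` rendering are assumptions made for illustration, not details from the actual file.

```javascript
// Role names as static members, so callers avoid scattering string literals.
class Roles {
    static System = "system";
    static User = "user";
    static Assistant = "assistant";
}

// Hypothetical click handler body: append the user input to the chat
// array and redraw every message so far into the chat div.
function submitClick(xchats, inputEl, chatDiv) {
    xchats.push({ role: Roles.User, content: inputEl.value });
    chatDiv.textContent = xchats
        .map((m) => `${m.role}: ${m.content}`)
        .join("\n");
}

// Get hold of the UI elements and attach the click handler.
function startme() {
    const xchats = [];
    const chatDiv = document.getElementById("chat");     // assumed id
    const inputEl = document.getElementById("user");     // assumed id
    const submitBtn = document.getElementById("submit"); // assumed id
    submitBtn.addEventListener("click", () => submitClick(xchats, inputEl, chatDiv));
}

// Only wire things up when running inside a browser.
if (typeof document !== "undefined") {
    document.addEventListener("DOMContentLoaded", startme);
}
```

Keeping `submitClick` free of global lookups means it can be exercised with plain objects in place of real DOM elements.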
HanishKVC
1d3cc9353a SimpleChat: request_json, globals, startme 2024-05-18 00:36:23 +05:30
HanishKVC
0402a4b60e SimpleChat: A js skeleton with SimpleChat class
Allows maintaining an array of chat message.

Allows adding chat message (from any of the roles be it system,
user, assistant, ...)

Allows showing chat messages till now, in a given div element.
2024-05-18 00:36:23 +05:30
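The three capabilities listed in this commit (keep an array of messages, add a message from any role, show the messages so far in a div) can be sketched as below. Only the class name `SimpleChat` comes from the commit message; the method names, the `xchat` field and the `role: content` text format are assumptions made for illustration.

```javascript
// Skeleton sketch; member names other than the class name are hypothetical.
class SimpleChat {
    constructor() {
        // Each entry is { role, content }, in the order it was added.
        this.xchat = [];
    }

    // Add a message from any role (system, user, assistant, ...).
    add(role, content) {
        this.xchat.push({ role, content });
    }

    // One display line per message so far.
    lines() {
        return this.xchat.map((m) => `${m.role}: ${m.content}`);
    }

    // Show the messages so far in the given div element.
    show(div) {
        div.textContent = this.lines().join("\n");
    }
}
```

For example, after `chat.add("user", "hi")` and `chat.add("assistant", "hello")`, calling `chat.show(div)` writes both messages into `div` in the order they were added.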
HanishKVC
69ecad21e7 SimpleChat: Add a skeletal html page
Contains a div placeholder for showing chat messages till now

a text-input for allowing user to enter next chat message/query
to the model.

a submit button to allow sending of the user entered message and
chat till now to the model.
2024-05-18 00:36:22 +05:30
Johannes Gäßler
0fc1e820a9
CUDA: faster large batch FA without tensor cores (#7314) 2024-05-17 18:54:52 +02:00
Gavin Zhao
82ca83db3c
ROCm: use native CMake HIP support (#5966)
Supersedes #4024 and #4813.

CMake's native HIP support has become the
recommended way to add HIP code into a project (see
[here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)).
This PR makes the following changes:

1. The environment variable `HIPCXX` or CMake option
`CMAKE_HIP_COMPILER` should be used to specify the HIP
compiler. Notably this shouldn't be `hipcc`, but ROCm's clang,
which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously
this was controlled by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`.
Note that since native CMake HIP support is not yet available on
Windows, on Windows we fall back to the old behavior.

2. CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the
GPU architectures to build for. Previously this was controlled by
`GPU_TARGETS`.

3. Updated the Nix recipe to account for these new changes.

4. The GPU targets to build against in the Nix recipe are now
consistent with the supported GPU targets in nixpkgs.

5. Added CI checks for HIP on both Linux and Windows. On Linux, we test
both the new and old behavior.

The most important part about this PR is the separation of the
HIP compiler and the C/C++ compiler. This allows users to choose
a different C/C++ compiler if desired, compared to the current
situation where when building for ROCm support, everything must be
compiled with ROCm's clang.

~~Makefile is unchanged. Please let me know if we want to be
consistent on variables' naming because Makefile still uses
`GPU_TARGETS` to control architectures to build for, but I feel
like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're
calling `make`.~~ Makefile used `GPU_TARGETS` but the README says
to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of
`GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`.

Thanks to the suggestion of @jin-eld, to maintain backwards
compatibility (and not break too many downstream users' builds), if
`CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using
the original behavior and emit a warning that recommends switching
to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but
`CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS`
to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new
HIP support.

Signed-off-by: Gavin Zhao <git@gzgz.dev>
2024-05-17 17:03:03 +02:00
Radoslav Gerganov
f4bd8b3d26
rpc : set SO_REUSEADDR for the server socket (#7320)
ref: #7293
2024-05-17 17:25:44 +03:00
Brian
51e9d02599
Added a single test function script and fix debug-test.sh to be more robust (#7279)
* run-single-test.sh: added a single test function script and fix debug-test.sh to be more robust

* debug-test.sh: combined execute and gdb test mode via -g flag

* debug-test.sh: refactor

* debug-test: refactor for clarity

* debug-test.sh: comment style changes

* debug-test.sh: fix gdb
2024-05-17 22:40:14 +10:00
Aarni Koskela
d273c1402b
py : convert-hf-to-gguf-update improvements (#7340)
* convert-hf-to-gguf-update: automate updating

* convert-hf-to-gguf-update: improve download

* share requests session for performance
* create directories only when needed, don't skip downloads when empty directory encountered
* be more graceful about errors
2024-05-17 15:11:45 +03:00
fairydreaming
27b040691c
llama : use n_embd_head_v when reshaping kqv (#7327)
* llama : use n_embd_head_v instead of n_embd_head_k when reshaping kqv

* llama : use n_embd_v_gqa and n_embd_head_v instead of n_embd_k_gqa and n_embd_head_k when making a view of cached value vectors.

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-05-17 14:24:38 +03:00
Johannes Gäßler
29c60d8cdd
tokenization: add warning for double BOS (#7332) 2024-05-17 09:59:57 +02:00
Herman Semenov
359cbe3f46
ggml-quants, llama : removed excess checks (#7274) 2024-05-17 10:08:49 +03:00
amd-lalithnc
e18bc6aaf3
convert : fix Qwen/Qwen-7b conversion (#7308) 2024-05-17 10:01:58 +03:00
Radoslav Gerganov
ee94172d33
server : add support for the RPC backend (#7305)
ref: #7292
2024-05-17 10:00:17 +03:00
Justine Tunney
934266c0e0
ggml : rewrite silu and softmax for cpu (#7154)
This change upstreams llamafile's vectorized expf() functions. This lets
us compute softmax and silu more accurately than the short[65536] lookup
table that GGML previously used to make this operation go faster. We can
support aarch64 and sse2+ with the worst case rounding error of 2ulp. It
makes make -j8 tests && ./tests/test-backend-ops -o SOFT_MAX -b CPU perf
go 1.5x faster for SSE2+FMA, 1.9x faster for AVX2+FMA and 2.1x on AVX512
2024-05-17 09:58:52 +03:00
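For reference, the two operations this commit speeds up are silu(x) = x / (1 + exp(-x)) and softmax(x)_i = exp(x_i) / sum_j exp(x_j). The scalar sketch below only illustrates those formulas; it is not the vectorized expf() code from the commit, and the function names are chosen here for illustration.

```javascript
// Scalar reference sketch, not the vectorized implementation from the commit.

// silu(x) = x * sigmoid(x) = x / (1 + exp(-x))
function silu(x) {
    return x / (1 + Math.exp(-x));
}

// softmax over an array of numbers.
function softmax(xs) {
    // Subtract the maximum first so Math.exp never overflows.
    const max = Math.max(...xs);
    const exps = xs.map((x) => Math.exp(x - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    return exps.map((e) => e / sum);
}
```

Both functions spend nearly all their time in the exponential, which is why a faster and more accurate expf() benefits both.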
Leon Knauer
9c4fdcbec8
[Server] Added --verbose option to README [no ci] (#7335) 2024-05-17 10:11:03 +10:00
Pierrick Hymbert
24ecb58168
Revert "server bench: fix bench not waiting for model load (#7284)" (#7334)
This reverts commit 583fd6b000.
2024-05-16 20:43:45 +02:00
Radoslav Gerganov
9afdffe70e rpc : get available mem for the CPU backend
This can be overridden with the -m command line option

ref: #7293
2024-05-16 12:04:08 +03:00
Radoslav Gerganov
3b3963c55c rpc : add command line arg for specifying backend memory
ref: #7293
2024-05-16 09:58:29 +03:00
Jared Van Bortel
dda64fc17c
convert : get general.name from model dir, not its parent (#5615)
Co-authored-by: Brian <mofosyne@gmail.com>
2024-05-16 16:15:23 +10:00