Commit graph

3104 commits

Author SHA1 Message Date
HanishKVC
0e7880a694 SimpleChat: model request field for openai/equivalent compat
May help testing with openai/equivalent web services, if they
require this field.
2024-06-01 18:18:14 +05:30
HanishKVC
85fd2d0d84 SimpleChat: readme wrt authorization, maybe minimal openai testing 2024-06-01 18:18:14 +05:30
HanishKVC
7a0399e582 SimpleChat:UI+: Return div and element wrt creatediv helpers
use it to set placeholder wrt Authorization header.

Also fix copy-paste oversight.
2024-06-01 18:18:14 +05:30
HanishKVC
af342b3bd0 SimpleChat: Allow Authorization header to be set by end user 2024-06-01 18:18:14 +05:30
HanishKVC
c9559d2b26 SimpleChat: Rather need to use append to insert headers 2024-06-01 18:18:14 +05:30
HanishKVC
dce4e6a64b SimpleChat: Move request headers into Me and gMe
In turn, allow Authorization to be sent if not empty.
2024-06-01 18:18:14 +05:30
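
The header handling described in the last few entries (use append to insert headers, send Authorization only when it is not empty) could look roughly like the sketch below. `Headers` is the standard Fetch API class; the `buildHeaders` function and the shape of its argument are hypothetical and not taken from the SimpleChat sources.

```javascript
// Hypothetical helper: build a Fetch API Headers object from a plain
// name-to-value object, skipping an empty Authorization entry.
function buildHeaders(headerMap) {
    const headers = new Headers();
    for (const [name, value] of Object.entries(headerMap)) {
        // Assumption: an empty or whitespace-only Authorization value
        // means the end user did not configure one, so it is not sent.
        if (name === "Authorization" && value.trim() === "") continue;
        headers.append(name, value);
    }
    return headers;
}

const withoutAuth = buildHeaders({ "Content-Type": "application/json", "Authorization": "" });
const withAuth = buildHeaders({ "Content-Type": "application/json", "Authorization": "Bearer test-key" });
```

The resulting object can be passed as the `headers` option of `fetch`.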
HanishKVC
f54e000039 SimpleChat: Add support for changing the base url
This ensures that if the user is running the server on a
different port, or wants to connect to a server on a different
machine, the base url can be changed accordingly.
2024-06-01 18:18:14 +05:30
HanishKVC
ebf978d2bf SimpleChat:UI: Add input element helper 2024-06-01 18:18:14 +05:30
HanishKVC
104848b097 SimpleChat: Move baseUrl to Me and inturn gMe
This should allow easy updating of the base url at runtime by the
end user.
2024-06-01 18:18:14 +05:30
HanishKVC
ace37042fa SimpleChat:MultiPart/Stream flow cleanup
Don't try utf8-decode and newlines-add_append if there is no data to
work on.

If there is no more data to get (i.e. done is set), let the NewLines
instance return a line without a newline at the end, so that we don't
miss a last data line that lacks a trailing newline.

Pass the stream flag to the utf-8 decode, so that if a multi-byte char
is only partly present in the passed buffer, it can be accounted
for along with the subsequent buffer. At the same time, because of
utf-8's characteristics there shouldn't be any unaccounted bytes at
the end for a valid block of utf8 data split across chunks, so not
bothering to call with stream set to false at the end. LATER: Look at
TextDecoder's implementation for any extra cleverness it may be doing.
If needed, one can use the done flag to account for both cases.
2024-06-01 18:18:14 +05:30
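
The effect of the stream flag mentioned in this entry can be shown with the standard `TextDecoder` API. This is a minimal sketch, not the SimpleChat code: the `decodeChunks` function is hypothetical, and unlike the commit it does make a final flushing call with the stream flag unset.

```javascript
// Sketch: decode UTF-8 chunks where a multi-byte character is split
// across two chunks. decodeChunks is a hypothetical helper name.
function decodeChunks(chunks) {
    const decoder = new TextDecoder("utf-8");
    let text = "";
    for (const chunk of chunks) {
        // stream: true makes the decoder hold back an incomplete
        // trailing byte sequence until the next call.
        text += decoder.decode(chunk, { stream: true });
    }
    // A final call without the stream flag flushes any held-back bytes.
    text += decoder.decode();
    return text;
}

const bytes = new TextEncoder().encode("héllo"); // "é" takes two bytes
const chunks = [bytes.slice(0, 2), bytes.slice(2)]; // split inside "é"
const decoded = decodeChunks(chunks);
```

Without `{ stream: true }`, the first chunk would decode to "h" followed by a replacement character, because the lone lead byte of "é" is treated as malformed.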
HanishKVC
fcd385c36a SimpleChat: Disable console debug by default by making it dummy
In parallel, save a reference to the original func.
2024-06-01 18:18:14 +05:30
HanishKVC
07923745cf SimpleChat:HandleResponseMultiPart using NewLines helper
Make the handle_response_multipart logic better and cleaner. It now
handles the situation where a delta data line from the server in
stream mode is split up when received.

ALERT: For now, the exception is the last data line of a request's
response.
2024-06-01 18:18:14 +05:30
HanishKVC
7251714bcb SimpleChat:DU: Make NewLines shift more robust and flexible 2024-06-01 18:18:14 +05:30
HanishKVC
b7a5424c13 SimpleChat:DU: Add NewLines helper class
To work with an array of new lines. Allow adding, appending,
shifting, ...
2024-06-01 18:18:14 +05:30
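
A helper of the kind this entry describes might be sketched as follows. Everything here is illustrative: the method names, the `fullOnly` parameter, and the choice to keep the trailing newline on each stored line are assumptions, not the actual SimpleChat implementation.

```javascript
// Hypothetical sketch of a line-buffering helper for streamed text.
class NewLines {
    constructor() {
        this.lines = [];
    }

    // Split incoming text into lines, keeping the trailing "\n" on each
    // complete line. If the last stored line is still partial (no "\n"),
    // the new text is appended to it first.
    add_append(text) {
        let start = 0;
        while (start < text.length) {
            const end = text.indexOf("\n", start);
            const piece = end === -1 ? text.slice(start) : text.slice(start, end + 1);
            const last = this.lines.length - 1;
            if (last >= 0 && !this.lines[last].endsWith("\n")) {
                this.lines[last] += piece;
            } else {
                this.lines.push(piece);
            }
            start += piece.length;
        }
    }

    // Return the oldest line. With fullOnly set, a partial line (no
    // "\n" yet) is held back and undefined is returned instead.
    shift(fullOnly = true) {
        if (this.lines.length === 0) return undefined;
        if (fullOnly && !this.lines[0].endsWith("\n")) return undefined;
        return this.lines.shift();
    }
}

const nl = new NewLines();
nl.add_append("data: he");           // partial line, held back
const first = nl.shift();
nl.add_append("llo\ndata: wor");     // completes the first line
const second = nl.shift();
const third = nl.shift();            // "data: wor" is still partial
const fourth = nl.shift(false);      // caller accepts a partial line
```

Passing `false` to `shift` corresponds to the end-of-stream case, where a last line without a newline should still be returned.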
HanishKVC
4d354556dc SimpleChat: show streamed generative text as it becomes available
Now that the extracting of streamed generated text is implemented,
add logic to show the same on the screen.
2024-06-01 18:18:14 +05:30
HanishKVC
08b117b4a7 SimpleChat: Add MultiPart Response handling, common trimming
Add logic to call into multipart/stream server response handling.

Move trimming of garbage at the end into the common handle_response
helper.

Add new global flag to control between oneshot and multipart/stream
mode of fetching response. Allow same to be controlled by user.

If in multipart/stream mode, send the stream flag to the server.
2024-06-01 18:18:14 +05:30
HanishKVC
aecf0e23fd SimpleChat: Move multi part server response handling in 2024-06-01 18:18:14 +05:30
HanishKVC
8f97c23895 SimpleChat: Move handling oneshot mode server response
Move handling of the oneshot mode server response into SimpleChat.

Also add plumbing for moving multipart server response into same.
2024-06-01 18:18:14 +05:30
HanishKVC
9d0e65d16a SimpleChat:Stream:Initial handshake skeleton
Parse the received stream responses and try to extract the data from
them.

A single part read may yield one data line or multiple data lines.
In turn, extract the json body and, from it, the delta
content/message.
2024-06-01 18:18:14 +05:30
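
The extraction step in this entry could be sketched as below, assuming the server streams OpenAI-style chat completion chunks: lines of the form `data: {json}` with the text under `choices[0].delta.content`, ending with `data: [DONE]`. That wire format and the `extractDeltas` name are assumptions; the commit does not spell them out.

```javascript
// Hypothetical sketch: pull the delta text out of one part read that
// may contain one or several "data:" lines.
function extractDeltas(part) {
    let out = "";
    for (const rawLine of part.split("\n")) {
        const line = rawLine.trim();
        if (!line.startsWith("data:")) continue;
        const body = line.slice("data:".length).trim();
        // Assumption: "[DONE]" marks the end of the stream.
        if (body === "" || body === "[DONE]") continue;
        const json = JSON.parse(body);
        const delta = json.choices?.[0]?.delta?.content;
        if (typeof delta === "string") out += delta;
    }
    return out;
}

const part =
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n' +
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n\n' +
    'data: [DONE]\n';
const streamedText = extractDeltas(part);
```

This version assumes each data line arrives whole within a part; a line split across two reads would make `JSON.parse` throw, which is the case the later NewLines-based rework addresses.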
HanishKVC
060925cda3 SimpleChat: Cleanup readme a bit, add one more chathistory length 2024-06-01 18:18:14 +05:30
HanishKVC
f5f9a2b35e SimpleChat:DU: Bring in both trim garbage logics to try trim 2024-06-01 18:18:14 +05:30
HanishKVC
269cf3f596 SimpleChat:Move extracting assistant response to SimpleChat class
as well as the trimming of garbage.
2024-06-01 18:18:14 +05:30
HanishKVC
b2c10b960d SimpleChat: Cleanup a bit wrt Api end point related flow
Consolidate many of the Api end point related basic meta data into
ApiEP class.

Remove the hardcoded ApiEP/Mode settings from html+js; instead use
the generic select helper logic, within the settings block.

Move helper to generate the appropriate request json string based
on ApiEP into SimpleChat class itself.
2024-06-01 18:18:14 +05:30
HanishKVC
f9fc543190 SimpleChat: highlight trim, garbage trimming bitmore aggressive
Make it easy for the end user to identify the trimmed text.

Make the garbage trimming logic consider a longer repeating garbage
substring.
2024-06-01 18:18:14 +05:30
HanishKVC
42b4fe555e SimpleChat: GarbageTrim enable/disable, show trimmed part ifany 2024-06-01 18:18:14 +05:30
HanishKVC
1db965d00d SimpleChat: Update a bit wrt readme and notes in du 2024-06-01 18:18:14 +05:30
HanishKVC
452813f235 SimpleChat:UI:Settings make boolean button text show meaning 2024-06-01 18:18:14 +05:30
HanishKVC
0dae12ba6b SimpleChat:UI:Add settings button and bring in settings ui 2024-06-01 18:18:14 +05:30
HanishKVC
e17f5e0204 SimpleChat:UI: Add Div wrapped label+element helpers
Move settings related elements to use the new div wrapped ones.
2024-06-01 18:18:14 +05:30
HanishKVC
94bc0b08d8 SimpleChat:UI:Select: dict-name-value, value wrt default, change
Take a dict/object of name-value pairs instead of just names.
In turn, specify the actual value for the default, rather than the
string representing that value.

Trap the needed change event rather than click wrt select.
2024-06-01 18:18:14 +05:30
HanishKVC
1e47a48b30 SimpleChat:UI: Add Select helper and use it wrt ChatHistoryInCtxt 2024-06-01 18:18:14 +05:30
HanishKVC
e42249d82d SimpleChat:UI: Helper to create bool button and use it wrt settings 2024-06-01 18:18:14 +05:30
HanishKVC
ae7e66d27a SimpleChat:UI: Add and use a para-create-append helper
Also update the config params dump to indicate that one now needs
to use document to get hold of the gMe global object; this is because
of moving to module type js.

Also add ui.mjs to importmap
2024-06-01 18:18:14 +05:30
HanishKVC
ed345abac8 SimpleChat:DU:Avoid setting frequence/Presence penalty
Some models like llama3 were found to still repeat garbage, but with
the garbage tweaked a bit so that it is not exactly the same. So avoid
setting these penalties and let the model's default behaviour work
out as is.

Also, the simple-minded histogram-based garbage trimming from the end
works to an extent when the garbage is more predictable and
repetitive.
2024-06-01 18:18:14 +05:30
HanishKVC
a41f701159 SimpleChat:UI: Move html ui base helpers into its own module 2024-06-01 18:18:14 +05:30
HanishKVC
15152af94f SimpleChat:DU: Cleanup debug log messages 2024-06-01 18:18:14 +05:30
HanishKVC
ae9f610663 SimpleChat:DU: Bring in maxType to the mix along with maxUniq
Allow for more uniq chars, but then ensure that a given type of
char, i.e. numerals or alphabets or other types, doesn't cross the
specified maxType limit. This allows intermixed text garbage
to be identified and trimmed.
2024-06-01 18:18:14 +05:30
HanishKVC
d1e73d8777 SimpleChat:DU: Switch trim garbage hist based to maxUniq simple
Instead of blindly building a histogram for the specified substring
length and then checking for any new char within the specified min
garbage length limit, NOW exit the learn state when the specified
maxUniq chars are found. In turn, there should be no new chars within
the specified min garbage length required limit.

TODO: Need to track char classes like alphabets, numerals and
special/other chars.
2024-06-01 18:18:14 +05:30
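
One possible reading of the maxUniq approach in this entry is sketched below. It is a guess at the idea, not the SimpleChat code: the function name, its parameters, and the returned object shape are all hypothetical.

```javascript
// Hypothetical sketch: scan backwards from the end of the text,
// "learning" up to maxUniq distinct chars. The garbage run ends at the
// first char that is not in the learned set once the set is full.
function trimGarbageAtEnd(text, maxUniq, minGarbageLen) {
    const seen = new Set();
    let i = text.length - 1;
    for (; i >= 0; i--) {
        const ch = text[i];
        if (!seen.has(ch)) {
            if (seen.size >= maxUniq) break; // new char after learning
            seen.add(ch);
        }
    }
    const garbageLen = text.length - 1 - i;
    // Only trim when the run is long enough to be treated as garbage.
    if (garbageLen < minGarbageLen) return { trimmed: false, text };
    return { trimmed: true, text: text.slice(0, i + 1) };
}

const noisy = trimGarbageAtEnd("The answer is 42." + "ab".repeat(10), 2, 8);
const clean = trimGarbageAtEnd("Hello world", 2, 8);
```

The minimum length guard keeps ordinary endings, which naturally reuse a few characters, from being cut off.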
HanishKVC
f33aa28149 SimpleChat:DU: Try trim using histogram based info
TODO: May have to add max number of uniq chars in histogram at
end of learning phase.
2024-06-01 18:18:14 +05:30
HanishKVC
6390f3489a SimpleChat:DU:TrimGarbage if unable try skip char and retry 2024-06-01 18:18:13 +05:30
HanishKVC
54802dc184 SimpleChat:DU: Add trim garbage at end in loop helper 2024-06-01 18:18:13 +05:30
HanishKVC
c83c19ad4c SimpleChat:DU:BringIn local helper js modules using importmap
Use it to bring in a simple trim garbage at end logic, which is
used to trim received response.

Also, given that importmap assumes esm / standard js modules,
global variables aren't implicitly available outside the
modules. So add it as a member of document for now.
2024-06-01 18:18:13 +05:30
Johannes Gäßler
9b596417af
CUDA: quantized KV support for FA vec (#7527)
* CUDA: quantized KV support for FA vec

* try CI fix

* fix commented-out kernel variants

* add q8_0 q4_0 tests

* fix nwarps > batch size

* split fattn compile via extern templates

* fix flake8

* fix metal tests

* fix cmake

* make generate_cu_files.py executable

* add autogenerated .cu files

* fix AMD

* error if type_v != FP16 and not flash_attn

* remove obsolete code
2024-06-01 08:44:14 +02:00
Georgi Gerganov
a323ec60af
server : update js (#7670) 2024-05-31 22:23:04 +03:00
Galunid
0515ad93f4
convert-hf : Handle NotImplementedError in convert-hf-to-gguf (#7660) 2024-05-31 17:42:33 +02:00
Johannes Gäßler
c8047d538f
scripts: update compare_llama_bench.py [no ci] (#7673) 2024-05-31 16:26:21 +02:00
Daniele
30e238b246
Improve HIP compatibility (#7672) 2024-05-31 16:00:29 +02:00
Georgi Gerganov
16926dff92
readme : link homebrew discussion 2024-05-31 15:04:58 +03:00
Georgi Gerganov
0c27e6f62e
ggml : fix loongson compile warnings (#7537)
* ggml : fix loongson compile warnings

ggml-ci

* Fix loongarch quantize test fail.

Fix unexpected error introduced during rebase code.

* tests : disable json test due to lack of python on the CI node

ggml-ci

---------

Co-authored-by: junchao-loongson <zhaojunchao@loongson.cn>
2024-05-31 14:17:10 +03:00
Galunid
2e32f874e6
Somehow '**' got lost (#7663) 2024-05-31 18:24:41 +10:00