Commit graph

3092 commits

Author SHA1 Message Date
HanishKVC
7251714bcb SimpleChat:DU: Make NewLines shift more robust and flexible 2024-06-01 18:18:14 +05:30
HanishKVC
b7a5424c13 SimpleChat:DU: Add NewLines helper class
To work with an array of new lines. Allow adding, appending,
shifting, ...
2024-06-01 18:18:14 +05:30
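
Commit b7a5424c13 describes a helper that holds an array of lines and supports adding, appending and shifting. A minimal sketch of that idea follows. The class name `LineBuffer`, its methods and the "only shift complete lines" rule are assumptions made for illustration, not the actual NewLines code.

```javascript
// Hypothetical sketch of a line buffer; not the SimpleChat NewLines class.
class LineBuffer {
  constructor() {
    this.lines = [];
  }

  // Add text that may hold zero or more newlines. A piece continues the
  // last stored line if that line has not yet been closed by "\n".
  addAppend(text) {
    if (text === "") return;
    // Split after each "\n", so every piece keeps its own terminator.
    for (const piece of text.split(/(?<=\n)/)) {
      const last = this.lines.length - 1;
      if (last >= 0 && !this.lines[last].endsWith("\n")) {
        this.lines[last] += piece;
      } else {
        this.lines.push(piece);
      }
    }
  }

  // Remove and return the oldest line. With fullOnly set, an unfinished
  // line (no trailing "\n" yet) is left in place and undefined is returned.
  shift(fullOnly = true) {
    if (this.lines.length === 0) return undefined;
    if (fullOnly && !this.lines[0].endsWith("\n")) return undefined;
    return this.lines.shift();
  }
}
```

A buffer of this kind lets a caller feed in network reads that end mid-line and consume only the lines that are known to be complete.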
HanishKVC
4d354556dc SimpleChat: show streamed generative text as it becomes available
Now that the extracting of streamed generated text is implemented,
add logic to show the same on the screen.
2024-06-01 18:18:14 +05:30
HanishKVC
08b117b4a7 SimpleChat: Add MultiPart Response handling, common trimming
Add logic to call into multipart/stream server response handling.

Move trimming of garbage at the end into the common handle_response
helper.

Add a new global flag to switch between oneshot and multipart/stream
mode of fetching the response. Allow the user to control it.

If in multipart/stream mode, send the stream flag to the server.
2024-06-01 18:18:14 +05:30
HanishKVC
aecf0e23fd SimpleChat: Move multi part server response handling in 2024-06-01 18:18:14 +05:30
HanishKVC
8f97c23895 SimpleChat: Move handling oneshot mode server response
Move handling of the oneshot mode server response into SimpleChat.

Also add plumbing for moving multipart server response into same.
2024-06-01 18:18:14 +05:30
HanishKVC
9d0e65d16a SimpleChat:Stream:Initial handshake skeleton
Parse the received stream responses and try to extract the data
from them.

A single part read may yield one data line or multiple data
lines. In turn, extract the json body from each and then the
delta content/message in it.
2024-06-01 18:18:14 +05:30
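
The extraction described in commit 9d0e65d16a can be sketched roughly as below. This assumes an OpenAI-style stream in which each line looks like `data: {json}`, the text sits in `choices[0].delta.content` (or `choices[0].text` for a completion-style reply), and a `data: [DONE]` line ends the stream. The function name `extractDeltaText` is hypothetical, and the actual SimpleChat code may differ.

```javascript
// Hypothetical sketch; assumes OpenAI-style "data: {json}" stream lines.
function extractDeltaText(part) {
  let text = "";
  for (const rawLine of part.split("\n")) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skip blank and non-data lines
    const body = line.slice("data:".length).trim();
    if (body === "" || body === "[DONE]") continue; // end-of-stream marker
    let json;
    try {
      json = JSON.parse(body);
    } catch {
      continue; // json body cut off mid-line; this sketch simply skips it
    }
    const choice = json?.choices?.[0];
    text += choice?.delta?.content ?? choice?.text ?? "";
  }
  return text;
}
```

Because one read can end in the middle of a data line, a fuller version would buffer the incomplete tail and retry it once the next part arrives, rather than dropping it as this sketch does.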
HanishKVC
060925cda3 SimpleChat: Cleanup readme a bit, add one more chathistory length 2024-06-01 18:18:14 +05:30
HanishKVC
f5f9a2b35e SimpleChat:DU: Bring in both trim garbage logics to try trim 2024-06-01 18:18:14 +05:30
HanishKVC
269cf3f596 SimpleChat:Move extracting assistant response to SimpleChat class
so also the trimming of garbage.
2024-06-01 18:18:14 +05:30
HanishKVC
b2c10b960d SimpleChat: Cleanup a bit wrt Api end point related flow
Consolidate many of the Api end point related basic meta data into
ApiEP class.

Remove the hardcoded ApiEP/Mode settings from html+js; instead use
the generic select helper logic, in turn within the settings block.

Move helper to generate the appropriate request json string based
on ApiEP into SimpleChat class itself.
2024-06-01 18:18:14 +05:30
HanishKVC
f9fc543190 SimpleChat: highlight trim, garbage trimming bitmore aggressive
Make it easy for the end user to identify the trimmed text.

Make the garbage trimming logic consider a longer repeating
garbage substring.
2024-06-01 18:18:14 +05:30
HanishKVC
42b4fe555e SimpleChat: GarbageTrim enable/disable, show trimmed part ifany 2024-06-01 18:18:14 +05:30
HanishKVC
1db965d00d SimpleChat: Update a bit wrt readme and notes in du 2024-06-01 18:18:14 +05:30
HanishKVC
452813f235 SimpleChat:UI:Settings make boolean button text show meaning 2024-06-01 18:18:14 +05:30
HanishKVC
0dae12ba6b SimpleChat:UI:Add settings button and bring in settings ui 2024-06-01 18:18:14 +05:30
HanishKVC
e17f5e0204 SimpleChat:UI: Add Div wrapped label+element helpers
Move settings related elements to use the new div wrapped ones.
2024-06-01 18:18:14 +05:30
HanishKVC
94bc0b08d8 SimpleChat:UI:Select: dict-name-value, value wrt default, change
Take a dict/object of name-value pairs instead of just names.
In turn, specify the actual value wrt default, rather than the
string representing that value.

Trap the needed change event rather than click wrt select.
2024-06-01 18:18:14 +05:30
HanishKVC
1e47a48b30 SimpleChat:UI: Add Select helper and use it wrt ChatHistoryInCtxt 2024-06-01 18:18:14 +05:30
HanishKVC
e42249d82d SimpleChat:UI: Helper to create bool button and use it wrt settings 2024-06-01 18:18:14 +05:30
HanishKVC
ae7e66d27a SimpleChat:UI: Add and use a para-create-append helper
Also update the config params dump to indicate that one now needs
to use document to get hold of the gMe global object; this is
because of moving to module type js.

Also add ui.mjs to importmap
2024-06-01 18:18:14 +05:30
HanishKVC
ed345abac8 SimpleChat:DU:Avoid setting frequence/Presence penalty
Some models like llama3 were found to try to be over-intelligent,
still repeating garbage but tweaking it a bit so that it is not
exactly the same. So avoid setting these penalties and let the
model's default behaviour work out, as is.

Also, the simple-minded histogram-based garbage trimming from the
end works to an extent, when the garbage is more predictable and
repetitive.
2024-06-01 18:18:14 +05:30
HanishKVC
a41f701159 SimpleChat:UI: Move html ui base helpers into its own module 2024-06-01 18:18:14 +05:30
HanishKVC
15152af94f SimpleChat:DU: Cleanup debug log messages 2024-06-01 18:18:14 +05:30
HanishKVC
ae9f610663 SimpleChat:DU: Bring in maxType to the mix along with maxUniq
Allow for more uniq chars, but then ensure that a given type of
char, i.e. numerals or alphabets or other types, doesn't cross the
specified maxType limit. This allows intermixed text garbage
to be identified and trimmed.
2024-06-01 18:18:14 +05:30
HanishKVC
d1e73d8777 SimpleChat:DU: Switch trim garbage hist based to maxUniq simple
Instead of blindly building a histogram for the specified substring
length, and then checking for any new char within the specified min
garbage length limit, now exit the learn state when the specified
maxUniq chars are found. In turn, there should be no new chars
within the specified min garbage length limit.

TODO: Need to track char classes like alphabets, numerals and
special/other chars.
2024-06-01 18:18:14 +05:30
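
One possible reading of the maxUniq approach in commit d1e73d8777 is sketched below: walk backwards from the end of the text, collect distinct chars until one more would exceed maxUniq, and treat the tail as garbage only if it is at least the minimum garbage length. The function name, parameter names and return shape are assumptions for illustration. It does not cover the per-type maxType limit added in ae9f610663, and it is not the SimpleChat implementation.

```javascript
// Hypothetical sketch of maxUniq-style trimming of repeated garbage at the end.
function trimRepeatGarbageAtEnd(text, maxUniq, minGarbageLen) {
  const seen = new Set();
  let i = text.length - 1;
  // Walk backwards; stop at the first char that would be a new one
  // beyond the maxUniq distinct chars already collected.
  for (; i >= 0; i--) {
    const ch = text[i];
    if (!seen.has(ch)) {
      if (seen.size >= maxUniq) break;
      seen.add(ch);
    }
  }
  // The tail after index i uses at most maxUniq distinct chars.
  const garbageLen = text.length - 1 - i;
  if (garbageLen < minGarbageLen) {
    return { trimmed: false, data: text };
  }
  return { trimmed: true, data: text.slice(0, i + 1) };
}
```

With a small maxUniq and a large enough minGarbageLen, ordinary sentence endings are left alone, since natural text rarely sustains a long tail built from only a handful of distinct chars.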
HanishKVC
f33aa28149 SimpleChat:DU: Try trim using histogram based info
TODO: May have to add max number of uniq chars in histogram at
end of learning phase.
2024-06-01 18:18:14 +05:30
HanishKVC
6390f3489a SimpleChat:DU:TrimGarbage if unable try skip char and retry 2024-06-01 18:18:13 +05:30
HanishKVC
54802dc184 SimpleChat:DU: Add trim garbage at end in loop helper 2024-06-01 18:18:13 +05:30
HanishKVC
c83c19ad4c SimpleChat:DU:BringIn local helper js modules using importmap
Use it to bring in a simple trim garbage at end logic, which is
used to trim received response.

Also, given that importmap assumes esm / standard js modules,
global variables aren't implicitly available outside the
modules. So add it as a member of document for now.
2024-06-01 18:18:13 +05:30
Johannes Gäßler
9b596417af
CUDA: quantized KV support for FA vec (#7527)
* CUDA: quantized KV support for FA vec

* try CI fix

* fix commented-out kernel variants

* add q8_0 q4_0 tests

* fix nwarps > batch size

* split fattn compile via extern templates

* fix flake8

* fix metal tests

* fix cmake

* make generate_cu_files.py executable

* add autogenerated .cu files

* fix AMD

* error if type_v != FP16 and not flash_attn

* remove obsolete code
2024-06-01 08:44:14 +02:00
Georgi Gerganov
a323ec60af
server : update js (#7670) 2024-05-31 22:23:04 +03:00
Galunid
0515ad93f4
convert-hf : Handle NotImplementedError in convert-hf-to-gguf (#7660) 2024-05-31 17:42:33 +02:00
Johannes Gäßler
c8047d538f
scripts: update compare_llama_bench.py [no ci] (#7673) 2024-05-31 16:26:21 +02:00
Daniele
30e238b246
Improve HIP compatibility (#7672) 2024-05-31 16:00:29 +02:00
Georgi Gerganov
16926dff92
readme : link homebrew discussion 2024-05-31 15:04:58 +03:00
Georgi Gerganov
0c27e6f62e
ggml : fix loongson compile warnings (#7537)
* ggml : fix loongson compile warnings

ggml-ci

* Fix loongarch quantize test fail.

Fix unexpected error introduced during rebase code.

* tests : disable json test due to lack of python on the CI node

ggml-ci

---------

Co-authored-by: junchao-loongson <zhaojunchao@loongson.cn>
2024-05-31 14:17:10 +03:00
Galunid
2e32f874e6
Somehow '**' got lost (#7663) 2024-05-31 18:24:41 +10:00
Galunid
1af511fc22
Add convert.py removal to hot topics (#7662) 2024-05-31 10:09:20 +02:00
Sertaç Özercan
0541f06296
[no ci] docs: add aikit to readme (#7650)
Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
2024-05-31 09:57:16 +10:00
JohnnyB
9022c33646
Fixed painfully slow single process builds. (#7326)
* Fixed painfully slow single process builds.

* Added nproc for systems that don't default to nproc
2024-05-30 22:32:38 +02:00
Georgi Gerganov
5921b8f089
llama : cache llama_token_to_piece (#7587)
* llama : cache llama_token_to_piece

ggml-ci

* llama : use vectors and avoid has_cache

ggml-ci

* llama : throw on unknown tokenizer types

ggml-ci

* llama : print a log of the total cache size
2024-05-31 02:01:41 +10:00
Martin Delille
5dcdf94676
Fix conan badge display [no ci] (#7645) 2024-05-31 01:07:39 +10:00
Manuel
2e2340de17
Add brew installation instruction to README [no ci] (#7616) 2024-05-31 00:58:15 +10:00
Martin Delille
7846540bd2
readme : add Conan badge (#7638) 2024-05-30 15:52:50 +03:00
Brian
e6157f94c8
github: add contact links to issues and convert question into research [no ci] (#7612) 2024-05-30 21:55:36 +10:00
Galunid
9c4c9cc83f
Move convert.py to examples/convert-legacy-llama.py (#7430)
* Move convert.py to examples/convert-no-torch.py

* Fix CI, scripts, readme files

* convert-no-torch -> convert-legacy-llama

* Move vocab thing to vocab.py

* Fix convert-no-torch -> convert-legacy-llama

* Fix lost convert.py in ci/run.sh

* Fix imports

* Fix gguf not imported correctly

* Fix flake8 complaints

* Fix check-requirements.sh

* Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE

* Review fixes
2024-05-30 21:40:00 +10:00
Chris Elrod
59b0d07766
faster avx512 exp implementation (#7551)
* faster avx512 exp implementation

* x->r

* improve accuracy, handle special cases

* remove `e`
2024-05-30 21:32:55 +10:00
junchao-loongson
d5c05821f3
ggml : fix loongarch build (O2 issue) (#7636) 2024-05-30 12:30:10 +03:00
Johannes Gäßler
972b555ab9
README: explain parallel build [no ci] (#7618) 2024-05-30 09:52:39 +02:00