llama.cpp

Author	SHA1	Message	Date
HanishKVC	ace37042fa	SimpleChat:MultiPart/Stream flow cleanup Dont try utf8-decode and newlines-add_append if no data to work on. If there is no more data to get (ie done is set), then let NewLines instance return line without newline at end, So that we dont miss out on any last-data-line without newline kind of scenario. Pass stream flag wrt utf-8 decode, so that if any multi-byte char is only partly present in the passed buffer, it can be accounted for along with subsequent buffer. At sametime, bcas of utf-8's characteristics there shouldnt be any unaccounted bytes at end, for valid block of utf8 data split across chunks, so not bothering calling with stream set to false at end. LATER: Look at TextDecoder's implementation, for any over intelligence, it may be doing.. If needed, one can use done flag to account wrt both cases.	2024-06-01 18:18:14 +05:30
HanishKVC	fcd385c36a	SimpleChat: Disable console debug by default by making it dummy Parallely save a reference to the original func.	2024-06-01 18:18:14 +05:30
HanishKVC	07923745cf	SimpleChat:HandleResponseMultiPart using NewLines helper Make handle_response_multipart logic better and cleaner. Now it allows for working with the situation, where the delta data line got from server in stream mode, could be split up when recving, but still the logic will handle it appropriately. ALERT: Rather except (for now) for last data line wrt a request's response.	2024-06-01 18:18:14 +05:30
HanishKVC	7251714bcb	SimpleChat:DU: Make NewLines shift more robust and flexible	2024-06-01 18:18:14 +05:30
HanishKVC	b7a5424c13	SimpleChat:DU: Add NewLines helper class To work with an array of new lines. Allow adding, appending, shifting, ...	2024-06-01 18:18:14 +05:30
HanishKVC	4d354556dc	SimpleChat: show streamed generative text as it becomes available Now that the extracting of streamed generated text is implemented, add logic to show the same on the screen.	2024-06-01 18:18:14 +05:30
HanishKVC	08b117b4a7	SimpleChat: Add MultiPart Response handling, common trimming Add logic to call into multipart/stream server response handling. Move trimming of garbage at the end into the common handle_response helper. Add new global flag to control between oneshot and multipart/stream mode of fetching response. Allow same to be controlled by user. If in multipart/stream mode, send the stream flag to the server.	2024-06-01 18:18:14 +05:30
HanishKVC	aecf0e23fd	SimpleChat: Move multi part server response handling in	2024-06-01 18:18:14 +05:30
HanishKVC	8f97c23895	SimpleChat: Move handling oneshot mode server response Move handling of the oneshot mode server response into SimpleChat. Also add plumbing for moving multipart server response into same.	2024-06-01 18:18:14 +05:30
HanishKVC	9d0e65d16a	SimpleChat:Stream:Initial handshake skeleton Parse the got stream responses and try extract the data from it. It allows for a part read to get a single data line or multiple data line. Inturn extract the json body and inturn the delta content/message in it.	2024-06-01 18:18:14 +05:30
HanishKVC	060925cda3	SimpleChat: Cleanup readme a bit, add one more chathistory length	2024-06-01 18:18:14 +05:30
HanishKVC	f5f9a2b35e	SimpleChat:DU: Bring in both trim garbage logics to try trim	2024-06-01 18:18:14 +05:30
HanishKVC	269cf3f596	SimpleChat:Move extracting assistant response to SimpleChat class so also the trimming of garbage.	2024-06-01 18:18:14 +05:30
HanishKVC	b2c10b960d	SimpleChat: Cleanup a bit wrt Api end point related flow Consolidate many of the Api end point related basic meta data into ApiEP class. Remove the hardcoded ApiEP/Mode settings from html+js, instead use the generic select helper logic, inturn in the settings block. Move helper to generate the appropriate request json string based on ApiEP into SimpleChat class itself.	2024-06-01 18:18:14 +05:30
HanishKVC	f9fc543190	SimpleChat: highlight trim, garbage trimming bitmore aggressive Make it easy for end user to identified the trimmed text. Make garbage trimming logic, consider a longer repeat garbage substring.	2024-06-01 18:18:14 +05:30
HanishKVC	42b4fe555e	SimpleChat: GarbageTrim enable/disable, show trimmed part ifany	2024-06-01 18:18:14 +05:30
HanishKVC	1db965d00d	SimpleChat: Update a bit wrt readme and notes in du	2024-06-01 18:18:14 +05:30
HanishKVC	452813f235	SimpleChat:UI:Settings make boolean button text show meaning	2024-06-01 18:18:14 +05:30
HanishKVC	0dae12ba6b	SimpleChat:UI:Add settings button and bring in settings ui	2024-06-01 18:18:14 +05:30
HanishKVC	e17f5e0204	SimpleChat:UI: Add Div wrapped label+element helpers Move settings related elements to use the new div wrapped ones.	2024-06-01 18:18:14 +05:30
HanishKVC	94bc0b08d8	SimpleChat:UI:Select: dict-name-value, value wrt default, change Take a dict/object of name-value pairs instead of just names. Inturn specify the actual value wrt default, rather than the string representing that value. Trap the needed change event rather than click wrt select.	2024-06-01 18:18:14 +05:30
HanishKVC	1e47a48b30	SimpleChat:UI: Add Select helper and use it wrt ChatHistoryInCtxt	2024-06-01 18:18:14 +05:30
HanishKVC	e42249d82d	SimpleChat:UI: Helper to create bool button and use it wrt settings	2024-06-01 18:18:14 +05:30
HanishKVC	ae7e66d27a	SimpleChat:UI: Add and use a para-create-append helper Also update the config params dump to indicate that now one needs to use document to get hold of gMe global object, this is bcas of moving to module type js. Also add ui.mjs to importmap	2024-06-01 18:18:14 +05:30
HanishKVC	ed345abac8	SimpleChat:DU:Avoid setting frequence/Presence penalty Some models like llama3 found to try to be over intelligent by repeating garbage still, but by tweaking the garbage a bit so that it is not exactly same. So avoid setting these penalties and let the model's default behaviour work out, as is. Also the simple minded histogram based garbage trimming from end, works to an extent, when the garbage is more predictable and repeatative.	2024-06-01 18:18:14 +05:30
HanishKVC	a41f701159	SimpleChat:UI: Move html ui base helpers into its own module	2024-06-01 18:18:14 +05:30
HanishKVC	15152af94f	SimpleChat:DU: Cleanup debug log messages	2024-06-01 18:18:14 +05:30
HanishKVC	ae9f610663	SimpleChat:DU: Bring in maxType to the mix along with maxUniq Allow for more uniq chars, but then ensure that a given type of char ie numerals or alphabets or other types dont cross the specified maxType limit. This allows intermixed text garbage to be identified and trimmed.	2024-06-01 18:18:14 +05:30
HanishKVC	d1e73d8777	SimpleChat:DU: Switch trim garbage hist based to maxUniq simple Instead of blindly building histogram for specified substring length, and then checking if any new char within specified min garbage length limit, NOW exit learn state when specified maxUniq chars are found. Inturn there should be no new chars with in the specified min garbage length required limit. TODO: Need to track char classes like alphabets, numerals and special/other chars.	2024-06-01 18:18:14 +05:30
HanishKVC	f33aa28149	SimpleChat:DU: Try trim using histogram based info TODO: May have to add max number of uniq chars in histogram at end of learning phase.	2024-06-01 18:18:14 +05:30
HanishKVC	6390f3489a	SimpleChat:DU:TrimGarbage if unable try skip char and retry	2024-06-01 18:18:13 +05:30
HanishKVC	54802dc184	SimpleChat:DU: Add trim garbage at end in loop helper	2024-06-01 18:18:13 +05:30
HanishKVC	c83c19ad4c	SimpleChat:DU:BringIn local helper js modules using importmap Use it to bring in a simple trim garbage at end logic, which is used to trim received response. Also given that importmap assumes esm / standard js modules, so also global variables arent implicitly available outside the modules. So add it has a member of document for now	2024-06-01 18:18:13 +05:30
Johannes Gäßler	9b596417af	CUDA: quantized KV support for FA vec (#7527) * CUDA: quantized KV support for FA vec * try CI fix * fix commented-out kernel variants * add q8_0 q4_0 tests * fix nwarps > batch size * split fattn compile via extern templates * fix flake8 * fix metal tests * fix cmake * make generate_cu_files.py executable * add autogenerated .cu files * fix AMD * error if type_v != FP16 and not flash_attn * remove obsolete code	2024-06-01 08:44:14 +02:00
Georgi Gerganov	a323ec60af	server : update js (#7670)	2024-05-31 22:23:04 +03:00
Galunid	0515ad93f4	convert-hf : Handle NotImplementedError in convert-hf-to-gguf (#7660)	2024-05-31 17:42:33 +02:00
Johannes Gäßler	c8047d538f	scripts: update compare_llama_bench.py [no ci] (#7673)	2024-05-31 16:26:21 +02:00
Daniele	30e238b246	Improve HIP compatibility (#7672)	2024-05-31 16:00:29 +02:00
Georgi Gerganov	16926dff92	readme : link homebrew discussion	2024-05-31 15:04:58 +03:00
Georgi Gerganov	0c27e6f62e	ggml : fix loongson compile warnings (#7537) * ggml : fix loongson compile warnings ggml-ci * Fix loongarch quantize test fail. Fix unexpected error introduced during rebase code. * tests : disable json test due to lack of python on the CI node ggml-ci --------- Co-authored-by: junchao-loongson <zhaojunchao@loongson.cn>	2024-05-31 14:17:10 +03:00
Galunid	2e32f874e6	Somehow '**' got lost (#7663)	2024-05-31 18:24:41 +10:00
Galunid	1af511fc22	Add convert.py removal to hot topics (#7662)	2024-05-31 10:09:20 +02:00
Sertaç Özercan	0541f06296	[no ci] docs: add aikit to readme (#7650) Signed-off-by: Sertac Ozercan <sozercan@gmail.com>	2024-05-31 09:57:16 +10:00
JohnnyB	9022c33646	Fixed painfully slow single process builds. (#7326) * Fixed painfully slow single process builds. * Added nproc for systems that don't default to nproc	2024-05-30 22:32:38 +02:00
Georgi Gerganov	5921b8f089	llama : cache llama_token_to_piece (#7587) * llama : cache llama_token_to_piece ggml-ci * llama : use vectors and avoid has_cache ggml-ci * llama : throw on unknown tokenizer types ggml-ci * llama : print a log of the total cache size	2024-05-31 02:01:41 +10:00
Martin Delille	5dcdf94676	Fix conan badge display [no ci] (#7645)	2024-05-31 01:07:39 +10:00
Manuel	2e2340de17	Add brew installation instruction to README [no ci] (#7616)	2024-05-31 00:58:15 +10:00
Martin Delille	7846540bd2	readme : add Conan badge (#7638)	2024-05-30 15:52:50 +03:00
Brian	e6157f94c8	github: add contact links to issues and convert question into research [no ci] (#7612)	2024-05-30 21:55:36 +10:00
Galunid	9c4c9cc83f	Move convert.py to examples/convert-legacy-llama.py (#7430) * Move convert.py to examples/convert-no-torch.py * Fix CI, scripts, readme files * convert-no-torch -> convert-legacy-llama * Move vocab thing to vocab.py * Fix convert-no-torch -> convert-legacy-llama * Fix lost convert.py in ci/run.sh * Fix imports * Fix gguf not imported correctly * Fix flake8 complaints * Fix check-requirements.sh * Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE * Review fixes	2024-05-30 21:40:00 +10:00

3095 commits