llama.cpp

Author	SHA1	Message	Date
HanishKVC	452813f235	SimpleChat:UI:Settings make boolean button text show meaning	2024-06-01 18:18:14 +05:30
HanishKVC	0dae12ba6b	SimpleChat:UI:Add settings button and bring in settings ui	2024-06-01 18:18:14 +05:30
HanishKVC	e17f5e0204	SimpleChat:UI: Add Div wrapped label+element helpers Move settings related elements to use the new div wrapped ones.	2024-06-01 18:18:14 +05:30
HanishKVC	94bc0b08d8	SimpleChat:UI:Select: dict-name-value, value wrt default, change Take a dict/object of name-value pairs instead of just names. Inturn specify the actual value wrt default, rather than the string representing that value. Trap the needed change event rather than click wrt select.	2024-06-01 18:18:14 +05:30
HanishKVC	1e47a48b30	SimpleChat:UI: Add Select helper and use it wrt ChatHistoryInCtxt	2024-06-01 18:18:14 +05:30
HanishKVC	e42249d82d	SimpleChat:UI: Helper to create bool button and use it wrt settings	2024-06-01 18:18:14 +05:30
HanishKVC	ae7e66d27a	SimpleChat:UI: Add and use a para-create-append helper Also update the config params dump to indicate that now one needs to use document to get hold of gMe global object, this is bcas of moving to module type js. Also add ui.mjs to importmap	2024-06-01 18:18:14 +05:30
HanishKVC	ed345abac8	SimpleChat:DU:Avoid setting frequence/Presence penalty Some models like llama3 found to try to be over intelligent by repeating garbage still, but by tweaking the garbage a bit so that it is not exactly same. So avoid setting these penalties and let the model's default behaviour work out, as is. Also the simple minded histogram based garbage trimming from end, works to an extent, when the garbage is more predictable and repeatative.	2024-06-01 18:18:14 +05:30
HanishKVC	a41f701159	SimpleChat:UI: Move html ui base helpers into its own module	2024-06-01 18:18:14 +05:30
HanishKVC	15152af94f	SimpleChat:DU: Cleanup debug log messages	2024-06-01 18:18:14 +05:30
HanishKVC	ae9f610663	SimpleChat:DU: Bring in maxType to the mix along with maxUniq Allow for more uniq chars, but then ensure that a given type of char ie numerals or alphabets or other types dont cross the specified maxType limit. This allows intermixed text garbage to be identified and trimmed.	2024-06-01 18:18:14 +05:30
HanishKVC	d1e73d8777	SimpleChat:DU: Switch trim garbage hist based to maxUniq simple Instead of blindly building histogram for specified substring length, and then checking if any new char within specified min garbage length limit, NOW exit learn state when specified maxUniq chars are found. Inturn there should be no new chars with in the specified min garbage length required limit. TODO: Need to track char classes like alphabets, numerals and special/other chars.	2024-06-01 18:18:14 +05:30
HanishKVC	f33aa28149	SimpleChat:DU: Try trim using histogram based info TODO: May have to add max number of uniq chars in histogram at end of learning phase.	2024-06-01 18:18:14 +05:30
HanishKVC	6390f3489a	SimpleChat:DU:TrimGarbage if unable try skip char and retry	2024-06-01 18:18:13 +05:30
HanishKVC	54802dc184	SimpleChat:DU: Add trim garbage at end in loop helper	2024-06-01 18:18:13 +05:30
HanishKVC	c83c19ad4c	SimpleChat:DU:BringIn local helper js modules using importmap Use it to bring in a simple trim garbage at end logic, which is used to trim received response. Also given that importmap assumes esm / standard js modules, so also global variables arent implicitly available outside the modules. So add it has a member of document for now	2024-06-01 18:18:13 +05:30
Johannes Gäßler	9b596417af	CUDA: quantized KV support for FA vec (#7527) * CUDA: quantized KV support for FA vec * try CI fix * fix commented-out kernel variants * add q8_0 q4_0 tests * fix nwarps > batch size * split fattn compile via extern templates * fix flake8 * fix metal tests * fix cmake * make generate_cu_files.py executable * add autogenerated .cu files * fix AMD * error if type_v != FP16 and not flash_attn * remove obsolete code	2024-06-01 08:44:14 +02:00
Georgi Gerganov	a323ec60af	server : update js (#7670)	2024-05-31 22:23:04 +03:00
Galunid	0515ad93f4	convert-hf : Handle NotImplementedError in convert-hf-to-gguf (#7660)	2024-05-31 17:42:33 +02:00
Johannes Gäßler	c8047d538f	scripts: update compare_llama_bench.py [no ci] (#7673)	2024-05-31 16:26:21 +02:00
Daniele	30e238b246	Improve HIP compatibility (#7672)	2024-05-31 16:00:29 +02:00
Georgi Gerganov	16926dff92	readme : link homebrew discussion	2024-05-31 15:04:58 +03:00
Georgi Gerganov	0c27e6f62e	ggml : fix loongson compile warnings (#7537) * ggml : fix loongson compile warnings ggml-ci * Fix loongarch quantize test fail. Fix unexpected error introduced during rebase code. * tests : disable json test due to lack of python on the CI node ggml-ci --------- Co-authored-by: junchao-loongson <zhaojunchao@loongson.cn>	2024-05-31 14:17:10 +03:00
Galunid	2e32f874e6	Somehow '**' got lost (#7663)	2024-05-31 18:24:41 +10:00
Galunid	1af511fc22	Add convert.py removal to hot topics (#7662)	2024-05-31 10:09:20 +02:00
Sertaç Özercan	0541f06296	[no ci] docs: add aikit to readme (#7650) Signed-off-by: Sertac Ozercan <sozercan@gmail.com>	2024-05-31 09:57:16 +10:00
JohnnyB	9022c33646	Fixed painfully slow single process builds. (#7326) * Fixed painfully slow single process builds. * Added nproc for systems that don't default to nproc	2024-05-30 22:32:38 +02:00
Georgi Gerganov	5921b8f089	llama : cache llama_token_to_piece (#7587) * llama : cache llama_token_to_piece ggml-ci * llama : use vectors and avoid has_cache ggml-ci * llama : throw on unknown tokenizer types ggml-ci * llama : print a log of the total cache size	2024-05-31 02:01:41 +10:00
Martin Delille	5dcdf94676	Fix conan badge display [no ci] (#7645)	2024-05-31 01:07:39 +10:00
Manuel	2e2340de17	Add brew installation instruction to README [no ci] (#7616)	2024-05-31 00:58:15 +10:00
Martin Delille	7846540bd2	readme : add Conan badge (#7638)	2024-05-30 15:52:50 +03:00
Brian	e6157f94c8	github: add contact links to issues and convert question into research [no ci] (#7612)	2024-05-30 21:55:36 +10:00
Galunid	9c4c9cc83f	Move convert.py to examples/convert-legacy-llama.py (#7430) * Move convert.py to examples/convert-no-torch.py * Fix CI, scripts, readme files * convert-no-torch -> convert-legacy-llama * Move vocab thing to vocab.py * Fix convert-no-torch -> convert-legacy-llama * Fix lost convert.py in ci/run.sh * Fix imports * Fix gguf not imported correctly * Fix flake8 complaints * Fix check-requirements.sh * Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE * Review fixes	2024-05-30 21:40:00 +10:00
Chris Elrod	59b0d07766	faster avx512 exp implementation (#7551) * faster avx512 exp implementation * x->r * improve accuracy, handle special cases * remove `e`	2024-05-30 21:32:55 +10:00
junchao-loongson	d5c05821f3	ggml : fix loongarch build (O2 issue) (#7636)	2024-05-30 12:30:10 +03:00
Johannes Gäßler	972b555ab9	README: explain parallel build [no ci] (#7618)	2024-05-30 09:52:39 +02:00
Meng, Hengyu	3854c9d07f	[SYCL] fix intel docker (#7630) * Update main-intel.Dockerfile * workaround for https://github.com/intel/oneapi-containers/issues/70 * reset intel docker in CI * add missed in server	2024-05-30 16:19:08 +10:00
Galunid	eb57fee51f	gguf-py : Add tokenizer.ggml.pre to gguf-new-metadata.py (#7627)	2024-05-30 02:10:40 +02:00
Georgi Gerganov	55d62262a9	metal : remove invalid asserts (#7617)	2024-05-29 22:21:20 +03:00
Georgi Gerganov	975ec63ff2	metal : add missing asserts (#7617)	2024-05-29 20:45:25 +03:00
Georgi Gerganov	fb76ec31a9	ggml : fix YARN + add tests + add asserts (#7617) * tests : add rope tests ggml-ci * ggml : fixes (hopefully) ggml-ci * tests : add non-cont tests ggml-ci * cuda : add asserts for rope/norm + fix DS2 ggml-ci * ggml : assert contiguousness * tests : reduce RoPE tests ggml-ci	2024-05-29 20:17:31 +03:00
Georgi Gerganov	cce3dcffc5	cuda : non-cont concat support (#7610) * tests : add non-cont concat tests * cuda : non-cont concat support ggml-ci	2024-05-29 15:38:26 +03:00
Radoslav Gerganov	210d99173d	llama-bench : add support for the RPC backend (#7435)	2024-05-29 14:45:44 +03:00
slaren	87bdf2a199	ggml : use atomic_flag for critical section (#7598) * ggml : use atomic_flag for critical section * add windows shims	2024-05-29 13:36:39 +02:00
Georgi Gerganov	00281b7be3	scripts : remove mpi remnants	2024-05-29 14:31:18 +03:00
Georgi Gerganov	2ab977282b	sync : ggml	2024-05-29 14:29:52 +03:00
Georgi Gerganov	72de268bec	ggml : restore ggml_rope_xpos_inplace (ggml/0) ggml-ci	2024-05-29 14:29:33 +03:00
Akarshan Biswas	0e8d8bfd6c	Add Arc A750 and Arch linux to readme-sycl.md as verified GPU model and Linux distro (#7605)	2024-05-29 16:53:47 +10:00
zhouwg	504f0c340f	ggml : fix typo in ggml.c (#7603)	2024-05-29 04:09:31 +02:00
Meng, Hengyu	b864b50ce5	[SYCL] Align GEMM dispatch (#7566) * align GEMM dispatch	2024-05-29 07:00:24 +08:00


3078 commits