llama.cpp

Author	SHA1	Message	Date
Concedo	89495c0716	handle token unbanning over api	2023-08-30 10:51:49 +08:00
Concedo	f2c02dd06d	Merge branch 'master' into concedo_experimental # Conflicts: # .gitignore # CMakeLists.txt # Makefile # README.md # tests/test-grad0.cpp	2023-08-30 10:51:28 +08:00
YellowRoseCx	d7bdfbdd78	Update Makefile for misc amd gpu targetting (#407) adds the hipBlas gpu_target $(shell $(ROCM_PATH)/llvm/bin/amdgpu-arch) back to the gpu_target line, possibly allowing misc gpu arch's like gfx1031 or gfx1032 etc to be built	2023-08-30 09:54:15 +08:00
slaren	06abf8eeba	ggml : add view_src and view_offs to ggml_tensor for views (#2874) * ggml : add view_src and view_offs * update ggml-alloc to use view_src * update ggml_diag_mask to work correctly with automatic inplace * exclude other ops that set an inplace flag from automatic inplace	2023-08-29 23:24:42 +02:00
slaren	c03a243abf	remove outdated references to -eps and -gqa from README (#2881)	2023-08-29 23:17:34 +02:00
Kawrakow	fa3582f509	Tell users attmepting to run perplexity with too few tokens to use more (#2882) Closes #2858 Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2023-08-29 23:55:45 +03:00
Kawrakow	e37e69dcc3	10X faster BPE tokenizer (#2876) * 10X faster BPE tokenizer * Remove comment that no longer applies --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2023-08-29 23:55:03 +03:00
Concedo	380fa0f0ca	fixed broken typical sampler issues	2023-08-29 23:50:59 +08:00
maddes8cht	53885d7256	py : fix "usage" messages (#2873) convert-to-gguf python scripts	2023-08-29 16:51:02 +03:00
jameswu2014	bcce96ba4d	convert.py : fix baichuan7B support (#2870) * [Fix]: convert.py support baichuan7B * convert.py : fix trailing whitespaces --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-08-29 12:48:41 +03:00
Jhen-Jie Hong	74e0caeb82	readme : add react-native binding (#2869)	2023-08-29 12:30:10 +03:00
Cebtenzzre	d4b5e16c32	make : fix clang tests build, add missing examples (#2859) * make : do not pass headers to the compiler This fixes building tests with clang. * make : add missing examples * make : fix build-info.h dependencies	2023-08-29 11:42:41 +03:00
Georgi Gerganov	3a007648f2	metal : add option to disable debug logs (close #2764)	2023-08-29 11:33:46 +03:00
Georgi Gerganov	611363ac79	scripts : add pipefail	2023-08-29 10:50:30 +03:00
Marcus Dunn	95b6e5212f	added `struct` to llama_dump_timing_info_yaml's `llama_context` (#2857) fixes C compat.	2023-08-29 09:33:27 +03:00
xaedes	44c117f41e	train : mem usage and other improvements (#2439) * fix track_max_mem in forward_batch_wo_cache_flash_attn_train * remove unnecessary Adam(W) optimizer tensors. reduces optimizer memory overhead from 7*modelsize to 2*modelsize. additionally allows to optimize models with more than 2^31 parameters by replacing int with int64_t. bumps training checkpoint file version, but old checkpoints can still be read. new version with less tensors is saved. * add gradient clipping to AdamW * Fix reset of unused g->nodes and g->grads to NULL * implement gradient checkpointing for training reduces memory overhead from O(n_layer) to O(sqrt(n_layer)) as explained in readme of https://github.com/cybertronai/gradient-checkpointing * remove unused compute buffer 3 * add and use function ggml_build_backward_expand to avoid stack overflows with large maximum number of nodes GGML_API void ggml_build_backward_expand(struct ggml_context * ctx, struct ggml_cgraph * gf, struct ggml_cgraph * gb, bool keep); * change AdamW decay parameter to work like the torch AdamW decay parameter It is now relative to Adam learning rate `alpha*sched`. Before that it was relative to `sched` only. `alpha` being the maximum learning rate and `sched` being a scaling parameter in [0..1] change default AdamW weight decay parameter used in training to 0.1 as used in nanoGPT * change default AdamW weight decay parameter defined in ggml to 0.0, making Adam default instead of AdamW btw: the default weight decay parameter for torch.optim.AdamW is 0.01 * bug fixes for cross entropy loss ggml_cross_entropy_loss: sums where not correctly added in workload of each thread ggml_cross_entropy_loss_back: simplify backward process, reducing numerical issues guard usage of exp f16 lookup in cross entropy by #define GGML_CROSS_ENTROPY_EXP_FP16 cross entropy loss is only used once during training, but it is quite sensitive to numerical errors introduced by exp-f16-lookup.
so exp-f16-lookup for cross entropy loss is disabled by default, trading better gradients for very slightly worse runtime performance. * fix test-grad0 for cross_entropy_loss the second argument to cross_entropy_loss must sum up to 1 for each row * fix test-grad0 for soft_max dont use only sum as aggregation, because sum of softmax is always 1 -> finite differences should not work instead use sum(log(soft_max()*(1-eps)+eps)); use eps to avoid log(0) improve finite differences of test-grad0 by using double instead of float * change cross_entropy_loss to output average over all rows this helps keeping the loss and gradients in a sane range * improve gradient checkpointing sqrt(n_layers) is only the best checkpoint step when mem size of checkpoints and mem size of layers are equal. since layers require more memory than the single-tensor-checkpoint we use, the optimal values are compute different:
```
given: n, u, v
objective: minimize(a*u+b*v) where a*b=n, a>0, b>0
b=n/a
minimize(a*u+v*n/a)
diff(a*u+v*n/a, a) = u - (v*n/a)/a
diff(a*u+v*n/a, a) == 0
u - (v*n/a)/a == 0
u == v*n/(a*a)
u*a*a = v*n
a*a = v*n/u
a = sqrt(n*v/u)
```
this change results in more checkpoints, requiring less layers to store between checkpoints, overall improving memory usage. disable gradient checkpointing debug output * llama : fix rope usage in train-text-from-scratch after ChatGLM change * add more training parameters: --enable-restart N Only for Adam optimizer. Enable restarts of cos-decay --disable-restart N Only for Adam optimizer. Disable restarts of cos-decay --opt-past N Number of optimization iterations to track for delta convergence test. Disabled when zero. --opt-delta N Maximum delta for delta convergence test. Disabled when <= zero. --opt-max-no-improvement N Maximum number of optimization iterations with no improvement. Disabled when <= zero. --adam-epsf N AdamW epsilon for convergence test. Disabled when <= zero.
--adam-min-alpha N Adam minimum learning rate alpha, usually 0.1 * alpha * replace memcpy with reshape operation so that the graph is not cut at the input this makes it possible to store other values into the input tensor and then simply recompute the graph without rebuilding it * remove unused function argument from get_example_targets_batch * measure and print total training time * add optimization callback to ggml_opt_resume_g this callback is called before each iteration with custom data and pointer to learning schedule parameter (only used in Adam(W)). can be used for dynamic learning schedule and setting input data for batches before each iteration * use optimization callback in training allows dynamic learning schedule and different batch data for each iteration without relying on low n_iter and high n_examples parameters reduces runtime by avoiding restart of optimization function and improves training convergence by providing a different batch for each iteration * add minimum number of tensor dimensions to apply weight decay (default 2) this allows to not apply weight decay to bias parameters * rename training parameter cos-decay-alpha to cos-decay-min and clarify that adam-min-alpha also applies to warmup * fix increase of model.train_samples and model.train_tokens now that each optimizer iteration gets its own batch we need to multiply by number of opt iterations * change sampling parameters for prediction after training to defaults of common.h and clarify what is context for prediction and what are generated tokens * tighten abs error bounds for cross_entropy_loss in test-grad0 * add conditional compilation of using F16 exp in flash attention uncomment `// #define GGML_FLASH_ATTN_EXP_FP16` to enable usage of f16 exp in flash attention * tighten abs error bounds for flash_attn in test-grad0 * tighten abs error bounds for sqrt in test-grad0 * remove out-commented vectorized code of opt_adam the vectorized code might be bit faster for low number of 
parameters, but it had a big memory usage overhead * ggml : update ggml_rms_norm_back with configurable eps * llama training : fix ggml_rms_norm_back calls to pass configurable eps * remove trailing whitespace * add train function using automatic gradient checkpointing backward pass and allocator * in train function replace add_inplace by regular add because using add_inplace seems to result in different gradients * don't use allocate hash_map on context because the context has no_alloc=True when using memory allocator resulting in NULL data pointers * correctly clone reshape and permute operations by also cloning tensor->nb values * fix variable name and add missing type cast * terminate recursive tensor cloning when reaching tensor without src tensors * correctly clone view tensors by setting data pointers without this the checkpointing would only work when being used together with memory allocator * fix variable names * swap arguments to commutative ops to be the same as in `forward_batch_wo_cache_flash_attn` * add input tensors as checkpoints so that recursive tensor cloning of gradient checkpointing terminates on input tensors * fix variable name and add missing boolean negation * make sure some tensors are not reallocated by inserting new temporary nodes depending on them: output and parameter gradient tensors need to be available at the end of the graph execution parameter gradient tensors also need to be available before the graph execution because they are set to zero before each optimizer iteration checkpoint tensors are allocated all together to reduce memory allocator fragmentation afterwards, in addition to the temporary nodes, we also need to reset the temporary leafs * fix ASSERT to work with zero layers * add training options whether to use allocator and/or unified training function * integrate unified training function which may use memory allocator the unified training function also supports arguments whether to use flash attention and/or gradient 
checkpointing * format name of cloned tensors with " (clone)" suffix * set names for tensors in unified train function for easier debugging * allocate graph on context using ggml_new_graph * remove handwritten training functions * remove unused training parameters "use_scratch" and "use_unified" * remove trailing whitespace * remove unused train params: mem_compute1_gb & mem_compute2_gb mem_compute_gb is used for compute when automatic memory allocator is not enabled, otherwise it can be very small to only hold the tensor definitions mem_compute0_gb is used for automatic memory allocator (as long as measurement of max required size is not implemented) * remove unused forward_batch function * add debug asserts in ggml_allocr_alloc to some common pitfalls when using this function directly * only use ggml_allocr_alloc when tensor has NULL data and is no view * fix test when to create temporary backward graph temporary backward graph is only necessary when using checkpointing * fix memory "leak" in optimizers each iteration a new cplan with new memory for work data was allocated. now cplan creation only happens at the start of optimization, with each iteration reusing the cplan and its work data. * reverse order of for loop in ggml_build_backward_expand to save memory when using gradient checkpointing and allocator with this loop order gradient checkpointing with allocator on 16 layer model saves 13% memory; 2 layer memory it saves 2% memory. 
the computation results are the same * add missing lctx argument to get_example_targets_batch * implement llama model file saving using gguf checkpoint loading and saving disabled, to be replaced by loading and saving via gguf * implement loading/saving of checkpointing files using GGUF * bug fixes * add checkpoint file version for future compatibility * update readme with gguf filenames * save & load opt->just_initialized value * add first draft for checkpoint conversion script * add gguf arch and ftype * save opt parameter counter as uint64 * add gguf key and tensor names for optimizer and training * add layer_norm_rms_eps to checkpoint convert script * use same GGUF_GET_KEY macro as in llama.cpp * use norm_rms_eps, and rope parameters and command line options to set them * fix memory corruption bug in gguf ctx->kv and ctx->infos was reallocated using not-aligned realloc, but freed with aligned free. to fix this a GGML_ALIGNED_REALLOC was added, but there is no posix_memalign_realloc function. so on non-windows and non-mingw32 platforms we fall back to aligned malloc, followed by copying and freeing the old data. 
* add gguf example cmake file * bug fixes in tokenize_file * bug fixes in load_llama_model_gguf * bug fix: init model when no checkpoint was loaded * bug fix in read_tensor_by_name * bug fix in load_opt_context_gguf * avoid printing lots of spaced on the unusual case that loss gets nan * set name of tensors with empty name from what was read from gguf * remove trailing whitespace * print data checksums before saving and after loading to verify correctness * bug fixes for convert-train-checkpoint-to-gguf * temporarily add code to write old checkpoint files used to verify that old checkpoint files are correctly converted to gguf * bug fixes for convert-train-checkpoint-to-gguf.py loading checkpoints with opt_version=0 * remove code used to verify correctness of checkpoint file conversion * remove trailing whitespace * remove prediction related code use main for prediction, it is better optimized * update train-text-from-scratch README.md * fix non-windows GGML_ALIGNED_REALLOC * add missing blank line at end of file * remove GGML_ALIGNED_REALLOC and use normal malloc/realloc/free for gguf ctx->kv & ctx->infos * train : fix compile warnings --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-08-28 22:51:47 +03:00
slaren	43033b7bb4	llama-bench : set locale to utf8 (#2832)	2023-08-28 19:19:18 +02:00
Johannes Gäßler	6b73ef1201	YAML result logging + preset script (#2657)	2023-08-28 17:59:39 +02:00
alonfaraj	75fafcbccc	make : fix tests build (#2855) * makefile: - fix test name - add missing tests build * editorconfig : fixes --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-08-28 18:38:35 +03:00
grahameth	be475f60af	llama.cpp : fix wrong vsnprintf call in MS compiler (#2856) Co-authored-by: grahameth <->	2023-08-28 18:38:12 +03:00
Ronny Brendel	3af6b86301	ggml : tiny ggml_vec_dot_q4_K_q8_K AVX2 improvement (#2819)	2023-08-28 15:51:08 +03:00
Georgi Gerganov	35feac6560	ggml : sync (mem align to header + conv_transpose_2d fixes + ggml_alloc) (#2852) * ggml : sync (mem align to header + conv_transpose_2d fixes) ggml-ci * ggml-alloc : minor fix * ggml-alloc : sync more fixes	2023-08-28 14:24:53 +03:00
Johannes Gäßler	92b1bbd2ec	CUDA: fix RoPE asserts, block sizes (#2833)	2023-08-28 14:23:55 +03:00
YellowRoseCx	cf5d918073	Koboldcpp-ROCm Port (#399 ) * koboldcpp-ROCm Port commit 3416c986d9d9a31c3cdefd7e7bd4d9438d72ba35 Merge: 5eb17f0 `4c4e435` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Fri Aug 25 13:46:56 2023 -0500 Merge remote-tracking branch 'upstream/concedo' commit 5eb17f02c8638e003bb91bddf95ccf54d2ad0c12 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Fri Aug 25 13:38:21 2023 -0500 ROCm Port update * use hipblas based on cublas * Update Makefile for the Cuda kernels * Expand arch list and make it overrideable * Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5) * add hipBLAS to README * new build arg LLAMA_CUDA_MMQ_Y * fix half2 decomposition * Add intrinsics polyfills for AMD * AMD assembly optimized __dp4a * Allow overriding CC_TURING * use "ROCm" instead of "CUDA" * ignore all build dirs * Add Dockerfiles * fix llama-bench * fix -nommq help for non CUDA/HIP --------- Co-Authored-By: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Co-Authored-By: ardfork <134447697+ardfork@users.noreply.github.com> Co-Authored-By: funnbot <22226942+funnbot@users.noreply.github.com> Co-Authored-By: Engininja2 <139037756+Engininja2@users.noreply.github.com> Co-Authored-By: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com> Co-Authored-By: jammm <2500920+jammm@users.noreply.github.com> Co-Authored-By: jdecourval <7315817+jdecourval@users.noreply.github.com> commit b34f4bd2724733e188ec4f6074042f66a5ed28c9 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Aug 19 17:12:52 2023 -0500 Update README.md commit 7d1196108ad330b32845546fb3472c2172a0b6b8 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Aug 14 23:03:12 2023 -0500 remove force DMMV commit cd61aa0d9e16627935c7978adf488a679ddfa745 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Aug 12 17:24:31 2023 -0500 restore main_gpu 
parameter commit 4a042f326830271a4c31104051b7b08e08ac234e Author: Henri Vasserman <henv@hot.ee> Date: Sat Aug 12 10:51:46 2023 +0300 gfx1100 support --------- Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com> Co-authored-by: jammm <2500920+jammm@users.noreply.github.com> Co-authored-by: jdecourval <7315817+jdecourval@users.noreply.github.com> commit 8913bc6fea97d3cb860937b0461f455c6abe3ea1 Author: Henri Vasserman <henv@hot.ee> Date: Fri Aug 11 10:16:02 2023 +0300 Allow overriding CC_TURING commit e77a4c37a756c002e97173f4122e088fb304e18a Author: Henri Vasserman <henv@hot.ee> Date: Fri Aug 11 10:00:07 2023 +0300 Merge 'origin/master' into hipblas commit cc4c4e355cd553b1557d5fba2562e824db93f9b4 Author: Engininja2 <139037756+Engininja2@users.noreply.github.com> Date: Fri Aug 11 09:43:14 2023 +0300 New __dp4a assembly Now compatible with gfx900 and faster as well. commit 1a03b709848ce68d5bf5966237756167e2cac540 Author: Henri Vasserman <henv@hot.ee> Date: Fri Aug 11 09:30:28 2023 +0300 Undo mess --------- Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com> commit 4366ff9ba1b1f12e494118ef9b5198479022fcc5 Author: DannyDaemonic <DannyDaemonic@gmail.com> Date: Thu Aug 10 13:11:36 2023 -0700 Handle `ENABLE_VIRTUAL_TERMINAL_PROCESSING` more gracefully on earlier versions of Windows. 
commit 811ff855a24323cafddc95c1b8aca711fef05f76 Author: Christian Demsar <crasm@git.vczf.us> Date: Thu Aug 10 10:28:27 2023 -0400 Add --n-predict -2 for stopping generation on full context (#2565) commit 37c9717aaa6815b6a5be21aaab970212f20fe6bf Author: Martin Krasser <krasserm@googlemail.com> Date: Thu Aug 10 12:16:38 2023 +0200 Fix grammar-based sampling issue in server (#2566) commit d18ecd5b9e5dde58ae08a3eef1637406159ddaca Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Aug 10 13:19:41 2023 -0500 make mmq gen faster for amd commit 243894a952147a4fac5b6aee748861a0df6cc2c6 Author: Henri Vasserman <henv@hot.ee> Date: Thu Aug 10 12:14:40 2023 +0300 ws fix commit ac2f14da445ea87d73539adbd29d19ff2c9eba58 Author: Engininja2 <139037756+Engininja2@users.noreply.github.com> Date: Thu Aug 10 12:11:27 2023 +0300 AMD assembly optimized __dp4a Doesn't seem to work for gfx900, so commented out. commit 9dba0c985f140ddded8cbb671f139e81fff82eed Author: Henri Vasserman <henv@hot.ee> Date: Thu Aug 10 12:09:28 2023 +0300 Fix merge --------- Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com> Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com> commit f570b5cb1070591527a82d94bba408927b37778d Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Aug 9 22:11:20 2023 -0500 Revert "revert cuda changes as they are bugggy" This reverts commit 1541bf879772aeeed8ff646bfc52185c2a88b79b. 
commit 1541bf879772aeeed8ff646bfc52185c2a88b79b Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Wed Aug 9 22:36:41 2023 +0800 revert cuda changes as they are bugggy commit bacc20203efb1839aa313858a04d75255bb4b7f4 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Aug 9 20:37:17 2023 -0500 Merge remote-tracking branch 'upstream/concedo' commit b7cb4cfd109986bd66e8fd382d1e2516eaddfebb Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Aug 9 20:00:52 2023 -0500 additional fixes commit fadae727baa3735ad3e0667384d6e05ca056b3ef Merge: 518eb2a `8f8ab6c` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Aug 9 18:45:50 2023 -0500 Merge branch 'hipblas' into develop4Main commit 518eb2af9225f8300a108c4244c7eb0a2217c3bc Merge: bda0215 `cae6a84` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Aug 9 18:32:10 2023 -0500 Merge remote-tracking branch 'upstream/concedo' into develop2Main commit bda0215b413bafc49890aa23fc35f96a191fb3e0 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Aug 9 18:17:54 2023 -0500 update makefile to multisystem path commit `8f8ab6c4c0` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Aug 9 18:05:03 2023 -0500 hipLDFLAG Path change Unix to multisystem in Makefile changed the hardcoded linux distro hipblas LD path from -L/opt/rocm/lib to use the defined ROCM_PATH variable to be flexible with ROCm on non-Linux OS commit `610ba4cfc4` Merge: `4024f91` `25d43e0` Author: Henri Vasserman <henv@hot.ee> Date: Wed Aug 9 23:54:58 2023 +0300 Merge 'origin/master' into hipblas commit `4024f91a66` Author: Henri Vasserman <henv@hot.ee> Date: Wed Aug 9 01:56:44 2023 +0300 Add intrinsics polyfills for AMD --------- Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com> Co-authored-by: funnbot <22226942+funnbot@users.noreply.github.com> Co-authored-by: 
Engininja2 <139037756+Engininja2@users.noreply.github.com> commit `ab6212864c` Merge: `d91456a` `f5bfea0` Author: Henri Vasserman <henv@hot.ee> Date: Wed Aug 9 00:37:01 2023 +0300 Merge 'origin/master' into hipblas commit ee9fa2aca4f2e6645b99702935b34a5f8ec8f05d Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Aug 2 01:53:58 2023 -0500 Update Makefile commit `d91456aaf1` Author: ardfork <134447697+ardfork@users.noreply.github.com> Date: Mon Jul 31 20:35:00 2023 +0300 fix half2 decomposition commit `c1cb70d64d` Author: Henri Vasserman <henv@hot.ee> Date: Mon Jul 31 19:56:44 2023 +0300 new build arg LLAMA_CUDA_MMQ_Y commit `c1664a00ae` Merge: `4336231` `0728c5a` Author: Henri Vasserman <henv@hot.ee> Date: Mon Jul 31 19:32:27 2023 +0300 Merge 'origin/master' into hipblas commit 848558d7d95a5036ac057efdefa9b2a2e6fb61b7 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 30 20:02:52 2023 -0500 import vars logic fix commit b650b849d52aac65364558521f76e75ded7ea590 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 30 00:21:36 2023 -0500 Update easy_KCPP-ROCm_install.sh commit 8573a67a29e813d82e7f032912a8c221cd199505 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jul 29 21:31:12 2023 -0500 remove duplicate code and fix typo remove duplicate tooltip commit 430986e3f68f599fd7a11ea4b2b8e45ef33da643 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jul 29 21:07:34 2023 -0500 hide "missing" if all are built move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available " if len(runopts)==6 else + " commit dd0db7265dbc0b0699ca861291006808b662b0e4 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jul 29 20:52:31 2023 -0500 hide "missing" if all are built move tooltip functions to helper functions section. 
hides the string "Missing: ..." from showing if all backends are available commit 43fffb66d8a30cbd776c3682f8a104c3644206b1 Merge: 0ed65a4 `b40550c` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jul 29 19:13:15 2023 -0500 Merge branch 'concedo' commit 0ed65a44a5fdb529611730f276a4b910cbf70ae0 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jul 29 18:34:21 2023 -0500 Hide unavailable backends & Add tooltip over backend count Hides unavailable backends from the user and if the program is launched without any backends made, it shows an error message to them stating no backends were found and to make them using the 'make' command Add tooltip when hovering over backend count label hovering over the new label that shows the backend count will explain what the numbers are, and show the users which backends are not available or built commit 2a263983ab35024a95c411995963182ada06ed6f Merge: cee2e9d `31486eb` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jul 29 15:16:33 2023 -0500 Merge remote-tracking branch 'upstream/concedo' commit `4336231a32` Author: Henri Vasserman <henv@hot.ee> Date: Sat Jul 29 18:35:56 2023 +0300 add hipBLAS to README --------- Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com> commit `f8e3fc6c74` Author: Henri Vasserman <henv@hot.ee> Date: Sat Jul 29 14:16:46 2023 +0300 rocblas init stuff commit `d2ade639f4` Merge: `cde52d6` `8a88e58` Author: Henri Vasserman <henv@hot.ee> Date: Sat Jul 29 12:59:48 2023 +0300 Merge 'origin/master' into hipblas commit cee2e9d76740fd8e8f50b612078f3e7658460f29 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jul 26 23:36:55 2023 -0500 Only Show Available Backends in GUI Hides unavailable backends from the user and if the program is launched without any backends made, it shows an error message to them stating no backends were found and to make them using the 'make' command 
commit 78636109fc2ded79ee3e9a44d2e3c2d63a8de70e Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jul 26 13:27:22 2023 -0500 Update easy_KCPP-ROCm_install.sh commit 731cd6e2ab9bb722e211142bb633e7018ccdb31b Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Tue Jul 25 22:39:50 2023 -0500 Create easy_rocm_install.sh commit f154685bbdc79b5ace752fbc179e32f2f7806bdb Merge: cbdc1f3 `94e0a06` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Tue Jul 25 22:25:10 2023 -0500 Merge branch 'concedo_experimentalMAIN' commit cbdc1f3fb91969e79bc8640e0cebfc3247e200df Merge: 5b838d4 `9731682` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jul 24 16:53:21 2023 -0500 Merge remote-tracking branch 'upstream/concedo' commit `cde52d6a63` Merge: `8e8054a` `84e09a7` Author: Henri Vasserman <henv@hot.ee> Date: Mon Jul 24 12:22:58 2023 +0300 Merge 'origin/master' into hipblas commit `8e8054ad83` Author: Henri Vasserman <henv@hot.ee> Date: Mon Jul 24 12:20:49 2023 +0300 Add rocblas to build files commit `1f6294dc44` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jul 24 03:52:01 2023 -0500 Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5) * initialize rocblas commit 5b838d47874536ebffc2f6cb25877e0476a9402d Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jul 24 03:10:35 2023 -0500 amd multigpu full layer offload w/o vram scratch commit 9bfb2fdd68000670bda85c4e9748d72f5af09764 Merge: b379f9d `66328fc` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jul 24 03:07:44 2023 -0500 Merge branch 'concedo_experimental' commit b379f9d6fac570c220c928ff5f4ba4ed1ca7c051 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jul 24 03:07:00 2023 -0500 Revert "amd multigpu full layer offload w/o vram scratch" This reverts commit 
9adfc8e33f7116d6ae2e0992920733f783b70d08. commit 9adfc8e33f7116d6ae2e0992920733f783b70d08 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jul 24 02:56:40 2023 -0500 amd multigpu full layer offload w/o vram scratch commit 05c792e622a1d9838f9343e04f79ddf2bb63ae96 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jul 24 00:18:48 2023 -0500 initialize rocblas commit ade68d09d7b63d3344e18b6193043b378671eb12 Merge: 521ad6b `56995ca` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 23 20:25:05 2023 -0500 Merge remote-tracking branch 'upstream/concedo' commit 521ad6b5cb2a107ad7b972025aeb0f353e0cac67 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Jul 20 21:42:33 2023 -0500 lazy import_var error handling for saves commit 9553e52e7e4eabe46312729f6c4effeef6390df7 Merge: cac6650 `f036109` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Jul 20 19:59:41 2023 -0500 Merge remote-tracking branch 'upstream/concedo' commit cac6650754502208abfead61ba169fefc5ae84ac Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jul 17 23:05:02 2023 -0500 Makefile fix! 
Allows hip/clblast build together commit `3db70b5f0a` Merge: `2ec4466` `7568d1a` Author: Henri Vasserman <henv@hot.ee> Date: Tue Jul 18 01:54:17 2023 +0300 Merge 'origin/master' into hipblas commit f208670ffb6cdbb1e225adfb2fd80a67a6dc5055 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Fri Jul 14 02:56:03 2023 -0500 improve error handling with gpu names commit 860e73845f61fe0afb6a26cc8054d8be1f9e3669 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Fri Jul 14 00:33:03 2023 -0500 Show GPU names in GUI, Only show GPUs that exist changed the pre-set 1,2,3 and 1,2,3,all settings that the GPU selector had and replaced them with a function that grabs the GPU names and sets the names as the values for the selector boxes. commit `2ec4466db5` Author: Henri Vasserman <henv@hot.ee> Date: Thu Jul 13 13:44:02 2023 +0300 Update build flags. GGML_CUDA_DMMV_Y is now GGML_CUDA_MMV_Y so update your build instructions. GGML_CUDA_FORCE_DMMV is always enabled. 
--------- Co-authored-by: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> commit `cd36b185ff` Merge: `afcb8fe` `1cbf561` Author: Henri Vasserman <henv@hot.ee> Date: Thu Jul 13 13:03:01 2023 +0300 Merge 'origin/master' into hipblas commit ac7ebc3ac1deedfbc2940443b26774f1b4c85fae Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jul 12 18:32:18 2023 -0500 add hipBLAS name scheme to GUI and update README commit 7f85cc5ac30f2f300ca817a489ef209c995c634b Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jul 12 17:35:54 2023 -0500 update makefile and ggml.c commit 6ca3499275ba168320424f06ab3301ec329a6a83 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jul 12 15:43:45 2023 -0500 ggml.c fix commit 770e674aa5b2a1a9ffff2888a12e27b04ccfc7ef Merge: 2b289cd `5941514` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jul 12 15:24:36 2023 -0500 Merge remote-tracking branch 'upstream/concedo' commit 2b289cde558310c6c67dfc8d508c04e634595716 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jul 12 14:30:00 2023 -0500 Update c-cpp.yml commit 5dae95a9bb486c7f720789dffde1cfb470bffce0 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jul 12 14:28:51 2023 -0500 Update c-cpp.yml commit b37cd738c84debb53b149f5a9fb73de958f263fd Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jul 12 14:27:04 2023 -0500 Create c-cpp.yml to test Actions commit `afcb8fe0c4` Author: Henri Vasserman <henv@hot.ee> Date: Tue Jul 11 18:09:27 2023 +0300 Add new config option commit `8c2c4978a3` Merge: `e610466` `2347463` Author: Henri Vasserman <henv@hot.ee> Date: Tue Jul 11 17:53:54 2023 +0300 Merge 'origin/master' into hipblas commit `e610466307` Author: Henri Vasserman <henv@hot.ee> Date: Tue Jul 11 17:53:14 2023 +0300 Expand arch list and make it overrideable commit `80e4e548bf` Merge: 
`7735c5a` `1d16309` Author: Henri Vasserman <henv@hot.ee> Date: Mon Jul 10 02:09:28 2023 +0300 Merge 'origin/master' into hipblas commit 8432e9d5dc8d080535243467f8d380271e8d9489 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 9 16:55:30 2023 -0500 Update Makefile commit b58c1893fa839c0f35df96f6a8b026a7f2576762 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 9 16:20:00 2023 -0500 Add multi-gpu CuBLAS support to new GUI commit 0c1c71b9927127b45030fe88283dfbdd23853d34 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jul 8 07:56:57 2023 -0500 Update Makefile commit f864f60cd8e563e2594cee5a7da7e9aebed494f9 Author: Johannes Gäßler <johannesg@5d6.de> Date: Sat Jul 8 00:25:15 2023 +0200 CUDA: add __restrict__ to mul mat vec kernels (#2140) commit 4539bc2761a7a23b588b5420b9d3fd1962ff63e5 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jul 8 01:36:14 2023 -0500 update makefile for changes commit 912e31ec523eac9ef308f0d28bc2d93aab7c3ecb Merge: 74e2703 `ddaa4f2` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Fri Jul 7 23:15:37 2023 -0500 Merge remote-tracking branch 'upstream/concedo' commit 74e2703ac3b1557f107e540657d0919db115f913 Merge: cf65429 `f9108ba` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jul 5 15:16:49 2023 -0500 Merge branch 'LostRuins:concedo' into main commit `7735c5a9af` Merge: `c3e3733` `7ee76e4` Author: Henri Vasserman <henv@hot.ee> Date: Tue Jul 4 17:09:16 2023 +0300 Merge 'origin/master' into hipblas commit cf65429c3832d32a8c17c7ed5ab47066d7511fbe Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jul 3 16:56:40 2023 -0500 print cuda or opencl based on what's used commit 72c16d2310b2e4c44018e2084aeb79e68c0b8709 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jul 3 16:45:39 2023 -0500 Revert "fix my 
mistake that broke other arches" This reverts commit 777aed5e69e240a54e7d3da962d8520855f072b9. commit 777aed5e69e240a54e7d3da962d8520855f072b9 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jul 3 15:53:32 2023 -0500 fix my mistake that broke other arches commit 27780a987a8dabb18689038c0397e16f2f219c7e Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 2 16:03:27 2023 -0500 rocm fixes commit f52c7d439770c1ea0bebc1f895b74d6aeea5f0a6 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 2 16:02:58 2023 -0500 Revert "rocm fixes" This reverts commit 2fe9927353a1e53353623f850d3d534da88f5154. commit 2fe9927353a1e53353623f850d3d534da88f5154 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 2 15:58:21 2023 -0500 rocm fixes commit efe7560c83a497f5e750bbe27922babd4233bda9 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 2 15:55:43 2023 -0500 Revert "move HIPBLAS definitions into ggml-cuda.h" This reverts commit bf49a93d63f833b7871ba6e60f8fe207562678ee. commit 4fc0181e44685019dcd309d4bb345cac7a5fef87 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 2 15:55:36 2023 -0500 Revert "move hipblas definitions to header files" This reverts commit 2741ffb70464a71fd138484de4b41da05622e027. 
commit 89eb576f2771bd81a3a6274348b47535dfdd5f63 Merge: 2741ffb `3d2907d` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 2 14:44:13 2023 -0500 Merge branch 'LostRuins:concedo' into main commit `c3e3733c61` Author: Henri Vasserman <henv@hot.ee> Date: Sun Jul 2 15:51:31 2023 +0300 ROCm fixes commit `15db19ae7b` Merge: `04419f1` `46088f7` Author: Henri Vasserman <henv@hot.ee> Date: Sun Jul 2 15:39:57 2023 +0300 Merge 'origin/master' into hipblas commit 2741ffb70464a71fd138484de4b41da05622e027 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jul 1 17:07:42 2023 -0500 move hipblas definitions to header files commit bf49a93d63f833b7871ba6e60f8fe207562678ee Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jul 1 16:38:50 2023 -0500 move HIPBLAS definitions into ggml-cuda.h commit 540f4e05f4e95378f46a83e2919d3962c0ef9eac Merge: 2c3b46f `eda663f` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jul 1 14:58:32 2023 -0500 Merge remote-tracking branch 'upstream/concedo' commit 2c3b46f8a80ca9d94b2d3d06e1af6b6f7b791914 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Jun 29 18:43:43 2023 -0500 changes to fix build commit c9e1103da0d72fd39a36391ac4b5d941a133598a Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Jun 29 18:20:07 2023 -0500 Update ggml_v2-cuda-legacy.cu for ROCM commit b858fc5db80ed545a6fbeae3d551bddb47955598 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Jun 29 17:49:39 2023 -0500 changes to work with upstream commit 69a0c2534bb8825f4009760b12d9bd44d108c6ed Merge: 096f0b0 `1347d3a` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Jun 29 16:59:06 2023 -0500 Merge remote-tracking branch 'upstream/concedo' commit `04419f1894` Merge: `bb16eff` `d3494bb` Author: Henri Vasserman <henv@hot.ee> Date: Wed Jun 28 23:30:10 2023 
+0300 Merge 'origin/master' into hipblas commit `bb16effc75` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jun 28 15:27:10 2023 -0500 headers fix; add kquants_iter for hipblas and add gfx803 (#1) * kquants_iter for hipblas and add gfx803 * Update CMakeLists.txt with hipblas kquants_iter and DMMV_F16 * remove dmmv_f16 for now commit 096f0b055e11b7d930842f86146d0e5013c5dce6 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jun 28 15:27:02 2023 -0500 revert unnecessary hipblas conditionals commit d81e81adffd6eb59e280ae1885864bb5fbd9bba6 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jun 28 14:48:23 2023 -0500 Update Makefile hipblas nvcc correction commit `c8ae94524a` Merge: `c1e5c83` `0be54f7` Author: Henri Vasserman <henv@hot.ee> Date: Tue Jun 27 10:50:37 2023 +0300 Merge 'origin/master' into hipblas commit 2579ecf8db9569d7756161f05ce7b0f5f23174b0 Merge: abed427 `d2034ce` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jun 25 17:50:04 2023 -0500 Merge branch 'LostRuins:concedo' into main commit `c1e5c8345e` Merge: `35a6031` `447ccbe` Author: Henri Vasserman <henv@hot.ee> Date: Sun Jun 25 21:40:05 2023 +0300 Merge 'origin/master' into hipblas commit `35a603161a` Merge: `df7346c` `66a2555` Author: Henri Vasserman <henv@hot.ee> Date: Sun Jun 25 10:57:48 2023 +0300 Merge 'origin/master' into hipblas commit abed427b6f370698fe8e8409e7980f238aad03ef Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jun 24 19:16:30 2023 -0500 reorganize If statements to include proper headers commit 06c3bf03b92c2e00fc4bcd27f0c34f32c58b19a9 Merge: ea6d320 `8342fe8` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jun 24 16:57:20 2023 -0500 Merge branch 'LostRuins:concedo' into main commit ea6d3208dcdc0b05e2c164dde8ee0bfc6a02ad09 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: 
Fri Jun 23 01:53:28 2023 -0500 Update README.md commit 4d56ad8158595d1e835cb379939dc5526deb39e2 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Jun 22 16:19:43 2023 -0500 Update README.md commit 21f930872b6e232679fe02eac9e429367365c6af Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Jun 22 15:42:05 2023 -0500 kquants_iter for hipblas and add gfx803 commit `df7346ccd5` Merge: `5dd2fbe` `7487137` Author: Henri Vasserman <henv@hot.ee> Date: Thu Jun 22 20:51:09 2023 +0300 Merge 'origin/master' into hipblas commit b6ff89066bbf2de23dab90bc8bbf9f63d8d1e070 Merge: eb094f0 `e6ddb15` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Jun 22 12:42:09 2023 -0500 Merge branch 'LostRuins:concedo' into main commit eb094f043f9b0b94e7db028ca36e96ce479b0369 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jun 21 23:59:18 2023 -0500 lowvram parameter description commit 3a5dfeb568d543376910180caa9a99b081fef9d4 Merge: 665cc11 `b1f00fa` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jun 21 16:53:03 2023 -0500 Merge branch 'LostRuins:concedo' into koboldcpp-rocm commit 665cc1136b188e7ff5c1aa1359118c999ff6d162 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jun 21 01:13:19 2023 -0500 add lowvram parameter commit 222cbbb141f7ce79884cafb6bcebd860ae27cc04 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Tue Jun 20 19:03:28 2023 -0500 add additional hipblas conditions for cublas commit e1f958124ec99525cb58d8c534f9d1789377544e Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Tue Jun 20 16:51:59 2023 -0500 Add hip def for cuda v2 commit 3bff5c0f0defd9d49b770c5ce107c71e5cba8003 Merge: a7e74b3 `266d47a` Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Tue Jun 20 13:38:06 2023 -0500 Merge branch 'LostRuins:concedo' into 
koboldcpp-rocm commit a7e74b39fe5eedf85d955fe5ea5f4c546322a9b0 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jun 19 22:04:18 2023 -0500 Update README.md commit 5e99b3cb72d83f45b3f7904ffb8f242e743a142c Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jun 19 22:03:42 2023 -0500 Update Makefile commit 9190b17432ebdc489ab05b71df6c3b8d5e7f5895 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jun 19 21:47:10 2023 -0500 Update README.md commit `5dd2fbe6ea` Merge: `67e229b` `20568fe` Author: Henri Vasserman <henv@hot.ee> Date: Tue Jun 20 01:23:12 2023 +0300 Merge 'origin/master' into hipblas commit 2780ea292b1e9c6ead274de3afb34337716be08f Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jun 18 15:48:00 2023 -0500 Update Makefile commit 04a3e64807a92c2e105af92f16dd6db2ea024d39 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jun 18 14:33:39 2023 -0500 remove extra line commit cccbca9dea3780e797a3b4972ba211e0c762fdc1 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jun 18 14:31:17 2023 -0500 attempt adding ROCM hipblas commit a44a1d4b90ed11d83d622eb976a945ff26a8974e Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jun 18 14:31:01 2023 -0500 attempt adding ROCM hipblas commit b08818416972f83349bc4d6479bccc55ee31436d Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jun 18 14:30:54 2023 -0500 attempt adding ROCM hipblas commit `67e229b7ca` Merge: `6f7c156` `b241649` Author: Henri Vasserman <henv@hot.ee> Date: Sun Jun 18 00:36:54 2023 +0300 Merge 'origin/master' into hipblas commit `6f7c15637a` Merge: `61df8e9` `fc45a81` Author: Henri Vasserman <henv@hot.ee> Date: Sat Jun 17 16:53:22 2023 +0300 Merge 'origin/master' into hipblas commit `61df8e9217` Author: Henri Vasserman <henv@hot.ee> Date: Wed Jun 14 22:46:10 2023 +0300 add 
cudaMemset commit `a836529996` Merge: `85f902d` `254a7a7` Author: Henri Vasserman <henv@hot.ee> Date: Wed Jun 14 22:41:55 2023 +0300 Merge 'origin/master' into hipblas commit `85f902d5c4` Merge: `4362e80` `b50b570` Author: Henri Vasserman <henv@hot.ee> Date: Thu Jun 8 10:50:28 2023 +0300 Merge 'origin/master' into hipblas commit `4362e805a4` Merge: `fa5b3d7` `17366df` Author: Henri Vasserman <henv@hot.ee> Date: Tue Jun 6 23:14:40 2023 +0300 Merge 'origin/master' into hipblas commit `fa5b3d7365` Author: Henri Vasserman <henv@hot.ee> Date: Tue Jun 6 18:47:00 2023 +0300 fix makefile. commit `1ba4ce4ad7` Author: Henri Vasserman <henv@hot.ee> Date: Tue Jun 6 18:41:08 2023 +0300 Revert "warp size fixes" It seems like 32 is faster for me, at least and it won't cause so many conflicts. This reverts commit `5d6eb72164`. commit `5d6eb72164` Author: Henri Vasserman <henv@hot.ee> Date: Tue Jun 6 18:32:41 2023 +0300 warp size fixes commit `33091a9bd3` Merge: `9fdaa1d` `2d43387` Author: Henri Vasserman <henv@hot.ee> Date: Tue Jun 6 16:19:23 2023 +0300 Merge 'origin/master' into hipblas commit `9fdaa1d250` Author: Henri Vasserman <henv@hot.ee> Date: Sat May 27 19:17:53 2023 +0300 Add more defs For forward compatibility #1607 commit `a4648c1e7c` Merge: `4c8b3fb` `0ecb1bb` Author: Henri Vasserman <henv@hot.ee> Date: Sat May 27 18:22:39 2023 +0300 Merge 'origin/master' into hipblas commit `4c8b3fb107` Author: Henri Vasserman <henv@hot.ee> Date: Fri May 26 01:08:53 2023 +0300 add configurable vars commit `30d921af3e` Author: Henri Vasserman <henv@hot.ee> Date: Fri May 26 01:03:56 2023 +0300 and makefile commit `a593a4f6c2` Author: Henri Vasserman <henv@hot.ee> Date: Fri May 26 00:55:28 2023 +0300 Add missing parameters commit `174bf6a86d` Merge: `f80ce7a` `1fcdcc2` Author: Henri Vasserman <henv@hot.ee> Date: Fri May 26 00:44:23 2023 +0300 Merge 'origin/master' into hipblas commit `f80ce7a4e0` Merge: `600ace3` `ac7876a` Author: Henri Vasserman <henv@hot.ee> Date: Thu May 25 00:02:50 
2023 +0300 Merge branch 'origin/master' into hipblas commit `600ace39c8` Author: Henri Vasserman <henv@hot.ee> Date: Sat May 20 23:42:20 2023 +0300 update warp size commit `b19fefef94` Author: Henri Vasserman <henv@hot.ee> Date: Sat May 20 23:28:08 2023 +0300 Forwardcompat commit `c66115b833` Merge: `a0b2d5f` `b8ee340` Author: Henri Vasserman <henv@hot.ee> Date: Sat May 20 18:29:31 2023 +0300 Merge 'origin/master' into hipblas commit `a0b2d5f291` Merge: `8bab456` `2a5ee02` Author: Henri Vasserman <henv@hot.ee> Date: Tue May 16 17:08:29 2023 +0300 Merge 'origin/master' into hipblas commit `8bab45611e` Merge: `2956630` `b5c9295` Author: Henri Vasserman <henv@hot.ee> Date: Mon May 15 00:01:12 2023 +0300 Merge 'origin/master' into hipblas commit `2956630a3d` Merge: `0fe6384` `f048af0` Author: Henri Vasserman <henv@hot.ee> Date: Sat May 13 13:12:52 2023 +0300 Merge 'origin/master' into hipblas commit `0fe6384755` Author: Henri Vasserman <henv@hot.ee> Date: Fri May 12 17:22:11 2023 +0300 fix makefile commit `605560d9ec` Merge: `127f68e` `089b1c9` Author: Henri Vasserman <henv@hot.ee> Date: Fri May 12 16:12:53 2023 +0300 Merge 'origin/master' into hipblas commit `127f68eb5a` Merge: `070cbcc` `b608b55` Author: Henri Vasserman <henv@hot.ee> Date: Thu May 11 20:21:27 2023 +0300 Merge 'origin/master' into hipblas commit `070cbcc1bd` Author: Henri Vasserman <henv@hot.ee> Date: Sun May 7 18:10:56 2023 +0300 occupanct function commit `a3296d50aa` Merge: `0aefa6a` `e129551` Author: Henri Vasserman <henv@hot.ee> Date: Sun May 7 18:06:04 2023 +0300 Merge 'origin/master' into hipblas commit `0aefa6ab71` Merge: `baeb482` `1b0fd45` Author: Henri Vasserman <henv@hot.ee> Date: Sun May 7 12:24:41 2023 +0300 Merge 'origin/master' into hipblas commit `baeb482a94` Author: Henri Vasserman <henv@hot.ee> Date: Sun May 7 12:24:12 2023 +0300 Revert to default copy commit `289073a532` Merge: `1107194` `173d0e6` Author: Henri Vasserman <henv@hot.ee> Date: Sat May 6 19:59:41 2023 +0300 Merge 
'origin/master' into hipblas commit `1107194e6b` Merge: `04c0d48` `a3b85b2` Author: Henri Vasserman <henv@hot.ee> Date: Sat May 6 00:38:20 2023 +0300 Merge 'origin/master' into hipblas commit `04c0d480d7` Author: Henri Vasserman <henv@hot.ee> Date: Thu May 4 12:31:16 2023 +0300 Move all HIP stuff to ggml-cuda.cu commit `d83cfbad0c` Merge: `b67cc50` `799fdc1` Author: Henri Vasserman <henv@hot.ee> Date: Thu May 4 11:31:16 2023 +0300 Merge 'origin/master' into hipblas commit `b67cc50dad` Merge: `fcbc262` `e216aa0` Author: Henri Vasserman <henv@hot.ee> Date: Wed May 3 15:04:51 2023 +0300 Merge 'origin/master' into hipblas commit `fcbc262eb9` Merge: `c73def1` `f4cef87` Author: Henri Vasserman <henv@hot.ee> Date: Mon May 1 22:45:29 2023 +0300 Merge 'origin/master' into hipblas commit `c73def129a` Merge: `d8ea75e` `f0d70f1` Author: Henri Vasserman <henv@hot.ee> Date: Sun Apr 30 18:40:42 2023 +0300 Merge 'origin/master' into hipblas commit `d8ea75e952` Merge: `d194586` `334637e` Author: Henri Vasserman <henv@hot.ee> Date: Sat Apr 29 11:25:51 2023 +0300 Merge 'origin/master' into hipblas commit `d194586f65` Merge: `2ab9d11` `7f15c5c` Author: Henri Vasserman <henv@hot.ee> Date: Fri Apr 28 23:03:52 2023 +0300 Merge 'origin/master' into hipblas commit `2ab9d11f37` Merge: `3b4a531` `04aaae1` Author: Henri Vasserman <henv@hot.ee> Date: Fri Apr 28 16:30:05 2023 +0300 Merge 'origin/master' into hipblas commit `3b4a53138f` Merge: `a1caa48` `0b2da20` Author: Henri Vasserman <henv@hot.ee> Date: Fri Apr 28 10:08:41 2023 +0300 Merge 'origin/master' into hipblas commit `a1caa48611` Author: Henri Vasserman <henv@hot.ee> Date: Fri Apr 28 10:08:21 2023 +0300 add more cuda defines This is so 'slaren/cuda-f16f32' would merge. 
commit `ecc056519f` Author: Henri Vasserman <henv@hot.ee> Date: Fri Apr 28 01:58:27 2023 +0300 only .cu file needs to be complied as device commit `ef51e9ecac` Merge: `d571d16` `4afcc37` Author: Henri Vasserman <henv@hot.ee> Date: Wed Apr 26 12:46:26 2023 +0300 Merge branch 'ggerganov:master' into hipblas commit `d571d1629f` Merge: `608aa33` `dd0eabc` Author: Henri Vasserman <henv@hot.ee> Date: Tue Apr 25 21:15:33 2023 +0300 Merge 'origin/master' into hipblas commit `608aa33d9f` Author: Henri Vasserman <henv@hot.ee> Date: Tue Apr 25 21:15:04 2023 +0300 change default GPU arch to match CMake commit `3a004b2a01` Author: Henri Vasserman <henv@hot.ee> Date: Mon Apr 24 02:24:54 2023 +0300 add rpath commit `db7a01297e` Merge: `3677235` `284685f` Author: Henri Vasserman <henv@hot.ee> Date: Sun Apr 23 21:49:28 2023 +0300 Merge 'origin/master' into hipblas commit `367723544c` Author: Henri Vasserman <henv@hot.ee> Date: Sat Apr 22 23:28:00 2023 +0300 More build file changes commit `d3e1984ce0` Author: Henri Vasserman <henv@hot.ee> Date: Fri Apr 21 03:32:06 2023 +0300 add rpath commit `0e005f7793` Author: Henri Vasserman <henv@hot.ee> Date: Fri Apr 21 02:13:00 2023 +0300 Build file changes Now HIP Clang is not required, the CMake scripts will configure the needed compiler, which can be system clang++. Also other code can still use GCC, but CMake will force the clang to link. commit `54a63c10e8` Author: Henri Vasserman <henv@hot.ee> Date: Thu Apr 20 22:19:22 2023 +0300 Update Makefile for the Cuda kernels commit `0fd8363adc` Author: Henri Vasserman <henv@hot.ee> Date: Thu Apr 20 02:04:00 2023 +0300 use hipblas based on cublas * Merge Fixes * readme merge fix * remove old ggmlv2 changes * bring ggml v2_cuda up to date with AMD changes * Revert ggml v2_cuda changes BC they werent needed This reverts commit 3385dd4240e16ce78337aef8b6090348bf87e1c7. 
* avoid launching subprocesses to get device names for now, but other than that seems to be working --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2023-08-28 17:05:06 +08:00
igarnier	dd0dc366da	llama.h : add missing struct keyword for C compat in callback type (#2847 )	2023-08-28 11:19:59 +03:00
Georgi Gerganov	f55538c3cc	metal : fix memory leak (#2762 ) * metal : fix memory leak * metal : fix encoders memory leak * metal : clean up more memory resources * metal : fix more leaks * metal : reuse dispatch queue + autoreleasepool * metal : reuse array for command buffers and encoders * ggml : assert for odd number of blocks on ARM 15M tinyllama is an example	2023-08-28 10:59:08 +03:00
Cebtenzzre	ebcee207b6	quantize : make output filename optional again (#2823 ) * quantize : make output filename optional again * quantize : fix path parsing on Windows suggested by @slaren	2023-08-28 09:32:25 +03:00
JohnnyB	3e8ff47af6	devops : added systemd units and set versioning to use date. (#2835 ) * Corrections and systemd units * Missing dependency clblast	2023-08-28 09:31:24 +03:00
Concedo	4b00916ac7	Merge branch 'master' into concedo_experimental # Conflicts: # .dockerignore # .github/workflows/build.yml # CMakeLists.txt # Makefile # README.md # flake.lock # flake.nix # tests/CMakeLists.txt	2023-08-28 14:19:05 +08:00
Georgi Gerganov	103cfafc77	gguf : fix strings to not be null-terminated (#2839 ) * gguf : fix strings to not be null-terminated ggml-ci * gguf : fix gguf_add_tensor name	2023-08-27 21:50:22 +03:00
Georgi Gerganov	c10704d01e	llama : fix MPI threads (close #2827 )	2023-08-27 18:55:41 +03:00
Olivier Chafik	230d46c723	examples : update llama2.c converter to read vocab and write models in GGUF format (#2751 ) * llama2.c: direct gguf output (WIP) * Simplify vector building logic * llama2.c gguf conversion: fix token types in converter * llama2.c: support copying vocab from a llama gguf model file * llama2.c: update default path for vocab model + readme * llama2.c: use defines for gguf keys * llama2.c: escape whitespaces w/ U+2581 in vocab converter the llama.cpp way * llama2.c converter: cleanups + take n_ff from config	2023-08-27 17:13:31 +03:00
Kawrakow	463173a6c0	llama : speedup tokenization (#2831 ) * Speedup tokenization On current master it takes ~3.2 seconds to tokenize Wikitext. With this change it becomes ~525 ms. * Fixit: it was missing the piece after the last found occurrence --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2023-08-27 16:50:33 +03:00
Georgi Gerganov	eaa13a48ff	falcon : fix CUDA inference by making K and Q contiguous (#2830 ) * falcon : fix CUDA inference by making K and Q contiguous ggml-ci * cuda : add assert to guard from non-cont ropes	2023-08-27 16:40:48 +03:00
Georgi Gerganov	da7455d046	readme : fix headings	2023-08-27 15:52:34 +03:00
Georgi Gerganov	25423e9185	scripts : helper convert script	2023-08-27 15:24:58 +03:00
Kawrakow	a6d1189fdd	k_quants tuning for Falcon-7b (#2816 ) * Make ggml-cuda.cu build with QK_K = 64 Using LLAMA_CUDA_FORCE_DMMV = ON and -nommq it runs and produces a meaningful result. * k_quants tuning for Falcon-7b --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2023-08-27 15:19:59 +03:00
Georgi Gerganov	c48c5bb0b0	readme : update hot topics	2023-08-27 14:44:35 +03:00
Georgi Gerganov	d0cee0d36d	gguf : add 64-bit support (GGUF v2) (#2821 ) * gguf : bump version to 2 * gguf : add support for 64-bit (no backwards comp yet) * gguf : v1 backwards comp * gguf.py : bump GGUF version * gguf.py : uint64_t on all lengths, sizes and counts, enums still uint32_t * gguf.py : string lengths uint32_t * gguf : update all counts to 64-bit * gguf.py : string len uint64_t and n_dims uint32_t * gguf : fix typo * llama.cpp : print gguf version --------- Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>	2023-08-27 14:19:54 +03:00
Georgi Gerganov	edd4c14817	llama : more tokenizer fixes (#2810 ) * tests : write a Python tokenizer test (wip) * llama : prefix input text for tokenization with whitespace * llama : distinguish pieces from decoded text + fix detokenization * common : add comments * examples : no longer manually add leading space when tokenizing * tests : use Python to generate tokenizer tests for C++ * tests : add option to tokenize text files ggml-ci * tests : add test-tokenizer-1.py * llama.cpp : fix LF token * hellaswag : move the concat space for clarity * tests : add falcon tests (py + cpp, currently do not pass Unicode) ggml-ci * common : temporary separate llama_detokenize calls for SPM and BPE --------- Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>	2023-08-27 14:19:19 +03:00
Przemysław Pawełczyk	1591e2e590	ggml : detect SSSE3 (#2825 ) * ggml : add ggml_cpu_has_ssse3 * llama : show SSSE3 in system info	2023-08-27 11:10:25 +03:00
slaren	789c8c945a	ci : add LoRA test to CI (#2650 ) * ci : add lora test ggml-ci * move lora summary to the top, add lora logs ggml-ci * ci : decrease CPU ppl runs to 2 to avoid 20 min timeout ggml-ci * add 7b lora test use 1 thread for CUDA generation tests ggml-ci * add test with q8_0 (cpu only) ggml-ci --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-08-27 10:03:27 +03:00
Concedo	9d5b4238e8	added config to class.py	2023-08-27 10:32:01 +08:00
Bruce MacDonald	c1ac54b77a	server : add `/detokenize` endpoint (#2802 ) * Add a /detokenize endpoint to the example server * remove trailing white-space	2023-08-27 07:11:45 +08:00
Kerfuffle	730d9c681e	convert.py : advanced option (#2753 ) * Allow convert.py to convert to q8_0 Fix issue with bounded_parallel_map and greedy consuming iterator Display elapsed time during conversion * Add --concurrency option Minor improvements to help text Clean up bounded_parallel_map function a bit * Massive speed improvement thanks to Cebtenzzre * Refactor types	2023-08-26 23:13:36 +03:00
Tim Miller	c7d92e6dfe	llama : use Unicode Escape Sequence to replace encoded characters (#2814 ) The use of special characters within source files can break compiling on some computers with different region and language settings. Using Unicode escape sequences should allow for the code to be compiled on all setups without needing to change your computer's settings or switch regions.	2023-08-26 21:27:07 +03:00
Tungsten842	61d1a2895e	flake.nix : add rocm support and cleanup (#2808 )	2023-08-26 21:19:44 +03:00
Cebtenzzre	741ca7dd1c	llama : move #includes out of _GNU_SOURCE conditional (#2817 )	2023-08-26 21:17:51 +03:00
Dr. Tom Murphy VII Ph.D	72f895c923	main : fix bug (penalize_nl=false doesn't work) + suppress warning on mingw (#1528 ) * Fix bug in main.cpp where penalize_nl=false has no effect. It modifies the underlying logits array, but at this point we are already working on the candidates copy. * Suppress redefinition warning for NOMINMAX on mingw. In my installation, this macro is already defined by /usr/lib/gcc/x86_64-w64-mingw32/11/include/c++/x86_64-w64-mingw32/bits/os_defines.h:45. * main : fix indentation * main : pass ctx to llama_token_nl() --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-08-26 21:12:56 +03:00
Cebtenzzre	50526f37eb	llama : use std::abs in llama_sample_tail_free (#2800 ) Plain 'abs' casts the input to int.	2023-08-26 19:53:52 +03:00
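
The last entry above notes that plain 'abs' casts its input to int. A minimal sketch of why that matters for floating-point values follows. The helper names (`abs_via_int`, `abs_float`, `demo_abs_difference`) are hypothetical and are not taken from llama.cpp. The explicit `static_cast<int>` only emulates what happens when the C library's `int abs(int)` is the overload chosen for a float argument; whether an unqualified `abs` actually resolves that way depends on the headers and toolchain in use.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Emulates calling the C library's int abs(int) with a float argument:
// the value is converted to int first, so the fractional part is lost.
// (Hypothetical helper for illustration; not part of llama.cpp.)
inline float abs_via_int(float x) {
    return static_cast<float>(std::abs(static_cast<int>(x)));
}

// std::abs from <cmath> has a float overload, so no conversion happens.
inline float abs_float(float x) {
    return std::abs(x);
}

// Small self-check of the two behaviours.
inline void demo_abs_difference() {
    assert(abs_via_int(-0.75f) == 0.0f);  // -0.75 truncates to 0 before abs
    assert(abs_float(-0.75f) == 0.75f);   // magnitude is preserved
    assert(abs_via_int(-2.5f) == 2.0f);   // -2.5 truncates to -2 before abs
    assert(abs_float(-2.5f) == 2.5f);
}
```

Calling `demo_abs_difference()` from a test or from `main` exercises both paths. For small differences such as second derivatives of probabilities, the truncating variant collapses every value below 1 to zero, which is presumably the kind of problem the commit addresses.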

1933 commits