Commit graph

  • e5e7183299 use const in methods mendax0110 2023-07-02 18:43:29 +02:00
  • f713dd515d add /v1/ endpoints binding jwj7140 2023-07-03 00:50:46 +09:00
  • 685d236d8b Add BPE dropout support, use it in training. Howard Su 2023-07-02 22:57:14 +08:00
  • c3e3733c61 ROCm fixes Henri Vasserman 2023-07-02 15:51:31 +03:00
  • 15db19ae7b Merge 'origin/master' into hipblas Henri Vasserman 2023-07-02 15:39:57 +03:00
  • 7dcffd7a03 set n_keep to -1 jwj7140 2023-07-02 20:45:29 +09:00
  • 377ecf9e9b fix bugs jwj7140 2023-07-02 20:17:03 +09:00
  • 3d2907d208 make gptneox and gptj work with extended context too Concedo 2023-07-02 18:28:09 +08:00
  • d6b47e6a5b Merge branch 'master' into concedo_experimental Concedo 2023-07-02 17:26:39 +08:00
  • e17c8497cf switched to NTK aware scaling Concedo 2023-07-02 17:25:08 +08:00
  • 9bfaf7ddd0 Merge branch 'ggerganov:master' into master m3ndax 2023-07-02 10:29:26 +02:00
  • e19483ca0f increase scratch for above 4096 Concedo 2023-07-02 14:55:08 +08:00
  • 46088f7231 ggml : fix build with OpenBLAS (close #2066) master-46088f7 Georgi Gerganov 2023-07-02 09:46:46 +03:00
  • b85ea580d3 Merge branch 'master' into concedo_experimental Concedo 2023-07-02 14:45:25 +08:00
  • da7d2f9587 Adjust Metal buffer allocation to avoid allocating beyond MTLDevice.recommendedMaxWorkingSetSize Kilty McGowan 2023-07-01 21:33:16 -07:00
  • cc06f1171b Fix crash of test-tokenizer-0 under Debug build Howard Su 2023-07-01 22:37:26 +08:00
  • cc3c86f6ea Merge pull request #9 from WangHaoranRobin/robin_fork_master WangHaoranRobin 2023-07-02 08:02:14 +08:00
  • 71f829678a examples/common.h: put all bool variables in gpt_params together Wang Haoran(Robin) 2023-07-02 08:01:19 +08:00
  • 1a70a80369 examples/common.h: put all bool variables in gpt_params together Wang Haoran(Robin) 2023-07-02 08:00:13 +08:00
  • ad807731d9 Merge branch 'ggerganov:master' into master WangHaoranRobin 2023-07-02 07:54:40 +08:00
  • adb97e8818 Merge branch 'ggerganov:master' into master m3ndax 2023-07-01 23:42:15 +02:00
  • 0bc2cdfc87 Better CUDA synchronization logic (#2057) master-0bc2cdf Johannes Gäßler 2023-07-01 21:49:44 +02:00
  • befb3a3562 Test-based VRAM scratch size + context adjustment (#2056) Johannes Gäßler 2023-07-01 21:47:26 +02:00
  • b213227067 cmake : don't force -mcpu=native on aarch64 (#2063) Daniel Drake 2023-07-01 20:31:44 +02:00
  • 2f8cd979ec metal : release buffers when freeing metal context (#2062) master-2f8cd97 Aaron Miller 2023-07-01 11:14:59 -07:00
  • 471aab6e4c convert : add support of baichuan-7b (#2055) Judd 2023-07-02 01:00:25 +08:00
  • ef3b8dc0d9 GPU accel for rwkv is slow, disable it Concedo 2023-07-02 00:41:46 +08:00
  • e1a7042943 try out the new rwkv but it seems worse, may revert Concedo 2023-07-02 00:10:56 +08:00
  • 463f2f4c4f llama : fix return value of llama_load_session_file_internal (#2022) Georgi Gerganov 2023-07-01 19:05:09 +03:00
  • cb44dbc7de llama : catch llama_load_session_file_internal exceptions (#2022) Rand Xie 2023-07-02 00:02:58 +08:00
  • 79f634a19d embd-input : fix returning ptr to temporary master-79f634a Georgi Gerganov 2023-07-01 18:46:00 +03:00
  • 04606a1599 train : fix compile warning Georgi Gerganov 2023-07-01 18:45:44 +03:00
  • b1ca8f36a9 ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995) Qingyou Meng 2023-07-01 23:42:43 +08:00
  • 2353509b78 cmake: don't force -mcpu=native on aarch64 Daniel Drake 2023-07-01 09:12:16 +02:00
  • 632bf27b65 more granular context size selections Concedo 2023-07-01 11:02:44 +08:00
  • 1a3e8ad6db release metal buffers when freeing metal context Aaron Miller 2023-06-30 16:08:37 -07:00
  • d412bbbcdc Merge branch 'ggerganov:master' into master m3ndax 2023-06-30 22:55:21 +02:00
  • 94ba56184e Better CUDA synchronization logic JohannesGaessler 2023-06-30 19:19:43 +02:00
  • 36cd5d85e9 Avoid requesting dedicated memory, VMA can decide that by itself 0cc4m 2023-06-30 21:20:19 +02:00
  • 4ea9b2fd4b Add VMA library 0cc4m 2023-06-30 21:15:06 +02:00
  • c8ff09bdc7 dequant_q4_0 kernel 0cc4m 2023-06-30 20:48:42 +02:00
  • cb5cb4d6e2 Fix f16_to_f32 kernel 0cc4m 2023-06-30 20:14:11 +02:00
  • df3cdbdac7 Output FP32 in fp16 matmul shader 0cc4m 2023-06-29 20:15:39 +02:00
  • 40c8f843f2 Fix mulmat_f16 0cc4m 2023-06-29 20:04:36 +02:00
  • c31e14b2fd Enable device extensions properly, restore fp16 matmul op 0cc4m 2023-06-29 06:46:17 +02:00
  • fc5bb53b32 Code abstraction, FP16 implementation, fix kernel, add FP16 to FP32 kernel 0cc4m 2023-06-28 20:18:55 +02:00
  • 3adc7b1d60 First FP16 attempt, disabled for now 0cc4m 2023-06-28 07:36:56 +02:00
  • 2c70df985a Continue vulkan implementation and optimization 0cc4m 2023-06-25 15:17:23 +02:00
  • 0c9cca00bd Write coalescing 0cc4m 2023-06-25 09:54:40 +02:00
  • 7c6860b483 2D Blocktiling 0cc4m 2023-06-24 18:40:11 +02:00
  • 1b4863c2b9 1D Blocktiling 0cc4m 2023-06-24 08:01:43 +02:00
  • baf9ff536b GEMM Kernel optimization 0cc4m 2023-06-23 14:43:57 +02:00
  • a42376e7ec First matmul success 0cc4m 2023-06-22 09:46:00 +02:00
  • 8ce84c2747 Continue implementation 0cc4m 2023-06-21 00:26:48 +02:00
  • 2471728a9d Add aligned malloc and free for VMA 0cc4m 2023-06-13 12:00:06 +02:00
  • fc4f207cfb Matmul call 0cc4m 2023-06-12 09:57:26 +02:00
  • b0e65855d1 Vulkan development 0cc4m 2023-06-12 08:01:38 +02:00
  • a4004d4fa8 Vulkan memory management 0cc4m 2023-06-11 19:26:52 +02:00
  • 88d4ec05a8 Continue implementation 0cc4m 2023-06-11 08:49:43 +02:00
  • 4a96d0eb7f Fix matmul kernel, continue implementation 0cc4m 2023-06-10 16:24:37 +02:00
  • 061246fb07 Vulkan loader code 0cc4m 2023-05-07 07:22:12 +02:00
  • eda663f15f update lite and up ver Concedo 2023-07-01 00:15:26 +08:00
  • 0cb8a9eab3 Merge remote-tracking branch 'Johannes/cuda-scratch-size-adjust' into concedo_experimental Concedo 2023-06-30 23:29:38 +08:00
  • 67cb0b2760 Merge branch 'master' into concedo_experimental Concedo 2023-06-30 23:25:40 +08:00
  • d16926dff4 Merge branch 'concedo' into concedo_experimental Concedo 2023-06-30 23:06:21 +08:00
  • baf6325907 added flag for building kquants in tools Concedo 2023-06-30 23:06:11 +08:00
  • 30ea774e2c Update CMakeLists.txt with dmmv_x/y/f16 (#277) YellowRoseCx 2023-06-30 09:52:32 -05:00
  • 1129d66ca9 To fix build problem on Apple Metal LLAMA_METAL=1 (#282) bebopkim 2023-06-30 23:50:38 +09:00
  • f0e1429d7f Implemented RMS_NORM niansa 2023-06-30 16:01:08 +02:00
  • d1f84db4b6 Implemented GGML_OP_NORM niansa 2023-06-30 15:18:10 +02:00
  • 8fa60134b1 Added missing break to mul_mat_f16 case niansa 2023-06-30 12:47:17 +02:00
  • 0dc5f2f2ba Fixed mul mat dispatch size niansa 2023-06-30 12:31:13 +02:00
  • f093bf2e5e Minor MUL_MAT fix and implemented DIAG_MASK_INF niansa 2023-06-30 12:19:29 +02:00
  • 964fe8c546 Added mul_mat (needs fixes) niansa 2023-06-30 11:47:10 +02:00
  • 600bf6d929 Test-based VRAM scratch size + context adjustment JohannesGaessler 2023-06-30 11:35:30 +02:00
  • 8e215e4d9f add support of baichuan-7b Judd 2023-06-30 15:29:26 +08:00
  • 86469d15c4 fix for yr-rocm, large gpu scratch Concedo 2023-06-30 12:40:08 +08:00
  • dedd2067e8 convert: spike out xgen support Aman Karmani 2023-06-29 19:08:57 -07:00
  • b95016c19b add newline jwj7140 2023-06-30 01:08:34 +09:00
  • 1347d3acc0 another missing flag? Concedo 2023-06-30 00:02:18 +08:00
  • 396f857021 make platform appropriate library Concedo 2023-06-29 23:50:48 +08:00
  • f50c73a0b2 readme Concedo 2023-06-29 23:45:57 +08:00
  • d7435fe320 fix whitespace, edit README.md jwj7140 2023-06-30 00:03:02 +09:00
  • ad945e2c41 make instructions clearer Concedo 2023-06-29 22:13:39 +08:00
  • 64aba0a151 update readme Concedo 2023-06-29 21:42:04 +08:00
  • b8c8dda75f Use unsigned for random seed (#2006) master-b8c8dda Howard Su 2023-06-29 21:15:15 +08:00
  • f09debb1ec remove debug Concedo 2023-06-29 20:54:56 +08:00
  • 966d736582 revert cublasLt removal Concedo 2023-06-29 20:51:02 +08:00
  • 10a2bdfaf1 Merge remote-tracking branch 'upstream/ik/context_extend' into concedo_experimental Concedo 2023-06-29 20:35:17 +08:00
  • 749d6179a8 Snake case all functions niansa 2023-06-29 14:23:00 +02:00
  • c7c6e522e7 bigger scratch buffers for bigger context Concedo 2023-06-29 19:43:23 +08:00
  • 86b061b98c wip on unified cublas integration, add all the small libraries but exclude the large ones Concedo 2023-06-29 18:35:31 +08:00
  • c2f1ed6556 fix compile errors Concedo 2023-06-29 17:54:12 +08:00
  • dff5575647 Merge branch 'master' into concedo_experimental Concedo 2023-06-29 17:35:28 +08:00
  • 5ac68ccacb Cleanups niansa 2023-06-29 11:14:21 +02:00
  • 4b3a1282f0 Add flag for lowvram directly into cublas launch param Concedo 2023-06-29 17:07:31 +08:00
  • 13c8d87111 breaking change: deprecate GGML_TASK_INIT and GGML_TASK_FINALIZE. Will not be scheduled unless explicitly enabled. mqy 2023-06-29 17:06:00 +08:00
  • 746f5fa9e9 update lite Concedo 2023-06-29 16:44:39 +08:00
  • f8baad235d use struct for grammar elements and add Unicode support Evan Jones 2023-06-20 00:06:38 -04:00
  • 96a712ca1b Porting the improved K-Quant CUDA kernels to OpenCL (#1966) LostRuins 2023-06-29 11:56:43 +08:00