Commit graph

  • 09f5d3c4ab Revert scripts work crasm 2024-01-19 17:44:10 -05:00
  • 09db1a7cf3 Merge branch 'gg/flash-attn' of https://github.com/ggerganov/llama.cpp into flash-attn-cuda FSSRepo 2024-01-19 17:38:47 -05:00
  • 66510528f7 Add venv to ci/run.sh crasm 2024-01-19 16:44:47 -05:00
  • 32a392fe68 try a differerent fix ceb/fix-msvc-build Jared Van Bortel 2024-01-19 17:10:23 -05:00
  • e15c61635f perplexity : fix MSVC build after #5020 Jared Van Bortel 2024-01-19 16:38:29 -05:00
  • 4a3bc1522e py : linting with mypy and isort ceb/restore-convert Jared Van Bortel 2024-01-19 12:38:18 -05:00
  • ffdd051ab5 convert : update GGML script to use VocabFactory Jared Van Bortel 2024-01-19 12:27:58 -05:00
  • cb4605fe47 convert : partially revert PR #4818 Jared Van Bortel 2024-01-19 12:19:25 -05:00
  • 381ee19572 finetune : fix ggml_allocr lifetimes (tmp workaround) (#5033) Uzo Nweke 2024-01-19 13:20:50 -05:00
  • 38f2e6e7c3 remove ggml_allocr_free as suggested in issue #4791 Uzo Nweke 2024-01-19 13:12:27 -05:00
  • fa7ebcca99 ggml : fix GQA support in ggml_flash_attn_ext Georgi Gerganov 2024-01-19 20:06:26 +02:00
  • a5cacb22b2 imatrix : add README.md Georgi Gerganov 2024-01-19 15:24:47 +02:00
  • 888f1f5439 support minLength and maxLength in JSON schema grammar converter nopperl 2024-01-19 13:47:43 +01:00
  • 9b75cb2b3c llama : support upcoming Qwen2 (#5037) Shijie 2024-01-19 19:53:13 +08:00
  • de9a147df1 py : fix flake8 lint Georgi Gerganov 2024-01-19 13:52:22 +02:00
  • 7051aacfac winogrande: evaluate log-probs in parallel (#5036) Kawrakow 2024-01-19 11:39:11 +02:00
  • 3ca015324e support qwen2 simonJJJ 2024-01-19 17:34:49 +08:00
  • 2b3b999cac llama : add CodeShell support (#5016) chiranko 2024-01-19 17:07:27 +08:00
  • 993fba8180 perplexity: avoid unnecessary alloocations and logit copies (#5035) Kawrakow 2024-01-19 11:02:39 +02:00
  • e54fcbcbb7 winogrande: evaluate log-probs in parallel Iwan Kawrakow 2024-01-19 10:58:15 +02:00
  • 8b20858e5e perplexity : faster Winogrande via batching (#5024) Georgi Gerganov 2024-01-19 10:45:06 +02:00
  • 9e4ad80cfc perplexity : only tokenize selected tasks for Winogrande Georgi Gerganov 2024-01-19 10:38:50 +02:00
  • a2ac9e427f perplexity: avoid unnecessary alloocations and logit copies Iwan Kawrakow 2024-01-19 10:16:16 +02:00
  • 9a11611a7d scripts : add some fancy conversion from snake_case to PascalCase crasm 2024-01-19 02:09:57 -05:00
  • 9303bbf1b1 delete depthwise_conv_2d and permute_cpy relative code, replace the two by the existed functions, and opt ldp definition, support LLAMA_PERF option for CMake Chenxiaotao03 2024-01-19 12:50:01 +08:00
  • cc4ff992f5 llama.cpp: fix codeshell with NeoX rope chiranko 2024-01-19 11:02:17 +08:00
  • f783c5971f Fix issue with alloc causing max_compute_size to be calculated Uzo Nweke 2024-01-18 21:40:19 -05:00
  • a2c94ae5be exposed exponent_val in dynamic temp sampler l3utterfly 2024-01-19 10:05:59 +09:00
  • 633c502d56 Resolve ggml_backend_sched_eval_callback visibility Britt Lewis 2024-01-18 18:44:43 -05:00
  • a34648d35e scripts : switch to PascalCase for functions crasm 2024-01-18 18:21:19 -05:00
  • 57e2a7a52a llama : fix falcon arch for tied output embeddings (#4978) John 2024-01-18 23:12:15 +01:00
  • 1453215165 kompute : fix ggml_add kernel ceb/nomic-vulkan-fix-add Georgi Gerganov 2024-01-19 00:09:16 +02:00
  • 85070cff2c Update llama.cpp John 2024-01-18 22:59:21 +01:00
  • 610394fff8 fix supported ops for kompute backend Jared Van Bortel 2024-01-18 15:32:55 -05:00
  • 9b6ea4263a cmake : add ggml public headers (#5011) Georgi Gerganov 2024-01-18 23:36:07 +02:00
  • 7addf2b878 never try to evaluate an empty command buffer Jared Van Bortel 2024-01-18 16:11:00 -05:00
  • bb58b0e76f perplexity : remove unused function Georgi Gerganov 2024-01-18 22:40:58 +02:00
  • 821f0a271e server : defer tasks when "slot unavailable" (#5018) Xuan Son Nguyen 2024-01-18 21:33:05 +01:00
  • 96d7f56d29 llama : fix mlock with no-mmap with Metal (#5025) slaren 2024-01-18 21:12:15 +01:00
  • c8e5f9c7cb llama : fix mlock with no-mmap with Metal slaren 2024-01-18 20:45:13 +01:00
  • 2d5419d08a imatrix : fix assert for src0 non-cont check Georgi Gerganov 2024-01-18 21:45:51 +02:00
  • c0f3474ed5 Fix compiler warnings 0cc4m 2024-01-18 20:44:00 +01:00
  • 0f3ed789b3 perplexity : faster Winogrande via batching Georgi Gerganov 2024-01-18 21:35:37 +02:00
  • d391ae9b49 perplexity : fix winogrande N tasks option Georgi Gerganov 2024-01-18 20:49:00 +02:00
  • e9240cdfa0 scripts : add get-winogrande.sh Georgi Gerganov 2024-01-18 20:45:39 +02:00
  • f84c54fe23 Fix warning about empty C function parameters 0cc4m 2024-01-18 19:34:56 +01:00
  • 1811c4ec9b Replace uint64_t(-1) with UINT64_MAX, rename function for clarity 0cc4m 2024-01-18 19:34:21 +01:00
  • b46757735d convert.py : fix llama/llama2 conversion due to vocab_size=-1 (#5019) David Sommers 2024-01-18 12:20:59 -05:00
  • 3e945cc1e9 HellaSwag: speed up by parallelizing log-prob evaluation (#5020) Kawrakow 2024-01-18 19:18:21 +02:00
  • 16bc3c3be8 sync op_rope_f16 with recent op_rope_f32 changes Jared Van Bortel 2024-01-18 11:56:00 -05:00
  • a1c004ef2e ggml : add ggml_flash_attn_ext API Georgi Gerganov 2024-01-18 17:42:55 +02:00
  • 0f1a958a51 actually fix this assertion Jared Van Bortel 2024-01-18 11:48:27 -05:00
  • a97935e098 clean up old backend code Jared Van Bortel 2024-01-18 11:48:12 -05:00
  • 696faa8660 kompute : fix rope_f32 and scale ops (#5008) Georgi Gerganov 2024-01-18 18:49:39 +02:00
  • 2c36544741 convert.py : fix llama/llama2 conversion due to vocab_size=-1 - take 2 David Sommers 2024-01-18 11:35:02 -05:00
  • e53de2866a fix compilation FSSRepo 2024-01-18 11:27:07 -05:00
  • ccc78a200e hellaswag: speed up even more by parallelizing log-prob evaluation ik/faster_hellaswag Iwan Kawrakow 2024-01-18 18:25:29 +02:00
  • 2d14b22a99 Merge upstream changes, implement basic vulkan backend 0cc4m 2024-01-18 16:54:12 +01:00
  • ad19812cda perplexity : faster HellaSwag via batching (#5017) Georgi Gerganov 2024-01-18 15:33:01 +02:00
  • 558cd1d631 remove unnecessary log Xuan Son Nguyen 2024-01-18 14:21:24 +01:00
  • bf0daf49d6 server: defer task when no slot is available Xuan Son Nguyen 2024-01-18 14:19:52 +01:00
  • 9df62c25f7 perplexity : remove HellaSwag restruction for n_batch Georgi Gerganov 2024-01-18 14:58:13 +02:00
  • 64d173bc9c perplexity : option to specify max batched tasks via n_parallel Georgi Gerganov 2024-01-18 14:43:33 +02:00
  • 30ebd94723 perplexity : add comments Georgi Gerganov 2024-01-18 14:20:11 +02:00
  • 0e4e58ff1b Merge branch 'master' into gg/hellaswag-batched Georgi Gerganov 2024-01-18 14:02:19 +02:00
  • af309010ab perplexity : no need for decode_helper Georgi Gerganov 2024-01-18 13:57:58 +02:00
  • 4351c4632d perplexity : clean-up Georgi Gerganov 2024-01-18 13:51:02 +02:00
  • 682986a08e Add Winogrande evaluation (#5015) Kawrakow 2024-01-18 13:46:27 +02:00
  • baa5279d02 perplexity : faster HellaSwag Georgi Gerganov 2024-01-18 13:42:25 +02:00
  • d70e48dedf llama: add codeshell support chiranko 2024-01-18 10:00:52 +00:00
  • dcad445d0c scritps : add helper script to get hellaswag data in txt format Georgi Gerganov 2024-01-18 11:44:49 +02:00
  • 04fee216d3 scripts : stub out new ci-run.sh script crasm 2024-01-18 03:54:58 -05:00
  • 1e605f4102 metal : fix memory leak, dangling pointer and unused autorel (#5007) Paul Tsochantaris 2024-01-18 08:47:24 +00:00
  • e3a17dcb64 winogrande: add dataset instructions Iwan Kawrakow 2024-01-18 10:17:02 +02:00
  • f5f46ebc64 scripts : add lib.sh and lib_test.sh crasm 2024-01-18 03:05:08 -05:00
  • e0d4439871 winogrande: improving Iwan Kawrakow 2024-01-18 09:57:30 +02:00
  • 2605b92027 winogrande: somewhat better Iwan Kawrakow 2024-01-18 09:23:04 +02:00
  • e1f91ae24a removed unused temp parameter in llama_sample_entropy l3utterfly 2024-01-18 09:02:10 +09:00
  • 011a74572e Merge branch 'master' into dynamic-temp l3utterfly 2024-01-18 08:59:55 +09:00
  • 1c98f9933c Reverting symlinks Paul Tsochantaris 2024-01-17 21:52:02 +00:00
  • 130eac8660 SPM header potential fix Paul Tsochantaris 2024-01-17 21:40:29 +00:00
  • f7bcfb0566 cuda: add flash attention + test FSSRepo 2024-01-17 16:38:28 -05:00
  • 681f6a1f7c kompute : fix rope_f32 and scale ops Georgi Gerganov 2024-01-17 22:50:56 +02:00
  • f470042e71 Metal memory: Small memory leak on init, dangling pointer, and unused autorelease pool in graph compute Paul Tsochantaris 2024-01-17 20:26:41 +00:00
  • 09db8bd598 winogrande: simple implementation Iwan Kawrakow 2024-01-17 21:40:52 +02:00
  • 02b9bafe29 kompute : ignore exceptions in ggml_vk_available_devices (#12) Jared Van Bortel 2024-01-17 13:47:03 -05:00
  • 6b6916b215 sync : ggml Georgi Gerganov 2024-01-17 20:54:50 +02:00
  • 38566680cd ggml : add IQ2 to test-backend-ops + refactoring (#4990) Georgi Gerganov 2024-01-17 18:54:56 +02:00
  • dcf8dc7292 Added support ccache for speedup recompilation Herman Semenov 2024-01-17 16:52:19 +00:00
  • ba69bbc84c imatrix : offload to GPU support (#4957) Georgi Gerganov 2024-01-17 18:46:30 +02:00
  • 2917e6b528 Merge branch 'master' into gg/imatrix-gpu-4931 gg/imatrix-gpu-4931 Georgi Gerganov 2024-01-17 18:41:47 +02:00
  • 44a1a4a41a backend : add eval callback (#4935) Georgi Gerganov 2024-01-17 18:39:41 +02:00
  • c918fe8dca metal : create autorelease pool during library build (#4970) Georgi Gerganov 2024-01-17 18:38:39 +02:00
  • 0f83e727af py : fix whitespace Georgi Gerganov 2024-01-17 18:37:36 +02:00
  • 06b4979149 test : simplify Georgi Gerganov 2024-01-17 18:15:15 +02:00
  • de9b0bbbe4 add sanity check and fix kompute teardown order Jared Van Bortel 2024-01-17 10:09:27 -05:00
  • f5f6e2729e Merge d3f155733f into 4f4bf35f46 ct-clmsn 2024-01-17 21:47:39 +08:00
  • 4f4bf35f46 py : fix missing added_tokens_dict for SPM and BPE vocabs (#4971) Georgi Gerganov 2024-01-17 15:45:03 +02:00
  • 23742deb5b py : fix padded dummy tokens (I hope) gg/fix-spm-added-tokens-dict-4958 Georgi Gerganov 2024-01-17 15:44:22 +02:00
  • a0dee649b8 Update llama.cpp John 2024-01-17 14:38:54 +01:00