Commit graph

  • ea858eee03 first fixes. Julia Longtin 2024-03-23 15:56:47 +00:00
  • feed51c3f4 attempt to speed up float clearing. Julia Longtin 2024-03-23 15:55:00 +00:00
  • 2ed306623c allow using code from ggml-phi-knc-dot_q5_K_q8_K.c Julia Longtin 2024-03-23 15:02:56 +00:00
  • d5f39c3caa force to compile. Julia Longtin 2024-03-23 14:58:33 +00:00
  • b794e48ff8 tell ggml-common.h to export what we want. Julia Longtin 2024-03-23 14:49:35 +00:00
  • 2c5daab90f pull in ggml specific types. Julia Longtin 2024-03-23 14:38:15 +00:00
  • 7080280c5b import stdio.h for size_t. Julia Longtin 2024-03-23 14:29:59 +00:00
  • 96dce97091 import stdint.h for sizeSt. Julia Longtin 2024-03-23 14:28:29 +00:00
  • 0e6c910db9 begin work on targeting dot_q5_K_q8_K. Julia Longtin 2024-03-23 14:19:47 +00:00
  • 16cbe5dd81 be more specific about the length of our list of run amounts. Julia Longtin 2024-03-21 20:38:49 +00:00
  • c605e951dc spacing changes. Julia Longtin 2024-03-21 18:36:25 +00:00
  • 56be29fc58 formatting changes. Julia Longtin 2024-03-20 21:34:12 +00:00
  • 97c69835dc use the same header as ggml.c, and remove some warnings. Julia Longtin 2024-03-20 21:12:22 +00:00
  • 580a347e59 remove intrinsics import, and use upConv to save 12 bytes of memory transit. Julia Longtin 2024-03-20 20:15:16 +00:00
  • 9ba28eaed3 Update ggml-phi-knc.c Julia Longtin 2024-03-17 21:36:14 +00:00
  • 72e2b13185 add a benchmark / test binary. Julia Longtin 2024-03-17 21:20:14 +00:00
  • 6f699fc98d merge from upstream Julia Longtin 2024-03-17 21:15:32 +00:00
  • 926b0e8076 Update ggml.c Julia Longtin 2024-03-16 14:17:21 +00:00
  • 6e1b77ad58 Update ggml.c Julia Longtin 2024-03-16 14:15:51 +00:00
  • f940c96aac Update ggml.c Julia Longtin 2024-03-16 14:13:22 +00:00
  • 2458643dac implement F32 dot products. Julia Longtin 2024-03-16 14:05:03 +00:00
  • 59ce785f61 import intrinsics. Julia Longtin 2024-03-13 19:26:54 +00:00
  • c08ddb831f use right type, and define GGML_F32_VEC_ZERO. Julia Longtin 2024-03-13 19:23:53 +00:00
  • 25095cac23 try to implement one intrinsic Julia Longtin 2024-03-13 19:18:10 +00:00
  • 8f6e535edc try to detect the PHI cross compiler in make. Julia Longtin 2024-03-12 21:54:38 +00:00
  • f7f174ecc9 try to detect the PHI cross compiler in make. Julia Longtin 2024-03-12 21:40:46 +00:00
  • b9e2f2a332 instead of checking on glibc, check on SYS_getcpu Julia Longtin 2024-03-12 21:07:10 +00:00
  • 78291d93b9 handle the case that we have no glibc on the PHI. Julia Longtin 2024-03-12 21:02:14 +00:00
  • 757f952046 add detection of Xeon PHI: Knights Corner. Julia Longtin 2024-03-12 20:57:43 +00:00
  • 104abc6df2 fix whitespace Steve Grubb 2024-05-13 18:00:58 -04:00
  • 0a9eb98ee0 fix whitespace Steve Grubb 2024-05-13 17:59:55 -04:00
  • 94061d58e7 llama : disable pipeline parallelism with nkvo sl/disable-pp-nkvo slaren 2024-05-13 23:17:53 +02:00
  • 28ddd2c474 ChatON: ChatParts dump returns info str rather than direct logging HanishKVC 2024-05-14 02:11:10 +05:30
  • 4dfd10a40d ChatON: Move core templating/tagging code into ChatTemplates class HanishKVC 2024-05-14 01:49:38 +05:30
  • 600653dae2 ChatON:Optional control of MsgCntBasedTagging HanishKVC 2024-05-14 01:27:24 +05:30
  • 6e13c0c87e ChatON:Control SystemMsgSuffix+End tags only wrt 1st system msg HanishKVC 2024-05-14 01:19:04 +05:30
  • 3fcaf19967 ChatON+:Multi4Single: applyGlobalIfAny flag wrt templating api HanishKVC 2024-05-14 00:58:16 +05:30
  • 8165bd4035 ChatON:WIP:chaton_tmpl_apply_single build on multi msg tagging HanishKVC 2024-05-14 00:44:47 +05:30
  • 8e04c3ce97 server: free sampling contexts on exit Steve Grubb 2024-05-13 15:07:47 -04:00
  • fe0c9ce646 ChatON:BasicCheck+:return a string with info, dont directly log HanishKVC 2024-05-14 00:08:33 +05:30
  • a31473c023 Merge 71c98cc3bd into ee52225067 Johannes Gäßler 2024-05-13 14:20:53 -04:00
  • ee52225067 convert-hf : support direct Q8_0 conversion (#7234) compilade 2024-05-13 14:10:51 -04:00
  • cfe659d90a feat: Add option for adding generation prompt teleprint-me 2024-05-13 13:12:38 -04:00
  • da96fdd15f patch: Apply patch for advisories/GHSA-56xg-wfcc-g829 teleprint-me 2024-05-13 13:10:44 -04:00
  • 614d3b914e llama : less KV padding when FA is off (#7257) b2871 Georgi Gerganov 2024-05-13 17:15:15 +03:00
  • 30e70334f7 llava-cli: fix base64 prompt (#7248) b2870 k.h.lai 2024-05-13 22:02:36 +08:00
  • cbca75cb2c llama : less KV padding when FA is off Georgi Gerganov 2024-05-13 16:38:46 +03:00
  • 29d5012042 Add provisions for windows support for BF16 code including CMake provision for enabling AVX512_BF16 Srihari-mcw 2024-05-13 06:05:29 -07:00
  • efbb87dba6 ChatON:ChatTemplates:TmplBasicCheck HanishKVC 2024-05-13 17:50:15 +05:30
  • 0cfe99076d ChatON:ChatTemplates: TmplExists, TmplGetKey, TmplRoleGetKeys HanishKVC 2024-05-13 16:51:07 +05:30
  • 3176c2f561 solve process prompt bug Leo Zhang 2024-05-13 19:32:50 +08:00
  • 1c570d8bee perplexity: add BF16 vs. FP16 results (#7150) Johannes Gäßler 2024-05-13 13:03:27 +02:00
  • 184ac322e3 ChatON: Make json_get efficient and flexible wrt its calling HanishKVC 2024-05-13 16:04:42 +05:30
  • 948f4ec7c5 [SYCL] rm wait() (#7233) b2868 Neo Zhang 2024-05-13 18:11:26 +08:00
  • 22b5f6b71f Merge branch 'master' of https://github.com/JoanFM/llama.cpp into feat-jina-embeddings-v2-zh Joan Martinez 2024-05-13 10:41:48 +02:00
  • 9aa672490c llama : rename jina tokenizers to v2 (#7249) b2867 Joan Fontanals 2024-05-13 10:35:14 +02:00
  • ea0f7df2fb Merge branch 'refactor-jina-rename' of https://github.com/JoanFM/llama.cpp into feat-jina-embeddings-v2-zh Joan Martinez 2024-05-13 10:29:55 +02:00
  • fb83012096 refactor: keep refactoring non-breaking Joan Martinez 2024-05-13 10:28:26 +02:00
  • 22a0113299 fix: fix alignment Joan Martinez 2024-05-13 10:27:23 +02:00
  • 0771b175aa Merge branch 'refactor-jina-rename' of https://github.com/JoanFM/llama.cpp into feat-jina-embeddings-v2-zh Joan Martinez 2024-05-13 09:46:23 +02:00
  • 8957cacd98 refactor: rename jina tokenizers to v2 Joan Martinez 2024-05-13 09:40:46 +02:00
  • d0a99aa424 Merge branch 'master' of https://github.com/JoanFM/llama.cpp into feat-jina-embeddings-v2-zh Joan Martinez 2024-05-13 09:38:04 +02:00
  • 33a004e9cc llama : more metal-friendly KV cache PAD mlx-challenge Georgi Gerganov 2024-04-08 12:49:04 +03:00
  • eb7554ca3b ChatON: Avoid -> to match simpcfg as well as corresponding keys HanishKVC 2024-05-13 10:37:14 +05:30
  • 2cafc209bd llava-cli: fix base64 prompt Adriankhl 2024-05-13 11:31:32 +08:00
  • b1f8af1886 convert.py: Outfile default name change and additional metadata support (#4858) Brian 2024-05-13 12:56:47 +10:00
  • e586ee4259 change default temperature of OAI compat API from 0 to 1 (#7226) b2865 Benjamin Findley 2024-05-12 19:40:08 -07:00
  • 005c1b48f0 convert.py: fix metadata format to sync with LLM_KV_NAMES in llama.cpp brian khuu 2024-05-13 12:26:00 +10:00
  • 58551d0bd2 chore: Apply updates to vocab models teleprint-me 2024-05-12 21:50:36 -04:00
  • 932ab05d69 Remove qwen and fix mauled imports teleprint-me 2024-05-12 21:44:31 -04:00
  • 4bd6227f46 convert.py: typo fix brian khuu 2024-05-13 11:31:36 +10:00
  • caf5fc35b8 convert.py: don't stringify Metadata load method output brian khuu 2024-05-13 11:30:48 +10:00
  • fc0007eca5 Merge branch 'master' into add-stablelm-hash teleprint-me 2024-05-12 21:27:12 -04:00
  • f8bb223924 refactor: Remove rename from display to render and return result instead of printing teleprint-me 2024-05-12 21:17:04 -04:00
  • 214e9e6f0b refactor: Add logging debug and clean up logger implementation teleprint-me 2024-05-12 21:07:05 -04:00
  • 6be3576e01 feat: Add sane defaults and options for setting special tokens teleprint-me 2024-05-12 20:48:29 -04:00
  • fa0b0b10cc feat: Allow toggling verbosity teleprint-me 2024-05-12 20:39:55 -04:00
  • 668c7ee6c5 refactor: Use render template instead of format teleprint-me 2024-05-12 20:37:27 -04:00
  • 8b9ed888bc patch: Handle how templates are rendered if no system prompt is allowed teleprint-me 2024-05-12 20:17:35 -04:00
  • 4a018e706f feat: Add assistant turn teleprint-me 2024-05-12 20:08:37 -04:00
  • bf5154f9bc docs: Fix filename in docstring and remove return type from main teleprint-me 2024-05-12 20:07:30 -04:00
  • cbf75894d2 [SYCL] Add oneapi runtime dll files to win release package (#7241) b2864 Neo Zhang 2024-05-13 08:04:29 +08:00
  • 0d5cef78ae [SYCL] update CI with oneapi 2024.1 (#7235) Neo Zhang 2024-05-13 08:02:55 +08:00
  • f572213c34 Merge branch 'ggerganov:master' into gguf-model-template Austin 2024-05-12 19:54:44 -04:00
  • eac2e83f9f gguf: Add example script for extracting chat template teleprint-me 2024-05-12 19:51:55 -04:00
  • 71c98cc3bd Server: enable lookup decoding Johannes Gäßler 2024-04-20 08:24:21 +02:00
  • 79b044b0c5 cuda : add half2 __shfl_xor() for ROCm 5.5 Engininja2 2024-05-12 15:18:18 -06:00
  • b7ec12ebf7 Merge branch 'master' into compilade/refactor-kv-cache Francis Couture-Harpin 2024-05-12 17:13:31 -04:00
  • 1bbf54a13c Remove unnecessary declaration Haggai Nuchi 2024-05-12 12:59:30 -07:00
  • d5b0bfbaec SimpCfg: Remove now unused SC_DEBUG, rather GroupKV uses equiv HanishKVC 2024-05-13 00:33:02 +05:30
  • 857570f8f8 SimpCfgTest: Update dump usage to GKV return string semantic HanishKVC 2024-05-13 00:20:58 +05:30
  • 9249649fb3 ChatON+TestPrgs: Use specific log files HanishKVC 2024-05-12 23:59:27 +05:30
  • dc685be466 CUDA: add FP32 FlashAttention vector kernel (#7188) b2862 Johannes Gäßler 2024-05-12 19:40:45 +02:00
  • 3d33d62924 SimpCfg: Move testing code into its own file in tests HanishKVC 2024-05-12 22:44:58 +05:30
  • 93b9baee73 convert-hf : reduce stacked MoE conversion RAM usage by a third Francis Couture-Harpin 2024-05-12 13:21:35 -04:00
  • f2dd1263fd GroupKV: Move test code into its own file in tests HanishKVC 2024-05-12 22:10:40 +05:30
  • 65a1a58562 convert-hf : add missing ftype to Baichuan and Xverse compilade/q8_0-convert-hf Francis Couture-Harpin 2024-05-12 12:56:03 -04:00
  • 6048218383 SimpCFG: COnvert to GroupKV extended version HanishKVC 2024-05-12 21:58:59 +05:30
  • 6f1b63606f cmake : fix version cmp (#7227) b2861 Georgi Gerganov 2024-05-12 18:30:23 +03:00
  • f4f5b7ac56 Removed changes Maximilian Winter 2024-05-12 16:27:32 +02:00