Commit graph

  • 03bdc36e8b minor spaces Pierrick HYMBERT 2024-04-12 22:01:37 +02:00
  • 647a11b1dc eval-callback: also print last n elements of each dimension Pierrick HYMBERT 2024-04-12 21:34:46 +02:00
  • ecbfb1b584
    Wrong input was being fed to moe layer. This needs to be corrected Pierrick Hymbert 2024-04-12 21:41:14 +02:00
  • 542585fbea
    Is silu activation function applied to MODEL_TENSOR.FFN_GATE_EXP here? If so, we must change this to w1 for DBRX. Each expert in DBRX has 3 linear layers: w1, v1 and w2. For an input tensor x, output from the expert layer would be (silu(x.w1_t) * x.v1_t) . w2_t). Same math is also used in mixtral, only difference being DBRX uses v1 instead of w3 in mixtral. Pierrick Hymbert 2024-04-12 21:40:57 +02:00
  • bdc4efe17f
    Is silu activation function applied to MODEL_TENSOR.FFN_GATE_EXP here? If so, we must change this to w1 for DBRX. Each expert in DBRX has 3 linear layers: w1, v1 and w2. For an input tensor x, output from the expert layer would be (silu(x.w1_t) * x.v1_t) . w2_t). Same math is also used in mixtral, only difference being DBRX uses v1 instead of w3 in mixtral. Pierrick Hymbert 2024-04-12 21:40:47 +02:00
  • ab9a3240a9
    JSON schema conversion: faster repetitions, min/maxLength for strings, cap number length (#6555) b2664 Olivier Chafik 2024-04-12 19:43:38 +01:00
  • 2b7d7d5ee5 Server Side Swift Support Steven Prichard 2024-04-10 11:32:18 -05:00
  • 22faba62ff grammars: more test cases Olivier Chafik 2024-04-12 19:10:57 +01:00
  • ec913426be grammars: fix copy rule skipping (again) & display of expectations Olivier Chafik 2024-04-12 18:56:14 +01:00
  • 2d98ebf0f7
    Update common/grammar-parser.cpp Olivier Chafik 2024-04-12 18:25:20 +01:00
  • 9d8efa545f grammars: disallow a{,} (not allowed in regexps) Olivier Chafik 2024-04-12 18:10:50 +01:00
  • a9351b8f75 grammars: fix copy rule skipping Olivier Chafik 2024-04-12 17:58:16 +01:00
  • ffe321d01e grammars: pretty print rules and chars Olivier Chafik 2024-04-12 17:55:57 +01:00
  • 2e2df72383 grammars: improve test pretty print again Olivier Chafik 2024-04-12 17:40:22 +01:00
  • 0d7347f26e grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all) Olivier Chafik 2024-04-12 17:34:55 +01:00
  • 137fbb8f59 Merge remote-tracking branch 'origin/master' into sl/moe-rework-2 slaren 2024-04-12 18:27:51 +02:00
  • 47c3867b6d minor slaren 2024-04-10 13:19:55 +02:00
  • 8938a050cc grammar: parsing tests w/ natural pretty print of updated expectations Olivier Chafik 2024-04-12 17:17:56 +01:00
  • fbbc030ba9
    metal : unify mul_mv_id kernels (#6556) b2663 slaren 2024-04-12 18:13:20 +02:00
  • 0ceb69afbc grammars: refactor parser test Olivier Chafik 2024-04-12 17:00:31 +01:00
  • 6b5518c9da grammars: uniform use of int for min & max Olivier Chafik 2024-04-12 16:47:44 +01:00
  • 39d44279de metal : try to unify mul_mv_id kernels slaren 2024-04-09 01:19:00 +02:00
  • 9d9b5a34f6 grammars: nit Olivier Chafik 2024-04-12 16:15:24 +01:00
  • de0fd3f7f0 grammars: document new repetition operators Olivier Chafik 2024-04-12 16:12:16 +01:00
  • f2030e3210 grammars: handle x{n} and fix x{n,n} Olivier Chafik 2024-04-12 16:11:10 +01:00
  • 01604690c1 grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates Olivier Chafik 2024-04-12 15:11:47 +01:00
  • a40156a077 fix: use other tensors Joan Martinez 2024-04-12 16:09:18 +02:00
  • 9b935e9b80
    Update README.md 4ce 2024-04-12 10:06:56 -04:00
  • dfd4eb3aa6 json: rm useless assert & ggml.h import Olivier Chafik 2024-04-12 14:01:19 +01:00
  • 4cc120c744
    infill : add download instructions for model (#6626) Daniel Bevenius 2024-04-12 14:11:46 +02:00
  • 24ee66ed0d
    server : coherent log output for KV cache full (#6637) b2661 Pierrick Hymbert 2024-04-12 13:49:21 +02:00
  • 6fd5ad597f server: cap n_predict if not set to n_ctx_train Pierrick HYMBERT 2024-04-12 13:38:02 +02:00
  • e4e9bc0081 server: coherent log output for KV cache full Pierrick HYMBERT 2024-04-12 13:09:04 +02:00
  • 747d17a62c feat: create tensors for Jina architecture Joan Martinez 2024-04-12 12:47:48 +02:00
  • 91c736015b
    llama : add gguf_remove_key + remove split meta during quantize (#6591) b2660 jiez 2024-04-12 18:45:06 +08:00
  • 907df4459c Hack test-bench Aidan 2024-04-11 17:20:32 +01:00
  • 675c0f054e Fix cuda mul mat for pascal cc==610 xcnick 2024-04-12 10:25:40 +00:00
  • ba90d5b7e7 json: rm dead code ochafik 2024-04-12 10:59:54 +01:00
  • 4b8b7a6c01 keep to support mmap() Jianyu Zhang 2024-04-12 17:19:45 +08:00
  • 23d58145ce
    squash! infill : add download instructions for model Daniel Bevenius 2024-04-12 11:19:11 +02:00
  • 77e9703de7 refactor the solution, use host buf to fix it, instead of disable mmap Jianyu Zhang 2024-04-12 17:16:36 +08:00
  • 64e305901e json: remove recursion in opt_repetitions (avoids Python stack overflow) ochafik 2024-04-12 10:03:27 +01:00
  • 7e54166562
    Merge branch 'master' into fix_memcpy_crash Neo Zhang Jianyu 2024-04-12 16:55:03 +08:00
  • 5c4d767ac0
    chore: Fix markdown warnings (#6625) Rene Leonhardt 2024-04-12 10:52:36 +02:00
  • ef21ce4ccb
    imatrix : remove invalid assert (#6632) b2658 Georgi Gerganov 2024-04-12 11:49:58 +03:00
  • 8b495540fa
    imatrix : remove invalid assert gg/imatrix-remove-assert Georgi Gerganov 2024-04-12 11:45:12 +03:00
  • dee7f8d692
    Correct free memory and total memory. (#6630) b2657 MasterYi1024 2024-04-12 16:28:12 +08:00
  • 81da18e71c
    eval-callback: use ggml_op_desc to pretty print unary operator name (#6631) b2656 Pierrick Hymbert 2024-04-12 10:26:47 +02:00
  • 958bdda559 Merge remote-tracking branch 'origin/master' into json-faster-repetitions2 ochafik 2024-04-12 09:21:54 +01:00
  • 9ed2737acc
    ci : disable Metal for macOS-latest-cmake-x64 (#6628) b2655 Georgi Gerganov 2024-04-12 11:15:05 +03:00
  • a8c80d2a4e
    ci : disable Metal for macOS-latest-cmake-x64 Georgi Gerganov 2024-04-12 10:09:50 +03:00
  • 5a759a0f80 fix compile error in other os Jianyu Zhang 2024-04-12 15:39:15 +08:00
  • 391decad1c refactor to disable mmap for SYCL backend Jianyu Zhang 2024-04-12 15:23:31 +08:00
  • 9e65768a57 eval-callback: use ggml_op_desc to pretty print unary operator name Pierrick HYMBERT 2024-04-12 09:17:53 +02:00
  • 2bb0d2495f Correct free memory and total memory. MasterYi 2024-04-12 15:16:24 +08:00
  • fc89feeddf model: convert-hf-to-gguf.py remove tiktoken Pierrick HYMBERT 2024-04-11 14:27:15 +02:00
  • 5b64d6f4d3
    infill : add download instructions for model Daniel Bevenius 2024-04-12 07:45:23 +02:00
  • 05e135ad39
    chore: Fix markdown warnings Rene Leonhardt 2024-04-11 15:36:02 +02:00
  • d053698c12 disable mmap to fix memcpy crash, add missed cmd in guide, fix softmax Jianyu Zhang 2024-04-12 09:45:05 +08:00
  • 04a5ac211e
    Optimization: eliminate addition of redundant stacks when advancing grammar. (#6616) Clint Herron 2024-04-11 21:44:50 -04:00
  • ed13d47c1b json: simplify DOT {"type": "string", "pattern": "^.$"} ochafik 2024-04-12 02:39:58 +01:00
  • f7001ccc5a
    As suggested by @slaren, disabling Metal for test to fix CI build on OSX from #6576 (#6619) Clint Herron 2024-04-11 17:44:48 -04:00
  • d031c8f1b6 As suggested by @slaren, disabling Metal for test to fix CI build on OSX from #6576 Clint Herron 2024-04-11 17:04:18 -04:00
  • a474f50ebb
    Refactor Error Handling for CUDA (#6575) Nikolas 2024-04-11 21:56:29 +02:00
  • 80553e53c0 Optimization: eliminate addition of redundant stacks when advancing grammar. Clint Herron 2024-04-11 13:57:10 -04:00
  • cbaadc9294
    grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609) Olivier Chafik 2024-04-11 19:47:34 +01:00
  • 1bbdaf6ecd
    ci: download artifacts to release directory (#6612) Hugo Roussel 2024-04-11 19:52:21 +02:00
  • 1e0f466920 grammars: simpler syntax (no swap) Olivier Chafik 2024-04-11 18:51:19 +01:00
  • 3224319b1a ci: download artifacts to release directory Hugo Roussel 2024-04-10 23:54:35 +02:00
  • cb77a8db1d grammars: update gbnf-validator.cpp Olivier Chafik 2024-04-11 16:47:19 +01:00
  • db787a4489 grammars: fix test (api changed) Olivier Chafik 2024-04-11 16:20:41 +01:00
  • 763b41e2aa grammars: fix missing sig change in llama.h Olivier Chafik 2024-04-11 15:47:00 +01:00
  • 47e37dd955 grammars: reuse new_stacks Olivier Chafik 2024-04-11 15:11:40 +01:00
  • 3732ad9c22 grammars: reserve rejects & next candidates Olivier Chafik 2024-04-10 23:05:14 +01:00
  • fa908c0820 Free kv memory z5269887 2024-04-11 22:05:42 +08:00
  • ad0710aa8b Correct loop range for gguf_remove_key and code format z5269887 2024-04-11 21:26:19 +08:00
  • 29ed5d60e1 Find metadata key by enum z5269887 2024-04-11 21:25:06 +08:00
  • f4183afe6a
    scripts : add --outdir option to hf.sh (#6600) Daniel Bevenius 2024-04-11 15:22:47 +02:00
  • 06527c66c3 Merge remote-tracking branch 'origin/master' into hp/model/support-dbrx Pierrick HYMBERT 2024-04-11 14:55:25 +02:00
  • b804b1ef77
    eval-callback: Example how to use eval callback for debugging (#6576) Pierrick Hymbert 2024-04-11 14:51:07 +02:00
  • 12731d21c5 eval-callback: fix make toolchain Pierrick HYMBERT 2024-04-11 14:27:15 +02:00
  • 86a5d96fc6 feat: first things to do Joan Martinez 2024-04-11 14:27:15 +02:00
  • ee588a5c24 eval-callback: renamed from ggml-debug Pierrick HYMBERT 2024-04-11 14:10:47 +02:00
  • 28fd76ffd4 ggml-debug: remove block size Pierrick HYMBERT 2024-04-11 14:07:09 +02:00
  • f489c6bc0e
    squash! scripts : add --outdir option to hf.sh Daniel Bevenius 2024-04-11 13:40:31 +02:00
  • 8d7be2c986 ggml-debug: printing also the sum of each tensor Pierrick HYMBERT 2024-04-11 13:05:45 +02:00
  • bb359cdd7d
    gitignore : ggml-debug Georgi Gerganov 2024-04-11 13:58:58 +03:00
  • 6f4ccce978
    Update Makefile Nikolas 2024-04-11 11:23:18 +02:00
  • 4f61b3066e recover main.cpp zhangkaihuo 2024-04-11 15:48:59 +08:00
  • 582b13c966 for old config zhangkaihuo 2024-04-11 15:35:18 +08:00
  • 0d8017d706
    scripts : add --outdir option to hf.sh Daniel Bevenius 2024-04-11 08:15:48 +02:00
  • 87d5c3e496
    replace filtered characters with underscore Sigbjørn Skjæret 2024-04-11 03:59:14 +02:00
  • cfb820b343 ggml-debug: better tensor type support Pierrick HYMBERT 2024-04-10 23:23:22 +02:00
  • 3f8a93fb7b ci: add curl test Pierrick HYMBERT 2024-04-10 22:50:36 +02:00
  • 831c97efc7 common: allow the warmup to be disabled in llama_init_from_gpt_params Pierrick HYMBERT 2024-04-10 22:42:04 +02:00
  • 52a8e0640a ggml-debug: ci add test curl label Pierrick HYMBERT 2024-04-10 22:36:03 +02:00
  • f84473da64 ggml-debug: tests add the main label Pierrick HYMBERT 2024-04-10 22:18:45 +02:00
  • a42ebbd596 ggml-debug: add to make toolchain Pierrick HYMBERT 2024-04-10 22:12:10 +02:00
  • 0b3392887b doc: add a model: add a link to ggml-debug Pierrick HYMBERT 2024-04-10 21:57:48 +02:00
  • deadf29759 Merge remote-tracking branch 'origin/master' into hp/ggml/debug Pierrick HYMBERT 2024-04-10 21:55:23 +02:00
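
Note on the DBRX expert math quoted in commits 542585fbea and bdc4efe17f: those messages describe each expert as three linear layers (w1, v1, w2) whose output for an input x is (silu(x . w1_t) * x . v1_t) . w2_t. The following is a minimal pure-Python sketch of that formula only. It is not code from the repository; the function names, the plain-list representation, and the weight shapes (w1 and v1 as d_ff x d_model, w2 as d_model x d_ff) are assumptions made for illustration.

```python
import math

def silu(z):
    # SiLU activation: z * sigmoid(z), written to avoid overflow for large |z|.
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)

def matvec_t(x, w):
    # Computes x . w^T for a vector x (length d_in) and a weight
    # matrix w stored as d_out rows of length d_in.
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in w]

def expert_forward(x, w1, v1, w2):
    # Hypothetical helper: one expert, following the formula in the commit message.
    gate = [silu(g) for g in matvec_t(x, w1)]    # silu(x . w1_t)
    up = matvec_t(x, v1)                         # x . v1_t
    hidden = [g * u for g, u in zip(gate, up)]   # element-wise product
    return matvec_t(hidden, w2)                  # (...) . w2_t

# Usage with 2x2 identity weights (assumed shapes: d_model = d_ff = 2).
identity = [[1.0, 0.0], [0.0, 1.0]]
out = expert_forward([0.0, 1.0], identity, identity, identity)
```

With identity weights the result reduces to silu(x_i) * x_i per component, which makes the gating role of w1 easy to see: the activation is applied to the w1 branch, while the v1 branch stays linear.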