Commit graph

  • ad675e1c67 Added support for . (any character) token in grammar engine. (#6467) Clint Herron 2024-06-06 06:08:52 -07:00
  • c1b89b8381 Add integration tests for any-character symbol. Clint Herron 2024-06-05 15:58:43 -07:00
  • d0c008364a Added support for . (any characer) token in grammar engine. Clint Herron 2024-04-03 22:44:19 -04:00
  • 83e4a3f5cc make pathlib explicit Christian Zhou-Zheng 2024-06-06 09:00:59 -04:00
  • 2037eabb64 move kv keys to constants.py Christian Zhou-Zheng 2024-06-06 08:49:46 -04:00
  • 1cbab22225 type consistency in format_n_bytes_to_str Christian Zhou-Zheng 2024-06-06 08:43:26 -04:00
  • 3328b0a991 Shard dataclass and un-negative dont_add_architecture Christian Zhou-Zheng 2024-06-06 08:37:35 -04:00
  • 6a05183b97 GGUFWriter compatibility fix Christian Zhou-Zheng 2024-06-06 08:28:10 -04:00
  • 706bd69023 re-add type hint Christian Zhou-Zheng 2024-06-06 08:27:25 -04:00
  • f4c53037ab review: remove unused QNN helper functions zhou.weiguo 2024-06-06 20:24:03 +08:00
  • a143c04375 README minor fixes (#7798) [no ci] Mattheus Chediak 2024-06-06 09:17:54 -03:00
  • e41cbb7b78 README minor fixes chediak 2024-06-06 08:18:22 -03:00
  • b6b6a6caee Update json-schema-to-grammar.cpp ochafik 2024-05-19 02:29:31 +01:00
  • 431edb8e7b json: fix bounds tests ochafik 2024-05-19 00:52:34 +01:00
  • 5a86c6f0e2 json: integration test for schemas ochafik 2024-05-19 00:35:01 +01:00
  • f8db47814b json: proper paren fix ochafik 2024-05-01 02:32:21 +01:00
  • a381deb1b6 json: fix missing paren min/max bug ochafik 2024-05-01 02:23:18 +01:00
  • af63f4fb27 json: handle negative min / max integer bounds ochafik 2024-05-01 01:47:35 +01:00
  • c37c484029 json: min + max integer constraints ochafik 2024-05-01 01:35:53 +01:00
  • d69ccb06a4 json: fix min 0 ochafik 2024-04-30 23:35:33 +01:00
  • 057bbdc1f3 json: support minimum for positive integer values ochafik 2024-04-30 23:23:55 +01:00
  • dd29834c11 add supportive of quantize data type Q8_0 zhou.weiguo 2024-06-06 17:12:28 +08:00
  • 55b2d0849d grammars: x{min,max} repetition operator (#6640) Olivier Chafik 2024-06-06 10:07:06 +01:00
  • 605a6199e9 fix: merge issues Joan Martinez 2024-06-06 10:17:44 +02:00
  • ad571bbeed Fix a typo and add Fedora 40 pacakge to install for Vulkan Patrice Ferlet 2024-06-06 10:26:05 +02:00
  • a8a64fd073 fix: fix preprocessing jina v2 zh Joan Martinez 2024-06-06 10:15:07 +02:00
  • f5d7b268ec llama : add jina v2 base code (#7596) Joan Fontanals 2024-06-06 09:22:41 +02:00
  • 4c4d877d23 style : minor Georgi Gerganov 2024-06-06 10:21:35 +03:00
  • 9262d55780 Changing vulkan GGML_VK_FORCE_MAX_ALLOCATION_SIZE to parse 64-bit not 32-bit richardanaya2_2048b.Q6_K.gguf 2024-06-06 07:01:51 +00:00
  • 2d08b7fbb4 docker : build only main and server in their images (#7782) slaren 2024-06-06 07:19:49 +02:00
  • d67caea0d6 docker : add openmp lib (#7780) slaren 2024-06-06 07:17:21 +02:00
  • 1f2e0ee012 finish bitnet e2e Eddie-Wang1120 2024-06-06 12:28:11 +08:00
  • 2bfdb7fe4e support multithreaded dequantization with std::async when openmp is not available slaren 2024-06-06 03:12:50 +02:00
  • 845fa20f26 alloc : reuse same buffer when the same buffer type if used multiple times slaren 2024-06-06 02:16:13 +02:00
  • 77f88e350e add support for out_prod slaren 2024-06-06 01:40:43 +02:00
  • 8641ae1f5a build: Fix BUILD_SHARED_LIBS=ON build intelmatt 2024-06-05 16:29:15 -07:00
  • b88957e519 rename GGML_USE_OPENBLAS to GGML_USE_BLAS slaren 2024-06-06 00:35:55 +02:00
  • ce7e6985d2 form shards while adding tensors, SHA256 sums agree with master Christian Zhou-Zheng 2024-06-05 18:29:39 -04:00
  • f7d4b7c343 build only main and server in their docker images sl/fix-docker-main-server-build slaren 2024-06-06 00:13:01 +02:00
  • 3d2e79da7f add openmp lib to dockerfiles sl/fix-docker-omp slaren 2024-06-06 00:05:25 +02:00
  • a9e4addb76 common : fix --no-ppl Georgi Gerganov 2024-06-05 22:13:23 +03:00
  • 5175117a09 add phi3v projection handling in clip.cpp farris 2024-06-05 11:55:41 -07:00
  • 5ad397d610 reduce diffs with master Christian Zhou-Zheng 2024-06-05 13:49:20 -04:00
  • 7672adeec7 Fix encoding in python scripts (#7733) Galunid 2024-06-05 19:07:24 +02:00
  • bb5ee02096 simplify even further and standardize with GGUFWriter Christian Zhou-Zheng 2024-06-05 12:49:08 -04:00
  • f6fd3ea4e9 further simplify GGUFManager Christian Zhou-Zheng 2024-06-05 12:28:40 -04:00
  • 57dfc3bcdf hf bitnet e2e v2 Eddie-Wang 2024-06-05 16:01:05 +00:00
  • 7d1a378b8f CUDA: refactor mmq, dmmv, mmvq (#7716) b3092 Johannes Gäßler 2024-06-05 16:53:00 +02:00
  • d86efa6c9a fix: merge with code Joan Martinez 2024-06-05 16:30:38 +02:00
  • 7ab6023bc3 Merge branch 'master' of https://github.com/JoanFM/llama.cpp into feat-jina-v2-base-code Joan Martinez 2024-06-05 16:26:19 +02:00
  • 901b86b296 imatrix : add --save-frequency cli arg Georgi Gerganov 2024-06-05 16:53:19 +03:00
  • fd65ff31e9 mmq_type_traits Johannes Gäßler 2024-06-05 15:44:25 +02:00
  • cbe51d7f3d imatrix : migrate to gpt_params Georgi Gerganov 2024-06-05 15:52:49 +03:00
  • 3e9430df33 reduce duplicated code from gguf_writer Christian Zhou-Zheng 2024-06-05 09:29:33 -04:00
  • 7f58793c56 move BLAS to a separate backend slaren 2024-03-20 18:45:05 +01:00
  • 926a8661f3 review: replace external declaration with NDK header file zhou.weiguo 2024-06-05 21:10:59 +08:00
  • f1164112de remove trailing whitespaces sasha0552 2024-06-05 09:02:01 +00:00
  • 2df61bf59e add usage sasha0552 2024-06-05 08:41:11 +00:00
  • d15e6623e9 Merge branch 'master' into use-slot-by-lcs sasha0552 2024-06-05 08:32:03 +00:00
  • 2b3389677a ggml : refactor rope norm/neox (#7634) b3091 Georgi Gerganov 2024-06-05 11:29:20 +03:00
  • 076b4a197b hf bitnet v1 Eddie-Wang1120 2024-06-05 16:15:28 +08:00
  • 05659d3c7b fix: remove ollama patches Joan Martinez 2024-06-05 09:15:36 +02:00
  • 3b44f8f658 fix: fix linting issues Joan Martinez 2024-06-05 08:53:26 +02:00
  • 9973e81c5c readme : remove -ins (#7759) arch-btw 2024-06-04 23:40:49 -07:00
  • b807393a87 Removing -ins from README.md arch-btw 2024-06-04 21:54:30 -07:00
  • 9c872cbbce refine ggml-qnn-ut program and script to make reviewers happy zhou.weiguo 2024-06-05 12:06:17 +08:00
  • c75817b881 rebase zhou.weiguo 2024-06-05 10:57:08 +08:00
  • d325088dbf ggml: add Qualcomm QNN(Qualcomm Neural Network,aka Qualcomm AI Engine Direct) backend zhou.weiguo 2024-04-24 16:28:18 +08:00
  • e84ed9f176 Enable stream updating for the completion shu223 2024-06-05 09:19:19 +09:00
  • c90dbe026b Fix per token atrributes bits (#7749) b3089 jaime-m-p 2024-06-05 01:26:14 +02:00
  • c208932a4f Merge remote-tracking branch 'origin/master' into grammar-fast ochafik 2024-06-04 22:48:14 +01:00
  • 2b79d47d3e Merge remote-tracking branch 'origin/master' into grammar-reps ochafik 2024-06-04 22:33:45 +01:00
  • b90dc566c1 Allow number of nodes in CUDA graph to change (#7738) b3088 agray3 2024-06-04 21:06:49 +01:00
  • 02eb91213e Fix no gcc pragma on Windows jojorne 2024-06-04 16:13:06 -03:00
  • 1442677f92 common : refactor cli arg parsing (#7675) b3087 Georgi Gerganov 2024-06-04 21:23:39 +03:00
  • 554c247caf ggml : remove OpenCL (#7735) b3086 Georgi Gerganov 2024-06-04 21:23:20 +03:00
  • 0cd6bd3483 llama : remove beam search (#7736) b3085 Georgi Gerganov 2024-06-04 21:23:05 +03:00
  • 6a0f3db79d Merge branch 'master' into per-token-attribs jaime-m-p 2024-06-04 19:11:43 +02:00
  • f95c7dc710 Fix per token atrributes bits jaime-m-p 2024-06-04 19:00:06 +02:00
  • 3461041842 Merge 06c3a95c02 into 5ca0944a15 Vaibhav Srivastav 2024-06-04 16:59:17 +00:00
  • 06c3a95c02 [ci] add LLAMA_CURL flags to the prebuilt binaries via the CI Vaibhav Srivastav 2024-06-04 18:54:17 +02:00
  • 5ca0944a15 readme : remove obsolete Zig instructions (#7471) b3084 Georgi Gerganov 2024-06-04 19:43:01 +03:00
  • 4674918e53 Merge c755bd6223 into adc9ff3841 Sourabrata Bose 2024-06-04 11:03:52 -05:00
  • f99de4fa8e server: update cache_prompt documentation [no ci] Johannes Gäßler 2024-06-04 17:09:21 +02:00
  • 4bce30cc0e fix: fix comments Joan Martinez 2024-06-04 17:08:47 +02:00
  • 0fc775ed90 Merge branch 'master' of https://github.com/JoanFM/llama.cpp into feat-jina-v2-base-code Joan Martinez 2024-06-04 17:02:13 +02:00
  • 48b860a2b2 Merge branch 'ggerganov:master' into patches Robert Sinclair 2024-06-04 17:48:51 +03:00
  • e79bc7ef4d doc: Made PR template only contain a line that a collaborator has read the contributing guidelines Nicolas Perez 2024-06-04 10:21:25 -04:00
  • 3de6557bdf doc: Moved all PR template information to CONTRIBUTING.md Nicolas Perez 2024-06-04 10:20:47 -04:00
  • 0085f94936 server : add /v1/completion endpoint gg/server-v1-completion Georgi Gerganov 2024-06-04 15:58:14 +03:00
  • adc9ff3841 llama-bench : allow using a different printer for stderr with -oe (#7722) b3083 slaren 2024-06-04 14:32:42 +02:00
  • a2fb00de88 Allow number of nodes in CUDA graph to change Alan Gray 2024-06-04 05:26:57 -07:00
  • 987d743d6b Improve hipBLAS support in CMake (#7696) b3082 Daniele 2024-06-04 12:09:15 +00:00
  • e87c104dfd common : passkey use gpt_params Georgi Gerganov 2024-06-04 14:53:54 +03:00
  • fb1e7b08e8 llama : remove beam search Georgi Gerganov 2024-06-04 14:29:38 +03:00
  • b226c1227b refine .gitignore (#7688) zhouwg 2024-06-04 19:21:26 +08:00
  • a0510370af ggml : remove OpenCL Georgi Gerganov 2024-06-04 14:12:16 +03:00
  • 4df81854ab common : remove chatml and instruct params Georgi Gerganov 2024-06-04 13:08:17 +03:00
  • ea4665e6ef common : change defaults for escape and n_ctx Georgi Gerganov 2024-06-04 13:04:31 +03:00
  • a8fe2516b3 Fix encoding in python scripts Galunid 2024-06-04 11:43:09 +02:00