Stanford BDHG llama.cpp

Overview

This project is a Stanford BDHG-maintained fork of the well-regarded llama.cpp, tailored for deploying LLaMA models using C/C++. Our modifications package the library as an XCFramework so it can be distributed as a binary compatible with multiple platforms, and the included Package.swift file facilitates integration with the Swift Package Manager (SPM).
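
Conceptually, the distribution exposes the prebuilt llama.xcframework as an SPM binary target behind the llama product. The manifest below is an illustrative sketch only, not the exact Package.swift shipped in this repository; the target path and tools version are assumptions.

// swift-tools-version: 5.9
// Illustrative sketch of an XCFramework-backed SPM package; the actual Package.swift may differ.
import PackageDescription

let package = Package(
    name: "llama",
    products: [
        // The `llama` product that downstream packages consume (see Setup below).
        .library(name: "llama", targets: ["llama"])
    ],
    targets: [
        // Hypothetical binary target wrapping the prebuilt, multi-platform XCFramework.
        .binaryTarget(name: "llama", path: "llama.xcframework")
    ]
)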

Note

Should you have inquiries regarding the llama.cpp codebase this fork builds upon, please refer to the upstream llama.cpp README for comprehensive details and guidance.

Setup

Add Stanford BDHG llama.cpp as a Dependency

You need to add the Stanford BDHG llama.cpp Swift package to your app in Xcode or to your own Swift package.

Important

In order to use the library, you need to set build parameters in the consuming Xcode project or the consuming SPM package to enable the Swift / C++ interoperability introduced in Xcode 15 and Swift 5.9. Keep in mind that this also applies to nested dependencies: the configuration needs to be set recursively for the entire dependency tree down to the llama.cpp SPM package.

For Xcode projects:

  • Open your project settings in Xcode by selecting PROJECT_NAME > TARGET_NAME > Build Settings.
  • Within the Build Settings, search for the C++ and Objective-C Interoperability setting and set it to C++ / Objective-C++. This enables the project to use the C++ headers from llama.cpp. If you manage your configuration through xcconfig files, see the sketch after this list.
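
The equivalent xcconfig entry is sketched below. The setting name is assumed to be the standard Xcode build setting backing the Swift / C++ interoperability option; verify it against your Xcode version.

// Assumed xcconfig equivalent of "C++ and Objective-C Interoperability = C++ / Objective-C++"
SWIFT_OBJC_INTEROP_MODE = objcxx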

For SPM packages:

  • Open the Package.swift file of your SPM package.
  • Within the package target that consumes the llama.cpp package, add the interoperabilityMode(_:) Swift build setting as shown below:
/// Adds the dependency on the Stanford BDHG llama.cpp SPM package
dependencies: [
    .package(url: "https://github.com/StanfordBDHG/llama.cpp", .upToNextMinor(from: "0.1.0"))
],
targets: [
    .target(
        name: "ExampleConsumingTarget",
        /// Declare the target's dependency on llama.cpp
        dependencies: [
            .product(name: "llama", package: "llama.cpp")
        ],
        /// Important: Configure the `.interoperabilityMode(_:)` within the `swiftSettings`
        swiftSettings: [
            .interoperabilityMode(.Cxx)
        ]
    )
]
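
Once the dependency and the interoperability mode are configured, the C API from llama.h should be callable directly from Swift. The smoke test below is a minimal sketch; the exact symbols and signatures depend on the llama.cpp revision bundled in the release you pin, so treat the calls as assumptions and check them against the llama.h header shipped with the package.

import llama

// Hypothetical usage; verify the signatures against the bundled llama.h.
// Initialize the backend, print the detected system capabilities, then tear down.
llama_backend_init()
print(String(cString: llama_print_system_info()))
llama_backend_free()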

Contributing

Contributions to this project are welcome. Please make sure to read the contribution guidelines and the contributor covenant code of conduct first. You can find a list of contributors in the CONTRIBUTORS.md file.

License

This project is a fork of an existing project that is licensed under the MIT License, and all changes made in this fork continue to be under the MIT License. For more information about the license terms, see the Licenses folder.

Our Research

For more information, check out our website at biodesigndigitalhealth.stanford.edu.

Stanford Byers Center for Biodesign Logo