Commit graph

  • 29a5a94d5c fixes Behnam M 2024-01-10 12:07:56 -05:00
  • a4b26cc635 fixes Behnam M 2024-01-10 11:59:43 -05:00
  • 49eb32e952 fixes Behnam M 2024-01-10 11:54:58 -05:00
  • f0b71d5da7 Cleanup Iwan Kawrakow 2024-01-10 18:45:35 +02:00
  • 055a0c2e12 imatrix: WIP Iwan Kawrakow 2024-01-10 18:25:26 +02:00
  • 675819df2b imatrix: 1st version Iwan Kawrakow 2024-01-10 17:25:44 +02:00
  • c10bbda436 Merge bb13098206 into 57d016ba2d peturparkur 2024-01-10 14:39:35 +00:00
  • 76e1fd0f45 print graph luffy06 2024-01-10 22:36:50 +08:00
  • 57d016ba2d llama : add additional suffixes for model params (#4834) b1808 Brian 2024-01-11 01:09:53 +11:00
  • 50cfa1fc36 minor : cleanup trailing whitespaces Georgi Gerganov 2024-01-10 16:09:20 +02:00
  • 329ff61569 llama : recognize 1B phi models (#4847) b1807 Austin 2024-01-10 08:39:09 -05:00
  • d34633d8db clip : support more quantization types (#4846) b1806 John 2024-01-10 14:37:09 +01:00
  • 74066f8c41 Apply suggestions from code review slaren 2024-01-10 13:27:19 +01:00
  • a1610b05b2 iq2_xs: had forgotten to delete iq2-data.h Iwan Kawrakow 2024-01-10 13:47:42 +02:00
  • 3cd0cbb1b5 metal : page align the data ptr (#4854) Georgi Gerganov 2024-01-10 11:50:10 +02:00
  • 8299b03a99 iq2_xs: faster AVX2 dot product Iwan Kawrakow 2024-01-10 11:33:23 +02:00
  • d2bf47987a test setting arch to all major Concedo 2024-01-10 17:22:51 +08:00
  • ec0859d85d metal : page align the data ptr Georgi Gerganov 2024-01-10 11:02:58 +02:00
  • 9b59fb66e6 Merge 9c3e7e0c77 into 4f56458d34 peturparkur 2024-01-10 10:33:55 +02:00
  • 07a1b052e5 llama : on Metal, by default offload the full model Georgi Gerganov 2024-01-10 10:15:36 +02:00
  • 0526cc5d72 Revert "CUDA: faster softmax via shared memory + fp16 math (#4742)" Concedo 2024-01-10 16:06:48 +08:00
  • c6879f3fca Merge branch 'master' into concedo_experimental Concedo 2024-01-10 16:05:14 +08:00
  • 3198e94f00 iq2_xs: AVX2 dot product - 19.5 t/s Iwan Kawrakow 2024-01-10 08:49:38 +02:00
  • 2fc0c248fe Update server.cpp Behnam M 2024-01-10 00:25:45 -05:00
  • 139cdfc0de Update server.cpp Behnam M 2024-01-10 00:24:32 -05:00
  • 58ad3c3ad2 starting http server before initializing the model Behnam M 2024-01-10 00:20:33 -05:00
  • d59c119f0e fixed a typo Behnam M 2024-01-10 00:17:33 -05:00
  • 675e67fe89 initialized server_state Behnam M 2024-01-10 00:16:47 -05:00
  • 03d7ff0777 Better handling of server state Behnam M 2024-01-10 00:14:36 -05:00
  • 0a2dc7559f added comments on the additional /health endpoint Behnam M 2024-01-09 23:30:57 -05:00
  • d17debd2c7 added /health endpoint to the server Behnam M 2024-01-09 23:21:09 -05:00
  • 3cb1c1fb4e Merge remote-tracking branch 'origin/master' into sl/backend-sched slaren 2024-01-10 01:08:19 +01:00
  • 5d2dffcf48 cuda : only use batched_cublas with batched mat muls (fixes fp16 tg perf) slaren 2024-01-10 01:07:56 +01:00
  • 4f56458d34 Python script to compare commits with llama-bench (#4844) Johannes Gäßler 2024-01-10 01:04:33 +01:00
  • f726780214 Merge branch 'master' into outfile-default-name-change Brian 2024-01-10 10:51:15 +11:00
  • 6324c528d1 Merge branch 'master' into name-metadata-fix Brian 2024-01-10 10:50:47 +11:00
  • c410ab8fd7 param.path_model is a Path() not a string brian khuu 2024-01-10 10:46:41 +11:00
  • 29839d30ce Python script to compare commits with llama-bench JohannesGaessler 2024-01-09 21:26:26 +01:00
  • 6da36b0d5b Enhanced model type determination in llama.cpp to include Phi-1 and Phi-1.5 models. teleprint-me 2024-01-09 18:42:21 -05:00
  • c359c843fd Update llama.cpp model param log Brian 2024-01-10 10:37:44 +11:00
  • ed3b0cda4b quantization updates John 2024-01-10 00:10:33 +01:00
  • 2e7814a8c7 Merge remote-tracking branch 'origin/master' into sl/backend-sched slaren 2024-01-09 20:42:51 +01:00
  • 2f2b3e443a chore: Apply flake8 formatting rules teleprint-me 2024-01-09 16:01:26 -05:00
  • 46536a4cb8 Merge branch 'master' into phi-1 teleprint-me 2024-01-09 14:44:26 -05:00
  • fa7620116e opencl : add ggml-backend buffer type slaren 2024-01-09 03:14:16 +01:00
  • 6efb8eb30e convert.py : fix vanilla LLaMA model conversion (#4818) b1804 Austin 2024-01-09 13:46:46 -05:00
  • 7cfcee408d py : suggest hint for missing vocab size Georgi Gerganov 2024-01-09 20:44:22 +02:00
  • 52ea3f7930 iq2_xs: better ARM_NEON dot product Iwan Kawrakow 2024-01-09 19:43:39 +01:00
  • 90582b7341 py : fix outfile and outtype Georgi Gerganov 2024-01-09 20:40:11 +02:00
  • 787860ada2 refactor: Revise check_vocab_size for Enhanced Clarity and Correctness teleprint-me 2024-01-09 13:30:35 -05:00
  • 36e5a08b20 llava-cli : don't crash if --image flag is invalid (#4835) b1803 Justine Tunney 2024-01-09 09:59:14 -08:00
  • 0344b6a692 llava-cli : don't crash if --image flag is invalid Justine Tunney 2024-01-09 03:14:30 -08:00
  • 4dccb38d9a metal : improve dequantize precision to match CPU (#4836) Georgi Gerganov 2024-01-09 19:37:08 +02:00
  • ff49d876c6 iq2_xs: working, but dog slow, ARM_NEON dot product Iwan Kawrakow 2024-01-09 18:36:45 +01:00
  • 55e2cae83f iq2_xs: Metal now works Iwan Kawrakow 2024-01-09 18:22:20 +01:00
  • 9a818f7c42 scripts : improve get-pg.sh (#4838) Georgi Gerganov 2024-01-09 19:20:45 +02:00
  • dd1c1004f8 chore: Apply flake8 formatting rules teleprint-me 2024-01-09 12:14:14 -05:00
  • 29abd8d46c Revert to commit 0614c33 teleprint-me 2024-01-09 11:52:41 -05:00
  • 0aacd55159 iq2_xs: WIP Metal Iwan Kawrakow 2024-01-09 17:46:27 +01:00
  • 18adb4e9bb readme : add 3rd party collama reference to UI list (#4840) b1800 iohub 2024-01-10 00:45:54 +08:00
  • 9b6e38d8c0 iq2_xs: CUDA and scalar CPU works Iwan Kawrakow 2024-01-09 18:19:02 +02:00
  • 9f21b82e4b iq2_xs: this should have been in the basics Iwan Kawrakow 2024-01-08 20:18:02 +02:00
  • 3569fa3fe3 iq2_xs: basics Iwan Kawrakow 2024-01-08 20:05:00 +02:00
  • 9e943bd8b4 Add collama reference to UI list iohub 2024-01-09 23:02:51 +08:00
  • d9653894df scripts : script to get Paul Graham essays in txt format (#4838) Georgi Gerganov 2024-01-09 16:23:05 +02:00
  • 904855f3d6 scripts : script to get Paul Graham essays in txt format Georgi Gerganov 2024-01-09 15:54:43 +02:00
  • 5917276d32 Merge branch 'master' into gg/metal-feature-set Georgi Gerganov 2024-01-09 14:41:27 +02:00
  • ef8ba1271a metal : improve dequantize precision to match CPU Georgi Gerganov 2024-01-09 14:24:51 +02:00
  • 693f6493e5 fixed a bug that resulted in the program hanging Concedo 2024-01-09 18:12:58 +08:00
  • faee132150 llm_load_print_meta: Add additional suffixes for model params brian khuu 2024-01-09 21:03:36 +11:00
  • 128de3585b server : update readme about token probs (#4777) Behnam M 2024-01-09 05:02:05 -05:00
  • 3361d29550 minor : fix trailing whitespace Georgi Gerganov 2024-01-09 12:01:14 +02:00
  • 66533c8424 Merge branch 'master' into concedo_experimental Concedo 2024-01-09 17:48:18 +08:00
  • 24096933b0 server : try to fix infill when prompt is empty gg/server-infill-empty-prompt-4027 Georgi Gerganov 2024-01-09 11:27:29 +02:00
  • 45a3a9dd20 Merge 2927cca611 into 8c58330318 Jonas Templestein 2024-01-09 11:13:32 +02:00
  • 8c58330318 server : add api-key flag to documentation (#4832) Zsapi 2024-01-09 10:12:43 +01:00
  • df20382206 sync slider Concedo 2024-01-09 16:50:23 +08:00
  • 18c2e1752c ggml : fix vld1q_s8_x4 32-bit compat (#4828) b1796 Georgi Gerganov 2024-01-09 10:42:06 +02:00
  • d752946a04 Server docs: add api-key flag to documentation Zsapi 2024-01-09 09:41:46 +01:00
  • 7216af5c09 ggml : fix 32-bit ARM compat (cont) gg/fix-vld1q_s8_x4-4872 Georgi Gerganov 2024-01-09 10:33:16 +02:00
  • 8f900abfc0 CUDA: faster softmax via shared memory + fp16 math (#4742) b1795 Johannes Gäßler 2024-01-09 08:58:55 +01:00
  • 5cc64ebb52 dynatemp wizard Concedo 2024-01-09 15:51:32 +08:00
  • 6b95a61aca simplified the completion_probabilities JSON schema Behnam M 2024-01-09 00:42:39 -05:00
  • 3609ad3afa Updated Models Layout isaiah 2024-01-08 18:04:33 -07:00
  • a6ba4e2bb0 trimmed trailing white space isaiah 2024-01-08 16:55:55 -07:00
  • a522f7e90f Simplify tensor allocation logic. Will Findley 2024-01-08 16:22:05 -06:00
  • 27afe29927 ggml : fix vld1q_s8_x4 32-bit compat Georgi Gerganov 2024-01-08 23:45:24 +02:00
  • 3959283eed Merge commit '31f27758fa' into ceb/nomic-vulkan Jared Van Bortel 2024-01-08 15:57:12 -05:00
  • 8b65f4c5e5 Merge commit 'bcc0eb4591' into ceb/nomic-vulkan Jared Van Bortel 2024-01-08 15:50:18 -05:00
  • c02696846a Updated Models Layout isaiah 2024-01-08 12:52:46 -07:00
  • 44b1a97a15 kompute : fix -Wunused-private-field warnings from clang Jared Van Bortel 2023-12-11 13:04:43 -05:00
  • 1fc2f265ff common : fix the short form of --grp-attn-w, not -gat (#4825) b1794 howlger 2024-01-08 20:05:53 +01:00
  • e78b53c51f Fix typo: the short form of --grp-attn-w is -gaw, not -gat howlger 2024-01-08 20:00:31 +01:00
  • a9a8c5de3d readme : add link to SOTA models Georgi Gerganov 2024-01-08 20:25:17 +02:00
  • 4ed5f621be llama : only map to a backend buffer the region of the file mapping containing the tensors used in the buffer slaren 2024-01-08 18:17:18 +01:00
  • 11583c1462 llama : rewrite lora with ggml-backend and compute on CPU slaren 2024-01-08 17:12:12 +01:00
  • dd5ae06405 SOTA 2-bit quants (#4773) b1792 Kawrakow 2024-01-08 16:02:32 +01:00
  • bad5f7f33d PR suggestion Iwan Kawrakow 2024-01-08 16:58:37 +02:00
  • 668b31fc7d swift : exclude ggml-metal.metal from the package (#4822) b1791 Georgi Gerganov 2024-01-08 16:40:51 +02:00
  • 5d64a0c015 fixup! CUDA: faster softmax via shared memory + fp16 math JohannesGaessler 2024-01-08 15:22:05 +01:00