Commit graph

  • cef2c97c03 lora : raise error if lm_head is ignored Xuan Son Nguyen 2024-08-20 15:06:20 +02:00
  • e85e232439 Merge branch 'ggerganov:master' into vulkan Changyeon Kim 2024-08-20 21:35:35 +09:00
  • 065a9d8438 [fix] Use nb1 and nb2 for dst. Changyeon Kim 2024-08-20 21:26:09 +09:00
  • 806c5a4e5b Remove additional snippets for CI/CD issues with python constants.py script Srihari-mcw 2024-08-20 04:02:17 -07:00
  • b0c6ad778d avoid relying on 'logits_all == true' in perplexity_v2 Jia Liu 2024-08-20 17:16:27 +08:00
  • 9262dcbb0a test: comment new test_cases for only local testing zhentaoyu 2024-08-20 17:14:13 +08:00
  • 4f5c138abd test: make new cases only in sycl zhentaoyu 2024-08-19 08:44:33 +00:00
  • f351ac5c05 test: add im2col and convert test cases zhentaoyu 2024-08-19 08:07:23 +00:00
  • 8bd46e8450 sycl: move downsample global_range into common zhentaoyu 2024-08-16 05:45:55 +00:00
  • df3f1c1850 sycl: refine convert zhentaoyu 2024-08-16 01:46:54 +00:00
  • bd960a67dc sycl: fix ib in dmmv zhentaoyu 2024-08-13 06:59:59 +00:00
  • 3ecfbcfaf1 sycl: fix convert and dequantize zhentaoyu 2024-08-12 09:19:03 +00:00
  • 9a9f7c959c sycl: fix convert overflow zhentaoyu 2024-08-12 08:52:09 +00:00
  • d36d6547aa sycl: fix im2col overflow and sync with cuda zhentaoyu 2024-08-12 08:18:20 +00:00
  • 90db8146d5 tests : add missing comma in grammar integration tests (#9099) b3606 fairydreaming 2024-08-20 11:09:55 +02:00
  • a9fdcfd138 squash! llama : std::move llm_bigram_bpe from work_queue Daniel Bevenius 2024-08-20 09:55:31 +01:00
  • 71fe906453 tests : added missing comma in grammar integration tests Stanisław Szymczyk 2024-08-20 09:57:46 +02:00
  • 7c332dc5bd Update ggml/src/ggml-sycl.cpp Meng, Hengyu 2024-08-20 09:59:10 +08:00
  • 299412e6bc mmvq in cuda path Meng, Hengyu 2024-08-19 05:42:49 +00:00
  • 5b6d224695 fallback mmvq to mul_mat Meng, Hengyu 2024-08-19 03:50:48 +00:00
  • 43c7be57c1 Add the BF16 delta data types in constants.py Srihari-mcw 2024-08-12 06:02:16 -07:00
  • 7927655e42 Update the type name in llama.cpp Srihari-mcw 2024-08-06 01:36:54 -07:00
  • 4a2f703fbb Add changes to use union data type for better conversion to strong type - Based on 5f2e011e2eed2f685521c707b3e74280fcb81dd3 from llamafile Srihari-mcw 2024-08-01 23:57:12 -07:00
  • 29c5129ee2 squash! llama : std::move llm_bigram_bpe from work_queue Daniel Bevenius 2024-08-20 05:50:55 +02:00
  • 7323304092 Add llmaz as another platform to run llama.cpp on Kubernetes kerthcet 2024-08-20 10:43:41 +08:00
  • 6bee7985b5 Merge branch 'master' into dev-refactoring hongruichen 2024-08-20 10:23:55 +08:00
  • dedadf2a20 Fixed a bug where debug code was included in the release, resulting i… (#1) みゃん 2024-08-20 11:20:23 +09:00
  • fddff02915 Rework IQ3_XXS and IQ3_XS Nexesenex 2024-08-19 01:43:31 +02:00
  • 207ffe681f Reorder, corrections, settling lower IQ3 quants Nexesenex 2024-08-18 23:28:13 +02:00
  • 8c1a3c5ba2 Merge branch 'master' into pr/8836 Nexesenex 2024-08-20 00:48:05 +02:00
  • a7f91643bb Fix mistake Nexesenex 2024-08-19 16:02:00 +02:00
  • 3491291a32 llama : quantize more Mamba tensors Francis Couture-Harpin 2024-08-19 12:44:35 -04:00
  • cfac111e2b cann: add doc for cann backend (#8867) wangshuai09 2024-08-19 16:46:38 +08:00
  • c109a90fb5 cann: add doc for cann backend wangshuai09 2024-07-27 06:34:10 +00:00
  • 1b6ff90ff8 rpc : print error message when failed to connect endpoint (#9042) b3604 Radoslav Gerganov 2024-08-19 10:11:45 +03:00
  • 18eaf29f4c rpc : prevent crashes on invalid input (#9040) b3603 Radoslav Gerganov 2024-08-19 10:10:21 +03:00
  • b31dc0b5ed Merge branch 'master' into yutong/dev get the latest update from upstream::main Yutong Dai 2024-08-19 00:19:16 +00:00
  • caeb839ae3 Boost embeddings and output weights for MOEs. Nexesenex 2024-08-18 17:58:17 +02:00
  • 503048a197 Correct IQ3_M Nexesenex 2024-08-18 17:44:11 +02:00
  • ddb13732c4 IQ3_XXL and IQ3_XXXL Nexesenex 2024-08-18 16:56:55 +02:00
  • a79633b49e Merge branch 'master' into pr/8836 Nexesenex 2024-08-18 22:12:39 +02:00
  • b02eaf6803 Mass use of the few/some/more/many bits bump logic Nexesenex 2024-08-17 14:58:25 +02:00
  • 6a7e28903f llava : zero-initialize clip_ctx structure fields with aggregate initialization Stanisław Szymczyk 2024-08-18 20:23:01 +02:00
  • 5c0f108e15 Update src/llama.cpp Younes Belkada 2024-08-18 20:12:26 +04:00
  • 7aeccbb7dd fix: correct print format younesbelkada 2024-08-18 16:00:18 +00:00
  • 78ad84f0ff fix: correct printf format for bool younesbelkada 2024-08-18 15:48:10 +00:00
  • ca4db9e551 fix: add dt_b_c_rms in llm_load_print_meta younesbelkada 2024-08-18 15:45:24 +00:00
  • bf5e344056 add in operator younesbelkada 2024-08-18 15:37:42 +00:00
  • 57c3eb4115 Apply suggestions from code review Younes Belkada 2024-08-18 19:35:57 +04:00
  • 554b049068 flake.lock: Update (#9068) Georgi Gerganov 2024-08-18 17:43:32 +03:00
  • d637bb97bc fix: change name younesbelkada 2024-08-18 14:42:40 +00:00
  • 4553502d4a Update gguf-py/gguf/gguf_writer.py Younes Belkada 2024-08-18 18:41:42 +04:00
  • 9e22bb7ed6 fix: lint younesbelkada 2024-08-18 14:20:37 +00:00
  • f7d2e9105f fix: add more cleanup and harmonization younesbelkada 2024-08-18 14:17:33 +00:00
  • 60e6e2af36 llava : fix occasional undefined behavior crash Justine Tunney 2024-08-18 07:16:18 -07:00
  • 349426546b Update convert_hf_to_gguf.py Younes Belkada 2024-08-18 18:15:40 +04:00
  • 184a4c676f fix: address comments younesbelkada 2024-08-18 14:14:17 +00:00
  • a8109e352e Update src/llama.cpp Younes Belkada 2024-08-18 18:12:04 +04:00
  • 343b5836c1 Update src/llama.cpp Younes Belkada 2024-08-18 18:11:21 +04:00
  • bb668b608e ggml : make GeLU more accurate on CPU Justine Tunney 2024-08-05 08:50:50 -07:00
  • b97704c9a0 refactor: better refactor younesbelkada 2024-08-18 12:02:36 +00:00
  • bfa0286866 fix: lint younesbelkada 2024-08-18 11:40:57 +00:00
  • 59a08be755 feat: initial support for llama.cpp younesbelkada 2024-08-18 11:25:08 +00:00
  • 2339a0be1c tests : add integration test for lora adapters (#8957) ltoniazzi 2024-08-18 10:58:04 +01:00
  • d6f7b8f687 minor code style changes Xuan Son Nguyen 2024-08-18 11:43:48 +02:00
  • 823948cbb8 squash! llama : std::move llm_bigram_bpe from work_queue Daniel Bevenius 2024-08-18 10:44:48 +02:00
  • 3a7d69b8e9 fix library is missing during dynamic compiling Aisuko 2024-08-18 17:29:38 +10:00
  • 4ba561808d Adapt token embeddings and output.weight to vocab size Nexesenex 2024-08-17 12:31:36 +02:00
  • 17b71512a6 Update IQ3_M attn_k and IQ3_XL token_embd Nexesenex 2024-08-17 00:17:41 +02:00
  • e4c506d794 Merge branch 'master' into pr/8836 Nexesenex 2024-08-18 04:09:22 +02:00
  • e51864cbd9 flake.lock: Update github-actions[bot] 2024-08-18 00:20:23 +00:00
  • 2fb9267887 Fix incorrect use of ctx_split for bias tensors (#9063) b3600 Yoshi Suhara 2024-08-17 06:34:21 -07:00
  • b1c163b47e Fix incorrect use of ctx_split for bias tensors Yoshi Suhara 2024-08-16 23:27:40 -07:00
  • 6c6db7bcc5 llama : std::move llm_bigram_bpe from work_queue Daniel Bevenius 2024-08-17 07:06:30 +02:00
  • 9127800d83 wip sl/prepare-next-graph slaren 2024-08-17 01:51:06 +02:00
  • 8b3befc0e2 server : refactor middleware and /health endpoint (#9056) b3599 Xuan Son Nguyen 2024-08-16 17:19:05 +02:00
  • a4b63194cd update server docs Xuan Son Nguyen 2024-08-16 15:37:26 +02:00
  • d565bb2fd5 llava : support MiniCPM-V-2.6 (#8967) b3598 tc-mb 2024-08-16 21:34:41 +08:00
  • 9a8f0508eb Add printing to check weights match torch version ltoniazzi 2024-08-09 11:17:26 +01:00
  • ee2984bdaf py : fix wrong input type for raw_dtype in ggml to gguf scripts (#8928) Farbod Bijary 2024-08-16 14:06:30 +03:30
  • 68655e3999 llama : fix llama_split_mode enum values in main_gpu document Sutou Kouhei 2024-08-16 19:23:14 +09:00
  • ab3d8ec45b fix CI Xuan Son Nguyen 2024-08-16 11:52:52 +02:00
  • 57d3589769 fix server tests Xuan Son Nguyen 2024-08-16 11:50:00 +02:00
  • cbb5dd7b12 change batch.logits to batch.output Jia Liu 2024-08-16 16:42:44 +08:00
  • c8ddce8560 Fix inference example lacks required parameters (#9035) Aisuko 2024-08-16 19:08:59 +10:00
  • a53a59c82f Update examples/server/server.cpp Xuan Son Nguyen 2024-08-16 11:01:04 +02:00
  • 4af74d81f8 move "fail_on_no_slot" to /slots Xuan Son Nguyen 2024-08-16 10:45:55 +02:00
  • ed4fcf92ff squash! llama : suppress conversion from 'size_t' to 'int' Daniel Bevenius 2024-08-16 09:41:00 +01:00
  • b337a7bf99 server : refactor middleware and /health endpoint Xuan Son Nguyen 2024-08-16 10:28:10 +02:00
  • 23fd453544 gguf-py : bump version from 0.9.1 to 0.10.0 (#9051) b3595 compilade 2024-08-16 02:36:11 -04:00
  • c679e0cb5c llama : add EXAONE model support (#9025) Minsoo Cheong 2024-08-16 15:35:18 +09:00
  • fb487bb567 common : add support for cpu_get_num_physical_cores() on Windows (#8771) b3593 Liu Jia 2024-08-16 14:23:12 +08:00
  • 346a97f9c9 fix space Minsoo Cheong 2024-08-16 15:14:37 +09:00
  • 0cd67f88b9 add EXAONE to supported models in README.md Minsoo Cheong 2024-08-16 15:02:21 +09:00
  • 47c01aa24e fix the error in scripts Erhu Feng 2024-08-16 13:59:07 +08:00
  • 01040049a8 fix lint Minsoo Cheong 2024-08-16 14:31:40 +09:00
  • 4c401e510f add exaone pre-tokenizer in llama-vocab.cpp Minsoo Cheong 2024-08-16 14:26:47 +09:00
  • 98ad475fbe add ftype Minsoo Cheong 2024-08-16 13:56:19 +09:00
  • 78c6e6cd16 fix whitespace Minsoo Cheong 2024-08-15 16:27:30 +09:00
  • cd2b34e063 add chat template Minsoo Cheong 2024-08-14 19:12:14 +09:00