Commit graph

  • 8d0dc476c9 llama2.c: convert special-cased "<0xXX>" single byte tokens from tokenizer.bin ochafik 2023-08-23 19:56:16 +01:00
  • 630d8b408a llama : default special tokens based on vocab type Georgi Gerganov 2023-08-23 21:39:09 +03:00
  • 8c6d3939c7 cuda : add TODOs for RoPE NeoX implementation Georgi Gerganov 2023-08-23 21:32:12 +03:00
  • 71d05b9ae4 remove atomics and add dynamic log target staviq 2023-08-23 20:27:20 +02:00
  • 7df517c797 update finetune README xaedes 2023-08-23 20:08:48 +02:00
  • 1a5f0a30e0 add command line option --rank-wo N for rank of wo tensor xaedes 2023-08-23 20:00:48 +02:00
  • f8ee54bd2c llama : revert BPE special-case in llama_byte_to_token() Georgi Gerganov 2023-08-23 20:39:24 +03:00
  • 77a3092c83 update checkpoint train stats before saving via "--save-every" xaedes 2023-08-23 19:34:45 +02:00
  • 596e1094fb common : remove obsolete BPE API + disable test-tokenizer-1 Georgi Gerganov 2023-08-23 20:31:03 +03:00
  • 2424e1d08e llama : remove obsolete comment Georgi Gerganov 2023-08-23 20:16:40 +03:00
  • 3bfb720642 llama : advanced BPE tokenizer based on ggllm.cpp implementation Georgi Gerganov 2023-08-23 20:11:45 +03:00
  • b5184d7274 Make api_like_OAI.py work with Microsoft Guidance Ryder Wishart 2023-08-23 10:10:57 -07:00
  • c3f8a6e49f llama : prep new tokenizer support Georgi Gerganov 2023-08-23 19:08:44 +03:00
  • 335acd2ffd fix convert-lora-to-ggml.py (#2738) slaren 2023-08-23 16:46:54 +02:00
  • 5290c38e6e main : insert bos if no tokens (#2727) (tag: master-5290c38) klosax 2023-08-23 16:46:03 +02:00
  • cc34dbda96 gitignore : fix for windows (#2729) akawrykow 2023-08-23 07:31:34 -07:00
  • 7c2227a197 chmod : make scripts executable (#2675) Cebtenzzre 2023-08-23 10:29:09 -04:00
  • f19dca04ea devops : RPM Specs (#2723) JohnnyB 2023-08-23 15:28:22 +01:00
  • 8263fd7bdb Update llama_v3.cpp (#393) askmyteapot 2023-08-24 00:15:48 +10:00
  • 004016e6d8 Update examples/main/main.cpp Georgi Gerganov 2023-08-23 17:12:26 +03:00
  • c3c5aacef6 Update examples/main/main.cpp Georgi Gerganov 2023-08-23 17:11:56 +03:00
  • 6938c5f474 Merge branch 'master' into falcon Georgi Gerganov 2023-08-23 17:08:14 +03:00
  • 356a166b19 reverted log auto endline to better mimic printf staviq 2023-08-23 16:06:46 +02:00
  • d5156d3345 added basic log file handler staviq 2023-08-23 16:03:30 +02:00
  • 727af3ea16 add *.log to .gitignore staviq 2023-08-23 16:02:43 +02:00
  • 176ea716b3 llama : better model naming and size reporting Georgi Gerganov 2023-08-23 15:53:41 +03:00
  • e7299656bd falcon : add CUDA offloading (#2739) slaren 2023-08-23 14:51:30 +02:00
  • 9a13fc4efd Add the short hash back into the tag Danny Daemonic 2023-08-23 05:50:58 -07:00
  • 943a248e40 Explain to the user that GGML isn't supported anymore Ignacio DM 2023-08-23 09:18:05 -03:00
  • 95434613b8 falcon : add CUDA offloading slaren 2023-08-23 14:31:39 +02:00
  • 854ae5d030 metal : temporary workaround for the concurrency optimization bug Georgi Gerganov 2023-08-23 15:25:31 +03:00
  • 0a85ae7397 metal : fix GELU kernel numerical stability by using precise::tanh Georgi Gerganov 2023-08-23 15:04:53 +03:00
  • 7935986faa fix convert-lora-to-ggml.py slaren 2023-08-23 13:44:27 +02:00
  • b693000c2e llama.cpp : fix linefeed token klosax 2023-08-23 13:22:41 +02:00
  • bfdc596d58 gguf reader in file format detection Concedo 2023-08-23 19:19:52 +08:00
  • 8f7fb69031 Fixed double "$". Use ">>" more consistently. Danny Daemonic 2023-08-23 03:33:31 -07:00
  • 8207214b6a Fix values shown in the quantize tool help (#2735) (tag: master-8207214) Kawrakow 2023-08-23 12:57:12 +03:00
  • 62959e740e Strided perplexity (#2714) (tag: master-62959e7) Kawrakow 2023-08-23 12:56:42 +03:00
  • 7f7ddd5002 Fix ggml to gguf conversion on Windows (#2733) IgnacioFDM 2023-08-23 06:31:09 -03:00
  • e2d23bed1b falcon : minor changes (still chasing the Metal problem) Georgi Gerganov 2023-08-23 12:25:49 +03:00
  • 575c9066a8 Fix ggml to gguf conversion on Windows Ignacio DM 2023-08-23 04:11:44 -03:00
  • af170fc2db Merge branch 'master' into concedo_experimental Concedo 2023-08-23 17:08:09 +08:00
  • a0dc47a501 metal : print extra compute pipeline info Georgi Gerganov 2023-08-23 11:25:26 +03:00
  • 981c9131f0 gguf for llama is working Concedo 2023-08-23 16:07:07 +08:00
  • b34ab74094 falcon : copy-paste self-attention from LLaMA Georgi Gerganov 2023-08-23 11:04:26 +03:00
  • 3f436ea3f3 avoid unnecessary empty data event & send rest of partial tokens on stop Jhen 2023-08-23 15:52:49 +08:00
  • af4bbcc873 ggml : ggml_repeat always creates new tensor Georgi Gerganov 2023-08-23 10:42:02 +03:00
  • 99bb26078f metal : implement RoPE (mode = 2) + avoid ggml_repeat Georgi Gerganov 2023-08-23 10:41:35 +03:00
  • e3c52bd990 ggml : pass eps to ggml_norm Georgi Gerganov 2023-08-23 10:40:58 +03:00
  • 436c68c365 Fix values shown in the quantize tool help Iwan Kawrakow 2023-08-23 10:24:22 +03:00
  • 3ddec9a52a Adjusted the size/PPL values printed in the quantize help Iwan Kawrakow 2023-08-23 10:16:12 +03:00
  • 3fc1127e2f Merge branch 'master' into server-probs Jhen 2023-08-23 15:14:51 +08:00
  • b8ad1b66b2 server : allow json array in prompt or content for direct token input (#2306) (tag: master-b8ad1b6) Xiao-Yong Jin 2023-08-23 02:12:12 -05:00
  • 5cb5671e22 Alternative way to output PPL results Iwan Kawrakow 2023-08-22 16:12:45 +03:00
  • 53e555d93d Implementing strided computation of perplexity Iwan Kawrakow 2023-08-22 15:55:52 +03:00
  • a9b9f2b341 Modified build.yml to use build number for release Danny Daemonic 2023-08-22 22:40:53 -07:00
  • 07170914a2 Modified build.yml to use build number for release Danny Daemonic 2023-08-22 21:54:01 -07:00
  • 88535ed036 Merge remote-tracking branch 'origin/master' into prompt-array Xiao-Yong Jin 2023-08-22 21:02:57 -05:00
  • ec4a19c5af Merge branch 'master' into server-probs Jhen 2023-08-23 09:26:28 +08:00
  • 4f8d62e444 Add --ctx param to Prepare Data & Run section akawrykow 2023-08-22 17:48:48 -07:00
  • 6b9478ccd5 Remove reference to tokenizer_checklist.chk akawrykow 2023-08-22 17:45:58 -07:00
  • f5fe98d11b docs : add grammar docs (#2701) Evan Jones 2023-08-22 21:01:57 -04:00
  • 0a4034f717 rework GBNF example to be a commented grammar Evan Jones 2023-08-22 20:45:54 -04:00
  • f64c6db5c4 Fix .gitignore for windows akawrykow 2023-08-22 17:38:34 -07:00
  • 62de8a6224 initial, base LOG macro staviq 2023-08-23 02:27:26 +02:00
  • 6803aac321 [gguf] Print the commit hash akawrykow 2023-08-22 17:25:04 -07:00
  • 39a2c89a30 [gguf] Print the date akawrykow 2023-08-22 17:25:04 -07:00
  • 398bedb287 [gguf] Add git commit hash akawrykow 2023-08-22 17:25:04 -07:00
  • 684686eadb [gguf] Add date akawrykow 2023-08-22 17:25:04 -07:00
  • 9407c847a2 main.cpp : insert bos if no tokens klosax 2023-08-23 01:50:17 +02:00
  • 777f42ba18 Improve handling of special tokens in GGML to GGUF converter (#2725) (tag: master-777f42b) Kerfuffle 2023-08-22 17:39:39 -06:00
  • d561b7f724 llama.cpp : fix the fix of bpe tokenizer klosax 2023-08-23 00:06:53 +02:00
  • a95ae7526a llama.cpp : fix bpe tokenizer klosax 2023-08-23 00:02:13 +02:00
  • f382606a13 Try to handle overflow due to buggy Windows Python with a better error message KerfuffleV2 2023-08-22 15:57:49 -06:00
  • 46ef5b5fcf llama : fix whitespace escaping in tokenizer (#2724) (tag: master-46ef5b5) goerch 2023-08-22 23:10:42 +02:00
  • 6401266cfa Merge remote-tracking branch 'upstream/master' into executable-scripts Cebtenzzre 2023-08-22 17:08:07 -04:00
  • c63bb1d16a CUDA: use mul_mat_q kernels by default (#2683) (tag: master-c63bb1d) Johannes Gäßler 2023-08-22 22:47:05 +02:00
  • 8ad1e2d8d1 llama2.c: comment out legacy "load from ggml model" logic ochafik 2023-08-22 21:46:47 +01:00
  • ffa5099c6d llama.cpp : llama default UNK token = id 0 klosax 2023-08-22 22:34:03 +02:00
  • bb58495e89 Set default UNK token mapping from -1 to 0 in llama.cpp KerfuffleV2 2023-08-22 14:31:42 -06:00
  • 14ed02e8d4 Improve UNK, BOS, EOS token handling when converting without metadata. KerfuffleV2 2023-08-22 14:31:24 -06:00
  • 9853f2cfb2 convert-falcon-hf-to-gguf.py : fix special token mapping klosax 2023-08-22 22:29:11 +02:00
  • 7bbbf38c32 llama : minor updates Georgi Gerganov 2023-08-22 23:26:16 +03:00
  • 59f67c69a7 llama2c: reinstate ggmlv3 conversion output + update readme w/ gguf conv ochafik 2023-08-22 21:12:29 +01:00
  • 0ec27ad66c falcon : minor Georgi Gerganov 2023-08-22 23:11:41 +03:00
  • 2d58444dae falcon : support non-40B models Georgi Gerganov 2023-08-22 22:52:14 +03:00
  • 3d59f50fbe Fix for #2721 goerch 2023-08-22 21:37:11 +02:00
  • 3c7c325b98 falcon : CPU inference working Georgi Gerganov 2023-08-22 22:31:39 +03:00
  • 540765eda3 Moved to devops dir JohnnyB 2023-08-22 20:16:17 +01:00
  • 085228e1f5 llama : add arch member to llama_model Georgi Gerganov 2023-08-22 22:09:56 +03:00
  • abafade8a2 Merge branch 'ggerganov:master' into srpm-testing JohnnyB 2023-08-22 20:09:53 +01:00
  • 3b6cfe7c92 convert.py : clarifying error message (#2718) Alex Petenchea 2023-08-22 21:58:16 +03:00
  • 5c5413dc14 llama : fix loading progress bar Georgi Gerganov 2023-08-22 21:53:36 +03:00
  • 2f3c80a845 falcon : load tensor data (CPU only) Georgi Gerganov 2023-08-22 21:42:12 +03:00
  • f6a8864f0a CUDA: use mul_mat_q kernels by default JohannesGaessler 2023-08-20 17:31:21 +02:00
  • 800c9635b4 Fix CUDA softmax by subtracting max value before exp (#2665) (tag: master-800c963) Jiahao Li 2023-08-23 02:27:06 +08:00
  • d1b3b95dc4 convert : add dummy scores + types Georgi Gerganov 2023-08-22 20:55:05 +03:00
  • 9f28f73785 llm : read arch-specific KVs Georgi Gerganov 2023-08-22 20:34:17 +03:00
  • e567fabc62 Clarifying error message Alex Petenchea 2023-08-22 20:18:30 +03:00
  • b19c6e4640 Merge branch 'master' into falcon Georgi Gerganov 2023-08-22 20:15:01 +03:00