The previous implementation of getchar32() relied on low-level console functions, which caused issues when running the code in subprocesses with redirected stdin. The ReadConsoleInputW function is designed to read from the console input buffer and may not function correctly with input redirection. Additionally, using fgetwc in a Windows environment could lead to potential issues in certain scenarios.
I encountered unexpected results when redirecting stdin, even without passing any arguments. To reproduce the issue, launch the program from C# with stdin and stdout redirected, for example:
```
ProcessStartInfo startInfo = new()
{
    FileName = ".\\main.exe",
    Arguments = "-m .\\7b-q4.bin -n 256 --repeat_penalty 1.0 --interactive-first -r \"User:\" -f .\\chat-with-bob.txt",
    RedirectStandardOutput = true,
    RedirectStandardInput = true,
    UseShellExecute = false,
    CreateNoWindow = true
};
```
To address this problem and enable people to stream the program directly to a UI without worrying about the C++ part, I made the following changes:
- Replaced ReadConsoleInputW with std::getwchar(), a standard library function that reads a wide character from stdin. Because it goes through the stdio stream rather than the console input buffer, getchar32() handles both console and redirected stdin consistently.
With these modifications, getchar32() now functions as intended in various environments and ensures that console interactions work correctly, even when stdin is redirected.
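The approach described above can be sketched as follows. This is a minimal illustration, not the code from the change itself: `read_char32` and `combine_surrogates` are hypothetical names, and the handling of unpaired surrogates (substituting U+FFFD) is an assumption.

```cpp
#include <cstdint>
#include <cwchar>

// Combine a UTF-16 surrogate pair into a single Unicode code point.
constexpr char32_t combine_surrogates(char32_t high, char32_t low) {
    return ((high & 0x03FF) << 10) + (low & 0x03FF) + 0x10000;
}

// Hypothetical stand-in for getchar32(): read one code point from stdin.
// std::getwchar() goes through the stdio stream, so it works for both a
// console and a redirected stdin. Returns WEOF (as char32_t) at end of input.
inline char32_t read_char32() {
    const std::wint_t wc = std::getwchar();
    if (wc == WEOF) {
        return static_cast<char32_t>(WEOF);
    }
#if WCHAR_MAX == 0xFFFF
    // 16-bit wchar_t (e.g. Windows): code points above U+FFFF arrive as a
    // high surrogate followed by a low surrogate.
    if (wc >= 0xD800 && wc <= 0xDBFF) {
        const std::wint_t low = std::getwchar();
        if (low >= 0xDC00 && low <= 0xDFFF) {
            return combine_surrogates(wc, low);
        }
        return 0xFFFD; // assumption: replace an unpaired surrogate with U+FFFD
    }
    if (wc >= 0xDC00 && wc <= 0xDFFF) {
        return 0xFFFD; // stray low surrogate
    }
#endif
    return static_cast<char32_t>(wc);
}
```

A caller would loop on `read_char32()` until it returns the converted `WEOF` value; no console handle is needed at any point.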
* ggml : add graph tensor allocator
* ggml : don't calculate data pointer of unallocated tensors when creating a view with an offset
* ggml : refactor ggml_view_Nd into ggml_view_tensor_offset
* ggml : graph allocation in contexts
* allocate work buffer as a ggml_object in ggml_graph_compute_with_ctx
* llama.cpp : allocate graph in the context
* add GGML_PAD
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
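The definition of `GGML_PAD` is not shown in this log. As a rough illustration of what a padding helper of this kind usually does, the sketch below rounds a size up to the next multiple of an alignment. The name `pad_to` is hypothetical, and the bit trick assumes the alignment is a power of two.

```cpp
#include <cstddef>

// Round x up to the next multiple of n.
// Assumption: n is a power of two, so ~(n - 1) masks off the low bits.
// pad_to is an illustrative name, not the GGML_PAD macro itself.
constexpr std::size_t pad_to(std::size_t x, std::size_t n) {
    return (x + n - 1) & ~(n - 1);
}
```

For example, `pad_to(5, 8)` yields 8, while a value that is already aligned, such as `pad_to(16, 8)`, is returned unchanged.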
* add `--in-prefix-bos` to prefix BOS to user inputs; keep EOS
The BOS precedes the string specified by `--in-prefix`.
Model generated EOS is now kept in the context.
It provides a way to strictly follow the prompt format used in
Llama-2-chat.
The EOS handling also benefits some existing finetunes that use
EOS to mark the end of turn.
* examples/common: move input_prefix_bos to other bools
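The token ordering described for `--in-prefix-bos` (BOS first, then the `--in-prefix` string, then the user's text) can be sketched like this. It is an illustration under assumptions: `build_user_turn` and the plain `int` token type are hypothetical, and tokenization is taken as already done.

```cpp
#include <vector>

using token = int; // assumption: tokens are plain integer ids

// Hypothetical helper: assemble the tokens for one user turn.
// When input_prefix_bos is set, BOS comes before the --in-prefix tokens.
std::vector<token> build_user_turn(const std::vector<token>& prefix_tokens,
                                   const std::vector<token>& user_tokens,
                                   bool input_prefix_bos,
                                   token bos_id) {
    std::vector<token> out;
    if (input_prefix_bos) {
        out.push_back(bos_id);
    }
    out.insert(out.end(), prefix_tokens.begin(), prefix_tokens.end());
    out.insert(out.end(), user_tokens.begin(), user_tokens.end());
    return out;
}
```

With the flag off, the result is just the prefix tokens followed by the user tokens, which matches the behavior before this option existed.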
* metal: concurrently dispatch commands
When `ggml_metal_graph_compute` is called for the first time, the function
`ggml_metal_graph_find_concurrency` runs and writes the commands that can be
issued concurrently to the `concur_list` array of the metal context.
* metal: don't call find_concurrency automatically.
* metal : code style changes
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Another speed gain for Q4_0 and Q4_1 on Metal
* Have N_DST, etc., be template parameters
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
* make rms_norm_eps a parameter
* add rms_norm_eps to command line
* fix baby llama, test-grad0
* use scientific notation for eps param in the help
ggml-ci
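To show where an epsilon such as `rms_norm_eps` enters the computation, here is a minimal RMS normalization sketch. It uses the textbook formula `y = x / sqrt(mean(x^2) + eps)` and is not the ggml kernel; the function name and the use of `std::vector<float>` are choices made for illustration.

```cpp
#include <cmath>
#include <vector>

// Textbook RMS normalization: y_i = x_i / sqrt(mean(x^2) + eps).
// eps keeps the denominator away from zero for all-zero (or tiny) inputs.
std::vector<float> rms_norm(const std::vector<float>& x, float eps) {
    if (x.empty()) {
        return {};
    }
    double sum_sq = 0.0;
    for (float v : x) {
        sum_sq += static_cast<double>(v) * v;
    }
    const float mean_sq = static_cast<float>(sum_sq / x.size());
    const float scale = 1.0f / std::sqrt(mean_sq + eps);

    std::vector<float> y;
    y.reserve(x.size());
    for (float v : x) {
        y.push_back(v * scale);
    }
    return y;
}
```

With `eps = 0`, an all-zero input divides by zero and produces NaN; a small positive value such as `1e-6f` keeps the output at zero, which is why the value matters enough to expose as a parameter.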
* makefile: correct deps for server
* server: tighten settings layout a little
* server: expose all currently configured generation params in UI
* server: expose remaining generation params, for the adventurous
* server: embetter mirostat fields
* llama, main : constrain sampling to grammar
* allow loading grammar from file
* fix whitespace errors
* handle & print parser errors
* add comments to grammar syntax and allow newlines where unambiguous
* add missing include
* support alternates in root rule
* fix bugs with empty token and EOS
* adjust JSON grammar
* remove swp file
* rewrite ternary expressions
Co-authored-by: Henri Vasserman <henv@hot.ee>
* use struct for grammar elements and add Unicode support
* add unicode escapes
* add inverse char ranges
* only sample full tokens (no peeking or truncation)
* llama : minor style changes
blindly applied in online editor - hopefully I didn't break something
* update help text
* add warning message if EOS is disabled
---------
Co-authored-by: Henri Vasserman <henv@hot.ee>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
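The idea of constraining sampling to a grammar can be sketched as filtering candidates before picking one. The code below is a simplified illustration, not the implementation from these commits: `candidate`, `pick_constrained`, and the `accepts` predicate are hypothetical, and greedy selection stands in for whatever sampler is actually used.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical candidate: the token's text and its logit.
struct candidate {
    std::string text;
    float logit;
};

// Return the index of the highest-logit candidate whose full text the
// grammar accepts, or -1 if none is accepted. The predicate sees the whole
// token, so a token is never accepted on the strength of a prefix alone.
int pick_constrained(const std::vector<candidate>& cands,
                     const std::function<bool(const std::string&)>& accepts) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(cands.size()); ++i) {
        if (!accepts(cands[i].text)) {
            continue; // rejected by the grammar
        }
        if (best < 0 || cands[i].logit > cands[best].logit) {
            best = i;
        }
    }
    return best;
}
```

Passing a predicate that only accepts digit strings, for instance, skips a higher-scoring non-numeric token and selects the best numeric one instead.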