Implement server mode.
This new mode works by first loading the model, then listening for TCP connections on a port. When a connection is received, arguments are parsed using a simple protocol:

- First the number of arguments is read, followed by a newline character.
- Then each argument is read, separated by the 0 byte.
- With this we build an argument vector, similar to what is passed to the program entry point. We pass this to gpt_params_parse.

Finally `run` is executed with the input/output streams connected to the socket.

Signed-off-by: Thiago Padilha <thiago@padilha.cc>
parent bf44faa0ee
commit 3a0dcb3920
9 changed files with 331 additions and 2 deletions
Makefile | 7 +++++--

@@ -229,11 +229,14 @@ utils.o: utils.cpp utils.h
 run.o: run.cpp run.h
 	$(CXX) $(CXXFLAGS) -c run.cpp -o run.o
 
+tcp_server.o: tcp_server.cpp tcp_server.h
+	$(CXX) $(CXXFLAGS) -c tcp_server.cpp -o tcp_server.o
+
 clean:
 	rm -f *.o main quantize
 
-main: main.cpp ggml.o llama.o utils.o run.o
-	$(CXX) $(CXXFLAGS) main.cpp ggml.o llama.o utils.o run.o -o main $(LDFLAGS)
+main: main.cpp ggml.o llama.o utils.o run.o tcp_server.o
+	$(CXX) $(CXXFLAGS) main.cpp ggml.o llama.o utils.o run.o tcp_server.o -o main $(LDFLAGS)
 	@echo "\x1b[36mrun ./main -h for help\x1b[0m"
 
 quantize: quantize.cpp ggml.o llama.o utils.o