diff --git a/examples/duo/README.md b/examples/duo/README.md
index 7cb8e6e69..8644aab87 100644
--- a/examples/duo/README.md
+++ b/examples/duo/README.md
@@ -12,13 +12,12 @@ In case of two identical devices and equal model split we would leave half of co
 We can utilize this compute to speculate and then evaluate larger sequence of tokens.
 
-This demo is fairly limited:
-1. Expects two instances running main model
-2. One of these instances speculating
+This demo is fairly limited, more like a proof of concept:
+1. Expects exactly two instances running main model
+2. Only one of these instances speculating when main model is idle, so we still waste 25% of compute
 3. Speculation is linear
 4. Sampling is greedy
 
-So, in the case of two identical devices and equal model split we still are not utilizing 25% of compute.
 
 Improvement of the above points is probably easier to do as separate changes, to make reviewing easier.
 
 ### Setup
@@ -50,15 +49,24 @@ Also on M2:
 ./bin/duo -m ../../llms/gguf/Meta-Llama-3-8B-Instruct-fp16.gguf -md ../../llms/gguf/Meta-Llama-3-8B-Instruct-v2.Q2_K.gguf --rpc "localhost:10001,169.254.77.16:10001" -p "Please illustrate the difference between concurrency and parallelism in python." -n 256 -ngl 99 -t 1 --rpcd "localhost:20002"
 ...
-decoded 256 tokens in 32.03 s, speed: 7.99 t/s
+llama_print_timings: load time = 42068.04 ms
+...
+llama_print_timings: total time = 42792.74 ms / 302 tokens
 ```
 
+Seems like eval time is messed up a little
+
 Compare that with running main with same 2 rpc servers:
 
 ```
-./bin/main -m ../../llms/gguf/Meta-Llama-3-8B-Instruct-fp16.gguf --rpc "localhost:10001,169.254.77.16:10001" -p "Please illustrate the difference between concurrency and parallelism in python." -n 256 -ngl 99 -t 1
+./bin/main -m ../../llms/gguf/Meta-Llama-3-8B-Instruct-fp16.gguf --rpc "localhost:10001,169.254.77.16:10001" -p "Please illustrate the difference between concurrency and parallelism in python." -n 256 -ngl 99
 ...
-
+llama_print_timings: load time = 42305.61 ms
+...
+llama_print_timings: total time = 58555.49 ms / 268 tokens
 ```
+
+Extra:
+
+GPU util for both devices