bpf: derive smin/smax from umin/max bounds

Add smin/smax derivation from appropriate umin/umax values. Previously the
logic was surprisingly asymmetric, trying to derive umin/umax from smin/smax
(if possible), but not trying to do the same in the other direction. A simple
addition to __reg64_deduce_bounds() fixes this.

Added also generic comment about u64/s64 ranges and their relationship.
Hopefully that helps readers to understand all the bounds deductions
a bit better.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231102033759.2541186-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This commit is contained in:
Andrii Nakryiko 2023-11-01 20:37:45 -07:00 committed by Alexei Starovoitov
parent f4c7e88732
commit 93f7378734

View file

@ -2358,6 +2358,77 @@ static void __reg32_deduce_bounds(struct bpf_reg_state *reg)
static void __reg64_deduce_bounds(struct bpf_reg_state *reg)
{
/* If u64 range forms a valid s64 range (due to matching sign bit),
* try to learn from that. Let's do a bit of ASCII art to see when
* this is happening. Let's take u64 range first:
*
* 0 0x7fffffffffffffff 0x8000000000000000 U64_MAX
* |-------------------------------|--------------------------------|
*
* Valid u64 range is formed when umin and umax are anywhere in the
* range [0, U64_MAX], and umin <= umax. u64 case is simple and
* straightforward. Let's see how s64 range maps onto the same range
* of values, annotated below the line for comparison:
*
* 0 0x7fffffffffffffff 0x8000000000000000 U64_MAX
* |-------------------------------|--------------------------------|
* 0 S64_MAX S64_MIN -1
*
* So s64 values basically start in the middle and they are logically
* contiguous to the right of it, wrapping around from -1 to 0, and
* then finishing as S64_MAX (0x7fffffffffffffff) right before
* S64_MIN. We can try drawing the continuity of u64 vs s64 values
* more visually as mapped to sign-agnostic range of hex values.
*
* u64 start u64 end
* _______________________________________________________________
* / \
* 0 0x7fffffffffffffff 0x8000000000000000 U64_MAX
* |-------------------------------|--------------------------------|
* 0 S64_MAX S64_MIN -1
* / \
* >------------------------------ ------------------------------->
* s64 continues... s64 end s64 start s64 "midpoint"
*
* What this means is that, in general, we can't always derive
* something new about u64 from any random s64 range, and vice versa.
*
* But we can do that in two particular cases. One is when entire
* u64/s64 range is *entirely* contained within left half of the above
* diagram or when it is *entirely* contained in the right half. I.e.:
*
* |-------------------------------|--------------------------------|
* ^ ^ ^ ^
* A B C D
*
* [A, B] and [C, D] are contained entirely in their respective halves
* and form valid contiguous ranges as both u64 and s64 values. [A, B]
* will be non-negative both as u64 and s64 (and in fact it will be
* identical ranges no matter the signedness). [C, D] treated as s64
* will be a range of negative values, while in u64 it will be
* non-negative range of values larger than 0x8000000000000000.
*
* Now, any other range here can't be represented in both u64 and s64
* simultaneously. E.g., [A, C], [A, D], [B, C], [B, D] are valid
* contiguous u64 ranges, but they are discontinuous in s64. [B, C]
* in s64 would be properly presented as [S64_MIN, C] and [B, S64_MAX],
* for example. Similarly, valid s64 range [D, A] (going from negative
* to positive values), would be two separate [D, U64_MAX] and [0, A]
* ranges as u64. Currently reg_state can't represent two segments per
* numeric domain, so in such situations we can only derive maximal
* possible range ([0, U64_MAX] for u64, and [S64_MIN, S64_MAX] for s64).
*
* So we use these facts to derive umin/umax from smin/smax and vice
* versa only if they stay within the same "half". This is equivalent
* to checking sign bit: lower half will have sign bit as zero, upper
* half have sign bit 1. Below in code we simplify this by just
* casting umin/umax as smin/smax and checking if they form valid
* range, and vice versa. Those are equivalent checks.
*/
if ((s64)reg->umin_value <= (s64)reg->umax_value) {
reg->smin_value = max_t(s64, reg->smin_value, reg->umin_value);
reg->smax_value = min_t(s64, reg->smax_value, reg->umax_value);
}
/* Learn sign from signed bounds.
* If we cannot cross the sign boundary, then signed and unsigned bounds
* are the same, so combine. This works even in the negative case, e.g.