bcachefs: fix crc32c checksum merge byte order problem

An fsstress task on a big endian system (s390x) quickly produces a
bunch of CRC errors in the system logs. Most of these are related to
the narrow CRCs path, but the fundamental problem can be reduced to
a single write and re-read (after dropping caches) of a previously
merged extent.

The key merge path that handles extent merges eventually calls into
bch2_checksum_merge() to combine the CRCs of the associated extents.
This code attempts to avoid a byte order swap by feeding the le64
values into the crc32c code, but the latter casts the resulting u64
value down to a u32, which truncates the high bytes where the actual
crc value ends up. This results in a CRC value that does not change
(since it is merged with a CRC of 0), and checksum failures ensue.

Fix the checksum merge code to swap to cpu byte order on the
boundaries to the external crc code such that any value casting is
handled properly.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This commit is contained in:
Brian Foster 2023-09-27 07:23:37 -04:00 committed by Kent Overstreet
parent 4220666398
commit 3c40841cdc

View file

@ -362,7 +362,7 @@ struct bch_csum bch2_checksum_merge(unsigned type, struct bch_csum a,
state.type = type;
bch2_checksum_init(&state);
state.seed = (u64 __force) a.lo;
state.seed = le64_to_cpu(a.lo);
BUG_ON(!bch2_checksum_mergeable(type));
@ -373,7 +373,7 @@ struct bch_csum bch2_checksum_merge(unsigned type, struct bch_csum a,
page_address(ZERO_PAGE(0)), page_len);
b_len -= page_len;
}
a.lo = (__le64 __force) bch2_checksum_final(&state);
a.lo = cpu_to_le64(bch2_checksum_final(&state));
a.lo ^= b.lo;
a.hi ^= b.hi;
return a;