Indoor wireless propagation is governed by the interaction among
3D scene geometry, radio-material properties, and transmitter and
receiver configuration. Most learning-based site-specific
prediction methods target a single wireless representation, such
as a radiomap or a channel impulse response (CIR), and therefore do not
explicitly exploit the propagation structure shared across
heterogeneous wireless views.
We introduce WiSER, which maps a sparse voxel representation of an indoor scene and a
transmitter location into a transmitter-conditioned sparse 3D scene
memory. This shared memory is queried by two structure-aware
decoders: a ray-corridor decoder for dense receiver-plane path-gain
prediction and a Detection Transformer (DETR)-style set decoder that
predicts a variable-cardinality set of delay and power taps.
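
To illustrate what set-style tap prediction involves, the sketch below matches a fixed number of predicted query slots to a smaller, unordered set of ground-truth taps, in the spirit of the bipartite matching used by Detection Transformer-style decoders. It is an illustrative assumption, not WiSER's training procedure: the function name `match_taps`, the L1 pair cost, the weights, and the example values are all hypothetical. The search is brute force for clarity; a practical implementation would more likely use the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations


def match_taps(pred, target, w_delay=1.0, w_power=1.0):
    """Match predicted slots to target taps at minimum total cost.

    pred, target: lists of (delay_ns, power_db) tuples (hypothetical units),
    with len(pred) >= len(target). Returns (cost, assignment), where
    assignment[j] is the index of the predicted slot matched to target tap j.
    """
    if len(pred) < len(target):
        raise ValueError("need at least as many query slots as target taps")

    def pair_cost(p, t):
        # Assumed cost: weighted L1 distance in delay and power.
        return w_delay * abs(p[0] - t[0]) + w_power * abs(p[1] - t[1])

    best_cost, best_assign = float("inf"), ()
    # Enumerate every ordered choice of len(target) distinct slots.
    for slots in permutations(range(len(pred)), len(target)):
        cost = sum(pair_cost(pred[s], t) for s, t in zip(slots, target))
        if cost < best_cost:
            best_cost, best_assign = cost, slots
    return best_cost, best_assign


# Four query slots, two true taps: the remaining slots stay unmatched and
# would be treated as "no tap" predictions.
pred = [(52.0, -95.0), (10.5, -60.0), (31.0, -80.0), (0.0, -150.0)]
target = [(10.0, -61.0), (30.0, -78.0)]
cost, assign = match_taps(pred, target)
unmatched = sorted(set(range(len(pred))) - set(assign))
```

Because the number of target taps varies per receiver while the number of query slots is fixed, unmatched slots are what let a fixed-size decoder output represent a variable-cardinality set.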
We train and evaluate WiSER on a co-registered indoor
scene--wireless dataset generated from ScanNet++ scenes and Sionna
Ray Tracing. The dataset aligns sparse voxel inputs, dense radiomap
labels, and unordered multipath CIR tap sets under a common
coordinate frame and propagation configuration.
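
As a rough picture of what co-registration under a common coordinate frame can mean in practice, the sketch below stores sparse voxels, radiomap cells, and an unordered tap list in one record whose indices all derive from the same origin. Every name here (`CoRegisteredSample`, its fields, the resolutions, the material id) is a hypothetical choice for illustration and is not taken from the dataset's actual schema.

```python
from dataclasses import dataclass, field
import math


@dataclass
class CoRegisteredSample:
    """Hypothetical container; field names and units are assumptions."""
    origin: tuple        # (x, y, z) minimum corner of the shared frame, metres
    voxel_size: float    # metres per voxel edge
    cell_size: float     # metres per radiomap cell
    rx_height: float     # height of the receiver plane, metres
    occupied: dict = field(default_factory=dict)  # (i, j, k) -> material id
    radiomap: dict = field(default_factory=dict)  # (row, col) -> path gain, dB
    taps: list = field(default_factory=list)      # unordered (delay_ns, power_db)

    def voxel_index(self, p):
        # Map a world-frame point to its sparse voxel index.
        return tuple(
            math.floor((c - o) / self.voxel_size)
            for c, o in zip(p, self.origin)
        )

    def radiomap_cell(self, p):
        # Map the same world-frame point to a receiver-plane cell (row = y, col = x).
        row = math.floor((p[1] - self.origin[1]) / self.cell_size)
        col = math.floor((p[0] - self.origin[0]) / self.cell_size)
        return (row, col)


sample = CoRegisteredSample(
    origin=(0.0, 0.0, 0.0), voxel_size=0.25, cell_size=0.5, rx_height=1.5
)
voxel = sample.voxel_index((1.1, 2.0, 0.3))
cell = sample.radiomap_cell((1.1, 2.0, sample.rx_height))
sample.occupied[voxel] = 3                          # illustrative material id
sample.radiomap[cell] = -72.0                       # illustrative path gain
sample.taps = [(30.0, -78.0), (10.0, -61.0)]        # order carries no meaning
```

Deriving both index functions from one `origin` is the point of the sketch: a scene voxel, a radiomap label, and a tap set can then be related to the same physical location without any per-modality alignment step.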