Estimating Application Hierarchical Bandwidth Requirements using BSP Family Models
Abstract:
A vast amount of work has been done on developing programming models
that provide good performance across machine architectures, are easy
to use, and have predictable performance. Similarly, designing and
optimizing architectures to achieve the best performance for an
application class is a challenging task. Accurate cost modeling is
essential to both application development and system design.Many scientific computing codes are developed using libraries that
provide custom-built collective communication primitives. The family
of Bulk Synchronous Parallel (BSP) machine models are suitable tools
for analyzing such problems.However, modeling the effect of bandwidth limitations for globally
unbalanced communication and estimating the hierarchical bandwidth
used by applications remain key challenges.We present a hierarchical bandwidth machine model (alphaDBSP)
that naturally extends the Decomposable BSP (DBSP) model by
associating a bandwidth growth factor $\alpha$ to each message
pattern.Algorithms executed on alphaDBSP have a runtime at least as good
as DBSP. There are globally unbalanced problems for which
alphaDBSP analysis is simpler or more accurate. E.g., the one
element nearest-neighbor message exchange running on a pruned
butterfly takes $O(log^3(p))$ on alphaDBSP and $O(\sqrt{p})$ on
DBSP, while optimally modeling the one-to-all broadcast requires a
single communication step on alphaDBSP vs. $O(log(p))$ steps on
DBSP. We present three scientific computing kernels that illustrate
differences between alphaDBSP and DBSP analysis.Similar to BSP family models, alphaDBSP predicts collective
communication execution time for a given machine. Additionally,
alphaDBSP estimates the hierarchical bandwidth required by a given
application. System architects may use this estimation to design
machines that avoid bandwidth bottlenecks for their target
application class.