Hierarchical Modularity and Intermodule Optimization (Thesis)
Separate compilation is an important tool for coping with design complexity in large software projects. When done right it can also be used to create software libraries, thus promoting code reuse. But separate compilation comes in various flavors and has many facets: namespace management, linking, optimization, dependencies. Many programming languages identify modular units with units of compilation, while only a few extend this to permit hierarchies of language-level modules within individual compilation units. When the number of compilation units is large, then it becomes increasingly important that the mechanism of separate compilation itself can be used to control namespaces. The group model implemented in SML/NJ's compilation manager CM provides the necessary facilities to avoid unwanted interferences between unrelated parts of large programs. Compilation units are arranged into groups, and explicit export interfaces can be used to control namespaces. When there are many groups, they can be organized into super-groups, and so on, thus forming a hierarchical modular structure. CM provides automatic dependency analysis, but automatic dependency analysis is NP-hard for general SML code. We show two simple restrictions that avoid intractability. To remove the penalties for efficiency usually incurred by modularization and separate compilation, I designed an algorithm for automatic inline expansion across compilation unit boundaries that works in the presence of higher-order functions and free variables; it rearranges bindings and scopes as necessary to move non-expansive code from one module to another. I describe---and implement---the algorithm as transformations on $lambda$-calculus. The inliner is efficient, robust, and practical enough for everyday use in the SML/NJ compiler. It preserves separate compilation and has been integrated with CM. I briefly investigate macro systems as an alternative approach---driven by programmer directives---to achieve cross-module inlining and discuss a variety of problems that arise. I show a solution to the problem of integrating macro systems with ML-style modules that use long identifiers and show an implementation technique for reliable name resolution during linking. But I also discover that other problems continue to impede large-scale programming.