William Josephson
wkj at cs
Room 318a
Dept. of Computer Science
Princeton University
35 Olden Street
Princeton, NJ 08544



Alter ego: Find your favorite morphism.

Links: A smattering of links around the web.

Welcome! This is one of my rarely updated outposts on the Internet.

As of Fall 2010, I am wrapping up here at Princeton. If you have an interesting problem or opportunity -- whether or not it is of a computational nature -- I'd love to hear about it. The best way to get in touch is usually via e-mail.

I did my undergraduate work at Harvard, nominally in the mathematics department, but in fact I spent most of my time in the computer science department. The majority of my course work was of a theoretical bent, but I spent much of my otherwise free time with Margo Seltzer and her graduate students. When I graduated, I went to work at Data Domain, a then-small storage start-up in the Bay Area, where I was involved in algorithm design and implementation. I returned to the East Coast for graduate school, and along the way I have worked for a variety of new ventures, large companies, and a couple of research labs.

For the past several years I have been working with Kai Li on similarity search and large-scale storage systems. If you are looking for a copy of a paper or software that I have written, please contact me directly. Some of my work with the Memex group may be found here.

My recent work has focused on the use of flash memory to build storage systems. The following abstract gives the flavor of the work:

Flash memory has recently emerged as an important component of the storage infrastructure in the data center, but presents several unique challenges to storage system designers: individual blocks must be erased before they can be rewritten, block erasure is time-consuming, and individual blocks may be erased only a limited number of times. By combining hardware parallelism and a log-structured approach, state-of-the-art flash storage systems can deliver two to three orders of magnitude more I/O operations per second than existing high-performance Fibre Channel disk drives. Despite the emergence of state-of-the-art solid-state disks, the storage software stack has changed little and is still optimized for magnetic disks; the basic abstraction remains a linear array of fixed-size blocks together with a very large DRAM cache to convert user I/O into a smaller number of large I/O transfers.
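To make the log-structured approach concrete, below is a minimal sketch in C of how writes might be appended to flash under the erase-before-rewrite constraint. It is illustrative only: the device geometry, the names, and the naive policy of erasing the next block in sequence (a real system would add wear leveling and garbage collection) are my assumptions, not the design of any system described here.

    /*
     * Toy log-structured flash writer.  Pages cannot be rewritten in
     * place, so all writes are appended at the log tail; blocks are
     * erased as a whole before reuse, and each erase wears the block.
     * All sizes and names are illustrative assumptions.
     */
    #include <stdint.h>
    #include <string.h>

    #define PAGE_SIZE        4096   /* flash program unit (assumed)    */
    #define PAGES_PER_BLOCK  64     /* pages per erase block (assumed) */
    #define NUM_BLOCKS       16     /* blocks in this toy device       */

    struct flash_dev {
        uint8_t  data[NUM_BLOCKS][PAGES_PER_BLOCK][PAGE_SIZE];
        uint32_t erase_count[NUM_BLOCKS]; /* erases are wear-limited    */
        int      cur_block;               /* block holding the log tail */
        int      cur_page;                /* next free page in cur_block */
    };

    /* Erase a whole block: the slow, wear-limited operation. */
    static void flash_erase(struct flash_dev *d, int block)
    {
        memset(d->data[block], 0xff, sizeof d->data[block]);
        d->erase_count[block]++;
    }

    /*
     * Append one page of data to the log.  When the current block is
     * full we move to the next one and erase it; a real system would
     * instead pick a victim with garbage collection and wear leveling.
     */
    static void log_append(struct flash_dev *d, const uint8_t *page)
    {
        if (d->cur_page == PAGES_PER_BLOCK) {
            d->cur_block = (d->cur_block + 1) % NUM_BLOCKS;
            flash_erase(d, d->cur_block);
            d->cur_page = 0;
        }
        memcpy(d->data[d->cur_block][d->cur_page], page, PAGE_SIZE);
        d->cur_page++;
    }

Because every update goes to the log tail, the device never rewrites a page in place, and hardware parallelism can be exploited by striping the log across several such devices.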

We first examine the impact SSDs have had on the utility of the buffer cache. As SSD performance and density improve, the value of very large DRAM-based buffer caches declines. We examine the change in tradeoffs through database buffer cache traces and simulation and offer some simple heuristics to take advantage of these changes. Second, we propose a richer interface more suitable for solid-state storage systems. This interface provides for sparse block- or object-based allocation, atomic multi-block updates, and a block discard interface to facilitate reclamation of unused storage. Finally, we present DFS, a file system designed specifically for next-generation flash memory systems that takes advantage of our proposed storage interface. The result is a much-simplified file system implementation that runs under Linux. In both micro- and application benchmarks, DFS shows consistent improvement over ext3 both in throughput and in CPU usage. For direct access, DFS delivers as much as a 20% performance improvement in microbenchmarks. On an application-level benchmark, DFS outperforms ext3 by 7% to 250% while requiring less CPU power.
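For illustration, the three primitives of the interface described above might be rendered as a C header roughly as follows. The names and signatures are hypothetical, not the actual interface from the work.

    /*
     * Hypothetical sketch of the proposed storage interface: sparse
     * allocation, atomic multi-block update, and block discard.  All
     * identifiers here are illustrative assumptions.
     */
    #include <stdint.h>
    #include <stddef.h>

    typedef uint64_t blkaddr_t;  /* address in a large, sparse space */

    /* Sparse block- or object-based allocation: back a range of the
     * logical address space on demand. */
    int blk_alloc(blkaddr_t start, size_t nblocks);

    /* Atomic multi-block update: either all writes in the vector
     * become visible or none do -- natural to provide on a log, where
     * a single commit record covers the whole group. */
    struct blk_iov {
        blkaddr_t   addr;  /* destination block address  */
        const void *buf;   /* one block of data to write */
    };
    int blk_write_atomic(const struct blk_iov *iov, size_t niov);

    /* Discard (TRIM-like): declare a range dead so the underlying
     * flash pages can be reclaimed by garbage collection. */
    int blk_discard(blkaddr_t start, size_t nblocks);

A file system built on such an interface can shed much of its own allocation and journaling machinery, which is the simplification the abstract attributes to DFS.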