William Josephson
wkj at cs
Room 318a
Dept. of Computer Science
Princeton University
35 Olden Street
Princeton, NJ 08544
Welcome! This is one of my rarely updated outposts on the Internet.
As of Fall 2010 I am wrapping up here at Princeton. If you have
an interesting problem or opportunity -- whether or not it is of a
computational nature -- I'd love to hear about it. The best way to
get in touch is usually via e-mail.
I did my undergraduate work at Harvard, nominally in the mathematics department, but
in fact I spent most of my time in the computer science department.
The majority of my course work was of a theoretical bent, but I
spent much of my otherwise free time with Margo Seltzer and her
graduate students. When I graduated, I went to work at Data Domain, a then-small storage
start-up in the Bay Area, where I was involved in algorithm design
and implementation. I returned to the East Coast for graduate
school, and along the way I have worked for a variety of new ventures,
large companies, and a couple of research labs.
For the past several years I have been working with
Kai Li on similarity
search and large-scale storage systems. If you are looking for a
copy of a paper or software that I have written, please contact me
directly. Some of my work with the Memex group may be found
here.
My recent work has focused on the use of flash memory to build
storage systems. The following abstract gives the flavor of the
work:
Flash memory has recently emerged as an important component of the
storage infrastructure in the data center, but presents several
unique challenges to storage system designers: individual blocks
must be erased before they can be rewritten, block erasure is time
consuming, and individual blocks may be erased only a limited number
of times. By combining hardware parallelism and a log-structured
approach, state-of-the-art flash storage systems can deliver two
to three orders of magnitude more I/O operations per second than
existing high-performance Fibre Channel disk drives. Despite the
emergence of state-of-the-art solid-state disks, the storage software
stack has changed little and is still optimized for magnetic disks;
the basic abstraction remains a linear array of fixed-size blocks
together with a very large DRAM cache to convert user I/O to a
smaller number of large I/O transfers.
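
To make the log-structured idea concrete, here is a minimal sketch
(my own illustration, with invented names and toy sizes, not code
from the thesis or from DFS): writes are redirected to the head of
an append-only log, and a mapping table tracks where each logical
block currently lives, so no flash page is ever overwritten in place.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Illustrative sizes only; real devices differ. */
    #define NUM_LOGICAL  1024   /* logical blocks exposed to the host */
    #define NUM_PHYSICAL 1280   /* physical flash pages, with spare capacity */
    #define INVALID UINT32_MAX

    static uint32_t map[NUM_LOGICAL];     /* logical -> physical */
    static uint8_t  valid[NUM_PHYSICAL];  /* is this physical page live? */
    static uint32_t head = 0;             /* append point of the log */

    static void init(void) {
        for (uint32_t i = 0; i < NUM_LOGICAL; i++)
            map[i] = INVALID;
        memset(valid, 0, sizeof valid);
    }

    /* A write never updates a page in place: it appends at the log
     * head and invalidates the old copy, sidestepping the
     * erase-before-write constraint until garbage collection
     * reclaims whole erase blocks. */
    static int log_write(uint32_t lblock) {
        if (head == NUM_PHYSICAL)
            return -1;                  /* log full: GC would run here */
        if (map[lblock] != INVALID)
            valid[map[lblock]] = 0;     /* old copy becomes garbage */
        map[lblock] = head;
        valid[head] = 1;
        return (int)head++;
    }

    int main(void) {
        init();
        log_write(7);                   /* first write of block 7 */
        int p = log_write(7);           /* overwrite goes to a new page */
        printf("logical 7 now at physical %d\n", p);
        return 0;
    }

Garbage collection, omitted here, later copies the remaining live
pages out of mostly-invalid erase blocks so they can be erased and
reused; the hardware parallelism the abstract mentions comes from
running many such logs across independent flash channels.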
We first examine the impact SSDs have had on the utility of the
buffer cache. As SSD performance and density improve, the value
of very large DRAM-based buffer caches declines. We examine the
change in tradeoffs through database buffer cache traces and
simulation, and we offer some simple heuristics to take advantage of
these changes.
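
The abstract leaves the heuristics themselves to the thesis, but the
shift in tradeoffs is easy to see with rough numbers (the latencies
below are my ballpark assumptions, not measurements):

    #include <stdio.h>

    /* Ballpark access times in microseconds; assumptions for
     * illustration, not figures from the thesis. */
    #define DRAM_US 0.1
    #define SSD_US  100.0
    #define DISK_US 10000.0

    /* Expected read latency with a DRAM cache of the given hit rate
     * in front of a backing store with the given miss latency. */
    static double expected_us(double hit, double miss_us) {
        return hit * DRAM_US + (1.0 - hit) * miss_us;
    }

    int main(void) {
        for (int pct = 50; pct <= 95; pct += 15) {
            double hit = pct / 100.0;
            printf("hit rate %2d%%: disk-backed %8.1f us, ssd-backed %6.1f us\n",
                   pct, expected_us(hit, DISK_US), expected_us(hit, SSD_US));
        }
        return 0;
    }

Against a 10 ms disk, even a modest hit rate saves milliseconds per
access, so a very large DRAM cache easily pays for itself; against a
100 us SSD the absolute savings shrink by roughly two orders of
magnitude, which is the change the traces and simulation quantify.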
Second, we propose a richer interface more suitable for solid-state
storage systems. This interface provides for sparse block- or
object-based allocation, atomic multi-block updates, and a block
discard interface to facilitate reclamation of unused storage.
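
As a sketch of what such an interface might look like as a C API
(every name and signature below is invented for illustration; the
actual interface is specified in the thesis and the DFS paper):

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical handle for a flash device or virtualized volume. */
    typedef struct flash_dev flash_dev_t;

    /* Sparse allocation: the device exposes a large, sparsely backed
     * block address space, so clients can place data at convenient
     * addresses without managing free space themselves. */
    int flash_alloc(flash_dev_t *dev, uint64_t blkno, uint64_t nblocks);

    /* Atomic multi-block update: either every block in the vector
     * becomes durable or none does, removing the need for much
     * file-system-level journaling. */
    struct flash_iov {
        uint64_t    blkno;
        const void *data;
    };
    int flash_atomic_write(flash_dev_t *dev,
                           const struct flash_iov *iov, size_t cnt);

    /* Block discard: tells the device that a range no longer holds
     * live data, so its garbage collector can reclaim the space. */
    int flash_discard(flash_dev_t *dev, uint64_t blkno, uint64_t nblocks);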
Finally, we present DFS, a file system designed specifically
for next-generation flash memory systems that takes advantage of
our proposed storage interface. The result is a much-simplified
file system implementation that runs under Linux. In both micro-
and application benchmarks, DFS shows consistent improvement over
ext3 both in throughput and in CPU usage. For direct access, DFS
delivers as much as a 20% performance improvement in
microbenchmarks. On an application-level benchmark, DFS outperforms
ext3 by 7% to 250% while requiring less CPU power.