6.033 2010 Design Project 1 : Time-Travel Filesystem
Version History
- Feb 9, 2010 -- Initial release
- Feb 16, 2010 (This version) -- Added link to online submission site
I. Due Dates and Deliverables
There are three deliverables for Design Project 1:
- A design proposal not exceeding 800 words, due on February 23rd, 2010 (before recitation).
- A design report not exceeding 2,500 words, due on March 18th, 2010 (before recitation).
- An (optional) revision of a portion of your design report to improve your grade; you must set up a meeting with
your writing instructor within one week of receiving your graded DP1 to take advantage of this.
All deliverables should be submitted via the online submission site.
As with real-life system designs, 6.033 design projects are
under-specified, and it is your job to complete the specification in a
sensible way given the overall requirements of the project. As with
designs in practice, the specifications often need some adjustment as
the design is fleshed out. We recommend that you start
early so that you can evolve your design over time. A good design is likely to
take more than just a few days to put together.
II. The Problem
Your task is to design a filesystem with support for
time travel. Time travel means that the file system allows the
user to view the contents of a file or directory as of some time in
the past. Time travel is useful in many different applications: for
example, it provides a form of backup, allowing users to recover from
accidental deletions; it also allows users to compare different
versions of a file over time, for example, as a series of edits are
made to a document.
Your design must provide time travel support in Unix, without
requiring changes to the Unix File System API. In other words, it
should be possible to use applications compiled for Unix in your
modified Unix with time-travel support. You will need a way
for applications to access historical versions of files. We suggest
that you create a special directory at the root of the file system that allows you
to refer to historical versions of files, as illustrated
in this example:
% pwd
/home/madden/files
% ls
file.txt file2.txt
% cat file.txt
hello
% date
Tue Feb 9 09:15:50 EST 2010
% echo goodbye > file.txt
% echo file3 > file3.txt
% rm file2.txt
% ls
file.txt file3.txt
% cd /time-2010-02-09-09:00:00.000/home/madden/files
% ls
file.txt file2.txt
% cat file.txt
hello
% diff /home/madden/files/file.txt /time-2010-02-09-09:00:00.000/home/madden/files
1c1
< goodbye
---
> hello
Your design will primarily consist of modifications to the
implementations of every file system operation (read, write,
open, rename, link, unlink, symlink, mkdir, and possibly chdir and stat) -- see
Section 2.5 (page 91) of the textbook for a detailed description of the Unix
File System API. Your implementation must make it possible to travel
to any point of time and see the state of a file or directory as
it existed at the specified time, which is why calls like read() and stat() must
be modified.
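As an illustration of the first step a modified pathname-based call might take, the following minimal sketch splits a pathname of the form used in the example above into a timestamp and the ordinary path it refers to. The function name, the regular expression, and the choice to interpret the timestamp as UTC are all assumptions made for this sketch; the assignment does not prescribe them.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern for the suggested "/time-YYYY-MM-DD-HH:MM:SS.mmm" prefix.
_TIME_PREFIX = re.compile(
    r"^/time-(\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})(/.*)?$"
)


def split_historical_path(path):
    """Return (timestamp_seconds, real_path), or (None, path) for a current path."""
    match = _TIME_PREFIX.match(path)
    if match is None:
        return None, path
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d-%H:%M:%S.%f")
    # Assumption: timestamps embedded in path names are in UTC.
    seconds = stamp.replace(tzinfo=timezone.utc).timestamp()
    # A bare "/time-..." with nothing after it refers to the historical root.
    return seconds, match.group(2) or "/"
```

A call such as split_historical_path("/home/madden/files") returns (None, "/home/madden/files"), so current paths pass through unchanged.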
You may find that it's hard to reconstruct the contents of a file
as of a time when some application was writing
that file. It's acceptable, if a user travels to some time T and
reads file F, to produce the contents of F as of the close() call
that occurred most recently before T. The user should see any
files that existed at time T, and not see any files that did not
exist at time T.
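The rule above can be sketched as a lookup over the times at which a file was closed. This is only an in-memory illustration of the semantics, not an on-disk design: the FileHistory class, its method names, and the use of None to mark a deleted file are all hypothetical.

```python
import bisect


class FileHistory:
    """Hypothetical in-memory record of one file's closed versions."""

    def __init__(self):
        self._times = []     # ascending timestamps of close() or unlink()
        self._contents = []  # contents at each timestamp; None means deleted

    def record_close(self, when, data):
        """Record the contents of the file as of a close() at time `when`."""
        self._times.append(when)
        self._contents.append(data)

    def record_unlink(self, when):
        """Record that the file stopped existing at time `when`."""
        self._times.append(when)
        self._contents.append(None)

    def as_of(self, t):
        """Return the contents as of the most recent event strictly before t.

        Returns None if the file did not exist at time t.
        """
        # bisect_left finds the first event at or after t, so index - 1
        # is the latest event strictly before t.
        index = bisect.bisect_left(self._times, t)
        if index == 0:
            return None
        return self._contents[index - 1]
```

Because the assignment lets you assume that each operation happens at a distinct millisecond, ties between an event time and T should be rare; this sketch resolves them by treating an event at exactly T as not yet visible.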
III. Requirements
Your design must meet the following requirements:
- It must preserve compatibility with applications programmed to
the Unix file system API, with the exception of the special root directory
that contains historical files, as in the example above (see the next
requirement).
If necessary, you may add
calls to the API as long as you maintain backwards compatibility.
- It should return an error to applications that try to perform
operations that update a historical version of a directory or
file. Also, it should not allow the user to create a file or
directory that could be interpreted as a historical directory.
Finally, historical directories should be "virtual", in the
sense that they do not appear when examining the contents of the
inode representing the root directory or when performing a command
like "ls -al /", and are only accessible through a direct chdir() ("cd")
operation to a historical pathname, as shown in the example above.
- It must allow observation of the state of any file or
directory as it existed at the time the user has traveled to.
- It must provide reasonable performance in terms of number of
bytes or blocks read or written from disk. In particular, its performance
shouldn't be much worse than the standard Unix file system when accessing either
historical or current files.
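The second requirement above could be enforced by a single check run before any operation that updates the file system. The sketch below is one possible shape for that check, under several assumptions: paths are absolute, historical directories use the name format from the example in Section II, and the error returned is EROFS. The assignment leaves the choice of error code to you.

```python
import errno
import re

# Hypothetical: matches a root-level name such as "time-2010-02-09-09:00:00.000".
HISTORICAL_NAME = re.compile(r"^time-\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2}\.\d{3}$")


def check_update_allowed(path):
    """Return 0 if an update to the absolute `path` may proceed, else an errno."""
    components = [part for part in path.split("/") if part]
    # Only the first component matters: historical directories live at the root.
    if components and HISTORICAL_NAME.match(components[0]):
        # Covers both updates inside a historical tree and attempts to
        # create a root entry that looks like a historical directory.
        return errno.EROFS
    return 0
```

Note that a name in the historical format is rejected only at the root; the same name deeper in the tree, for example under /home, cannot be mistaken for a historical directory and is allowed in this sketch.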
Here are a few additional assumptions you can make to simplify your design:
- You can assume that the operating system provides a method time() that returns the
current system time as a floating-point number with millisecond precision that specifies the
number of seconds since January 1, 1970.
- You do not need to worry about the case that the disk runs out of space.
- You do not need to worry about concurrent modifications from different users -- e.g.,
you can assume that each operation happens at a distinct millisecond, and that only one user
edits a file at a time.
- You do not need to worry about what happens in the event of a crash in the middle of
an operation (i.e., how to recover your data structures from various types of failures.)
- You can assume that you have about 1 GB of RAM available to buffer write operations in
memory.
IV. Design Proposal
The design proposal should be a concise summary (at most 800 words) of your
overall system design.
The format of your proposal should be roughly as follows: it should
begin with an overview, describing the overall approach. It should
then summarize your proposed design and explain how it solves the
problem and meets the requirements in this document.
The core of the proposal should be the design description. This
should explain the basic on-disk data structures used to represent
your time travel file system and a sketch of how read and write
operations will work. You may make use of bullets and tables, but
must introduce them with at least 1-2 sentences of context. This
section must include at least one graphic, correctly formatted with a
caption and brief description.
The proposal should end with a summary of how
the design meets the requirements (1-2 sentences). It should also
include problems which remain to be resolved.
You do not have to present a detailed rationale or analysis in your
proposal. However, if any of your design decisions are unusual
(particularly creative, experimental, or risky) or if you deviate from
the requirements, you should explain and justify those decisions in
your proposal.
You will receive feedback on your proposal from your TA in time to
adjust your final report.
V. Design Report
Your report should explain your design. It
should discuss the major design decisions and tradeoffs you made, and
justify your choices. It should discuss any limitations of which you
are aware. You should assume that your report is being read by someone
who has read this assignment and is familiar with the Unix file system and API, but has not thought carefully about this
particular design problem. Give enough detail that your project can be
turned over successfully to an implementation team. Your report
should convince the reader that your design satisfies the requirements
in Section III.
Make sure your report includes the following:
- An explanation of the behavior visible to users.
- A description of how files and directories are stored on disk, and how
changes to files and directories are recorded. You can assume that your
reader is familiar with the Unix File System on-disk representation.
- A description of how a historical version of a file or
directory is located and how its bytes are retrieved from disk.
- A specific discussion of how open, read, write, unlink, and
chdir operations are implemented. You do not need to discuss the other
file system operations in detail, but you should briefly mention how their
implementation would change relative to the standard Unix file system in
your design.
- An analysis of the number of reads and writes and space overhead required for each
of the workloads given in Section V.A. below.
- A (brief) discussion of what happens when a user tries to modify a historical file,
and how errors are reported.
V.A. Workloads
Specifically consider how your system would perform on the following three workloads:
- The user creates a website consisting of many small HTML files
(<10 KB) at time T1, updates each of the files several times, and
then travels back to time T1 and rereads each of the files in their
entirety. For this workload compare the number of I/O operations
(reads or writes of blocks from disk) required to perform these
historical reads to the number of I/O operations that would be required if
these were not historical reads. Also estimate the space required
to store these historical versions, relative to the size of the
original collection of files.
- The user downloads a large movie (~10 GB) from his or her
camcorder at time T2, and then performs edits to several small
scenes in the movie, comprising about 20% of the total disk blocks
occupied by the file (the movie size stays about the same after
these edits.) Compare the number of I/O operations required to
perform these writes to the number that would be required in the
standard Unix file system (as specified in Section 2.5 of the course
textbook.) Also estimate the space required to store the historical
versions of the movie, relative to the size of the original movie.
- Now suppose the user travels to time T2 and plays back the
movie. How many I/O operations are required by your system? How
many I/O operations would be required to play it back at the current
time? How does that compare to the number of operations that would
be required for playback of the current version in the standard Unix
file system?
Assume that the cost of a sequential I/O (where the disk head is already
positioned over the sector being read) is approximately the same as a random
I/O (where the disk must seek to the sector being read.) Though mechanical hard drives
don't actually behave this way, newer stable-storage devices, such as solid-state disks
based on Flash memory, do.
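As a starting point for the workload analysis, it can help to fix a block size and work out how many data blocks each workload touches. The sketch below does only that baseline arithmetic. The 4 KB block size is an assumption (the assignment does not specify one), sizes are taken as powers of two, and the counts cover data blocks only, ignoring inodes, indirect blocks, and any metadata your own design adds.

```python
BLOCK_SIZE = 4 * 1024  # assumed block size in bytes; not given in the assignment


def blocks_for(size_bytes, block_size=BLOCK_SIZE):
    """Number of data blocks needed to hold size_bytes (ceiling division)."""
    return -(-size_bytes // block_size)


# Movie workload: a ~10 GB file with about 20% of its blocks edited.
movie_blocks = blocks_for(10 * 1024**3)    # 2,621,440 blocks
edited_blocks = movie_blocks * 20 // 100   # 524,288 blocks

# Website workload: each HTML file is under 10 KB, so at most this many blocks.
html_blocks_max = blocks_for(10 * 1024)    # 3 blocks
```

Since sequential and random I/O are assumed to cost about the same, block counts like these translate directly into I/O counts for comparison with the standard Unix file system.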
It may not be possible to design a system with good performance in
all possible scenarios: you may need to make tradeoffs, supporting
some kinds of use well, and others not so well. You should explicitly
mention any such tradeoffs in your workload analysis.
V.B. Report organization
Use this organization for your report:
- Title page: Give your report a title that reflects the subject and scope of your project. Include your name, email address, recitation instructor, section time(s), and the date on the title page.
- No table of contents is needed
- Introduction: Summarize what your design is intended to achieve, outline the design, explain the major trade-offs and design decisions you have made, and justify those trade-offs and decisions.
- Design: Explain your design. Identify your design's main components, state, and algorithms for implementing the Unix file system operations. You should sub-divide the design, with corresponding subsections in the text, so that the reader can focus on and understand one piece at a time. Explain why your design makes sense as well as explaining how it works. Use diagrams, pseudo-code, and worked examples as appropriate.
- Analysis: Explain how you expect your design to behave in different scenarios. What scenarios might pose problems for throughput, latency, or even correctness? What do you expect to be the scalability limits of your design?
- Conclusion: Briefly summarize your design and provide recommendations for further actions and a list of any problems that must be resolved before the design can be implemented.
- Acknowledgments and references: Give credit to individuals whom you consulted in developing your design. Provide a list of references at the end using the IEEE citation-sequence system ("IEEE style") described in the Mayfield Handbook.
- Word count. Please indicate the word count of your report at the end of the document. Captions of figures should be included in the total word count.
- Footnotes. Please do not use footnotes in your report.
Here are a few tips:
- Use ideas and terms from the textbook and papers when appropriate; this will save you space (you can refer the reader to the relevant section of the textbook) and will save the reader some effort.
- Before you explain the solution to any given problem, say what the problem is.
- Before presenting the details of any given design component, ensure that the purpose and requirements of that component are well described.
- It's often valuable to illustrate an idea using an example, but an example is no substitute for a full explanation of the idea.
- You may want to separate the explanation of a component's data structures from its algorithms to access or use those data structures.
- Explain all figures, tables, and pseudo-code; explain what is being presented, and what conclusions the reader should draw.
The format and appearance of your report should conform to the DP1 style guide.
VI. Revising Your Report
You may revise a portion of your design project if you received a
grade of 90 or less on the original report (grades are out of 100).
The maximum grade you can receive through revision is 90. Before you
submit your revision, you must set up an appointment with your writing
instructor to agree on a plan for your revision within one week of
receiving your grade on DP1. Your writing instructor will work out a
time when your revision is due, typically one week after your meeting.
VII. How we evaluate your work
Your recitation and writing instructors will assign your report a grade that reflects both the design itself and how well your report presents the design. These are the main grading criteria.
The most important aspect of your design is that we can understand how it works and that you have clearly addressed the requirements listed in Section III and provided the elements listed in Section V. Complicated designs that we cannot understand will not be graded favorably.
Some overall content considerations:
- Does your solution address the stated problem?
- How complex is your solution? Simple is better, yet sometimes simple will not do the job. On the other hand, unnecessary complexity is bad.
- Is your analysis correct and clear?
- Are your assumptions and decisions reasonable?
Some writing considerations:
- Is the report well-organized? Does it follow standard organizational conventions for technical reports? Are the grammar and language sound?
- Do you use diagrams and/or figures appropriately? Are diagrams or figures appropriately labeled, referenced, and discussed in the text?
- Does the report use the concepts, models, and terminology introduced in 6.033? If not, does it have a good reason for using different vocabulary?
- Does the report address the intended audience?
- Are references cited and used correctly?
VIII. Collaboration
This project is an individual effort. You are welcome to discuss the problem and ideas for solutions with your friends, but if you include any of their ideas in your solution you should explicitly give them credit. You must be the sole author of your report.