There are three deliverables for Design Project 1:
All deliverables should be submitted via the online submission site. As with real-life system designs, 6.033 design projects are under-specified, and it is your job to complete the specification in a sensible way given the overall requirements of the project. As with designs in practice, the specifications often need some adjustment as the design is fleshed out. We recommend that you start early so that you can evolve your design over time. A good design is likely to take more than just a few days to put together.
Late submission grading policy: If you submit your DP1 report late, we will penalize you one letter grade per 48 hours, starting from 5pm on the submission day. For example, if you submit the report anywhere from 1 minute to 48 hours late, and your report would have otherwise received a grade of "A", you will receive a "B"; if you submitted 49 hours late, you would receive a "C".
Your job is to design a concept similar to Smart Folders on MacOS or Search Folders on Windows into the UNIX file system. The goal is for the user to be able to create special "virtual" directories that contain the results of a file search query as symbolic links to the original files (see Section 2.5/Page 104 of the textbook for a discussion of symbolic links). The main challenge is to do this in a manner that maintains performance of both the virtual directories that contain the search results and the directories that are being searched.
Note that symbolic links are different from the "hard" links described in the UNIX paper; unlike hard links, symbolic links simply encode the path of the file they point to. The file may or may not actually exist (so symbolic links are not removed if the file they point to is removed, and they don't modify the link count in a file's i-node). Symbolic links may also point to files that are on volumes that are not currently mounted.Your search folders interface will take the form of a command-line tool with the following usage:
searchmount mountpoint searchpath [expression]
searchmount -u mountpoint
The first command creates a new directory mountpoint
and makes the new directory appear to contain symbolic links to files under searchpath
matching the search query given in expression
. The second command removes the mountpoint
directory.
The expression
given to the first command is in a format similar to the one accepted by the UNIX find
command. Several variations of the find
command exist depending on whether you are using Linux, BSD, MacOS etc., but your system must support at least the following kinds of expressions:
find
).Note that the directory to be searched should be searched recursively, that is, the search should include files both in the searched directory and in its subdirectories, and so on.
As an example, consider a directory named /Users/jsmith/Research
, containing many files in various subdirectories. The user can find files larger than 10 megabytes contained within this directory using the traditional find
command:
$ find /Users/jsmith/Research -size +10M /Users/jsmith/Research/References/In Spreadsheet/quan_haystackthesis.pdf /Users/jsmith/Research/References/Other external/Example tailor-made/Artifax/eventconcerthallsmall.wmv /Users/jsmith/Research/Work/130924 InfoVis/Slides V2.pptx /Users/jsmith/Research/Work/130924 InfoVis/Slides V3.pptx /Users/jsmith/Research/Work/130924 InfoVis/Slides.ppt
Using the searchmount
command, the user can use a similar syntax to create a directory /Users/jsmith/BigResearchFiles
that always appears to contain symbolic links to files in the /Users/jsmith/Research
directory that are larger than 10 megabytes:
$ searchmount /Users/jsmith/BigResearchFiles /Users/jsmith/Research -size +10M $ ls -l /Users/jsmith/BigResearchFiles Slides V2.pptx -> /Users/jsmith/Research/Work/130924 InfoVis/Slides V2.pptx Slides V3.pptx -> /Users/jsmith/Research/Work/130924 InfoVis/Slides V3.pptx Slides.ppt -> /Users/jsmith/Research/Work/130924 InfoVis/Slides.ppt eventconcerthallsmall.wmv -> /Users/jsmith/Research/References/Other external/Example tailor-made/Artifax/eventconcerthallsmall.wmv quan_haystackthesis.pdf -> /Users/jsmith/Research/References/In Spreadsheet/quan_haystackthesis.pdf
From this point on, as files are modified, added, or removed in the original /Users/jsmith/Research
directory, the apparent contents of /Users/jsmith/BigResearchFiles
should be updated to reflect the results of the search query.
Note the search directories do not contain other directories, only files.
The search folders system will be implemented as a "removable file system" of the kind mentioned in the Section 3.4 of the UNIX paper.
We will now give more details about the specific interfaces you will need to implement for the purposes of this exercise. After the user creates a search directory with searchmount
, the operating system will invoke a collection of methods that you must design to create and read the contents of your directory. These methods are as follows:
DirEntryPointer getFirstDirectoryEntry(name)
:
Get a pointer to the first entry in the directory with the given name
, or null if there are no files in the directory. Called by the operating system when it wants to begin iterating over the files in your virtual file system. Note that there is only one top-level directory in your virtual search file system. So suppose your virtual file system is mounted on /mnt/bigfiles
and this directory contains the files song1.mp3
, song2.mp3
, and song3.mp3
, then getFirstDirectoryEntry("/mnt/bigfiles")
will return a DirEntryPointer
representing the directory entry for song1.mp3
.
DirEntryPointer readNextDirectoryEntry(DirEntryPointer previousFile)
:
Get a pointer to the next directory entry after previousFile
in the directory containing said file, or null if there are no more files in the directory. To iterate over all the files in the directory, the operating system first calls getFirstDirectoryEntry
to get the first directory entry and then calls readNextDirectoryEntry
successively with the previous DirEntryPointer
to retrieve additional entries. Continuing the example above, readNextDirectoryEntry(getFirstDirectoryEntry("/mnt/bigfiles"))
would return a DirEntryPointer
representing the directory entry for song2.mp3
.
String fileName(DirEntryPointer entry)
:
Return the name of the file represented by directory entry entry
. Called when the operating system wants to get the name to display for a file listed with getFirstDirectoryEntry
or readNextDirectoryEntry
. Continuing our example, fileName(getFirstDirectoryEntry("/mnt/bigfiles"))
would return the string "song1.mp3"
.
String readSymbolicLink(DirEntryPointer entry)
:
Return the target of a symbolic link stored in the directory entry pointed to by entry
. Called when the operating system wants to get the path of the file that directory entry entry
on your virtual file system points to. So readSymbolicLink(getFirstDirectoryEntry("/mnt/bigfiles"))
might return a string like "/home/john/Audio/song1.mp3"
.
The operating system will not look at the contents of the DirEntryPointer
structure that you return from getFirstDirectoryEntry
or readNextDirectoryEntry
. However, you should make clear what you need to store in DirEntryPointer
yourself in order to implement the methods above.
Note that the methods above are different than those described in Section 2.5 of the textbook in that they explicitly differentiate between directories and files, whereas the methods in the textbook specify that directories are nothing more than files with a particular structure. The advantage of the above design is that it allows you to design your file system without requiring the creation of inodes that contain the names of linked files.
Your design paper should clearly describe your design for these methods; this includes any on-disk data structures, in-memory data structures, and operations that are done at specific times in order to support the required operations.
In addition, you should describe what happens when the searchmount
utility is initially run; specifically, describe the behavior of the createDirectory
method, which is part of the searchmount
utility:
createDirectory(name, searchPath, expression)
: Called when the user initially mounts the file system name
for the specified search path with the specified search expression. You can assume that this method can call into the operating system to create a new, removable file system that exposes the four methods described above to search its contents.
In reality, implementing a virtual file system requires you to implement many other methods (e.g. to list permissions for directory entries, to attempt to create new files, etc.). On Linux, a virtual file system like this would be most easily implemented using a the FUSE interface. You are not required to specify a complete set of file system functions like those defined by FUSE, only the methods we have listed above. You also do not need to describe the implementation of the searchmount
utility beyond the functioning of the createDirectory
method.
Your design should consider the following use cases:
searchmount
, e.g., creating a search folder to search a never-before-searched target directory and immediately examining the resulting files, when the target directory contains a modest number of files (say, 300).searchmount
to search the system's root directory ("/"), which may contain a very large number of files (say, 100,000).createDirectory
command should not take a long time to complete. It may also be desirable to avoid redoing a whole search at every reboot.You should analyze your solution these use cases. Explain both the performance of access to the search folder itself and the performance impact on the rest of the system, especially the searched target directories. Keep in mind:
Your design does not need to deal with failures (e.g. power outages or disk failures).
Some issues to consider include:
If you wish, you can assume the operating system provides a way to listen for changes to files in a directory, such as inotify.
Additionally, your implementation may make use of a simple persistent database with the following API:
DBPointer openDatabase(String filename)
: opens a database at the specified location on disk; database is created if it doesn’t exist.put(DBPointer p, String key, String value)
: stores the specified key/value in the specified database. If the key already exists, the previous value is overwritten.String get(DBPointer p, String key)
: returns the value of the specified key in the database.remove(DBPointer p, String key)
: deletes the specified key from the databaseYou may create as many databases as you wish, and you can assume that the database is stored persistently on disk (e.g., survives across restarts and reboots, and does not become corrupted.) In addition, you can assume that the put and get commands are relatively efficient (e.g., the running time of a single get or put is small and does not grow significantly with the size of the database, even for databases with millions of records.)
The design proposal should be a concise summary (800 words) of your overall system design.
The core of the proposal should be the design description. This section must include at least one graphic, correctly formatted with a caption and brief description.
You do not have to present a detailed rationale or analysis in your proposal. However, if any of your design decisions are unusual (particularly creative, experimental, or risky) or if you deviate from the requirements, you should explain and justify those decisions in your proposal.
You will receive feedback on your proposal from your TA in time to adjust your final report. You will also receive a writing program grade for your proposal, as well as feedback from your writing instructor that will help you improve the writing of the final report. Your writing instructor will evaluate the proposal according to the CI guidelines.
Some writing considerations for the proposal (and report):
Here are a few tips:
Your report should explain your design. It should discuss the major design decisions and tradeoffs you made, and justify your choices. It should discuss any limitations of which you are aware. You should assume that your report is being read by someone who has read this assignment, but has not thought carefully about this particular design problem. Give enough detail that your project can be turned over successfully to an implementation team. Your report should convince the reader that your design satisfies the requirements in the "Requirements" section above.
Use this organization for your report:
The writing suggestions for the proposal also apply to the report. You can find previous examples of DP1 reports under the Excellent Writing Examples link on the left-hand side of the course home page.
Your recitation and writing instructors will assign your report a grade that reflects both the design itself and how well your report presents the design. The most important aspect of your design is that we can understand how it works and that you have clearly addressed the requirements and provided the elements listed in the "Requirements" and "Design Report" sections above. Complicated designs that we cannot understand will not be graded favorably.
Some overall content considerations:
The grading rubric for the final report is as follows:
Overall design
|
30 |
The degree to which the design addresses the requirements and use cases
|
20 |
Analysis of space and time requirements
|
20 |
User experience
|
15 |
Quality of the figures that illustrate the design |
5 |
Overall presentation |
10 |
The items in the grading rubric are not independent: a design that we cannot understand will likely result in a low score for several items.
85 and above is an A grade. Between 60 and 85 is a B grade. You will have to hand in a design project to pass 6.033.
This project is an individual effort. You are welcome to discuss the problem and ideas for solutions with your friends, but if you include any of their ideas in your solution you should explicitly give them credit. You must be the sole author of your report.