Posted on August 1, 2019 by Gabriel Parmer
This course covers the fundamental concepts of operating systems, focusing on resource management and abstraction. This includes OS structure, processes and thread management, communication with peripherals (I/O), synchronization, deadlocks, memory management, Virtual Machines, cloud infrastructures, and abstractions for cloud computation. The workload for this class is heavy and programming intensive.
github
account).Objectives - In completing this class, students will…
Structure - This class is broken into two main activities: lectures and lab.
Each semester, you’re expected to spend at least:
Prerequisites:
Responsibilities - Students must
You should depend on the instructional staff (including Gabe) for the class to
These can be remembered as “we’re here to make it clear what you should do, what you should know, and what to do if you don’t feel like you know what you’re supposed to do or know”. If at any point, you have concerns about any of these, of if we’re dropping the ball on any of them, please let us know and we’ll do better. There are trade-offs made in the class (for example, homeworks are not formally specified as I don’t want them to be 50 pages each), but you always encouraged to ask questions and clarify.
Above all, it is important that this class provides an environment for learning and self improvement. Gabe takes this very seriously, so if it is not adequately serving that purpose, don’t hesitate to let him know why.
Part of keeping a course “fresh” and successful is trying new things, and keeping the things that work. Given this, the experiments I’m running this year include:
Online office hours. I really want to make office hours work this year. Historically a very small minority of the class attends office hours. There are many reasons for this, but two of them are inconvenience/timing and the fact that office hours can be intimidating. Thus we’re going to try a few things:
xv6
, how to debug using gdb
, how to decipher a Makefile, how to understand and debug C, etc…Organized class clarifications. I’d like to encourage a strong feedback loop for answering questions and concerns about the material in lecture and lab. The instructional staff make mistakes and might not convey everything perfectly, and everyone at some point in the class will get confused by the material. You must take the student responsibility (above) seriously that requires you to ask questions in class if something isn’t clear. However, some of these confusions only surface after you have some time to process/absorb the material.
Thus, after each class, we will post a specific Piazza note asking for areas in which you don’t feel confident. If I forget to post this, please do so with the title “Class clarifications for y xx/xx” where y is the day of the week (Tuesday or Thursday), and xx/xx is the month/day, and use the Piazza topic lecture_clarifications
. I will look at these topics, and try and cover them as much as possible at the start of the next class. This is your opportunity to tell me what needs more coverage in the class.
In class, I’ll be calling on random students. I’m sorry.
This class requires active discussion, and for a broad cross-section of the class must be involved. Historically fewer than 10 students have fielded the majority of the questions, which isn’t a healthy dynamic, nor does it lead to a discussion that will help the majority. I completely understand that answering questions can be very stressful, and the constant possibility of being called on next can keep you on your toes. Know a few things:
So I completely sympathize that answering in-class questions can be quite challenging. Because of this, my expectations are not that you correctly answer every question. Often the most intelligent way to answer a question is with another question. So I expect that if you cannot take a shot at a question, you try and identify where the confusion is and point me in the right direction to better explain the material. However, please stay alert, and do your best.
Everyone gets a single day that they are guaranteed not to be called on. Though attendance is mandatory, if you know you aren’t going to be able pay attention (personal complications, multiple exams, no sleep the night before, you know, college 😉), then you can get a pass. You must fill out the form at least 15 minutes before class, and can only fill it out once. Double check that you’re choosing the correct day.
Topic | xv6 Book Chapters/Additional Reading1 |
Silberschatz Chapters2 |
---|---|---|
OS Fundamentals: Hardware and Structure | 0, 1, 3 and this post | 1, 2 |
Processes, IPC, Threads, Isolation | this post | 3, 4 |
Scheduling and Critical Sections | 5, 4 | 5, 6, 19 |
Mutexes, Condition Vars, Deadlock | N/A | 6, 7 |
Midterm | all of the things… | |
Memory Management and Allocation | N/A | 8 |
Memory Protection | 2 | 9 |
File Systems and Storage | 6 | 10, 11, 12, 13 |
Security | N/A | 15 |
Final | all of the things… |
You can find the lectures on Piazza after they are delivered. By default, I’ll provide these after they are fully covered in class. I’ve found that people read ahead which can harm the learning processes (as quite a bit of information is not on the slides) as it removes the problem solving processes. If this policy hinders your learning process, talk to me and we can discuss options. For your reference, all of the lecture’s pdf
s are on the following topics:
Strongly encouraged (but optional) Text:
If you have trouble understanding the xv6 source code, I recommend the following resources:
make print
)You can fork the xv6 source on github. When you do, test that you can run it with make
, and make qemu
.
In the end, you have to get comfortable walking through the code, and understanding it. Using a code indexing system will make this easier. ggtags
or ctags
for emacs does great, but there are corresponding technologies for nearly all editors.
If you’re having trouble with C, here is a list of references:
cdecl
command, or, if it isn’t installed, you can use the cdecl
webpage.It is your responsibility to quickly get up to speed in C, so please use these resources.
All of your homeworks for this class will be submitted via github
. The last commit that you make to the repo will be graded, and the lateness penalties of that commit (see “Late Policy” below) will apply. Please do not make push
es to your repo after the version you want graded. Please also not that it is necessary to issue a git pull
up to the github
repo for submission. Local commits to your own repo do not update github
, thus cannot be counted as submission.
Grades will be assigned with the following proportions:
Homeworks include smaller programming assignment, written responses, and a long final programming project.
Participation will be graded not only on attendance, but also on interactivity and involvement during class and lab.
If needs be, there will be short quizzes at the beginning of classes. These will only happen if people make it a habit of not coming to class, or coming late.
Homework Grade Modifications:
The following modifies apply to your grade on all homeworks (but not the project):
xv6
warning flags, or -Wall -Wextra -Werror
if not in xv6
) as we will make compilations fail (with -Werror
) if there are warnings. Forgetting to git add
files is not an excuse. “A small change makes it compile” is not an excuse. You should always do a fresh git clone
of your submitted repo, to make sure that it compiles and your tests work. Make sure it works in the VM using the version of the compiler we use (i.e. Ubuntu 18.04). If you don’t test in this specific environment, your code will likely fail compilation.Your grade can be minimum 0%, but your maximum grade can be higher than 100%. There is one exception:
I cannot emphasize this point enough: The only way people have failed this class is by not trying or by cheating.
Late Policy:
The only acceptable excuses for late submissions are medical, family, and related issues. For these, you will require a note (from a doctor, parent, etc…) to justify an extension. Unacceptable excuses include:
git add
necessary files, or forgetting to git push
(proper git
usage is a requirement for the class), andYou get 0 credit for late work that is unexcused. However, once the grades are released, you can resubmit your code within seven days, rounded up to the closest midnight. For example, if the grades are released at 2:30pm on the 8th, resubmissions are due at midnight on the 15th (i.e. \(8 + 7 = 15\)). You will get 0 credit for submissions after this date.
We will grade the last commit before the resubmission deadline, and you will receive 50% of the credit on top of the original grade, rounded up to the nearest percent. For example, if you got a 43% on the initial deadline, and resubmit and get a 68%, your final grade (\(g\)) will be \(g = \lceil 43 + \frac{(68 - 43)}{2} \rceil = 56\).
Do not come to rely on the resubmission process. Homeworks are due every week or every other week, so it is very harmful to get behind on them, or to add more work to your schedule. Thus, our late policy is quite strict to discourage procrastination.
Please note that in terms of academic honesty, you cannot share your code with your peers even after it is due.
Testing your implementation:
When you are required to test your code, please note that our test cases will best every edge case in the implementation. If you do not write code to test the edge-cases of the specifications, then you will lose credit. This is the largest reason why students get lower grades than they believe they deserve. If you believe that an error was made in grading, then please explicitly address why you believe your test cases are sufficient to test all cases. See the discussion about testing and debugging below.
Style and Code Craftsmanship:
Any homework graded by demo (i.e. the project) must adhere to style guidelines, and you must curate your code. It must be simple, readable, and clean. See this post For some thoughts on how to make your code more readable, see the Composite Style Guide.
Just as you can do a google search for code online, it is trivial for us to do the same. We have caught numerous people cheating in the past in this way. If you feel pressured about an assignment, please come see me instead of cheating.
You are not allowed to collaborate on the homeworks and the lab assignments. The group projects require collaboration within each group, but no collaboration between teams is permitted. Please refer to the academic integrity policy linked from the course web page. This policy will be strictly enforced. If you’re having significant trouble with an assignment, please contact me. If you have not signed the academic integrity statement for homework 0, you will not receive a passing grade for the class.
Summary: I want to be very clear:
In the past, the only way that people have failed this class is to either not try (not do homeworks), or to cheat. There is almost no reason to cheat. Come to me, and we can discuss where you’re at, and what you have to do to make it through this difficult class!
Justification: Cheating hurts your ability to compete for CS jobs. Everyone came to college to learn. You learn by doing work to reinforce class ideas. Understanding ideas from class is nowhere near as effective at learning as applying those ideas in an implementation. Thus, you need to do your own implementation to be a competitive CS student in the real-world!
Cheating hurts your peers. Courses that are graded on a curve have the side-effect that if someone inflates their grade not by hard work, but by cheating, your good grade (should the Professor not catch you) will hurt the grade of all of your colleagues that put the work in.
Cheating also hurts the quality of the OS class in the long term, thus hurts all future generations of GW CS undergrads. If you cheat, especially using previous year student’s work, to the point where the Professor believes that they must develop new assignments every year, then those assignments will not be as strong. They will not meet learning objectives in a course like this. All future CS students will be less prepared for competing for the best jobs.
Please come see Gabe to address any of your concerns. You should not wait until a problem comes to a head before meeting. Gabe can help you find a path to avoid bigger problems. Of course, the main use for office hours is to discuss the homeworks, or help prepare for exams.
If you have a particularly horrible debugging problem, then Gabe might be your best bet. Regardless, who you talk to, if you want them to look at code then you must adhere to style conventions. See style conventions below. At the most basic level, you must
If you don’t demonstrate to us that you actively tried to understand your own code by making it as easy to understand as possible, then it isn’t reasonable for us to dive into your code. Note that you really should take all of these steps when developing, debugging, and testing your own code, regardless. In the “real world” it is expected that you generate code that adheres to these guidelines.
For online office hours, you don’t need to prepare at all. Show up, and see what happens. If there aren’t questions, we’ll talk about the material, technical skills, or do code walk throughs.
There are unwritten rules about writing C code, and a set of conventions we must follow for the class. They include:
.h
files should never include global variables. This can cause linking errors when two .c
files include the same .h
file, and are linked together..h
files include function bodies (definitions), then the function must be static inline
..c
file, should be static
. Generally watch this if you have questions about visibility.#include
anything other than a .h
file. .c
files should be compiled separately, and linked together..c
and .h
files, you must expect that a client will write their own .c
file (e.g. main.c
) that contains a main
function, and link that file with your .c
library file, and include your .h
file..h
files.struct
, function, or variable, that file depends on the declaration (and type signature) of that to compile. Thus, each file must #include
a complete set of header files that include all of those declarations/prototypes. For example, if your file calls malloc
, you had better #include <stdlib.h>
. Your tests might compile without including that file explicitly (maybe because they include another file, foo.h
, that includes stdlib.h
), but this is not resilient design. If a client includes foo.h
after the prototype for malloc
is required, this will result in compiler errors. In short, you must write your code such that the order of includes in client code should never matter (i.e. never result in compilation failures).To run xv6
, you need qemu
installed. If you aren’t running Linux, then you might execute Linux in a virtual machine (e.g. using Virtual Box or VMWare Workstation), and qemu
within Linux. I believe others have gotten qemu
executing in OSX as well (but it takes installing a cross-compiled ELF-version of gcc
), so discuss on Piazza if you’re interested in that.
$ git clone https://github.com/gparmer/gwu-xv6.git xv6
$ cd xv6
$ make qemu
Simple!
Note that the xv6
book has a great overview of the design of the entire system. Please use it as a resource if you need it. Note that since you don’t own this repo, you cannot git push
to it. You have to fork it on github
to become the owner of your own fork. For this class, all assignments will be completed on repos that we provide for you.
Some hints for understanding the code-base. All code that runs in user-level is in the files that correspond to the entries of UPROGS
, and all of the header files that they include. The rest of the code executes in the kernel. Thus to add a new program, you only need to provide the .c
file, and add it in a similar manner to UPROGS
.
When you’re looking at a .c
or .h
file, always make sure you understand if it is executing in user-space (thus must make a system call to enter the kernel), or is in the kernel (thus can directly call functions within the kernel). Using a code indexing system will make it easier to walk through the source code. ggtags
or ctags
for emacs does great, but there are corresponding technologies for nearly all editors.
Some helpful xv6
tips:
When xv6
boots up, it has a set of files and programs that populate its root /
directory. You can add a new file3 (in this example, named new.txt
) into the root by modifying the Makefile
in the following ways (see the new.txt
in each):
fs.img: mkfs README new.txt $(UPROGS) ./mkfs fs.img README new.txt $(UPROGS)
PRINT = runoff.list runoff.spec README new.txt toc.hdr toc.ftr $(FILES)
EXTRA=\ mkfs.c ulib.c user.h cat.c echo.c forktest.c grep.c kill.c\ ln.c ls.c mkdir.c rm.c stressfs.c usertests.c wc.c zombie.c\ printf.c umalloc.c\ README new.txt dot-bochsrc *.pl toc.* runoff runoff1 runoff.list\ .gdbinit.tmpl gdbutil
When you add more files (e.g. test files) into the system, you may run out of blocks. You can increase the number of blocks in the system by increasing the FSSIZE
variable.
When you want to add a user-level program, you simply implement it as a .c
file, and add it, appropriately, to the list of UPROGS
in the Makefile
.
When you want create a library (a .c
file in xv6
) compiled into each program, add it, appropriately, to the list of ULIB
in the Makefile
.
git
and github
Using github
. github
provides a wonderful set of repository management features online. You can go through the github
bootcamp if you don’t know how to use it.
You can use github
to coordinate between team members in many different ways. I’d strongly recommend that when working in a team, you use the feature branch workflow pattern of interaction. At the highest-level, this means that each team member will be working on their features on separate branches, and will integrate them into the mainline to coordinate with everyone else.
A few of the ways to interact with github
:
git pull
and git push
are you main means of pulling your repo off of github
, and to update the version on github
. If you’re working in a group, these are the only ways that you synchronize your work between team members. If you want to submit an assignment, git push
is the only means to do so.git pull
when another team member has previously made changes to the repo. You have to go through your code looking for any place that git
has inserted text like “HEAD”, “>>>”, “<<<”, and “—” that indicate where you’ve made changes that conflict with the github
repo.github
on PRs! See the software engineering guide for some information on code reviews and PRs. However, it is more advanced, so make sure you understand the simpler method, above, and get some practice using it before moving on to PRs.Using git
. I’m not going to do a deep dive into using git
here, and expect that you can figure out the main features online. Some quick git tips can be found at the bottom of software engineering guide. An incomplete list of commands that you can use with git include:
git diff
, git diff --stat
, and git status
which all let you understand at different granularities what changes have been made. Before you commit code, you should likely execute all of these commands. You do this for two reasons. First, you want to make sure that only code modifications that you want in the commit, are in it. Each commit should be relatively focused, so it is important to catch this. Second, it is a good way to remind yourself what the commit contains so that you can better summarize it in the commit message. It is also useful to know that you can execute diff
on ranges of commits to see what changed (see git log
to see the list of commits).git commit -a
, obviously. Do not, do not, use the m
flag. You should write a “full form” message, not a single line. The focus of your message should be on 1. summarizing the commit in a single-line (which is displayed along with the commit in github
), and 2. an explanation of the details of the commit. The latter should explain the intention, and often includes an itemized list of the main aspects of the commit. @
s should be used to reference any issues it addresses.git branch <b>
and git co <b>
to use and manage your branches. Get used to using branches. Branches are useful for when you’re trying to debug a problem, or implement a new feature. To see how they’re useful, it is important to realize that it is not uncommon to get distracted, and have to redirect your attention to another feature or bug. Without branches (and the ability to roll-back to before your work on the feature), you end up with a commit history of half-finished features intertwined with bug fixes and other features. This makes your Pull Requests schizophrenic and difficult to review. So by default, you should always be developing on a non-mainline branch. When you go fix a bug, or work on a new feature, ask yourself if it deserves its own branch.git stash
is an acknowledgment that we mess up with our branches. If you find that you’re somewhat accidentally developing a new feature, or fixing a new bug, but have a bunch of unstaged changes on the current branch, you can stash
them, create a new branch, stash pop
the changes into the new branch. I’m sure there are other ways to do this, but this example at the very least demonstrates how stash
can be used to delay a current set of changes.git rebase
enables you to update a branch to an updated master, and to “squash” multiple commits, and clean up your commit history. When the master progresses beyond the point where your branch split from it, you generally want to fast-forward merge your changes onto the new master. git rebase master
is your friend here. It is generally superior to git merge
as it maintains a cleaner (linear) history. git rebase -i
allows you to rewrite history by combining different commits. This is very useful as it enables you to have a set of commits that tell a story, and don’t have a number of “in progress” commits. My suggestion is that you label your commits that you’ll likely want to get rid of in the future with TODO
as a header on the title line. More information about this can be found here and here.Submitting assignments. When you submit an assignment, I suggest that you follow this procedure:
git status
to make sure that you have git add
ed all of the intended files.git diff
and git diff --stat
to make sure that you’re committing what you think you are. Of note, check to see if you’re committing any printf
s that are intended for debugging, not for program output.git commit
and git push
to upload your code to github
.git clone ...
to get a fresh version of the repository, build it, and run your tests. Students who omit this step get bad grades for very avoidable reasons. Always make sure that your submission matches what you think you’re submitting.Please document in your repository (e.g. in the README.md
) which coding style you’re using. By default, the xv6
repo we use is formatted in accordance with the Composite Style Guide. See that description to understand why style is important, and why we choose the style we do.
It is important, when you’re working on a team to use a consistent style. It is more important to choose one and stick with it, than to choose the “perfect style”. You can find many different styles online, including
Though not relevant for this class, many modern languages have prescribed styles such as go, and Rust.
Style requirements for this class. For the purposes of this class, never bring code to the instructional staff that isn’t properly indented. That is the minimum bar that humans need to be able to understand code. If you bring code that isn’t indented, then we will think that you haven’t put a sincere effort into understanding your code as you haven’t invested care into making it understandable.
Debugging is a skill. Just like any other skill, practicing doing it well will make you much better at it. Most students are so rushed by deadlines (often due to procrastination) that they focus only on “making it work” rather than on refining their debugging skills to become better computer scientists. Finding a fix to any given issue is important, but you must focus on developing your long-term skills, or else you’ll find your “ceiling” of capability much sooner than you’d like.
To deliberately and continuously improve your debugging skills, I implore you to focus on the following:
They are each briefly discussed below.
When something goes wrong with your code, it is because your mental model of how everything should fit together does not match the instructions you’re providing the computer. Your mental model is build of many assumptions about what your code is doing at any point in time. One of the most important learned skills is recognizing when you’re making an assumption, and identifying what the assumptions are. Examples of these assumptions include:
sbrk
system call in my code, it expand the heap and allows me to use more memory.xv6
, the first code in the kernel properly accesses the system call’s arguments.Most your debugging time should be going to coming up with creative ways to check your assumptions. When something goes wrong, (at least) one of your assumptions is wrong. Your job is to figure out which one(s). This changes the emphasis of debugging from “why so wrong!?!?” to “which specific part of my understanding of this is incorrect?”. This is how you change debugging from what feels like a random process, to something that is methodical, deliberate, and, most importantly, productive.
It is very common for people to get stuck with no obvious path forward. The program is simply not doing what they think it should, and it makes no sense! If you’re at this point, then you’ve already made too many assumptions (at least one of them being wrong), and have built your entire mental model upon them. The solution is to go back, and start testing not your code, but the assumptions made throughout the lines of your code. Where is your understanding of your code going wrong?
The scientific method is based around formulation hypothesis about how some phenomenon will behave, designing tests for checking the hypothesis, running the experiments/tests, and either confirming the hypothesis, or starting the entire process again. An important aspect of this process is that it is focused on knowledge creation. Deriving a hypothesis takes a deliberate process that focuses you to apply your brain, and think about the workings of the entire system. If it is confirmed, you often know not only that the hypothesis is true, but also why. If the hypothesis is wrong, you learn that there is a bug in its formulation, and you might go back to more foundational tests to figure out how and why it is wrong. This enables you to understand the system better, and made a better hypothesis next time around.
You should treat your assumptions about your own code with the same care. First, acknowledge that they are assumptions – never just assume that a part of your code works without testing – and then treat the assumptions as hypothesis about the code that need to be tested. Most importantly treat your assumptions as something that you refine through intentional testing. It is through this process that you learn about how your code works.
When you sit down with the goal of solving some programming task, it is natural to try and jump in and start developing. This is a fine approach if the problem that you’re trying to solve is below some complexity threshold. If it is a very comfortable task for you, then just “diving in” can be a perfectly viable approach. As you get more and more skilled as a programmer, this threshold will increase. However, as you get more and more skilled, you’re responsibilities will likely increase faster than this threshold. Thus, it is necessary to learn to be productive on complex projects that you cannot easily just “sit down and implement”.
It is necessary to do work that is beyond this threshold for your own personal development. Much of the work in this class is valuable because it is, by design, too complex to implement productively without a plan. This forces you to practice the act of planning out your approach. I help with this in the homeworks by providing a set of “levels” that need to be completed. You must complete the levels one at a time as they provide a path to incrementally add complexity only after you’ve learned about the problem from the previous levels.
Your development plan is simply a sequence of implementation goals that you implement, and test one at a time. You don’t need to have a complete plan mapping from the start of the implementation to the end, but you do want the first N steps so that you have a plan. This allows you to implement you code in a way that has a little bit more of a long-term design behind it.
When we write code, we often don’t understand the problem domain well enough to provide an ideal implementation. The effect of this is that our code is far too complex. As we learn about different aspects of the problem domain – “ohhh, I have to worry about i > N
as well”, “yikes, what if the head is NULL
?”, etc.. – we add more and more conditions and loops. Keep this in mind: at any point in your code look at the level of indentation; that is a good proxy for the number of things you need to keep in your head to understand a given line of code. Each condition and loop encodes variables and changes that you need to remember to understand the code in their bodies. Given that human short-term memory holds only 7$$2, code simply becomes too complex to keep in your head.
It is necessary to revisit and rewrite your code when it becomes to complex. Debugging code that is too difficult to understand and too challenging to even formulate assumptions about will take orders of magnitude more time to debug than to rewrite your code with a simpler structure and debug that. I very frequently find that I cannot make sense about how to modify my code to achieve some goal, and the only anecdote that I know is rewriting the code to constrain and abstract away complexity. You’d be amazing how much a few well-named functions containing well-abstracted code can make your code easier to understand. Additionally, the act of trying to simplify your code makes your code easier to debug and forces you to understand the fundamental problems, which also makes debugging (and assumption formulation) easier. In short, real-world programming involves a lot of code rewriting. For much more information about how to write C code with an eye toward constraining the complexity, see the composite style guide.
Untested code must be assumed wrong, simply put. You can never make the assumption that any part of your code works that you haven’t tested, no matter how simple. “It should work” is something that the instructional staff will never believe, and should be an indication of hard debugging ahead if you find yourself uttering it. “It should work” is a huge assumption with a builtin recognition that you aren’t going to test the assumption. Put another way, it is a recognition that you’re about to waste a lot of your own time. So I want to make it clear that you must test all parts of your own code. If your grade is lower because a branch doesn’t quite do what it is supposed to, “but it is just a small error” not only is completely incorrect, but demonstrates a complete lack of understanding of the fundamental importance and purpose of testing.
The job of a library, an operating system, or a piece of utility code is to make the programmer able to perform some operation, or to make the operation easier. In such situations, you have to assume that programmers don’t know what they are doing, and that, to your greatest ability, you’ll catch their errors and return a useful indication of where they went wrong. In the case of an OS, you have to check that all parameters/inputs are valid as if they are not, they might cause a bug that could crash the entire system, or allow the user to take over the system. This is accomplished in a similar way in every language, on every platform: write the code to explicitly check the validity of arguments, write the code to check for edge cases, and return the proper result. In the case (as in this class) where you’re given the specification for each operation, you simply need to ensure that you encode all of these erroneous inputs into conditions, and return the proper value.
Next, since code that isn’t tested doesn’t work, you have to write tests to make sure that all inputs result in the correct output. Properly testing your code is simple: codify your assumptions about how your implementation behaves into code. This absolutely means that you’ll spend time writing your testing code to generate all the edge cases, but this is absolutely necessary.
Most submissions that lose points, lose them because:
There is almost no reason to lose points for 1. and 2. Testing is an essential part of being a developer. Thus, the instructional staff will not be lenient if you don’t do proper testing.