CS 5631 -- Operating Systems (Spring 2019)
Course Description
Operating system as resource manager. Processor management and scheduling, deadlocks, concurrency, memory management, and protection and security as applied in modern operating systems. Concepts are illustrated via laboratory assignments which heavily emphasize concurrency.
Prereq: 2511 and 2521, or instructor consent; a grade of C- or better is required in all prerequisite courses.
The appendix to this syllabus contains a topic list; the course schedule is online.
Student Learning Outcomes (SLOs)
This course supports the following UMD SLOs:
- Demonstrate competence in a major field.
- Construct, integrate, and apply knowledge from instruction and experience.
- Think critically and creatively in seeking solutions to practical and theoretical problems.
Instructor
Dr. Peter A. H. Peterson
Email: pahp@d.umn.edu
Ph: 218.726.7988
Office: Heller Hall 329
Office Hours: MW 11:00-11:50 AM, F 12:00-12:50 PM (for other times, see my homepage)
Teaching Assistant
Prateek Joshi
Email: joshi275@d.umn.edu
Office Hours:
Monday 9:00 - 9:50 am, 11:00 -11:50 am
Wednesday 2:00 - 3:00 pm
Location: HH 314
Meeting Times and Locations
|         | Day | Time      | Location |
|---------|-----|-----------|----------|
| Lecture | MWF | 1:00-1:50 | CHEM 150 |
| Lab I   | M   | 7:00-7:50 | MWAH 177 |
| Lab II  | M   | 8:00-8:50 | MWAH 177 |
Important note: lab attendance is mandatory. Bring headphones so that you can listen to videos privately.
Moodle Site and Google Group
We will use Moodle to assign reading and homework and for submissions, grading and other class-related activities.
We will use Google Groups for critical course announcements.
We will use Slack for course discussion, exam reviews, etc.
Pre-requisites
- CS2511 -- Software Analysis and Design
- CS2521 -- Computer Organization and Architecture
In exceptional circumstances, prerequisites may be waived with the instructor's consent.
Readings
Textbook
Operating Systems: Three Easy Pieces (a.k.a, OSTEP)
Remzi and Andrea Arpaci-Dusseau
OSTEP is freely available online as PDF chapters. However, for a fraction of what you'd pay for a typical textbook, you can also order it in hardcover, in softcover, or as a single, unified PDF.
I'm really excited to use OSTEP for this class. Not only is it free (as in beer, not as in speech), but it also happens to be the best textbook I could find! It is a well-written, easy to read, fun and organized tour of Operating Systems. It's also divided into roughly one section per class lecture. As a result, we are going to more-or-less read the entire textbook this semester! "Wait," I hear you saying, "the book is basically 600 pages long! How are we going to read the whole thing?" Well, there are around 40 lectures per semester (usually a few more), and 600 divided by 40 is 15. This means you only need to read about 15 pages per class period. Totally manageable! Since we don't have class every day, that's less than 8 pages per day. And think of what an OS whiz you're going to become by the end!
Homework and Laboratory Projects
Forty years ago, students learning Operating Systems would study the actual source code of the original UNIX along with commentary by a professor, John Lions. Eventually, the copyright holder of UNIX made this illegal, inspiring many people to photocopy the sources and discuss them in secret. "The Lions Book" became the closest thing OS geeks had to a sacred text. This change also inspired the creation of other UNIX-like operating systems, including a famous one designed for education, MINIX. MINIX wasn't truly free (as in speech), which inspired a Finnish Computer Science student named Linus Torvalds to write his own UNIX for the i386 CPU. He named it -- wait for it -- Linux.
Studying the original UNIX sources is now legal again, but the code is so old that it is not very useful for a hands-on course (the version of C it uses is obsolete and it doesn't include parallelism, for example). Fortunately, some folks at MIT developed the xv6 Operating System, which is a modern take on a bare-bones UNIX for OS classes like ours. xv6 includes sources and exercises, just like a modern Lions Book would. We'll use xv6 (and other software) for our labs and homeworks.
While lab sessions will meet in a UMD classroom, we'll SSH into compute servers operated by the UMD CS department to run our code. This means that you'll be able to work on your homework from pretty much anywhere and won't need to install much, if any, extra software on your computer besides the PuTTY SSH client (if you run Windows). Linux and OS X users won't need to install an SSH client or terminal emulator, since their systems already include them.
Miscellaneous
Other readings (paper handouts or online resources) may be assigned and used during class. We will send an email to the class when this occurs.
Assignments and Grading
Moodle will be used to manage multiple aspects of the course, including homework submission and grading. We will use Google Groups for announcements, and Slack for discussions. (See the links at top of syllabus.)
Grading
Grading is broken down as follows:
- Homework/Quizzes/Lab Attendance (15%) -- Short assignments to test your understanding and prepare you for the midterms, final and lab projects. Some homework/quizzes will require you to use a computer; some won't. Weekly in-class quizzes will incentivize reading assigned material before class. Attendance at a scheduled lab section (at least one hour per week) is also required. (See "Lab Attendance", below.)
- Midterm I (10%) -- essay and multiple choice questions covering Virtualization
- Midterm II (10%) -- essay and multiple choice questions covering Concurrency
- Final (25%) -- cumulative written exam in the style of the midterms, focusing on Persistence but covering material from the entire semester
- Labs (20%) -- smaller and shorter hands-on tasks intended to give you real experience working with and programming operating systems
- Projects (20%) -- three larger hands-on projects serving as capstones to each of the "three easy pieces" of OS
Extra Credit (3-6%) -- There will be extra credit opportunities throughout the semester (providing 3-6% extra points) composed of a mix of moderately difficult questions/projects and much more difficult "challenge problems" to encourage you to dig deeper. Extra credit points will be added to your earned percentage of required deliverables. For example, if you have an 89% based on all required tasks and you also earned 3% of extra credit, your final grade percentage would be 92%.
Lab Attendance
A new or ongoing lab/project will be assigned most weeks during the scheduled lab section along with accompanying videos or documentation. Sometimes lab will include a live demo or discussion. Often, it will be unstructured so you can watch the video, read documentation, start work, or ask your TA or instructor questions. Regardless, lab attendance is mandatory to encourage you to start work early and ask questions of your peers and TA.
Final Grades
Final grades will be assigned as follows:
- 90% guarantees at least an A-
- 80% guarantees at least a B-
- 70% guarantees at least a C-
- 60% guarantees at least a D
We will make every effort to post grades to Moodle in a timely fashion.
Final Exam
This course includes two midterms and a written final; times and locations are:
- Midterm I: TBD
- Midterm II: TBD
- Final: Monday, May 6th at 12:00 AM CHEM 150
The Midterms and Final will be composed of multiple choice questions, simple and open-ended short answer questions, questions about papers or other materials, etc., and will be of the general character of the homework questions. The final will be cumulative but will skew towards the material presented after the last midterm.
Hardware and Software
A Note on Programming, Networking and Unix Proficiency
While theory and intellectual knowledge (i.e., "book larnin'") about Operating Systems are essential, being able to use that knowledge effectively in the real world is just as (if not more) important. As a result, in addition to reading and written work, this Operating Systems class includes a significant amount of hands-on coding, debugging, experimentation, etc., on real systems. Required projects will involve programming and debugging primarily in C and Python, but potentially other languages. You do not need to be a coding wizard to succeed, and no specific expertise with these languages is expected. However, basic programming literacy, proficiency in at least one language such as Python, Java or C/C++/C#, and the understanding that all computer languages are fundamentally similar are critical. Experience working at the Linux/Unix command line will be helpful but is not strictly necessary.
However, the most critical prerequisite is a willingness to learn, experiment, push yourself and do things.
We will be performing our work primarily on compute servers operated by the CS department. We will access these computers through the SSH command line interface. This means that you can perform most of your work on your projects on campus or at home and should only rarely need to install any special software on your personal computer.
Class Policies
Calling on Students in Class
From time to time, I call on students randomly in class -- usually using a die. This is to encourage engagement, which I believe will improve your learning and ultimately, your grade. However, if you do not wish to be called on in class, email me and request that your name be taken off my roster. This will not affect your grade as I do not grade answers or include "participation" as a graded element.
Attendance and Excused Absences
Class will include discussion and quizzes in addition to lectures on assigned reading. Students are expected to attend all scheduled class meetings. It is the responsibility of students to plan their schedules to avoid excessive conflict with course requirements. However, there are legitimate and verifiable circumstances that lead to excused student absence from the classroom. These are subpoenas, jury duty, military duty, religious observances, illness, bereavement for immediate family, and NCAA varsity intercollegiate athletics. For complete information, please see: http://www.d.umn.edu/vcaa/ExcusedAbsence.html
If you miss class for whatever reason, it is your responsibility to obtain the information covered in class from a classmate, instructor or TA.
Lab attendance is counted and included in the final grade; each attendance is considered a homework assignment.
Late Work
Late work will not be accepted (except for certain extra credit problems, or in cases of health issues or true emergencies) because we may do "post-mortems" on the projects in class and interactively discuss the answers. Partial credit will be given when appropriate! Therefore, turn in your work, even if it is not completely finished.
Missed Exams and Quizzes
Exams and quizzes will not be given early, and make-up quizzes will not be given (see "Late Work," above). Make-up exams will be provided only in extreme emergencies (and with the instructor's consent).
Incompletes
I will not give incompletes except for very extreme circumstances (e.g., a major health crisis accompanied by a doctor's note). The last day to turn in extra credit tasks is the last day of Finals Week.
Academic Integrity and Conduct
Academic dishonesty tarnishes UMD's reputation and discredits the accomplishments of students. Academic dishonesty is regarded as a serious offense by all members of the academic community. UMD's Student Academic Integrity Policy can be found at: http://www.d.umn.edu/vcaa/StudentAcademicIntegrity.html
Teaching & Learning: Instructor and Student Responsibilities:
UMD is committed to providing a positive, safe, and inclusive place for all who study and work here. Instructors and students have a mutual responsibility to ensure that the environment in all of these settings supports teaching and learning, is respectful of the rights and freedoms of all members, and promotes a civil and open exchange of ideas. To reference the full policy please see: http://www.d.umn.edu/vcaa/TeachingLearning.html
Student Conduct Code
Appropriate classroom conduct promotes an environment of academic achievement and integrity. Disruptive classroom behavior that substantially or repeatedly interrupts either the instructor's ability to teach, or student learning, is prohibited. Disruptive behavior includes inappropriate use of technology in the classroom. Examples include ringing cell phones, text-messaging, watching videos, playing computer games, email, or surfing the Internet on your computer instead of note-taking or other instructor-sanctioned activities.
Students are expected to adhere to Board of Regents Policy: http://www.d.umn.edu/vcaa/documents/Student_Conduct_Code.pdf
Cheating and Code Reuse
I will not tolerate plagiarism.
Not sure what constitutes plagiarism? Dr. Ted Pedersen of the UMD CS department has written a nice case study on the subject.
Solo (non-group) project assignments must be your own work. You may discuss general, high-level, or conceptual issues with other students, but should not share actual code or answers with others. Cheating is considered to be sharing code either by copying, retyping, looking at, or supplying a copy of a file, and applies to information from both current and previous versions of this class (i.e., looking at answers from a previous semester is considered cheating). For group projects, these rules apply between groups instead of individuals.
Sometimes, students feel compelled to cheat on homework because they are afraid of admitting that they do not understand the material or do not know how to complete some task or overcome some technical hurdle. Nobody understands everything -- you should never be afraid of asking questions you have made a reasonable effort to answer. If you are struggling with any material in the class, please come talk to the TA or the instructor early enough to get the help you need -- that is the reason we are here.
While getting answers from current or previous students is considered cheating, in this class it is acceptable to find and use existing code snippets, libraries, tutorials, HOWTOs, Stack Exchange information and other similar resources, provided that:
- the information is not explicitly an answer to an assigned programming/design problem
- the information used is from a legitimate general information source (i.e., not a cheating website)
- you cite the resource used.
Please note that this policy may not apply to other classes at UMD (or elsewhere). It makes sense in this course because, rather than demonstrating your understanding by designing and programming discrete, standalone solutions, most projects involve solving large, system-level problems using a synthesis of many smaller solutions (some original and some found elsewhere). In many cases, we have intentionally left critical information out of course materials explicitly so that you will need to go online to find resources with the answers.
That said, it is up to you to ensure that any source you use is sufficiently attributed; this should -- at the very least -- include a comment (in your source code or writeup) identifying:
- A description of the information used, including enough information to identify the scope of the cited material (i.e., is it a single line of code? an entire script? etc.)
- The author's name or nickname if available
- A URL referencing the source of the information
- The date that the information was downloaded
- A note in the lab writeup (if applicable)
In the case of libraries or programs provided by us for the class (e.g., tcpdump or ettercap) or widely available pre-packaged applications (such as tools available in the standard Ubuntu distribution), it is sufficient to refer to the software by name. For example, "I installed the chaosreader package from the Ubuntu repository and used it to extract data from the network trace" or "I got this command line from the tcpdump manpage."
It is also your responsibility to understand any material you use -- its purpose or functionality may be included in later assignments or tests.
Finally, copying or reusing the answers of another student and passing them off as your own will result in serious consequences. See http://www.d.umn.edu/vcaa/documents/Student_Conduct_Code.pdf for more information.
If you have any questions about this policy or how to make proper attribution, please contact your instructor/TA.
Use of Class Notes
Taking notes is a means of recording information but more importantly of personally absorbing and integrating the educational experience. However, broadly disseminating class notes beyond the classroom community or accepting compensation for taking and distributing classroom notes undermines instructor interests in their intellectual work product while not substantially furthering instructor and student interests in effective learning.
In particular:
Students may not distribute, via the Internet or other means, lecture notes or instructor-provided materials, except to other members of the same class or with the express written consent of the instructor.
This includes the solutions to homework, quizzes, exams and course projects.
For additional information, please see: http://www.d.umn.edu/vcaa/ClassNotesAppropriateUseof.html
Inclusion and Accessibility
Equal Opportunity
As instructor, I shall make every attempt to treat all students equally, without regard to race, religion, color, sex, handicap, age, veteran status, or sexual orientation. Furthermore, I will not tolerate behavior that excludes or marginalizes anyone. I strongly encourage you to talk to me if you have any concerns regarding equal opportunity in the classroom. To inquire further about the University's policy on equal opportunity, contact the Office of Equal Opportunity (6827), 269-273 DAdB.
Students with Disabilities
It is my policy, and the policy and practice of the University of Minnesota Duluth to create inclusive learning environments for all students, including students with disabilities. If there are aspects of this course that result in barriers to your inclusion or your ability to meet course requirements -- such as time limited exams, inaccessible web content, or the use of non-captioned videos -- please notify the instructor as soon as possible. You are also encouraged to contact the Office of Disability Resources to discuss and arrange reasonable accommodations.
Please call 218-726-6130 or visit the DR website at http://www.d.umn.edu/access for more information.
Mental Health Statement
As a student you may experience a range of issues that can cause barriers to learning, such as strained relationships, increased anxiety, alcohol/drug problems, feeling down, difficulty concentrating and/or lack of motivation. These mental health concerns or stressful events may lead to diminished academic performance or reduce a student's ability to participate in daily activities. University of Minnesota services are available to assist you with addressing these and other concerns you may be experiencing. You can learn more about the broad range of confidential mental health services available on campus via the UMD Health Service Counseling website at http://www.d.umn.edu/hlthserv/counseling/
If you think these services might help you, I urge you to take advantage of them as soon as possible.
Appendix: Topic Lists
Virtualization
Chapter 2: Introduction
- processor instruction fetch
- processor instruction decode
- Von Neumann model of computing
- operating system (OS)
- virtualization as a concept
- virtualization as an OS principle (why, how)
- standard library
- OS as resource manager
- CPU
- difference between CPU registers and RAM
- virtualization illusion (e.g., virtualizing CPU, virtualizing memory)
- concept of an OS policy (e.g., scheduling, page replacement)
- physical memory as an array of bytes that can be read or written
- address spaces (virtual and physical)
- batch job processing
- hardware privilege level (e.g., user mode vs. kernel mode)
- system calls (and how they are called via traps and a trap handler)
- concept of CPU interrupts, exceptions and exception handlers
- multiprogramming
- memory protection
Chapter 4: The Process
- process and difference between process and program
- time sharing of a CPU
- concept of hardware mechanisms that support special OS behaviors (e.g., TLB, privilege levels, etc.)
- context switch and machine state (e.g., register context)
- instruction pointer (IP) or program counter (PC) [same thing, different terms]
- stack pointer and frame pointer
- process: create, destroy, run, wait, status
- process states: running, ready, blocked
- scheduled vs. descheduled
- blocked process
- how other processes can be run when one process is blocked
- memory segments: code, heap, stack (and what each is used for)
- process list
- concept of scheduling policies
- concept of CPU utilization percentage (percentage of time period the CPU was in use)
Chapter 5: Process API
- parent, child and zombie processes
- waiting for child processes
- fork(), wait() and exec() uses and differences
- return values of fork() and exec()
- process identifier (PID)
- concept of deterministic vs. probabilistic (non-deterministic) behavior
- standard input, output and error
- output redirection using >
- piping using |
- concept of process signals
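The fork()/exec()/wait() trio above can be sketched in a few lines of Python (whose os module wraps the same UNIX system calls); this is just an illustration, not a substitute for the C examples in the chapter:

```python
import os
import sys

pid = os.fork()                # one process goes in, two come out
if pid == 0:
    # Child: fork() returned 0. Replace our image with a new program.
    os.execvp("echo", ["echo", "hello from the child"])
    sys.exit(1)                # reached only if exec failed
else:
    # Parent: fork() returned the child's PID; wait for it to finish.
    _, status = os.waitpid(pid, 0)
```

Note that fork()'s return value is the only thing distinguishing parent from child.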
Chapter 6: Limited Direct Execution
- time sharing the CPU (even a single-core CPU)
- concept of limited direct execution (baby-proofing the CPU)
- privilege level and user mode vs. kernel mode
- trap, return from trap, trap table
- system call
- yield system call
- scheduler and context switch
- cooperative multitasking (concept, when context switch occurs, pro and con, etc.)
- preemptive scheduling, timer interrupt and interrupt handler
- disabling interrupts (why would an OS ever do this?)
Chapter 7: Scheduling Introduction
- scheduling policies or disciplines
- process workload (re: scheduling)
- scheduling metrics: turnaround time
- First In, First Out (FIFO) a.k.a. First Come, First Served (FCFS) [scheduling policy]
- convoy effect
- Shortest Job First (SJF) [scheduling policy]
- notion of an optimal solution to a problem
- Shortest Time to Completion First (STCF) [scheduling policy]
- scheduling metrics: response time (and averages)
- Round Robin (RR) [scheduling policy]
- time slice (i.e., scheduling quantum)
- notion of amortization
- notion of fair scheduling
- notion of trade-offs in scheduling (and OS in general)
- notion of overlapping operations (i.e., performing computation during I/O)
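To make the turnaround-time metric concrete, here is a small Python sketch of average turnaround under FIFO, using the chapter's convoy-effect numbers (one 100-second job ahead of two 10-second jobs, all arriving at t=0):

```python
def avg_turnaround_fifo(run_times):
    """Average turnaround time under FIFO with all jobs arriving at t=0.
    Turnaround = completion time - arrival time."""
    clock, total = 0, 0
    for rt in run_times:
        clock += rt            # this job completes at the running sum
        total += clock
    return total / len(run_times)

fifo = avg_turnaround_fifo([100, 10, 10])   # convoy effect: 110.0
sjf = avg_turnaround_fifo([10, 10, 100])    # SJF = FIFO on sorted jobs: 50.0
```

Sorting shortest-first (i.e., SJF) cuts the average from 110 to 50 seconds -- the convoy effect in one line.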
Chapter 8: Scheduling: The Multi-Level Feedback Queue
- Multi-level Feedback Queue (MLFQ) [scheduling policy]
- process starvation
- notion of "gaming the scheduler"
- priority boosting (with respect to [wrt] MLFQ)
- voo-doo constants
- queue-level accounting [process stays at level until its time is up -- anti-gaming strategy]
- notion of parameterizing a scheduler
Chapter 13: The Abstraction: Address Spaces
- notion of process interactivity
- memory protection
- address space
- segments: code, heap, and stack
- virtualizing memory, virtual memory (VM)
- virtual address, virtual address space
- VM transparency
- efficiency
Chapter 14: Interlude: Memory API
- stack memory, "automatic allocations"
- heap memory, "dynamic allocations"
- malloc() call and API
- notion of a type cast
- free() call and API
- automatic memory management, garbage collection
- segmentation fault
- buffer overflow
- uninitialized read
- memory leak
- use after free
- double free
- invalid free
- why memory doesn't leak after process exit
- purpose of break system call (brk() or sbrk())
- anonymous memory
- other calls: calloc(), realloc()
Chapter 15: Mechanism: Address Translation
- OS concept of interposing for control and mechanism
- address translation
- physical versus virtual addresses
- memory manager / memory management unit
- process address space: code at base, adjacent to heap (growing up), free space, and then stack (growing down)
- transparent relocation of process in physical memory through use of virtual address translation
- dynamic relocation through base and bounds registers
- virtual address and physical address
- memory management unit
- why changing base/bounds is a privileged instruction
- exceptions and exception handlers for illegal access
- necessity to track free space
- context switching in the context of base/bounds dynamic relocation
- internal fragmentation
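Base-and-bounds translation is simple enough to model in a few lines; this Python sketch (an illustration of the mechanism, not how MMU hardware is actually built) shows the translation plus the bounds check that traps illegal accesses:

```python
def translate(vaddr, base, bounds):
    """Dynamic relocation: physical address = base + virtual address,
    after checking the access is within bounds. On real hardware an
    out-of-bounds access raises an exception that the OS handles
    (usually by killing the process)."""
    if not (0 <= vaddr < bounds):
        raise MemoryError("out-of-bounds access: segmentation violation")
    return base + vaddr

# base = 32 KB, bounds = 16 KB: virtual address 1000 -> physical 33768
paddr = translate(1000, base=32 * 1024, bounds=16 * 1024)
```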
Chapter 16: Segmentation
- memory segmentation - base/bounds pair for each segment of code, heap, stack
- segmentation fault
- necessity for additional base/bounds registers for segmentation approach
- identifying segment in question by high bits of virtual address (necessary for identifying which segment registers to use)
- tracking which way a segment grows (e.g., how to identify that stack segment grows down)
- how segmentation enables sharing of memory
- why sharing requires addition of protection bits for segments
- fine-grained vs. coarse-grained segmentation
- larger segment table instead of fewer segment registers
- OS handling of segmentation on context switch
- managing free space
- external fragmentation
- compacting memory by rearranging segments
Chapter 17: Free Space Management
- why segmentation causes external fragmentation
- splitting and coalescing free regions
- notion of tracking free regions using a linked list (address and length)
- allocated buffer headers tracking length and holding magic number
- free space linked-list header with address and length
- how the free list is embedded within heap
- best fit memory allocation (pros and cons)
- worst fit memory allocation (pros and cons)
- first fit memory allocation (pros and cons)
- next fit memory allocation (pros and cons)
- segregated lists for common object types, including slab allocator
- buddy allocator
- scaling
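As a concrete illustration of the allocation strategies above, here's a tiny Python sketch of first fit over a free list of (address, length) holes (real allocators embed this list inside the heap itself, as the chapter explains):

```python
def first_fit(free_list, size):
    """First-fit allocation over a free list of (addr, length) holes:
    take the first hole big enough, splitting off any remainder."""
    for i, (addr, length) in enumerate(free_list):
        if length >= size:
            if length == size:
                free_list.pop(i)    # exact fit: the hole disappears
            else:
                # split: shrink the hole, hand back its front portion
                free_list[i] = (addr + size, length - size)
            return addr
    return None                     # no hole is big enough

holes = [(0, 10), (20, 30)]
addr = first_fit(holes, 15)         # skips (0,10), splits (20,30)
```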
Chapter 18: Paging: Introduction
- concept of page
- virtual page vs. page frames
- address translation of pages
- virtual address split into VPN (virtual page number) and byte offset into page
- how VPN bit length depends on the number of virtual pages (e.g., 16 pages == 4 bits)
- how page size determines number of offset bits (e.g., 256-byte page == 8 bit offset)
- physical page number (PPN) or physical frame number (PFN) [same thing]
- address translation mechanism (e.g., Figure 18.3)
- concept of page tables and page table entries (PTE)
- concept that page tables are stored in OS RAM
- why page tables must be isolated from user processes (security, but why?)
- linear page table (indexed by VPN)
- PTE valid bit and meaning
- PTE protection bit and use
- PTE present bit and meaning
- meaning of page being "swapped out"
- access bit (or reference bit) for page replacement
- problems with paging (performance and page table size)
- page table base register (meaning and use)
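The VPN/offset split above is just bit arithmetic; here is a short Python sketch using the chapter's tiny example sizes (256-byte pages, hence 8 offset bits):

```python
PAGE_SIZE = 256                          # 2^8 bytes -> 8 offset bits
OFFSET_BITS = PAGE_SIZE.bit_length() - 1

def split_va(vaddr):
    """Split a virtual address into (VPN, offset within page)."""
    vpn = vaddr >> OFFSET_BITS           # high bits select the page
    offset = vaddr & (PAGE_SIZE - 1)     # low bits select the byte
    return vpn, offset

# 0x0123 -> VPN 0x1, offset 0x23
vpn, offset = split_va(0x0123)
```

Translation then swaps the VPN for the PFN from the page table entry and keeps the offset unchanged.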
Chapter 19: Paging: Faster Translations (TLBs)
- translation lookaside buffer (TLB), i.e., address translation cache
- TLB is part of memory management unit (MMU)
- that TLB can be hardware or OS managed (and why you'd care)
- purpose of TLB
- concepts of TLB hit and miss and their impact on performance
- spatial locality
- temporal locality
- page table base register (again)
- software TLB miss handlers and how to keep them from getting evicted from TLB
- fully associative (FA) cache vs. direct mapped and why TLBs are generally FA
- why TLB valid bit != PTE valid bit in terms of meaning
- context switching and the TLB
- flushing the TLB
- address space identifier in TLB entry (purpose)
- cache replacement (how to choose which cache lines to evict)
- least-recently used (LRU) (pros and cons)
- random eviction (pros and cons)
- process of handling a TLB miss
- why miss-causing instruction is retried
Chapter 20: Paging: Smaller Tables
- making page tables smaller using larger pages (why it works, pros and cons)
- making page tables smaller using multi-level page tables (MLPT)
- page directory (purpose and use)
- page directory entries (PDE) (how valid bit and PFN are used)
- notion of a "level of indirection"
- extracting page directory index (PDI), page table index (PTI) and offset from virtual address (like extracting VPN and offset, only now two indexes)
- MLPTs with more than two levels
- integration of MLPTs with TLB
Chapter 21: Beyond Physical Memory: Mechanisms
- memory hierarchy (CPU cache, RAM, disk)
- swap space on disk
- present bit in PTE indicating whether data in RAM or on disk
- data on disk can be in a file (named) or in swap (anonymous)
- page fault -- a "miss" on a page valid but not present in memory
- page fault handler -- OS code that fixes up page faults (why in OS and not HW?)
- that processes are blocked when they are having a page fault (see scheduling)
- concept of a page replacement policy
- interaction of present and valid bits
- free page pool high watermark / low watermark management
- swap daemon or page daemon purpose
- doing work in background idle time
Chapter 22: Beyond Physical Memory: Policies
- notion of memory pressure
- evicting / paging out pages
- page replacement policy (PRP)
- treating RAM as a cache for named or anonymous data
- cache hits / misses
- hit rate / miss rate
- notion of memory and disk access cost (in time)
- Average Memory Access Time (AMAT) and how to compute it
- optimal replacement policy -- furthest in future
- compulsory cache miss (first access of every page)
- PRP: FIFO (First In, First Out) (pros / cons)
- PRP: random eviction (pros / cons)
- PRP: LRU (Least Recently Used) (pros / cons)
- principle of locality: temporal
- approximating LRU using accessed bit and clock algorithm
- modified, i.e., dirty pages and use
- demand paging
- prefetching
- clustering / grouping swap writes
- thrashing
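The AMAT formula from the chapter is worth seeing with numbers plugged in; a quick Python sketch (the costs and miss rate below are illustrative, not measurements):

```python
def amat(t_mem, t_disk, p_miss):
    """Average Memory Access Time: every access pays the memory cost,
    and a fraction p_miss of accesses additionally pay the disk cost."""
    return t_mem + p_miss * t_disk

# 100 ns RAM, 10 ms disk, 0.1% miss rate -> about 10.1 microseconds
cost = amat(100e-9, 10e-3, 0.001)
```

Even a 0.1% miss rate makes the average access roughly 100x slower than RAM -- which is why the replacement policy matters so much.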
Concurrency
Chapter 26: Concurrency Introduction
- Basic concept of threads and thread creation
- How threads are created
- Problem of data shared between threads
- Unpredictability of thread scheduling
- Arbitrary interleaving of machine instructions
- Race conditions
- Determinism vs. indeterminism
- Critical section
- Mutual exclusion
- Concept of atomicity
Chapter 27: Thread API
I won't expect you to memorize the declarations of various functions, but I expect you to understand the basic mechanisms and most common parameters for these functions.
- How threads are created
- How the parent can wait for a child thread to complete
- Locking
- Condition variables
- How to compile POSIX thread code
Chapter 28: Locks
- Basic criticality issue of operations like add on a shared variable (e.g., counter++)
- How "lock" and "unlock" of a global mutex provides mutual exclusion
- How to place locking primitives in a simple code example
- How locks are evaluated: correctness, fairness, performance
- How atomicity can be provided by disabling interrupts and the serious drawbacks of this approach
- Function and use of assertions, e.g., assert(x == 0)
- Coarse vs. fine-grained locks -- issues and pros/cons
- Test and set (a.k.a. "Atomic exchange")
- How atomic CPU operations can be used to build locks
- Fetch and add and the operation of "ticket locks"
- Ways to avoid spinning: yielding CPU, wait queues
- Two phase (or "Dahm") locks
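The classic counter++ race and its mutex fix look like this in Python (the chapter's examples use pthreads in C, but the idea is identical):

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:             # critical section: the load/add/store of
            counter += 1       # counter += 1 now executes as one unit

threads = [threading.Thread(target=worker, args=(50_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# with the lock, counter is exactly 200000; without it, updates can be lost
```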
Chapter 29: Lock-based Concurrent Data Structures
- Review: data structures are things like counters, arrays, linked lists, hash table, queue, etc.
- Adding more threads can sometimes hurt performance. Why?
- Scalability -- we'd like additional threads to linearly improve performance
- Understand how locks typically reduce performance but improve correctness
- Sloppy counting -- why it is faster / what its limitations are
- Concurrent linked lists -- potential approaches
- Hand-over-hand locking for linked lists
- Queue data structure and basic (global) locking
- Two locks -- enqueuing and dequeuing
- Hash table basics (and global locking)
- Hash table locking per bucket
Chapter 30: Condition Variables
- Condition variables: signaling between threads / controlling thread execution
- Condition variable definition
- Waiting
- Signaling
- Difference between pthread_cond_t and pthread_mutex_t
- Pthread condition variable invocation
- Pthread condition variable lock/unlock behavior
- Join as an example condition variable use
- Why you should hold the lock while signaling
- Producer / Consumer (bounded buffer) problem
- Why we use two condition variables (fill and empty) and a mutex
- Why we use while and recheck conditions instead of if (see fig. 30.8)
- Covering condition (why you might want to wake all threads) and pthread_cond_broadcast()
- Mesa semantics vs. Hoare semantics
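The bounded-buffer pattern above follows the shape of the book's figure 30.8: two condition variables (fill and empty) plus a mutex, with while-loop rechecks because of Mesa semantics. The buffer size and names here are illustrative.

```c
#include <pthread.h>

#define MAX 8
static int buffer[MAX];
static int fill_ptr = 0, use_ptr = 0, count = 0;
static pthread_cond_t empty = PTHREAD_COND_INITIALIZER;  /* space freed up */
static pthread_cond_t fill  = PTHREAD_COND_INITIALIZER;  /* data arrived */
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void put(int value) {
    pthread_mutex_lock(&mutex);
    while (count == MAX)                 /* while, not if: recheck the
                                            condition after waking (Mesa) */
        pthread_cond_wait(&empty, &mutex);
    buffer[fill_ptr] = value;
    fill_ptr = (fill_ptr + 1) % MAX;
    count++;
    pthread_cond_signal(&fill);          /* signal while holding the lock */
    pthread_mutex_unlock(&mutex);
}

int get(void) {
    pthread_mutex_lock(&mutex);
    while (count == 0)
        pthread_cond_wait(&fill, &mutex);
    int tmp = buffer[use_ptr];
    use_ptr = (use_ptr + 1) % MAX;
    count--;
    pthread_cond_signal(&empty);
    pthread_mutex_unlock(&mutex);
    return tmp;
}
```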
Chapter 31: Semaphores
- Basic operation of semaphores
- Difference between sem_wait() and sem_post()
- Various ways that initial settings of semaphore determine behavior
- Binary semaphores, i.e., locks
- Semaphores as condition variables
- Semaphores for producer / consumer
- Notion of reader / writer locks
- Dining philosophers problem and solutions
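A semaphore initialized to 1 behaves as a lock: this is the binary-semaphore idea above, sketched with the POSIX API (the wrapper names are mine).

```c
#include <semaphore.h>

/* sem_wait decrements the semaphore and blocks if the value would drop
   below zero; sem_post increments it and wakes a waiter, if any.
   Initializing the value to 1 means "one thread may enter": a lock. */
sem_t m;

void sem_lock_init(void) { sem_init(&m, 0, 1); }  /* value 1 => free */
void sem_lock(void)      { sem_wait(&m); }        /* acquire */
void sem_unlock(void)    { sem_post(&m); }        /* release */
```

Different initial values give different behavior: 0 makes the semaphore act like a condition to wait on, and N > 1 admits up to N threads at once.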
Chapter 32: Common Concurrency Problems
- Deadlock definition and basics
- Atomicity violations
- Ordering violations
- How encapsulation can make deadlock harder to avoid (e.g., Vector v1.AddAll(v2) example)
- Know the four conditions for deadlock (and understand them)
- Definition of livelock and general issue
- How deadlock can be avoided by scheduling threads intelligently
- How deadlock can be avoided by "detect and recover"
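One way to break the circular-wait condition is a global lock ordering. A common sketch (the helper name is mine) acquires any pair of mutexes in address order, so two threads locking the same pair in opposite argument orders cannot deadlock.

```c
#include <pthread.h>

/* Acquire two mutexes in a globally consistent order (by address).
   Whichever order callers pass them in, the lower-addressed lock is
   always taken first -- no cycle of waiting threads can form. */
void lock_pair(pthread_mutex_t *l1, pthread_mutex_t *l2) {
    if (l1 < l2) {
        pthread_mutex_lock(l1);
        pthread_mutex_lock(l2);
    } else {
        pthread_mutex_lock(l2);
        pthread_mutex_lock(l1);
    }
}
```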
Chapter 33: Event-based Concurrency
- Basic idea of event-based programming (EBP)
- Event loop concept
- How event-based programming uses kernel for concurrency
- How select() functions and what its parameters mean
- Why blocking is such a serious issue for EBP
- Asynchronous IO basics; use of AIO control block
- Why multicore CPUs make locking essential, even for EBP
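The heart of an event loop is asking the kernel which descriptors are ready. A small sketch (the function name is mine) wraps select() with a timeout and reports whether a single descriptor is readable; a real event loop would check a whole set and dispatch a handler for each ready one.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Return 1 if fd is readable within timeout_ms, 0 if not, -1 on error. */
int fd_is_readable(int fd, int timeout_ms) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    /* select() takes (highest fd + 1), three fd sets (read/write/except),
       and an optional timeout; it modifies the sets to show who is ready. */
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
    int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```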
Petri Nets
- Petri net basics
- Modelling a description of a concurrent system with a Petri net
Persistence
Chapter 36: I/O Devices
- How do we integrate I/O into systems? What mechanisms do we use, and how can we make them efficient?
- Memory, I/O and peripheral bus, description and difference (36.1)
- Why are busses hierarchical? (36.1)
- Disk interface vs. internal structure (36.2)
- Firmware (36.2)
- "The Canonical Protocol" and device (36.3)
- Use of status, command and data register (36.3)
- Polling and programmed I/O (PIO), definition and difference (36.3)
- How interrupts can improve CPU utilization (36.4)
- Interrupts and interrupt handlers (a.k.a. interrupt service routines) (36.4)
- Overlapping CPU and I/O (36.4)
- Why/when interrupts are not always best / when polling best (36.4)
- Hybrid (or two-phase) approaches (36.4)
- Livelock (36.4)
- Interrupt coalescing: the device waits briefly before raising an interrupt so several completions can share one (36.4)
- DMA / DMA Engines (36.5)
- I/O Interaction methods (36.6)
- Explicit I/O instructions (e.g., in and out on x86) (36.6)
- Memory-mapped I/O (36.6)
- Why I/O instructions are privileged (36.6)
- Device drivers (36.7)
- How we use abstraction to hide implementation and details for drivers (36.7)
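User code can't poke real device registers, so here is a simulated device (entirely my own construction, not from the text) that follows the canonical polled protocol: wait while busy, write the data register, write the command register to start the device, then poll until done.

```c
enum { DEV_IDLE = 0, DEV_BUSY = 1 };

struct device {
    int status;        /* STATUS register */
    int command;       /* COMMAND register */
    int data;          /* DATA register */
    int last_written;  /* simulated device-side effect of the command */
};

/* Simulated hardware: one "step" of a busy device consumes DATA
   and returns to idle. */
static void device_step(struct device *d) {
    if (d->status == DEV_BUSY) {
        d->last_written = d->data;  /* "perform" the I/O */
        d->status = DEV_IDLE;
    }
}

/* The canonical polled protocol (programmed I/O): the CPU does all
   the work, repeatedly reading STATUS instead of using interrupts. */
void pio_write(struct device *d, int value) {
    while (d->status == DEV_BUSY)   /* 1. poll until device is not busy */
        device_step(d);
    d->data = value;                /* 2. write the data register */
    d->command = 1;                 /* 3. write the command register */
    d->status = DEV_BUSY;           /*    ...which starts the device */
    while (d->status == DEV_BUSY)   /* 4. poll until the command completes */
        device_step(d);
}
```

Replacing step 4's polling with an interrupt is exactly the CPU-utilization improvement covered in 36.4.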
Chapter 37 - Hard Disk Drives
- How they work, how they are built, their interface, scheduling, etc.
- HDD Interface (37.1)
- Linearly addressed sectors (512-byte blocks) with atomic access (37.1)
- Torn write (37.1)
- "Unwritten contract" adjacent numbers near each other on disk (37.1)
- Physical: Platter, spindle, arm, disk head, tracks with sectors (37.2)
- Typical disk speeds: 7200, 10,000, 15,000 rotations per minute (RPM) (37.2)
- Rotation(al) delay (37.3)
- If a full rotation takes time R, the average rotational delay is R/2 (37.3)
- Changing tracks - "seek time" (37.3)
- Settling time (37.3)
- Transfer phase and transfer time (37.3)
- Concept of "track skew" - what and why (37.3)
- Concept of multi-zone disks. (37.3)
- Disk cache (RAM) or "track buffer" purpose and use (37.3)
- Write back vs. write through caching (37.3)
- I/O time: T_I/O = T_seek + T_rotation + T_transfer (37.4)
- I/O rate: R_I/O = Size_transfer / T_I/O (37.4)
- Different workload types: random vs. sequential. Performance and impact. (37.4)
- Disk scheduling (37.5)
- Scheduling: Shortest Seek Time First (SSTF) (37.5)
- Scheduling: Starvation (37.5)
- Scheduling: Elevator (SCAN or C-SCAN) and variants (37.5)
- Scheduling: Shortest Positioning Time First (SPTF or SATF) (37.5)
- Why OS doesn't know everything about disk performance (37.5)
- I/O merging (37.5)
- Work-conserving vs. anticipatory disk scheduling (the text covers the concepts without using these names) (37.5)
- Disk performance modeling (like homework)
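The two formulas from 37.4, restated as code for the kind of modeling done in the homework. Units are assumptions of this sketch: times in milliseconds, transfer sizes in megabytes.

```c
/* T_I/O = T_seek + T_rotation + T_transfer */
double io_time_ms(double seek_ms, double rotation_ms, double transfer_ms) {
    return seek_ms + rotation_ms + transfer_ms;
}

/* R_I/O = Size_transfer / T_I/O; here MB transferred per second. */
double io_rate_mb_s(double size_mb, double t_io_ms) {
    return size_mb / (t_io_ms / 1000.0);  /* convert ms to seconds */
}
```

These make the random-vs-sequential gap concrete: a tiny random 4 KB request pays full seek and rotation costs for almost no data, while a large sequential transfer amortizes them.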
Chapter 38: Redundant Arrays of Inexpensive Disks (RAIDs)
- How RAIDs make a large, fast and reliable disk. Trade-offs?
- Concept of RAIDs
- Disk redundancy
- How/why RAIDs are transparent
- RAID logical vs. physical (38.1)
- RAID mirroring (38.1)
- Why RAID controllers have RAM and sometimes non-volatile RAM (NVRAM)
- Fail-stop fault model (38.2)
- Evaluating a RAID: capacity, reliability, performance (38.3)
- RAID 0 -- striping and concept of stripe (38.4)
- Large sequential and small random workloads for RAID 0 (38.4)
- Performance and capacity calculations for RAID 0 (38.4)
- RAID 1 -- Mirroring (38.5)
- How mirroring improves read performance (38.5)
- Why that's not possible for writing (38.5)
- Performance and capacity calculations for RAID 1 (38.5)
- Large sequential and small random workloads for RAID 1 (38.5)
- RAID 4 -- space savings with parity (38.6)
- How parity with XOR works; why we only need one parity disk (38.6)
- RAID 4 performance and capacity calculations (38.6)
- Calculating parity using XOR and any number of disks (38.6)
- "Small write problem" (bottleneck) for parity based RAID (38.6)
- RAID 5 -- rotating parity and benefits (38.7)
- How it partially solves the small-write problem. (38.7)
- RAID 5 performance and space analysis (like 4 but better) (38.7)
- RAID "hotspare" (38.9)
- Software RAID (38.9)
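XOR parity is easy to sketch: the parity block is the XOR of all data blocks, and rebuilding any one lost block is the same XOR over the survivors plus the parity (the function name is mine).

```c
/* RAID-4/5 style parity: parity[i] = XOR of byte i of every data block.
   Because x ^ x = 0, XORing the parity with all surviving blocks
   reproduces the missing block -- which is why one parity disk suffices
   to tolerate a single failure. */
void compute_parity(const unsigned char *blocks[], int nblocks,
                    int blocksize, unsigned char *parity) {
    for (int i = 0; i < blocksize; i++) {
        unsigned char p = 0;
        for (int d = 0; d < nblocks; d++)
            p ^= blocks[d][i];
        parity[i] = p;
    }
}
```

Note that the same function performs both jobs: called on all data blocks it produces parity, and called on the survivors plus parity it reconstructs the lost block.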
Chapter 39 - Files and Directories
- File API - How to store data persistently
- Files and directories (39.1)
- Low-level name -- inode (39.1)
- High-level name -- file name (39.1)
- Directory tree or hierarchy (39.1)
- "root" directory / and concept of subdirectories (39.1)
- Absolute vs. relative pathname (39.1)
- Directory separator (usually /) (39.1)
- Filename and extension. (39.1)
- Creating files: the creat() syscall and the equivalent O_CREAT flag to open() (39.3)
- Different modes to open files in (39.3)
- Notion of file descriptor (39.3)
- Reading and writing files (39.4)
- strace system call tracer (39.4)
- The read(), write(), open(), and close() system calls (39.4)
- Seeking in files with lseek() (39.5)
- Syncing files with fsync() (39.6)
- Buffering writes (39.6)
- Dirty vs. clean pages for buffering / writeback (39.6)
- Renaming files (39.7)
- That rename() is atomic and how that is useful for temp files (39.7)
- Getting information about files with stat() or fstat() calls (39.8)
- Removing files and unlink() call (39.9)
- Directories and making them (39.10)
- Reading directories with opendir() and readdir(), as programs like ls do (39.11)
- Deleting directories (39.12)
- Hard links (39.13)
- Concept of reference count or link count (39.13)
- Symbolic link or soft link and dangling references (39.14)
- Creating and mounting file systems (39.15)
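A quick tour of the raw file API above: create/write/fsync/close a file, reopen and read it back, then unlink it. The path passed in is purely illustrative, and error handling is kept minimal for brevity.

```c
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

/* Returns 0 if the round trip succeeds, -1 otherwise. */
int demo_file_io(const char *path) {
    /* Create (or truncate) the file for writing; 0644 sets permissions. */
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, "hello", 5) != 5) { close(fd); return -1; }
    fsync(fd);               /* force the buffered write to stable storage */
    close(fd);

    char buf[6] = {0};
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (read(fd, buf, 5) != 5) { close(fd); return -1; }
    close(fd);
    unlink(path);            /* drop the link; file is removed when
                                the count hits zero and no fd is open */
    return strcmp(buf, "hello") == 0 ? 0 : -1;
}
```

Running this under strace shows exactly the open/write/fsync/close/read/unlink sequence the chapter walks through.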
Chapter 40 - File System Implementation
- vsfs (the "very simple file system") basics and operation
- Data structures required to manage disk (40.1)
- Access methods for disk -- how to read and write? (40.1)
- Data organization -- layout and blocks of sectors (40.2)
- Metadata: superblocks, inode bitmaps, data bitmaps, inodes and data blocks (40.2)
- What's in a superblock? (40.2)
- Why the inode and data bitmaps are called "allocation structures" (40.2)
- Purpose and use of index-node (40.3)
- Calculating offsets with inodes (40.3)
- Data stored in i-node (40.3)
- Direct pointers for data blocks (40.3)
- Indirect pointers (single and double level) (40.3)
- Calculating max file size based on block size and number/type of pointers (40.3)
- Concept of extents and difference from block addresses (40.3)
- Directory organization (40.4)
- unlink() on directory; fragmentation in directory; reserved "invalid" i-node (40.4)
- Managing free space with bitmaps (40.5)
- Pre-allocation (40.5)
- Access paths (40.6)
- Operations required to read/write/create files -- see figures (40.6)
- Caching and buffering disk traffic (40.7)
- Fixed size cache and use of LRU for page eviction (40.7)
- Dynamic partitioning and the unified page cache (40.7)
- Batching reading and writing (40.7)
- Scheduling I/O to maximize FS performance (40.7)
- Direct I/O versus buffered writes (40.7)
- Disk operation modeling like from homework (e.g., Forwards vs. Backwards) (40.7)
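The max-file-size calculation from 40.3 as a parameterized formula rather than vsfs's hard-coded numbers: count the reachable blocks through the direct, single-indirect, and double-indirect pointers, then multiply by the block size. With 4 KB blocks, 4-byte pointers, 12 direct pointers, and one pointer of each indirect kind, this comes to just over 4 GB.

```c
/* Maximum file size for an inode with ndirect direct pointers plus an
   optional single-indirect and double-indirect pointer. Each indirect
   block holds (block_size / ptr_size) pointers to further blocks. */
long max_file_size(long block_size, long ptr_size,
                   long ndirect, int has_single, int has_double) {
    long ptrs_per_block = block_size / ptr_size;
    long blocks = ndirect;                         /* direct blocks */
    if (has_single)
        blocks += ptrs_per_block;                  /* one level of pointers */
    if (has_double)
        blocks += ptrs_per_block * ptrs_per_block; /* two levels */
    return blocks * block_size;
}
```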
Chapter 41 - Locality and Fast File System
- How fragmentation and other issues killed UFS (Unix File System) performance (41.1)
- Internal fragmentation in disk blocks versus external on disk (41.1)
- Cylinder groups and their modern version, block groups. Why they exist, how they improve performance (41.3)
- Policies: how grouping files in block groups helps performance (41.4)
- Concept of file locality by name and how it relates to disk (41.5)
- Large file exception (41.6)
- Sub-blocks (41.7)
- Concept of parameterized placement (41.7)
Chapter 42 - Crash Consistency: FSCK and Journaling
- Basics of crash consistency and issues (42.1)
- File system checker operations (42.2)
- Journaling (Write-ahead Logging) (42.3)
- Journaling recovery (42.3)
- Batching updates (42.3)
- Finite log entries (42.3)
- Journaling metadata only and tricky cases (42.3)
- Other approaches, e.g., COW (42.4)
Chapter 43 - Log-Structured File Systems (LFS)
- Reasons for developing LFSs (43.0)
- LFS core idea (43.0)
- Converting writes (including inodes) to sequential writes (43.1)
- Write buffering in LFS (43.2)
- Amortizing positioning cost by varying buffer size (43.3)
- Finding inodes when they're scattered around the disk (43.4-5)
- How do you find the inode map pieces? (43.6)
- Basic process of reading from LFS (43.7)
- Adding directories to sequential writes (43.8)
- Garbage collecting of "dead" blocks / compacting (43.9)
- Versioning filesystem, concept (43.9)
- Determining block liveness via segment summary block (43.10)
- Choosing blocks to clean, policy (43.11)
- Crash recovery with LFS (43.12)
- Modern LFS details (43.13)
Chapter 44 -- Data Integrity and Protection
- Data integrity and protection, concepts (44.0)
- Disk failure modes and types (44.1)
- Disk failure types by disk cost (44.1)
- Handling latent sector errors (44.2)
- Basic concept of checksums and algorithms to detect errors (44.3)
- Where to store checksums on disk (44.3)
- Using checksums to detect errors, walkthrough (44.4)
- Misdirected writes (44.5)
- Lost writes (44.6)
- Scrubbing the disk (44.7)
- Checksum overheads and basics (44.8)
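Two of the simple checksum schemes discussed in 44.3, XOR-based and additive, sketched as byte-wise functions. Both are fast but weak; for example, XOR cannot detect two bits flipped in the same position of two different bytes.

```c
/* XOR checksum: XOR every byte of the block together. */
unsigned char xor_checksum(const unsigned char *data, int n) {
    unsigned char c = 0;
    for (int i = 0; i < n; i++)
        c ^= data[i];
    return c;
}

/* Additive checksum: sum the bytes, ignoring overflow (wraps mod 256). */
unsigned char add_checksum(const unsigned char *data, int n) {
    unsigned char c = 0;
    for (int i = 0; i < n; i++)
        c = (unsigned char)(c + data[i]);
    return c;
}
```

On every read, the file system recomputes the checksum over the data and compares it with the stored value; a mismatch means the block is corrupt and must be recovered from elsewhere (e.g., a RAID mirror).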
Chapter 47 -- Distributed Systems
- Basic concept (47.0)
- Goals: combat failure, performance, security (47.0)
- Main mechanism: network communication (47.0)
- Communication basics (47.1)
- Unreliable communication (47.2)
- Reliable communication (47.3)
- Communication abstractions -- DSM (47.4)
- Communication abstractions -- RPC (47.5)
Chapter 48 -- Sun's Network File System (NFS)
- NFS basic idea (48.0)
- Goals of basic distributed file system (48.1)
- NFS as an open standard (48.2)
- Simple and fast crash recovery and statelessness (48.3-4)
- NFSv2 -- implementing a stateless protocol (48.5)
- NFS operational overview (48.6)
- Idempotent operations (48.7)
- Client-side caching (48.8)
- Cache consistency, update visibility, staleness, flush-on-close, attribute cache, etc. (48.9)
- Cache consistency reliant on implementation (48.10)
- Server-side write buffering and when WRITE returns success (48.11)
Chapter 49 -- The Andrew File System (AFS)
- AFS basic goal: scalability (49.0)
- AFSv1 and whole file caching (49.1)
- Problems with version 1 (TestAuth and path traversal) (49.2)
- AFSv2: callbacks, file identifier, caching directories (49.4)
- Cache consistency in AFS (49.5)
- Coping with crashes and caching in AFS (49.6)
- Scale, performance, and comparison to NFS (49.7)
- Other improvements in AFS (49.8)
Complexity and OS
- Basic concept of complexity
- Importance of complexity of OS
- OS algorithms impacted by complexity
- Requirements for creating O(1) algorithms
Other
Writers' Workshop
If you have difficulty writing, please consider visiting the Writers' Workshop at UMD. They can help you with any writing project you might have.
Acknowledgments
Some policy text used or adapted from the following sources (with permission):