Notes:

  1. All assigned reading should be completed before class.
  2. All homework is due before class starts.
  3. All homework is available on Blackboard, and should be submitted via Blackboard.
  4. All project reports are due before class, and should be submitted via Blackboard.

1/16: 01 - Introduction (D)

This lecture will motivate the course, introduce the curriculum, talk about expectations, and answer any questions. The underlying theme in this class is software vulnerabilities and defenses. We discuss this theme in the context of three software security topics:

  1. Insecure to secure programming languages. We start by examining code that is completely unprotected (machine code), and show how adding various security mechanisms either eliminates vulnerability classes or reduces the chances of successful exploitation. This topic culminates with type safe languages.
  2. Information Flow. This topic investigates how security-critical information may flow through a program, and the sorts of vulnerabilities that may arise.
  3. Software Model Checking. In this class, software model checking includes (broadly speaking) three areas: analysis in the compiler for finding bugs (goals: speed, large code bases, automatic), model checking à la Ed Clarke (goals: precision, automatic), and proving systems correct (goals: proofs of correctness, semi-automatic).

Reading: Reflections on Trusting Trust
The reading is Ken Thompson's Turing Award lecture, which is considered a seminal paper in security. The main thing to think about when reading this paper is what it means to trust software.

Slides: Introduction

1/18: 02 - From Source to x86 Execution (D)

This lecture covers how a program written in C is compiled to binary code, and how that binary code executes on the processor. We will cover compilation, the basic machine model, and how programs load and run. This lecture sets the stage for buffer overflows, format string vulnerabilities, and various OS defenses covered in later lectures.

Reading: x86 Assembly Guide. This is basic introductory material that covers how programs written in C get compiled down to executable code. A more thorough resource is "Computer Systems: A Programmer's Perspective" by Randal Bryant and David O'Hallaron (which is used in 15-213 here).

Slides: Compilation

Note: There is a lot of reading for the next lecture. You may want to get ahead by reading it early.

1/23: 03 - Control Flow Hijack: Buffer Overflows and Format String Attacks (M)

This class will discuss the basics of control flow hijack. We examine two popular ways to hijack the execution flow of a program: stack-based buffer overflows and format string vulnerabilities. These are examples of channeling vulnerabilities, which arise when control and data are mixed in a single channel. In a stack-based buffer overflow, the control data are return addresses, which are stored on the stack alongside local variables. Format strings mix format parameters, which control argument parsing, with data on the stack.

Key Concepts: Code injection, control flow hijack, buffer overflow, format string attack, channeling vulnerability.
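To make the channeling idea concrete, here is a toy Python model (not real x86 behavior) of a stack frame in which a 16-byte local buffer sits directly below a 4-byte saved return address. The frame layout, the sizes, and the helper names are hypothetical simplifications: an unchecked copy in the spirit of C's strcpy lets input bytes spill from the data region into the control data, while a bounds-checked copy does not.

```python
import struct

BUF_SIZE = 16  # hypothetical size of the local char buffer


def make_frame(return_address: int) -> bytearray:
    """Toy frame layout: [ local buffer (BUF_SIZE bytes) | saved return address (4 bytes, little-endian) ]."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<I", return_address))


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Like strcpy into the local buffer: nothing stops the copy at the buffer's end.
    (The toy frame simply ends after the return address; real memory would keep going.)"""
    for i, byte in enumerate(data[:len(frame)]):
        frame[i] = byte


def checked_copy(frame: bytearray, data: bytes) -> None:
    """Bounds-checked copy: refuses input that does not fit in the buffer."""
    if len(data) > BUF_SIZE:
        raise ValueError("input larger than buffer")
    frame[:len(data)] = data


def saved_return_address(frame: bytearray) -> int:
    """Read back the control data: where the function will 'return' to."""
    return struct.unpack("<I", bytes(frame[BUF_SIZE:BUF_SIZE + 4]))[0]
```

Copying 16 filler bytes followed by a packed address such as 0xDEADBEEF replaces the saved return address, which is exactly the mixing of control and data described above.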

Reading:

  1. Smashing the stack for fun and profit, by Aleph One. This is the classic paper on buffer overflows. It goes into detail about how the stack works, how arguments are passed (this should be review from our last lecture!), and the basic stack-based buffer overflow. Everyone who calls themselves a computer security professional should have read this paper.
  2. Exploiting Format String Vulnerabilities, by Team Teso. A very good introduction to how variadic arguments work in C, and how variadic functions are exploited in format string attacks.
  3. Smashing the stack in 2011, by Paul Makowski. Since the publication of Aleph One's paper, there have been numerous proposed defenses and some changes in terminology. Paul's blog touches on many of the important issues. Some of the things mentioned, such as ASLR, NX, and ProPolice, will be covered in depth in subsequent lectures. Here you'll be introduced to the concepts so you can put them in perspective.

Slides: Control Flow Attack

1/25: 04 - In-class Exercise: Exploits (D)

1/30: 05 - Defenses Against Control Flow Hijack Attacks: Canaries, DEP, and ASLR (M)

In this class we will discuss modern defenses against control flow hijack attacks. Stack canaries were introduced in 1998 by Cowan et al. as a way to detect when a return address is overwritten (the original paper is not required reading). Address space layout randomization (ASLR) makes it hard for an attacker to guess memory addresses, which control flow hijacks that perform code injection have traditionally relied on. Data Execution Prevention (DEP) also helps prevent code injection attacks, by ensuring that memory pages cannot be both writable and executable. DEP and ASLR are currently used in major OSes (Ubuntu, Windows, OS X), and canaries are available in most compilers. We will also look at weaknesses in these defenses.
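A stack canary can be pictured with the following toy model. It assumes a frame laid out as buffer, then a 4-byte random canary, then the saved return address, and it models the compiler-inserted epilogue check as a Python method; the layout, sizes, and class name are illustrative, not any particular compiler's implementation.

```python
import os
import struct

BUF_SIZE = 16  # hypothetical local buffer size


class CanaryFrame:
    """Toy frame: [ buffer | canary (4 bytes) | saved return address (4 bytes) ]."""

    def __init__(self, return_address: int) -> None:
        self.canary = os.urandom(4)  # secret random value the attacker is assumed not to know
        self.mem = bytearray(BUF_SIZE) + bytearray(self.canary) + bytearray(struct.pack("<I", return_address))

    def unchecked_copy(self, data: bytes) -> None:
        """Overflow-prone copy into the buffer (no bounds check)."""
        for i, byte in enumerate(data[:len(self.mem)]):
            self.mem[i] = byte

    def epilogue(self) -> int:
        """Compiler-inserted check: compare the canary before using the saved return address."""
        if bytes(self.mem[BUF_SIZE:BUF_SIZE + 4]) != self.canary:
            raise RuntimeError("stack smashing detected")
        return struct.unpack("<I", bytes(self.mem[BUF_SIZE + 4:BUF_SIZE + 8]))[0]
```

A contiguous overflow that reaches the return address must first pass through the canary, so unless the attacker knows (or leaks) the canary value, the check fires before the corrupted address is used.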

Key Concepts: ASLR, DEP, and canaries. Important points: canaries require recompiling the code, and thus require developer participation, whereas ASLR and DEP are OS mechanisms that do not. We will also investigate the theoretical security offered by ASLR.
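A back-of-the-envelope model of ASLR's theoretical security, assuming the attacker must guess one address drawn uniformly from 2^n equally likely positions (n = 16 below is only an example value). If the layout stays fixed across attempts (for example, a server that forks children without re-randomizing), each wrong guess can be crossed off, giving (2^n + 1)/2 expected attempts. If every attempt faces a fresh layout, each succeeds independently with probability 2^-n, giving 2^n.

```python
from fractions import Fraction


def expected_guesses(entropy_bits: int, rerandomized: bool) -> Fraction:
    """Expected number of attempts to hit one address chosen uniformly from 2**entropy_bits slots."""
    slots = 2 ** entropy_bits
    if rerandomized:
        # Every attempt is an independent trial with success probability 1/slots (geometric, mean = slots).
        return Fraction(slots)
    # Layout is fixed, so each wrong guess is eliminated; the right slot is equally likely
    # to be tried 1st, 2nd, ..., slots-th, giving mean (slots + 1) / 2.
    return Fraction(slots + 1, 2)
```

For 16 bits this gives 65536 attempts with re-randomization and 32768.5 without. Either way, each extra bit of entropy only doubles the attacker's work, which is why low-entropy ASLR can be brute forced.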

Reading:

  1. ASLR Smack and Laugh Reference, by Tilo Müller. This paper describes several ways to bypass ASLR and DEP. Some of these techniques will be needed in your homework assignment.
  2. Design of ASLR in PaX. This describes how ASLR is implemented in Linux, and touches on the security guarantees offered. This is a short and sweet read. For those interested in exploitation, I'd also recommend the much more thorough analysis in On the Effectiveness of Address Space Randomization by Shacham et al. (not required reading).

Slides: Control Flow Defense, Security of ASLR

2/1: 06 - Return-oriented programming (Ed)

This is our last lecture on exploiting C programs. We cover Return-Oriented Programming (ROP), a technique for control flow hijack that does not require code injection. The main idea is that instead of injecting shellcode, we find snippets of code already present in the vulnerable program and chain them together to implement our shellcode. For example, return-to-libc (from the Smack and Laugh reference) can be thought of as a type of return-oriented programming attack where we simply return into system("/bin/sh").

Key Concepts: Return-oriented programming, gadgets.
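To see how a chain of gadgets computes without any injected code, here is a toy model. The registers, the gadget addresses, and the three gadgets themselves are hypothetical; a real attack finds such instruction sequences (each ending in ret) in the victim's existing code, as the reading describes, and writes their addresses and operands onto the stack via an overflow.

```python
from collections import deque
from typing import Callable, Deque, Dict


class ToyMachine:
    """Two registers and a stack; the left end of the deque is the top of the stack."""

    def __init__(self) -> None:
        self.regs: Dict[str, int] = {"eax": 0, "ebx": 0}
        self.stack: Deque[int] = deque()


# Each function models a short instruction sequence ending in `ret` that already exists in the binary.
def pop_eax(m: ToyMachine) -> None:      # pop eax; ret
    m.regs["eax"] = m.stack.popleft()


def pop_ebx(m: ToyMachine) -> None:      # pop ebx; ret
    m.regs["ebx"] = m.stack.popleft()


def add_eax_ebx(m: ToyMachine) -> None:  # add eax, ebx; ret
    m.regs["eax"] += m.regs["ebx"]


# Hypothetical addresses at which the gadgets were found.
GADGETS: Dict[int, Callable[[ToyMachine], None]] = {
    0x1000: pop_eax,
    0x2000: pop_ebx,
    0x3000: add_eax_ebx,
}


def run_chain(m: ToyMachine) -> None:
    """Model repeated `ret`: pop the next address from the attacker-controlled stack and run that gadget."""
    while m.stack:
        GADGETS[m.stack.popleft()](m)
```

Loading the stack with [0x1000, 40, 0x2000, 2, 0x3000] leaves 42 in eax: the attacker "programs" purely by choosing return addresses and data.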

Reading: The geometry of innocent flesh on the bone: return-into-libc without function calls (on the x86), by Shacham. This paper is one of the first descriptions of how to automate ROP attack creation. Some of the latest research techniques for automating ROP can be found in the optional (not required!) paper Q: Exploit Hardening Made Easy, by Edward Schwartz et al.

Slides: Return-Oriented Programming

2/6: 07 - Control Flow Integrity and Software Fault Isolation (M)

Control flow integrity (CFI) is a security principle that says the control flow you see in C should be the control flow you get when you execute. (This is the first security principle we have had a name for so far in this class!) CFI requires that we first analyze the source to see what control flow is intended, and then instrument the binary to ensure that execution follows the source. We'll cover control flow analysis of source, which includes concepts like path sensitivity, flow sensitivity, and context sensitivity. Since CFI is intended to prevent attacks like control flow hijack from working, it also needs to make sure the instrumentation can't be bypassed by a buggy program.
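Abadi et al. enforce CFI by placing an ID at each legal destination and checking it before every computed control transfer. The sketch below mimics that check in Python; the function names, the ID values, and the assumption that the analysis puts greet and farewell in the same equivalence class are all illustrative.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def greet(name: str) -> str:
    return "hello " + name


def farewell(name: str) -> str:
    return "bye " + name


def run_shell(cmd: str) -> str:
    # Stands in for code that exists in the binary but is never a legal target of this call site.
    return "shell: " + cmd


HANDLER_ID = 0x12345678  # ID expected at the (hypothetical) call site `handler(arg)`

# IDs "embedded" at each function entry, assumed to come from the static control flow analysis.
LABELS: Dict[Handler, int] = {greet: HANDLER_ID, farewell: HANDLER_ID, run_shell: 0x0BADC0DE}


class CFIViolation(Exception):
    pass


def checked_indirect_call(target: Handler, expected_id: int, arg: str) -> str:
    """Instrumented indirect call: abort unless the destination carries the expected ID."""
    if LABELS.get(target) != expected_id:
        raise CFIViolation(f"illegal indirect call to {getattr(target, '__name__', target)}")
    return target(arg)
```

If an overflow overwrites the function pointer with run_shell, the check fails before the call happens. What keeps an attacker from simply rewriting the check or the IDs is the paper's assumption that code is not writable and data is not executable; that argument is the part worth studying.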

We'll also discuss Software Fault Isolation (SFI), which is a way to enforce protection domains so that a fault in one domain cannot break another domain.
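One of the enforcement techniques in Wahbe et al., address sandboxing, forces every store (and indirect jump) target into the fault domain's segment by overwriting the upper address bits with the segment identifier. The toy below shows only that arithmetic. The segment size is an assumption, and it omits the dedicated registers the paper uses so that jumping past the masking instructions still cannot produce an unsafe address.

```python
from typing import Dict

SEGMENT_BITS = 20                      # hypothetical: each fault domain owns an aligned 1 MiB segment
OFFSET_MASK = (1 << SEGMENT_BITS) - 1


def sandbox_address(addr: int, segment_id: int) -> int:
    """Address sandboxing: keep the offset bits, force the upper bits to the domain's segment ID."""
    return (segment_id << SEGMENT_BITS) | (addr & OFFSET_MASK)


def sandboxed_store(memory: Dict[int, int], addr: int, value: int, segment_id: int) -> int:
    """A store as rewritten by SFI: the address is masked first, so it cannot leave the segment.
    Returns the address actually written."""
    safe = sandbox_address(addr, segment_id)
    memory[safe] = value
    return safe
```

A stray store aimed at another domain's segment lands somewhere inside the writer's own segment instead. The fault is contained rather than detected, which is the trade-off that makes this variant cheap.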

Key Concepts: Control flow integrity, path sensitivity, flow sensitivity, and context sensitivity. Software fault isolation. (We'll see the sensitivities later on again when we look at software model checking.)

Reading:

  1. Control Flow Integrity: Principles, Implementations, and Applications, by Abadi et al. The main things to focus on are what CFI is and how CFI is enforced. For example, you should be able to explain to yourself why the CFI enforcement mechanism will not be bypassed even if there is a buffer overflow. Although the paper applies CFI to native binaries (no source code required), in my opinion this is a bit of a red herring. In class, we'll consider this technique as implemented in the back end of a compiler. (If you are interested in this topic, a more thorough description is in the journal version of the paper.)
  2. Efficient Software-Based Fault Isolation, by Wahbe et al. This paper describes the main ideas in SFI. Although the paper is old, the techniques are still used today (see the optional reading below). The main things to read for are how enforcement works, and why it is secure. For example, ask yourself how safety is ensured even if there is a buffer overflow that overwrites random things on the stack.
  3. Optional: Native Client: A Sandbox for Portable, Untrusted x86 Native Code, by Yee et al. This describes the Google NaCl project for running native code in the browser. The main tie-in is the inner sandbox, which implements SFI.

Slides: Control Flow Integrity

2/8: 08 - Safe C Pointers (D)

A central problem in the attacks so far is that pointers can read and write out of bounds. So why don't we just check that pointers stay in bounds? Good question! In this lecture, we will cover basic research that investigates adding bounds checking to C. This is the final lecture on how to make the C language safer.

Key Concepts: Understanding fat pointers, referent objects, and three key challenges for bounds checking C:

  1. Is bounds checking fast?
  2. Is it backward-compatible with existing code?
  3. Does it always work?

Indeed, these are three of the most important dimensions of the design space for all the defenses we've looked at so far. Think about how canaries, DEP, ASLR, etc. compare along them.
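A fat pointer carries its referent object's bounds along with the address. In this hypothetical Python model, a list stands in for memory, pointer arithmetic is unchecked, and every load or store is checked against the referent's bounds; real schemes differ in exactly when and how they check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FatPointer:
    """A 'fat' pointer: the raw address plus the bounds of its referent object."""
    memory: List[int]   # backing store standing in for the address space
    base: int           # first valid index of the referent object
    size: int           # number of elements in the referent object
    addr: int           # current pointer value

    def __add__(self, n: int) -> "FatPointer":
        # Arithmetic is unchecked here (a simplification); the check happens on dereference.
        return FatPointer(self.memory, self.base, self.size, self.addr + n)

    def _check(self) -> None:
        if not (self.base <= self.addr < self.base + self.size):
            raise IndexError(f"out-of-bounds access at {self.addr} "
                             f"(object is [{self.base}, {self.base + self.size}))")

    def load(self) -> int:
        self._check()
        return self.memory[self.addr]

    def store(self, value: int) -> None:
        self._check()
        self.memory[self.addr] = value
```

Because the pointer's representation changes, code compiled without the scheme can no longer interoperate with it; this is the backward-compatibility challenge listed above. Roughly speaking, the reading's approach sidesteps it by keeping ordinary pointers and looking up the referent object's bounds in a separate data structure.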

Reading: Backwards-compatible Array Bounds Checking for C with Very Low Overhead, by Dhurjati and Adve. This paper presents the culmination of the line of thought for adding bounds checking going back to 1997. It describes previous techniques, their limitations, and a new technique that offers (relative to previous work) improved backward compatibility, speed, and accuracy. You should ask yourself whether CFI is needed if we have safe pointers, and vice versa.

Slides: Safe Pointers

2/13: 09 - Type Safety - Theory (M)

"All security problems would go away if people just coded in type safe languages." You may have heard someone say something like this. What does type safety mean? Surely if the code is "safe", it is secure, right?

We first introduce a formal notation used in programming language research called inference rules. Inference rules are a handy shorthand for describing algorithms over language statements, expressions, and so on. We'll then cover what type safety means at a high level, and the kinds of security errors it catches.

As a warning, students have traditionally found type safety a difficult concept. We'll do our best to simplify it, but you should also put real effort into understanding the reading (a quick skim will not be adequate if you don't already have a background in type systems).

One caveat: type safety can be a very broad concept. In this class we'll consider type safety à la Java (no NP-complete type checking for us!).

Key Concepts: Inference rules, type safety. The sorts of errors type safety catches, and the sorts of errors it does not.

Reading: Type Systems, by Luca Cardelli, Chapters 1-4. Cardelli's paper gives a detailed overview of type theory. For some of you the paper may be very dense. This is normal. I would love an overview more tailored to a security audience, but alas, I've yet to find one. In my opinion, the best way to really grok type checking is to learn at least a little functional programming in a language like SML, OCaml, or Lisp.

For those of you without a functional programming background, here are a few terms to keep in mind:

  1. A side effect changes system state. For example, the assignment x = 2 changes the system state so that the cell for the variable x stores the number 2.
  2. An expression is something that returns a value, like 1+2, a*b, etc. In functional languages, as typically described in type theory, everything is an expression. Expressions do not have side effects.
  3. A statement does not return a value, and is executed purely for its side effects. A purely functional language would typically have no statements, only expressions.
  4. The lambda notation. λx.e represents an unnamed function definition where x is the parameter, and e is the function body. If we give the name "foo" to the function, the above roughly represents: foo(x){ return e;}
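To connect inference rules to an algorithm, here is a sketch of a type checker for a tiny explicitly typed lambda calculus with integers, booleans, +, if, λ, and application. Each branch implements one inference rule, written in the comment as premises ⟹ conclusion, where Γ is the typing context. The nested-tuple encoding of expressions is an arbitrary choice for illustration.

```python
from typing import Dict, Union

# Types: "int", "bool", or ("->", argument_type, result_type)
Type = Union[str, tuple]


class StaticTypeError(Exception):
    pass


def type_of(e: tuple, env: Dict[str, Type]) -> Type:
    """Type-check expression e under context env (Γ); each branch implements one inference rule."""
    tag = e[0]
    if tag == "int":    # ⟹ Γ ⊢ n : int
        return "int"
    if tag == "bool":   # ⟹ Γ ⊢ true/false : bool
        return "bool"
    if tag == "var":    # x:τ ∈ Γ ⟹ Γ ⊢ x : τ
        if e[1] not in env:
            raise StaticTypeError(f"unbound variable {e[1]}")
        return env[e[1]]
    if tag == "add":    # Γ ⊢ e1 : int, Γ ⊢ e2 : int ⟹ Γ ⊢ e1 + e2 : int
        if type_of(e[1], env) != "int" or type_of(e[2], env) != "int":
            raise StaticTypeError("+ expects two ints")
        return "int"
    if tag == "if":     # Γ ⊢ c : bool, Γ ⊢ t : τ, Γ ⊢ f : τ ⟹ Γ ⊢ if c then t else f : τ
        if type_of(e[1], env) != "bool":
            raise StaticTypeError("condition must be bool")
        t_then, t_else = type_of(e[2], env), type_of(e[3], env)
        if t_then != t_else:
            raise StaticTypeError("branches must have the same type")
        return t_then
    if tag == "lam":    # Γ, x:τ1 ⊢ body : τ2 ⟹ Γ ⊢ λx:τ1. body : τ1 → τ2
        _, x, t_param, body = e
        return ("->", t_param, type_of(body, {**env, x: t_param}))
    if tag == "app":    # Γ ⊢ f : τ1 → τ2, Γ ⊢ a : τ1 ⟹ Γ ⊢ f a : τ2
        t_fun, t_arg = type_of(e[1], env), type_of(e[2], env)
        if not (isinstance(t_fun, tuple) and t_fun[0] == "->" and t_fun[1] == t_arg):
            raise StaticTypeError("bad application")
        return t_fun[2]
    raise StaticTypeError(f"unknown expression {tag!r}")
```

For example, λx:int. x + 1 checks as int → int, applying it to 41 checks as int, and 1 + true is rejected before the program ever runs.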

If you are interested in learning more about type theory, there is no better place than CMU. The computer science department offers many, many courses from luminaries in the field.

Slides: Type Theory

2/15: 10 - Type Safety - Applications (M)

We take a glimpse at how type theory is useful for formally defining safety problems in programming languages. Specifically, we look at integer overflow. The C99 standard defines integer operations in a very weird way involving promotions, overflow, and undefined behavior. The reading shows how these properties can be checked via a well-defined type system. The type system is used to find places where the code may be unsafe (we say "may" because the analysis is conservative). At each potentially unsafe location, the code is instrumented to check for violations at runtime. The main thing to understand is how the typing rules work. For example, ask yourself whether int8_t is a subtype of int16_t, or vice versa (and know why).

Key Concepts: Integer overflow, subtyping
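The runtime half of this approach can be pictured with two helpers that model C's fixed-width integers: wrap computes the two's-complement truncation that typical hardware produces, and checked_convert is the kind of check inserted at a potentially unsafe operation or cast. Both names are hypothetical; this is not the reading's implementation.

```python
def int_range(bits: int, signed: bool) -> range:
    """All values representable in a C integer type of the given width and signedness."""
    if signed:
        return range(-(1 << (bits - 1)), 1 << (bits - 1))
    return range(0, 1 << bits)


def wrap(value: int, bits: int, signed: bool) -> int:
    """Two's-complement truncation: what a typical machine produces when a result is squeezed into `bits` bits."""
    value &= (1 << bits) - 1
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


class IntegerError(Exception):
    pass


def checked_convert(value: int, bits: int, signed: bool) -> int:
    """Runtime check of the kind inserted where a value may not fit its destination type."""
    if value not in int_range(bits, signed):
        kind = "signed" if signed else "unsigned"
        raise IntegerError(f"{value} does not fit in a {kind} {bits}-bit integer")
    return value
```

Promotions matter here. Given int8_t a = 100, b = 100;, C promotes both operands to int, so a + b is 200 and nothing overflows yet. The trouble is storing the result back into an int8_t, an implementation-defined conversion that GCC, for example, resolves modulo 2^8, matching wrap(200, 8, True) == -56. An instrumented build would instead stop at checked_convert(200, 8, True).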

Reading:

  1. Efficient and Accurate Detection of Integer-based Attacks, by Brumley et al. The main thing to read for is how the typing rules are used to infer where additional checks are needed. Spend some time thinking about them. The paper formalizes the ideas of casting, integer overflow, etc., which you may encounter in other classes (such as the secure programming course).
  2. Optional: CCured in the real world. This paper describes efforts to retrofit type safety into C. It discusses some lessons learned from their initial implementation. This is the second in a series; reference [15] is the first. I have the second listed here because it's a bit easier to read. However, if you're interested in the area, you should read both this paper and reference [15] from it.

Slides: Type Safety

2/20: 11 - Project presentations

Students will present their proposed projects. I'm looking to understand the problem domain, why it's interesting, and a nugget of an idea. Each presentation gets 5 minutes, so you're going to need to practice to make it effective given the short time.

2/22: 12 - Introduction to Information Flow (D)

So far we've covered attacks which target unsafe language constructs. Are attacks possible even in safe languages? Yes! Information flow is one example. This is the first of several lectures that cover information flow in theory and practice. A central idea in security is we want to keep secret things secret. Information flow is how we express this formally.

Key Concepts: Non-interference, static information flow, dynamic information flow, taint analysis, lattice.
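Static information flow can be enforced with a small security type system. This sketch assumes a two-point lattice (LOW ⊑ HIGH) and an invented statement encoding; the pc ("program counter") label is how it catches implicit flows, where a secret influences a public variable only through a branch. This checker ignores termination and timing channels.

```python
from typing import Dict

LOW, HIGH = 0, 1  # two-point lattice: LOW ⊑ HIGH, with ⊑ as <= and join (⊔) as max


def join(a: int, b: int) -> int:
    return max(a, b)


def expr_label(expr: tuple, labels: Dict[str, int]) -> int:
    """An expression is as secret as the most secret variable it reads."""
    if expr[0] == "const":
        return LOW
    if expr[0] == "var":
        return labels[expr[1]]
    return join(expr_label(expr[1], labels), expr_label(expr[2], labels))  # ("op", e1, e2)


def check(stmt: tuple, labels: Dict[str, int], pc: int = LOW) -> bool:
    """True if stmt has no explicit or implicit flow from a HIGH source into a LOW variable."""
    kind = stmt[0]
    if kind == "assign":   # x := e is allowed only if label(e) ⊔ pc ⊑ label(x)
        _, x, e = stmt
        return join(expr_label(e, labels), pc) <= labels[x]
    if kind == "seq":
        return all(check(s, labels, pc) for s in stmt[1:])
    if kind == "if":       # branching on a secret raises pc inside both arms (implicit flows)
        _, cond, then_s, else_s = stmt
        inner = join(pc, expr_label(cond, labels))
        return check(then_s, labels, inner) and check(else_s, labels, inner)
    if kind == "while":
        _, cond, body = stmt
        return check(body, labels, join(pc, expr_label(cond, labels)))
    raise ValueError(f"unknown statement {kind!r}")
```

The program if secret then public := 1 else public := 0 copies no secret value directly, yet it is rejected, because the assignments happen under a HIGH pc.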

Reading:

  1. Introduction to Static Information Flow. These are course notes.
  2. Language-Based Information-Flow Security, by Sabelfeld and Myers. Please read sections III. This paper is considered the definitive overview of information flow. Every time I read it I learn something new.

Slides: Information Flow

2/27: 13 - Dynamic Taint Analysis (M)

The key idea in this class is how we can keep track of taint as the program runs. At a high level, taint analysis attaches a "taint" bit to data from untrusted input sources. The result of any computation that uses a tainted variable is also tainted. A warning or error is raised if tainted data flows to a sensitive sink. The main challenges are (a) how to accurately keep track of taint, and (b) efficiency. Challenge (a) depends upon the particular taint rules used; the reading discusses the "tainted index" policy. Why is this policy conservative, and what does it mean for false positives and false negatives? Challenge (b) typically depends upon the language analyzed. For example, TaintCheck's overhead is high because taint analysis is performed on a compiled binary and thus requires interposition at each execution step, while TaintDroid's overhead is low because tracking can be incorporated into the Java interpreter/JIT.

Key Concepts: Taint analysis can be used to catch buffer overflow attacks and to detect information leakage in Android applications. The "tainted index" rule for propagating taint is key, as is the fact that dynamic taint analysis (and dynamic information flow in general) typically does not track "control flow" taint.
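A minimal dynamic taint tracker along these lines: every value carries a shadow taint bit, arithmetic ORs the operands' bits together, a table lookup optionally propagates taint from its index (the "tainted index" rule, exposed here as a flag so you can compare both policies), and using a tainted value as a jump target raises an alarm. All names are hypothetical; TaintCheck works on x86 instructions, not Python values.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tainted:
    """A value plus its taint bit (the 'shadow' state a dynamic taint tracker keeps)."""
    value: int
    tainted: bool


def from_input(v: int) -> Tainted:
    return Tainted(v, True)   # untrusted source: network, file, user input


def const(v: int) -> Tainted:
    return Tainted(v, False)


def add(a: Tainted, b: Tainted) -> Tainted:
    return Tainted(a.value + b.value, a.tainted or b.tainted)  # tainted if any operand is


def load(table: List[Tainted], index: Tainted, taint_index: bool = True) -> Tainted:
    """Table lookup; with taint_index=True a tainted index taints the result."""
    cell = table[index.value]
    return Tainted(cell.value, cell.tainted or (taint_index and index.tainted))


class TaintAlarm(Exception):
    pass


def jump_to(target: Tainted) -> int:
    """Sensitive sink: refuse to use tainted data as a control transfer target."""
    if target.tainted:
        raise TaintAlarm(f"tainted jump target {target.value:#x}")
    return target.value
```

This sketch propagates taint only through data dependencies: code such as y = const(1) if x.value > 3 else const(0) yields an untainted y even though y depends on tainted input through the branch.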

Reading:

  1. Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software, by Newsome and Song. Pay close attention to the "tainted index" rule, as well as the overhead. Ask yourself why you would get this sort of overhead in a binary-only taint analysis setting. Also think about false positives, where the proposed analysis would incorrectly say there is an attack based upon taint analysis.
  2. TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones, by Enck et al. Notice how much better the performance is in Java. Why? Also think about how you could create a malicious application that leaks private information that is not detected by TaintDroid.

Slides: Dynamic Taint Analysis

2/29: 14 - Timing Attacks and Covert Information Flow (D)

We discuss why implementation matters in security, in the context of information flow. Although we believe RSA is theoretically secure, implementing it with just the most efficient algorithms, as OpenSSL did, left a gaping security hole. Note that in class we will discuss other timing attacks besides the one in the reading.

Key Concepts: There are two main takeaway points I'd like students to get. First, even cryptographically secure algorithms may not have secure implementations. Second, timing can leak a lot of information.
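The RSA attack in the reading is involved, but the second takeaway, that timing can leak a lot of information, already shows up in a much simpler setting. This sketch substitutes an early-exit string comparison for RSA and counts loop iterations in place of wall-clock time, so it is deterministic; a real attacker would need many noisy measurements per guess. The function names are hypothetical.

```python
import string
from typing import Tuple


def leaky_compare(secret: str, guess: str) -> Tuple[bool, int]:
    """Early-exit comparison. Returns (match, steps); steps stands in for measured running time."""
    steps = 0
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps
    return len(secret) == len(guess), steps


def recover(secret: str, alphabet: str = string.ascii_lowercase) -> str:
    """Recover the secret one character at a time from the timing side channel alone.
    Assumes the attacker knows the length and observes only (match, steps) for each guess."""
    known = ""
    for _ in range(len(secret)):
        pad = "\0" * (len(secret) - len(known) - 1)   # filler assumed never to match the secret

        def observe(c: str) -> Tuple[int, bool]:
            match, steps = leaky_compare(secret, known + c + pad)
            return steps, match

        known += max(alphabet, key=observe)
    return known
```

Each correct character makes the comparison run one step longer, so the secret falls out after 26 × length guesses instead of 26^length. Comparing in constant time (for example, with Python's hmac.compare_digest) removes the signal.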

Reading: Remote Timing Attacks are Practical, by Brumley and Boneh. The main things to understand are (a) the algorithm used to implement RSA operations leaks information, and (b) the leak can be leveraged in a binary search to recover the secret key. You should understand exactly what is being broken in RSA (is it the private key, factoring, etc.?).

Slides: Covert Information Flow

3/5: 15 - Pre-midterm summary (D & M)

No planned material. Ask any question you want.

3/7: 16 - Test 1

In class, closed book, closed note, closed neighbor.

3/12, 3/14: Spring Break

3/19: 17 - Midterm Answers (D)

3/21: 18 - Malware Analysis: BitShred (Jiyong)

In this class we talk about malware. We will discuss some of the challenges of malware reverse engineering, as well as how to scale malware analysis. The main motivation is that we need techniques that can scale to the thousands of new malware samples discovered each day.

Key Concepts: Feature Hashing, Bloom Filters
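The feature-hashing idea can be sketched in a few lines: hash every n-byte window of a sample into one position of a fixed-size bit vector (a Bloom filter with a single hash function), then estimate Jaccard similarity between two samples with bitwise AND and OR. The parameters (4-byte n-grams, 4096 bits, CRC32 as the hash) are assumptions for illustration; BitShred's actual features and sizes may differ.

```python
import zlib


def fingerprint(data: bytes, n: int = 4, m: int = 1 << 12) -> int:
    """Hash every n-byte window of `data` into one of m bit positions; returns the bit vector as an int."""
    bits = 0
    for i in range(len(data) - n + 1):
        bits |= 1 << (zlib.crc32(data[i:i + n]) % m)
    return bits


def jaccard(a: int, b: int) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two fingerprints, computed with bitwise AND/OR."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0
```

Comparing fixed-size bit vectors costs the same no matter how large the samples are, and hash collisions only ever inflate the estimate slightly; that trade-off is what lets this style of analysis scale to large malware collections.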

Reading: BitShred: Feature Hashing Malware for Scalable Triage and Semantic Analysis, by Jang et al.

Slides: Malware Analysis, BitShred

3/26: 19 - Software Model Checking Overview (M)

This class introduces the notion of software model checking. In this class, we use software model checking as an umbrella term that contains static analysis (fast and automatic), temporal model checking (slow and automatic), and proving correctness (semi-automated). We also introduce the notion of a safety property, and describe which safety properties are enforceable (per the reading).

Key Concepts: Safety property, enforceable safety property
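Schneider models execution monitors as security automata that watch a program's events and halt it just before a violation. Here is a hand-written automaton for one hypothetical policy, "no network send after reading a secret file"; the event names and the trace format are invented for illustration.

```python
from typing import Iterable, List


class SecurityAutomaton:
    """Monitor for a hypothetical policy: no 'send' event after a 'read_secret' event."""

    def __init__(self) -> None:
        self.state = "clean"

    def step(self, event: str) -> None:
        """Advance on one event, raising PermissionError if the event would violate the policy."""
        if self.state == "clean" and event == "read_secret":
            self.state = "read_secret_seen"
        elif self.state == "read_secret_seen" and event == "send":
            raise PermissionError("policy violation: send after read_secret")


def run_monitored(trace: Iterable[str]) -> List[str]:
    """Execute events in order, halting the program just before the first violating event."""
    automaton = SecurityAutomaton()
    executed: List[str] = []
    for event in trace:
        try:
            automaton.step(event)
        except PermissionError:
            break  # an execution monitor can only truncate the run
        executed.append(event)
    return executed
```

The policy is a safety property: any violating run has a finite bad prefix (a send after read_secret), and the monitor halts exactly there. A property such as "every request is eventually answered" has no such finite bad prefix, so no monitor of this kind can enforce it.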

Reading:

  1. Enforceable Security Policies, by Schneider
  2. Software Model Checking, by Jhala and Majumdar. Optional. I really recommend you read at least section 1, though.

Slides: Software Model Checking

3/28: 20 - Symbolic Execution: Basic Algorithm (D)

In this lecture we discuss the basic algorithm for symbolic execution. Symbolic execution is similar to concretely executing a program on a real input, but instead of producing a value, it produces a formula. The main things to pay attention to are how the algorithm works (in terms of the rules given), and the challenges of symbolic memory and symbolic indices.

Key Concepts: symbolic execution
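The core of the algorithm fits in a short sketch: variables map to formulas over input symbols rather than to values, and each branch forks execution, adding the branch condition (or its negation) to the path condition. The tiny statement language is invented for illustration, and the sketch deliberately omits what the reading spends effort on, such as asking a constraint solver whether each path condition is satisfiable, and symbolic memory.

```python
from typing import Dict, List, Tuple

Store = Dict[str, str]   # variable -> formula over input symbols
Path = List[str]         # conjunction of branch conditions taken so far


def evaluate(e: tuple, store: Store) -> str:
    """Turn an expression into a formula: ("const", n) | ("var", x) | (op, e1, e2)."""
    if e[0] == "const":
        return str(e[1])
    if e[0] == "var":
        return store[e[1]]
    return f"({evaluate(e[1], store)} {e[0]} {evaluate(e[2], store)})"


def execute(stmts: list, store: Store, path: Path) -> List[Tuple[Path, Store]]:
    """Symbolically execute stmts, forking at each branch; returns (path condition, final store) per path."""
    if not stmts:
        return [(path, store)]
    head, rest = stmts[0], stmts[1:]
    if head[0] == "input":    # ("input", x): x becomes a fresh symbol (its name upper-cased)
        return execute(rest, {**store, head[1]: head[1].upper()}, path)
    if head[0] == "assign":   # ("assign", x, e): x now holds e's formula
        return execute(rest, {**store, head[1]: evaluate(head[2], store)}, path)
    if head[0] == "if":       # ("if", cond, then_stmts, else_stmts): explore both sides
        cond = evaluate(head[1], store)
        return (execute(head[2] + rest, store, path + [cond])
                + execute(head[3] + rest, store, path + [f"not {cond}"]))
    raise ValueError(f"unknown statement {head[0]!r}")
```

For the program x = input; y = x + 1; if y > 10 then z = y * 2 else z = 0, this yields two paths: one with condition ((X + 1) > 10) and z = ((X + 1) * 2), and one with the negated condition and z = 0. A real engine would hand each condition to a solver, which could return, say, X = 10 as a concrete input that drives execution down the first path.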

Reading: All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask), by Schwartz et al. (you need not read section 3)

Slides: Symbolic Execution Algorithm

4/2: 21 - Symbolic Execution: For Defense (M)

This paper presents how symbolic execution can be used to create input filters, a.k.a. signatures for an intrusion detection system (IDS). The main things to understand are a) why the signatures should have zero false positives, and b) the challenges of scaling to cover all paths to a vulnerable point in the code.

Key Concepts: input filters

Reading: Bouncer: Securing Software by Blocking Bad Inputs, by Costa et al.

Slides: Symbolic Execution Defense

4/4: 22 - Symbolic Execution: For Offense (Thanassis)

This paper presents automatic exploit generation using symbolic execution. In order to scale symbolic execution to this particular domain, the authors present preconditioned symbolic execution. Preconditioned symbolic execution helps focus path exploration to the likely exploitable parts of the code.

Key Concepts: automatic exploit generation, preconditioned symbolic execution

Reading: Automatic Exploit Generation, by Avgerinos et al.

Slides: Symbolic Execution Offense

4/9: 23 - Proof Carrying Code (PCC) (M)

PCC introduces a technique for proving that code is safe to execute. This is a foundational paper in the field. The main goal in reading it is to understand what proof carrying code is, what is meant by a "proof" in this context, and how the work divides between the code producer and the verifier.

Key Concepts: proof carrying code

Reading: Proof Carrying Code, by George Necula

Slides: Proof Carrying Code

4/11: 24 - Temporal Model Checking Software (M)

Overview

Key Concepts:

Reading: Model Checking One Million Lines of C Code, by Chen et al.

Slides: Software Temporal Model Checking

4/16: 25 - A tale of Coverity (D)

Coverity was founded on the basic ideas set forth in today's reading. The system builds a control flow graph, and then scans for common patterns indicating bugs. The first paper describes these techniques, and the sorts of bugs they found. The second paper deals with the lessons learned as the research matured into a product. The optional paper covers an additional nice idea on how bugs can be discovered by looking for contradictions in the code.

Key Concepts:

Reading:

  1. Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions, by Engler et al.
  2. A few billion lines of code later: using static analysis to find bugs in the real world, by Bessey et al.
  3. OPTIONAL: Bugs as Deviant Behavior: A general approach to inferring Errors in Systems Code, by Engler et al.

Slides: Tale of Coverity

4/18: Project Time (No Class)

4/23: 26 - Review (D & M)

Slides: Summary

4/25: 27 - Test 2

4/30: 28 - Group Presentations

5/2: 29 - Group Presentations