Dear Miao Yu,

The 35th IEEE Symposium on Security & Privacy (Oakland 2014) program
committee would like to inform you that your paper #336 has been
conditionally accepted to appear in the conference. We will contact you
with the name of a shepherd who will inform you of the conditions for
acceptance.

       Title: Dancing with Giants: Wimpy Kernels for On-demand Isolated I/O
              on Commodity Platforms
     Authors: Zongwei Zhou (Carnegie Mellon University)
              Miao Yu (Carnegie Mellon University)
              Virgil Gligor (Carnegie Mellon University)
  Paper site: https://www.infsec.cs.uni-saarland.de/oakland2014/paper/336?cap=0336azB82R1TDgPA

Your paper was one of 44 accepted (which includes the conditionally
accepted papers) out of 324 submissions. Congratulations!

Reviews and comments on your paper are appended to this email. The
submissions site also has the paper's reviews and comments, as well as more
information about review scores.

Contact the site administrator, Oakland 2014 Chairs
<oakland14-pcchairs@ieee-security.org>, with any questions or concerns.

===========================================================================
                         Oakland 2014 Review #336A
                     Updated 14 Dec 2013 1:59:58am CET
---------------------------------------------------------------------------
  Paper #336: Dancing with Giants: Wimpy Kernels for On-demand Isolated I/O
              on Commodity Platforms
---------------------------------------------------------------------------

                      Overall merit: 3. Weak reject
                Reviewer confidence: 4. High
                        Correctness: 3. Only minor technical problems
               Presentation quality: 4. Good
                            Novelty: 3. Unsurprising next step

                         ===== Paper Summary =====

This paper presents the design, implementation, and evaluation of a wimpy kernel for providing on-demand isolated I/O channels to wimpy apps.  The wimpy kernel runs on top of a micro-hypervisor similar to XMHF.  To minimize the size and complexity of the wimpy kernel, most driver functionality for the on-demand I/O is placed either in the wimpy apps or in the untrusted OS.  The wimpy kernel also mediates the communication between the wimpy apps and the OS, which is done via IPIs and shared memory.

                        ===== Paper strengths =====

The paper tackles an important and hard problem: securely sharing devices between trusted and untrusted VMs (or compartments).  The problem is hard because someone has to maintain the shared device's state.  If the state is maintained by the untrusted part, it may lead to various attacks, such as those pointed out by the paper.  If the state is maintained by the trusted part, it at least increases the size and complexity of the TCB.  The solution for USB proposed in this paper seems to work, which is impressive.

                       ===== Paper weaknesses =====

The paper doesn't make clear that how to securely share a device is device-specific.  The whole paper focuses on USB, but it doesn't point out which properties of USB allow such secure device sharing.

Another claimed contribution is moving functionality to either user apps or the untrusted OS.  However, the paper makes no attempt to explain which principles guided these decisions and how they can be generalized.  In some sense, the paper is not about on-demand isolated I/O in general but about how to share USB devices securely.

              ===== Justification of the overall merit =====

Some detailed comments:

1. It's not clear to me why the virtual file system code can be removed from both the WK (the wimpy kernel) and the wimp apps (page 5).

2. The WK periodically monitors the port status.  What if something malicious happens between checks?

3. typo: "Whit" on page 7

4. WK verifies the results.  Does it guarantee both memory safety and semantic correctness?

5. typos: missing "." before "Due to"; ", (and) a WK-app"; "it do not" on page 9.

6. How exactly did you choose the 3 interfaces that should be implemented by the WK?

7. It's not clear whether the I/O channel becomes the bottleneck in the I/O benchmarks.  If so, the experiments really measure bandwidth, not overhead.

===========================================================================
                         Oakland 2014 Review #336B
                    Updated 16 Dec 2013 11:27:28pm CET
---------------------------------------------------------------------------
  Paper #336: Dancing with Giants: Wimpy Kernels for On-demand Isolated I/O
              on Commodity Platforms
---------------------------------------------------------------------------

                      Overall merit: 4. Weak accept
                Reviewer confidence: 3. Average
                        Correctness: 4. No obvious problems
               Presentation quality: 4. Good
                            Novelty: 3. Unsurprising next step

                         ===== Paper Summary =====

This paper presents two core ideas: (1) the idea of a wimpy kernel that
runs alongside a commodity OS to implement backend driver code; and (2)
privilege separation of Linux's USB controller interface using (1). A
key insight of the design for (1) is the use of Inter-Processor
Interrupts (IPIs), which reduces the number of context switches required
for the proposed architecture by two-thirds (2 instead of 6). This
design assumes a dedicated processor for the Wimpy Kernel and Wimp apps.

                        ===== Paper strengths =====

The use of IPIs in place of context switches appears to be a novel
contribution for security applications. Researchers frequently blame
context switches for poor performance that keeps commodity platforms
from using privilege separation.

Privilege separating the USB controller to provide trusted USB
operation has security value. The authors note that this was
previously done for the PCI controller.

Using the wimpy design, the authors are able to keep a significant
amount of code in the untrusted OS.

The authors describe how the TEE (wimpy kernel) can verify inputs from
the untrusted OS. The inclusion of a verification algorithm is valuable
and fundamental to a privilege separated architecture such as this.

                       ===== Paper weaknesses =====

Isolating code for trusted execution is a saturated space. The specific
research contributions of this work are sliced thin.

The privilege separation process was a largely manual engineering task
(however, it is useful and gives insight into how to use Wimpy
properly).

It's not clear how tied Wimpy is to USB. The authors note that [58]
provides something similar for PCI. Does a verification algorithm exist
for PCI? What about other IO buses? Verification of the operations
performed by the untrusted OS is a critical assumption of the architecture.

What happens when other IO buses get moved to Wimpy? Right now, Wimpy is
just USB, and there are Wimp apps for each USB device. When we add PCI,
Firewire, Bluetooth, etc., more and more code is shifted into the
privileged process. A similar problem happened with microkernels:
everything was moved into a single UNIX process, and a vulnerability in
that process affected the other functionality. I think that Wimpy may be
resilient to this due to the way that code is inherently privilege
separated; however, the paper could use more discussion of the grand
vision for Wimpy that extends beyond just USB.

The authors claim that the IPI-based design has a significant impact on
their end design; however, this is never demonstrated. USB is a
relatively slow bus. Would the extra context switches be hidden by the
device latency or the code in the wimp app? How frequently does the
optimization get executed (i.e., Amdahl's law)?

The performance evaluation is insufficient. The authors look at CPU
and IO macrobenchmarks; however, the only USB device that is considered
is a USB keyboard. The evaluation really should have (1) a USB storage
device, and (2) a USB network interface, and then perform storage and
network IO benchmarks. Furthermore, given that only a USB keyboard
device driver was available in the prototype, porting existing drivers
to Wimpy may not be very simple. In that case, Wimpy may not be having
the last dance ...

              ===== Justification of the overall merit =====

There is some novel contribution in the architectural design of Wimpy.
It is useful to explore different ways of accomplishing tasks. The use
of IPIs to reduce the implications of context switches is neat. That
said, one of the biggest weaknesses is the performance evaluation that
does not consider storage and network USB devices, which would be the
real test of whether or not Wimpy is practical.

Other Comments:

- In Section 4.4, the paper discusses that Wimp apps encrypt data and
  store it in the untrusted OS. What key management enables this?

===========================================================================
                         Oakland 2014 Review #336C
                     Updated 8 Jan 2014 4:10:14pm CET
---------------------------------------------------------------------------
  Paper #336: Dancing with Giants: Wimpy Kernels for On-demand Isolated I/O
              on Commodity Platforms
---------------------------------------------------------------------------

                      Overall merit: 3. Weak reject
                Reviewer confidence: 3. Average
                        Correctness: 3. Only minor technical problems
               Presentation quality: 2. Fair
                            Novelty: 4. Surprising new idea or result

                         ===== Paper Summary =====

This paper presents a reduced-TCB software architecture for isolating
I/O channels (e.g., USB devices) for applications. A micro,
non-virtualizing hypervisor (modified XMHF) isolates a lightweight wimpy
kernel and the app from an untrusted OS. The app runs on the wimpy
kernel, which verifies that the app has exclusive access to any
devices it needs (the devices must be requested in advance when the
app is set up). The device handling subsystem code is split up between
the untrusted OS, the wimpy kernel and the app, with the goal of
minimizing the TCB. The paper shows how this can be done for a custom
USB subsystem. The wimpy kernel also provides an asynchronous, fast
message passing interface from the app to the untrusted OS kernel for
all other services. An implementation is evaluated in terms of its TCB
size and its performance on micro- and macro- benchmarks.

                        ===== Paper strengths =====

The idea is interesting, and the paper demonstrates a reasonable
reduction in the TCB for the USB subsystem. The paper's idea of using a
wimpy kernel to verify untrusted OS subsystem functionality appears to
be novel.

                       ===== Paper weaknesses =====

- The paper focuses exclusively on the USB subsystem. There is no
  evidence that the idea will scale to any other subsystem. My
  specific concern is the verification performed by the wimpy
  kernel. The algorithm in the paper is very specific to USB.

- The implementation is incomplete and lacks a key component:
  isolation. That is a central point of the paper. While it is
  reasonable to assume that the isolation as described can be
  implemented, the evaluation, as it stands, does not cover the
  isolation aspect.

- The evaluation is rather preliminary and seems to have been written
  in a hurry. The baseline for Figure 6 (microbenchmarks) is
  TrustVisor. Why? The baseline should be an OS-only implementation
  since everything else has been added in order to make the I/O
  isolation work. There is also a confusing sentence on page 12 which
  indicates that the implemented system relies on the IOMMU primitives
  of XMHF. In Section 5.1, you said that you modified the IOMMU
  primitive of XMHF.
 
  There is no baseline for syscall latency. How does your latency
  compare to standard syscalls?

  The macrobenchmarks are rather uninformative. One of the most
  important aspects --- effort needed to port a standard app to a
  wimpy app --- is completely missing. This should be clearly
  reported. The Apache experiment with 200,000 requests (the only real
  macrobenchmark) is also rather small. Figure 7 only shows normalized
  values of unknown metrics. What did you measure in each benchmark?

Presentation:

- Section 4.5 should be presented earlier. It is the high-level
  picture that links the whole system.

- Section 5.2.1: Paragraph 2 onward describes steps that lack
  context.

              ===== Justification of the overall merit =====

I like the paper's main idea but the work is partly incomplete and
does not include a comprehensive evaluation.

===========================================================================
                         Oakland 2014 Review #336D
                     Updated 10 Jan 2014 6:47:22pm CET
---------------------------------------------------------------------------
  Paper #336: Dancing with Giants: Wimpy Kernels for On-demand Isolated I/O
              on Commodity Platforms
---------------------------------------------------------------------------

                      Overall merit: 4. Weak accept
                Reviewer confidence: 2. Fair
                        Correctness: 3. Only minor technical problems
               Presentation quality: 4. Good
                            Novelty: 4. Surprising new idea or result

                         ===== Paper Summary =====


The paper presents a software architecture that is used to keep a small TCB
while being able to provide on-demand I/O channel isolation when particular
applications (e.g., trusted applications) require access to I/O devices. The
main idea is to use a small (wimpy) kernel alongside the normal untrusted OS
running on top of a micro-hypervisor. The wimpy kernel will remain small as most
of the I/O communication will be taken care of by either the wimpy applications
(the driver components) or the untrusted OS (basic device management).

                        ===== Paper strengths =====


The presented system does not have any major problems. Furthermore,
the system is implemented as a prototype, and the performance evaluation
shows no show-stopping performance degradation. The paper is quite detailed.

                       ===== Paper weaknesses =====


While the overall solution is understandable, some details remain quite
vague. In particular, the design choices are never really explained beyond
the goal of keeping the TCB small.

1) Why is there a need for a micro-hypervisor and the two OSs running as VMs?
2) The system looks like a TrustZone-enabled system (although such a
hardware-supported security architecture is not available for x86); what are
the differences?
3) Is the micro-hypervisor solution really required?

The paper focuses extensively on the USB subsystem, as an example of I/O that
can be used with this solution. How can the system adapt to other subsystems? In
the wimpy kernel, ~60% of the lines of code are dedicated to supporting the USB
subsystem. If this solution were applied to all possible I/O devices, would
the wimpy kernel size double or worse (or would it not, and why)?

In Section 6.2, the authors experiment with the USB Address Overlap attack and
show that it works. It is not clear if the attack works with their system in
place or not (hopefully the latter). Furthermore, if USB in the downstream
direction (OUT) is broadcast, how can a secure channel to a particular device be
established? On a similar note, can the Untrusted OS, during device enumeration,
present a "fake" USB device to the wimpy OS which is not able to detect if such
device is physically attached to the machine or virtually managed by the
malicious OS?

The solutions presented in the paper (refactoring an I/O subsystem to limit the
TCB size) are not entirely novel ([55,58]) or rely on common knowledge. Given
that the majority of the paper is dedicated to refactoring I/O, it should
be rewritten in that light rather than presenting a full-fledged system without
motivating the design choices.

How would other subsystems that require multiplexing between applications be implemented? What about, e.g., network interfaces?

              ===== Justification of the overall merit =====


The paper's presentation is quite bad in terms of structure and explanation of
contributions (the language is not the problem). In particular, the authors spend
too much time on the USB example (throughout the paper) and do not
explain how the solution can be generalized to other I/O
interfaces. Furthermore, design choices are not explained.


===========================================================================
          Authors' Response by Zongwei Zhou <stephenzhou@cmu.edu>
  Paper #336: Dancing with Giants: Wimpy Kernels for On-demand Isolated I/O
              on Commodity Platforms
---------------------------------------------------------------------------
Thanks for the reviews!

Why USB?
We chose an I/O subsystem to illustrate wimp composition with giants because I/O
is among the most complex subsystems in any OS, and high-assurance, on-demand isolated
I/O services have been unavailable to wimps (Sections 7.1 & 7.2). We chose USB
as a case study since it is extremely popular in terms of device connectivity, and
the most complex in terms of I/O channel isolation and device-hierarchy verification.
Popular? 36% of devices are PCI and 35% USB; and 10% of higher-level protocols 
use either [29]. Most complex? USB (1) mixes control and data channels, (2) 
maintains all hierarchy information in untrusted software, and (3) versions earlier 
than 3.0 use device addresses initialized by untrusted software. In contrast, other 
subsystems (e.g., PCI) have separate control channels; some (e.g., PCI, Firewire) 
have hierarchy information stored in hardware; and some others (e.g., Bluetooth 
and HDMI) have hardware-assigned device addresses. Finally, devices with 
dedicated I/O ports and/or memory (e.g., PS/2, VGA, RS-232) are trivial to isolate.

Solution scalability?
The wimpy kernel (WK) size will undoubtedly increase when supporting other 
subsystems, but this would not affect verification complexity. Why? Verification 
is incremental and composable, since the added code and data are separable from 
that of the USB subsystem. We will explain this in detail and illustrate composability 
with the PCI subsystem. For all other subsystems, the WK size increases are probably 
sublinear since they are simpler. 

Why micro-hypervisor-based? 
The Introduction stresses that the only basic requirements for WK support are
memory isolation and DMA protection (and I/O port isolation for x86). All ideas
in this paper (IPI-based Wimp-OS communications, outsource-and-verify, export and
mediate) clearly apply to all architectures that meet these requirements, including TrustZone.
We chose a micro-hypervisor (XMHF) that has formal memory-isolation proofs and is
open-source, so that we could show that WK can compose with it and retain its
high assurance.

General Principles?
Our outsource-and-verify and export-and-mediate methods follow well-known
general principles, e.g., for kernel size and complexity minimization
(PAJanson76; Schroeder77). Neither method has been used for general I/O subsystems before,
because I/O isolation was either in the kernel for a few simple devices and not on demand,
or outside the kernel and not minimized for high assurance (Section 7.1). We will re-emphasize this.

Why not file storage and network devices? 
Channel isolation for such devices (bulk I/O) is much easier than for character
devices (57% of all Linux devices [29]). Our Wimp Apps
use standard authenticated encryption (AE) and key distribution to outsource-and-verify
storage and network I/O directly. We will re-emphasize this (in Sections 3.2, 4.4,
and 5.2.4) and cite the 1979-1984 papers on using AE to eliminate bulk I/O from
security kernels.

Clarifications of Reviewer C's assertions
1. Channel isolation is already implemented and evaluated in Section 6.4 and Table 7.
2. Figure 6 uses an OS-only (Linux) implementation as the baseline.
3. Our micro-hypervisor only uses the added fine-grained IOMMU primitives when the WK is registered.
4. Our syscall uses the standard SYSENTER/SYSEXIT instructions.
5. Our macrobenchmarks use the same settings as XMHF [55].