Dear Miao Yu,

The 35th IEEE Symposium on Security & Privacy (Oakland 2014) program committee would like to inform you that your paper #336 has been conditionally accepted to appear in the conference. We will contact you with the name of a shepherd who will inform you of the conditions for acceptance.

Title: Dancing with Giants: Wimpy Kernels for On-demand Isolated I/O on Commodity Platforms
Authors: Zongwei Zhou (Carnegie Mellon University)
         Miao Yu (Carnegie Mellon University)
         Virgil Gligor (Carnegie Mellon University)
Paper site: https://www.infsec.cs.uni-saarland.de/oakland2014/paper/336?cap=0336azB82R1TDgPA

Your paper was one of 44 accepted (which includes the conditionally accepted papers) out of 324 submissions. Congratulations!

Reviews and comments on your paper are appended to this email. The submissions site also has the paper's reviews and comments, as well as more information about review scores.

Contact the site administrator, Oakland 2014 Chairs, with any questions or concerns.

===========================================================================
Oakland 2014 Review #336A
Updated 14 Dec 2013 1:59:58am CET
---------------------------------------------------------------------------
Paper #336: Dancing with Giants: Wimpy Kernels for On-demand Isolated I/O on Commodity Platforms
---------------------------------------------------------------------------

Overall merit: 3. Weak reject
Reviewer confidence: 4. High
Correctness: 3. Only minor technical problems
Presentation quality: 4. Good
Novelty: 3. Unsurprising next step

===== Paper Summary =====

This paper presents the design, implementation and evaluation of a wimpy kernel for providing on-demand isolated I/O channels to wimpy apps. The wimpy kernel runs on top of a micro hypervisor similar to XMHF. To minimize the size and complexity of the wimpy kernel, most driver functionalities for the on-demand I/O are put either in the wimpy apps or in the untrusted OS.
The wimpy kernel also mediates the communication between the wimpy apps and the OS, which is done via IPI and shared memory.

===== Paper strengths =====

The paper tackles an important and hard problem: securely sharing devices between trusted and untrusted VMs (or compartments). The problem is hard because someone has to maintain the shared device's state. If the state is maintained by the untrusted part, it may lead to various attacks such as those pointed out by the paper. If the state is maintained by the trusted part, it at least increases the size and complexity of the TCB. The solution for USB proposed in this paper seems to work, which is impressive.

===== Paper weaknesses =====

The paper doesn't make it clear that how to securely share a device is device specific. The whole paper focuses on USB, but it doesn't point out what properties of USB allow such secure device sharing.

Another claimed contribution is to move functionalities to either user apps or the untrusted OS. However, the paper doesn't make any attempt to explain what principles were used when such a decision was made and how they can be generalized.

In some sense, the paper isn't about on-demand isolated I/O in general but about how to share USB devices securely.

===== Justification of the overall merit =====

Some detailed comments:

1. It's not clear to me why the virtual file system code can be removed from both WK (the wimpy kernel) and wimp apps (Page 5).
2. WK periodically monitors the port status. What if anything malicious happens between the checks?
3. typo: "Whit" on page 7
4. WK verifies the results. Does it guarantee both memory safety and semantic correctness?
5. typos: missing "." before "Due to"; ", (and) a WK-app"; "it do not" on page 9.
6. How exactly did you choose the 3 interfaces that should be implemented by the WK?
7. It's not clear if the I/O channel becomes the bottleneck in the I/O benchmarks. If so, the experiments really measure the bandwidth, not the overhead.

===========================================================================
Oakland 2014 Review #336B
Updated 16 Dec 2013 11:27:28pm CET
---------------------------------------------------------------------------
Paper #336: Dancing with Giants: Wimpy Kernels for On-demand Isolated I/O on Commodity Platforms
---------------------------------------------------------------------------

Overall merit: 4. Weak accept
Reviewer confidence: 3. Average
Correctness: 4. No obvious problems
Presentation quality: 4. Good
Novelty: 3. Unsurprising next step

===== Paper Summary =====

This paper presents two core ideas: (1) the idea of a wimpy kernel that runs alongside a commodity OS to implement backend driver code; and (2) privilege separation of Linux's USB controller interface using (1). A key insight of the design for (1) is the use of InterProcessor Interrupts (IPIs), which reduces the number of context switches required for the proposed architecture by about two-thirds (2 instead of 6). This design assumes a dedicated processor for the Wimpy Kernel and Wimp apps.

===== Paper strengths =====

The use of IPIs in place of context switches appears to be a novel contribution for security applications. Researchers frequently blame context switches for poor performance that keeps commodity platforms from using privilege separation.

Privilege-separating the USB controller to provide trusted USB operation provides security value. The authors note that this was previously done for the PCI controller.

Using the wimpy design, the authors are able to keep a significant amount of code in the untrusted OS. The authors describe how the TEE (wimpy kernel) can verify inputs from the untrusted OS. The inclusion of a verification algorithm is valuable and fundamental to a privilege-separated architecture such as this.

===== Paper weaknesses =====

Isolating code for trusted execution is a saturated space. The specific research contributions of this work are sliced thin.
The privilege separation process was a rather manual engineering task (however, it is rather useful and insightful as to how to use Wimpy properly).

It's not clear how tied Wimpy is to USB. The authors note that [58] provides something similar for PCI. Does a verification algorithm exist for PCI? What about other IO buses? The verification of operation in the untrusted OS is a critical assumption of the architecture.

What happens when other IO buses get moved to Wimpy? Right now, Wimpy is just USB, and there are Wimp apps for each USB device. When we add PCI, Firewire, Bluetooth, etc., more and more code is shifted into the privileged process. A similar problem happened with microkernels: everything was moved into a single UNIX process, and a vulnerability in that process affected the other functionality. I think that Wimpy may be resilient to this due to the way that code is inherently privilege separated; however, the paper could use more discussion of the grand vision for Wimpy that extends beyond just USB.

The authors claim that the IPI-based design has a significant impact on their end design; however, this is never demonstrated. USB is a relatively slow bus. Would the extra context switches be hidden by the device latency or the code in the wimp app? How frequently does the optimization get executed (i.e., Amdahl's law)?

The performance evaluation is insufficient. The authors look at CPU and IO macrobenchmarks; however, the only USB device that is considered is a USB keyboard. The evaluation really should have (1) a USB storage device, and (2) a USB network interface, and then perform storage and network IO benchmarks. Furthermore, given that only a USB keyboard device driver was available in the prototype, it may not be very simple to port existing drivers to Wimpy. In which case, Wimpy may not be having the last dance ...

===== Justification of the overall merit =====

There is some novel contribution in the architectural design of Wimpy.
It is useful to explore different ways of accomplishing tasks. The use of IPIs to reduce the implications of context switches is neat. That said, one of the biggest weaknesses is the performance evaluation that does not consider storage and network USB devices, which would be the real test of whether or not Wimpy is practical.

Other Comments:

- In Section 4.4, the paper discusses that Wimp apps encrypt data and store it in the untrusted OS. What key management enables this?

===========================================================================
Oakland 2014 Review #336C
Updated 8 Jan 2014 4:10:14pm CET
---------------------------------------------------------------------------
Paper #336: Dancing with Giants: Wimpy Kernels for On-demand Isolated I/O on Commodity Platforms
---------------------------------------------------------------------------

Overall merit: 3. Weak reject
Reviewer confidence: 3. Average
Correctness: 3. Only minor technical problems
Presentation quality: 2. Fair
Novelty: 4. Surprising new idea or result

===== Paper Summary =====

This paper presents a reduced TCB software architecture for isolating I/O channels (e.g., USB devices) for applications. A micro, non-virtualizing hypervisor (modified XMHF) isolates a light wimpy kernel and the app from an untrusted OS. The app runs on the wimpy kernel, which verifies that the app has exclusive access to any devices it needs (the devices must be requested in advance when the app is set up). The device handling subsystem code is split up between the untrusted OS, the wimpy kernel and the app, with the goal of minimizing the TCB. The paper shows how this can be done for a custom USB subsystem. The wimpy kernel also provides an asynchronous, fast message passing interface from the app to the untrusted OS kernel for all other services. An implementation is evaluated in terms of its TCB size and its performance on micro- and macro-benchmarks.

===== Paper strengths =====

The idea is interesting and the paper demonstrates a reasonable reduction in the TCB for the USB subsystem. The paper's idea of using a wimpy kernel to verify untrusted OS subsystem functionality appears to be novel.

===== Paper weaknesses =====

- The paper focuses exclusively on the USB subsystem. There is no evidence that the idea will scale to any other subsystem. My specific concern is the verification performed by the wimpy kernel. The algorithm in the paper is very specific to USB.

- The implementation is incomplete and lacks a key component: isolation. That is a central point of the paper. While it is reasonable to assume that the isolation as described can be implemented, the evaluation, as it stands, does not cover the isolation aspect.

- The evaluation is rather preliminary and seems to have been written in a hurry. The baseline for Figure 6 (microbenchmarks) is TrustVisor. Why? The baseline should be an OS-only implementation since everything else has been added in order to make the I/O isolation work. There is also a confusing sentence on page 12 which indicates that the implemented system relies on the IOMMU primitives of XMHF. In Section 5.1, you said that you modified the IOMMU primitive of XMHF. There is no baseline for syscall latency. How does your latency compare to standard syscalls? The macrobenchmarks are rather non-informative. One of the most important aspects --- effort needed to port a standard app to a wimpy app --- is completely missing. This should be clearly reported. The Apache experiment with 200,000 requests (the only real macrobenchmark) is also rather small. Figure 7 only shows normalized values of unknown metrics. What did you measure in each benchmark?

Presentation:

- Section 4.5 should be presented earlier. It is the high-level picture that links the whole system.

- Section 5.2.1: Paragraph 2 onward talks of some steps that lack a context.

===== Justification of the overall merit =====

I like the paper's main idea but the work is partly incomplete and does not include a comprehensive evaluation.

===========================================================================
Oakland 2014 Review #336D
Updated 10 Jan 2014 6:47:22pm CET
---------------------------------------------------------------------------
Paper #336: Dancing with Giants: Wimpy Kernels for On-demand Isolated I/O on Commodity Platforms
---------------------------------------------------------------------------

Overall merit: 4. Weak accept
Reviewer confidence: 2. Fair
Correctness: 3. Only minor technical problems
Presentation quality: 4. Good
Novelty: 4. Surprising new idea or result

===== Paper Summary =====

The paper presents a software architecture that is used to keep a small TCB while being able to provide on-demand I/O channel isolation when particular applications (e.g., trusted applications) require access to I/O devices. The main idea is to use a small (wimpy) kernel alongside the normal untrusted OS running on top of a micro hypervisor. The wimpy kernel will remain small as most of the I/O communication will be taken care of by either the wimpy applications (the driver components) or the untrusted OS (basic device management).

===== Paper strengths =====

The presented system does not have any major problem. Furthermore, the system is implemented through a prototype and the performance evaluation shows no game-stopper performance degradation. The paper is quite detailed.

===== Paper weaknesses =====

While the overall solution is understandable, some details remain quite vague. In particular, the design choices are never really explained apart from trying to keep the TCB small.

1) Why is there a need for a micro-hypervisor and the two OSs running as VMs?
2) The system looks like a TrustZone-enabled system (although for x86 such hardware-supported security architecture is not available), what are the differences?
3) Is the micro-hypervisor solution really required?

The paper focuses extensively on the USB subsystem, as an example of I/O that can be used with this solution. How can the system adapt to other subsystems?

In the wimpy kernel, ~60% of the lines of code are dedicated to supporting the USB subsystem. If one is to apply this solution for all possible I/O devices, will the wimpy kernel size double or worse (or it will not, and why?)?

In Section 6.2 the authors experiment with the USB Address Overlap attack and show that it works. It is not clear if the attack works with their system in place or not (hopefully the latter). Furthermore, if USB in the downstream direction (OUT) is broadcast, how can a secure channel to a particular device be established? On a similar note, can the untrusted OS, during device enumeration, present a "fake" USB device to the wimpy OS, which is not able to detect whether such a device is physically attached to the machine or virtually managed by the malicious OS?

The solutions presented in the paper (refactoring an I/O subsystem to limit the TCB size) are not entirely novel ([55,58]) or rely on common knowledge. Given that the majority of the paper is dedicated to refactoring I/O, it should be rewritten in that light rather than presenting a full-fledged system without motivating the design choices.

How would other subsystems that require multiplexing between applications be implemented? What about, e.g., a network interface?

===== Justification of the overall merit =====

The paper presentation is quite bad in terms of structure/contributions explanation (the language is not the problem). In particular, the authors spend too much time focusing on the USB example (throughout the paper) and do not explain how the solution can be generalized to other I/O interfaces. Furthermore, design choices are not explained.

===========================================================================
Authors' Response by Zongwei Zhou
Paper #336: Dancing with Giants: Wimpy Kernels for On-demand Isolated I/O on Commodity Platforms
---------------------------------------------------------------------------

Thanks for the reviews!

Why USB? We chose an I/O subsystem to illustrate wimp composition with giants because the I/O subsystem is among the most complex in any OS, and high-assurance, on-demand isolated I/O services have been unavailable to wimps (Sections 7.1 & 7.2). We chose USB as a case study since it is extremely popular in terms of device connectivity, and the most complex in terms of I/O channel isolation and device-hierarchy verification.

Popular? 36% of devices are PCI and 35% USB; and 10% of higher-level protocols use either [29].

Most complex? USB (1) mixes control and data channels, (2) maintains all hierarchy information in untrusted software, and (3) in versions earlier than 3.0, uses device addresses initialized by untrusted software. In contrast, other subsystems (e.g., PCI) have separate control channels; some (e.g., PCI, Firewire) have hierarchy information stored in hardware; and some others (e.g., Bluetooth and HDMI) have hardware-assigned device addresses. Finally, devices with dedicated I/O ports and/or memory (e.g., PS/2, VGA, RS-232) are trivial to isolate.

Solution scalability? The wimpy kernel (WK) size will undoubtedly increase when supporting other subsystems, but this would not affect verification complexity. Why? Verification is incremental and composable, since the added code and data are separable from those of the USB subsystem. We will explain this in detail and illustrate composability with the PCI subsystem. For all other subsystems, the WK size increases are probably sublinear since they are simpler.

Why micro-hypervisor-based?
The Introduction stresses that the only basic requirements for WK support are memory isolation and DMA protection (and I/O port isolation for x86). All ideas in this paper (IPI-based Wimp-OS communications, outsource-and-verify, export-and-mediate) clearly apply to all architectures that meet these requirements, including TrustZone. We chose a micro-hypervisor (XMHF) that has formal memory-isolation proofs and is open-source, so that we could show that WK can compose with it and retain its high assurance.

General Principles? Our outsource-and-verify and export-and-mediate methods are already known to follow general principles; e.g., for kernel size and complexity minimization (e.g., PAJanson76; Schroeder77). Neither method has been used for general I/O subsystems before, because I/O isolation was either in the kernel for a few simple devices and not on demand, or outside the kernel and not minimized for high assurance (Section 7.1). We will re-emphasize this.

Why not file storage and network devices? Channel isolation for such devices (bulk I/O) is much easier than for character devices (57% of all Linux devices [29]). Our Wimp Apps use standard authenticated encryption (AE) and key distribution to outsource-and-verify storage and network I/O directly. We will re-emphasize this (in Sections 3.2, 4.4 and 5.2.4) and cite the 1979-1984 papers on using AE to eliminate bulk I/O from security kernels.

Clarification to Reviewer C's assertions

1. Channel isolation is already implemented and evaluated in Section 6.4 and Table 7.
2. Figure 6 uses an OS-only (Linux) implementation as baseline.
3. Our micro-hypervisor only uses the added fine-grained IOMMU primitives when WK is registered.
4. Our syscall uses standard SYSENTER/SYSEXIT instructions.
5. Our macrobenchmarks use the same settings as XMHF [55].