Orathai Sukwong, Yongjun Jeon
{osukwong, yongjunj}@andrew.cmu.edu
Objective
We aim to build a malicious email filtering system that examines each incoming email for malicious elements using virtual machines (VMs). The purpose of this project is to determine whether this detection mechanism is feasible in email servers with real-world traffic and workloads.
We reduce the overhead of VM creation by using flash cloning [1], and we evaluate our system against VMware Workstation, on which the detection mechanism currently runs. Note that we already have the detection algorithm that executes inside each VM.
System Design

Figure 1. The email filtering system design.
As illustrated in Figure 1, the system consists of three main components: 1) the dispatcher, 2) the VMM infrastructure, and 3) the detector inside each VM.
The dispatcher runs on the host machine. It fetches each email from the incoming email queue and extracts all of its attachments. It also pre-downloads the files referenced by external links embedded in the email. It then checks whether each resulting file already has an entry in the scanned table. If it does, the file has already been inspected, and we retrieve the result from the table. Otherwise, we create a hash entry for the file, insert it into the scanned table, and append the email to the email buffer.
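The dispatcher's deduplication step can be sketched as below. This is a minimal Python sketch under our own assumptions, not the existing code: files are keyed by their SHA-256 digest, a scanned-table entry (the hypothetical `HashEntry`) records the verdict and the emails waiting on it, and newly seen files are also placed on a hypothetical `in_queue` for the SQM script. The proposal does not say what happens when a file is still being scanned; here the email simply joins the pending entry.

```python
import hashlib
import queue
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HashEntry:
    """One scanned-table entry (hypothetical layout)."""
    digest: str
    result: Optional[str] = None                         # None while the scan is pending
    email_ids: List[str] = field(default_factory=list)   # emails waiting on this file

class Dispatcher:
    def __init__(self):
        self.scanned_table = {}        # SHA-256 hex digest -> HashEntry
        self.email_buffer = []         # emails held until a verdict arrives
        self.in_queue = queue.Queue()  # assumption: new files are handed to the SQM script here

    def handle_file(self, email_id, file_bytes):
        """Return a cached verdict, or register the file for scanning and return None."""
        digest = hashlib.sha256(file_bytes).hexdigest()
        entry = self.scanned_table.get(digest)
        if entry is not None and entry.result is not None:
            return entry.result              # already inspected: reuse the stored result
        if entry is None:
            entry = HashEntry(digest)        # first sighting: create the entry and schedule a scan
            self.scanned_table[digest] = entry
            self.in_queue.put(entry)
        entry.email_ids.append(email_id)     # new or still pending: wait for the verdict
        self.email_buffer.append(email_id)
        return None
```

Keying on a content digest rather than a file name means that an attachment sent to many recipients would be executed only once.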
The dispatcher's retrieve thread checks the out-queue for messages. If the queue is not empty, it dequeues an entry and processes it. Each queue entry contains a pointer to the corresponding hash entry in the scanned table, so the retrieve thread can read the detailed detection result stored there. For each result, it finds the corresponding email(s) and marks them accordingly.
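A minimal sketch of the retrieve thread, assuming out-queue entries are references to the same hash-entry objects the dispatcher stored. The blocking `get()` replaces explicit polling of an empty queue; the `STOP` sentinel and the `marks` dictionary (standing in for tagging or quarantining an email) are hypothetical and illustrative only.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HashEntry:
    # Hypothetical scanned-table entry: the verdict plus the emails waiting on it.
    digest: str
    result: Optional[str] = None
    email_ids: List[str] = field(default_factory=list)

STOP = object()  # sentinel used only to shut the sketch down cleanly

def retrieve_loop(out_queue, marks):
    """Dequeue out-queue entries and mark every email that carried the scanned file."""
    while True:
        entry = out_queue.get()             # blocks while the out-queue is empty
        if entry is STOP:
            break
        for email_id in entry.email_ids:
            marks[email_id] = entry.result  # stand-in for tagging/quarantining the email

out_queue = queue.Queue()
marks = {}
worker = threading.Thread(target=retrieve_loop, args=(out_queue, marks))
worker.start()
out_queue.put(HashEntry("abc", result="malicious", email_ids=["e1", "e2"]))
out_queue.put(STOP)
worker.join()
```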
The second main component is the VMM infrastructure. We integrate the flash cloning technique [1] into Xen, with modifications that allow more efficient communication between the host machine and the VMs through a shared memory region. Together with the light-forking technique, which avoids VM creation/resumption overheads, this shared memory “hack” is the main source of the speedup we expect over our current solution using VMware Workstation, which has no direct communication channel between the host and the VMs.
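To illustrate the kind of channel the shared memory “hack” provides, the sketch below uses Python's `multiprocessing.shared_memory` as a stand-in for a page shared between the host and a VM. It is an analogy under stated assumptions: the one-slot layout, the state codes, and the scanned path are hypothetical, and it does not model Xen grant tables, event channels, or the memory-ordering barriers a real cross-domain implementation would need. Both mappings live in one process only so the example is self-contained.

```python
from multiprocessing import shared_memory

SLOT_SIZE = 256
EMPTY, REQUEST, DONE = 0, 1, 2   # hypothetical slot states
# Hypothetical layout: [0]=state, [1]=verdict, [2]=path length, [3:]=path bytes

def post_request(buf, path):
    """Host side: publish a file path for the guest to scan."""
    data = path.encode()
    if len(data) > SLOT_SIZE - 3:
        raise ValueError("path does not fit in the slot")
    buf[3:3 + len(data)] = data
    buf[2] = len(data)
    buf[0] = REQUEST             # set the state last, after the payload is written

def take_request(buf):
    """Guest side: return the pending path, or None if there is no request."""
    if buf[0] != REQUEST:
        return None
    n = buf[2]
    return bytes(buf[3:3 + n]).decode()

def post_result(buf, malicious):
    """Guest side: write the verdict and hand the slot back to the host."""
    buf[1] = 1 if malicious else 0
    buf[0] = DONE

def poll_result(buf):
    """Host side: return True/False once the guest is done, else None."""
    if buf[0] != DONE:
        return None
    verdict = buf[1] == 1
    buf[0] = EMPTY               # slot is free for the next request
    return verdict

host = shared_memory.SharedMemory(create=True, size=SLOT_SIZE)
guest = shared_memory.SharedMemory(name=host.name)   # second mapping of the same region
try:
    post_request(host.buf, "/mnt/scan/attachment.exe")  # hypothetical path
    path = take_request(guest.buf)
    post_result(guest.buf, malicious=False)
    verdict = poll_result(host.buf)
finally:
    guest.close()
    host.close()
    host.unlink()
```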
The scheduling and queue management (SQM) script executes on the host machine. It automatically dequeues each entry in the in-queue, forks a new child VM from the reference VM, and mounts the file specified in the entry into the child VM for inspection; all subsequent writes in the child VM are redirected to a sparse, copy-on-write file image. Once a child VM finishes its task, the SQM script enqueues a corresponding entry in the out-queue.
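The SQM loop can be sketched as follows. Everything about the VM layer is hypothetical: `FakeVMBackend` stands in for the flash-cloning interface, its method names are invented for illustration, the copy-on-write image path is a placeholder, and the verdict is a fixed placeholder. We also assume that a child VM is discarded once its task finishes, which the design above does not state.

```python
import queue
from dataclasses import dataclass

@dataclass
class ScanJob:
    """In-queue entry; stands in for a pointer to the scanned-table hash entry."""
    digest: str
    file_path: str
    result: str = ""

class FakeVMBackend:
    """Hypothetical stand-in for the Xen flash-cloning layer (all method names invented)."""
    def __init__(self):
        self.next_id = 0
        self.destroyed = []

    def fork_from_reference(self, cow_image):
        # Real system: flash-clone the reference VM, redirecting writes to cow_image.
        self.next_id += 1
        return self.next_id

    def mount_and_run(self, vm_id, file_path):
        # Real system: mount file_path into the child and wait for the detector's result.
        return "benign"  # placeholder verdict

    def destroy(self, vm_id):
        self.destroyed.append(vm_id)

def sqm_step(in_queue, out_queue, backend):
    """Process one in-queue entry; return False once the in-queue is empty."""
    try:
        job = in_queue.get_nowait()
    except queue.Empty:
        return False
    vm_id = backend.fork_from_reference(cow_image=f"/tmp/cow-{job.digest}.img")
    try:
        job.result = backend.mount_and_run(vm_id, job.file_path)
    finally:
        backend.destroy(vm_id)  # assumption: discard the child VM and its CoW writes
    out_queue.put(job)          # hand the finished entry to the retrieve thread
    return True
```

In practice several child VMs would presumably run at once; `sqm_step` handles one entry at a time only to keep the sketch short.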
Lastly, the detector inside each VM determines whether a supplied file is malicious. The launch script executes the given file and monitors its activity in terms of file system modifications, registry activity, process creation/deletion events, network connectivity, and image loading. Our detection algorithm then identifies any malicious activity in this record. The timeout for each file execution is currently set to 60 seconds. Once the execution is done, the monitoring record (log) and the result are written back to the mounted image, which is visible to the dispatcher and the SQM script on the host machine.
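A minimal sketch of the launch script's control flow, assuming the mounted image appears as an ordinary directory inside the guest. The real monitor records file-system, registry, process, network, and image-load events; this sketch only captures the sample's stdout/stderr as a placeholder log, and `detect` is a hypothetical hook standing in for our existing detection algorithm. The file names `monitor.log` and `result.json` are also illustrative.

```python
import json
import subprocess
from pathlib import Path

TIMEOUT_SECONDS = 60  # per-file execution timeout from the design above

def run_and_report(cmd, mount_dir, detect, timeout=TIMEOUT_SECONDS):
    """Run the sample, then write its log and verdict into the mounted image directory.

    `detect` (hypothetical) receives the captured log text and returns True if the
    sample looks malicious.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        log, timed_out = proc.stdout + proc.stderr, False
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising; keep any partial output.
        partial = exc.stdout or b""  # may be bytes, str, or None depending on platform
        log = partial.decode(errors="replace") if isinstance(partial, bytes) else partial
        timed_out = True
    out = Path(mount_dir)
    (out / "monitor.log").write_text(log)
    result = {"timed_out": timed_out, "malicious": bool(detect(log))}
    (out / "result.json").write_text(json.dumps(result))
    return result
```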
Implementation Plan
| Week | Tasks |
|---|---|
| 1 (10/22) | Jun – VMM infrastructure, minus the shared memory “hack” |
| 2, 3 (10/29, 11/5) | Jun – Shared memory “hack” |
| 4, 5 (11/12, 11/19) | Integrate the VMM infrastructure and detector |
| 6 (11/26) | (Midterm 2) |
| 7 (12/3) | Report writeup |
Evaluation
• Measure the overhead of each component to determine which contributes the most to the total overhead.
• Compare the performance of our customized Xen Potemkin VMs against VMware Workstation in the context of the email filtering system.
• Measure the overall throughput in terms of the number of emails processed per second.
• Experiment with different numbers of VMs in the email filtering system to determine how the VMM infrastructure scales with the number of cores/processors.
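The first and third measurements could be collected with a small timing helper like the one below. It is a sketch under our own assumptions: `OverheadProfiler` is a hypothetical name, component labels are whatever the caller passes (for example "dispatch" or "vm_fork"), and throughput is computed as completed emails divided by wall-clock time since the profiler was created.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class OverheadProfiler:
    """Accumulates wall-clock time per component and counts finished emails."""
    def __init__(self):
        self.totals = defaultdict(float)  # component label -> seconds
        self.emails_done = 0
        self.start = time.perf_counter()

    @contextmanager
    def measure(self, component):
        t0 = time.perf_counter()
        try:
            yield
        finally:
            self.totals[component] += time.perf_counter() - t0

    def email_finished(self):
        self.emails_done += 1

    def breakdown(self):
        """Fraction of measured time spent in each component, largest first."""
        total = sum(self.totals.values())
        if total == 0:
            return []
        shares = [(name, t / total) for name, t in self.totals.items()]
        return sorted(shares, key=lambda item: item[1], reverse=True)

    def emails_per_second(self):
        elapsed = time.perf_counter() - self.start
        return self.emails_done / elapsed if elapsed > 0 else 0.0
```

Running the same workload once per VM count would give the data for the scaling experiment.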
[1] Vrable, M., Ma, J., Chen, J., Moore, D., Vandekieft, E., Snoeren, A., Voelker, G., and Savage, S. Scalability, Fidelity, and Containment in the Potemkin Virtual Honeyfarm. SOSP '05.