15-712 Project: Email Filtering System

Orathai Sukwong, Yongjun Jeon

{osukwong, yongjunj}@andrew.cmu.edu

 

Objective

 

We aim to build a malicious-email filtering system that uses virtual machines (VMs) to examine each incoming email for malicious elements. The purpose of this project is to determine whether this detection mechanism is feasible for email servers handling real-world traffic and workloads. We reduce the overhead of VM creation by using flash cloning [1] and evaluate our system against VMware Workstation, on which the detection mechanism currently runs. Note that we already have the detection algorithm, which executes inside each VM.

 

System Design

 

 

Figure 1. The email filtering system design.

 

As illustrated in Figure 1, the system consists of three main components: 1) the dispatcher, 2) the VMM infrastructure, and 3) the detector inside each VM.

 

The dispatcher runs on the host machine. It fetches each email from the incoming email queue and extracts all of its attachments. It also pre-downloads the files referenced by external links embedded in the email. It then checks whether each resulting file already has an entry in the scanned table; if it does, the file has already been inspected, and we retrieve the result from the table. Otherwise, we create a hash entry for the file, insert it into the scanned table, and append the email to the email buffer.
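The lookup-or-insert step of the dispatcher can be sketched as follows. This is a minimal illustration, not the project's implementation: the class and field names are hypothetical, the scanned table is modelled as an in-memory dictionary, and SHA-256 over the file contents is an assumed choice of hash. An entry whose result is still None stands for a file that has been queued but not yet inspected.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class HashEntry:
    """One row of the scanned table (hypothetical layout)."""
    digest: str
    result: Optional[str] = None  # None until the detector reports back


class Dispatcher:
    def __init__(self) -> None:
        self.scanned: Dict[str, HashEntry] = {}               # digest -> entry
        self.email_buffer: List[Tuple[str, HashEntry]] = []   # awaiting inspection

    def submit(self, email_id: str, file_bytes: bytes) -> HashEntry:
        """Return the scanned-table entry for a file, creating it if needed."""
        digest = hashlib.sha256(file_bytes).hexdigest()  # assumed hash function
        entry = self.scanned.get(digest)
        if entry is not None:
            # File seen before: reuse the stored entry instead of re-inspecting.
            return entry
        # New file: record it and queue the email for inspection in a VM.
        entry = HashEntry(digest)
        self.scanned[digest] = entry
        self.email_buffer.append((email_id, entry))
        return entry
```

With this sketch, two emails carrying byte-identical attachments share one table entry, so only the first triggers an inspection.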

 

The retrieve thread polls the out-queue. Whenever the queue is not empty, it dequeues an entry and processes it. Each queue entry contains a pointer to a hash entry in the scanned table, through which the retrieve thread can read the detailed detection result. For each result, it finds the corresponding email and marks it accordingly.
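One pass of the retrieve thread might look like the sketch below. The entry layout (a dictionary with "digest" and "malicious" keys), the digest-to-email index, and the rule that an email is marked malicious if any of its files is malicious are all assumptions made for illustration; the proposal does not specify them.

```python
import queue
from typing import Dict, List


def drain_out_queue(out_queue: "queue.Queue",
                    emails_by_digest: Dict[str, List[str]],
                    verdicts: Dict[str, str]) -> int:
    """Process every entry currently in the out-queue; return how many."""
    processed = 0
    while True:
        try:
            entry = out_queue.get_nowait()
        except queue.Empty:
            return processed  # queue is empty, nothing more to do this pass
        # Mark every email that carried the inspected file.
        for email_id in emails_by_digest.get(entry["digest"], []):
            if entry["malicious"]:
                verdicts[email_id] = "malicious"       # one bad file taints the email
            else:
                verdicts.setdefault(email_id, "clean")  # never downgrade a verdict
        processed += 1
```

A real retrieve thread would call this in a loop, or block on the queue, rather than run a single pass.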

 

The second main component is the VMM infrastructure. We integrate the flash cloning technique [1] into Xen, with modifications that allow more efficient communication between the host machine and the VMs through a shared memory region. Along with the light-forking technique, which avoids VM creation/resumption overheads, this shared memory "hack" is the main source of the speedup we expect over our current solution using VMware Workstation, in which there is no direct communication channel between the host and the VMs.

 

The scheduling and queue management (SQM) script executes on the host machine. It automatically dequeues each entry from the in-queue, forks a new child VM from the reference VM, and mounts the file specified in the in-queue entry on the child VM for inspection. All subsequent writes in the child VM are redirected to a sparse, copy-on-write file image. Once a child VM finishes its task, the SQM script enqueues a corresponding entry in the out-queue.
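The control flow of the SQM script can be sketched as a single scheduling step. The VM operations are passed in as callables because the actual Xen flash-cloning interface is not described here; fork_child and run_inspection are hypothetical placeholders, as is the "path" key of the queue entry.

```python
import queue
from typing import Any, Callable


def sqm_step(in_queue: "queue.Queue",
             out_queue: "queue.Queue",
             fork_child: Callable[[], Any],
             run_inspection: Callable[[Any, str], None]) -> bool:
    """Handle one in-queue entry; return False if the in-queue was empty."""
    try:
        entry = in_queue.get_nowait()
    except queue.Empty:
        return False
    child = fork_child()                   # placeholder: clone the reference VM
    run_inspection(child, entry["path"])   # placeholder: mount file, wait for detector
    out_queue.put(entry)                   # signal completion to the retrieve thread
    return True
```

This version handles one child VM at a time; running several children concurrently would need a worker pool around the same step.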

 

Lastly, the detector inside each VM determines whether a supplied file is malicious. The launch script executes the given file and monitors its activity in terms of file system modifications, registry activity, process creation/deletion events, network connectivity, and image loading. Our detection algorithm then identifies any malicious activity. The timeout for each file execution is currently set to 60 seconds. Once the execution is done, the monitoring record (log) and the result are written back to the mounted image, which is visible to the dispatcher and the SQM script on the host machine.
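The per-file execution timeout can be illustrated with Python's standard subprocess module. This sketch covers only the timeout; the activity monitoring and the detection algorithm are not shown, and the function name and return format are hypothetical.

```python
import subprocess
import sys
from typing import Dict, List, Optional, Union


def run_with_timeout(argv: List[str],
                     timeout_s: float = 60.0) -> Dict[str, Union[bool, Optional[int]]]:
    """Run a command, stopping it if it exceeds timeout_s seconds."""
    try:
        proc = subprocess.run(argv, timeout=timeout_s,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return {"timed_out": True, "returncode": None}
    return {"timed_out": False, "returncode": proc.returncode}


if __name__ == "__main__":
    # Usage example: a short-lived command finishes well within the limit.
    print(run_with_timeout([sys.executable, "-c", "pass"], timeout_s=10.0))
```

A sample that spawns further processes would need those terminated as well, which this sketch does not attempt.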

 

Implementation Plan

 

Week                  Tasks

1 (10/22)             Jun – VMM infrastructure, minus the shared memory "hack"
                      Kob – Implement all detection modules in VM

2, 3 (10/29, 11/5)    Jun – Shared memory "hack"
                      Kob – Implement the dispatcher

4, 5 (11/12, 11/19)   Integrate the VMM infrastructure and detector
                      Construct the email testbed, including generation of artificial email traffic
                      Evaluate

6 (11/26)             (Midterm 2)
                      Presentation

7 (12/3)              Report writeup

 

 

Evaluation

- Measure the overhead of each component in order to determine which contributes the most to the total overhead.

- Compare the performance of our customized Xen Potemkin VM against VMware Workstation in the context of the email filtering system.

- Measure the overall performance in terms of the number of emails processed per second.

- Experiment with different numbers of VMs in the email filtering system to determine how the VMM infrastructure scales with differing numbers of cores/processors.
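The per-component overhead and throughput measurements could be collected with a small timing helper such as the one sketched below. The class name, the stage names, and the use of wall-clock time from time.perf_counter are assumptions for illustration, not part of the planned testbed.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Iterator


class StageTimer:
    """Accumulates wall-clock time spent in named pipeline stages."""

    def __init__(self) -> None:
        self.totals: DefaultDict[str, float] = defaultdict(float)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the stage raised an exception.
            self.totals[name] += time.perf_counter() - start

    def slowest(self) -> str:
        """Name of the stage with the largest accumulated time."""
        return max(self.totals, key=self.totals.get)


def throughput(n_emails: int, elapsed_s: float) -> float:
    """Emails processed per second over a measured interval."""
    return n_emails / elapsed_s
```

Wrapping each component call in a `with timer.stage("clone"):` block (stage names are arbitrary) yields per-component totals that can be compared across the Xen and VMware configurations.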

 

[1]     Vrable, M., Ma, J., Chen, J., Moore, D., Vandekieft, E., Snoeren, A., Voelker, G., and Savage, S. Scalability, Fidelity, and Containment in the Potemkin Virtual Honeyfarm. SOSP '05.