
Reformulating Level Sets as Deep Recurrent Neural Network Approach to Semantic Segmentation


Rows 2, 5: segmentation results by our CRLS

Rows 3, 6: ground truth

We propose a novel formulation of contour evolution, named Recurrent Level Set (RLS), which employs Gated Recurrent Units to minimize the energy of a variational level set (LS) functional. The curve deformation process in RLS is cast as a hidden state evolution procedure and is updated by minimizing an energy functional composed of fitting forces and contour length. By sharing the convolutional features in a fully end-to-end trainable framework, we extend RLS to Contextual RLS (CRLS) to address semantic segmentation in the wild. The experimental results show that the proposed RLS improves both computational time and segmentation accuracy over the classic variational LS-based method, whereas the fully end-to-end CRLS system achieves competitive performance compared to state-of-the-art semantic segmentation approaches.
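
For reference, one well-known variational level set functional built from exactly these two ingredients (region fitting forces plus a contour length penalty) is the Chan-Vese energy. Treating it as the functional behind RLS is an assumption made here for illustration; the symbols $\mu$, $\lambda_1$, $\lambda_2$, $c_1$, $c_2$, $H$ and $\delta$ follow the usual Chan-Vese notation and do not appear in the summary above.

```latex
E(c_1, c_2, \phi) = \mu \int_\Omega \delta(\phi)\,|\nabla\phi|\,dx
  + \lambda_1 \int_\Omega |I(x) - c_1|^2\, H(\phi)\,dx
  + \lambda_2 \int_\Omega |I(x) - c_2|^2\, \bigl(1 - H(\phi)\bigr)\,dx
```

Here $\phi$ is the level set function, $I$ is the image, $c_1$ and $c_2$ are the mean intensities inside and outside the contour, $H$ is the Heaviside function and $\delta$ its derivative. The first term penalizes contour length and the last two are the fitting forces.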



Deep Contextual Recurrent Residual Networks for Scene Labeling


Rows 2, 5: segmentation results by our CRRN

Rows 3, 6: ground truth



Designed as extremely deep architectures, deep residual networks, which provide a rich visual representation and offer robust convergence behavior, have recently achieved exceptional performance in numerous computer vision problems. When applied directly to scene labeling, however, they are limited in capturing long-range contextual dependence, which is a critical aspect. To address this issue, we propose a novel approach, Contextual Recurrent Residual Networks (CRRN), which is able to simultaneously handle rich visual representation learning and long-range context modeling within a fully end-to-end deep network. Furthermore, our proposed end-to-end CRRN is trained completely from scratch, without using any pre-trained models, in contrast to most existing methods, which are usually fine-tuned from state-of-the-art pre-trained models, e.g. VGG-16, ResNet, etc. The experiments are conducted on four challenging scene labeling datasets, i.e. SiftFlow, CamVid, Stanford Background and SUN, and compared against various state-of-the-art scene labeling methods.



Hand On Steering Wheel Detection and Classification

This paper presents an advanced Convolutional Neural Network (ConvNet) based approach, named Multiple Scale Region-based Fully Convolutional Networks (MS-RFCN), for hand detection and classification. To deal robustly with the challenging factors, we propose to span the receptive fields in the ConvNet across multiple deep feature maps. In this way, both global and local context information are efficiently synchronized and simultaneously contribute to the human hand feature representation.

The experiments are conducted on challenging hand databases, i.e. the Vision for Intelligent Vehicles and Applications (VIVA) Challenge and the Oxford Hand Detection database. Our proposed method achieves state-of-the-art results.

DeepSafeDrive: A Grammar-aware Driver Parsing Approach

This work supports the Federal Highway Administration (FHWA) by automatically classifying driver behavior using facial cues, such as mouth movement, eye movement and head pose, as well as steering wheel mannerisms, seatbelt usage and talking on a cell phone. In addition, the work will be able to automatically detect soft biometric information about the driver, such as age, gender, ethnicity, glasses, etc. In this work, a grammar-based deep learning approach is employed to learn the driver structure as well as to detect and segment regions of interest (RoI) simultaneously.

Beard/Moustache Detection & Segmentation



Facial hair analysis has recently received significant attention from forensic and biometric researchers because of three important observations. Firstly, changing facial hairstyle can modify a person's appearance to the point that it affects facial recognition systems. Secondly, most females do not have a beard or moustache, so detecting facial hair helps to distinguish male from female with high confidence in the gender classification problem. Finally, unlike babies and young adults, generally only senior male adults have a beard or moustache, so facial hair detection can help to improve the accuracy of an age estimation system. In addition to beard and moustache detection, segmentation also plays a crucial role, especially in facial recognition systems, due to the following biometric observation: there is a lack of hair in small patches under the human mouth, and these features do not change during a person's lifetime.

The algorithm we propose has the following steps:

  1. Superpixel Generation: We start by grouping pixels into perceptually meaningful regions called superpixels. Because we want to segment the object (facial hair) accurately as well as reduce computational complexity, we use simple linear iterative clustering (SLIC).
  2. Feature Extraction: We extract a feature that combines HOG and HOGG features, computed both from the bounding box of the superpixel and from the superpixel foreground.
  3. Superpixel Classification: We train a combination of Random Forest, Random Ferns and SVM on top of the high frequency features to assign a score to each superpixel. We first use the self-trained model to decide the label of a superpixel. Depending on the confidence of the superpixel labeling, the pre-trained model may be used if the score is low.
  4. Aggregate Searching: Because the region of interest (ROI) is selected by the landmarker, the facial hair may be segmented inaccurately if the landmark points are not well aligned. We propose an aggregate search strategy to refine the result and to overcome the limitations of the landmarker.
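
The fallback logic in step 3 can be sketched as follows. This is only a minimal illustration, not our actual implementation: the classifiers are toy stand-in functions rather than Random Forest, Random Ferns or SVM models, and the names `label_superpixels`, `min_confidence` and the threshold value 0.6 are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A classifier here is any function mapping a feature vector to (label, confidence).
Classifier = Callable[[Sequence[float]], Tuple[str, float]]


def label_superpixels(
    features: List[Sequence[float]],
    self_trained: Classifier,
    pre_trained: Classifier,
    min_confidence: float = 0.6,  # hypothetical threshold, not taken from the text
) -> List[str]:
    """Label each superpixel, deferring to the pre-trained model on low confidence."""
    labels = []
    for feature_vector in features:
        label, confidence = self_trained(feature_vector)
        if confidence < min_confidence:
            # The self-trained model is unsure: use the pre-trained model instead.
            label, _ = pre_trained(feature_vector)
        labels.append(label)
    return labels


# Toy stand-in classifiers (assumptions for illustration only).
def toy_self_trained(fv: Sequence[float]) -> Tuple[str, float]:
    score = sum(fv) / len(fv)            # mean feature value, expected in [0, 1]
    label = "hair" if score >= 0.5 else "skin"
    return label, abs(score - 0.5) * 2   # further from 0.5 means more confident


def toy_pre_trained(fv: Sequence[float]) -> Tuple[str, float]:
    return ("hair" if max(fv) > 0.7 else "skin"), 1.0


features = [[0.9, 0.95], [0.05, 0.1], [0.45, 0.6]]
result = label_superpixels(features, toy_self_trained, toy_pre_trained)
print(result)
```

The third superpixel is the interesting case: the self-trained stand-in is barely above its decision boundary, so the label comes from the pre-trained stand-in.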



Facial Decomposition

This system is able to extract fine soft biometric facial attributes (such as hair, eyebrows, eyes, nose, mouth, beard, moustache, etc.). The new facial features play a key role in our proposed facial matching/retrieval system, in which the given information can be a face image, a partial face image or a keyword. Furthermore, the new facial features are able to significantly improve the overall system's face identification performance. To achieve this goal, three tasks are implemented in our system. The first task extracts the hair regions and the facial components using robust segmentation methods based on our active contours. The second task classifies the various types and shapes of the hair regions and the facial components. The third task shows how to combine the first two tasks to build a facial matching/retrieval system as well as to improve the face identification/verification system. Some examples of facial decomposition are shown below.


Our proposed system is able to provide valuable information for face matching and retrieval with full face image based, partial face image based or keyword based queries. We consider the proposed system under two different use cases: image based query and keyword based query. Although the two use cases differ in query procedure, they share the same training procedure, which is shown in the following figure.
Every training image is first segmented by our ACF model. Each facial component is considered an element in a cluster. In our system, we define seven clusters corresponding to seven facial components (hair, eyes, eyebrows, mouth, nose, moustache, and beard). Each element is represented by a low-dimensional vector via a manifold learning based dimensionality reduction technique. All vectors of the same cluster form a dictionary using dictionary learning. Thus, we construct seven dictionaries from the seven clusters defined. The sparsity property helps in clustering/classifying all facial components in the training data. Based on the clustered results, we are able to count the number of hair styles, beard shapes, etc.



Twin Identification

Examples of pairs of similar-looking twins drawn from different age groups. The facial features of twins are more alike when they are young but differ more as they age, due to the role played by environmental factors. The facial images in (a) and (b) are publicly available on the Internet, while those in (c) are from the ND-Twins database. Our facial aging based approach to the identification of twins relies on the use of Gabor filters to extract features from 9 facial aging regions of the face.



Intrinsic and extrinsic facial asymmetry are common in humans and have been used in many biometric applications. Intrinsic facial asymmetry is caused by changes that occur to the structure of the face as a result of aging, growth, injuries, birthmarks, splotches and sunburn. Extrinsic facial asymmetry is caused by external factors such as viewing orientation, illumination variation, etc. The asymmetry of a face is an individualized characteristic, differing in perceptible ways even between identical twins. We describe two techniques of asymmetry decomposition used to identify twins. The first technique is based on projecting the difference between two symmetric images, obtained by reflecting the right side of a face and the left side of a face respectively, onto an SVD subspace. The second technique uses Procrustes analysis and makes use of the angle between the two-left-sides image and the two-right-sides image.
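
The input to the first technique, the difference between the two mirrored images, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the face is a grayscale image stored as a list of rows, its width is even and at least 2, and the facial midline is assumed to sit at the center column. The function names are hypothetical, and the SVD subspace projection itself is not shown.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def symmetric_from_left(image: Image) -> Image:
    """Build a symmetric face by mirroring the left half onto the right."""
    half = len(image[0]) // 2
    return [row[:half] + row[:half][::-1] for row in image]


def symmetric_from_right(image: Image) -> Image:
    """Build a symmetric face by mirroring the right half onto the left."""
    half = len(image[0]) // 2
    return [row[-half:][::-1] + row[-half:] for row in image]


def asymmetry_map(image: Image) -> Image:
    """Pixel-wise difference between the two mirrored images."""
    left = symmetric_from_left(image)
    right = symmetric_from_right(image)
    return [
        [l - r for l, r in zip(left_row, right_row)]
        for left_row, right_row in zip(left, right)
    ]


# First row is asymmetric, second row is perfectly symmetric.
face = [
    [1, 2, 3, 4],
    [5, 6, 6, 5],
]
difference = asymmetry_map(face)
print(difference)
```

A perfectly symmetric row yields an all-zero difference, so any nonzero entries reflect asymmetry about the assumed midline.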


Eyebrow Shape-based Face Recognition
Analyzing the shape of the eyebrow involves many intermediate modules, including eyebrow region detection, image normalization, eyebrow segmentation, and eyebrow shape-based feature extraction and matching. Each of these modules is a non-trivial task and has been developed separately in previous research. The primary contribution of this work is the construction of a complete end-to-end, fully automatic, high performance system for eyebrow shape-based analysis and matching. To the best of our knowledge, this is the first such system that does not require any manual intervention, yet our results demonstrate that the proposed system outperforms the state of the art in terms of reliability and accuracy. A second contribution of this work is the introduction of the previously unexplored influence of eyebrow asymmetry as a weak biometric feature.

Some examples of segmentation and matching results are shown as follows:



Color Modeling

A color space, or color model, is a model that represents color numerically in terms of three or more coordinates. Common color models such as RGB, YIQ, HSV and Lab have been used effectively in many computer vision applications. However, these color spaces are not specifically designed for blood cells, which appear in a very particular color range. In our system, a blood cell color space is proposed. A visualization of the available color spaces and our proposed model is shown in the accompanying figure.
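
To make the idea of representing one color in different coordinate systems concrete, the short sketch below converts a single RGB triple to HSV and YIQ with Python's standard `colorsys` module. The pixel value is an arbitrary example, and the proposed blood cell color space itself is not shown here.

```python
import colorsys

# One RGB triple with components scaled to [0, 1]; the value is an arbitrary example.
r, g, b = 0.8, 0.2, 0.4

hsv = colorsys.rgb_to_hsv(r, g, b)  # (hue, saturation, value)
yiq = colorsys.rgb_to_yiq(r, g, b)  # (luma, in-phase, quadrature)

print("RGB:", (r, g, b))
print("HSV:", tuple(round(c, 3) for c in hsv))
print("YIQ:", tuple(round(c, 3) for c in yiq))
```

The same color gets three different coordinate triples, one per model.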

Iris Segmentation

Most works in the literature address segmentation of, and feature extraction from, "ideal" iris images. These are images acquired from users under ideal conditions: with the user facing the acquisition device, little or no eyelid/eyelash occlusion, ambient illumination that is neither too bright nor too dim, and uniform intensities. If the segmentation stage does not accurately detect the iris boundaries, the performance of the iris feature matching stage is severely affected, resulting in sub-par recognition performance. Most state-of-the-art methods for localizing the pupillary and limbic boundaries in an acquired eye image are based on the geometry of the eye. Assumptions are made about the radii of the pupil/iris and about the shape of the pupil/iris (i.e. circular or elliptical). In our method, an Iris_ASM model with 64 landmarks is proposed. A graph-cuts based method is proposed to find the contour around the eyeball and iris.

Document Enhancement

The proposed methods are able to deal with image degradations caused by ink bleed-through, large black borders, fading ink, uneven illumination, contrast variation, smear, and various patterned backgrounds. Given an input image, the intensity contrast is first estimated by a grayscale morphological closing operator. A double threshold is generated by our Shannon entropy-based thresholding methods, applied to the 1-D histogram and the 2-D histogram, to classify pixels into text, near-text, and non-text categories. The pixels in the second group are relabeled using the local mean and standard deviation values. Our proposed methods classify noise into two classes, which are dealt with by binary morphological operators, shrink and swell filters, and a graph searching strategy.
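
As an illustration of entropy-based thresholding on a 1-D histogram, the sketch below uses a Kapur-style maximum-entropy criterion and then applies a double threshold to sort pixels into the three categories. It is a minimal sketch under assumptions, not our method: the Kapur criterion stands in for the Shannon entropy-based thresholding, the 2-D histogram variant is not shown, text is assumed to be darker than the background, and the names `kapur_threshold` and `classify_pixels` are hypothetical.

```python
import math
from typing import List, Sequence


def kapur_threshold(histogram: Sequence[int]) -> int:
    """Return the gray level t that maximizes the summed Shannon entropy of the
    two classes [0..t] and [t+1..end] (Kapur-style maximum-entropy threshold)."""
    total = sum(histogram)
    probs = [count / total for count in histogram]
    best_t, best_entropy = 0, -math.inf
    for t in range(len(probs) - 1):
        low, high = probs[: t + 1], probs[t + 1:]
        low_mass, high_mass = sum(low), sum(high)
        if low_mass <= 0.0 or high_mass <= 0.0:
            continue  # one of the two classes would be empty
        h_low = -sum(p / low_mass * math.log(p / low_mass) for p in low if p > 0)
        h_high = -sum(p / high_mass * math.log(p / high_mass) for p in high if p > 0)
        if h_low + h_high > best_entropy:
            best_t, best_entropy = t, h_low + h_high
    return best_t


def classify_pixels(pixels: Sequence[int], low: int, high: int) -> List[str]:
    """Three-way labeling with a double threshold, assuming dark text."""
    labels = []
    for value in pixels:
        if value <= low:
            labels.append("text")
        elif value > high:
            labels.append("non-text")
        else:
            labels.append("near-text")  # to be relabeled from local statistics
    return labels


# Toy bimodal histogram over 9 gray levels with an empty valley in the middle.
toy_histogram = [10, 10, 10, 0, 0, 0, 10, 10, 10]
threshold = kapur_threshold(toy_histogram)
labels = classify_pixels([10, 120, 240], low=50, high=200)
print(threshold, labels)
```

On the toy histogram, the selected threshold falls between the two modes; the two thresholds passed to `classify_pixels` are arbitrary example values.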



Audio Watermarking

Update soon


Copyright © 2011 All Rights Reserved - Ngan Le