Using a data set scraped from Wikipedia's curated list of famous people in the U.S. of foreign descent, we built a parallel coordinates chart and a linked data table that support filtering and allow rapid exploration of the impact of immigrants in the United States.
This project was completed for the class 05839: The Data Pipeline: Collecting and Using Data for Interactive Systems and can be found here.
Our data set describes notable individuals from a variety of countries. For each individual, we extracted the following key features: Source Country, Name, Field of Influence, and Description. We used a Naive Bayes classifier to categorize each individual as either First Generation or Second Generation or Higher. We calculated an additional feature, Influence Score, as the length of the person's Wikipedia page.
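A minimal sketch of how one record and its Influence Score could be assembled. The text above does not say which unit "length" is measured in, so this sketch assumes the number of characters in the page text. The function names and the dictionary layout are hypothetical and only serve as illustration.

```python
def influence_score(page_text: str) -> int:
    """Return the Influence Score for a page.

    Assumption: "length" means the number of characters in the page text.
    The original project may have measured length differently.
    """
    return len(page_text)


def build_record(name, source_country, field, description, generation, page_text):
    """Bundle the extracted features with the derived Influence Score.

    The key names mirror the feature names used in the write-up; the
    dictionary layout itself is a hypothetical choice for this sketch.
    """
    return {
        "Name": name,
        "Source Country": source_country,
        "Field of Influence": field,
        "Description": description,
        "Generation": generation,
        "Influence Score": influence_score(page_text),
    }
```

Measuring length in words or bytes instead would only change the body of `influence_score`; the rest of the record stays the same.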
List of countries and respective number of individuals:
South Korea: 280
We decided that a parallel coordinates chart was the best way to visualize the information. Each line in the chart corresponds to an individual. This allows you to quickly view both trends and outliers in the data.
You can filter by Source Country, Field of Influence, Generation, and Influence Score by making a selection along any of the four vertical axes.
You can choose the region (Europe, Asia, Other) using the buttons on the right.
The data table in the lower portion of the visualization is linked to the parallel coordinates chart, so it updates in real time as you filter.
Individuals are sorted by Influence Score in descending order and by Country.
Clicking on an individual entry takes you to that person's Wikipedia page.
Hovering over an individual entry highlights that individual in the parallel coordinates chart.
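The filtering and sorting behavior described above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's D3.js code: the `Person` and `Selection` types are hypothetical, an axis with no selection is treated as "keep everything", and Influence Score is assumed to be the primary sort key with Country breaking ties (the description does not say which comes first).

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Person:
    # Hypothetical record type holding the four filterable features.
    name: str
    source_country: str
    field: str
    generation: str
    influence_score: int


@dataclass
class Selection:
    # One optional selection per axis; None means that axis is unfiltered.
    countries: Optional[Set[str]] = None
    fields: Optional[Set[str]] = None
    generations: Optional[Set[str]] = None
    score_range: Optional[Tuple[int, int]] = None  # inclusive (low, high)


def matches(person: Person, sel: Selection) -> bool:
    """Return True if the person passes every active axis selection."""
    if sel.countries is not None and person.source_country not in sel.countries:
        return False
    if sel.fields is not None and person.field not in sel.fields:
        return False
    if sel.generations is not None and person.generation not in sel.generations:
        return False
    if sel.score_range is not None:
        low, high = sel.score_range
        if not low <= person.influence_score <= high:
            return False
    return True


def table_rows(people: List[Person], sel: Selection) -> List[Person]:
    """Filter, then sort for the linked table.

    Assumed order: Influence Score descending, then Country ascending.
    """
    kept = [p for p in people if matches(p, sel)]
    return sorted(kept, key=lambda p: (-p.influence_score, p.source_country))
```

Recomputing `table_rows` whenever a selection changes is one simple way to keep a table in sync with the chart; the placeholder records a caller passes in stand in for the scraped data.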
The site is hosted on Google App Engine, and the visualization was built with D3.js.