Dataset: Citycam

The CityCam dataset was introduced in a paper accepted to CVPR 2017 (IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, USA, 2017). A preprint version will be released soon.


This dataset aims to support understanding of traffic density and traffic flow. As a starting point, we focus on New York City (NYC) traffic cameras. NYC is instrumented with 564 surveillance cameras mounted at streets and intersections. These cameras capture traffic 24 hours a day, 7 days a week, generating large-scale video data. Different cameras have different positions, scenes, and perspectives, and even for the same camera, weather and illumination change significantly over time. Figure 1 shows the cameras installed in the Manhattan borough and images from 9 sampled cameras.

As there is no existing labeled real-world city-camera traffic dataset, in order to evaluate our proposed method, we use existing traffic web cameras to collect continuous streams of street images and annotate them with rich information. Unlike existing traffic datasets, city-camera data are challenging to analyze due to low frame rate, low resolution, heavy occlusion, and large perspective variation. We select 212 representative web cameras in Manhattan, covering different locations, camera perspectives, and traffic states. For each camera, we downloaded videos for four time intervals each day (7am-8am, 12pm-1pm, 3pm-4pm, 6pm-7pm). These cameras have frame rates of around 1 frame/second and a resolution of 352 × 240. Collecting such data for 4 weeks generated 1.4 terabytes of video consisting of 60 million frames. To the best of our knowledge, CityCam is the first and largest annotated city-camera traffic dataset to date. See the table below for a detailed comparison with other benchmark datasets.

We annotated 60,000 frames with rich information: 1) Bounding box: the tightest rectangle around each vehicle. 2) Vehicle type: ten types, including taxi, black sedan, other cars, little truck, middle truck, big truck, van, middle bus, big bus, and other vehicles. 3) Orientation: each vehicle's orientation, classified into four categories: 0◦, 90◦, 180◦, and 270◦. 4) Vehicle density: the number of vehicles in the target region of each frame. 5) Re-identification: we match the same car across sequential frames. 6) Weather: five types of weather, including sunny, cloudy, rainy, snowy, and intensive sunshine. The annotation results for two successive frames are shown in Figure 2. The dataset is divided into training and testing sets, with 45,850 and 14,150 frames, respectively. To reduce the chance of overfitting and to ensure generalization from the training to the testing phase, we deliberately select training videos taken at different locations from the testing videos, while ensuring that the training and testing videos share similar traffic conditions and attributes.
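To make the annotation layout above concrete, here is a minimal sketch of one per-vehicle record and the label vocabularies it draws from. The field names and record structure are illustrative assumptions, not the dataset's actual file schema; only the label sets and split sizes come from the description above.

```python
from dataclasses import dataclass

# Label vocabularies as described in the dataset annotation (1)-(6).
VEHICLE_TYPES = ["taxi", "black sedan", "other cars", "little truck",
                 "middle truck", "big truck", "van", "middle bus",
                 "big bus", "other vehicles"]
ORIENTATIONS = [0, 90, 180, 270]  # degrees
WEATHERS = ["sunny", "cloudy", "rainy", "snowy", "intensive sunshine"]

@dataclass
class VehicleAnnotation:
    # Hypothetical per-vehicle record; field names are assumptions.
    frame_id: int
    bbox: tuple          # (x_min, y_min, x_max, y_max) in pixels, 352 x 240 frames
    vehicle_type: str    # one of VEHICLE_TYPES
    orientation: int     # one of ORIENTATIONS
    track_id: int        # re-identification: same id links a car across frames

# The stated train/test split sizes add up to the full 60,000 annotated frames.
n_train, n_test = 45_850, 14_150
assert n_train + n_test == 60_000
assert len(VEHICLE_TYPES) == 10 and len(ORIENTATIONS) == 4 and len(WEATHERS) == 5
```

Per-frame labels such as vehicle density and weather would naturally attach to a frame-level record rather than to each vehicle; the split here is one reasonable way to organize the six annotation types.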

Contact Info

Please email us to request access to the dataset: