Using a deep learning model and a dataset of millions of dashboard camera images from New York City rideshare drivers, Cornell Tech researchers were able to see which neighborhoods had the highest numbers of marked New York Police Department vehicles, a possible indication of deployment patterns.
Their results showed trends in police presence by time of day, neighborhood demographics, and proximity to police stations and commercial districts. Among the researchers’ findings: Gramercy Park, in Manhattan, had the most police vehicles visible in dashcam images, almost 20 times more than Arden Heights/Rossville, in southern Staten Island, which had the fewest.
Areas with more police vehicle images included wealthy commercial zones and low-income neighborhoods with higher proportions of Black and Latino residents.
Matt Franchi, a doctoral student in the field of computer science, is lead author of “Detecting Disparities in Police Deployments Using Dashcam Data,” which he presented June 12 at the Association for Computing Machinery Conference on Fairness, Accountability, and Transparency (FAccT ’23) in Chicago.
Senior authors are Wendy Ju, associate professor of information science at the Jacobs Technion-Cornell Institute at Cornell Tech, and Emma Pierson, assistant professor of computer science at the Jacobs Technion-Cornell Institute at Cornell Tech. Both are field members in the Cornell Ann S. Bowers College of Computing and Information Science.
While data on police stops, use of force, searches, criminal incidents and arrests is publicly available, police generally do not provide information on how and where officers are deployed, citing security concerns. Deployment disparities, the researchers wrote, can result in downstream biases like increased arrests: A neighborhood that’s more heavily policed than another might not necessarily have more crime, just more people arrested for it.
In 2020, the team – which included J.D. Zamfirescu-Pereira, a doctoral student at the University of California, Berkeley – obtained 24.8 million dashcam images from Nexar, a company that provides rideshare drivers with dashboard cameras. Uber, for example, recommends that its drivers install cameras for their own protection.
Nexar made its proprietary images (taken between March 4, 2020, and Nov. 15, 2020) available for research purposes; originally, the Cornell team used the dataset to gather information on how people in New York City were socially distancing during the pandemic.
“I was initially interested in the underlying methodologies that powered the project – processing large amounts of data that are time-stamped and geo-tagged,” Franchi said. “When you run the right analysis, you can get a rich picture of a huge urban environment, and extract trends that you wouldn’t be able to otherwise.”
The researchers then used thousands of images, annotated for whether they contained a police car, to train a deep learning model to identify marked police vehicles. While the model could identify police vehicles, it could not account for the reasons they were there.
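The story does not include the team’s training code, but the classifier step can be pictured roughly as follows. This is a minimal, hypothetical sketch in Python using PyTorch and torchvision; the folder layout, the ResNet backbone and the hyperparameters are illustrative assumptions, not a description of the authors’ actual setup.

    # Hypothetical sketch: fine-tune an off-the-shelf image classifier to label
    # dashcam frames as containing a marked police vehicle or not.
    import torch
    from torch import nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    # Assumed layout: dashcam_frames/police and dashcam_frames/no_police hold
    # the hand-annotated training images.
    train_data = datasets.ImageFolder("dashcam_frames", transform=transform)
    loader = DataLoader(train_data, batch_size=32, shuffle=True)

    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, 2)  # two classes: police / no police

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for epoch in range(5):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

Once trained, a classifier like this can be run over every frame in the dataset to flag images that appear to contain a marked police vehicle.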
The researchers identified more than 233,000 images containing marked police vehicles. The pictures were taken across all five boroughs, at all times of day. The team then studied the geo-tagged images in the context of specific factors such as neighborhood, borough and zone type, and whether the pictures were taken in a bustling commercial area, near manufacturing or in a primarily residential area.
They also analyzed the images in relation to census data for the area, such as population density, median household income, and the racial makeup of residents (white, Black, Hispanic, Asian).
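The published story does not detail the analysis code either, but the aggregation step can be sketched as follows. This is a hypothetical Python example using pandas and geopandas; every file name and column name, such as police_detections.csv, total_frames and median_income, is an assumption made for illustration, and the paper’s actual pipeline may differ.

    # Hypothetical sketch: place each geo-tagged police-vehicle detection in a
    # neighborhood polygon, count detections per neighborhood, normalize by how
    # many dashcam frames were captured there, and join with census attributes.
    import geopandas as gpd
    import pandas as pd

    detections = pd.read_csv("police_detections.csv")  # assumed columns: lat, lon, timestamp
    points = gpd.GeoDataFrame(
        detections,
        geometry=gpd.points_from_xy(detections.lon, detections.lat),
        crs="EPSG:4326",
    )

    # Assumed polygon file with a "neighborhood" name column.
    neighborhoods = gpd.read_file("nyc_neighborhoods.geojson").to_crs("EPSG:4326")
    joined = gpd.sjoin(points, neighborhoods, how="inner", predicate="within")

    per_hood = joined.groupby("neighborhood").size().rename("n_detections").reset_index()

    frames = pd.read_csv("frames_per_neighborhood.csv")  # assumed: total dashcam frames per area
    census = pd.read_csv("neighborhood_census.csv")      # assumed: median_income, pct_black, ...
    per_hood = per_hood.merge(frames, on="neighborhood").merge(census, on="neighborhood")

    # Detections per dashcam frame captured, so neighborhoods with more
    # rideshare traffic are not over-counted.
    per_hood["police_rate"] = per_hood["n_detections"] / per_hood["total_frames"]

    print(per_hood[["police_rate", "median_income", "pct_black"]].corr())

In a sketch like this, normalizing by the number of frames captured in each neighborhood matters because rideshare drivers do not cover the city evenly; without it, busier areas would appear to have more police simply because they were photographed more often.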
The researchers said they see two main benefits to this work: a step toward greater transparency in policing, and the potential for auditing all government agencies for efficiency and equity.
Ju said researchers could apply this method – training a computer vision model to identify objects such as garbage piles – in myriad ways that help city governments and urban planners. It is the method’s ability to adapt to detect a wide variety of phenomena – in this case, marked police vehicles – that Ju finds most exciting.
“From a technology standpoint, you have these dashcams collecting data, and then we have computer vision so we can actually pick out what we’re interested in from that data,” she said. “But it’s really the method of aggregating the data post hoc, and being able to compare the incidence of those things across neighborhoods – that’s the thing that we haven't been able to do before.”
Support for this work came from an Amazon Research Award, a Google Research Scholar Award and the National Science Foundation.
By Tom Fleischman
This story was originally published in the Cornell Chronicle.