Hate crimes in America have historically been difficult to track: very little official data is collected, and what does exist is incomplete and of limited use to reporters trying to establish the facts. This led ProPublica, with the support of the Google News Lab, to form Documenting Hate earlier this year. It is a collaborative reporting project that aims to create a national database of hate crimes by collecting and categorizing news stories about hate crime attacks and abuses from across the country.
Now, with ProPublica, we are launching a new machine learning tool to help journalists covering hate news leverage this data in their reporting.
The Documenting Hate News Index, built by the Google News Lab, data visualization studio Pitch Interactive and ProPublica, takes a raw feed of Google News articles from the past six months and uses the Google Cloud Natural Language API to create a visual tool that helps reporters find news from all over the country, from Oklahoma to Florida, California to Kentucky. It’s a constantly updating snapshot of data from this year, and a valuable starting point for reporting on this area of news.
The Documenting Hate project was launched in response to the lack of national data on hate crimes. While the FBI is required by law to collect data about hate crimes, its database is patchy and almost unusable for reporters because local jurisdictions aren’t required to report incidents up to the federal government.
All of which underlines the value of the Documenting Hate project, which is powered by a number of news organisations and journalists who collect and verify reports of hate crimes and events. Documenting Hate is informed both by reports from members of the public and by raw Google News data on stories from across the nation.
The new Index will help make this data easier to understand and visualise. It is one of the first visualisations to use machine learning to generate its content, using the Google Cloud Natural Language API, which analyses text and extracts information about people, places, and events. In this case, it helps reporters by digging out locations, names and other useful data from the 3,000-plus news reports. The feed is updated every day and goes back to February 2017.
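To make the entity-extraction step concrete, here is a minimal Python sketch. It assumes a response shaped like the JSON returned by the Natural Language API's entity analysis (a list of `entities`, each with a `name`, `type` and `salience`). The sample response, the `group_entities` helper and the salience threshold are illustrative assumptions, not the Index's actual code.

```python
from collections import defaultdict

# Hypothetical sample shaped like an entity-analysis response; the values
# are made up for illustration and do not come from the real feed.
SAMPLE_RESPONSE = {
    "entities": [
        {"name": "Springfield", "type": "LOCATION", "salience": 0.41},
        {"name": "Kentucky", "type": "LOCATION", "salience": 0.22},
        {"name": "city council", "type": "ORGANIZATION", "salience": 0.18},
        {"name": "graffiti", "type": "OTHER", "salience": 0.02},
    ],
    "language": "en",
}


def group_entities(response, min_salience=0.05):
    """Group entity names by type, dropping entities below min_salience."""
    grouped = defaultdict(list)
    for entity in response.get("entities", []):
        if entity.get("salience", 0.0) >= min_salience:
            grouped[entity["type"]].append(entity["name"])
    return dict(grouped)


if __name__ == "__main__":
    # Print the places, organisations, etc. mentioned in the sample article.
    print(group_entities(SAMPLE_RESPONSE))
```

Grouping by type is one plausible way to surface the locations and names a reporter would want first; the threshold simply trims entities the article mentions only in passing.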
The feed is generated from news articles that cover events suggestive of hate crime, bias or abuse, such as anti-Semitic graffiti or local court reports about incidents. We are also monitoring it for errant stories that slip in, i.e. stories picked up only because they include the word “hate”. It hasn’t happened yet, but we will be paying close attention.
Users of the app can filter the reports by searching for a keyword in the search box or by clicking on algorithmically-generated keywords. They can also see reports by date by clicking ‘calendar’.
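The two filters described above can be sketched in a few lines of Python. The `Report` record, its fields and the sample entries are hypothetical stand-ins for the feed, and this shows one plausible way to filter, not how the app itself is built.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Report:
    title: str
    published: date
    keywords: tuple


# Hypothetical entries for illustration only.
REPORTS = [
    Report("Graffiti found on community centre", date(2017, 3, 2), ("graffiti", "vandalism")),
    Report("Court hears harassment case", date(2017, 3, 2), ("court", "harassment")),
    Report("Vigil held after vandalism", date(2017, 4, 11), ("vigil", "vandalism")),
]


def filter_by_keyword(reports, term):
    """Return reports whose title or keywords contain the term (case-insensitive)."""
    needle = term.lower()
    return [
        r for r in reports
        if needle in r.title.lower() or any(needle in k.lower() for k in r.keywords)
    ]


def group_by_date(reports):
    """Map each publication date to the titles published that day (a calendar view)."""
    by_day = defaultdict(list)
    for r in reports:
        by_day[r.published].append(r.title)
    return dict(by_day)
```

Calling `filter_by_keyword(REPORTS, "vandalism")` mirrors typing a term into the search box, while `group_by_date(REPORTS)` corresponds to the calendar view.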
The Hate News Index is available now, and we will be developing it further over the next few months as we see how journalists use it day to day to unearth these stories of hate and help build a national database for monitoring them.
Partners in the project include the New York Times, WNYC, BuzzFeed News, First Draft, Meedan, New America Media, The Root, Latino USA, The Advocate, 100 Days in Appalachia and Ushahidi. They are also working with civil-rights groups such as the Southern Poverty Law Center, and schools such as the University of Miami School of Communications.
As part of our mission to create new resources for the journalism community, we are also open-sourcing the data on our GitHub page — let us know what you do with it by emailing firstname.lastname@example.org.