
Data Journalism: An Intro

Lena Groeger, ProPublica, September 2015

What is it, anyways?

Data journalism is an ambiguous term, and you'll likely get 18 different explanations by asking 18 different people. But in general, you could think of it as the new possibilities that arise when you combine traditional journalistic skills (being curious & skeptical, asking questions, talking to people, finding powerful narratives, telling compelling stories) with the explosion of new digital information and tools.

Data stories themselves can be about any topic, but they often focus on trends and exceptions. A story about rising income inequality and a story about affirmative action bans across America are both data stories: they tell us about trends using statistics and data analysis. But a story about the one university that saddles its students with thousands of dollars in debt, or the handful of doctors who charge way more than their peers for the same procedures, is also a data story: it tells us about outliers or exceptions.

Data stories can also take many forms. Sometimes the data appears in a long traditional investigative story, providing the context and specific numbers that form the backbone of the piece. Other times it appears as a chart or interactive graphic that illustrates a particular point. And sometimes it appears as an interactive database that lets users search for their own city or school or hospital and find their own story in the data.

Now, whatever the topic or form, remember that data journalism is not the same as merely throwing up a lot of data on a webpage. In other words, data journalism is not: "Here's some data, hope you find something interesting!" Data journalism is about helping people understand what's important about the data, why it matters, what it means for you and what it means for us. That's the real "journalism" in data journalism.

So let's take a look at some examples! Here are some questions that data can help us answer.

1. How much is my arm worth?

Workers' Comp Benefits

2. Is my doctor taking money from drug companies?

Dollars for Docs

3. Which emergency room will see me the fastest?

ER Wait Watcher

4. What's the worst that could happen on a cruise ship?

Cruise Control

Why should I care?

Data can help you answer questions that regular reporting can't (or that would take a million years to answer by hand). That automatically puts you in a position to do stories that other journalists can't do. Think about that!

Sometimes the data can be hard to collect. All of the examples above required effort to get the data. Sometimes you have to collect it yourself. Sometimes it comes from the government and you have to reformat, analyze & mash it up with other data sets. And other times you have to scrape it off really annoying websites. We'll talk more about scraping soon.

Data can come from lots of different places

  1. Public/Official Sources
  2. FOIA
  3. Random Places on the Internet

Ok, but why should I bother with coding?

  1. Speed: Coding can help you do things a million times faster than you could by hand (including processing, analyzing or formatting your data).
  2. Access: A few lines of code can get you (almost!) whatever you want from the web.
  3. Tools: There are so many tools that require only a little bit of code to tell really useful and visual stories. We'll cover just a few of them.
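
To make the "Speed" point concrete, here is a minimal sketch in Python of the kind of task that is tedious by hand and trivial in code: reading a small table and flagging the exception. Everything in it is an assumption made for illustration. The hospital names and wait times are invented, the function names are hypothetical, and the "more than twice the median" rule is just one simple way to define an outlier, not the method behind any of the projects above.

```python
import csv
import io
import statistics

# Hypothetical sample data, invented purely for illustration.
# In real work this would come from a file or a download.
SAMPLE_CSV = """hospital,wait_minutes
Hospital A,20
Hospital B,25
Hospital C,22
Hospital D,24
Hospital E,95
"""


def load_rows(text):
    """Parse CSV text into a list of (hospital, wait_minutes) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["hospital"], int(row["wait_minutes"])) for row in reader]


def find_outliers(rows, factor=2.0):
    """Return the rows whose value is more than `factor` times the median."""
    median = statistics.median(value for _, value in rows)
    return [(name, value) for name, value in rows if value > factor * median]


rows = load_rows(SAMPLE_CSV)
outliers = find_outliers(rows)

for name, minutes in outliers:
    print(f"{name} stands out with a {minutes}-minute wait")
```

With five rows you could spot the exception by eye. The same few lines would work unchanged on five thousand rows, and that is the real payoff.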

Next up: How to Get Data From the Web »