Misrepresentation



Data science ethics

Data Science with R

Overview

Misrepresenting data science results

Common ways people misrepresent results, intentionally or unintentionally, include:

  • Claiming causality when the underlying study’s scope of inference does not support it

  • Distorting axes and scales to make the data tell a different story

  • Mapping land area instead of population density for issues that depend on and affect people

  • Omitting uncertainty in reporting

Causality

TIME coverage

How plausible is the statement in the title of this article?

LA Times coverage

What does “research shows” mean?

Original study

Moore, Steven C., et al. “Association of leisure-time physical activity with risk of 26 types of cancer in 1.44 million adults.” JAMA Internal Medicine 176.6 (2016): 816-825.

  • Volunteers were asked about their physical activity level over the preceding year.
  • Half exercised less than about 150 minutes per week, half exercised more.
  • Compared to the bottom 10% of exercisers, the top 10% had lower rates of esophageal, liver, lung, endometrial, colon, and breast cancer.
  • Researchers found no association between exercising and 13 other cancers (e.g. pancreatic, ovarian, and brain).

Axes and scales

Tax cuts

What is the difference between these two pictures? Which presents a better way to represent these data?
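
One way to quantify how much a truncated axis distorts a comparison is to compare the ratio of the bar heights as drawn with the ratio of the underlying values. The sketch below is a minimal illustration only: the two values and the axis baseline are hypothetical and are not taken from the charts discussed here, and `drawn_height_ratio` is a helper name made up for this example.

```python
def drawn_height_ratio(small, large, baseline=0.0):
    """Ratio of the two bar heights as drawn when the axis starts at `baseline`."""
    if baseline >= small:
        raise ValueError("baseline must be below the smaller value")
    return (large - baseline) / (small - baseline)


# Hypothetical values, chosen only for illustration.
small, large = 35.0, 40.0

true_ratio = large / small                                   # what the data say
honest = drawn_height_ratio(small, large, baseline=0.0)      # axis starts at zero
truncated = drawn_height_ratio(small, large, baseline=34.0)  # axis starts at 34

print(f"true ratio of values:        {true_ratio:.2f}")
print(f"drawn ratio, baseline at 0:  {honest:.2f}")
print(f"drawn ratio, baseline at 34: {truncated:.2f}")
```

With these made-up numbers, the larger bar is drawn six times as tall as the smaller one when the axis starts at 34, although the larger value is only about 14% bigger.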

Cost of gas

What is wrong with this picture? How would you correct it?

COVID in GA

What is wrong with this picture? How would you correct it?

PP services

What is wrong with this picture? How would you correct it?

Maps and areas

Voting map

Do you recognize this map? What does it show?

Two alternate tales

Voting percentages

Uncertainty

Catalan independence

On December 19, 2014, the front page of the Spanish national newspaper El País read “Catalan public opinion swings toward ‘no’ for independence, says survey”.
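
Whether a headline like this is justified depends on whether the gap between the two answers is larger than the survey's margin of error. The sketch below computes an approximate 95% margin of error for the difference between two proportions measured in the same sample. It assumes simple random sampling, and the sample size and percentages are hypothetical placeholders, not the figures from the El País survey; `diff_margin_of_error` is a helper name made up for this example.

```python
import math


def diff_margin_of_error(p_yes, p_no, n, z=1.96):
    """Approximate margin of error for (p_no - p_yes) when both proportions
    come from the same sample of size n (multinomial model)."""
    diff = p_no - p_yes
    variance = (p_yes + p_no - diff ** 2) / n
    return z * math.sqrt(variance)


# Hypothetical survey: placeholder values, not the El País results.
n = 1000
p_yes, p_no = 0.44, 0.46

diff = p_no - p_yes
moe = diff_margin_of_error(p_yes, p_no, n)
low, high = diff - moe, diff + moe

print(f"difference (no - yes): {diff:+.3f}")
print(f"95% interval: [{low:+.3f}, {high:+.3f}]")
print("interval includes zero:", low <= 0 <= high)
```

In this made-up case the interval for the difference includes zero, so the sample alone would not support a claim that opinion has swung either way.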

Further reading

How Charts Lie

How Charts Lie: Getting Smarter about Visual Information

by Alberto Cairo

Calling Bullshit

Calling Bullshit: The Art of Skepticism in a Data-Driven World

by Carl Bergstrom and Jevin West