Journal Article

Deepfake detection by human crowds, machines, and machine-informed crowds

M. Groh, Z. Epstein, C. Firestone, and R. Picard.
(May 13, 2021)
DOI: 10.1073/pnas.2110013119

Abstract

Significance: The recent emergence of deepfake videos raises theoretical and practical questions. Are humans or the leading machine learning model more capable of detecting algorithmic visual manipulations of videos? How should content moderation systems be designed to detect and flag video-based misinformation? We present data showing that ordinary humans perform in the range of the leading machine learning model on a large set of minimal context videos. While we find that a system integrating human and model predictions is more accurate than either humans or the model alone, we show inaccurate model predictions often lead humans to incorrectly update their responses. Finally, we demonstrate that specialized face processing and the ability to consider context may specially equip humans for deepfake detection.

The recent emergence of machine-manipulated media raises an important societal question: How can we know whether a video that we watch is real or fake? In two online studies with 15,016 participants, we present authentic videos and deepfakes and ask participants to identify which is which. We compare the performance of ordinary human observers with the leading computer vision deepfake detection model and find them similarly accurate, while making different kinds of mistakes. Together, participants with access to the model’s prediction are more accurate than either alone, but inaccurate model predictions often decrease participants’ accuracy. To probe the relative strengths and weaknesses of humans and machines as detectors of deepfakes, we examine human and machine performance across video-level features, and we evaluate the impact of preregistered randomized interventions on deepfake detection. We find that manipulations designed to disrupt visual processing of faces hinder human participants’ performance while mostly not affecting the model’s performance, suggesting a role for specialized cognitive capacities in explaining human deepfake detection performance.

BibTeX key: Matthew2021
entry type: JournalArticle
year: 2021
month: 5
day: 13
journal: Proceedings of the National Academy of Sciences of the United States of America
volume: 119
DOI: 10.1073/pnas.2110013119
url: https://www.semanticscholar.org/paper/d3f7cef8256d85e9505c3f4c329225c22b8162dc


Cite this publication

%0 Journal Article
%1 Matthew2021
%A Groh, Matthew
%A Epstein, Ziv
%A Firestone, C.
%A Picard, Rosalind W.
%D 2021
%J Proceedings of the National Academy of Sciences of the United States of America
%K deepfake_detection human_cognition machine_learning visual_processing media_studies posted_with_chatgpt
%R 10.1073/pnas.2110013119
%T Deepfake detection by human crowds, machines, and machine-informed crowds
%U https://www.semanticscholar.org/paper/d3f7cef8256d85e9505c3f4c329225c22b8162dc
%V 119
%X Significance: The recent emergence of deepfake videos raises theoretical and practical questions. Are humans or the leading machine learning model more capable of detecting algorithmic visual manipulations of videos? How should content moderation systems be designed to detect and flag video-based misinformation? We present data showing that ordinary humans perform in the range of the leading machine learning model on a large set of minimal context videos. While we find that a system integrating human and model predictions is more accurate than either humans or the model alone, we show inaccurate model predictions often lead humans to incorrectly update their responses. Finally, we demonstrate that specialized face processing and the ability to consider context may specially equip humans for deepfake detection. The recent emergence of machine-manipulated media raises an important societal question: How can we know whether a video that we watch is real or fake? In two online studies with 15,016 participants, we present authentic videos and deepfakes and ask participants to identify which is which. We compare the performance of ordinary human observers with the leading computer vision deepfake detection model and find them similarly accurate, while making different kinds of mistakes. Together, participants with access to the model’s prediction are more accurate than either alone, but inaccurate model predictions often decrease participants’ accuracy. To probe the relative strengths and weaknesses of humans and machines as detectors of deepfakes, we examine human and machine performance across video-level features, and we evaluate the impact of preregistered randomized interventions on deepfake detection. We find that manipulations designed to disrupt visual processing of faces hinder human participants’ performance while mostly not affecting the model’s performance, suggesting a role for specialized cognitive capacities in explaining human deepfake detection performance.

@article{Matthew2021,
  abstract = {Significance: The recent emergence of deepfake videos raises theoretical and practical questions. Are humans or the leading machine learning model more capable of detecting algorithmic visual manipulations of videos? How should content moderation systems be designed to detect and flag video-based misinformation? We present data showing that ordinary humans perform in the range of the leading machine learning model on a large set of minimal context videos. While we find that a system integrating human and model predictions is more accurate than either humans or the model alone, we show inaccurate model predictions often lead humans to incorrectly update their responses. Finally, we demonstrate that specialized face processing and the ability to consider context may specially equip humans for deepfake detection. The recent emergence of machine-manipulated media raises an important societal question: How can we know whether a video that we watch is real or fake? In two online studies with 15,016 participants, we present authentic videos and deepfakes and ask participants to identify which is which. We compare the performance of ordinary human observers with the leading computer vision deepfake detection model and find them similarly accurate, while making different kinds of mistakes. Together, participants with access to the model’s prediction are more accurate than either alone, but inaccurate model predictions often decrease participants’ accuracy. To probe the relative strengths and weaknesses of humans and machines as detectors of deepfakes, we examine human and machine performance across video-level features, and we evaluate the impact of preregistered randomized interventions on deepfake detection. We find that manipulations designed to disrupt visual processing of faces hinder human participants’ performance while mostly not affecting the model’s performance, suggesting a role for specialized cognitive capacities in explaining human deepfake detection performance.},
  added-at = {2023-12-10T17:46:00.000+0100},
  author = {Groh, Matthew and Epstein, Ziv and Firestone, C. and Picard, Rosalind W.},
  biburl = {https://www.bibsonomy.org/bibtex/268e8c3a9f1d73c1216c19f7823f203bc/sarahajjib},
  day = 13,
  description = {This study explores the effectiveness of human crowds, machines, and a combination of both in detecting deepfake videos. It reveals that disruptions in visual processing of faces affect human performance more than machine performance, highlighting the unique cognitive abilities humans possess in deepfake detection.},
  doi = {10.1073/pnas.2110013119},
  interhash = {e9e16fbe42ce3396cb5ed5484b90d507},
  intrahash = {68e8c3a9f1d73c1216c19f7823f203bc},
  journal = {Proceedings of the National Academy of Sciences of the United States of America},
  keywords = {deepfake_detection human_cognition machine_learning visual_processing media_studies posted_with_chatgpt},
  month = may,
  timestamp = {2023-12-10T17:46:00.000+0100},
  title = {Deepfake detection by human crowds, machines, and machine-informed crowds},
  url = {https://www.semanticscholar.org/paper/d3f7cef8256d85e9505c3f4c329225c22b8162dc},
  volume = 119,
  year = 2021
}
