The Web is replete with tutorial-style content on how to accomplish
programming tasks. Unfortunately, even top-ranked tutorials suffer from severe
security vulnerabilities, such as cross-site scripting (XSS) and SQL injection
(SQLi). Assuming that these tutorials influence real-world software
development, we hypothesize that code snippets from popular tutorials can be
used to bootstrap vulnerability discovery at scale. To validate our hypothesis,
we propose a semi-automated approach to find recurring vulnerabilities starting
from a handful of top-ranked tutorials that contain vulnerable code snippets.
We evaluate our approach by performing an analysis of tens of thousands of
open-source web applications to check if vulnerabilities originating in the
selected tutorials recur. Running on a standard PC, our analysis framework has
so far analyzed 64,415 PHP codebases hosted on GitHub and found a total
of 117 vulnerabilities that have a strong syntactic similarity to vulnerable
code snippets present in popular tutorials. In addition to shedding light on
the anecdotal belief that programmers reuse web tutorial code in an ad hoc
manner, our study finds disconcerting evidence of insufficiently reviewed
tutorials compromising the security of open-source projects. Moreover, our
findings testify to the feasibility of large-scale vulnerability discovery
using poorly written tutorials as a starting point.
Description
Leveraging Flawed Tutorials for Seeding Large-Scale Web Vulnerability
Discovery
%0 Generic
%1 unruh2017leveraging
%A Unruh, Tommi
%A Shastry, Bhargava
%A Skoruppa, Malte
%A Maggi, Federico
%A Rieck, Konrad
%A Seifert, Jean-Pierre
%A Yamaguchi, Fabian
%D 2017
%K php
%T Leveraging Flawed Tutorials for Seeding Large-Scale Web Vulnerability
Discovery
%U http://arxiv.org/abs/1704.02786
@misc{unruh2017leveraging,
abstract = {The Web is replete with tutorial-style content on how to accomplish
programming tasks. Unfortunately, even top-ranked tutorials suffer from severe
security vulnerabilities, such as cross-site scripting (XSS) and SQL injection
(SQLi). Assuming that these tutorials influence real-world software
development, we hypothesize that code snippets from popular tutorials can be
used to bootstrap vulnerability discovery at scale. To validate our hypothesis,
we propose a semi-automated approach to find recurring vulnerabilities starting
from a handful of top-ranked tutorials that contain vulnerable code snippets.
We evaluate our approach by performing an analysis of tens of thousands of
open-source web applications to check if vulnerabilities originating in the
selected tutorials recur. Running on a standard PC, our analysis framework has
so far analyzed 64,415 PHP codebases hosted on GitHub and found a total
of 117 vulnerabilities that have a strong syntactic similarity to vulnerable
code snippets present in popular tutorials. In addition to shedding light on
the anecdotal belief that programmers reuse web tutorial code in an ad hoc
manner, our study finds disconcerting evidence of insufficiently reviewed
tutorials compromising the security of open-source projects. Moreover, our
findings testify to the feasibility of large-scale vulnerability discovery
using poorly written tutorials as a starting point.},
added-at = {2017-12-30T11:13:20.000+0100},
author = {Unruh, Tommi and Shastry, Bhargava and Skoruppa, Malte and Maggi, Federico and Rieck, Konrad and Seifert, Jean-Pierre and Yamaguchi, Fabian},
biburl = {https://www.bibsonomy.org/bibtex/2cbd47f983007037d2d58027f6033ab6d/s_bergmann},
description = {Leveraging Flawed Tutorials for Seeding Large-Scale Web Vulnerability
Discovery},
interhash = {0a2a93bbd060502c1dff78f490569e7a},
intrahash = {cbd47f983007037d2d58027f6033ab6d},
keywords = {php},
note = {cite arxiv:1704.02786. Comment: 17+3 pages},
timestamp = {2017-12-30T11:13:20.000+0100},
title = {Leveraging Flawed Tutorials for Seeding Large-Scale Web Vulnerability
Discovery},
url = {http://arxiv.org/abs/1704.02786},
year = 2017
}