Abstract
Anomaly-based intrusion detection systems suffer from a lack of appropriate evaluation data sets. Existing data sets often cannot be published due to privacy concerns or do not reflect current attack scenarios. In order to
overcome these problems, we identify characteristics of good data sets and develop an appropriate concept for the
generation of labelled flow-based data sets that satisfy these criteria. The concept is implemented based on OpenStack, thus
demonstrating the suitability of virtual environments. Virtual environments offer advantages over static data sets
because up-to-date data sets that reflect recent trends in user behaviour and new attack scenarios can be created easily. In particular, we
emulate a small business environment which includes several clients and typical servers. Network traffic is generated by
scripts which emulate typical user activities like surfing the web, writing emails, or printing documents on the clients. These
scripts follow some guidelines to ensure that the user behaviour is as realistic as possible, also with respect to working hours
and lunch breaks. The generated network traffic is recorded in unidirectional NetFlow format. For generating malicious
traffic, attacks like Denial of Service, Brute Force, and Port Scans are executed within the network. Since origins, targets, and
timestamps of executed attacks are known, labelling the recorded NetFlow data is straightforward. To include real
traffic that originates outside the OpenStack environment, an external server with two services is deployed. This server
has a public IP address and is exposed to real and up-to-date attacks from the internet. We captured approximately 32 million
flows over a period of four weeks and categorized them into five classes. Further, the chronological sequence of the flows is
analysed and the distribution of normal and malicious traffic is discussed in detail. The main contribution of this paper is the
demonstration of a novel approach to use OpenStack as a basis for generating realistic data sets that can be used for the
evaluation of network intrusion detection systems.
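
The labelling step described above (matching recorded flows against the known origins, targets, and timestamps of executed attacks) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layouts `AttackRecord` and `Flow`, the function `label_flow`, and the label strings are hypothetical names chosen for illustration, and real NetFlow records carry many more fields.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AttackRecord:
    """One executed attack as it might appear in an attack log (assumed layout)."""
    source_ip: str
    target_ip: str
    start: datetime
    end: datetime
    attack_type: str


@dataclass(frozen=True)
class Flow:
    """Reduced unidirectional flow record (assumed layout)."""
    src_ip: str
    dst_ip: str
    first_seen: datetime


def label_flow(flow, attacks, default_label="normal"):
    """Return the type of the first attack matching the flow, else default_label.

    Flows are unidirectional, so both the attacker-to-victim direction and the
    victim-to-attacker response direction are treated as part of the attack.
    """
    for attack in attacks:
        in_window = attack.start <= flow.first_seen <= attack.end
        forward = (flow.src_ip, flow.dst_ip) == (attack.source_ip, attack.target_ip)
        backward = (flow.src_ip, flow.dst_ip) == (attack.target_ip, attack.source_ip)
        if in_window and (forward or backward):
            return attack.attack_type
    return default_label
```

The label strings here are placeholders and do not correspond to the five classes used in the data set. A flow between the attacker and the target that falls outside the logged attack window keeps the default label, which is why accurate attack timestamps matter for this kind of labelling.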