Abhishta defends PhD thesis on the impacts of DDoS attacks

The Blind Man and the Elephant: Measuring Economic Impacts of DDoS Attacks

by Abhishta

The Internet has become an important part of our everyday life. We use services such as Netflix, Skype, online banking, and Scopus daily. We even use the Internet to file our taxes and communicate with the municipality. This dependency on network-based technologies also gives malicious actors in our society an opportunity to remotely attack IT infrastructure. One such cyberattack that may lead to the unavailability of network resources is known as a distributed denial of service (DDoS) attack. A DDoS attack leverages many computers to launch a coordinated denial of service attack against one or more targets.
These attacks cause damage to victim businesses. According to reports published by several consultancies and security companies, these attacks lead to millions of dollars in losses every year. One might ponder: are the damages caused by temporary unavailability of network services really this large? One point of criticism of these reports has been that they often base their findings on victim surveys and expert opinions. As cost accounting and bookkeeping methods are not designed to measure the impact of cyber security incidents, it is highly likely that surveys are unable to capture the true impact of an attack. A concerning fact is that most C-level managers make budgetary decisions for security based on the losses reported in these surveys. Several inputs for security investment decision models, such as return on security investment (ROSI), also depend on these figures. This makes the situation very similar to the parable of the blind men and the elephant, who try to conceptualise what the elephant looks like by touching it. Hence, it is important to develop methodologies that capture the true impact of DDoS attacks. In this thesis, we study the economic impact of DDoS attacks on public and private organisations using an empirical approach.
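For context (this formulation is a common textbook definition, not taken from the thesis), ROSI relates the loss prevented by a security control to the cost of that control:

    \mathrm{ROSI} = \frac{\mathrm{ALE} \times \mathrm{mitigation\ ratio} - \mathrm{cost\ of\ solution}}{\mathrm{cost\ of\ solution}}

where ALE, the annual loss expectancy, is typically estimated from exactly the kind of survey-based loss figures criticised above; if those figures are off, so is the computed return on a security investment.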

[download thesis]

Honor Code for Databases

Welcome to the Databases part of the course. We will resume on Tuesday 5 November with the introduction lecture in SP 2 at 8:30h. The Databases part contains individual quizzes (which are mandatory) and assignments (which are optional, but give a bonus on the final grade), for which the following rules apply:

  • You do not share the solutions of the quizzes and assignments;
  • The solutions to the quizzes and assignments should be your own work;
  • You do not post the assignments, nor the solutions anywhere online;
  • You are allowed, and encouraged, to discuss the quizzes and assignments with your fellow students and to ask them clarifying questions; please use the Brightspace Discussion Forum to reach out to your fellow students.

PhD candidate vacancy: Transfer Learning for Federated Search

We are looking for a PhD candidate to join the Data Science group at Radboud University for an exciting new project on transfer learning for language modelling with an application to federated search. Transfer learning learns general-purpose language models from huge datasets, such as web crawls, and then trains the models further on smaller datasets for a specific task. Transfer learning in NLP has successfully used pre-trained word embeddings for several tasks. Although the success of word embeddings on search tasks has been limited, pre-trained general-purpose language representations such as BERT and ELMo have recently been successful on several search tasks, including question answering and conversational search.

Resource descriptions in federated search consist of samples of the full data that are sparser than full resource representations. This raises the question of how to infer vocabulary that is missing from the sampled data. A promising approach comes from transfer learning from pre-trained language representations. An open question is how to effectively and efficiently apply those pre-trained representations, and how to adapt them to the domain of federated search.

In this project, you will use pre-trained language models and further train those models for a (federated) search task. You will evaluate the quality of those models as part of international evaluation conferences like the Text Retrieval Conference (TREC) and the Conference and Labs of the Evaluation Forum (CLEF).
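To give a concrete feel for the fine-tuning recipe sketched above, here is a minimal, illustrative example using the Hugging Face transformers library: a pre-trained BERT model is trained further to score query-document pairs. The dataset, labels, and hyperparameters are placeholder assumptions, not project specifics.

    # Minimal sketch: fine-tune a pre-trained BERT model to classify
    # query-document pairs as relevant or not. Toy data, illustrative only.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)  # labels: relevant / not relevant
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # hypothetical training pairs: (query, document text, relevance label)
    pairs = [
        ("federated search", "Resource descriptions are sampled subsets ...", 1),
        ("federated search", "A recipe for apple pie with cinnamon ...", 0),
    ]

    model.train()
    for query, doc, label in pairs:
        inputs = tokenizer(query, doc, return_tensors="pt", truncation=True)
        loss = model(**inputs, labels=torch.tensor([label])).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

The same pre-trained weights can also be adapted on in-domain text first (further pre-training) before this task-specific step, which is one way to tackle the sparse resource descriptions mentioned above.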

[more information]

Predicting Semantic Labels of Text Regions in Heterogeneous Document Images

by Somtochukwu Enendu, Johannes Scholtes, Jeroen Smeets, Djoerd Hiemstra, and Mariët Theune

This paper describes the use of sequence labeling methods in predicting the semantic labels of extracted text regions of heterogeneous electronic documents, by utilizing features related to each semantic label. In this study, we construct a novel dataset consisting of real-world documents from multiple domains. We test the performance of the methods on the dataset and offer a novel investigation into the influence of textual features on performance across multiple domains. The results of the experiments show that the neural network method slightly outperforms the Conditional Random Field method with limited training data available. Regarding generalizability, our experiments show that the inclusion of textual features aids performance improvements.
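As a rough illustration of the sequence labeling set-up (not the paper's actual features or labels), each text region can be encoded as a feature dictionary and each document as a sequence of regions; a Conditional Random Field then labels the whole sequence jointly. The sketch below uses the sklearn-crfsuite package with invented layout features:

    # Minimal sketch: label the text regions of a document as one sequence.
    # Features and labels are invented for illustration.
    import sklearn_crfsuite

    X_train = [[  # one document = one sequence of region feature dicts
        {"font_size": 18.0, "bold": True,  "y_pos": 0.05},
        {"font_size": 10.0, "bold": False, "y_pos": 0.15},
        {"font_size": 10.0, "bold": False, "y_pos": 0.40},
    ]]
    y_train = [["title", "author", "paragraph"]]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
    crf.fit(X_train, y_train)
    print(crf.predict(X_train))  # e.g. [['title', 'author', 'paragraph']]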

Presented at the Conference on Natural Language Processing (“Konferenz zur Verarbeitung natürlicher Sprache”, KONVENS) on 9–11 October in Nürnberg, Germany

[download pdf]

Ties de Kock graduates on visualization recommendation

Visualization recommendation in a natural setting

by Ties de Kock

Data visualization is often the first step in data analysis. However, creating visualizations is hard: it depends on knowledge about the data as well as design knowledge. While more and more data is becoming available, appropriate visualizations are needed to explore this data and extract information. Knowledge of design guidelines is needed to create useful visualizations that are easy to understand and communicate information effectively.
Visualization recommendation systems support an analyst in choosing an appropriate visualization by providing visualizations generated from design guidelines implemented as (design) rules. Finding these visualizations is a non-convex optimization problem in which design rules are often mutually exclusive: for example, on a scatter plot the axes can often be swapped, yet it is conventional to put time on the x-axis.
We propose a system where design rules are implemented as hard criteria, and heuristics as soft criteria that do not all need to be satisfied but guide the system toward effective chart designs. We implement this approach in a visualization recommendation system named OVERLOOK, modeled as an optimization problem implemented with the Z3 Satisfiability Modulo Theories solver. Solving this multi-objective optimization problem yields a Pareto front of visualizations balancing the heuristics, of which the top results were evaluated in a user study using an evaluation scale for the quality of visualizations as well as the low-level component tasks for which they can be used. In this evaluation, we did not find a difference in performance between OVERLOOK and a baseline of manually created visualizations for the same datasets.
We demonstrated OVERLOOK, a system that creates visualization prototypes based on formal rules and ranks them using the scores from both hard and soft criteria. The visualizations from OVERLOOK were evaluated for quality in a user study, and we showed that the system can be used in a realistic setting. The results point to future work on learning weights for partial scores, given a low-level component task, based on the human quality annotations for generated visualizations.
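To make the hard/soft distinction concrete, here is a minimal sketch using the Z3 Python bindings; the design rules below are invented stand-ins for the far richer rules in the thesis:

    # Minimal sketch: hard design rules vs. weighted soft heuristics in Z3.
    from z3 import Optimize, Bool, And, Not, sat

    x_time = Bool("x_time")  # time mapped to the x-axis
    y_time = Bool("y_time")  # time mapped to the y-axis

    opt = Optimize()
    # hard rule: time cannot be on both axes at once
    opt.add(Not(And(x_time, y_time)))
    # soft heuristic (weight 2): prefer time on the x-axis
    opt.add_soft(x_time, weight=2)
    # soft heuristic (weight 1): prefer keeping the y-axis free of time
    opt.add_soft(Not(y_time), weight=1)

    if opt.check() == sat:
        print(opt.model())  # a design satisfying all hard rules while
                            # maximising the weight of satisfied heuristics

Hard criteria prune invalid designs outright, while the soft-criteria weights induce a ranking over the remaining candidates.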

[download pdf]

Somto Enendu graduates cum laude on labelling document images

Predicting Semantic Labels of Text Regions in Heterogeneous Document Images

by Somtochukwu Enendu

This MSc thesis describes the use of sequence labeling methods in predicting the semantic labels of extracted text regions of heterogeneous electronic documents, by utilizing features related to each semantic label. In this study, we construct a novel dataset consisting of real-world documents from multiple domains. We test the performance of the methods on the dataset and offer a novel investigation into the influence of textual features on performance across multiple domains. The results of the experiments show that the Conditional Random Field method is robust, outperforming the neural network when limited training data is available. Regarding generalizability, our experiments show that the inclusion of textual features does not guarantee performance improvements.

[download pdf]

FFORT: A benchmark suite for fault tree analysis

by Enno Ruijters, Carlos Budde, Muhammad Nakhaee, Mariëlle Stoelinga, Doina Bucur, Djoerd Hiemstra, and Stefano Schivo

This paper presents FFORT (the Fault tree FOResT): a large, diverse, extendable, and open benchmark suite consisting of fault tree models, together with relevant metadata. Fault trees are a common formalism in reliability engineering, and the FFORT benchmark brings together a large and representative suite of fault tree models. The benchmark provides each fault tree model in the standard Galileo format, together with references to its origin, and a textual and/or graphical description of the tree. This includes quantitative information such as failure rates, and the results of quantitative analyses of standard reliability metrics, such as the system reliability, availability, and mean time to failure. Thus, the FFORT benchmark provides: (1) examples of how fault trees are used in various domains; (2) a large class of tree models to evaluate fault tree methods and tools; (3) results of analyses to compare newly developed methods with the benchmark results. Currently, the benchmark suite contains 202 fault tree models of great diversity in terms of size, type, and application domain. The benchmark offers statistics on several relevant model features, indicating e.g. how often such features occur in the benchmark, as well as search facilities for fault tree models with the desired features. In addition to the trees already collected, the website provides a user-friendly submission page, allowing the general public to contribute more fault trees and/or analysis results with new methods. Thereby, we aim to provide an open-access, representative collection of fault trees at the state of the art in modeling and analysis.
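As a small illustration of the quantitative analyses mentioned above (with invented rates and structure, not taken from the benchmark), the failure probability of a static fault tree with independent, exponentially distributed basic events can be computed bottom-up:

    # Minimal sketch: unreliability at time t of a toy static fault tree
    # with independent basic events; rates and structure are made up.
    import math

    def be(rate, t):
        # failure probability of a basic event with constant failure rate
        return 1.0 - math.exp(-rate * t)

    def and_gate(*p):   # gate fails only if all children fail
        prod = 1.0
        for q in p:
            prod *= q
        return prod

    def or_gate(*p):    # gate fails if any child fails
        surv = 1.0
        for q in p:
            surv *= (1.0 - q)
        return 1.0 - surv

    t = 1000.0  # hours
    # toplevel = OR(AND(pump_1, pump_2), valve)
    top = or_gate(and_gate(be(1e-4, t), be(1e-4, t)), be(1e-5, t))
    print(f"system unreliability at t={t:.0f}h: {top:.6f}")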

Presented at the 29th European Safety and Reliability Conference (ESREL 2019) in Hannover, Germany

[download pdf]

Welcome to Information Modelling and Databases

We are excited to announce a new course setup for the topics Information Modelling and Databases, which will be combined this year into a single new course. We will consider design practices and tools that are relevant for the entire software system’s life cycle. We will study how to accurately model a system by understanding the domain under consideration, specifying the boundaries of the domain, identifying the relevant concepts in the domain and their relationships, and specifying the rules or constraints on the behaviour of those concepts. We will use relational database technology — one of the most successful inventions in computer science — to implement the data part of our model. Relational databases are great tools for storing large amounts of data persistently, and they allow efficient, safe, multi-user access to that data. We will study SQL to formulate and answer complex questions on the data in a declarative way, as the small example below illustrates. The course consists of lectures and practical assignments. More information will be published shortly on Brightspace.
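Here is the flavour of declarative querying the course covers, run through Python's built-in sqlite3 module; the table and data are made up for the example:

    # Minimal sketch: a declarative SQL query on an invented student table.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE student (name TEXT, programme TEXT, grade REAL)")
    con.executemany("INSERT INTO student VALUES (?, ?, ?)",
                    [("Alice", "CS", 8.5), ("Bob", "AI", 7.0), ("Carol", "CS", 9.0)])

    # state *what* you want (programmes averaging above 7.5),
    # not *how* to compute it
    query = """SELECT programme, AVG(grade)
               FROM student
               GROUP BY programme
               HAVING AVG(grade) > 7.5"""
    for row in con.execute(query):
        print(row)  # -> ('CS', 8.75); AI is filtered out by HAVING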

Wishing you a fruitful, interesting course,
Patrick van Bommel and Djoerd Hiemstra.

ECIR 2019 proceedings online

by Leif Azzopardi, Benno Stein, Norbert Fuhr, Philipp Mayr, Claudia Hauff, and Djoerd Hiemstra

The 41st European Conference on Information Retrieval (ECIR) was held in Cologne, Germany, during April 14–18, 2019, and brought together hundreds of researchers from Europe and abroad. The conference was organized by GESIS–Leibniz Institute for the Social Sciences and the University of Duisburg-Essen, in cooperation with the British Computer Society’s Information Retrieval Specialist Group (BCS-IRSG). These proceedings contain the papers, presentations, workshops, and tutorials given during the conference. This year the ECIR 2019 program boasted a variety of novel work from contributors from all around the world and provided new platforms for promoting information retrieval (IR) related activities from the CLEF Initiative. In total, 365 submissions were fielded across the tracks from 50 different countries.
The final program included 39 full papers (23% acceptance rate), 44 short papers (29% acceptance rate), eight demonstration papers (67% acceptance rate), nine reproducibility full papers (75% acceptance rate), and eight invited CLEF papers. All submissions were peer reviewed by at least three international Program Committee members to ensure that only submissions of the highest quality were included in the final program. As part of the reviewing process we also provided more detailed review forms and guidelines to help reviewers identify common errors in IR experimentation as a way to help ensure consistency and quality across the reviews.
The accepted papers cover the state of the art in IR: evaluation, deep learning, dialogue and conversational approaches, diversity, knowledge graphs, recommender systems, retrieval methods, user behavior, topic modelling, etc., and also include novel application areas beyond traditional text and Web documents such as the processing and retrieval of narrative histories, images, jobs, biodiversity, medical text, and math. The program boasted a high proportion of papers with students as first authors, as well as papers from a variety of universities, research institutes, and commercial organizations.
In addition to the papers, the program also included two keynotes, four tutorials, four workshops, a doctoral consortium, and an industry day. The first keynote, On Entities and Evaluation, was presented by this year’s BCS IRSG Karen Spärck Jones Award winner, Prof. Krisztian Balog; the second keynote, On Ranking People, was presented by Prof. Markus Strohmaier. The tutorials covered a range of topics from conducting lab-based experiments and statistical analysis to categorization and deep learning, while the workshops brought together participants to discuss algorithm selection (AMIR), narrative extraction (Text2Story), bibliometrics (BIR), as well as social media personalization and search (SoMePeAS). As part of this year’s ECIR we also introduced a new CLEF session to enable CLEF organizers to report on and promote their upcoming tracks. In sum, this added to the success and diversity of ECIR and helped build bridges between communities.
The success of ECIR 2019 would not have been possible without all the help from the team of volunteers and reviewers. We wish to thank all our track chairs for coordinating the different tracks, along with the teams of meta-reviewers and reviewers who helped ensure the high quality of the program. We also wish to thank the demo chairs: Christina Lioma and Dagmar Kern; student mentorship chairs: Ahmet Aker and Laura Dietz; doctoral consortium chairs: Ahmet Aker, Dimitar Dimitrov and Zeljko Carevic; workshop chairs: Diane Kelly and Andreas Rauber; tutorial chairs: Guillaume Cabanac and Suzan Verberne; industry chair: Udo Kruschwitz; publicity chair: Ingo Frommholz; and sponsorship chairs: Jochen L. Leidner and Karam Abdulahhad. We would like to thank our webmaster, Sascha Schüller, and our local chair, Nina Dietzel, along with all the student volunteers who helped to create an excellent online and offline experience for participants and attendees.

Published as: Advances in Information Retrieval. Proceedings of the 41st European Conference on Information Retrieval Research (ECIR), Lecture Notes in Computer Science, volumes 11437 and 11438, Springer, 2019
[Part I] [Part II]