DB guest lecture by Hannes Mühleisen

We are proud to announce that Hannes Mühleisen will give a guest lecture on Tuesday 10 December at 15:30h. in EOS N 01.630 for the course Information Modelling and Databases. Hannes Mühleisen is professor of Data Engineering at Radboud University, the creator of DuckDB and co-founder and CEO of DuckDB Labs. Students of the course use DuckDB to practice their SQL skills.

Analytical Query Processing and the DuckDB System

by Hannes Mühleisen

DBMSs have historically been created to support transactional (OLTP) workloads. However, a second use case, analytical data processing (OLAP), quickly appeared. These workloads are characterised by complex, relatively long-running queries that process significant portions of the stored dataset, for example aggregations over entire tables or joins between several large tables. It is nearly impossible for an OLTP-focused DBMS to perform well in OLAP scenarios, which is why specialised systems have been developed. In this lecture, I will introduce analytical query processing, give an overview of the state of the art in research and industry, and describe our own analytical DBMS, DuckDB.

Introducing Zoekeend

We made a little tool for running information retrieval experiments using DuckDB which we appropriately called Zoekeend (Dutch for “search duck”). Zoekeend will be presented at DuckCon #6 in Amsterdam on 31 January 2025.

I will present several reproduced experiments, such as ranking using (small) language models, imports of indexes in the common index file format (CIFF), and the CIFF tokenizer based on tokenizers of large language models, all elegantly defined as SQL queries. I will further present ongoing work on new types of indexes for search engines, such as the score-fitted index, the constant length index and the term-grouped index, all of which would be extremely cumbersome to implement in existing search engines like Lucene, but can be easily defined as SQL queries in DuckDB. Zoekeend will greatly simplify information retrieval experimentation. Zoekeend is open source and available from: https://gitlab.science.ru.nl/informagus/zoekeend/

Welcome to Databases

Welcome to Information Modelling & Databases, Part B, Databases! We will resume Tuesday 5 November with a lecture at 15:30h. in EOS N 01.630.

The Databases part contains mandatory, individual quizzes, for which the following honour code applies:

  • You do not share the solutions;
  • The solutions to the quizzes should be your own work;
  • You do not post the quizzes, nor the solutions anywhere online;
  • You do not use instruction-tuned large language models like GitHub Copilot or ChatGPT;
  • You are allowed, and encouraged, to discuss the quizzes, and to ask clarifying questions to your fellow students. Please use the Brightspace Discussion Forum to reach out to me, the teaching assistants and your fellow students.

New this year are the optional SQL Mastery Assignments for students who want to go the extra mile. Students who successfully submit solutions to the SQL Mastery Assignments get free travel to, and participation in, DuckCon #6 in Amsterdam on 31 January 2025!

Also this year, we will experiment with a new automatic grader called Socoles that will automatically give feedback on open questions that require SQL solutions. Socoles is developed by Benard Wanjiru. Socoles helps us grade the assignments for more than 300 students in the course. Of course, you will get human feedback too, during the tutorials on Friday mornings.

Wishing you a fruitful Part B!
Best wishes,  Djoerd Hiemstra

Sensitivity of Automated SQL Grading in Computer Science Courses

by Benard Wanjiru, Patrick van Bommel, and Djoerd Hiemstra

Previous research has primarily relied on fixed procedures when implementing partial grading systems. As a result, the sensitivity of such systems in terms of error analysis becomes inflexible as well. In this paper, we employ a software correctness model that allows for a dynamic and flexible approach for adjusting the sensitivity of a grading system based on the user’s needs and goals. We show how partial grading can be used to award fair grades and also categorize students into groups based on their strengths and weaknesses observed in their answers. Furthermore, we show how the sensitivity of a grading system can be varied to allow such grouping. To illustrate this, we analysed more than 2000 answers for 6 SQL programming assignments. An implication of this study is that instructors can carry out more effective partial grading of SQL queries as well as adjust learning material based on the needs of a particular group of students. They can address the observed limitations, thereby bridging the gap between high-performing students and those that require additional attention.

To be presented at the third International Conference on Innovations in Computing Research (ICR) on August 12–14, 2024 in Athens, Greece.

[download pdf]

Towards a Generic Model for Classifying Software into Correctness Levels and its Application to SQL

by Benard Wanjiru, Patrick van Bommel, and Djoerd Hiemstra

Automated grading systems can save a lot of time when grading software exercises. In this paper, we present our ongoing work on a generic model for generating software correctness levels. These correctness levels enable partial grades of students’ software exercises. The generic model can be used as a foundation for correctness of SQL queries and can be generalized to different programming languages.

To be presented at the SEENG 2023 Workshop on Software Engineering for the Next Generation of the 45th International Conference on Software Engineering on Tuesday 16 May in Melbourne, Australia.

[download pdf]

Guest lecture by Hannes Mühleisen

We are proud to announce that Hannes Mühleisen will give a guest lecture on Tuesday 13 December at 13:30h. in LIN-2 for the course Information Modelling and Databases. Hannes Mühleisen is the creator of DuckDB and co-founder and CEO of DuckDB Labs. He is also a senior researcher of the Database Architectures group at the Centrum Wiskunde & Informatica (CWI) in Amsterdam. Students of the course use DuckDB to practice their SQL skills.

Analytical Query Processing and the DuckDB System

by Hannes Mühleisen

DBMSs have historically been created to support transactional (OLTP) workloads. However, a second use case, analytical data processing (OLAP), quickly appeared. These workloads are characterised by complex, relatively long-running queries that process significant portions of the stored dataset, for example aggregations over entire tables or joins between several large tables. It is nearly impossible for an OLTP-focused DBMS to perform well in OLAP scenarios, which is why specialised systems have been developed. In this lecture, I will introduce analytical query processing, give an overview of the state of the art in research and industry, and describe our own analytical DBMS, DuckDB.

PhD vacancy for software correctness

We’re hiring a PhD Candidate for Software Correctness. The work will target the correctness of high-level programming languages that are “only” strings in your host language, such as SQL and regular expressions.

Software has shaped almost every aspect of our modern lives. Ensuring that software is correct is both a major scientific challenge and an enterprise with enormous social relevance. Would you like to examine possibilities to introduce a theory for correctness levels for software? Then you have a part to play as a PhD Candidate.

The correctness of software is of major importance in computer science. Unfortunately, the significance of software correctness is not always clear. Furthermore, the automatic checking of software correctness is difficult. This leads to problems during system development projects and during the grading of software exercises.

This PhD candidate position is intended for four years. You understand the importance of correct software and know how to work with several meanings of software correctness. Your goal is to introduce a generic and formal theory of software correctness levels, in which partially correct/incorrect software can be handled in a flexible way.

You will put the generic theory into practice by experimenting with automatic grading of software exercises in the context of our courses. One application will be dealing with automatic grading of SQL statements. You will be supervised by Patrick van Bommel and Djoerd Hiemstra.

Profile:

  • You are an enthusiastic and motivated researcher.
  • You should have a Master’s degree in computer science, or a Master’s degree in mathematics and a demonstrable interest in computer science.

[Apply On-line]

(Deadline: 6 March 2022)

Welcome to Databases

Welcome to IM&DB Part B, Databases. We will resume Monday 2 November with online pre-recorded lectures and the Live Session on Zoom at 15:30h. The Databases part contains mandatory, individual quizzes, for which the following rules apply:

  • You do not share the solutions;
  • The solutions to the quizzes should be your own work;
  • You do not post the quizzes, nor the solutions anywhere online;
  • You are allowed, and encouraged, to discuss the quizzes, and to ask clarifying questions to your fellow students. Please use the Brightspace Discussion Forum to reach out to your fellow students.

Wishing you a fruitful Part B!
Best wishes,  Djoerd Hiemstra

Guest lecture by Arjen de Vries

On Tuesday 17 December at 8:30h. in SP-2, Prof. Arjen de Vries will give a guest lecture on column-oriented relational database management systems (DBMS). A column-oriented DBMS (or column store) is a DBMS that physically stores tables by column rather than by row. In previous lectures we have been mostly concerned with Online Transaction Processing (OLTP) workloads, with lots of small inserts and lots of queries over parts of the data. Column stores, however, are well suited to Online Analytical Processing (OLAP) workloads, which involve complex analytical queries over all data.
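To make the difference between the two layouts concrete, here is a minimal Python sketch. It is not taken from the lecture: the toy table, its column names and all function names are hypothetical, and plain Python lists stand in for the physical storage of a real DBMS. It only illustrates why an aggregate over one column touches less data in a column layout.

```python
# Hypothetical toy table; names and values are for illustration only.
rows = [
    {"id": 1, "name": "Ada", "amount": 10},
    {"id": 2, "name": "Bob", "amount": 20},
    {"id": 3, "name": "Cy", "amount": 30},
]


def total_row_store(table):
    """Row layout: every whole record is visited to read one field."""
    return sum(record["amount"] for record in table)


def to_column_store(table):
    """Regroup the records so that each column is stored as one list."""
    columns = {}
    for record in table:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def total_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    return sum(columns["amount"])


columns = to_column_store(rows)
print(total_row_store(rows), total_column_store(columns))
```

Both functions return the same total. The column version never reads the id or name values, which is roughly the property that helps analytical queries over large tables.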

Attendance at this lecture is highly recommended.

Honor Code Databases

Welcome to the Databases part of the course. We will resume Tuesday 5 November with the introductory lecture in SP 2 at 8:30h. The Databases part contains individual quizzes (which are mandatory) and assignments (which are optional, but give a bonus for the end grade), for which the following rules apply:

  • You do not share the solutions of the quizzes and assignments;
  • The solutions to the quizzes and assignments should be your own work;
  • You do not post the assignments, nor the solutions anywhere online;
  • You are allowed, and encouraged, to discuss the quizzes and assignments, and to ask clarifying questions to your fellow students. Please use the Brightspace Discussion Forum to reach out to your fellow students.