Jeroen Vonk graduates on Bisimulation reduction with MapReduce

by Jeroen Vonk

Within the field of Computer Science, a lot of past and current research is devoted to model checking. Model checking allows researchers to simulate a process or system and to exhaustively test it for wanted or unwanted properties. Naturally, the results of these tests are only as dependable as the model's representation of the actual system. The best model, then, would represent the system down to its last atom, allowing for every possible interaction with it. Such a model, of course, becomes extremely large, a situation known as state space explosion. Current research therefore focuses on:

  • Storing larger models
  • Processing large models faster and smarter
  • Reducing the size of models, whilst keeping the same properties

In this thesis we focus on reducing the size of models using bisimulation reduction. Bisimulation reduction identifies similar states that can be merged whilst preserving certain properties of the model. These similar, or redundant, states are identified by comparing them with other states in the model using a bisimulation relation: states that show the same behavior are related and can therefore be merged. A common method to compute the smallest equivalent model is partition refinement. In order to apply the algorithm to large models it needs to be scalable, so we use MapReduce, the framework for distributed processing that is part of Hadoop. This framework provides us with a robust system that automatically recovers from, for example, hardware faults. The use of MapReduce also makes our algorithm scalable and easy to execute on third-party clusters.
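The thesis itself describes a distributed implementation on Hadoop; purely as an illustration of the underlying idea, the following single-machine Python sketch shows signature-based partition refinement, a standard way to compute the coarsest strong bisimulation. The data layout and names are illustrative assumptions, not the thesis's code: in a MapReduce setting, each iteration would correspond to one or more jobs whose map phase computes a signature per state and whose reduce phase groups states with equal signatures into new blocks.

```python
from collections import defaultdict

# Illustrative sketch (not the thesis's implementation):
# 'transitions' is a list of (source, action, target) triples,
# 'partition' maps each state to its current block id.
def refine(states, transitions, partition):
    while True:
        # "Map" phase: each state's signature is the set of
        # (action, block of target) pairs reachable in one step.
        signatures = defaultdict(set)
        for src, action, tgt in transitions:
            signatures[src].add((action, partition[tgt]))
        # Key by (old block, signature) so blocks are only ever split, never merged.
        keys = {s: (partition[s], frozenset(signatures[s])) for s in states}

        # "Reduce" phase: group states with identical keys and assign new block ids.
        new_ids = {}
        new_partition = {}
        for s in states:
            new_partition[s] = new_ids.setdefault(keys[s], len(new_ids))

        # Fixpoint reached: no block was split in this iteration.
        if len(new_ids) == len(set(partition.values())):
            return new_partition
        partition = new_partition
```

Starting from the trivial partition (all states in one block, e.g. `partition = {s: 0 for s in states}`), the loop stops once no block is split any more; the number of blocks in the result is the size of the reduced model.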
During our experiments we observed that a MapReduce job has a relatively long execution time: we estimated a startup cost of circa 30 seconds per job. This means that reducing transition systems that need many iterations can take very long. Extreme cases such as vasy_40_60, which takes over 20,000 iterations, could therefore not be benchmarked within an acceptable time-frame. In each iteration all of our data is passed over disk, so it is not unreasonable to see a slowdown of a factor 10-100 compared to an MPI-based implementation (e.g. LTSmin). From our experiments we conclude that the individual iteration times of our algorithm scale linearly up to 10^8 transitions for strong bisimulation and 10^7 transitions for branching bisimulation. On larger models the iteration time increases exponentially, which is why we were not able to benchmark our largest model.
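To get a feel for what the per-job startup cost implies, a back-of-the-envelope calculation using only the figures quoted above shows why a 20,000-iteration case falls outside an acceptable time-frame:

```python
# Startup overhead alone, using the figures quoted above
# (one MapReduce job per iteration, ~30 s startup per job).
iterations = 20_000          # e.g. the vasy_40_60 case
startup_seconds = 30         # estimated startup cost per job
overhead = iterations * startup_seconds   # 600,000 s, before any actual work
print(f"{overhead / 3600:.0f} hours, i.e. about {overhead / 86400:.0f} days")
# -> ~167 hours, roughly a week of pure startup overhead
```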

[download pdf]