Scientific and economic progress is increasingly powered by our capabilities to explore big datasets. Data is the driving force behind the successful innovation of Internet companies like Google, Twitter, and Yahoo, and job advertisements show an increasing need for data scientists and big data analysts. Data scientists dig for value in data by analyzing, for instance, texts, application usage logs, and sensor data. The need for data scientists and big data analysts is apparent in almost every sector of our society, including business, health care, and education.
The Twente Center for Data Science is a collaboration between research groups of the University of Twente to research, promote and facilitate big data analysis for all scientific disciplines. The center operates by the participants sharing their expertise, sharing their contacts, sharing their data, and sharing their research infrastructure (hardware and software) for large-scale data analysis.
The Twente Data Science Center offers a unique combination of expertise in computer science, mathematics, management, behavioral sciences and social sciences; collaborations with leading international companies such as Google, Twitter and Yahoo; and local infrastructure and support for the analysis of very large datasets.
The Norvig Web Data Science Award is organized by Common Crawl and SURFsara for researchers and students in the Benelux. SURFsara provides free access to their Hadoop cluster with a copy of the full Common Crawl web crawl from March 2014 – almost 3 billion web pages. Participants are completely free in choosing their research question. For example, last year there were submissions looking at concept association, connections between languages, readability and more. Be creative and think outside the box!
The award is named after Peter Norvig, Director of Research at Google, who chairs the jury that will select the winning submission. The contest will run until July 31, 2014. The winning team will be announced at the award ceremony in September 2014 and will receive a tablet, a smart watch, and a GitHub small plan for a year.
We are very proud that Ravi Kumar from Google agreed to give a keynote speech at the CTIT Symposium on Big Data and the Emergence of Data Science. Kumar, who is well known for his work on web and data mining and algorithms for large data sets, has been a senior staff research scientist at Google since June 2012. Prior to this, he was a research staff member at the IBM Almaden Research Center and a principal research scientist at Yahoo! Research. He obtained his Ph.D. in Computer Science from Cornell University in 1998. Ravi Kumar's talk will cover two non-conventional computational models for analyzing big data. The first is data streams: in this model, data arrives in a stream and the algorithm is tasked with computing a function of the data without explicitly storing it. The second is map-reduce: in this model, data is distributed across many machines and computation is done as a sequence of map and reduce operations. Kumar will present a few algorithms in these models and discuss their scalability.
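To give a feel for the two models, here is a minimal Python sketch. It does not reproduce the algorithms from the talk; the choice of examples is our own assumption. For the streaming model it uses the Misra-Gries frequent-items summary, which processes one item at a time and keeps at most k - 1 counters instead of storing the stream. For map-reduce it uses the textbook word count, with a small in-memory shuffle function standing in for what a framework such as Hadoop would do across machines. All function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def misra_gries(stream, k):
    """Streaming model: summarize frequent items with at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n is
    guaranteed to survive in the returned dict; its stored count is a lower
    bound on its true frequency.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(misra_gries(["a", "b", "a", "c", "a", "a", "d", "a"], k=2))
    print(word_count(["big data", "data science"]))
```

In a real deployment the map and reduce functions would run in parallel on different machines; only the sequential structure of the computation is shown here.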
The workshop takes place on Tuesday 4 June at the University of Twente. Other invited speakers at the CTIT symposium are Maarten de Rijke (U. Amsterdam) and Milan Petkovic (Philips).
We are very excited to announce the winners of the Norvig Web Data Science Award: Lesley Wevers, Oliver Jundt, and Wanno Drijfhout from the University of Twente! The Norvig Web Data Science Award was created by Common Crawl and SURFsara to encourage research in web data science and named in honor of distinguished computer scientist Peter Norvig.
There were many excellent submissions that demonstrated how you can extract valuable insight and knowledge from web crawl data. Be sure to check out the work of the winning team, Traitor – Associating Concepts Using The World Wide Web, and the other finalists on the award website. You will find descriptions of the projects as well as links to the code that was used. We hope that these projects will serve as an inspiration for what kind of work can be done with the Common Crawl corpus. All code is open source and we are looking forward to seeing it reused and adapted for other projects.
Together with Common Crawl and SARA, we invite students and researchers studying at or employed by research institutes or universities in the Netherlands to dive into the Common Crawl web corpus using the SARA Hadoop service. The best submission will receive the Norvig Web Data Science Award, a tablet, and 1500 Euro to spend on travel, accommodation, and the conference registration fee for SIGIR 2013, to be held in Dublin, Ireland.
The award is named after Peter Norvig, Google's director of research with a resume too impressive to summarize. Peter is on the advisory board of Common Crawl, and is chair of the jury for this award. Other jury members are Ricardo Baeza-Yates (Yahoo!), Hilary Mason (bit.ly), Jimmy Lin (University of Maryland), and Evert Lammerts (SARA).