Candy Reebroek graduates on engagement behavior in online brand communities

Understanding engagement behavior in online brand communities : how social identity relates to frequency of interaction and tweet sentiment.

by Candy Reebroek

This study explains engagement behavior in online brand communities based on data from Twitter users who present different types of social identities. We examined fifteen online brand communities that are popular on Twitter and originate from the fashion, fast-food, gaming, car, and sports sectors. In total, 27,143 Twitter messages from 22,333 unique Twitter users were analyzed. We used the users’ profile descriptions to classify their social identity with the help of computational methods such as Machine Learning and Natural Language Processing. To study the engagement behavior of the Twitter users, we calculated the tweet sentiment and the frequency of interaction between Twitter users and online brand communities. We found that tweet sentiment and frequency of interaction vary significantly between different social identity groups when mentioning different online brand communities. This result helps online brand community managers understand what kind of Twitter users interact with their online brand community and how these users engage with the community. Right now, they might only investigate user demographics without considering the users’ self-presentation online. Furthermore, we made a theoretical contribution by including a larger dataset, by applying computational methods, and by exploring multiple online brand communities from different sectors.
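As a rough illustration of the two engagement measures mentioned above, the sketch below aggregates tweets per social identity group and brand community. The record layout, the example values, and the definition of frequency of interaction as tweets per unique user are assumptions made for this example; the thesis may define and compute these measures differently.

```python
from collections import defaultdict


def summarize_engagement(tweets):
    """Aggregate engagement per (identity group, brand) pair.

    `tweets` is an iterable of (user, identity_group, brand, sentiment)
    tuples. This layout is hypothetical and chosen for illustration only.
    """
    tweet_counts = defaultdict(int)
    users = defaultdict(set)
    sentiment_sums = defaultdict(float)

    for user, group, brand, sentiment in tweets:
        key = (group, brand)
        tweet_counts[key] += 1
        users[key].add(user)
        sentiment_sums[key] += sentiment

    return {
        key: {
            # Assumed definition: average number of tweets per unique user.
            "tweets_per_user": tweet_counts[key] / len(users[key]),
            "mean_sentiment": sentiment_sums[key] / tweet_counts[key],
        }
        for key in tweet_counts
    }


# Made-up example records, not data from the study.
example = [
    ("u1", "occupation", "brandX", 0.5),
    ("u1", "occupation", "brandX", -0.5),
    ("u2", "occupation", "brandX", 1.0),
    ("u3", "hobby", "brandX", 0.25),
]
summary = summarize_engagement(example)
```

Comparing such per-group summaries across brands is only a first descriptive step; testing whether the differences are significant requires a statistical test on top of it.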

[download pdf]

The role of Online Identity on Donations

The role of Online Identity on Donations to Nonprofit Organizations in Online Health Campaigns

by Anna Priante, Ariana Need, Tijs van den Broek, and Djoerd Hiemstra

Nonprofit organizations widely use social media to mobilize people for social causes and encourage participation in collective action, such as advocacy campaigns. However, little is known about the micro-level mechanisms that drive individual mobilization outcomes that require a substantial effort in participation, such as collecting donations during advocacy campaigns. By answering the call to combine motivational and structural factors that explain the mechanisms driving people’s engagement in collective action via social media, we focus on the role of online social identity as a motivator to engage in campaigns, and on individual network positions as opportunity structures for online mobilization. Using the 2014 US Movember health movement campaign on Twitter as an empirical context, we adopt a multi-method approach combining Natural Language Processing, social network analysis and multivariate regression analysis to investigate the effects of online social identity and structural network position on the amount of donations collected for medical research during the campaign. We find that only social identities related to occupations and professions have significant effects on the amount of collected donations, whereas network position matters when movement members are central in the communication process because they connect different cohesive subgroups, or communities of the network, characterized by the prevalence of weak ties. We show the importance of integrating the study of identity and network to advance our understanding of online micro-mobilization dynamics. This study contributes to research at the intersection of the non-profit sector, social movements, media and communication, and health fundraising.

To be presented at the 78th Annual Meeting of the Academy of Management on 14 August 2018 in Chicago, USA

Tweeting about my moustache

How Online Identity influences Collected Donations in Online Health Campaigns

by Anna Priante, Michel Ehrenhard, Tijs van den Broek, Ariana Need, and Djoerd Hiemstra

Health advocacy organizations increasingly use social media to engage people in fundraising campaigns for medical research, such as cancer prevention. However, little is known about the effectiveness of online health campaigns and the psychosocial mechanisms that drive people’s voluntary engagement to collect money for medical research. By using identity-based motivation theory from social psychology, we focus on campaign participants’ online occupational identity, such as being a doctor, and how it provides motivation to collect donations. We investigate the mechanisms, such as fundraisers’ Twitter activity as a cognitive process and their central network positions in online communication, that mediate the relationship between identity and donations.

We adopt a multi-method approach combining automatic text analysis, Natural Language Processing from computational linguistics, social network analysis and multivariate regression analysis. Using the 2014 US Movember health movement campaign on Twitter as an empirical context, we find that when people are engaged in health fundraising on Twitter, their success depends on the extent to which they act in occupational identity-congruent ways. In addition, we find that fundraisers’ Twitter activity as a sense-making, cognitive process – and not their central positions in online communication – mediates the relation between identity and donations.

We show the importance of integrating both people’s social identification and cognitive processes into theory and research for a better understanding of how occupational identity matters in online health campaigns. This study offers contributions to research at the intersection of health advocacy, social media use, and, more broadly, online social movements. We conclude by discussing the practical implications of these findings for health advocacy organizations.

To be presented at the 113th Annual Meeting of the American Sociological Association (ASA 2018) on 11-14 August 2018 in Philadelphia, USA.

UT Mastodon now live for all students, alumni and employees

The University of Twente is the first Dutch university to run its own Mastodon server. Mastodon is a social network based on open web protocols and free, open-source software. It is decentralized like e-mail. Learning from failures of other networks, Mastodon aims to make ethical design choices to combat the misuse of social media. By joining U. Twente Mastodon, you join a global social network with more than a million people. The university will not sell your data, nor show you advertisements. Mastodon U. Twente is available to all students, alumni, and employees.

Join Mastodon U. Twente now

Christel Geurts graduates on Cross-Domain Authorship Attribution

Cross-Domain Authorship Attribution as a Tool for Digital Investigations

by Christel Geurts

On the dark web, sites promoting illegal content are abundant and new sites are constantly created. At the same time, law enforcement is working hard to take these sites down and track down the persons involved. Often, after a site is taken down, users change their name and move to a different site. But what if law enforcement could track users across sites? Each site or source of information is called a domain. As the domain changes, the context of a message often changes as well, making it challenging to track users simply by the words they use. The aim of this thesis is to develop a system that can link texts written by the same author in a cross-domain setting. The system was tested on a blog corpus and verified on police data. Tests show that multinomial logistic regression and Support Vector Machines with a linear kernel perform well. Character 3-grams work well as features, and combining multiple feature sets increases performance. Logistic Regression models with a combined feature set performed best: accuracy = 0.717 and MRR = 0.7785 for 1000 authors on the blog corpus. On the police data, the Logistic Regression model had an accuracy of 0.612 and an MRR of 0.6883 for 521 authors.
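To make the feature type and the two reported metrics concrete, here is a minimal sketch of character 3-gram counting, accuracy, and mean reciprocal rank (MRR). It assumes that the system returns, for each text, a list of candidate authors ranked from most to least likely; the function names and the toy rankings are hypothetical and not taken from the thesis.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (3-grams by default)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def accuracy(rankings, true_authors):
    """Fraction of texts whose top-ranked candidate is the true author."""
    hits = sum(1 for ranked, truth in zip(rankings, true_authors)
               if ranked and ranked[0] == truth)
    return hits / len(true_authors)


def mean_reciprocal_rank(rankings, true_authors):
    """Average of 1/rank of the true author; 0 if the author is not listed."""
    total = 0.0
    for ranked, truth in zip(rankings, true_authors):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)
    return total / len(true_authors)


# Toy rankings for three texts that were all written by author "a".
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
truths = ["a", "a", "a"]

features = char_ngrams("banana")
acc = accuracy(rankings, truths)
mrr = mean_reciprocal_rank(rankings, truths)
```

MRR is always at least as high as accuracy, because it also gives partial credit when the true author appears lower in the ranking. That is consistent with the reported MRR values exceeding the accuracy values.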

Slavica Zivanovic graduates on capturing and mapping QOL using Twitter data

by Slavica Zivanovic

There is an ongoing discussion about the applicability of social media data in scientific research. Moreover, little is known about the feasibility of using these data to capture Quality of Life (QoL). This study explores the use of social media in QoL research by capturing and analysing people’s perceptions of their QoL using Twitter messages. The methodology is based on a mixed-method approach, combining manual coding of the messages, automated classification, and spatial analysis. The city of Bristol is used as a case study, with a dataset containing 1,374,706 geotagged Tweets sent within the city boundaries in 2013. Based on the manual coding results, the health, transport, and environment domains were selected for further analysis. Results show the differences between Bristol wards in the number and type of QoL perceptions in every domain, the spatial distribution of positive and negative perceptions, and the differences between the domains. Furthermore, results from this study are compared to the official QoL survey results from Bristol, statistically and spatially. Overall, three main conclusions are underlined. First, Twitter data can be used to evaluate QoL. Second, based on people’s opinions, there is a difference in QoL between Bristol neighbourhoods. And third, Twitter messages can be used to complement QoL surveys but not as a proxy. The main contribution of this study is in recognising the potential Twitter data have in QoL research. This potential lies in producing additional knowledge about QoL that can be placed in a planning context and effectively used to improve the decision-making process and enhance the quality of life of residents.

[download pdf]

Marco Schultewolter graduates on Verification of User Information

by Marco Schultewolter

Often, software providers ask users to enter personal data in order to grant them the right to use their software. These companies want the user profile to be as correct as possible, but users sometimes enter incorrect information. This thesis researches and discusses approaches to automatically verify this information using third-party web resources.
To this end, a series of experiments was conducted. One experiment compares different similarity measures for several search approaches, using a German phone book directory. Another experiment uses a search engine without a specific predefined data source; for this, ways of finding persons in search engines and of extracting address information from unknown websites are compared.
It is shown that automatic verification can be done to some extent. The verification of name and address data using external web resources can support the decision with Jaro-Winkler as similarity measure, but it is still not solid enough to rely on exclusively. Extracting address information from unknown pages is very reliable when using a sophisticated regular expression. Finding persons on the internet should be done by using just the full name without any additions.
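For readers unfamiliar with the Jaro-Winkler measure mentioned above, the following is a minimal sketch of its textbook definition, using the commonly cited defaults of a prefix scale of 0.1 and a maximum prefix length of 4. It is not the implementation used in the thesis, and any decision threshold applied to the score would be an additional assumption.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        start = max(0, i - window)
        end = min(len2, i + window + 1)
        for j in range(start, end):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity with a bonus for a shared prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


# Classic textbook pair with two transposed letters.
score = jaro_winkler("MARTHA", "MARHTA")
```

The prefix bonus makes the measure tolerant of typos near the end of a name, which is one plausible reason it suits name and address matching. Inputs would normally be normalized first, for example by lowercasing and trimming whitespace.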

[download pdf]

IPython Notebook Exercises for Web Science

Check out the Jupyter IPython Notebook Exercises made for the module Web Science. The exercises closely follow the exercises from Chapters 13 and 14 of the wonderful Networks, Crowds, and Markets: Reasoning About a Highly Connected World by David Easley and Jon Kleinberg. Download the notebooks here:

Graph Update (February 2016). The notebooks with answers are now available below:

Maurice Bolhuis graduates on Estimating Creditworthiness using Uncertain Online Data

Estimating Creditworthiness using Uncertain Online Data

by Maurice Bolhuis

The rules for credit lenders have become stricter since the financial crisis of 2007-2008. As a consequence, it has become more difficult for companies to obtain a loan. Many people and companies leave a trail of information about themselves on the Internet. Searching and extracting this information is accompanied by uncertainty. In this research, we study whether this uncertain online information can be used as an alternative or extra indicator for estimating a company’s creditworthiness and how accounting for information uncertainty impacts the prediction performance.
A data set consisting of 3579 corporate ratings has been constructed using the data of an external data provider. Based on the results of a survey, a literature study and information availability tests, LinkedIn accounts of company owners, corporate Twitter accounts and corporate Facebook accounts were chosen as information sources for extracting indicators. In total, the Twitter and Facebook accounts of 387 companies and 436 corresponding LinkedIn owner accounts of this data set were manually searched. Information was harvested from these sources and several indicators were derived from the harvested information.
Two experiments were performed with this data. In the first experiment, Naive Bayes, J48, Random Forest and Support Vector Machine classifiers were trained and tested using solely these Internet features. A comparison of their accuracy to the 31% accuracy of the ZeroR classifier, which as a rule always predicts the most frequent target class, showed that none of the models performed statistically better. In a second experiment, it was tested whether combining Internet features with financial data increases the accuracy. A financial data mining model was created that approximates the rating model of the ratings in our data set and that uses the same financial data as the rating model. The two best performing financial models were built using the Random Forest and J48 classifiers, with an accuracy of 68% and 63% respectively. Adding Internet features to these models gave mixed results, with a significant decrease and an insignificant increase respectively.
An experimental setup for testing how incorporating uncertainty affects the prediction accuracy of our model is explained. As part of this setup, a search system is described to find candidate results of online information related to a subject and to classify the degree of uncertainty of this online information. It is illustrated how uncertainty can be incorporated into the data mining process.
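The ZeroR baseline used in the first experiment can be sketched in a few lines. The rating labels below are made up for illustration and are not taken from the data set; the function names are likewise hypothetical.

```python
from collections import Counter


def zero_r_fit(train_labels):
    """Return the most frequent class in the training labels."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predictions, labels):
    """Fraction of predictions that equal the true label."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical rating classes, for illustration only.
train = ["A", "B", "B", "C"]
test = ["B", "A", "B", "C", "B"]

majority = zero_r_fit(train)
predictions = [majority] * len(test)  # ZeroR ignores all features
baseline_accuracy = accuracy(predictions, test)
```

Because ZeroR ignores the features entirely, its accuracy equals the share of the majority class in the evaluation data. A model that does not beat this baseline has not learned anything useful from its features.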

[download pdf]