Davide Vega D'Aurelio

Foundations of Temporal Text Networks

Over the last few years, online social media have frequently been used to explain human social behavior in particular areas of interest. Examples include voting, political sentiment, psychological well-being and disaster response. The public accessibility of several sources allows us to observe these phenomena at various scales, from conversations among small groups of individuals to the effects on large communities. To cope with the complexity of online information, researchers have typically focused either on the topology of the interactions, as is common in Network Science, or on the text exchanged among individuals, using methods from Computational Linguistics. In both cases, time has also been taken into account, as in Temporal Networks or Temporal Information Retrieval. A few recent studies have begun to combine methods from multiple fields to exploit the richness of these sources, leading to new methodologies for analyzing complex online information. However, their validity and scope are mostly limited to answering specific questions, which prevents these methods from being applied to other domains. As a result, nearly every new problem in this context requires its own ad hoc models and/or methods. In this work, we introduce an attributed bipartite model for temporal text networks that enables a wide range of existing methods to be applied in this context. Our model can represent all the information contained in the aforementioned data sources, including different types of text interactions, such as direct messages exchanged between individuals, multicast messages targeting specific communities, and broadcast news. The aim of this talk is to provide a flexible framework for analyzing text networks while exploiting all the available data. To achieve this, we will first present a comparative analysis of several models for text networks, showing how all previous models can be reduced to our minimal model. 
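To make the bipartite idea concrete, the following is a minimal sketch, assuming that individuals and messages form the two vertex sets, that an individual-to-message edge means authorship, and that a message-to-individual edge means reception; attributes such as text and time are stored on the message vertices. The class and field names are hypothetical, and representing broadcast news as a message with no explicit recipients is an assumption made here, not necessarily the talk's convention.

```python
from dataclasses import dataclass, field


@dataclass
class TextNetwork:
    """Sketch of a directed, attributed bipartite graph (hypothetical encoding).

    One vertex set holds individuals, the other holds messages.
    """
    individuals: set = field(default_factory=set)
    messages: dict = field(default_factory=dict)  # message id -> attribute dict
    sent: set = field(default_factory=set)        # (individual, message) authorship edges
    received: set = field(default_factory=set)    # (message, individual) reception edges

    def add_message(self, msg_id, author, recipients, **attributes):
        """Add one message; its interaction type follows from its recipients."""
        self.messages[msg_id] = attributes
        self.individuals.add(author)
        self.individuals.update(recipients)
        self.sent.add((author, msg_id))
        self.received.update((msg_id, r) for r in recipients)


net = TextNetwork()
# Direct message: one author, one recipient.
net.add_message("m1", "alice", ["bob"], text="Any news on the sensor firmware?", time=1)
# Multicast: one author, a specific group of recipients (e.g. a community).
net.add_message("m2", "bob", ["alice", "carol", "dave"], text="Meetup on IoT security", time=2)
# Broadcast (assumption): no explicit recipient edges, anyone may read it.
net.add_message("m3", "news_bot", [], text="New LPWAN standard released", time=3)
```

Because the three interaction types differ only in their reception edges, the same structure covers all of them without special cases.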
This leads to a discussion of what we can, and need to, capture when analyzing this type of data. Then, we will introduce one of the many possible approaches to analyzing text networks, based on information discretization. In this approach, text messages are classified into a number of (possibly overlapping) groups based on their attributes. This classification can be exploited by projecting the original information into several derived types of networks, such as communication networks, user-annotated networks, topic-based multiplex networks and information propagation networks, for which analysis methods already exist. As a running example, we use the aforementioned model and methodology to investigate the existence of strong and deep communities of individuals discussing several topics in the IoT space, using a corpus of more than 200,000 tweets collected during June 2017. By using different types of projections, or by working directly with the original model, we can unify several types of text and structural analysis under the same framework. Examples include conversation retrieval and text message clustering.
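The discretize-then-project idea can be sketched as follows, assuming a keyword-based classifier (a hypothetical stand-in for any attribute-based rule) and the projection rule "add a directed edge from u to v when u authored a message that v received". Each topic group becomes one layer of a multiplex communication network. The toy data and the names `TOPICS`, `classify` and `project_multiplex` are illustrative only and do not come from the talk.

```python
from collections import defaultdict

# Hypothetical toy data: message id -> (author, recipients, text).
MESSAGES = {
    "m1": ("alice", ["bob"], "firmware update for the smart sensor"),
    "m2": ("bob", ["alice", "carol"], "#security audit of the smart home hub"),
    "m3": ("carol", ["alice"], "sensor data and security"),
}

# Hypothetical keyword rules; groups may overlap, so a message can match several topics.
TOPICS = {
    "sensors": {"sensor"},
    "security": {"security", "#security"},
}


def classify(text):
    """Return the (possibly empty, possibly multiple) topics a message belongs to."""
    words = set(text.lower().split())
    return {topic for topic, keywords in TOPICS.items() if words & keywords}


def project_multiplex(messages):
    """Project author -> message -> recipient paths onto one layer per topic.

    Each layer maps a directed pair (sender, receiver) to the number of
    messages on that topic sent from sender to receiver.
    """
    layers = defaultdict(lambda: defaultdict(int))
    for author, recipients, text in messages.values():
        for topic in classify(text):
            for r in recipients:
                layers[topic][(author, r)] += 1
    return layers


layers = project_multiplex(MESSAGES)
```

Here `m3` falls into both groups, so the pair `("carol", "alice")` appears in both layers. Skipping the classification step and using a single layer would give a plain communication network; community detection or other existing network methods can then run on each layer independently.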