AGAN workshop

About

The analysis of large data networks, or graphs, is paramount for the understanding of complex phenomena in human interactions, information exchange, and every activity that involve connections among humans, objects, or concepts. The goal of this workshop is to ease the access of such graphs to users that are not data specialists. Data accessibility, targeted by this workshop, is a transversal problem that requires:

database research on specialized data structures to store data into the computer memory,
efficient algorithms with quality and time guarantees to achieve fast data access,
machine learning models to reorganize the data structures adaptively and predict the next user search on the data.

This fertile research network will form the basis of methods for the study of a new generation of data systems, data structures, and algorithms that adapt to the user needs and the constant changes in the data.

Venue

The workshop will be held at the Department of Computer Science at Aarhus University.
Building 5335, Room 395

Keynote Speakers

Sayan Ranu

IIT Delhi - India

Graph Generative Modeling

Abstract: Graph generative models have been extensively studied in the literature. While traditional techniques are based on generating structures that adhere to a pre-decided distribution, recent techniques have shifted towards learning this distribution directly from the data. While learning-based approaches have imparted significant improvement in quality, some limitations remain to be addressed. First, learning graph distributions introduces additional computational overhead, which limits their scalability to large graph databases. Second, many techniques only learn the structure and do not address the need to learn additional semantics such as node and edge labels, temporal constraints, etc., which encode important semantic information and influence the structure itself. Third, existing techniques need large volumes of data to effectively learn the underlying graph distribution and such large-scale data may not always be available. In this talk, we will discuss strategies to address these limitations.

Learning Graph Similarity via Graph Neural Networks

Abstract: Similarity search in graph databases is one of the most fundamental operations in graph analytics. Among various distance functions, graph and subgraph edit distances (GED and SED respectively) are two of the most popular and expressive measures. Unfortunately, exact computations for both are NP-hard. To overcome this computational bottleneck, neural approaches to learn and predict edit distance in polynomial time have received much interest. While considerable progress has been made, there exist limitations that need to be addressed. First, the efficacy of an approximate distance function lies not only in its approximation accuracy, but also in the preservation of its properties. To elaborate, although GED is a metric, its neural approximations do not provide such a guarantee. This prohibits their usage in higher order tasks that rely on metric distance functions, such as clustering or indexing. Second, several existing frameworks for GED do not extend to SED due to SED being asymmetric. In this work, we design a novel siamese graph neural network called Greed, which through a carefully crafted inductive bias, learns GED and SED in a property-preserving manner. Through extensive experiments across 10 real graph datasets containing up to 7 million edges, we establish that Greed is not only more accurate than the state of the art, but also up to 3 orders of magnitude faster. Even more significantly, due to preserving the triangle inequality, the generated embeddings are indexable and consequently, even in a CPU-only environment, Greed is up to 50 times faster than GPU-powered computations of the closest baseline.

Yuya Sasaki

Osaka University - Japan

Path search on graphs

Abstract: Path searches are one of the fundamental searches in graphs, which find matched paths from source and target vertices. Several works address to discover effective path searches and their efficient algorithms. In this talk, I will present my studies related to path searches: indexing for conjunctive path queries, sequenced route query, and network reliability. The conjunctive path queries find the paths that are matched with path patterns that include path navigation, cycles, and conjunction. The sequence route query finds the shortest route through user-specified categories of vertices, for example, paths through restaurant, bar, and theater. The network reliability aims to compute the probability that the given paths are connected by paths on uncertain graphs. In this talk, I will present my works to efficiently solve their path searches and their extension. In more concretely, I talk about indexing for conjunctive path queries, efficient algorithms for extended sequenced route query, called SkySR queries, and sampling methods to efficiently and accurately compute network reliability.

On discovering discriminatory bias in graphs

Abstract: Fairness is an important keyword for recent machine learning studies. Current studies mainly focus on tabular data and/or natural languages. My research interest is fairness in graph data, for example, knowledge graphs and social networks. There are yet a few works to address fairness on graphs. I will introduce the concept of fairness in machine learning, and then present why graph fairness is important for us. I also explain discriminatory biases in wikidata, which is the largest public knowledge graph, and then methods to find biases on graphs.

Hiroaki Shiokawa

University of Tsukuba - Japan

Graph-based Clustering at Scale

Abstract: Affinity Propagation (AP) is a fundamental algorithm to identify clusters included in data objects. Given similarities among objects, it iteratively performs message updates between all data object pairs until convergence. Although AP yields a higher clustering quality compared with other methods, it is computationally expensive. Hence, it has difficulty handling massive datasets that include numerous data objects. This is because the message updates require a quadratic cost of the number of data objects. In this talk, we discuss a novel algorithm published in AAAI 2021 for handling large datasets; we propose a fast affinity propagation algorithm, ScaleAP, which outputs the same clusters as AP but within a shorter computation time. ScaleAP dynamically excludes unnecessary message updates without sacrificing its clustering accuracy. Our extensive evaluations demonstrate that ScaleAP outperforms existing AP algorithms in terms of running time by up to two orders of magnitude.

Fast Similarity Search for Large Knowledge Graphs

Abstract: Entity search on knowledge graphs is now an essential data resource to develop practical AI models. However, it is expensive to find essential entities that are well relevant to user-specified queries. In this talk, we discuss a fast similarity search algorithm based on ObjectRank, a well-established technique for managing heterogeneous graphs. Given user-specified query nodes in a heterogeneous graph, ObjectRank extracts the most relevant nodes by evaluating node-importances against the query node. This talk presents a novel algorithm SchemaRank, which detects the exact top-k important nodes for a given query within a short running time. SchemaRank dynamically excludes unpromising nodes and edges, ensuring that it detects the same top-k important nodes as ObjectRank. Our extensive evaluations demonstrate that the running time of SchemaRank outperforms existing methods by up to two orders of magnitude.

Andreas Pavlogiannis

Aarhus University - Denmark

Evolutionary Graph Theory: Analysis and Optimization

Abstract: Diffusion theory studies the spread of novel traits (social memes, influence, atomic spins, genetic mutations, etc) in a population of individuals (robots, people, atoms, genotypes, etc), and has far reaching applications in diverse domains, such as physics, biology and sociology.

In this talk we will introduce the Moran and Voter diffusion models, which are standard processes for modeling the stochastic spread of genetic mutations in a structured population, where the structure is represented as a graph (network). The success of trait spread is measured in terms of its fixation probability, i.e., the probability that the novel mutation takes over the population. We will sketch the diverse effects that population structure can have on the fixation probability and fixation time, and present amplifiers, which are graphs with the remarkable property of significantly increasing the fixation probability, thereby amplifying the effect of natural selection. We will conclude with some optimization problems, which are concerned with engineering the network so as to maximize fixation probability.

Social Balance on Networks: Local Minima and Best Edge Dynamics

Abstract: Structural balance theory is an established framework for studying social relationships of friendship and enmity. These relationships are modeled by a signed network whose energy potential measures the level of imbalance, while stochastic dynamics drives the network towards a state of minimum energy that captures social balance. It is known that this energy landscape has local minima that can trap socially-aware dynamics, preventing it from reaching balance. In this talk we will highlight some robustness and attractor properties of these local minima. Motivated by these anomalies, we introduce Best Edge Dynamics (BED), a new plausible stochastic process that is guaranteed to always reach balance, and that it does so fast in various interesting settings.

Amit Boyarski

Technion University - Israel

Geometry in numerical algorithms.

Abstract: A common paradigm in engineering consists of mathematical modeling followed by numerical optimization. Over the years, a chasm has formed between the two stages: On the one hand, models are being developed with ever growing complexity in order to account for intricate data irregularities. On the other hand, the numerical optimization is being casually delegated to an external "general purpose" solver, often lacking a broad view about the problem at hand.

Attempting to reconcile between the two stages, we explore problems in which there exists some underlying geometric or topological structure. Such structure naturally exists in shape processing applications where it is explicitly encoded in the form of a discretized manifold. Popular examples include correspondence and reconstruction of non-rigid shapes. In other fields the geometry may be latent but can be inferred from data. For example, in recommender systems it is common to assume a relational structure between different users and different items that can be encoded ex-ante in the form of graphs.

We propose a methodological approach that links between the computational structures arising within common numerical algorithms, and simple dynamical processes on the underlying geometric structures. This gives rise to emergent models, often very simple, exhibiting dramatic improvements in run times, robustness, accuracy, ease of implementation, and many other aspects.

Spectral Subgraph Localization

Abstract: Several graph mining problems are based on some variant of the subgraph isomorphism problem: Given two graphs, G and Q, does G contain a subgraph isomorphic to Q? As this problem is NP hard, many methods avoid addressing it explicitly. In this talk, we will propose a method that attempts to solve the problem by localizing, i.e., finding the position of Q in G, through solving an inverse eigenvalue problem with structural constraints. We demonstrate that our spectral approach outperforms state-of-the-art localization and subgraph matching methods on both real and synthetic graphs. Joint work with Judith Hermanns ,Davide Mottin, Panagiotis Karras and Alex Bronstein.

Program

Monday June 27

SESSION 1 - Graph generation and consumption
09:15 - 09:30	Workshop Opening - Welcome
09:30 - 10:15	Keynote 1: Sayan Ranu Graph Generative Modeling
10:15 - 10:30	Coffee Break
10:30 - 11:15	Keynote 2: Amit Boyarski Spectral Subgraph Localization
11:15 - 12:00	Panagiotis Karras - SIEVE: A Space-Efficient Algorithm for Viterbi Decoding
12:00 - 13:00	Lunch break
SESSION 2 - Graph mining
13:00 - 13:45	Keynote 3: Yuya Sasaki Path search on graphs
13:45 - 14:00	Discussion
14:00 - 14:45	Keynote 4: Hiroaki Shiokawa Graph-based Clustering at Scale
14:45 - 15:00	Coffee Break
15:00 - 15:45	Davide Mottin - Towards Robust Exploration of Knoweledge Graphs
15:45 - 16:30	Round table and brainstorming.
19:00 - 22:00	Casual dinner at the Aarhus Street Food.

Tueday June 28

SESSION 3 - Social concerns
09:15 - 09:30	Workshop Opening - Welcome
09:30 - 10:15	Keynote 5: Yuya Sasaki On discovering discriminatory bias in graphs
10:15 - 10:30	Coffee Break
10:30 - 11:15	Keynote 6: Andreas Pavlogiannis Social Balance on Networks: Local Minima and Best Edge Dynamics
11:15 - 12:00	Fatemeh Zardbani - Data management & analysis techniques for scientific data exploration Petros Petsinis - Identifying High-Coverage Communities in Edge-Weighted Networks
12:00 - 13:00	Lunch break
SESSION 4 - Dynamic processes
13:00 - 13:45	Keynote 7: Andreas Pavlogiannis Evolutionary Graph Theory: Analysis and Optimization
13:45 - 14:00	Discussion
14:00 - 14:45	Keynote 8: Amit Boyarski Geometry in numerical algorithms
14:45 - 15:00	Coffee Break
15:00 - 15:45	Judith Hermanns - Spectral graph alignment
15:45 - 16:30	Round table and brainstorming.
19:00 - 22:00	Gala dinner

Wednesday June 29

SESSION 5 - Similarity learning and search
09:15 - 09:30	Workshop Opening - Welcome
09:30 - 10:15	Keynote 9: Sayan Ranu Learning Graph Similarity via Graph Neural Networks
10:15 - 10:30	Coffee Break
10:30 - 11:15	Keynote 10: Hiroaki Shiokawa Fast Similarity Search for Large Knowledge Graphs
11:15 - 12:00	Kasper Overgaard Mortensen - Efficient Adversarial Multi-Armed Bandits
12:00 - 13:00	Lunch break