Aarhus workshop on Graph Access and Analysis

June 27th - June 29th, 2022 — Aarhus University, Department of Computer Science

The analysis of large data networks, or graphs, is paramount for the understanding of complex phenomena in human interactions, information exchange, and every activity that involve connections among humans, objects, or concepts. The goal of this workshop is to ease the access of such graphs to users that are not data specialists. Data accessibility, targeted by this workshop, is a transversal problem that requires:

**database research**on specialized data structures to store data into the computer memory,**efficient algorithms**with quality and time guarantees to achieve fast data access,**machine learning**models to reorganize the data structures adaptively and predict the next user search on the data.

This fertile research network will form the basis of methods for the study of a new generation of data systems, data structures, and algorithms that adapt to the user needs and the constant changes in the data.

The workshop will be held at the Department of Computer Science at Aarhus University.

**Building 5335, Room 395**

IIT Delhi - India

**Abstract:**
Graph generative models have been extensively studied in the literature. While traditional techniques are based on generating structures that adhere to a pre-decided distribution, recent techniques have shifted towards learning this distribution directly from the data. While learning-based approaches have imparted significant improvement in quality, some limitations remain to be addressed. First, learning graph distributions introduces additional computational overhead, which limits their scalability to large graph databases. Second, many techniques only learn the structure and do not address the need to learn additional semantics such as node and edge labels, temporal constraints, etc., which encode important semantic information and influence the structure itself. Third, existing techniques need large volumes of data to effectively learn the underlying graph distribution and such large-scale data may not always be available. In this talk, we will discuss strategies to address these limitations.

**Abstract:**
Similarity search in graph databases is one of the most fundamental operations in graph analytics. Among various distance functions, graph and subgraph edit distances (GED and SED respectively) are two of the most popular and expressive measures. Unfortunately, exact computations for both are NP-hard. To overcome this computational bottleneck, neural approaches to learn and predict edit distance in polynomial time have received much interest. While considerable progress has been made, there exist limitations that need to be addressed. First, the efficacy of an approximate distance function lies not only in its approximation accuracy, but also in the preservation of its properties. To elaborate, although GED is a metric, its neural approximations do not provide such a guarantee. This prohibits their usage in higher order tasks that rely on metric distance functions, such as clustering or indexing. Second, several existing frameworks for GED do not extend to SED due to SED being asymmetric. In this work, we design a novel siamese graph neural network called Greed, which through a carefully crafted inductive bias, learns GED and SED in a property-preserving manner. Through extensive experiments across 10 real graph datasets containing up to 7 million edges, we establish that Greed is not only more accurate than the state of the art, but also up to 3 orders of magnitude faster. Even more significantly, due to preserving the triangle inequality, the generated embeddings are indexable and consequently, even in a CPU-only environment, Greed is up to 50 times faster than GPU-powered computations of the closest baseline.

Osaka University - Japan

**Abstract:**
Path searches are one of the fundamental searches in graphs, which
find matched paths from source and target vertices. Several works
address to discover effective path searches and their efficient
algorithms.
In this talk, I will present my studies related to path searches:
indexing for conjunctive path queries, sequenced route query, and
network reliability. The conjunctive path queries find the paths that
are matched with path patterns that include path navigation, cycles,
and conjunction.
The sequence route query finds the shortest route through
user-specified categories of vertices, for example, paths through
restaurant, bar, and theater. The network reliability aims to compute
the probability that the given paths are connected by paths on
uncertain graphs.
In this talk, I will present my works to efficiently solve their path
searches and their extension. In more concretely, I talk about
indexing for conjunctive path queries, efficient algorithms for
extended sequenced route query, called SkySR queries, and sampling
methods to efficiently and accurately compute network reliability.

**Abstract:**
Fairness is an important keyword for recent machine learning
studies. Current studies mainly focus on tabular data and/or natural
languages. My research interest is fairness in graph data, for
example, knowledge graphs and social networks. There are yet a few
works to address fairness on graphs. I will introduce the concept of
fairness in machine learning, and then present why graph fairness is
important for us. I also explain discriminatory biases in wikidata,
which is the largest public knowledge graph, and then methods to find
biases on graphs.

University of Tsukuba - Japan

**Abstract:**
Affinity Propagation (AP) is a fundamental algorithm to identify clusters included in data objects. Given similarities among objects, it iteratively performs message updates between all data object pairs until convergence. Although AP yields a higher clustering quality compared with other methods, it is computationally expensive. Hence, it has difficulty handling massive datasets that include numerous data objects. This is because the message updates require a quadratic cost of the number of data objects. In this talk, we discuss a novel algorithm published in AAAI 2021 for handling large datasets; we propose a fast affinity propagation algorithm, ScaleAP, which outputs the same clusters as AP but within a shorter computation time. ScaleAP dynamically excludes unnecessary message updates without sacrificing its clustering accuracy. Our extensive evaluations demonstrate that ScaleAP outperforms existing AP algorithms in terms of running time by up to two orders of magnitude.

**Abstract:**
Entity search on knowledge graphs is now an essential data resource to develop practical AI models. However, it is expensive to find essential entities that are well relevant to user-specified queries. In this talk, we discuss a fast similarity search algorithm based on ObjectRank, a well-established technique for managing heterogeneous graphs. Given user-specified query nodes in a heterogeneous graph, ObjectRank extracts the most relevant nodes by evaluating node-importances against the query node. This talk presents a novel algorithm SchemaRank, which detects the exact top-k important nodes for a given query within a short running time. SchemaRank dynamically excludes unpromising nodes and edges, ensuring that it detects the same top-k important nodes as ObjectRank. Our extensive evaluations demonstrate that the running time of SchemaRank outperforms existing methods by up to two orders of magnitude.

Aarhus University - Denmark

**Abstract:**
Diffusion theory studies the spread of novel traits (social memes, influence, atomic spins, genetic mutations, etc) in a population of individuals (robots, people, atoms, genotypes, etc), and has far reaching applications in diverse domains, such as physics, biology and sociology.

In this talk we will introduce the Moran and Voter diffusion models, which are standard processes for modeling the stochastic spread of genetic mutations in a structured population, where the structure is represented as a graph (network). The success of trait spread is measured in terms of its fixation probability, i.e., the probability that the novel mutation takes over the population. We will sketch the diverse effects that population structure can have on the fixation probability and fixation time, and present amplifiers, which are graphs with the remarkable property of significantly increasing the fixation probability, thereby amplifying the effect of natural selection. We will conclude with some optimization problems, which are concerned with engineering the network so as to maximize fixation probability.

**Abstract:**
Structural balance theory is an established framework for studying social relationships of friendship and enmity.
These relationships are modeled by a signed network whose energy potential measures the level of imbalance,
while stochastic dynamics drives the network towards a state of minimum energy that captures social balance.
It is known that this energy landscape has local minima that can trap socially-aware dynamics, preventing it from reaching balance.
In this talk we will highlight some robustness and attractor properties of these local minima.
Motivated by these anomalies, we introduce Best Edge Dynamics (BED), a new plausible stochastic process that is guaranteed to always reach balance, and that it does so fast in various interesting settings.

Technion University - Israel

**Abstract:**
A common paradigm in engineering consists of mathematical modeling followed by numerical optimization.
Over the years, a chasm has formed between the two stages: On the one hand, models are being developed with ever growing complexity in order to account for intricate data irregularities. On the other hand, the numerical optimization is being casually delegated to an external "general purpose" solver, often lacking a broad view about the problem at hand.

Attempting to reconcile between the two stages, we explore problems in which there exists some underlying geometric or topological structure. Such structure naturally exists in shape processing applications where it is explicitly encoded in the form of a discretized manifold. Popular examples include correspondence and reconstruction of non-rigid shapes. In other fields the geometry may be latent but can be inferred from data. For example, in recommender systems it is common to assume a relational structure between different users and different items that can be encoded ex-ante in the form of graphs.

We propose a methodological approach that links between the computational structures arising within common numerical algorithms, and simple dynamical processes on the underlying geometric structures. This gives rise to emergent models, often very simple, exhibiting dramatic improvements in run times, robustness, accuracy, ease of implementation, and many other aspects.

**Abstract:**
Several graph mining problems are based on some variant of the subgraph isomorphism problem: Given two graphs, G and Q, does G contain a subgraph isomorphic to Q? As this problem is NP hard, many methods avoid addressing it explicitly. In this talk, we will propose a method that attempts to solve the problem by localizing, i.e., finding the position of Q in G, through solving an inverse eigenvalue problem with structural constraints. We demonstrate that our spectral approach outperforms state-of-the-art localization and subgraph matching methods on both real and synthetic graphs.
Joint work with Judith Hermanns ,Davide Mottin, Panagiotis Karras and Alex Bronstein.

SESSION 1 - Graph generation and consumption | |

09:15 - 09:30 | Workshop Opening - Welcome |

09:30 - 10:15 | Keynote 1: Sayan RanuGraph Generative Modeling |

10:15 - 10:30 | Coffee Break |

10:30 - 11:15 | Keynote 2: Amit BoyarskiSpectral Subgraph Localization |

11:15 - 12:00 | Panagiotis Karras - SIEVE: A Space-Efficient Algorithm for Viterbi Decoding |

12:00 - 13:00 | Lunch break |

SESSION 2 - Graph mining | |

13:00 - 13:45 | Keynote 3: Yuya SasakiPath search on graphs |

13:45 - 14:00 | Discussion |

14:00 - 14:45 | Keynote 4: Hiroaki ShiokawaGraph-based Clustering at Scale |

14:45 - 15:00 | Coffee Break |

15:00 - 15:45 | Davide Mottin - Towards Robust Exploration of Knoweledge Graphs |

15:45 - 16:30 | Round table and brainstorming. |

19:00 - 22:00 | Casual dinner at the Aarhus Street Food. |

SESSION 3 - Social concerns | |

09:15 - 09:30 | Workshop Opening - Welcome |

09:30 - 10:15 | Keynote 5: Yuya SasakiOn discovering discriminatory bias in graphs |

10:15 - 10:30 | Coffee Break |

10:30 - 11:15 | Keynote 6: Andreas PavlogiannisSocial Balance on Networks: Local Minima and Best Edge Dynamics |

11:15 - 12:00 | Fatemeh Zardbani - Data management & analysis techniques for scientific data explorationPetros Petsinis - Identifying High-Coverage Communities in Edge-Weighted Networks |

12:00 - 13:00 | Lunch break |

SESSION 4 - Dynamic processes | |

13:00 - 13:45 | Keynote 7: Andreas PavlogiannisEvolutionary Graph Theory: Analysis and Optimization |

13:45 - 14:00 | Discussion |

14:00 - 14:45 | Keynote 8: Amit BoyarskiGeometry in numerical algorithms |

14:45 - 15:00 | Coffee Break |

15:00 - 15:45 | Judith Hermanns - Spectral graph alignment |

15:45 - 16:30 | Round table and brainstorming. |

19:00 - 22:00 | Gala dinner |

SESSION 5 - Similarity learning and search | |

09:15 - 09:30 | Workshop Opening - Welcome |

09:30 - 10:15 | Keynote 9: Sayan RanuLearning Graph Similarity via Graph Neural Networks |

10:15 - 10:30 | Coffee Break |

10:30 - 11:15 | Keynote 10: Hiroaki ShiokawaFast Similarity Search for Large Knowledge Graphs |

11:15 - 12:00 | Kasper Overgaard Mortensen - Efficient Adversarial Multi-Armed Bandits |

12:00 - 13:00 | Lunch break |

**Judith Hermanns**- Spectral Graph Alignment**Kasper Overgaard Mortensen**- Efficient Adversarial Multi-Armed Bandits**Petros Petsinis**- Aarhus University**Fatemeh Zardbani**- Aarhus University

Copyright © Davide Mottin