Dense Substructures in Dynamic Graph
Contents
General Introduction
In many applications of graph algorithms including communication networks, social networks, graph changes over time due to addition or deletion of edges. Over the last two decades, there has been growing interest in such dynamically changing graphs. Many data structures and algorithmic techniques have been designed to solve a wide range of problems related to maintaining properties like connectivity, diameter, minimum spanning tree, shortest paths etc. in dynamic graph. However, much less work has been done in maintaining dense substructures in dynamic graph compared to static setting. The goal in this project is to design time and space efficient algorithms for maintaining dense substructures.
Dense Substructures
A dense substructure is a sub-graph in a graph where there are many edges. In other words, a densely connected sub-graph within a graph is called dense substructure. Among many different dense substructures, we focus on a fundamental dense substructure, maximal clique. A clique C in a graph G = (V,E) is a subset of V such that every pair of vertices in C are connected. C is maximal if it is not a proper subset of another clique C'.
Maximal Clique Maintenance
There have been many works on enumerating maximal cliques in static graph where the input graph is given. But, there have been relatively few works on maintaining maximal cliques in dynamic graph. In 2004, Stix first proposed an algorithm for maintaining maximal cliques in dynamic graph. In this work, the author considers both addition and deletion of edges. In 2010, Ottosen and Vomlel proposed another algorithm for clique maintenance, where only addition of edges have been considered. Both are sequential algorithms but does not provide any provable performance bounds.
In our work, we have studied this problem when the graph evolves due to addition of edges. We have designed practical and change sensitive algorithms for maintaining maximal cliques in dynamic graph which performs in many order or magnitude better in computation time than prior work in practice. We have experimentally evaluated our algorithms on many real world semi-synthetic data set.
Our Algorithms
When a new set of edges H is added to the graph G = (V,E), G is updated to G+H = (V,E∪H). then, a set of new maximal cliques are formed in G+H which subsumes a set of maximal cliques in G. For enumerating new maximal cliques, we have designed algorithm EnumNew which enumerates new maximal cliques by iterating over newly inserted edges. The time complexity of the algorithm is O(δ^{3}|H||Λ^{new}|) where δ is the maximum degree in G+H and |Λ^{new}| is the number of new maximal cliques. Algorithm EnumNew is change sensitive as it enumerates all and only new maximal cliques, and we use change-sensitive algorithm for enumerating maximal cliques in static graph as subroutine for enumerating new maximal cliques.
For enumerating subsumed cliques we have designed three different algorithms. Algorithms EnumSubsumed_{1} and EnumSubsumed_{2} enumerates subsumed cliques from new maximal cliques whereas EnumSubsumed_{3} enumerates subsumed cliques by iterating over newly inserted edges. Theoretically, EnumSubsumed_{1} and EnumSubsumed_{2} are change sensitive while EnumSubsumed_{3} is not. In practice, EnumSubsumed_{2} performs better than other two in time, but space requirement is quite high and EnumSubsumed_{3} performs better than EnumSubsumed_{1} in time and space requirement is linear in the size of the graph. Overall, EnumSubsumed_{3} is good in practice.
So, for maintaining maximal cliques in dynamic graph, we have designed three algorithms :
- SymDiff_{1} : EnumNew + EnumSubsumed_{1}
- SymDiff_{2} : EnumNew + EnumSubsumed_{2}
- SymDiff_{3} : EnumNew + EnumSubsumed_{3}
Experiments
We have implemented all our algorithms in java and performed experiments on semi-synthetic datasets, where we have considered the real world graphs from Stanford Large Network Dataset Collection.
For each graph considered, we have removed edges from the graph with probability 0.9 and have used the deleted edges in random order to create the edge stream. We begin computation with the reduced graph. The reduced graph is stored in a text file in adjacency list format where each line starts with a vertex followed by the vertices adjacent to it. The deleted edges which we use to create edge stream is stored in a file where each line contains a pair of nodes representing an undirected edge. The set of maximal cliques of the reduced graph is stored as a text file where each line contains a set of vertices representing a maximal clique.
- Experimental Results with random ordering of new edges
- Computation time for total changes in each iteration with batch size 80000
- Distribution of computation time at each iteration with batch size 80000
- Division of cumulative time for computing new maximal cliques and subsumed cliques with as-skitter-9 graph
- Computation time for total change vs. size of total change with batch size 80000
- Space cost for computing total change at each iteration with batch size 80000
- Experimental Results with new edges around 100 top degree vertices
- Computation time for total changes in each iteration with batch size 80000
- Computation time for total change vs. size of total change with batch size 80000
- Space cost for computing total change at each iteration with batch size 80000
- Experimental Results with new edges around isolated vertices
- Computation time for total changes in each iteration with batch size 80000
- Computation time for total change vs. size of total change with batch size 80000
- Space cost for computing total change at each iteration with batch size 80000
Maximal Bi-clique Maintenance
There have been many works on enumerating maximal bi-cliques in static graph where the input graph is given. However,