Dense Substructures in Dynamic Graph

From Large Scale Data Analysis Lab
Jump to: navigation, search

General Introduction

In many applications of graph algorithms including communication networks, social networks, graph changes over time due to addition or deletion of edges. Over the last two decades, there has been growing interest in such dynamically changing graphs. Many data structures and algorithmic techniques have been designed to solve a wide range of problems related to maintaining properties like connectivity, diameter, minimum spanning tree, shortest paths etc. in dynamic graph. However, much less work has been done in maintaining dense substructures in dynamic graph compared to static setting. The goal in this project is to design time and space efficient algorithms for maintaining dense substructures.

Dense Substructures

A dense substructure is a sub-graph in a graph where there are many edges. In other words, a densely connected sub-graph within a graph is called dense substructure. Among many different dense substructures, we focus on a fundamental dense substructure, maximal clique. A clique C in a graph G = (V,E) is a subset of V such that every pair of vertices in C are connected. C is maximal if it is not a proper subset of another clique C'.

Maximal Clique Maintenance

There have been many works on enumerating maximal cliques in static graph where the input graph is given. But, there have been relatively few works on maintaining maximal cliques in dynamic graph. In 2004, Stix first proposed an algorithm for maintaining maximal cliques in dynamic graph. In this work, the author considers both addition and deletion of edges. In 2010, Ottosen and Vomlel proposed another algorithm for clique maintenance, where only addition of edges have been considered. Both are sequential algorithms but does not provide any provable performance bounds.

In our work, we have studied this problem when the graph evolves due to addition of edges. We have designed practical and change sensitive algorithms for maintaining maximal cliques in dynamic graph which performs in many order or magnitude better in computation time than prior work in practice. We have experimentally evaluated our algorithms on many real world semi-synthetic data set.

Our Algorithms

When a new set of edges H is added to the graph G = (V,E), G is updated to G+H = (V,E∪H). then, a set of new maximal cliques are formed in G+H which subsumes a set of maximal cliques in G. For enumerating new maximal cliques, we have designed algorithm EnumNew which enumerates new maximal cliques by iterating over newly inserted edges. The time complexity of the algorithm is O(δ3|H||Λnew|) where δ is the maximum degree in G+H and new| is the number of new maximal cliques. Algorithm EnumNew is change sensitive as it enumerates all and only new maximal cliques, and we use change-sensitive algorithm for enumerating maximal cliques in static graph as subroutine for enumerating new maximal cliques.

For enumerating subsumed cliques we have designed three different algorithms. Algorithms EnumSubsumed1 and EnumSubsumed2 enumerates subsumed cliques from new maximal cliques whereas EnumSubsumed3 enumerates subsumed cliques by iterating over newly inserted edges. Theoretically, EnumSubsumed1 and EnumSubsumed2 are change sensitive while EnumSubsumed3 is not. In practice, EnumSubsumed2 performs better than other two in time, but space requirement is quite high and EnumSubsumed3 performs better than EnumSubsumed1 in time and space requirement is linear in the size of the graph. Overall, EnumSubsumed3 is good in practice.

So, for maintaining maximal cliques in dynamic graph, we have designed three algorithms :

  • SymDiff1 : EnumNew + EnumSubsumed1
  • SymDiff2 : EnumNew + EnumSubsumed2
  • SymDiff3 : EnumNew + EnumSubsumed3

Experiments

We have implemented all our algorithms in java and performed experiments on semi-synthetic datasets, where we have considered the real world graphs from Stanford Large Network Dataset Collection.

For each graph considered, we have removed edges from the graph with probability 0.9 and have used the deleted edges in random order to create the edge stream. We begin computation with the reduced graph. The reduced graph is stored in a text file in adjacency list format where each line starts with a vertex followed by the vertices adjacent to it. The deleted edges which we use to create edge stream is stored in a file where each line contains a pair of nodes representing an undirected edge. The set of maximal cliques of the reduced graph is stored as a text file where each line contains a set of vertices representing a maximal clique.

Data Sets

  • Experimental Results with random ordering of new edges
    • Computation time for total changes in each iteration with batch size 80000
web-google-9
cit-patents-9
as-skitter-9
wiki-talk-9
    • Distribution of computation time at each iteration with batch size 80000
web-google-9
cit-patents-9
as-skitter-9
wiki-talk-9
    • Division of cumulative time for computing new maximal cliques and subsumed cliques with as-skitter-9 graph
batch size = 10000
batch size = 20000
batch size = 40000
batch size = 80000
    • Computation time for total change vs. size of total change with batch size 80000
web-google-9
cit-patents-9
as-skitter-9
wiki-talk-9
    • Space cost for computing total change at each iteration with batch size 80000
web-google-9
cit-patents-9
as-skitter-9
wiki-talk-9
  • Experimental Results with new edges around 100 top degree vertices
    • Computation time for total changes in each iteration with batch size 80000
as-skitter-9
wiki-talk-9
    • Computation time for total change vs. size of total change with batch size 80000
as-skitter-9
wiki-talk-9
    • Space cost for computing total change at each iteration with batch size 80000
as-skitter-9
wiki-talk-9
  • Experimental Results with new edges around isolated vertices
    • Computation time for total changes in each iteration with batch size 80000
as-skitter-9
wiki-talk-9
    • Computation time for total change vs. size of total change with batch size 80000
as-skitter-9
wiki-talk-9
    • Space cost for computing total change at each iteration with batch size 80000
as-skitter-9
wiki-talk-9

Maximal Bi-clique Maintenance

There have been many works on enumerating maximal bi-cliques in static graph where the input graph is given. However,

Future Work