networkit: growing open-source toolkit for large-scale network analysis

NetworKit

NetworKit is an open-source tool suite for high-performance network analysis. It aims to provide tools for analyzing large networks, ranging in size from thousands to billions of edges. To this end, it implements efficient graph algorithms that compute standard measures of network analysis, many of them parallel in order to exploit multicore architectures. NetworKit focuses on scalability and comprehensiveness. It also serves as a testbed for algorithm engineering and contains novel algorithms from recently published research (see list of publications below).

NetworKit is a Python module. Its high-performance algorithms are written in C++ and exposed to Python via the Cython toolchain. Python, in turn, lets us work interactively and offers a rich environment of tools for data analysis and scientific computing. Furthermore, NetworKit’s core can be built and used as a native library if needed.

Design Goals and Principles

A variety of software packages provide graph algorithms in general and network analysis capabilities in particular. NetworKit, however, aims to balance a specific combination of strengths:

  • Performance: Algorithms and data structures are selected and implemented with high performance and parallelism in mind. Some implementations are among the fastest in published research. For example, community detection in a 3.3 billion edge web graph can be performed on a 16-core server in less than three minutes.
  • Usability: Networks are as diverse as the questions we might ask of them, e.g. what is the largest connected component, what are the most central nodes in it, and how do they connect to each other? A practical tool for network analysis should therefore provide modular functions that do not restrict the user to predefined workflows. An interactive shell, which the Python language provides, is one prerequisite for that. While NetworKit works with the standard Python 3 interpreter, calling the module from the IPython shell or the Jupyter Notebook HTML interface allows us to integrate it into a fully-fledged computing environment for scientific workflows, from data preparation to creating figures. It is also easy to set up and control a remote compute server.
  • Integration: As a Python module, NetworKit can be seamlessly integrated with Python libraries for scientific computing and data analysis, e.g. pandas for data frame processing and analytics, matplotlib for plotting, or numpy and scipy for numerical and scientific computing. For certain tasks, we provide interfaces to external tools, e.g. Gephi for graph visualization.
  • Design Principles: Our main focus is on scalable algorithms that support network analysis on massive networks. Several algorithmic and implementation patterns are used to achieve this goal: parallelism; fast heuristics and approximation algorithms for problems that are otherwise not solvable in nearly-linear time; efficient data structures; and modular software design.
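
The chain of questions in the Usability point (largest connected component, most central nodes in it, how they connect) can be sketched as a few small, composable functions. The following is a minimal illustration in plain Python using only the standard library; it does not use NetworKit's API, and the function names, the toy network, and the choice of degree as the centrality measure are all assumptions made for this example. NetworKit's own implementations are written in C++ and intended for far larger graphs.

```python
from collections import deque


def connected_components(adj):
    """Return a list of components (each a set of nodes), found by BFS."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        components.append(comp)
    return components


def top_by_degree(adj, nodes, k):
    """Rank `nodes` by degree (a simple centrality measure), highest first.

    Ties are broken by node id so the result is deterministic.
    """
    return sorted(nodes, key=lambda u: (-len(adj[u]), u))[:k]


# Hypothetical toy network stored as an undirected adjacency list.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (4, 5)]
adj = {u: set() for u in range(7)}  # node 6 stays isolated
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Question 1: what is the largest connected component?
components = connected_components(adj)
largest = max(components, key=len)

# Question 2: what are the most central nodes in it?
central = top_by_degree(adj, largest, k=2)

# Question 3: how do those nodes connect to each other?
links = [(u, v) for u in central for v in central if u < v and v in adj[u]]
```

Each step takes the previous step's result as ordinary data, so the questions can be reordered or swapped out (for example, a different centrality measure) without committing to a predefined workflow.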

Install & Use

Copyright (c) 2013 Christian Staudt, Henning Meyerhenke