Kam1n0-Community v2.2 releases: Assembly Analysis Platform

Assembly Analysis Platform

Kam1n0 v2.x is a scalable assembly management and analysis platform. It allows a user to first index a (large) collection of binaries into different repositories and then provides analytic services such as clone search. It supports multi-tenant access and management of assembly repositories through the concept of an Application. An application instance contains its own exclusive repository and provides a specialized analytic service. Considering the versatility of reverse engineering tasks, the Kam1n0 v2.x server currently provides three different types of clone-search applications: Asm-Clone, Sym1n0, and Asm2Vec. New application types can be added to the platform.

A user can create multiple application instances. An application instance can be shared among a specific group of users. The application owner controls read-write access to the application's repository as well as its on-off status. The Kam1n0 v2.x server can serve the applications concurrently using several shared resource pools.
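The sharing rules above can be sketched as a small data model. This is only an illustration, not Kam1n0's server code: the class name `Application`, its fields, and the rule that the owner keeps access while the application is switched off are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Application:
    """Hypothetical model of one application instance (illustrative names only)."""
    name: str
    app_type: str                 # e.g. "Asm-Clone", "Sym1n0", or "Asm2Vec"
    owner: str
    online: bool = True           # on-off status, controlled by the owner
    readers: set = field(default_factory=set)   # users with read-only access
    writers: set = field(default_factory=set)   # users with read-write access

    def can_read(self, user: str) -> bool:
        # Assumption: the owner can always read, even when the app is off.
        if user == self.owner:
            return True
        return self.online and (user in self.readers or user in self.writers)

    def can_write(self, user: str) -> bool:
        if user == self.owner:
            return True
        return self.online and user in self.writers

    def set_online(self, user: str, online: bool) -> None:
        # Only the owner may switch the application on or off.
        if user != self.owner:
            raise PermissionError(f"{user} does not own {self.name}")
        self.online = online


app = Application(name="firmware-repo", app_type="Asm2Vec", owner="alice")
app.writers.add("bob")     # bob may index new binaries
app.readers.add("carol")   # carol may only search
```

Here `app.can_write("carol")` is false while `app.can_read("carol")` is true, and both become false for shared users once the owner calls `app.set_online("alice", False)`.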

Kam1n0 was developed by Steven H. H. Ding and Miles Q. Li under the supervision of Benjamin C. M. Fung of the Data Mining and Security Lab at McGill University in Canada. It won the second prize at the Hex-Rays Plug-In Contest 2015. If you find Kam1n0 useful, please cite our papers:

  • S. H. H. Ding, B. C. M. Fung, and P. Charland. Kam1n0: MapReduce-based Assembly Clone Search for Reverse Engineering. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 461-470, San Francisco, CA: ACM Press, August 2016.
  • S. H. H. Ding, B. C. M. Fung, and P. Charland. Asm2Vec: boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. In Proceedings of the 40th IEEE Symposium on Security and Privacy (S&P), 18 pages, San Francisco, CA: IEEE Computer Society, May 2019.

Asm-Clone

Asm-Clone applications try to solve the efficient subgraph search problem (i.e., the subgraph isomorphism problem) for assembly functions (<1.3s average query time and <30ms average index time with 2.3M functions). Given a target function (the one on the left as shown below), it can identify the cloned subgraphs among other functions in the repository (the one on the right as shown below).

  • Application Type: Asm-Clone
  • The original clone search service used in Kam1n0 v1.x.
  • Currently supports Meta-PC, ARM, PowerPC, and TMS320c6 (experimental).
  • Supports subgraph clone search within a certain assembly code family.
    • + Good interpretability of the result: breaks down to subgraphs.
    • + Accurate for searching within the given code family.
    • + Good for diffing various patches or versions of big binaries.
    • – Relatively more sensitive to instruction set changes, optimizations, and obfuscation.
    • – Needs the syntax of the assembly code language to be pre-defined.
    • – Needs assembly code of the same chosen family in the repository.
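To make the idea of a cloned subgraph concrete, here is a toy sketch. It is not Kam1n0's actual indexing or search algorithm: the operand normalization, the Jaccard similarity on basic blocks, the 0.8 threshold, and the way matched block pairs are grouped are all assumptions chosen for illustration. A function is represented as a dictionary mapping a block id to a pair of (instructions, successor block ids).

```python
import re
from itertools import product


def normalize(instr):
    """Abstract away concrete constants and a few x86 registers (toy rule set)."""
    mnemonic, _, operands = instr.partition(" ")
    operands = re.sub(r"\b0x[0-9a-fA-F]+\b|\b\d+\b", "CONST", operands)
    operands = re.sub(r"\b(e?[abcd]x|e?[sd]i|e?[sb]p)\b", "REG", operands)
    return f"{mnemonic} {operands}".strip()


def block_similarity(block_a, block_b):
    """Jaccard similarity between the sets of normalized instructions."""
    set_a = {normalize(i) for i in block_a}
    set_b = {normalize(i) for i in block_b}
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def cloned_subgraphs(target, candidate, threshold=0.8):
    """Group similar block pairs that are connected in both control flow graphs."""
    pairs = [(t, c) for t, c in product(target, candidate)
             if block_similarity(target[t][0], candidate[c][0]) >= threshold]
    pair_set = set(pairs)

    # Two pairs are adjacent when both functions have the corresponding edge.
    adjacency = {p: set() for p in pairs}
    for t, c in pairs:
        for nxt in product(target[t][1], candidate[c][1]):
            if nxt in pair_set:
                adjacency[(t, c)].add(nxt)
                adjacency[nxt].add((t, c))

    # Each connected component of matched pairs is one cloned subgraph.
    seen, components = set(), []
    for start in pairs:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


target = {
    "t0": (["push ebp", "mov ebp, esp"], ["t1"]),
    "t1": (["mov eax, 0x10", "add eax, ebx"], []),
}
candidate = {
    "c0": (["push ebp", "mov ebp, esp"], ["c1"]),
    "c1": (["mov ecx, 0x20", "add ecx, edx"], ["c2"]),
    "c2": (["xor eax, eax", "ret"], []),
}
clones = cloned_subgraphs(target, candidate)
```

In this example the two blocks of the target match `c0` and `c1` of the candidate and form a single cloned subgraph, while `c2` stays unmatched. The pairwise comparison is quadratic in the number of blocks, so a repository of millions of functions would need an index rather than this brute-force loop.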

Sym1n0

Sym1n0 performs semantic clone search by differentiated fuzz testing and constraint solving. It is an efficient and scalable dynamic-static hybrid approach (<1s average query time and <100ms average index time with 1.5M functions). Given a target function (the one on the left as shown below), it can identify the cloned subgraphs among other functions in the repository (the one on the right as shown below). It also supports visualization of the abstract syntax graph.

  • Application Type: Sym1n0 (v2 only)
  • Clone search by both symbolic execution and concrete execution.
  • Differentiate functions based on their different I/O behavior.
  • Clone search conducted on the abstract syntax graph constructed from Vex IR (powered by LibVex).
    • + Clone search across different assembly code families.
      • For example, indexed x86 binaries but the query is ARM code.
    • + Subgraph clone search.
    • + Support a wide range of families through LibVex.
      • x86, AMD64, MIPS32, MIPS64, PowerPC32, PowerPC64, ARM32, and ARM64.
    • + An efficient dynamic-static hybrid approach.
    • + Ideal for analyzing firmware compiled for different processors.
    • – Sensitive to heavy graph manipulation (such as a full flattening).
    • – Sensitive to large scale breakdown of basic block integrity.
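The idea of differentiating functions by their I/O behavior can be illustrated with plain Python callables standing in for lifted assembly functions. This sketch covers only the concrete-execution half and leaves out symbolic execution, constraint solving, and Vex IR entirely; the helper names, the fixed probe inputs, and the number of random inputs are assumptions for the example, not Sym1n0's implementation.

```python
import random


def io_signature(func, inputs):
    """Run the function on every probe input and record the outputs."""
    return tuple(func(*args) for args in inputs)


def group_by_io_behavior(functions, num_random=32, arity=2, seed=0):
    """Bucket functions whose outputs agree on all probe inputs."""
    rng = random.Random(seed)
    # A few fixed edge cases plus random 32-bit signed values.
    inputs = [(0,) * arity, tuple(range(1, arity + 1))]
    inputs += [tuple(rng.randrange(-2**31, 2**31) for _ in range(arity))
               for _ in range(num_random)]

    groups = {}
    for name, func in functions.items():
        groups.setdefault(io_signature(func, inputs), []).append(name)
    return sorted(sorted(names) for names in groups.values())


functions = {
    "add_v1": lambda a, b: a + b,
    "add_v2": lambda a, b: a - (-b),          # different code, same behavior
    "max_v1": lambda a, b: a if a > b else b,
}
groups = group_by_io_behavior(functions)
```

Functions that land in the same bucket are only clone candidates: agreeing on sampled inputs does not prove equivalence, which is where a constraint solver would come in.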

Asm2Vec

Asm2Vec leverages representation learning. It understands the lexical-semantic relationship of assembly code. For example, xmm* registers are semantically related to vector operations such as addps, and memcpy is similar to strcpy. The graph below shows different assembly functions compiled from the same source code of gmpz_tdiv_r_2exp in libgmp. From left to right, the assembly functions are compiled with the GCC O0 option, the GCC O3 option, the O-LLVM obfuscator Control Flow Graph Flattening option, and the O-LLVM obfuscator Bogus Control Flow Graph option. Asm2Vec can statically identify them as clones.

  • Leverage representation learning.
  • Understand the lexical-semantic relationship of assembly code.
    • + State-of-the-art for clone search against heavy code obfuscation techniques.
      • (>0.8 accuracy for all options applied in O-LLVM, multiple iterations).
    • + State-of-the-art for clone search against code optimization.
      • (>0.8 accuracy between O0 and O3, >0.94 accuracy between O2 and O3)
    • + Even better result than the most recent dynamic approach.
    • + Much more efficient than recent dynamic approaches.
    • + Do not need to define the architecture. It self-learns by reading a large volume of code.
    • + Static approach: efficient and scalable.
    • – No subgraphs.
    • – Assume the assembly code comes from the same processor family.
    • – Static approach: cannot recognize jump table, etc.
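Once every function is represented as a vector, clone search reduces to nearest-neighbour ranking. The sketch below assumes the embeddings already exist and that candidates are ranked by cosine similarity; the three-dimensional vectors and function names are made up for illustration and are not output of a trained Asm2Vec model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_clones(query_vec, repository, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in repository.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical embeddings: two builds of the same function and an unrelated one.
repository = {
    "tdiv_r_2exp_O0": [0.9, 0.1, 0.2],
    "tdiv_r_2exp_O3": [0.8, 0.2, 0.1],
    "unrelated_func": [-0.1, 0.9, 0.3],
}
ranking = rank_clones([0.85, 0.15, 0.15], repository)
```

With these toy vectors the two builds of the same function score close to 1 and the unrelated function ranks last. Because each function is a single vector, the ranking says nothing about which subgraphs match, consistent with the "No subgraphs" limitation listed above.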

Platform Overview

The figure below shows the major UI components and functionalities of Kam1n0 v2.x. We adopt a material design. In general, each user has an application list, a running-job list, and a result file list.

  • Application list shows the application instances owned by the user and those shared by others.
  • Running-job list shows the progress of large queries (such as chrome.dll) and indexing procedures.
  • Result file list displays the saved results. More details of the UI design can be found in our detailed tutorial.

Changelog v2.2

  • UI updates for server and IDA Pro plug-in
  • Improve Documentation
  • Allow resetting parameters in the workbench
  • Allow appending files multiple times for indexing and search

Install && Use

Copyright 2015 McGill University