Paper accepted at ASIA CCS'25
2 July 2025, by Mathias Fischer

Photo: ACM ASIA CCS
Our paper, “Enhancing Binary Code Similarity Analysis for Software Updates: A Contextual Diffing Framework”, was accepted for publication at the ACM ASIA Conference on Computer and Communications Security (ACM ASIA CCS 2025).
We will present our results at the conference in Hanoi, Vietnam, and look forward to exchanging ideas with international researchers.
Paper Abstract:
Software analysis and reverse engineering are essential for identifying vulnerabilities, analyzing malware, and modifying software. Rediscovering function locations across versions is critical in security applications such as vulnerability tracking, malware analysis, and maintaining instrumentation. Software updates disrupt these workflows by changing function layouts, requiring re-identification in updated binaries. Binary Code Similarity Analysis (BCSA) can help, but most existing approaches are designed and evaluated for cross-optimization (XO), cross-compiler (XC), cross-architecture (XA), or cross-binary (XB) scenarios, not the cross-version (XV) setting. XV enables assumptions such as the stability of a function’s role and surrounding call graph, which are often invalidated by transformations like inlining or splitting in other settings. We explore BCSA for function rediscovery across software updates (XV). Our method incorporates version-specific call graphs and recursive neighborhood matching to improve accuracy. While neighborhood-based strategies are common, we generalize them through recursive application and contextual integration. Applied as a meta-algorithm, our approach refines both structural and embedding-based techniques. We also introduce an evaluation framework tailored to rediscovery, measuring effectiveness beyond accuracy or recall@1. Our metrics capture differences across function properties like size, call graph complexity, and uniqueness. The approach improves the accuracy of existing BCSA tools (e.g., Asm2Vec, SAFE) by up to 38.66% in XV settings, reaching 90.02% overall. Significant gains are seen for functions with limited call graph context (up to 78%) and for uniquely identifiable functions (up to 352%).
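
To make the idea of recursive, neighborhood-aware refinement a bit more concrete, here is a minimal Python sketch. It is an illustrative assumption, not the framework implemented in the paper: the base pairwise scores are taken from some existing BCSA backend (e.g. an embedding model), only callee context is used, and the greedy matching and parameters (`alpha`, `depth`) are made up for the example.

```python
def refine_similarity(base_sim, callees_old, callees_new, depth=2, alpha=0.5):
    """Blend each function pair's base similarity with the similarity of its
    call-graph neighborhood, applied recursively for `depth` rounds.

    base_sim:    {(f_old, f_new): score} from some base BCSA tool (assumed)
    callees_old: {f_old: [callees in the old binary]}
    callees_new: {f_new: [callees in the new binary]}
    """
    sim = dict(base_sim)
    for _ in range(depth):
        refined = {}
        for (f_old, f_new), s in sim.items():
            neigh_old = callees_old.get(f_old, [])
            neigh_new = callees_new.get(f_new, [])
            if neigh_old and neigh_new:
                # Greedy one-to-one matching of callees by current similarity.
                best, used = [], set()
                for c_old in neigh_old:
                    cand = [(sim.get((c_old, c_new), 0.0), c_new)
                            for c_new in neigh_new if c_new not in used]
                    if cand:
                        score, c_new = max(cand)
                        used.add(c_new)
                        best.append(score)
                neigh_score = sum(best) / max(len(neigh_old), len(neigh_new))
            else:
                # No call-graph context available: keep the base score.
                neigh_score = s
            refined[(f_old, f_new)] = (1 - alpha) * s + alpha * neigh_score
        sim = refined
    return sim


if __name__ == "__main__":
    # Toy example: two candidates ("f_v2", "g_v2") tie on the base score,
    # but only "f_v2" shares a well-matched callee with "f_v1".
    base = {("f_v1", "f_v2"): 0.6, ("f_v1", "g_v2"): 0.6,
            ("h_v1", "h_v2"): 0.9}
    callees_v1 = {"f_v1": ["h_v1"]}
    callees_v2 = {"f_v2": ["h_v2"], "g_v2": []}
    scores = refine_similarity(base, callees_v1, callees_v2)
    print(scores[("f_v1", "f_v2")] > scores[("f_v1", "g_v2")])  # True
```

Because such a refinement only consumes a dictionary of pairwise scores plus the two call graphs, it can in principle wrap any underlying BCSA tool, which is what the abstract means by applying the approach as a meta-algorithm on top of structural and embedding-based techniques.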