SANN: Programming Code Representation Using Attention Neural Network with Optimized Subtree Extraction

Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 783-792. ACM, (October 2023)
DOI: 10.1145/3583780.3615047

Abstract

Automated analysis of programming data using code representation methods offers valuable services for programmers, from code completion to clone detection to bug detection. Recent studies show the effectiveness of Abstract Syntax Trees (AST), pre-trained Transformer-based models, and graph-based embeddings in programming code representation. However, pre-trained large language models lack interpretability, while other embedding-based approaches struggle with extracting important information from large ASTs. This study proposes a novel Subtree-based Attention Neural Network (SANN) to address these gaps by integrating different components: an optimized sequential subtree extraction process using Genetic algorithm optimization, a two-way embedding approach, and an attention network. We investigate the effectiveness of SANN by applying it to two different tasks: program correctness prediction and algorithm detection on two educational datasets containing both small and large-scale code snippets written in Java and C, respectively. The experimental results show SANN's competitive performance against baseline models from the literature, including code2vec, ASTNN, TBCNN, CodeBERT, GPT-2, and MVG, regarding accurate predictive power. Finally, a case study is presented to show the interpretability of our model prediction and its application for an important human-centered computing application, student modeling. Our results indicate the effectiveness of the SANN model in capturing important syntactic and semantic information from students' code, allowing the construction of accurate student models, which serve as the foundation for generating adaptive instructional support such as individualized hints and feedback.
