Open-source project name: src-d/awesome-machine-learning-on-source-code
Open-source project URL: https://github.com/src-d/awesome-machine-learning-on-source-code
Programming language: (not listed)
Project introduction:
Awesome Machine Learning On Source Code
Notice: This repository is no longer actively maintained. No further updates will be made, and issues and PRs will not be answered or attended to.
An actively maintained alternative can be found in the ml4code.github.io repository.
A curated list of awesome research papers, datasets and software projects devoted to machine learning and source code. #MLonCode
Contents
Posts
Talks
Software
Datasets
Credits
Contributions
License
Digests
Conferences
Competitions
CodRep - competition on automatic program repair: given a source line, find the insertion point.
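To make the shape of this task concrete, here is a minimal Python sketch of a naive lexical baseline. It is hypothetical. The function names (`tokenize`, `jaccard`, `predict_line`), the 1-based line output, and the assumption that a new line belongs where the most similar existing line sits are illustrative choices. They are not CodRep's official input format, scoring rule, or any published entry.

```python
import re
from typing import List, Set


def tokenize(line: str) -> Set[str]:
    """Split a source line into word tokens and single punctuation marks."""
    return set(re.findall(r"\w+|[^\w\s]", line))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_line(new_line: str, file_lines: List[str]) -> int:
    """Return the 1-based number of the existing line most similar to new_line.

    Hypothetical baseline: assumes the new line belongs where a lexically
    similar line already sits. Ties go to the earliest line.
    """
    if not file_lines:
        raise ValueError("file_lines must not be empty")
    query = tokenize(new_line)
    scores = [jaccard(query, tokenize(line)) for line in file_lines]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best + 1


# Usage: a tiny Java-like file whose second line has a buggy operator.
source = [
    "public int add(int a, int b) {",
    "    return a - b;",
    "}",
]
print(predict_line("    return a + b;", source))  # prints 2
```

Token-set similarity ignores token order and surrounding context, so it only serves to show the task's input (a line plus a file) and output (a position in that file).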
Papers
Program Synthesis and Induction
Program Synthesis and Semantic Parsing with Learned Code Idioms - Richard Shin, Miltiadis Allamanis, Marc Brockschmidt, Oleksandr Polozov, 2019.
Synthetic Datasets for Neural Program Synthesis - Richard Shin, Neel Kant, Kavi Gupta, Chris Bender, Brandon Trabucco, Rishabh Singh, Dawn Song, ICLR 2019.
Execution-Guided Neural Program Synthesis - Xinyun Chen, Chang Liu, Dawn Song, ICLR 2019.
DeepFuzz: Automatic Generation of Syntax Valid C Programs for Fuzz Testing - Xiao Liu, Xiaoting Li, Rupesh Prajapati, Dinghao Wu, AAAI 2019.
NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System - Xi Victoria Lin, Chenglong Wang, Luke Zettlemoyer, Michael D. Ernst, LREC 2018.
Recent Advances in Neural Program Synthesis - Neel Kant, 2018.
Neural Sketch Learning for Conditional Program Generation - Vijayaraghavan Murali, Letao Qi, Swarat Chaudhuri, Chris Jermaine, ICLR 2018.
Neural Program Search: Solving Programming Tasks from Description and Examples - Illia Polosukhin, Alexander Skidanov, ICLR 2018.
Neural Program Synthesis with Priority Queue Training - Daniel A. Abolafia, Mohammad Norouzi, Quoc V. Le, 2018.
Towards Synthesizing Complex Programs from Input-Output Examples - Xinyun Chen, Chang Liu, Dawn Song, ICLR 2018.
Glass-Box Program Synthesis: A Machine Learning Approach - Konstantina Christakopoulou, Adam Tauman Kalai, AAAI 2018.
Synthesizing Benchmarks for Predictive Modeling - Chris Cummins, Pavlos Petoumenos, Zheng Wang, Hugh Leather, CGO 2017.
Program Synthesis for Character Level Language Modeling - Pavol Bielik, Veselin Raychev, Martin Vechev, ICLR 2017.
SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning - Xiaojun Xu, Chang Liu, Dawn Song, 2017.
Learning to Select Examples for Program Synthesis - Yewen Pu, Zachery Miranda, Armando Solar-Lezama, Leslie Pack Kaelbling, 2017.
Neural Program Meta-Induction - Jacob Devlin, Rudy Bunel, Rishabh Singh, Matthew Hausknecht, Pushmeet Kohli, NIPS 2017.
Learning to Infer Graphics Programs from Hand-Drawn Images - Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, Joshua B. Tenenbaum, 2017.
Neural Attribute Machines for Program Generation - Matthew Amodio, Swarat Chaudhuri, Thomas Reps, 2017.
Abstract Syntax Networks for Code Generation and Semantic Parsing - Maxim Rabinovich, Mitchell Stern, Dan Klein, ACL 2017.
Making Neural Programming Architectures Generalize via Recursion - Jonathon Cai, Richard Shin, Dawn Song, ICLR 2017.
A Syntactic Neural Model for General-Purpose Code Generation - Pengcheng Yin, Graham Neubig, ACL 2017.
Program Synthesis from Natural Language Using Recurrent Neural Networks - Xi Victoria Lin, Chenglong Wang, Deric Pang, Kevin Vu, Luke Zettlemoyer, Michael Ernst, 2017.
RobustFill: Neural Program Learning under Noisy I/O - Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, Pushmeet Kohli, ICML 2017.
Lifelong Perceptual Programming By Example - Alexander L. Gaunt, Marc Brockschmidt, Nate Kushman, Daniel Tarlow, 2017.
Neural Programming by Example - Chengxun Shu, Hongyu Zhang, AAAI 2017.
DeepCoder: Learning to Write Programs - Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, Daniel Tarlow, ICLR 2017.
A Differentiable Approach to Inductive Logic Programming - Fan Yang, Zhilin Yang, William W. Cohen, 2017.
Latent Attention For If-Then Program Synthesis - Xinyun Chen, Chang Liu, Richard Shin, Dawn Song, Mingcheng Chen, NIPS 2016.
Latent Predictor Networks for Code Generation - Wang Ling, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Andrew Senior, Fumin Wang, Phil Blunsom, ACL 2016.
Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision (Short Version) - Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao, NIPS 2016.
Programs as Black-Box Explanations - Sameer Singh, Marco Tulio Ribeiro, Carlos Guestrin, NIPS 2016.
Search-Based Generalization and Refinement of Code Templates - Tim Molderez, Coen De Roover, SSBSE 2016.
Structured Generative Models of Natural Source Code - Chris J. Maddison, Daniel Tarlow, ICML 2014.
Source Code Analysis and Language Modeling
Modeling Vocabulary for Big Code Machine Learning - Hlib Babii, Andrea Janes, Romain Robbes, 2019.
Generative Code Modeling with Graphs - Marc Brockschmidt, Miltiadis Allamanis, Alexander L. Gaunt, Oleksandr Polozov, ICLR 2019.
NL2Type: Inferring JavaScript Function Types from Natural Language Information - Rabee Sohail Malik, Jibesh Patra, Michael Pradel, ICSE 2019.
A Novel Neural Source Code Representation based on Abstract Syntax Tree - Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang, Xudong Liu, ICSE 2019.
Deep Learning Type Inference - Vincent J. Hellendoorn, Christian Bird, Earl T. Barr, Miltiadis Allamanis, FSE 2018. Code.
Tree2Tree Neural Translation Model for Learning Source Code Changes - Saikat Chakraborty, Miltiadis Allamanis, Baishakhi Ray, 2018.
code2seq: Generating Sequences from Structured Representations of Code - Uri Alon, Omer Levy, Eran Yahav, 2018.
Syntax and Sensibility: Using language models to detect and correct syntax errors - Eddie Antonio Santos, Joshua Charles Campbell, Dhvani Patel, Abram Hindle, José Nelson Amaral, SANER 2018.
code2vec: Learning Distributed Representations of Code - Uri Alon, Meital Zilberstein, Omer Levy, Eran Yahav, 2018.
Learning to Represent Programs with Graphs - Miltiadis Allamanis, Marc Brockschmidt, Mahmoud Khademi, ICLR 2018.
A Survey of Machine Learning for Big Code and Naturalness - Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, Charles Sutton, 2017.
Are Deep Neural Networks the Best Choice for Modeling Source Code? - Vincent J. Hellendoorn, Premkumar Devanbu, FSE 2017.
A deep language model for software code - Hoa Khanh Dam, Truyen Tran, Trang Pham, 2016.
Convolutional Neural Networks over Tree Structures for Programming Language Processing - Lili Mou, Ge Li, Lu Zhang, Tao Wang, Zhi Jin, AAAI 2016. Code.
Suggesting Accurate Method and Class Names - Miltiadis Allamanis, Earl T. Barr, Christian Bird, Charles Sutton, FSE 2015.
Mining Source Code Repositories at Massive Scale using Language Modeling - Miltiadis Allamanis, Charles Sutton, MSR 2013.
Neural Network Architectures and Algorithms
Learning Compositional Neural Programs with Recursive Tree Search and Planning - Thomas Pierrot, Guillaume Ligner, Scott Reed, Olivier Sigaud, Nicolas Perrin, Alexandre Laterre, David Kas, Karim Beguir, Nando de Freitas, 2019.
From Programs to Interpretable Deep Models and Back - Eran Yahav, ICCAV 2018.
Neural Code Comprehension: A Learnable Representation of Code Semantics - Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler, NIPS 2018.
A General Path-Based Representation for Predicting Program Properties - Uri Alon, Meital Zilberstein, Omer Levy, Eran Yahav, PLDI 2018.
Cross-Language Learning for Program Classification using Bilateral Tree-Based Convolutional Neural Networks - Nghi D. Q. Bui, Lingxiao Jiang, Yijun Yu, AAAI 2018.
Bilateral Dependency Neural Networks for Cross-Language Algorithm Classification - Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang, SANER 2018.
Syntax-Directed Variational Autoencoder for Structured Data - Hanjun Dai, Yingtao Tian, Bo Dai, Steven Skiena, Le Song, ICLR 2018.
Divide and Conquer with Neural Networks - Alex Nowak, Joan Bruna, ICLR 2018.
Hierarchical Multiscale Recurrent Neural Networks - Junyoung Chung, Sungjin Ahn, Yoshua Bengio, ICLR 2017.
Learning Efficient Algorithms with Hierarchical Attentive Memory - Marcin Andrychowicz, Karol Kurach, 2016.
Learning Operations on a Stack with Neural Turing Machines - Tristan Deleu, Joseph Dureau, NIPS 2016.
Probabilistic Neural Programs - Kenton W. Murray, Jayant Krishnamurthy, NIPS 2016.
Neural Programmer-Interpreters - Scott Reed, Nando de Freitas, ICLR 2016.
Neural GPUs Learn Algorithms - Łukasz Kaiser, Ilya Sutskever, ICLR 2016.
Neural Random-Access Machines - Karol Kurach, Marcin Andrychowicz, Ilya Sutskever, ERCIM News 2016.
Neural Programmer: Inducing Latent Programs with Gradient Descent - Arvind Neelakantan, Quoc V. Le, Ilya Sutskever, ICLR 2015.
Learning to Execute - Wojciech Zaremba, Ilya Sutskever, 2015.
Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets - Armand Joulin, Tomas Mikolov, NIPS 2015.
Neural Turing Machines - Alex Graves, Greg Wayne, Ivo Danihelka, 2014.
From Machine Learning to Machine Reasoning - Léon Bottou, Journal of Machine Learning 2011.
Embeddings in Software Engineering
A Literature Study of Embeddings on Source Code - Zimin Chen and Martin Monperrus, 2019.
AST-Based Deep Learning for Detecting Malicious PowerShell - Gili Rusak, Abdullah Al-Dujaili, Una-May O'Reilly, 2018.
Deep Code Search - Xiaodong Gu, Hongyu Zhang, Sunghun Kim, ICSE 2018.
Word Embeddings for the Software Engineering Domain - Vasiliki Efstathiou, Christos Chatzilenas, Diomidis Spinellis, MSR 2018.
Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces - Jordan Henkel, Shuvendu K. Lahiri, Ben Liblit, Thomas Reps, FSE 2018.
Document Distance Estimation via Code Graph Embedding - Zeqi Lin, Junfeng Zhao, Yanzhen Zou, Bing Xie, Internetware 2017.
Combining Word2Vec with revised vector space model for better code retrieval - Thanh Van Nguyen, Anh Tuan Nguyen, Hung Dang Phan, Trong Duc Nguyen, Tien N. Nguyen, ICSE 2017.
From word embeddings to document similarities for improved information retrieval in software engineering - Xin Ye, Hui Shen, Xiao Ma, Razvan Bunescu, Chang Liu, ICSE 2016.
Mapping API Elements for Code Migration with Vector Representation - Trong Duc Nguyen, Anh Tuan Nguyen, Tien N. Nguyen, ICSE 2016.
Program Translation
Towards Neural Decompilation - Omer Katz, Yuval Olshaker, Yoav Goldberg, Eran Yahav, 2019.
Tree-to-tree Neural Networks for Program Translation - Xinyun Chen, Chang Liu, Dawn Song, ICLR 2018.
Code Attention: Translating Code to Comments by Exploiting Domain Features - Wenhao Zheng, Hong-Yu Zhou, Ming Li, Jianxin Wu, 2017.
Automatically Generating Commit Messages from Diffs using Neural Machine Translation - Siyuan Jiang, Ameer Armaly, Collin McMillan, ASE 2017.
A Parallel Corpus of Python Functions and Documentation Strings for Automated Code Documentation and Code Generation - Antonio Valerio Miceli Barone, Rico Sennrich, IJCNLP 2017.
A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes - Pablo Loyola, Edison Marrese-Taylor, Yutaka Matsuo, ACL 2017.
Code Suggestion and Completion
Aroma: Code Recommendation via Structural Code Search - Sifei Luan, Di Yang, Koushik Sen, Satish Chandra, 2019.
Intelligent Code Reviews Using Deep Learning - Anshul Gupta, Neel Sundaresan, KDD DL Day 2018.
Code Completion with Neural Attention and Pointer Networks - Jian Li, Yue Wang, Irwin King, Michael R. Lyu, 2017.
Learning Python Code Suggestion with a Sparse Pointer Network - Avishkar Bhoopchand, Tim Rocktäschel, Earl Barr, Sebastian Riedel, 2016.
Code Completion with Statistical Language Models - Veselin Raychev, Martin Vechev, Eran Yahav, PLDI 2014.
Program Repair and Bug Detection
SampleFix: Learning to Correct Programs by Sampling Diverse Fixes - Hossein Hajipour, Apratim Bhattacharya, Mario Fritz, 2019.
Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection - Tue Le, Tuan Nguyen, Trung Le, Dinh Phung, Paul Montague, Olivier De Vel, Lizhen Qu, ICLR 2019.
Neural Program Repair by Jointly Learning to Localize and Repair - Marko Vasic, Aditya Kanade, Petros Maniatis, David Bieber, Rishabh S