About

I am a Senior Manager of Applied Science & AI at Oracle Corporation, Melbourne, Australia. My research and engineering work spans Natural Language Processing (NLP), Generative AI, Large Language Models (LLMs), Agentic AI, NL2SQL, NL2Code, Conversational AI, Analytics AI, and Healthcare AI. I lead multiple science teams building production-grade AI systems across Oracle's enterprise cloud portfolio: Oracle Analytics Cloud (OAC), Oracle Cloud Infrastructure (OCI), Oracle Digital Assistant (ODA), and Oracle Health & AI (OHAI).

I obtained my PhD in Engineering (NLP & Deep Learning) from the University of Melbourne in 2019, under the joint supervision of Prof. Trevor Cohn and Prof. Reza Haffari. Prior to Oracle, I was an AI Scientist at Speak.AI (a spin-off of Voicebox Technologies, subsequently acquired by Oracle), a research intern at NAVER LABS Europe (formerly Xerox Research Centre Europe), a visiting scholar at the Language Technologies Institute, Carnegie Mellon University, and a Senior Research Engineer at HLT, I²R, A*STAR, Singapore. Before that, I studied for my MSc at the National University of Singapore (NUS) and was a teaching and research assistant at the University of Science, Vietnam National University HCMC.

I hold 34 granted U.S. patents, with more than 42 applications pending, covering NL2SQL, NL2Code, LLM training, NER, and agentic systems. My research has received 1,350+ citations on Google Scholar (h-index 13, i10-index 17), with publications at ACL, EMNLP, NAACL, and COLING.

Recent Highlights

*** Our paper CLARITY: A Framework and Benchmark for Conversational Language Ambiguity and Unanswerability in Interactive NL2SQL Systems has been accepted at ACL 2026 (Industry Track).

*** Two papers accepted at NAACL 2025: Distill-C: Enhanced NL2SQL via Distilled Customization with LLMs (Industry Track) and Mastering the Craft of Data Synthesis for CodeLLMs (Long Paper).

*** Our paper SQLong: Enhanced NL2SQL for Longer Contexts with LLMs has been accepted at the Table Representation Learning Workshop at ACL 2025.

*** I was promoted to Senior Manager, Applied Science & AI at Oracle Corporation (Oct 2024), leading AI research programs across OAC, OCI, ODA, and OHAI.

*** Our Oracle IP portfolio has grown to 34 granted U.S. patents and more than 42 pending applications spanning NL2SQL, NL2Code, LLM training, NER, and dialog systems.

Selected Publications

CLARITY: A Framework and Benchmark for Conversational Language Ambiguity and Unanswerability in Interactive NL2SQL Systems
Tabinda Sarwar, Farhad Moghimifar, Cong Duy Vu Hoang, Xiaoxiao Ma, Shawn Chang Xu, Fahimeh Saleh, Poorya Zaremoodi, Avirup Sil, Katrin Kirchhoff. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026) (Industry Track), 2026.
Distill-C: Enhanced NL2SQL via Distilled Customization with LLMs
Cong Duy Vu Hoang, Gioacchino Tangari, Clemence Lanfranchi, Dalu Guo, Paul Cayet, Steve Siu, Don Dharmasiri, Yuan-Fang Li, Long Duong, Damien Hilloulin, Rhicheek Patra, Sungpack Hong, Hassan Chafi. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025) (Industry Track), 2025.
Mastering the Craft of Data Synthesis for CodeLLMs
Meng Chen, Philip Arthur, Qianyu Feng, Cong Duy Vu Hoang, Yu-Heng Hong, Mahdi Kazemi Moghaddam, Omid Nezami, Duc Thien Nguyen, Gioacchino Tangari, Duy Vu, Thanh Vu, Mark Johnson, Krishnaram Kenthapadi, Don Dharmasiri, Long Duong, Yuan-Fang Li. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025) (Long Paper), 2025.
On the Use of Prior and External Knowledge in Neural Sequence Models
PhD Thesis, The University of Melbourne, VIC, Australia. 2019.
An Adaptable Task-oriented Dialog System for Stand-alone Embedded Devices
Long Duong, Vu Cong Duy Hoang, Tuyen Quang Pham, Yu-Heng Hong, Vladislavs Dovgalecs, Guy Bashkansky, Jason Black, Andrew Bleeker, Serge Le Huitouze, Mark Johnson. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL-19) (System Demonstrations), 2019.
Moment Matching Training for Neural Machine Translation - A Preliminary Study
Cong Duy Vu Hoang, Ioan Calapodescu, Marc Dymetman. In arXiv preprint, 2018.
Improved Neural Machine Translation using Side Information
Cong Duy Vu Hoang, Gholamreza Haffari and Trevor Cohn. In Proceedings of The 16th Annual Workshop of The Australasian Language Technology Association (ALTA'18) (long, oral) (best paper award), 2018.
Iterative Back-Translation for Neural Machine Translation
Cong Duy Vu Hoang, Philipp Koehn, Gholamreza Haffari and Trevor Cohn. In Proceedings of The 2nd Workshop on Neural Machine Translation and Generation associated with ACL 2018 (long, poster), 2018.
Towards Decoding as Continuous Optimization in Neural Machine Translation
Cong Duy Vu Hoang, Gholamreza Haffari and Trevor Cohn. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP'17) (long, oral), 2017.
Improving Neural Translation Models with Linguistic Factors
Cong Duy Vu Hoang, Gholamreza Haffari and Trevor Cohn. In Proceedings of The 14th Annual Workshop of The Australasian Language Technology Association (ALTA'16) (long, oral) (best paper award), 2016.
Incorporating Structural Alignment Biases into an Attentional Neural Translation Model
Trevor Cohn, Cong Duy Vu Hoang, Ekaterina Vylomova, Kaisheng Yao, Chris Dyer and Gholamreza Haffari. In Proceedings of The 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT'16) (long), 2016.
Incorporating Side Information into Recurrent Neural Network Language Models
Cong Duy Vu Hoang, Gholamreza Haffari and Trevor Cohn. In Proceedings of The 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT'16) (short), 2016.