About

I am a Senior Manager of Applied Science & AI at Oracle Corporation in Melbourne, Australia. My research and engineering work spans Natural Language Processing (NLP), Generative AI, Large Language Models (LLMs), Agentic AI, NL2SQL, NL2Code, Conversational AI, Analytics AI, and Healthcare AI. I lead multiple science teams building production-grade AI systems across Oracle's enterprise cloud portfolio: Oracle Analytics Cloud (OAC), Oracle Cloud Infrastructure (OCI), Oracle Digital Assistant (ODA), and Oracle Health & AI (OHAI).

I obtained my PhD in Engineering (NLP & Deep Learning) from the University of Melbourne in 2019, under the joint supervision of Prof. Trevor Cohn and Prof. Reza Haffari. Prior to Oracle, I was an AI Scientist at Speak.AI (a spin-off of Voicebox Technologies, subsequently acquired by Oracle), a research intern at NAVER LABS Europe (formerly Xerox Research Centre Europe), a visiting scholar at the Language Technologies Institute, Carnegie Mellon University, and a Senior Research Engineer at HLT, I²R, A*STAR, Singapore. Before that, I studied for my MSc at the National University of Singapore (NUS) and was a teaching and research assistant at the University of Science, Vietnam National University HCMC.

I hold 34 granted U.S. patents, with 42 more pending, covering NL2SQL, NL2Code, LLM training, NER, and agentic systems. My research has received 1,350+ citations on Google Scholar (h-index 13, i10-index 17), with publications at ACL, EMNLP, NAACL, and COLING.

Recent Highlights

*** Our paper CLARITY: A Framework and Benchmark for Conversational Language Ambiguity and Unanswerability in Interactive NL2SQL Systems has been accepted at ACL 2026 (Industry Track).

*** Two papers accepted at NAACL 2025: Distill-C: Enhanced NL2SQL via Distilled Customization with LLMs (Industry Track) and Mastering the Craft of Data Synthesis for CodeLLMs (Long Paper).

*** Our paper SQLong: Enhanced NL2SQL for Longer Contexts with LLMs has been accepted at the Table Representation Learning Workshop at ACL 2025.

*** I was promoted to Senior Manager, Applied Science & AI at Oracle Corporation (Oct 2024), leading AI research programs across OAC, OCI, ODA, and OHAI.

*** Our Oracle IP portfolio has grown to 34 granted U.S. patents and 42 pending applications spanning NL2SQL, NL2Code, LLM training, NER, and dialog systems.


Selected Patents (2022–2026)

See full patent portfolio (34 granted • 42 pending).

Detecting out-of-domain, out-of-scope, and confusion-span (OOCS) input for a natural language to logical form model
US Patent 12,608,373 • 2026 • Granted • NL2SQL
Gioacchino Tangari, Cong Duy Vu Hoang, Poorya Zaremoodi, Philip Arthur, Nitika Mathur, Mark Edward Johnson, Thanh Long Duong
Transforming natural language to structured query language based on multi-task learning and joint training
US Patent 12,430,329 • 2025 • Granted • NL2SQL
Cong Duy Vu Hoang, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Balakota Srinivas Vinnakota
Framework for focused training of language models and techniques for end-to-end hypertuning of the framework
US Patent 12,288,550 • 2025 • Granted • LLM Training
Poorya Zaremoodi, Cong Duy Vu Hoang, Duy Vu, Dai Hoang Tran, Budhaditya Saha, Nagaraj N. Bhat, Thanh Tien Vu, Tuyen Quang Pham, Adam Craig Pocock, Katherine Silverstein, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong
Multi-feature balancing for natural language processors
US Patent 12,153,885 • 2024 • Granted • NL2SQL
Thanh Long Duong, Vishal Vishnoi, Mark Edward Johnson, Elias Luqman Jalaluddin, Tuyen Quang Pham, Cong Duy Vu Hoang, Poorya Zaremoodi, Srinivasa Phani Kumar Gadde, Aashna Devang Kanuga, Zikai Li, Yuanxu Wu
Method and system for training neural sequence-to-sequence models by incorporating global features
US Patent 11,681,911 • 2023 • Granted • ML Training
Cong Duy Vu Hoang, Ioan Calapodescu, Marc Dymetman
Task-oriented dialog suitable for a standalone device
US Patent 11,574,636 • 2023 • Granted • Dialogue
Thanh Long Duong, Mark Edward Johnson, Cong Duy Vu Hoang, Tuyen Quang Pham, Yu-Heng Hong, Vladislavs Dovgalecs, Guy Bashkansky, Jason Eric Black, Andrew David Bleeker, Serge Le Huitouze

Selected Publications

CLARITY: A Framework and Benchmark for Conversational Language Ambiguity and Unanswerability in Interactive NL2SQL Systems
Tabinda Sarwar, Farhad Moghimifar, Cong Duy Vu Hoang, Xiaoxiao Ma, Shawn Chang Xu, Fahimeh Saleh, Poorya Zaremoodi, Avirup Sil, Katrin Kirchhoff. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026) (Industry Track), 2026. NL2SQL
Distill-C: Enhanced NL2SQL via Distilled Customization with LLMs
Cong Duy Vu Hoang, Gioacchino Tangari, Clemence Lanfranchi, Dalu Guo, Paul Cayet, Steve Siu, Don Dharmasiri, Yuan-Fang Li, Long Duong, Damien Hilloulin, Rhicheek Patra, Sungpack Hong, Hassan Chafi. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025) (Industry Track), 2025. NL2SQL
Mastering the Craft of Data Synthesis for CodeLLMs
Meng Chen, Philip Arthur, Qianyu Feng, Cong Duy Vu Hoang, Yu-Heng Hong, Mahdi Kazemi Moghaddam, Omid Nezami, Duc Thien Nguyen, Gioacchino Tangari, Duy Vu, Thanh Vu, Mark Johnson, Krishnaram Kenthapadi, Don Dharmasiri, Long Duong, Yuan-Fang Li. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025) (Long Paper), 2025. NL2Code
On the Use of Prior and External Knowledge in Neural Sequence Models
PhD Thesis, The University of Melbourne, VIC, Australia. 2019. PhD Thesis
An Adaptable Task-oriented Dialog System for Stand-alone Embedded Devices
Long Duong, Cong Duy Vu Hoang, Tuyen Quang Pham, Yu-Heng Hong, Vladislavs Dovgalecs, Guy Bashkansky, Jason Black, Andrew Bleeker, Serge Le Huitouze, Mark Johnson. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL-19) (System Demonstrations), 2019. Dialogue
Moment Matching Training for Neural Machine Translation - A Preliminary Study
Cong Duy Vu Hoang, Ioan Calapodescu, Marc Dymetman. arXiv preprint, 2018. Machine Translation
Improved Neural Machine Translation using Side Information
Cong Duy Vu Hoang, Gholamreza Haffari and Trevor Cohn. In Proceedings of The 16th Annual Workshop of The Australasian Language Technology Association (ALTA'18) (long, oral) (best paper award), 2018. Machine Translation
Iterative Back-Translation for Neural Machine Translation
Cong Duy Vu Hoang, Philipp Koehn, Gholamreza Haffari and Trevor Cohn. In Proceedings of The 2nd Workshop on Neural Machine Translation and Generation associated with ACL 2018 (long, poster), 2018. Machine Translation
Towards Decoding as Continuous Optimization in Neural Machine Translation
Cong Duy Vu Hoang, Gholamreza Haffari and Trevor Cohn. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP'17) (long, oral), 2017. Machine Translation
Improving Neural Translation Models with Linguistic Factors
Cong Duy Vu Hoang, Gholamreza Haffari and Trevor Cohn. In Proceedings of The 14th Annual Workshop of The Australasian Language Technology Association (ALTA'16) (long, oral) (best paper award), 2016. Machine Translation
Incorporating Structural Alignment Biases into an Attentional Neural Translation Model
Trevor Cohn, Cong Duy Vu Hoang, Ekaterina Vylomova, Kaisheng Yao, Chris Dyer and Gholamreza Haffari. In Proceedings of The 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT'16) (long), 2016. Machine Translation
Incorporating Side Information into Recurrent Neural Network Language Models
Cong Duy Vu Hoang, Gholamreza Haffari and Trevor Cohn. In Proceedings of The 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT'16) (short), 2016. Language Modeling