Peer-Reviewed Publications

Google Scholar: 1,350+ citations • h-index 13 • i10-index 17

CLARITY: A Framework and Benchmark for Conversational Language Ambiguity and Unanswerability in Interactive NL2SQL Systems
Tabinda Sarwar, Farhad Moghimifar, Cong Duy Vu Hoang, Xiaoxiao Ma, Shawn Chang Xu, Fahimeh Saleh, Poorya Zaremoodi, Avirup Sil, Katrin Kirchhoff. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026) (Industry Track), 2026.
Distill-C: Enhanced NL2SQL via Distilled Customization with LLMs
Cong Duy Vu Hoang, Gioacchino Tangari, Clemence Lanfranchi, Dalu Guo, Paul Cayet, Steve Siu, Don Dharmasiri, Yuan-Fang Li, Long Duong, Damien Hilloulin, Rhicheek Patra, Sungpack Hong, Hassan Chafi. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025) (Industry Track), 2025.
Mastering the Craft of Data Synthesis for CodeLLMs
Meng Chen, Philip Arthur, Qianyu Feng, Cong Duy Vu Hoang, Yu-Heng Hong, Mahdi Kazemi Moghaddam, Omid Nezami, Duc Thien Nguyen, Gioacchino Tangari, Duy Vu, Thanh Vu, Mark Johnson, Krishnaram Kenthapadi, Don Dharmasiri, Long Duong, Yuan-Fang Li. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025) (Long Paper), 2025.
SQLong: Enhanced NL2SQL for Longer Contexts with LLMs
Quoc Dai Nguyen, Cong Duy Vu Hoang, Duy Quang Vu, Gioacchino Tangari, Thanh Vu, Don Dharmasiri, Yuan-Fang Li, Long Duong. In Proceedings of the 4th Table Representation Learning Workshop at ACL 2025, 2025.
On the Use of Prior and External Knowledge in Neural Sequence Models
PhD Thesis, The University of Melbourne, VIC, Australia. 2019.
An Adaptable Task-oriented Dialog System for Stand-alone Embedded Devices
Long Duong, Vu Cong Duy Hoang, Tuyen Quang Pham, Yu-Heng Hong, Vladislavs Dovgalecs, Guy Bashkansky, Jason Black, Andrew Bleeker, Serge Le Huitouze, Mark Johnson. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL-19) (System Demonstrations), 2019.
Moment Matching Training for Neural Machine Translation - A Preliminary Study
Cong Duy Vu Hoang, Ioan Calapodescu, Marc Dymetman. arXiv preprint, 2018.
Improved Neural Machine Translation using Side Information
Cong Duy Vu Hoang, Gholamreza Haffari and Trevor Cohn. In Proceedings of The 16th Annual Workshop of The Australasian Language Technology Association (ALTA'18) (long, oral) (best paper award), 2018.
Iterative Back-Translation for Neural Machine Translation
Cong Duy Vu Hoang, Philipp Koehn, Gholamreza Haffari and Trevor Cohn. In Proceedings of The 2nd Workshop on Neural Machine Translation and Generation associated with ACL 2018 (long, poster), 2018.
Towards Decoding as Continuous Optimization in Neural Machine Translation
Cong Duy Vu Hoang, Gholamreza Haffari and Trevor Cohn. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP'17) (long, oral), 2017.
Improving Neural Translation Models with Linguistic Factors
Cong Duy Vu Hoang, Gholamreza Haffari and Trevor Cohn. In Proceedings of The 14th Annual Workshop of The Australasian Language Technology Association (ALTA'16) (long, oral) (best paper award), 2016.
Incorporating Structural Alignment Biases into an Attentional Neural Translation Model
Trevor Cohn, Cong Duy Vu Hoang, Ekaterina Vylomova, Kaisheng Yao, Chris Dyer and Gholamreza Haffari. In Proceedings of The 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT'16) (long), 2016.
Incorporating Side Information into Recurrent Neural Network Language Models
Cong Duy Vu Hoang, Gholamreza Haffari and Trevor Cohn. In Proceedings of The 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT'16) (short), 2016.
I2R Chinese-English Translation System for OpenMT 2015
Xuancong Wang, Cong Duy Vu Hoang, Kui Wu, Nina Zhou, Boon Hong Yeo, AiTi Aw, Haizhou Li. In Proceedings of the NIST Open Machine Translation Evaluation (OpenMT15), 2015.
A Rule-Augmented Statistical Phrase-based Translation System
Cong Duy Vu Hoang, AiTi Aw, Hong-Nhung Nguyen-Thi. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL-14) (System Demonstration Track), 2014.
Perspectives on Crowdsourcing Annotations for Natural Language Processing
Aobo Wang, Cong Duy Vu Hoang, Min-Yen Kan. In Language Resources and Evaluation Journal (JLRE), 2013.
An Unsupervised and Data-Driven Approach for Spell Checking in Vietnamese OCR-scanned Texts
Cong Duy Vu Hoang, Ai Ti Aw. In Proceedings of the EACL'12 Workshop on Innovative Hybrid Approaches to the Processing of Textual Data (long), 2012.
Towards Automated Related Work Summarization
Cong Duy Vu Hoang, Min-Yen Kan. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING) (long), 2010.
Towards Automated Related Work Summarization
Master's Thesis, National University of Singapore. 2010.
A Dependency-based Word Reordering Approach for Statistical Machine Translation
Cong Duy Vu Hoang, Mai Ngo, Dien Dinh. In Proceedings of IEEE International Conference on Research, Innovation and Vision for the Future (RIVF 2008) (long), 2008.
A Comparative Study on Vietnamese Text Classification Methods
Cong Duy Vu Hoang, Dien Dinh, Le Nguyen Nguyen, Quoc Hung Ngo. In Proceedings of IEEE International Conference on Research, Innovation and Vision for the Future (RIVF 2007) (long), 2007.

Patents

Full portfolio: 34 granted U.S. patents and more than 42 pending applications, spanning NL2SQL, NL2Code, LLM training, NER, text classification, analytics, and dialog systems.
For every patent listed below, inventors appear in the order filed. CDV Hoang = Cong Duy Vu Hoang.

Granted Patents
Detecting out-of-domain, out-of-scope, and confusion-span (OOCS) input for NL2Logical Form models
US Patent 12,608,373 • 2026 • Granted • NL2SQL
G. Tangari, CDV Hoang, P. Zaremoodi, P. Arthur, N. Mathur, ME. Johnson, TL. Duong
Data manufacturing frameworks for synthesizing synthetic training data for NL2Logical Form models
US Patent 12,602,617 • 2026 • Granted • NL2SQL
P. Arthur, V. Vishnoi, ME. Johnson, TL. Duong, SPK. Gadde, BS. Vinnakota, CDV Hoang, SWC. Siu, N. Mathur, G. Tangari, AD. Kanuga
Wide and deep network for language detection using hash embeddings
US Patent 12,602,545 • 2026 • Granted • Text Classification
TT. Vu, P. Zaremoodi, D. Vu, ME. Johnson, TL. Duong, X. Zhong, V. Blinov, CDV Hoang, YH. Hong, V. Goel, PV. Ogren, SPK. Gadde, V. Vishnoi
Output interpretation for a meaning representation language system
US Patent 12,579,137 • 2026 • Granted • NL2SQL
C. Xu, P. Zaremoodi, CDV Hoang, N. Mathur, P. Arthur, SWC. Siu, AD. Kanuga, G. Tangari, ME. Johnson, TL. Duong, V. Vishnoi, SA. McRitchie, CM. Broadbent
Transforming natural language to a logical form
US Patent 12,573,380 • 2026 • Granted • NL2SQL
G. Tangari, CDV Hoang, SA. McRitchie, SWC. Siu, D. Guo, CM. Broadbent, TL. Duong, SPK. Gadde, V. Vishnoi, KKH. Eng, C. Basavaraju
Lexical dropout for natural language processing
US Patent 12,572,852 • 2026 • Granted • NLP
TQ. Pham, CDV Hoang, TT. Vu, ME. Johnson, TL. Duong
System and method for augmenting training data for NL2Meaning Representation Language systems
US Patent 12,572,755 • 2026 • Granted • NL2SQL
P. Arthur, G. Tangari, N. Mathur, AD. Kanuga, CDV Hoang, P. Zaremoodi, TL. Duong, ME. Johnson
Gazetteer integration for neural named entity recognition
US Patent 12,566,921 • 2026 • Granted • NER
TQ. Pham, CDV Hoang, ME. Johnson, TL. Duong
Techniques for using named entity recognition to resolve entity expression
US Patent 12,554,929 • 2026 • Granted • NER
AD. Kanuga, CDV Hoang, ME. Johnson, V. Raghavendra, Y. Wu, SWC. Siu, et al.
Addressing catastrophic forgetting and over-generalization in neural networks
US Patent 12,541,672 • 2026 • Granted • ML Training
CDV Hoang et al.
Techniques for augmenting training data for aggregation and sorting operations in NL2SQL
US Patent 12,530,349 • 2026 • Granted • NL2SQL
G. Tangari, N. Mathur, P. Arthur, CDV Hoang, AD. Kanuga, SWC. Siu, SNA. Zaidi, P. Zaremoodi, TL. Duong, ME. Johnson
Fusion of word embeddings and word scores for text classification
US Patent 12,518,098 • 2026 • Granted • Text Classification
AA. Abobakr, ME. Johnson, TL. Duong, V. Blinov, YH. Hong, CDV Hoang, D. Vu
Method and system for over-prediction in neural networks
US Patent 12,518,129 • 2026 • Granted • ML Training
CDV Hoang, TT. Vu, P. Zaremoodi, Y. Xu, V. Blinov, YH. Hong, YDT. Dharmasiri, V. Vishnoi, EL. Jalaluddin, M. Parekh, TL. Duong, ME. Johnson
Fine-tuning multi-head network from a single transformer layer of pre-trained language model
US Patent 12,512,091 • 2025 • Granted • ML Training
TT. Vu, TQ. Pham, OM. Nezami, ME. Johnson, TL. Duong, CDV Hoang
Enhanced logits for natural language processing
US Patent 12,511,492 • 2025 • Granted • NL2SQL
Y. Xu, P. Zaremoodi, TT. Vu, CDV Hoang, V. Blinov, YH. Hong, YDT. Dharmasiri, V. Vishnoi, EL. Jalaluddin, M. Parekh, TL. Duong, ME. Johnson
Model robustness on operators and triggering keywords in NL2MRL systems
US Patent 12,475,325 • 2025 • Granted • NL2SQL
G. Tangari, C. Xu, N. Mathur, P. Arthur, SNA. Zaidi, AD. Kanuga, CDV Hoang, P. Zaremoodi, TL. Duong, ME. Johnson, V. Vishnoi
Calibrating confidence scores of a machine learning model trained as a natural language interface
US Patent 12,430,330 • 2025 • Granted • NL2SQL
G. Tangari, CDV Hoang, ME. Johnson, P. Zaremoodi, N. Mathur, AD. Kanuga, TL. Duong
Transforming natural language to structured query language based on multi-task learning and joint training
US Patent 12,430,329 • 2025 • Granted • NL2SQL
CDV Hoang, V. Vishnoi, ME. Johnson, TL. Duong, SPK. Gadde, BS. Vinnakota
Transforming natural language to structured query language based on scalable search and schema linking
US Patent 12,412,034 • 2025 • Granted • NL2SQL
JM. John, V. Vishnoi, ME. Johnson, TL. Duong, SPK. Gadde, BS. Vinnakota, S. Subramanian, CDV Hoang, YDT. Dharmasiri, N. Mathur, AD. Kanuga, P. Arthur, G. Tangari, SWC. Siu
Method and system for constraint-based hyperparameter tuning
US Patent 12,405,975 • 2025 • Granted • ML Training
ME. Johnson, TL. Duong, V. Vishnoi, BS. Vinnakota, TQ. Pham, CDV Hoang
Context tag integration with named entity recognition models
US Patent 12,361,219 • 2025 • Granted • NER
D. Vu, TQ. Pham, CDV Hoang, SPK. Gadde, TL. Duong, ME. Johnson, V. Vishnoi
Data manufacturing frameworks for synthesizing synthetic training data (continuation)
US Patent 12,307,336 • 2025 • Granted • NL2SQL
P. Arthur, V. Vishnoi, ME. Johnson, TL. Duong, SPK. Gadde, BS. Vinnakota, CDV Hoang, SWC. Siu, N. Mathur, G. Tangari, AD. Kanuga
Techniques for out-of-domain (OOD) detection
US Patent 12,299,402 • 2025 • Granted • NL2SQL
TL. Duong, ME. Johnson, V. Vishnoi, CC. Pan, V. Blinov, CDV Hoang, EL. Jalaluddin, D. Vu, BS. Vinnakota
Framework for focused training of language models and end-to-end hypertuning
US Patent 12,288,550 • 2025 • Granted • LLM Training
P. Zaremoodi, CDV Hoang, D. Vu, DH. Tran, B. Saha, NN. Bhat, TT. Vu, TQ. Pham, AC. Pocock, K. Silverstein, SPK. Gadde, V. Vishnoi, ME. Johnson, TL. Duong
Distance-based logit value for natural language processing
US Patent 12,210,842 • 2025 • Granted • NL2SQL
Y. Xu, P. Zaremoodi, TT. Vu, CDV Hoang, V. Blinov, YH. Hong, YDT. Dharmasiri, V. Vishnoi, EL. Jalaluddin, M. Parekh, TL. Duong, ME. Johnson
System and techniques for handling long text for pre-trained language models
US Patent 12,210,830 • 2025 • Granted • NLP
TT. Vu, TQ. Pham, ME. Johnson, TL. Duong, Y. Xu, P. Zaremoodi, OM. Nezami, B. Saha, CDV Hoang
Multi-feature balancing for natural language processors
US Patent 12,153,885 • 2024 • Granted • NL2SQL
TL. Duong, V. Vishnoi, ME. Johnson, EL. Jalaluddin, TQ. Pham, CDV Hoang, P. Zaremoodi, SPK. Gadde, AD. Kanuga, Z. Li, YX. Wu
Distance-based logit value for natural language processing (continuation)
US Patent 12,019,994 • 2024 • Granted • NL2SQL
Y. Xu, P. Zaremoodi, TT. Vu, CDV Hoang, V. Blinov, YH. Hong, YDT. Dharmasiri, V. Vishnoi, EL. Jalaluddin, M. Parekh, TL. Duong, ME. Johnson
Techniques for out-of-domain (OOD) detection (continuation)
US Patent 12,014,146 • 2024 • Granted • NL2SQL
TL. Duong, ME. Johnson, V. Vishnoi, CC. Pan, V. Blinov, CDV Hoang, EL. Jalaluddin, D. Vu, BS. Vinnakota
Enhanced logits for natural language processing (continuation)
US Patent 11,972,220 • 2024 • Granted • NL2SQL
Y. Xu, P. Zaremoodi, TT. Vu, CDV Hoang, V. Blinov, YH. Hong, YDT. Dharmasiri, V. Vishnoi, EL. Jalaluddin, M. Parekh, TL. Duong, ME. Johnson
Techniques for out-of-domain (OOD) detection (original)
US Patent 11,763,092 • 2023 • Granted • NL2SQL
TL. Duong, ME. Johnson, V. Vishnoi, CC. Pan, V. Blinov, CDV Hoang, EL. Jalaluddin, D. Vu, BS. Vinnakota
Method and system for training neural sequence-to-sequence models by incorporating global features
US Patent 11,681,911 • 2023 • Granted • ML Training
CDV Hoang, I. Calapodescu, M. Dymetman
Task-oriented dialog suitable for a standalone device
US Patent 11,574,636 • 2023 • Granted • Dialogue
TL. Duong, ME. Johnson, CDV Hoang, TQ. Pham, YH. Hong, V. Dovgalecs, G. Bashkansky, J. Black, A. Bleeker, S. Le Huitouze

Patent Applications & Pre-Grant Publications
Large language models for NL2SQL with long context finetuning
US Patent App. 19/035,561 • 2026 • Application • NL2SQL
CDV Hoang et al.
Fusion of word embeddings and word scores for text classification
US Patent App. 19/415,225 • 2026 • Application • Text Classification
CDV Hoang et al.
Fine-tuning multi-head network from a single transformer layer (continuation)
US Patent App. 19/402,418 • 2026 • Application • ML Training
CDV Hoang et al.
In-context learning for NL2SQL with pattern-based retrieval
US Patent App. 19/062,932 • 2026 • Application • NL2SQL
CDV Hoang et al.
Generating test suites for testing code modules
US Patent App. 18/936,052 • 2026 • Application • NL2Code
CDV Hoang et al.
SQL fixit — automated generation of fine-tuning data using LLMs
US Patent App. 18/820,708 • 2026 • Application • NL2SQL
CDV Hoang et al.
Execution and semantic error correction capabilities for NL2Logical Form models
US Patent App. 18/788,732 • 2026 • Application • NL2SQL
CDV Hoang et al.
Calibrating confidence scores of a machine learning model trained as a natural language interface (continuation)
US Patent App. 19/311,565 • 2025 • Application • NL2SQL
CDV Hoang et al.
Context tag integration with named entity recognition models (continuation)
US Patent App. 19/235,404 • 2025 • Application • NER
CDV Hoang et al.
Techniques for efficient encoding in neural semantic parsing systems
US Patent App. 18/409,676 • 2025 • Application • NL2SQL
CDV Hoang, P. Zaremoodi, TT. Vu, G. Tangari, ME. Johnson, TL. Duong, V. Vishnoi
Resolving date/time expression ambiguity in transforming natural language to a meaning representation
US Patent App. 18/409,193 • 2025 • Application • NL2SQL
CDV Hoang et al.
Framework for focused training of language models and end-to-end hypertuning (continuation)
US Patent App. 19/085,675 • 2025 • Application • LLM Training
P. Zaremoodi, CDV Hoang, D. Vu, DH. Tran, B. Saha, et al.
System and techniques for handling long text for pre-trained language models (continuation)
US Patent App. 18/987,825 • 2025 • Application • NLP
CDV Hoang et al.
Managing date-time intervals in transforming natural language to a logical form
US Patent App. 18/794,986 • 2025 • Application • NL2SQL
CDV Hoang et al.
Techniques for transforming natural language conversation into a visualization representation
US Patent App. 18/616,801 • 2025 • Application • Analytics
CDV Hoang et al.
Techniques for manufacturing training data to transform natural language into a visualization representation
US Patent App. 18/593,316 • 2025 • Application • Analytics
CDV Hoang et al.
Multi-feature balancing for natural language processors (continuation)
US Patent App. 18/819,441 • 2024 • Application • NL2SQL
CDV Hoang et al.
System and method of selective fine-tuning for custom training of a NL2Logical Form model
US Patent App. 18/236,192 • 2024 • Application • ML Training
CDV Hoang et al.
Techniques for converting a natural language utterance to an intermediate database query representation
US Patent App. 18/209,844 • 2024 • Application • NL2SQL
CDV Hoang, SA. McRitchie, ME. Johnson, et al.
Method and system for target-based hyperparameter tuning
US Patent App. 17/216,498 • 2021 • Application • ML Training
CDV Hoang et al.