NLP Research Scientist
Researching multimodal document understanding: methods for reasoning over documents that combine text, tables, and images. Building large-scale multimodal ML systems.
About
I'm a postdoctoral researcher at the University of Edinburgh, working with Prof. Mirella Lapata on how vision-language models reason over real-world documents that combine text, tables, figures, and images. Tables have been my long-standing focus, but my work now spans documents as a whole. I have 14 years of combined experience across academia and industry.
Experience
Research on multimodal document understanding in Prof. Mirella Lapata's group, focusing on visually represented language, table understanding, and reasoning over multimodal documents.
- TABLET (ICLR 2026): a 4M-example visual table understanding dataset spanning 21 tasks, showing that exposure to real-world, visually rich tables during fine-tuning improves model robustness.
- Improving numerical reasoning of VLMs over long-context hybrid documents. Exploring agentic thinking and latent reasoning beyond chain-of-thought.
Research across two projects: Luminous (EU project on multimodal dialogue for mixed-reality headsets) and Antidote (retrieval-augmented medical question answering in LLMs).
- MATE (ACL 2025): a benchmark testing cross-modal entity correlation in vision-language models, showing that state-of-the-art models fall significantly short of human performance.
- MedExpQA (Artificial Intelligence in Medicine, 2024): the first multilingual medical QA benchmark with gold explanations from medical doctors. Later adopted by Google as a training dataset for MedGemma.
Developed T5-based data-to-text generation for a widely used commercial smart assistant. Earlier, developed LSTM-based models for text summarisation and slot filling, deployed in production on AWS.
Early-career research roles applying machine learning to data science problems, including anomaly detection in web logs and retail customer segmentation.