Research Vision
Modern foundation models learn rich representations of language, culture, and visual information, yet we have limited understanding of what these representations encode and how they should be evaluated. My research asks two questions: what do neural representations actually capture about language, culture, and meaning, and how can we evaluate and improve them?
Evaluation
Benchmarks can be gamed, saturated, or misaligned with the competencies they claim to measure. I am interested in building evaluation frameworks that are sensitive to the cultural, linguistic, and semantic dimensions that standard leaderboards obscure. This interest led to my work on multilingual cultural benchmarks, which probe culturally situated knowledge rather than surface-level pattern matching.
Semantic Representations
Language is not static: words change meaning across time, domains, and communities. Contextual embeddings offer a powerful lens for studying this change, but the relationship between geometric shifts in representation space and human-interpretable meaning is not well understood. I study how learned representations track semantic drift across languages and domains.
Multimodal Systems
Vision-language models are increasingly capable, yet what linguistic structure their visual encoders actually learn remains opaque. I analyze multimodal representations to understand what syntactic and semantic information survives the visual encoding process, and where models rely on spurious correlations instead.
Across these threads, my goal is to contribute evaluation tools and empirical findings that make AI systems more linguistically faithful, culturally aware, and interpretable.
