Lucas Bandarkar

UCLA — Ph.D. Candidate

Machine Learning, Natural Language Processing

Twitter

Semantic Scholar

Google Scholar

Summary

I'm a third-year A.I. Ph.D. student in the Computer Science department at UCLA. I'm advised by Nanyun (Violet) Peng in the PLUS Lab and study multilingual NLP. In particular, I am interested in the interpretability of cross-lingual representations and modularity in LLMs. My Ph.D. is generously supported by the Amazon AI PhD Fellowship.

I did my undergrad at UC Berkeley, and worked in Marti Hearst's NLP lab under the mentorship of Philippe Laban. I then spent a few years as a research data scientist at Meta/Facebook AI working on large-scale multilingual NLP systems, generally focusing on model evaluation, data resources, and global language strategy for a suite of production models such as machine translation, language identification, and text embeddings. Notably, I led the development of the Belebele dataset.

During my Ph.D., I have completed research internships back at Meta in the multilingual GenAI team and at Google Research Australia on Gemini multilinguality. I'm currently in Paris working with the FAIR Omnilingual team.

Research Interests

multi-/cross-lingual representations: cross-lingual transfer, shared or abstract representations, tokenization, model interpretability
model modularity: model merging, mixture-of-experts, implicit modularity, adapters, pruning, PEFT
multilingual data & evaluation: language identification, data annotation & resource creation, embeddings evaluation, translation evaluation

Applications: multilingual embeddings, LLM language adaptation, LMs in low-resource languages, language identification, machine translation

Education

(in progress) Ph.D. in Computer Science, UCLA

Sep 2023 - current

B.A. in Statistics, Data Science, UC Berkeley

Aug 2017 - May 2021

Industry

(in progress) Research Scientist Intern, Meta FAIR

Jun 2026 - current

multilingual agentic evaluation

Student Researcher, Google Research

Aug 2025 - Dec 2025

LLM multilingual knowledge

Research Scientist Intern, Meta AI

Jun 2024 - Sep 2024

LLM cross-lingual transfer

Research Data Scientist, Meta AI

Aug 2021 - Sep 2023

(Data Scientist from Aug 2021 - Nov 2022)

machine translation, language identification, multilingual text embeddings, multilingual optical character recognition, Arabic dialect identification, machine translation for human content review & automated moderation

Data Scientist Intern, Meta AI

May 2020 - Aug 2020

optical character recognition

Teaching

Spring 2025: CS 180 (Algorithms & Complexity)

Winter 2025: CS 32 (Data Structures)

Fall 2024: CS M148 (Intro to Data Science)
