Lucas Bandarkar

UCLA — Ph.D. Candidate

Machine Learning, Natural Language Processing

Twitter

Semantic Scholar

Google Scholar

Summary

I'm a third-year A.I. Ph.D. student in the Computer Science department at UCLA. I'm advised by Nanyun (Violet) Peng in the PLUS Lab and study multilingual NLP. In particular, I am interested in the interpretability of cross-lingual representations and modularity in LLMs. My Ph.D. is supported by the generous Amazon AI PhD Fellowship.

I did my undergrad at UC Berkeley and worked in Marti Hearst's NLP lab under the mentorship of Philippe Laban. I then spent a few years as a research data scientist at Meta/Facebook AI working on large-scale multilingual NLP systems, generally focusing on model evaluation, data resources, and global language strategy for a suite of production models such as machine translation, language identification, and text embeddings. Notably, I led the development of the Belebele dataset.

During my Ph.D., I have completed research internships back at Meta on the multilingual GenAI team and at Google Research Australia on Gemini multilinguality.

Research Interests

multi-/cross-lingual representations: cross-lingual transfer, shared or abstract representations, tokenization, model interpretability
model modularity: model merging, mixture-of-experts, implicit modularity, adapters, pruning, PEFT
multilingual data & evaluation: language identification, data annotation & resource creation, embeddings evaluation, translation evaluation

Applications: multilingual embeddings, LLM language adaptation, LMs in low-resource languages, language identification, machine translation

Education

(in progress) Ph.D. in Computer Science, UCLA

Sep 2023 - current

B.A. in Statistics, Data Science, UC Berkeley

Aug 2017 - May 2021

Industry

Student Researcher, Google Research

Aug 2025 - Dec 2025

LLM multilingual knowledge

Research Scientist Intern, Meta AI

Jun 2024 - Sep 2024

LLM cross-lingual transfer

Research Data Scientist, Meta AI

Aug 2021 - Sep 2023

(Data Scientist from Aug 2021 - Nov 2022)

machine translation, language identification, multilingual text embeddings, multilingual optical character recognition, Arabic dialect identification, machine translation for human content review & automated moderation

Data Scientist Intern, Meta AI

May 2020 - Aug 2020

optical character recognition

Teaching

Spring 2025: CS 180 (Algorithms & Complexity)

Winter 2025: CS 32 (Data Structures)

Fall 2024: CS M148 (Intro to Data Science)
