- InterSystems | Summer 2026 | Software Engineer Intern
Data Platforms
- Harvard Data Analytics Group | Present | CEO
Lead and manage Harvard’s premier data consulting group, overseeing strategy, operations, and client engagements across the organization.
- Harvard Medical School | Summer 2025 | Research Fellow
LLM-as-a-judge, semantic analyses
Participated in the Dr. Susanne E. Churchill Summer Institute in Biomedical Informatics (SIBMI), a highly selective (~5% acceptance rate) 9-week program of intensive coursework and mentored research at the intersection of computer science and biomedicine. Mentored by Dr. Griffin Weber, Associate Professor of Biomedical Informatics. Addressed the challenge of generating scalable, high-fidelity research narratives for thousands of faculty profiles on the Harvard Catalyst Profiles platform by architecting an end-to-end LLM-powered synthesis pipeline. Engineered a novel framework that uses state-of-the-art transformer models to autonomously generate semantically rich, domain-specific research overviews with high factual consistency. Integrated DeepEval (LLM-as-a-judge) for fine-grained model benchmarking, evaluating outputs on dimensions such as coherence, relevance, and faithfulness. Presented the solution to senior leadership at the HMS Office of Faculty Affairs, who formally validated its effectiveness and expressed interest in integrating the tool into official workflows for faculty promotion and CV generation. Designed and executed a novel experiment showing that semantic similarity between LLM-generated research narratives is a significantly stronger predictor of future scientific collaboration than traditional metrics such as prior co-authorship or shared citations.
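The collaboration-prediction experiment can be framed as a ranking evaluation: embed each generated narrative, score every faculty pair by cosine similarity, and measure with ROC AUC how well those scores separate pairs that later collaborated from pairs that did not. This is a minimal standard-library sketch under stated assumptions, not the study's code: the three-dimensional vectors stand in for embeddings from some text-embedding model, and the faculty IDs, collaboration labels, and prior co-authorship baseline are hypothetical toy data that do not reproduce the study's results.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def roc_auc(scores, labels):
    """AUC = probability a random positive pair outscores a random negative pair (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: one narrative embedding per faculty member.
embeddings = {
    "A": [0.9, 0.1, 0.0],
    "B": [0.8, 0.2, 0.1],
    "C": [0.0, 0.9, 0.3],
    "D": [0.1, 0.8, 0.4],
}
future_collabs = {("A", "B"), ("C", "D")}  # hypothetical ground truth
prior_coauthors = {("A", "B")}              # hypothetical baseline feature

pairs = list(combinations(sorted(embeddings), 2))
labels = [pair in future_collabs for pair in pairs]
semantic_scores = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]
coauthor_scores = [1.0 if pair in prior_coauthors else 0.0 for pair in pairs]

semantic_auc = roc_auc(semantic_scores, labels)
coauthor_auc = roc_auc(coauthor_scores, labels)
print(f"semantic AUC={semantic_auc:.2f}, co-authorship AUC={coauthor_auc:.2f}")
```

Pairwise AUC compares predictors without choosing a similarity threshold. The quadratic comparison loop is fine for toy data; over thousands of profiles a rank-based (Mann-Whitney U) computation would be the practical choice.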
- Harvard Medical School | Summer 2022 – Fall 2024 | Research Intern
Geoinference, NLP, recurrent neural networks. First-author publication in Nature’s Scientific Reports (DOI: 10.1038/s41598-024-73318-7)
Worked under Dr. Isaac Kohane, Chair of Biomedical Informatics, and Dr. John Brownstein, Professor of Biomedical Informatics. Developed and evaluated a high-performance multi-class natural language processing (NLP) text classification model for geoinference: inferring the precise geographic location of unstructured, free-text author affiliations. Built a data processing pipeline for 52M+ records, using efficient data structures and the Parquet file format to reduce memory usage, and processed and analyzed the large-scale bibliometric dataset with data cleaning and validation techniques. Engineered a text preprocessing pipeline that uses spaCy's named entity recognition (NER) to filter and classify organizational and geopolitical entities. Implemented and benchmarked multiple text vectorization approaches (TF-IDF, Word2Vec, BERT embeddings) for feature extraction, optimizing for high-dimensional sparse data. Evaluated and compared a range of ML classifiers, including LinearSVC, Random Forest, and Logistic Regression, as well as deep learning models (LSTM, BiLSTM, GRU), and outperformed existing solutions through careful model selection and feature engineering. Implemented custom evaluation metrics and validation approaches to assess model performance across multiple test datasets. Published the article "Geoinference of Author Affiliations using NLP-based Text Classification" in Nature's Scientific Reports as first author.
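The core classification step can be illustrated with a minimal, standard-library-only sketch: affiliation strings become TF-IDF vectors, and each is assigned the country whose class centroid is most cosine-similar. This is not the published model. A nearest-centroid classifier stands in for LinearSVC, the spaCy NER filtering and embedding variants are omitted, and the class name `NearestCentroidTfidf`, the training strings, and the country labels are hypothetical. The smoothed IDF, ln((1 + n) / (1 + df)) + 1, mirrors scikit-learn's `TfidfVectorizer` default.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on anything that is not an ASCII letter."""
    return re.findall(r"[a-z]+", text.lower())

def normalize(vec):
    """Scale a sparse vector (dict) to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec

def dot(a, b):
    return sum(w * b.get(t, 0.0) for t, w in a.items())

class NearestCentroidTfidf:
    """Hypothetical stand-in for a TF-IDF + LinearSVC affiliation classifier."""

    def fit(self, texts, labels):
        docs = [Counter(tokenize(t)) for t in texts]
        n_docs = len(docs)
        df = Counter(term for doc in docs for term in doc)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
        sums = defaultdict(lambda: defaultdict(float))
        for doc, label in zip(docs, labels):
            for t, w in self._vectorize(doc).items():
                sums[label][t] += w
        # Unit-normalized centroids, so a dot product with a unit query is cosine similarity.
        self.centroids = {label: normalize(dict(vec)) for label, vec in sums.items()}
        return self

    def _vectorize(self, counts):
        # Raw term frequency times IDF; out-of-vocabulary terms are dropped.
        return normalize({t: c * self.idf[t] for t, c in counts.items() if t in self.idf})

    def predict(self, text):
        query = self._vectorize(Counter(tokenize(text)))
        return max(self.centroids, key=lambda label: dot(query, self.centroids[label]))

# Hypothetical training affiliations labeled by country.
train = [
    ("Department of Biomedical Informatics, Harvard Medical School, Boston, MA", "USA"),
    ("Massachusetts General Hospital, Boston, Massachusetts", "USA"),
    ("University of Oxford, Oxford, United Kingdom", "UK"),
    ("Imperial College London, London, UK", "UK"),
    ("University of Tokyo, Tokyo, Japan", "Japan"),
    ("Kyoto University Graduate School of Medicine, Kyoto, Japan", "Japan"),
]
clf = NearestCentroidTfidf().fit([t for t, _ in train], [y for _, y in train])
print(clf.predict("Boston Children's Hospital, Boston, MA"))
```

Normalizing the summed class vector gives the same direction as the class mean, so dividing by class size is unnecessary. Swapping in scikit-learn's `TfidfVectorizer` and `LinearSVC` would be the closer match to the pipeline described above.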