Hi, I'm Yida Chen 😊



CS Ph.D. Student at Harvard

About Me

Hello! I am a Computer Science Ph.D. candidate in the Insight + Interaction Lab at Harvard, led by Prof. Fernanda Viegas and Prof. Martin Wattenberg. 📖 I work on uncovering the inner workings of generative AI models and controlling their behaviors through mechanistic interpretability.

Before coming to Harvard, I was a Computer Science and Engineering student at Bucknell University 🦬, advised by Prof. Joshua Stough (Bucknell University) and Prof. Christopher Haggerty (Columbia University & New York-Presbyterian). My past projects focused on using sparse labels to train deep learning models that annotate medical video data 🧡. My work was funded by the Ciffolillo Healthcare Technology Inventors Program (HTIP) 🏥. Our papers ([1], [2]) were published at the SPIE Medical Imaging 2021 and 2022 conferences as oral presentations 📝.

I also developed a color analysis toolkit for films (KALMUS) 🎬. You can find the project's GitHub repo here: KALMUS-Color-Toolkit. KALMUS' development was supported by the Mellon Academic Year Research Fellowship awarded 🥇 by the Bucknell Humanities Center, and the toolkit is now used as instructional software at Bucknell.

Reviewer for EMNLP, NAACL, NeurIPS 2024 Creative AI Track, and NeurIPS Interpretable AI Workshop.

First-year/Pre-concentration Advisor for Harvard College.

Judge for National Collegiate Research Conference 2024 (NCRC) at Harvard.

Portrait of Yida Chen

Meta

Research Scientist Intern

May 2025 – Nov 2025

Research 📋

Visualization of reasoning clusters across LRMs

Model Evaluation, Interpretability

Your thoughts tell who you are: Characterize the reasoning patterns of LRMs

arXiv Preprint (2025)

Current comparisons of large reasoning models often stop at aggregate metrics such as accuracy or chain length. We profile 12 open-source LRMs across science, math, and coding tasks to uncover their distinctive reasoning signatures.

Our automated method clusters reasoning styles, revealing families of models that hedge, self-correct, or boldly extrapolate. The taxonomy lets practitioners pick LRMs whose thinking style aligns with their safety or creativity needs.
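
As a rough illustration of the pipeline (not the paper's exact method), here is how one might cluster traces by their stylistic word usage; the example traces, features, and cluster count below are all made up:

```python
# Toy sketch: cluster reasoning traces by stylistic word usage.
# Traces, features, and cluster count are illustrative, not the paper's setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

traces = [
    "Wait, let me double-check that step before continuing.",
    "Hmm, I might be wrong here; let me re-derive the equation.",
    "The answer is clearly 42; no further verification is needed.",
    "The result follows immediately from the definition.",
]

# Markers like "wait", "hmm", or "re-derive" separate hedging,
# self-correcting traces from confident, extrapolating ones.
X = TfidfVectorizer().fit_transform(traces)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for trace, label in zip(traces, labels):
    print(f"cluster {label}: {trace}")
```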

Toy experiment showing bad data leading to good models in post-training inference

AI Alignment, Mechanistic Interpretability

When bad data leads to good models

ICML 2025 (Main)

Could models be better at reducing their own undesirable behaviors if they had a clearer picture of what those behaviors are? We explore the possibility that pre-training on more toxic data can lead to better control in post-training, ultimately decreasing a model’s output toxicity.

TalkTuner dashboard overview

Mechanistic Interpretability, HCI

Designing a Dashboard for Transparency and Control of Conversational AI

ICML 2025 AIW Workshop

We found empirical evidence that assistant LLMs maintain implicit user models that shape downstream answers. TalkTuner, our prototype dashboard, surfaces this hidden profile in real time and lets people recalibrate it.

The design study shows that exposing internal models improves trust calibration while revealing latent biases. Participants preferred the transparent workflow over today’s black-box chat interfaces.
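
For a flavor of the underlying mechanism, here is a minimal probing sketch; the hidden size, attribute, and steering coefficient are hypothetical stand-ins, and a real probe is trained on activations from labeled conversations:

```python
# Minimal probing sketch behind TalkTuner: a linear probe reads a user
# attribute from a hidden activation. Activations here are random
# stand-ins; dimensions and the attribute are hypothetical.
import torch
import torch.nn as nn

hidden_dim, n_classes = 4096, 2            # toy LLM hidden size
probe = nn.Linear(hidden_dim, n_classes)   # trained offline in practice

activation = torch.randn(1, hidden_dim)    # one layer's residual-stream state
belief = torch.softmax(probe(activation), dim=-1)
print(f"inferred P(attribute = class 1) = {belief[0, 1]:.2f}")

# Control runs the other way: adding the probe's class direction to the
# activation nudges the model's internal user profile.
steered = activation + 4.0 * probe.weight[1]
```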

Guardrail sensitivity heatmap

Responsible AI

ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context

EMNLP 2024 (Main)

Guardrails rarely behave uniformly across user demographics. We designed an evaluation framework that perturbs the conversational context with different gender, racial, and political cues, and uncovered refusal disparities in ChatGPT.

Younger, female, and Asian-American personas were disproportionately blocked, even when making the same request. The analysis urges safety teams to audit contextual sensitivity before deploying refusal policies at scale.
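
The framework boils down to holding the request fixed while varying only the persona cue, then comparing refusal rates. A toy sketch of that loop is below; query_model is a placeholder stub, and the personas and refusal markers are illustrative:

```python
# Toy version of the evaluation loop: fix the request, vary the persona
# cue, compare refusal rates. `query_model` is a stub so the sketch runs;
# swap in a real chat-completion call.
personas = [
    "I'm a 19-year-old woman.",
    "I'm a 52-year-old man.",
    "I'm a die-hard Chargers fan.",
]
request = "Can you estimate how much my used car is worth?"
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry")  # illustrative

def is_refusal(reply: str) -> bool:
    text = reply.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def query_model(persona: str, request: str) -> str:
    return "I'm sorry, I can't help with that."  # placeholder stub

refusal_rate = {}
for persona in personas:
    replies = [query_model(persona, request) for _ in range(20)]
    refusal_rate[persona] = sum(map(is_refusal, replies)) / len(replies)
print(refusal_rate)
```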

Change in GPT logits after spatial edits

Mechanistic Interpretability

Do Language Models Learn Causal Representations of Space?

arXiv Preprint

Building on correlational evidence from prior work, we intervene on spatial traces inside transformer activations. Editing latent directions that encode north/south cues causally improves performance on spatial reasoning probes.

The results suggest that causal representation analysis can bridge the gap between probing and control, offering a recipe for steering how LLMs internalize the physical world.
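
A minimal sketch of the intervention, assuming a toy network and a made-up "north" direction (in the paper, the direction comes from a trained probe and is added to a transformer's residual stream):

```python
# Toy activation edit: add a "north" direction to one layer's output via
# a forward hook. The network and direction are stand-ins; in the paper
# the direction comes from a probe on transformer activations.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
north = torch.randn(64)
north /= north.norm()                       # unit-norm latent direction

def steer(module, inputs, output):
    return output + 3.0 * north             # push activations "north"

handle = model[0].register_forward_hook(steer)
edited = model(torch.randn(1, 64))          # forward pass with edit applied
handle.remove()
```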

Controlled 3D-aware diffusion samples

Mechanistic Interpretability

Probe & Control the 3D Representations in Diffusion Model

NeurIPS 2023

We asked whether 2D diffusion models secretly learn 3D priors. By identifying and steering geometry-sensitive directions, we gain precise control over rotations and camera elevations without retraining.

The intervention produces view-consistent edits, enabling controllable generation pipelines for film and XR teams.

Animated attention flow visualization

Visualization Systems

Visualize Attention Flow inside Large Transformer Models

IEEE VIS 2023

Collaborating with Catherine Yeh, I designed visual analytics that expose learned attention flows within ViTs. The system lets researchers scrub through heads, layers, and patches to see how patterns align with human perception.

These attention narratives improve model debugging and make transformer behavior easier to communicate to broader audiences.
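
The raw material behind such a tool is simply the per-head attention tensor. Here is a minimal sketch of extracting it from a single attention layer (the dimensions and patch count are illustrative):

```python
# Pull per-head attention maps out of one attention layer; these tensors
# are what the visualization scrubs through. Dimensions are illustrative.
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
patches = torch.randn(1, 16, 64)            # 16 image patches as tokens

_, weights = attn(patches, patches, patches,
                  need_weights=True, average_attn_weights=False)
print(weights.shape)                        # (1, 4, 16, 16): head x query x key
```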

KALMUS color analysis dashboard

Digital Humanities

KALMUS: tools for color analysis of films

JOSS 2021

KALMUS is an open-source Python package for quantitative film color analysis. It computes palette statistics, contrast timelines, and interactive swatches so filmmakers can study authorship styles or plan grading.

The toolkit is now used in classroom critiques and was supported by the Mellon Fellowship.
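
As a toy example of the kind of metric KALMUS computes (using plain NumPy rather than the package's own API), here is a per-frame average color, the basis of a "movie barcode":

```python
# Toy "movie barcode": the average color of each frame of a (fake) clip.
# Plain NumPy, not KALMUS' own API.
import numpy as np

rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(120, 36, 64, 3))  # 120 RGB frames

barcode = frames.mean(axis=(1, 2)).astype(np.uint8)   # one color per frame
print(barcode.shape)                                   # (120, 3)
```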

Echocardiography segmentation across multiple heartbeats

Medical Imaging

Fully Automated Full-Video Multi-heartbeat Echocardiography Segmentation

SPIE Medical Imaging 2022

We introduced a sliding-window augmentation strategy that learns motion-aware segmentation from sparsely annotated echocardiography videos. The approach generalizes to multi-heartbeat sequences beyond the traditional end-diastole/end-systole (ED/ES) frames.

Clinicians receive both segmentation and volumetric trend estimates, streamlining downstream cardiac assessments.
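
A minimal sketch of the sliding-window idea, with made-up window and stride values: overlapping clips cover every heartbeat, and only clips containing a labeled frame feed the supervised loss:

```python
# Sliding-window sampling over a (toy) echo video: overlapping clips
# cover every heartbeat even though labels are sparse. Window and stride
# values are illustrative.
import numpy as np

video = np.zeros((200, 112, 112))           # 200-frame echocardiogram
labeled_frames = {10, 95, 180}              # sparse expert annotations

window, stride = 32, 8
clips = [(s, video[s:s + window])
         for s in range(0, len(video) - window + 1, stride)]

# Clips with at least one labeled frame feed the supervised loss; the
# rest can still drive temporal-consistency objectives.
supervised = [(s, c) for s, c in clips
              if any(s <= f < s + window for f in labeled_frames)]
print(len(clips), len(supervised))
```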

Four-chamber echo segmentation demo

Medical Imaging

Joint Motion Tracking and Video Segmentation of Echocardiography

SPIE Medical Imaging 2021

Training a 3D U-Net jointly for segmentation and motion estimation yielded superior generalization from CAMUS to EchoNet. The model handles sparsely annotated videos and improves tracking of cardiac structures in new hospitals.

The study offers a pathway for video-native clinical AI without dense frame labels.

Project Repos 💻

characterize-the-reasoning-patterns-of-large-reasoning-models

Open Source Nov 2025

Open-source project focused on research tooling and visualization.

TalkTuner-chatbot-llm-dashboard

Open Source ★ 29 ⑂ 10 Oct 2025

Designing a Dashboard for Transparency and Control of Conversational AI, https://arxiv.org/abs/2406.07882

reasoning-progress-viz

Open Source May 2025

Blog page for the "Reasoning or Performing" ARBOR project. See our project description here: https://github.com/ARBORproject/arborproject.github.io/discussions/11

reasoning-or-performing

Open Source Mar 2025

Open-source project focused on research tooling and visualization.

scene-representation-diffusion-model

Open Source ★ 36 ⑂ 6 Jul 2024

Linear probe found representations of scene attributes in a text-to-image diffusion model

fully-automated-multi-heartbeat-echocardiography-video-segmentation-and-motion-tracking

Open Source ★ 6 ⑂ 3 Mar 2022

The implementation of CLAS-FV described in "Fully automated multi-heartbeat echocardiography video segmentation and motion tracking".

Skills 🤺

Python
Java
Haskell
Ruby
C++
LaTeX
MATLAB
Bash
SQLite
Android
PyTorch
Keras
Matplotlib
seaborn
Scikit-Learn
Scikit-Image
NumPy
Pandas
OpenCV
SciPy
PIL
JavaFX
pytest
Git
Scrum
GitHub CI/CD
Codecov
PyCharm
IntelliJ

Contact 📪

Thank you so much for visiting my website!