experience
Research, software engineering, and data engineering roles.
Garner Health
Software Engineer Intern (current)
- Building a backend observability pipeline with OpenTelemetry, Prometheus, and the Grafana LGTM stack so engineers can trace failures across employer insurance workflows.
- Instrumented Litestar services and agent-support tooling for 1K+ daily sessions; added dashboards for latency, errors, failed agent/tool calls, and funnel drop-offs.
Columbia Data, Agents, and Processes Lab
Data Agents Research Assistant (current)
- Optimized core evaluation infrastructure for agents answering questions over a 9.5 TB, ~40M-document data lake, including process-based benchmark workers, isolated task sandboxes, tool-call and reasoning telemetry, and BM25 and hybrid retrieval engines.
- Improved GPT-5-mini semantic match from 2.22% to 56.3% by adding context compaction, loop-detection plugins, structured search-result context, and stronger data-analysis tools.
- Built a diagnostic ablation framework that swaps in idealized search, planning, and data-analysis tools to isolate whether agent failures stem from retrieval, decomposition, SQL/Python execution, or final-answer policy.
Columbia Zuckerman Institute
Machine Learning Research Assistant (current)
- Developed targeted embeddings for Qwen2-Audio by bottlenecking 4096-D hidden states into feature-specific representations, each preserving one of 12 linguistic or paralinguistic features while suppressing off-target signal.
- Curated training-ready datasets from 12 speech and language benchmarks, including parallel Kokoro-TTS generation to convert linguistic-feature examples into audio.
- Built and parallelized a GPU-optimized PyTorch/Hugging Face pipeline for batched Qwen2-Audio inference, hidden-state extraction, bottleneck-width sweeps, and classification-probe training under leakage-aware cross-validation across 12 speech attributes and 120K+ audio examples.
Ruangguru
Data Engineering Intern
- Consolidated millions of payment events into user sessions in BigQuery, producing a product funnel used to identify major checkout drop-off points.
- Automated Google Review scraping and sentiment analysis with Puppeteer and Hugging Face; processed tutor-scheduling data into live Looker Studio heatmaps for operations teams.