CV
Education, research experience, projects, skills, and honors.
Contact Information
| Name | Austin Senna Wijaya |
| Professional Title | Computer Science Student and Research Assistant |
| Email | asw2215@columbia.edu |
Professional Summary
Undergraduate researcher focused on agentic systems, machine learning, and large-scale data infrastructure.
Experience
-
2025 - New York, NY
Data Agents Research Assistant
Columbia Data, Agents, and Processes Lab
- Contributing to DataSeek, a system providing infrastructure for deep-research agents to operate over terabytes of multimodal data; submitted a benchmark paper to ICML 2026.
- Re-architected a legacy discovery tool into a scalable agentic GraphRAG engine; reduced retrieval complexity to O(N log K) via LSH and min-heap top-K pruning, increasing accuracy by 10% across a 10 TB S3 data lake.
- Engineered a parallelized benchmark evaluation framework using AWS Bedrock to test autonomous data tasks and multi-step reasoning, and added agentic tool capabilities to DeepSeek R1.
- Tech stack: Python, SQL, AWS (EC2, S3, Bedrock), Docker.
-
2025 - New York, NY
Machine Learning Research Assistant
Columbia Zuckerman Institute
- Trained linear probes via ridge regression on external audio benchmarks to decompose Qwen2-Audio-7B’s latent space into orthogonal feature vectors and map embeddings to ECoG neural signals.
- Engineered an end-to-end GPU-optimized pipeline on HPC clusters (NCSA Delta) to parallelize TTS generation and extract penultimate layer embeddings across hundreds of thousands of inferences.
- Tech stack: Python, PyTorch, HuggingFace, Scikit-learn, NumPy.
-
2025 - 2025 Jakarta, Indonesia
Data Engineering Intern
Ruangguru
- Consolidated millions of payment events into user sessions in BigQuery, producing a product-level funnel used to diagnose major drop-off points.
- Automated large-scale Google Review scraping and sentiment analysis with Puppeteer and HuggingFace for continuous customer satisfaction tracking.
- Processed hundreds of thousands of tutor scheduling events with Apps Script and BigQuery to generate a real-time, interactive tutor-availability heatmap in Looker Studio.
- Tech stack: Google Cloud (BigQuery, Looker Studio), Apps Script, Puppeteer, HuggingFace.
Education
-
2024 - 2028 New York, NY
Bachelor of Science
Columbia University
Computer Science
- GPA: 4.25/4.0
- Full-tuition scholarship recipient.
- Dean’s List.
- Relevant coursework: Natural Language Processing, Data Structures and Algorithms, Systems Programming, Linear Algebra and Probability, Competitive Programming.
Projects
-
ResearcherX: AI-Powered IDE and GraphRAG Engine
- Developed an AI-powered IDE for academic writing with real-time logical contradiction linting, using a dual-routing FastAPI and litellm pipeline to optimize latency and cost.
- Engineered an asynchronous hybrid-search engine with Neo4j and LanceDB featuring node-level provenance and Cypher-based garbage collection for graph integrity.
- Finalist (Top 6) at the Millard Chan Technology ’99 Startup Competition 2026 and 2nd place at the Columbia Lion Cage Startup Competition 2026.
- Tech stack: Python (FastAPI), TypeScript (Next.js), Neo4j (Cypher), LanceDB, litellm, ProseMirror.
-
LakeAgent
- Built infrastructure for deep-research agents to operate over both structured and unstructured data-lake sources.
- Designed an end-to-end pipeline for dataset discovery, automatic integration, and verifiable answer generation with explicit provenance.
- Applied the system to forecasting and analytic questions that combine structured signals with unstructured evidence.
-
Targeted Neural Audio Embeddings for Cortical Prediction
- Built a brain-encoding pipeline that maps Qwen-Audio speech representations to cortical activations by extracting targeted task subspaces from large audio-language embeddings.
- Processed 150k+ audio examples across 15 benchmark datasets to isolate auditory signals such as emotion and reasoning.
- Performed dimensionality sweeps across model architectures to maximize task-relevant signal while suppressing noise.
-
Unstructured Cloud ELT Pipeline
- Built an automated ELT pipeline to ingest chat screenshots into BigQuery using Vision AI for text extraction.
- Implemented sentiment and intent classification with BigQuery ML to support analytics and automated response workflows.
- Tech stack: BigQuery, Gemini API, Looker Studio, Vision AI, Python.
-
Untukmu Karyamu (Tencent Hackathon)
- Placed 2nd at the Tencent Kepler Plan S3 Competition 2025, also winning the Best Code, Best Product Idea, and Popularity awards.
- Architected a generative web-builder with Next.js and Gemini API to automate deployment to Tencent EdgeOne for MSMEs.
- Tech stack: Next.js, Supabase, Puppeteer, EdgeOne Pages.
Skills
Languages: Python, SQL, JavaScript, Java, C, Rust
Libraries and Frameworks: Pandas, NumPy, PyTorch, Scikit-learn, Puppeteer, Selenium, BeautifulSoup, React, Next.js, Node.js
Cloud and Databases: Google Cloud (BigQuery, Looker Studio), AWS (EC2, S3, Bedrock), Supabase, PostgreSQL, MongoDB
Tools: GitHub, Docker, Claude Code, n8n, Copilot, Arduino, Notion
Honors and Scholarships
- Hack@Brown 2026: Finalist and awarded ‘Strongest Product Thinking,’ working with Google Ventures and Partiful.
- Millard Chan Technology ’99 Startup Competition (2026): Finalist (Top 6) with ResearcherX.
- Columbia Lion Cage Startup Competition (2026): 2nd Place with ResearcherX.
- Tencent Kepler Plan S3 Competition 2025: 2nd Place (Best Code, Best Product Idea, Popularity Award).
- Clash of Champions Season 2: Top 9 in an academic survival show for top Indonesian students.
- Indonesia Maju Scholarship: Full-ride scholarship recipient (Ministry of Education of Indonesia), 1 of 350 recipients.
- 55th International Chemistry Olympiad 2023: Final stage (10th rank) for national team selection.