CV

Education, research experience, projects, skills, and honors.

Contact Information

Name: Austin Senna Wijaya
Professional Title: Computer Science Student and Research Assistant
Email: asw2215@columbia.edu

Professional Summary

Undergraduate researcher focused on agentic systems, machine learning, and large-scale data infrastructure.

Experience

  • 2025 - Present

    New York, NY

    Data Agents Research Assistant
    Columbia Data, Agents, and Processes Lab
    • Contributing to DataSeek, a system that provides infrastructure for deep-research agents operating over terabytes of multimodal data; submitted a benchmark paper to ICML 2026.
    • Re-architected a legacy discovery tool into a scalable Agentic GraphRAG engine; reduced retrieval complexity to O(N log K) via LSH and min-heap top-K pruning, increasing accuracy by 10% across a 10 TB S3 data lake.
    • Engineered a parallelized benchmark evaluation framework using AWS Bedrock to test autonomous data tasks and multi-step reasoning, and added agentic tool capabilities to DeepSeek R1.
    • Tech stack: Python, SQL, AWS (EC2, S3, Bedrock), Docker.
  • 2025 - Present

    New York, NY

    Machine Learning Research Assistant
    Columbia Zuckerman Institute
    • Trained linear probes via ridge regression on external audio benchmarks to decompose Qwen2-Audio-7B’s latent space into orthogonal feature vectors and map embeddings to ECoG neural signals.
    • Engineered an end-to-end GPU-optimized pipeline on HPC clusters (NCSA Delta) to parallelize TTS generation and extract penultimate layer embeddings across hundreds of thousands of inferences.
    • Tech stack: Python, PyTorch, HuggingFace, Scikit-learn, NumPy.
  • 2025

    Jakarta, Indonesia

    Data Engineering Intern
    Ruangguru
    • Consolidated millions of payment events into user sessions in BigQuery, producing a product-level funnel used to diagnose major drop-off points.
    • Automated large-scale Google Review scraping and sentiment analysis with Puppeteer and HuggingFace for continuous customer satisfaction tracking.
    • Processed hundreds of thousands of tutor scheduling events with Apps Script and BigQuery to generate a real-time, interactive tutor-availability heatmap in Looker Studio.
    • Tech stack: Google Cloud (BigQuery, Looker Studio), Apps Script, Puppeteer, HuggingFace.

Education

  • 2024 - 2028

    New York, NY

    Bachelor of Science
    Columbia University
    Computer Science
    • GPA: 4.25/4.0
    • Full-tuition scholarship recipient.
    • Dean’s List.
    • Relevant coursework: Natural Language Processing, Data Structures and Algorithms, Systems Programming, Linear Algebra and Probability, Competitive Programming.

Projects

  • ResearcherX: AI-Powered IDE and GraphRAG Engine
    • Developed an AI-powered IDE for academic writing with real-time logical contradiction linting, using a dual-routing FastAPI and litellm pipeline to optimize latency and cost.
    • Engineered an asynchronous hybrid-search engine with Neo4j and LanceDB featuring node-level provenance and Cypher-based garbage collection for graph integrity.
    • Finalist (Top 6) at the Millard Chan Technology ’99 Startup Competition 2026 and 2nd place at the Columbia Lion Cage Startup Competition 2026.
    • Tech stack: Python (FastAPI), TypeScript (Next.js), Neo4j (Cypher), LanceDB, litellm, ProseMirror.
  • LakeAgent
    • Built infrastructure for deep-research agents to operate over both structured and unstructured data-lake sources.
    • Designed an end-to-end pipeline for dataset discovery, automatic integration, and verifiable answer generation with explicit provenance.
    • Applied the system to forecasting and analytic questions that combine structured signals with unstructured evidence.
  • Targeted Neural Audio Embeddings for Cortical Prediction
    • Built a brain-encoding pipeline that maps Qwen-Audio speech representations to cortical activations by extracting targeted task subspaces from large audio-language embeddings.
    • Processed 150k+ audio examples across 15 benchmark datasets to isolate auditory signals such as emotion and reasoning.
    • Performed dimensionality sweeps across model architectures to maximize task-relevant signal while suppressing noise.
  • Unstructured Cloud ELT Pipeline
    • Built an automated ELT pipeline to ingest chat screenshots into BigQuery using Vision AI for text extraction.
    • Implemented sentiment and intent classification with BigQuery ML to support analytics and automated response workflows.
    • Tech stack: BigQuery, Gemini API, Looker Studio, Vision AI, Python.
  • Untukmu Karyamu (Tencent Hackathon)
    • Placed 2nd at the Tencent Kepler Plan S3 Competition 2025 and won the Best Code, Best Product Idea, and Popularity awards.
    • Architected a generative web-builder with Next.js and Gemini API to automate deployment to Tencent EdgeOne for MSMEs.
    • Tech stack: Next.js, Supabase, Puppeteer, EdgeOne Pages.

Skills

Languages: Python, SQL, JavaScript, Java, C, Rust
Libraries and Frameworks: Pandas, NumPy, PyTorch, Scikit-learn, Puppeteer, Selenium, BeautifulSoup, React, Next.js, Node.js
Cloud and Databases: Google Cloud (BigQuery, Looker Studio), AWS (EC2, S3, Bedrock), Supabase, PostgreSQL, MongoDB
Tools: GitHub, Docker, Claude Code, n8n, Copilot, Arduino, Notion

Honors and Scholarships

  • Hack@Brown 2026: Finalist and awarded ‘Strongest Product Thinking,’ working with Google Ventures and Partiful.
  • Millard Chan Technology ’99 Startup Competition (2026): Finalist (Top 6) with ResearcherX.
  • Columbia Lion Cage Startup Competition (2026): 2nd Place with ResearcherX.
  • Tencent Kepler Plan S3 Competition 2025: 2nd Place (Best Code, Best Product Idea, Popularity Award).
  • Clash of Champions Season 2: Top 9 in an academic survival show for top Indonesian students.
  • Indonesia Maju Scholarship: Full-ride scholarship recipient (Ministry of Education of Indonesia), 1 of 350 recipients.
  • 55th International Chemistry Olympiad 2023: Final stage (10th rank) for national team selection.