LakeAgent
Deep research over data lakes with verifiable, provenance-backed answers.
LakeAgent extends deep-research agents to operate over both structured and unstructured data at data-lake scale.
Highlights
- Built infrastructure for agents to answer analytic questions requiring enumeration, aggregation, and causal reasoning over heterogeneous sources.
- Designed an end-to-end pipeline that discovers relevant datasets, integrates them automatically, and generates verifiable outputs with explicit provenance.
- Applied the system to realistic forecasting questions that combine structured signals (e.g., historical performance) with unstructured evidence (e.g., interviews and public statements).
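As a rough illustration of the discover, integrate, and provenance steps named in the highlights, the following is a minimal sketch over a toy in-memory "lake". It is not LakeAgent's actual code or API: every name here (`LAKE`, `Record`, `Answer`, `discover`, `integrate`, `total_wins`, `verify`) is hypothetical, the data is made up for the example, and only the structured side is covered (unstructured evidence is omitted).

```python
from dataclasses import dataclass

# Toy stand-in for a data lake: dataset name -> description and rows.
# All dataset names and values are invented for illustration.
LAKE = {
    "season_2022": {
        "description": "Team performance 2022",
        "rows": [{"team": "A", "wins": 10}, {"team": "B", "wins": 7}],
    },
    "season_2023": {
        "description": "Team performance 2023",
        "rows": [{"team": "A", "wins": 12}],
    },
    "weather": {"description": "Daily rainfall", "rows": []},
}


@dataclass(frozen=True)
class Record:
    source: str  # dataset the row came from
    row_id: int  # position of the row inside that dataset
    team: str
    wins: int


@dataclass(frozen=True)
class Answer:
    value: int
    provenance: tuple  # (source, row_id) pairs that produced the value


def discover(lake, keyword):
    """Return names of datasets whose description mentions the keyword."""
    return sorted(
        name for name, ds in lake.items()
        if keyword.lower() in ds["description"].lower()
    )


def integrate(lake, names):
    """Flatten the selected datasets into records that remember their origin."""
    return [
        Record(name, i, row["team"], row["wins"])
        for name in names
        for i, row in enumerate(lake[name]["rows"])
    ]


def total_wins(records, team):
    """Aggregate wins for one team and keep the contributing rows."""
    used = [r for r in records if r.team == team]
    return Answer(
        value=sum(r.wins for r in used),
        provenance=tuple((r.source, r.row_id) for r in used),
    )


def verify(lake, answer):
    """Recompute the value from the cited rows only and compare."""
    recomputed = sum(lake[s]["rows"][i]["wins"] for s, i in answer.provenance)
    return recomputed == answer.value


names = discover(LAKE, "performance")
answer = total_wins(integrate(LAKE, names), "A")
print(answer.value, answer.provenance, verify(LAKE, answer))
```

The design point the sketch tries to convey is that the answer carries pointers back to the exact rows it was computed from, so a reader (or another program) can re-derive the value without trusting the agent.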
Context
Research project with the Columbia Data, Agents, and Processes Lab (DAPLab).