CS Colloquium | February 17, 2021

Juneau: Managing and Guiding Data Science

Zachary Ives
Adani President's Distinguished Professor and Department Chair, Computer and Information Science Department, University of Pennsylvania

Stevenson Hall 1300
11:00 AM - 11:50 AM

How do we promote large-scale data science and data sharing, e.g., in the sciences or across organizations? Many modern data science applications rely on data lakes: schema-agnostic repositories of data files and data products that offer limited organization and management capabilities. We need a new generation of data science environments that build on data lakes so that scientists and analysts can find the tables, schemas, workflows, and datasets relevant to the task at hand. Juneau incorporates search and management solutions into the Jupyter Notebook data science platform, enabling scientists to augment training data, find potential features to extract, clean data, and find joinable or linkable tables. Our core methods also generalize to other settings where computational tasks involve the execution of programs or scripts.
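
To give a rough sense of what "finding joinable tables" can mean in practice, here is a minimal sketch. It is not Juneau's method or API: it assumes a toy data lake held in memory as plain Python dictionaries, and it scores column pairs by the Jaccard overlap of their distinct values, a common baseline for join discovery. The names `column_overlap`, `find_joinable_columns`, and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# A table is modeled as a mapping from column name to a list of cell values.
# This representation is an assumption made for the sketch only.
Table = Dict[str, list]


def column_overlap(left: list, right: list) -> float:
    """Jaccard similarity of the distinct non-null values in two columns."""
    a = {v for v in left if v is not None}
    b = {v for v in right if v is not None}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_joinable_columns(
    query: Table, lake: Dict[str, Table], threshold: float = 0.5
) -> List[Tuple[str, str, str, float]]:
    """Return (table name, query column, candidate column, score) tuples
    whose value overlap meets the threshold, best match first."""
    matches = []
    for table_name, table in lake.items():
        for q_col, q_values in query.items():
            for c_col, c_values in table.items():
                score = column_overlap(q_values, c_values)
                if score >= threshold:
                    matches.append((table_name, q_col, c_col, score))
    return sorted(matches, key=lambda m: m[3], reverse=True)


# Hypothetical example data: the table an analyst is working on, plus a
# small "lake" of candidate tables to search.
query_table = {"gene": ["a", "b", "c"], "expression": [1.0, 2.0, 3.0]}
lake = {
    "annotations": {"gene_id": ["a", "b", "c", "d"], "label": ["x", "y", "z", "w"]},
    "unrelated": {"city": ["p", "q"]},
}

results = find_joinable_columns(query_table, lake)
for table_name, q_col, c_col, score in results:
    print(f"{table_name}: {q_col} <-> {c_col} (overlap {score:.2f})")
```

Exact value overlap is only one possible signal. A system working over a real data lake would likely also need to consider schema similarity, provenance, and the cost of comparing every column pair, none of which this sketch attempts.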