Data Science Training: Distributed Computing with Spark

3 Day Training Course

The Data Incubator - Training Data Scientists For 250+ Companies

Course Overview

Scala, Spark, and Scalding are technologies at the forefront of distributed computing that offer more abstract yet more powerful APIs. The course covers the basics of Scala, such as map, flatMap, for comprehensions, and data structures, along with the core concepts of Spark, such as resilient distributed datasets (RDDs), memory caching, actions, transformations, and distributed machine learning.

Students come away with a solid understanding of the basics of Scala and Spark, as well as the critical tooling around Spark (sbt, the JVM) that makes them more productive.

You'll apply that knowledge directly by developing, building, and deploying Spark jobs that run on large, real-world data in the cloud (AWS EMR).

Mini Project: Trainees familiarize themselves with Spark’s computational workflow, analytic capabilities, and machine learning toolkit by analyzing a multi-gigabyte dataset.

Meet Your Instructors

Michael Li has worked as a data scientist (Foursquare, A16Z), a quant (D.E. Shaw, J.P. Morgan), and a rocket scientist (NASA). He did his PhD at Princeton as a Hertz fellow and read Part III Maths at Cambridge as a Marshall scholar.


Francesco Mosconi is a Data Scientist at Catalit. Previously, he was the Chief Data Officer at Spire, a next-generation wearables company, a co-founder at Axelera, and an application engineer at Roche. He received a joint PhD from Université Pierre et Marie Curie (Paris VI) and Università degli Studi di Padova (Padua, Italy).

Ariel M'ndange-Pfupfu studied physics at Stanford and got an engineering PhD from Northwestern. Since joining The Data Incubator as a Data Scientist in Residence, he’s worked on a variety of data science and software engineering projects, as well as curriculum development and instruction.

Robert Schroll is a Data Scientist in Residence at The Data Incubator and has been a key contributor to a variety of open source software development and data science projects. He received his PhD in computational physics from the University of Chicago and his undergraduate degree from Maryland.

Date & Location

OnLocation: The course takes place in New York on December 7, 8 & 9.

Venue Information: New York - New York is the most populous city in the United States. The city is referred to as New York City or the City of New York to distinguish it from the State of New York, of which it is a part.


$2995 Silver Package: The full 3 day course, including breakfast, lunch & networking drinks

$3295 Gold Package: The full 3 day course, including breakfast, lunch & networking drinks | Access to presentations from the latest Big Data Innovation Summit from the Innovation Enterprise

$3595 Full Access: The full 3 day course, including breakfast, lunch & networking drinks | Annual Subscription to Big Data & Analytics channels via ieOnDemand | 4000+ hours of on-demand presentations & case studies | 40hrs of new content added monthly - ieOnDemand.com

Group Access: We offer generously discounted rates for team bookings. To arrange one, please email jordan@theiegroup.com

The Data Incubator's data science fellowship was named one of the "15 Things That Are Harder To Get Into Than Harvard" by Business Insider

The Data Incubator was ranked second among data science incubators by Data Economy

About The Data Incubator

The Data Incubator is a Cornell-funded data science training organization. It runs an 8-week fellowship that Business Insider named one of 15 programs in the US with more competitive admissions than Harvard. The Data Incubator was founded in 2014 in New York City by Michael Li, a former data scientist at local-mobile-social startup Foursquare and at Andreessen Horowitz, and a former rocket scientist at NASA. A variety of innovative companies partner with The Data Incubator for their hiring and training needs, including LinkedIn, Genentech, Capital One, Pfizer, and many others.
