Big Data Engineering for Analytics
About This Course
This course explores the engineering aspects of big data storage, querying, and processing. Students apply their newly acquired skills by developing data-intensive applications on a distributed compute platform (e.g. the Hadoop platform, the Spark framework, and relevant tools).
This 5-day course helps data engineers focus on essential design and architecture while building a data lake and its associated processing platform.
Participants will learn various aspects of data engineering while building resilient distributed datasets. They will learn to apply key practices, identify multiple data sources and appraise them against their business value, design the right storage, and implement proper access model(s).
Finally, participants will build a scalable data pipeline solution on a pluggable component architecture, combining requirements in a vendor- and technology-agnostic manner. Participants will also become familiar with the Spark platform, with additional focus on its query and streaming libraries.
What You'll Learn
Agenda
Module 1: Introduction to Data Science, Data Engineering and Big Data
Module 2: Understand Big Data from an Analytics Perspective
Module 3: Architectural Viewpoints in Big Data
Module 4: The Hadoop Ecosystem for Big Data
Module 5: Distributed File Storage
Module 6: NoSQL Databases for Big Data
Module 7: Spark and Functional Programming for Big Data
Module 8: Spark and Resilient Distributed Datasets
Module 9: Spark SQL for Big Data
Module 10: Spark and Real-Time Stream Processing
Module 11: Management of Big Data Initiatives
Discussion and Project Requirement Elaboration
Project and Assessment
Project demonstration, report submission, and presentations. Each team will work on a practical case study, then submit and present its work on the assigned Big Data project.
Closing Remarks
Entry Requirements
Please see the course weblink for more information.