Description:
Design and implement distributed data processing pipelines using Spark, Hive, SQL, and other tools and languages prevalent in the Hadoop ecosystem.
Work to build aggregates and curate data sets needed to solve BI reporting needs.
Design and implement end-to-end solutions.
Build utilities, user-defined functions, and frameworks to better enable data flow patterns.
Research, evaluate, and utilize new technologies/tools/frameworks centered around Hadoop and other elements in the Big Data space.
Work with teams to resolve operational and performance issues. • Work with architecture/engineering leads and other teams to ensure quality solutions are implemented and engineering best practices are defined and adhered to.
Work in Agile team structure
Coordinate with offshore teams for project related work and act as technical lead for onsite teams.
Responsibilities: Click here to enter text.
Strong development skills around Hadoop, Spark, Airflow and Hive
Strong SQL, Python, shell scripting, and Python
Strong understanding of Hadoop internals
Experience (at least familiarity) with data warehousing, dimensional modeling, and ETL development
Experience with AWS components and services, particularly, EMR, S3
Awareness of DevOps tools
Working experience in Agile framework
Bachelor Degree and above