Job Description
The Data Engineer will implement data ingestion and transformation pipelines for large-scale organisations. We are seeking someone with deep technical skills across a variety of technologies to play an important role in developing and delivering early proofs of concept and production implementations.
You will build solutions using a variety of open-source tools and Microsoft Azure services, and will need a proven track record of delivering high-quality work to tight deadlines.
Your main responsibilities will be:
- Designing and implementing highly performant data ingestion and transformation pipelines from multiple sources using a variety of technologies
- Delivering and presenting proofs of concept of key technology components to prospective customers and project stakeholders
- Developing scalable and reusable frameworks for ingesting and transforming large data sets
- Designing and implementing master data management systems and processes
- Designing and implementing data quality systems and processes
- Integrating the end-to-end data pipeline to take data from source systems to target data repositories, ensuring the quality and consistency of data is maintained at all times
- Working with event-based / streaming technologies to ingest and process data
- Working with other members of the project team to support the delivery of additional project components (reporting tools, API interfaces, search)
- Evaluating the performance and applicability of multiple tools against customer requirements
- Working within an Agile delivery / DevOps methodology to deliver proofs of concept and production implementations in iterative sprints
Qualifications
- Hands-on experience designing and delivering solutions using the Azure data analytics platform (Cortana Intelligence Platform), including Azure Storage, Azure SQL Database, Azure SQL Data Warehouse, Azure Data Lake, Azure Cosmos DB, and Azure Stream Analytics
- Direct experience building data pipelines using Azure Data Factory and Apache Spark (preferably Databricks)
- Experience building data warehouse solutions using ETL / ELT tools such as SQL Server Integration Services (SSIS), Oracle Data Integrator (ODI), Talend, and WhereScape RED
- Experience with Azure Event Hubs, Azure IoT Hub, Apache Kafka, or Apache NiFi for streaming / event-based data
- Experience with other open-source big data products, e.g. Hadoop (including Hive, Pig, and Impala)
- Experience with open-source non-relational / NoSQL data repositories (including MongoDB, Cassandra, and Neo4j)
- Experience working with structured and unstructured data, including imaging and geospatial data
- Comprehensive understanding of data management best practices, including demonstrated experience with data profiling, sourcing, and cleansing routines using typical data quality functions such as standardization, transformation, rationalization, linking, and matching
- Experience working in a DevOps environment with tools such as Microsoft Visual Studio Team Services, Chef, Puppet, or Terraform