BigData ETL Tester (Datahub)

Location - San Ramon, CA / Tempe, AZ (remote during COVID)

Duration - 12-18 months, long-term contract

 

 

At a high level, the required experience is:

 

·         Experience writing complex SQL and Python/shell scripts to test a data ingestion framework against the provided data mappings and requirements, and performing extensive data analysis to identify defects.

·         Strong grasp of data analytics, ETL, data warehouse, data virtualization, and BI dashboard concepts.

·         Experience working on large-scale big data/enterprise data warehouse, data integration, data migration, and upgrade projects.

·         Experience testing complex data systems and data ingestion pipelines built on batch and real-time/streaming frameworks.

·         Experience building or updating automation frameworks in programming languages such as Python, Java, or shell, or proven programming experience in any relevant scripting language.

·         Experience setting up test data in various file formats and databases.

 

In detail, we want the skill sets below for Datahub:

 

·         Hands-on experience testing data ingestion pipelines built on batch and real-time/streaming frameworks implemented using Spark, NiFi, Airflow, or Kafka.

·         Testing different types of dimension and fact tables, with in-depth data warehousing knowledge.

·         UNIX environment – writing HDFS and shell commands for job execution, file validation, etc.

·         Programming language (Python, shell scripts, or Scala) – understanding data ingestion functionality implemented using Spark and Python scripts, and analyzing logs for failures.

·         Hive – understanding the mapping/requirement document and writing medium-to-complex HiveQL for data validation between different tables, including DDL and DML operations.

·         Different file formats – validating data in different file formats (JSON, XML, Parquet, delimited, fixed-width) against another file or a Hive/HBase table using Spark SQL or Python/shell scripts.

·         Test data setup in different file formats for positive and negative scenarios.

·         Integration testing of E2E data ingestion pipeline integrating different tools.

·         YARN – monitoring Spark jobs running in cluster mode and checking the logs for any issues.

 

(Good to have)

 

·         Developing automation scripts to validate data between tables and files.

·         Airflow or any other scheduling tool – executing E2E jobs, monitoring the data ingestion process, and checking the logs for any issues.

·         HBase – shell commands for data validations

 

Pankaj Kumar

ST Global LLC 

Fax: 206-319-4579

pankaj@stglobaltech.com

www.stglobaltech.com

 


 
