| Job Description : Technical Skillset (Mandatory) Big Data Tools: Hadoop, Cloudera (CDP), PySpark, Kafka, etc. Stream-Processing Systems: Storm, Spark-Streaming, etc. Databases: Relational SQL and NoSQL databases, Snowflake, Postgres. AWS Cloud Services: Glue Services, EC2, S3, EMR, RDS, Lambda, DMS etc. Object-Oriented Languages: Python Technical Skillset (Optional) Visualization Tool: SAP Business Objects, Tableau Data Pipeline & Integration: Airflow, Kafka, Informatica Cloud (IICS), PowerCenter Role and Responsibilities 1. To understand data quality requirements and design framework for parsing Rule. 2. Write algorithm using PySpark to do ranking 3. Implementing ETL processes using AWS Glue 4. Ensure optimization of software through design reviews and code reviews 5. Monitoring performance and advising any necessary infrastructure changes & defining data retention policies 6. Building stream-processing systems, using solutions such as Storm or Spark-Streaming Project Specific requirement 1. Data-oriented personality, good communication skills, and an excellent eye for details. Relevant Experience required 1. Proficient understanding of distributed computing principles. 2. Proficiency with Hadoop, MapReduce, HDFS. 3. Strong understanding of Data Structures and Algorithms. 4. Experience with integration of data from multiple data sources 5. Experience with various messaging systems, such as Kafka or RabbitMQ 6. Experience in scripting languages - Python. 7. Experience with Cloudera/MapR/Hortonworks Apache HDFS 8. Experience with scripting tools and methods to optimize SW development and testing activities. |
Post a Comment