JOB TITLE: Site Reliability Engineer
LOCATION: Bloomfield, CT
DURATION: C2C
NOTE: USC, GC, H4-EAD, L2-EAD, TN VISA ONLY

Must Have:
- Programming language, preferably Python
- AWS knowledge
- DevOps knowledge/experience
- SRE experience in performance tuning and in providing improvements and solutions

The role requires a wide variety of strengths and capabilities, including:
- Bachelor’s degree or equivalent experience
- 5+ years of experience in developer or operations roles
- Knowledge of application, data and infrastructure architecture disciplines
- Working proficiency in development and monitoring tools
- Ability to work in large, collaborative teams to achieve organizational goals
- A mindset focused on reliability, uptime and the customer experience
- 4+ years of SRE experience
- 3+ years of public cloud experience
- Willing to work on a distributed team spanning multiple time zones.
- Analytical and problem-solving skills
- Good communication skills
- The desire to continually learn and test your own boundaries
- Exposure to standard software development practices including agile, deployment pipelines, continuous integration and continuous delivery
- Understanding of software disciplines such as business analysis, development, maintenance and improvement

Qualifications:
- Experience with monitoring and observability systems like Dynatrace, CA APM, Splunk, etc.
- Experience in one or more programming languages: Python, Java, Scala, PySpark
- Experience with build and deployment pipeline technologies like Jenkins, CloudBees, Terraform, UrbanCode Deploy, Ansible, Maven, GitHub
- Experience working with relational databases or HDFS or cloud storage
- Hands-on experience with AWS and related technologies: CloudWatch, Step Functions, Lambda, Glue, EC2, Redshift, DynamoDB, SQS, ECS/EKS, Elasticsearch, Snowflake, ElastiCache, Databricks, Terraform, to name a few
- Exposure to containerization – Docker and Kubernetes
- Experience with additional enterprise technologies: OpenShift, Kafka, Red Hat, Databricks

The primary responsibilities include:
- Monitoring, alerting and communication
  - Manage and monitor production systems
  - Application availability and performance issues
  - Data and integration issues
  - Stakeholder communication
- Incident management and root cause analysis
  - Facilitate incident management
  - Guide the team on root cause analysis
- Manage performance metrics
  - Work with stakeholders to identify and define the service level indicators and objectives
  - Measure and report the performance metrics
- Automation and tools
  - Work with the service delivery teams to develop and test automation of repetitive and manual efforts
  - Work closely with engineering teams on monitoring and performance management tool initiatives to improve system reliability and performance
- Team building
  - Help teams transition to an SRE mindset
  - Help with SRE adoption across teams

Thanks and Regards,
Krishna Patamsetty
Agile Enterprise Solutions Inc.
P: 972-440-2110