Senior Production Support (SRE)

Atlanta, GA / Frisco, Texas (Hybrid)
Contracted
Experienced
Client is looking for 10+ years of experience.

Title: Senior Production Support (SRE)
Duration: Contract
Location: Atlanta, GA / Frisco, Texas (Hybrid)

Relocation is workable.
All work authorizations are workable except OPT and H1T.
When sharing the resume, please mention the candidate's location and work authorization.


Job Description:
Must Have Skills
Skill 1 – 5+ years of experience in IT support, production support, or system administration in a complex environment, with at least 2 years in a leadership or supervisory role.
Skill 2 – 6+ years of experience with ITIL processes, particularly incident management, change management, and problem management.
Skill 3 – 6+ years of experience with monitoring tools (e.g., Splunk, AppDynamics, New Relic, or similar) and log analysis for troubleshooting.
Skill 4 – 6+ years of experience with scripting (e.g., Python, Shell scripting) for automating routine tasks and improving operational efficiency.
Skill 5 – 6 years of experience with database systems (SQL, Oracle, etc.) and database troubleshooting in a production environment.
Skill 6 – 6 years of experience; familiarity with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes) is a plus.
Job Summary:
We are seeking a Senior Production Support Lead to oversee the daily operations of our production environments and keep critical applications and systems running smoothly and efficiently. The ideal candidate will have strong technical troubleshooting skills, a solid understanding of production environments, and a proven ability to lead a team in resolving complex production issues quickly and effectively. As a Senior Production Support Lead, you will manage escalations, oversee incident management, drive performance improvements, and work closely with other teams such as development, infrastructure, and business operations.

Key Responsibilities:
•    Lead Production Support Operations: Manage and lead the production support team to ensure the stability and availability of critical production systems. Oversee monitoring, incident management, and resolution of production issues.
•    Incident and Problem Management: Coordinate and lead efforts to resolve production incidents promptly, ensuring minimal business impact. Manage the root cause analysis (RCA) process for recurring issues and work towards preventive solutions.
•    System Monitoring and Optimization: Continuously monitor system performance, identify potential bottlenecks or issues, and take proactive measures to improve system performance and reliability.
•    Escalation Handling: Serve as the point of escalation for complex production issues, providing guidance and expertise in troubleshooting and resolution.
•    Collaboration with Development and Infrastructure Teams: Work closely with the development, QA, and infrastructure teams to ensure smooth production deployments, patch management, and post-deployment monitoring.
•    SLA Adherence: Ensure that SLAs are met for all production issues, including response and resolution times. Track and report on SLA performance metrics regularly.
•    Team Leadership: Provide mentorship and guidance to junior team members, conduct regular team meetings, and facilitate knowledge-sharing sessions to build a high-performing support team.
•    Documentation & Knowledge Management: Maintain up-to-date knowledge base articles, troubleshooting guides, and standard operating procedures (SOPs). Ensure proper documentation of all incidents, changes, and resolutions.
•    Change Management: Assist in managing changes in production environments by ensuring thorough testing and validation of changes, and providing post-implementation support.
•    Continuous Improvement: Drive continuous improvement initiatives within the production support process. Identify opportunities to automate repetitive tasks, enhance system reliability, and optimize operational workflows.

Qualifications & Skills:
•    Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field (or equivalent experience).
•    5+ years of experience in IT support, production support, or system administration in a complex environment, with at least 2 years in a leadership or supervisory role.
•    Strong technical troubleshooting skills in areas such as application monitoring, databases, network, and server infrastructure.
•    Experience with ITIL processes, particularly incident management, change management, and problem management.
•    Proficiency with monitoring tools (e.g., Splunk, AppDynamics, New Relic, or similar) and experience in log analysis for troubleshooting.
•    Scripting skills (e.g., Python, Shell scripting) for automating routine tasks and improving operational efficiency.
•    Strong understanding of database systems (SQL, Oracle, etc.) and experience with database troubleshooting in a production environment.
•    Familiarity with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes) is a plus.
•    Strong communication skills, with the ability to explain technical issues to non-technical stakeholders and produce clear incident reports.
•    Leadership and Team Management: Proven ability to manage, mentor, and lead teams effectively, providing guidance