SaltLakeCityRecruiter Since 2001
the smart solution for Salt Lake City jobs

Reliability Engineer

Company: DISH Network
Location: American Fork
Posted on: June 12, 2021

Job Description:

In 2015, SLING redefined America's relationship with live TV. Five years later, we remain one of the country's most influential inventions and driving forces behind the cord-cutting phenomenon. We've achieved our success by filling our offices with people invested in the medium they're building. Our teams are challenging the status quo and reimagining streaming capabilities. From product development to software design to big data and beyond, our people play vital roles in connecting consumers with the products and platforms of tomorrow.

Job Duties and Responsibilities:

The Dish Technologies team is looking for a highly motivated, talented, and experienced SRE Specialist to join the Datacentre Engineering Operations team. The Site Reliability Engineer (SRE) will be responsible for both uplifting and maintaining our evolving technology platforms, infrastructure, and technology controls. The role includes oversight of production operations for our systems as well as development and engineering of solutions to maximize system reliability and automation.

The role will address three dimensions:

- Tools Coverage: Assess tools coverage and ensure sufficient monitoring is in place to enable mature observability and data-driven decision making.
- Defining and educating Engineering teams: Processes, procedures, guide rails, and best practices.
- Culture: Instill a culture of high-performing teams and adopt ways of working influenced by SRE.

The role will work with a global team responsible for a mission-critical business function, and will partner with Infrastructure, DevOps, and Core practices teams (such as Security, Identity, ProdOps, Cloud Platform, and Tools) to identify and implement automation opportunities to drive down toil, reduce technical debt, and improve system reliability.

Key Responsibilities:

- Own the infrastructure and APM, and work with DevOps teams to build, release, monitor, and run services to improve service reliability.
- Write software to automate API-driven tasks at scale and contribute to the product codebase in Java, JS, React, Node, Go, and Python.
- Work with Ansible, Puppet, Chef, Terraform, or another config management/orchestration suite; know where it's broken, work towards fixing it, and explore new alternatives.
- Define and accelerate implementation of support processes, tools, and best practices.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system reliability.
- Handle cross-team performance issues, from identifying the cause and determining the areas of improvement to driving those actions to closure.
- Baseline the performance and maturity of DevOps processes, tools maturity and coverage, metrics, technology, and engineering practices.
- Define, measure, and improve reliability metrics (SLO/SLI), observability (monitoring, logging, and tracing solutions), and Ops processes (Incident and Problem Management); streamline and automate release management.
- Be a strong believer in automation as a means of sustained continuous improvement: automating toil and runbooks, and improving the ability of applications to auto-heal, leading to improved reliability.

Experience to Include:

Knowledge in one or more of the following key areas:

- Ops maturity (performance testing, monitoring, operations - SIP), APM, performance benchmarking, software design and lifecycle (planning and discovery through provisioning), Infosec (including compliance and security)
- Good understanding of, and implementation experience with, the 12-factor app principles
- Experience building monitoring/metrics alerting tools (APM tools) and custom dashboards for each application stack against supported environments
- Expertise with Python-related technologies and frameworks
- Experience with Unix/Linux OS internals and administration or networking, and SME on at least one cloud computing infrastructure: Google Cloud Platform, Azure, AWS
- Familiarity with containerization (Kubernetes, Docker, Rancher, etc.), Kafka, Yarn, Elastic Search, etc.
- Source code management and implementation of security best practices
- Tech stack: Python, Falcon, Elastic Search, MongoDB, AWS (SQS, S3), Map Reduce
- Data science (AI/ML) and analytics to be able to predict failures and operational issues
- Be a subject matter expert, able to upskill and cross-skill engineering teams on SRE principles, tools, and execution
- Troubleshoot, debug, and diagnose operational issues and drive them to closure
- Monitor the health of Dish-Sling services, and define as well as track reliability metrics

Skills - Requirements:

The successful candidate will have the following attributes/qualifications:

- Bachelor's/Master's degree and 10+ years of Development and Operations related experience and/or training, or an equivalent combination of education and experience
- Relevant experience as an SRE would be an added advantage
- Good understanding of uplifting maturity (App Engineering practices, Ops)
- Understanding of software delivery lifecycles, particularly Agile/Lean DevOps
- Proven experience handling large-scale and growing infrastructure across data centres and heterogeneous cloud platforms
- Experience as a service owner managing large, geographically diverse groups of stakeholders
- Ability to work with a creative, fast-growing engineering team and motivate them to deliver their best work
- History of driving innovation

Compensation: 64,150.00/Yr. - 100,550.00/Yr.

From versatile health perks to new career opportunities, check out our benefits on our careers website.

Employment is contingent on successful completion of a pre-employment screen, which may include a drug test.

Keywords: DISH Network, Salt Lake City, Reliability Engineer, Other, American Fork, Utah
