An emerging tech company based in South Africa that utilizes machine learning to provide business solutions is searching for a Site Reliability Engineer (SRE).
- Observe and gather data on latency, availability, emergency response, monitoring, and performance.
- Assist the development and DevOps teams in ensuring that the company meets its Service Level Objectives (SLOs), as measured by its Service Level Indicators (SLIs), by further developing alerting, tracing, and monitoring capabilities.
- Track objectives and report on their progress.
- Advise customers on how the product can improve their outcomes.
- Provide application support.
- Continuously develop and improve log analytics metrics.
- 1 year of experience as a Site Reliability Engineer; experience as a DevOps Engineer or as a Support Engineer is also acceptable.
- Relevant Bachelor of Science Degree.
- Knowledge of Python, Go or Bash is essential.
- Knowledge of cloud technologies such as AWS and GCP.
- Understanding of supporting Kubernetes platforms and SQL.
- Knowledge of Stackdriver, Prometheus, CloudWatch, Grafana, and OpenTracing.