Site Reliability Engineer Senior On-site
ID: 7248
1 day ago
Active
Presight
UAE, Abu Dhabi
$7,000 - $8,000
Employment type
Full-time
Required experience
More than 6 years
Work format
Full day
📄 Original vacancy text
Posted by: ant1k
Discussion: @devops_jobs
#SRE #senior #fulltime #onsite #relocation #UAE #linux #kubernetes #vacancy #вакансия
Senior SRE (Site Reliability Engineer)
Location: Abu Dhabi, UAE (relocation required)
Company: https://www.presight.ai/
Employment Type: Full-time, on-site only
Salary Range: $7,000-8,000 (25-30k AED) net, based on interview results
About Presight:
Presight is an ADX-listed public company with Abu Dhabi-based G42 as its majority shareholder and is the
region’s leading big data analytics company powered by GenAI. It combines big data, analytics, and AI expertise
to serve every sector, of every scale, to create business and positive societal impact. Presight excels at all-
source data interpretation to support insight-driven decision-making that shapes policy and creates safer,
healthier, happier, and more sustainable societies. Today, through its range of GenAI-driven products and
solutions, Presight is bringing App
Position Overview:
We are seeking a meticulous Site Reliability Engineer to support the Presight delivery model, which empowers
product and technology teams to develop and deliver high-quality products, improve platform infrastructure, and
strengthen the reliability of products and solutions.
You will play a key role in defining and establishing the delivery model used to develop cutting-edge,
next-generation analytics solutions and services at Presight.
Key Responsibilities:
- Manage the infrastructure required to run our solutions deployed to public or private cloud (air-gapped).
- Analyze service performance, identify bottlenecks, and provide measurable improvement plans.
- Maintain the environment’s health by continuously monitoring technical and business metrics, configuring alerts for potential issues, and proactively addressing risks to prevent disruptions.
- Deploy application updates with minimal disruption to services.
- Identify, evaluate, and conduct proof-of-concepts for new technologies.
- Contribute to the knowledge base.
- Review and modify CI/CD principles and service maturity iteratively, striving for continuous improvement.
Requirements:
- 5+ years of experience in managing Kubernetes clusters.
- 5+ years of experience in configuring and using monitoring/observability platforms.
- Familiarity with at least one type of database.
- 5+ years in an SRE/DevOps/Sysadmin/Platform Engineer role.
Mandatory skills:
- Strong background in Linux/Unix Administration
- Solid hands-on experience deploying and operating Kubernetes or OpenShift clusters
- Experience configuring and maintaining monitoring and observability solutions
- Ability to troubleshoot and resolve complex production issues efficiently, including performing root cause analysis and restoring services quickly during high-pressure incidents or critical outages
- Experience in backing up and restoring various systems
- Experience working with project managers and solution architects as a subject matter expert
- Experience implementing basic network security (e.g. configuring VPCs, firewalls/security groups, etc.)
- Understanding of the dependencies of various GPU cards and the ability to upgrade container images as needed to ensure compatibility
- Experience deploying and operating products from third-party providers
- Experience creating releases together with the development team and deploying release packages to all required environments
Nice to Have:
- Good understanding of typical system architecture and interaction between its components
- Experience automating tasks using infrastructure-as-code tools, e.g. Ansible, Terraform
- Thorough understanding of a company's systems, including auxiliary components like caching systems (e.g., Redis, Memcached) and message queues (e.g., RabbitMQ, Kafka)
- Good understanding of databases, e.g. Postgres, Elasticsearch, ClickHouse
- Basic scripting
- Working knowledge of OAuth 2.0, OpenID/OpenID-Connect, SAML 2.0, Kerberos, LDAP
Contact for questions and CV: @ant1kdream
🛠 Skills
Ansible
ClickHouse
Elasticsearch
Kafka
Kerberos
Kubernetes
LDAP
Linux
Memcached
OAuth 2.0
OpenID Connect
OpenShift
Postgres
RabbitMQ
Redis
SAML 2.0
Terraform
🎯 Domains
AI
Analytics
Big Data
🤖 AI-detected skills
Ansible
CI/CD
ClickHouse
Containerization
Docker
Elasticsearch
Firewalls
GPU
Infrastructure as Code
Kafka
Kerberos
Kubernetes
LDAP
Linux
Memcached
Monitoring
Networking
OAuth 2.0
Observability
OpenID Connect
OpenShift
PostgreSQL
RabbitMQ
Redis
SAML 2.0
Scripting
Security Groups
Terraform
VPC
* Skills were identified automatically by a neural network
🤖 AI-detected domains
AI
Analytics
Big Data
Cloud Infrastructure
DevOps
FinTech
GenAI
Platform Engineering
SRE
System Administration
* Domains were identified automatically by a neural network
📢 Publication info
🔗 Original posts (2)
Channel: devops_jobs_feed