About Oracle SaaS Cloud SRE (Site Reliability Engineering)
Oracle SaaS Cloud SRE plays a critical role in delivering and supporting best-of-breed cloud solutions to Oracle customers.
Oracle Cloud is the industry's broadest and most integrated public cloud. It offers best-in-class services across software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS), and even lets you put Oracle Cloud in your own data center. Oracle Cloud helps organizations drive innovation and business transformation by increasing business agility, lowering costs, and reducing IT complexity.
Oracle Cloud has shown strong adoption, supporting more than 70 million users and more than 30 billion transactions each day. It runs in 19 data centers around the world.
Our team delivers cross-team visibility and execution on the most challenging reliability issues impacting Oracle's SaaS customers. We engage deeply with service owners and stakeholders to understand and resolve the critical issues that impair service experience.
What You Need to Have
A BS or MS in Computer Science, or equivalent
Knowledge of:
Resiliency design and operation
Optimizing loads of large volumes of data into a database
Experience managing large fleets
Performance tuning and optimization and expertise with AWR and ASH
Autonomous monitoring – Cluster Health Advisor, CHM, etc.
Exadata architecture, design, best practices
RAC, GoldenGate, DataGuard
Automation methodologies
Data Modeling and relational Database design
Cloud computing patterns
Methodical approach to troubleshooting complex problems
Most importantly, the aptitude to be a good team player and the willingness to learn and implement new Cloud technologies as needed
What the Perfect Candidate Will Have
Understanding of:
Oracle SOA and BPEL
IT Security and compliance
Linux internals
Scripting languages, such as Python, Ruby, Bash, etc.
Oracle Fusion Middleware
FMW Administration, to include WebLogic and SOA
Networking and TCP/IP
Standard Internet services, such as DNS, HTTP, etc.
Oracle Enterprise Manager
Defining and documenting technical architecture of complex and highly scalable products
Career Level - IC5
A unique opportunity to join a rapidly growing, world-class team and improve the cutting-edge technologies and infrastructure that make up the Oracle Cloud solutions. As part of the SRE team, you will be continually challenged and will have the opportunity to contribute to Oracle Cloud's success every day, working closely with our development partners.
As a Site Reliability Engineer, you will solve exciting technical challenges by analyzing, troubleshooting, and designing vital Oracle Cloud services, platforms, and infrastructure while always thinking about reliability, scalability, resilience, security, and performance.
What You'll Do
Ownership Scope – As an SRE, you will understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of the production services you collaborate with. In partnership with your Development colleagues, you will have the responsibility to ensure that services are designed and delivered to be mission critical with a focus on security, resiliency, scale, and performance.
Operations Engineering – You will understand and be able to communicate the scale, capacity, security, performance attributes, and requirements of the services you own. We are subject matter experts, able to understand and communicate every characteristic of our service stack, such as:
degradation and behavior under load of the services and their dependencies end-to-end
tuning needs, optimizing resource utilization, as load patterns fluctuate
instrumentation and metrics that clearly describe the service behaviors
scaling requirements and patterns
resiliency and recoverability, ensuring that backup/restore and disaster recovery capabilities are implemented, tested, and maintained
Automation – You will have a clear understanding of automation and orchestration principles, and will be eager to help automate, wherever and whenever the possibility arises, while simultaneously eliminating technical debt. Automation must be part of your DNA.
Technical Experts – You will have a deep understanding of service topologies and their dependencies, as required to troubleshoot issues and define mitigations. You will bring this expertise to bear in driving reliability improvements in the services you engage with.
Broad Interests – SREs are a rare mix of sysadmins and Development Engineers and, as such, can understand and explain how product architecture decisions affect the ability to run services as distributed systems. They are driven by professional curiosity and a desire to develop a deep understanding of their services and their dependencies.
Cross-team collaboration – You will engage with and present to a wide variety of audiences, ranging from individual contributors and teams to executive leadership.