Database Systems SRE

Apple Inc

Cupertino, CA

Job posting number: #7223950 (Ref: apl-200540673)

Posted: March 1, 2024

Job Description

Summary
Apple’s Services Engineering organization (ASE) is seeking experienced database systems engineers to join our Cassandra SRE team.

Engineers in ASE Cassandra SRE develop and contribute to software built to manage Apache Cassandra, an open-source distributed database powering some of Apple's most critical internet services.

You will be joining a team of experts working at the cutting edge of modern database deployment architectures and distributed systems. The team's work is deployed at massive scale, serving millions of queries per second over hundreds of petabytes of data across our datacenters worldwide. It also has a big impact, forming the platform upon which iCloud and many other Apple internet services are built. In ASE, your work will benefit hundreds of millions of users and is critical to the success of some of Apple's most visible current and future features.
Key Qualifications
  • Demonstrated expertise developing database systems, storage engines, distributed systems, or performance engineering.
  • Experience developing critical internet services and/or platform infrastructure.
  • Proficient in modern Java, and optionally Python or Go.
  • Optional: experience managing services running on Kubernetes.
  • Optional: experience with EC2, EBS, and Terraform.
Description
The ASE Cassandra SRE team develops applications and tooling that are safe, reliable, scalable, and fast. This work requires an innovative spirit and an extraordinary degree of care and rigor in engineering. Team members contribute to all major components of Cassandra deployment infrastructure, including maintenance automation, the backup service application, monitoring and alerting tooling and dashboards, and deployment architecture. They also contribute patches upstream to the database, focused on stability, performance, and scaling.

Success in this role requires expertise in several of the following:

- Understanding of core SRE concepts: monitoring, alerting, and incident management

- Understanding of database concepts (consistency models, isolation levels, crash and recovery semantics)

- Performance engineering (design concepts, profile-guided optimization)

- Service management across bare-metal, virtualized (EC2), and containerized (K8s) platforms

- Fundamentals of system-level hardware and networking components (storage devices and controllers, network interfaces, CPU and memory layout in server-class systems)

- Operating systems concepts (process scheduling, disk and network I/O, performance)

- Datacenter architecture (networking topologies, host placement strategies, and failure modes); design of multi-datacenter systems; failure domains; and wide-area networking


This role also requires excellent communication skills, the ability to partner with our Core Storage and Analytics teams, and a high degree of customer focus when engaging with internal platform customers. Because this is a distributed team, the ability to work effectively with colleagues based in other locations is essential; experience working this way is a plus. Prior experience developing or maintaining distributed databases or storage systems is recommended.
Education & Experience
BS or MS in Computer Science / related fields or equivalent work experience
Pay & Benefits




