[Remote] Principal Engineer, Compute Fleet Management

Remote Full-time
Note: This is a remote position open to candidates in the USA.

Databricks is a leading data and AI company focused on enabling data teams to tackle complex problems. The Principal Engineer for Compute Fleet Management will optimize cloud compute resources and ensure the reliability and efficiency of the infrastructure that supports Databricks' products.

Responsibilities
• Pioneering Fleet Optimization: Provision and pool O(Billion)s of cloud resources to achieve peak workload performance, industry-leading efficiency, and robust resource isolation
• Delivering Hyper-Scale Resilience: Build the architecture that guarantees horizontal scaling and resilience against zonal or even cloud account-level failures, ensuring Databricks is always on
• Owning the Critical Path: Lead the development of the lowest-dependency systems required to bootstrap and manage our massive compute platform
• High Availability: Achieve and maintain 99.99% availability for all batch and serving workloads
• Stellar Efficiency: Drive utilization to 60% or higher, a crucial metric that requires balancing high efficiency with unwavering tolerance for cloud failures
• Best-in-Class Isolation: Architect and enforce strong security and performance isolation across a diverse range of customer workloads

Skills
• Leading Transformative Projects: Taking ownership of complex, cross-team, cross-layer, and multi-quarter strategic engineering initiatives from concept to execution
• Distributed Systems Mastery: Deep, hands-on experience developing and operating high-scale distributed systems on at least one major public cloud
• Influence Without Authority: Proven ability to drive consensus, establish technical direction, and lead large technical efforts across organizational boundaries
• Execution Discipline: Exceptional strength in planning, tracking project progress, and managing complex cross-organizational dependencies
• Experience managing and scaling a massive fleet of GPUs for AI/ML workloads
• Experience developing and operating large-scale distributed systems across all major clouds (AWS, Azure, and GCP)

Benefits
• Annual performance bonus
• Equity

Company Overview
• Databricks is a data and AI platform that unifies data engineering, analytics, and machine learning on a lakehouse architecture. It was founded in 2013 and is headquartered in San Francisco, California, USA, with a workforce of 5001-10000 employees.

Company H1B Sponsorship
• Databricks has a track record of offering H1B sponsorships, with 385 in 2025, 319 in 2024, 227 in 2023, 222 in 2022, 166 in 2021, and 64 in 2020. Please note that this does not guarantee sponsorship for this specific role.
