Site Reliability Engineer (SRE) / Production Engineer (PE) - Kubernetes & Cloud Infrastructure Job at Fireworks AI, Bay County, FL

  • Fireworks AI
  • Bay County, FL

Job Description

About Us:

Here at Fireworks, we’re building the future of generative AI infrastructure. Fireworks offers the generative AI platform with the highest-quality models and the fastest, most scalable inference. We’ve been independently benchmarked as having the fastest LLM inference, and we have been getting great traction with innovative research projects, like our own function calling and multi-modal models. Fireworks is funded by top investors, like Benchmark and Sequoia, and we’re an ambitious, fun team composed primarily of veterans from PyTorch and Google Vertex AI.

The Role:

We’re seeking a highly skilled SRE/PE with deep expertise in Kubernetes (k8s), cloud networking, and infrastructure automation. This role will focus on reducing incident response time, implementing auto-remediation, optimizing auto-scaling, and improving cluster efficiency and service health. You’ll design systems that balance performance, cost, and reliability while working onsite with our Redwood City team.

Key Responsibilities:

  1. Incident Response & Reliability Engineering:

  2. Kubernetes & GPU Cluster Optimization:

  3. Cloud Networking & Service Health:

  4. Monitoring & Observability:

  5. Automation & Infrastructure-as-Code (IaC):

Minimum Qualifications:

  • 3+ years in SRE/PE/DevOps roles with production-grade Kubernetes experience.

  • Proficiency in cloud networking (AWS/GCP/Azure VPCs, firewalls, DNS) and service monitoring (Prometheus, Alertmanager, Grafana).

  • Hands-on experience with incident management and improving system reliability/SLOs.

  • Strong scripting/coding skills (Python/Go/Bash) for automation and tooling.

  • Familiarity with object storage (S3, GCS) and data pipeline integration.

Preferred Qualifications:

  • Experience with GPU clusters (NVIDIA GPUs, MIG, CUDA) and AI/ML workloads.

  • Knowledge of auto-scaling technologies (K8s HPA/VPA) and auto-remediation frameworks.

  • Expertise in service meshes (Istio).

Why Fireworks AI?

  • Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.

  • Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.

  • Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results.

  • Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.
