Here at Fireworks, we’re building the future of generative AI infrastructure. Fireworks offers the generative AI platform with the highest-quality models and the fastest, most scalable inference. We’ve been independently benchmarked to have the fastest LLM inference and have been getting great traction with innovative research projects, like our own function calling and multi-modal models. Fireworks is funded by top investors, like Benchmark and Sequoia, and we’re an ambitious, fun team composed primarily of veterans from PyTorch and Google Vertex AI.
We’re seeking a highly skilled SRE/PE with deep expertise in Kubernetes (k8s), cloud networking, and infrastructure automation. This role will focus on reducing incident response time, implementing auto-remediation, optimizing auto-scaling, and improving cluster efficiency and service health. You’ll design systems that balance performance, cost, and reliability while working onsite with our Redwood City team.
Incident Response & Reliability Engineering:
Kubernetes & GPU Cluster Optimization:
Cloud Networking & Service Health:
Monitoring & Observability:
Automation & Infrastructure-as-Code (IaC):
3+ years in SRE/PE/DevOps roles with production-grade Kubernetes experience.
Proficiency in cloud networking (AWS/GCP/Azure VPCs, firewalls, DNS) and service monitoring (Prometheus, Alertmanager, Grafana).
Hands-on experience with incident management and improving system reliability/SLOs.
Strong scripting/coding skills (Python/Go/Bash) for automation and tooling.
Familiarity with object storage (S3, GCS) and data pipeline integration.
Experience with GPU clusters (NVIDIA GPUs, MIG, CUDA) and AI/ML workloads.
Knowledge of auto-scaling technologies (K8s HPA/VPA) and auto-remediation frameworks.
Expertise in service meshes (Istio).
Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.
Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.
Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results.
Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.