AI Reliability Engineer (AI SRE)

DeWinter BH · www.dewintergroup.com

Location Campbell, CA
Work type Remote
Salary USD 50 – 175 / hour
Type Contract (12 months)
Level Mid
Source Shazamme
Information Technology · Accepting Candidates

Title: AI Reliability Engineer (AI SRE)
Job Type: Contract
Contract Length: 12 Months
Pay Range: $50/hr – $175/hr
Start Date: ASAP
Location: Remote

About the Opportunity:

Our client, a leader in AI testing and Generative AI solutions, is looking for a skilled AI Reliability Engineer (AI SRE) to join their team for a 12-month engagement. This project involves ensuring the reliability, availability, and performance of mission-critical AI systems by defining SLOs, implementing automated resilience measures, and leading incident response. This is a high-impact role that requires a self-motivated professional who can hit the ground running and deliver results quickly.

Key Responsibilities & Deliverables:

This role is focused on the successful completion of specific tasks and deliverables. Your responsibilities will include:

  • Defining and maintaining Service Level Objectives (SLOs) for AI inference latency and availability.
  • Building automated "circuit breakers" and fallback logic (e.g., switching to a smaller model if the primary fails).
  • Leading incident response and root-cause analysis (RCA) for complex AI system failures.
  • Developing stress-testing and chaos engineering scenarios specifically for AI agent swarms.
  • Optimizing the "cold start" and scaling time for serverless AI functions.

Required Skills & Experience:

We are looking for someone with a proven track record of successful contract engagements. The ideal candidate will have:

  • 4+ years of experience in Site Reliability Engineering (SRE).
  • Deep expertise in system monitoring, incident management, and cloud resilience. This isn't a learning role—you need to be a subject matter expert.
  • Demonstrated ability to work autonomously and manage your own time effectively to meet project goals.
  • Experience with Python/Go, Kubernetes, and observability stacks (Datadog, New Relic).
  • Strong communication skills to provide clear and concise status updates to the project team.

Frequently asked questions

Who is hiring for the AI Reliability Engineer (AI SRE) role?
DeWinter BH, a Shazamme client, is hiring for the AI Reliability Engineer (AI SRE) position. Apply directly on the employer's career site.
Where is the AI Reliability Engineer (AI SRE) job located?
The AI Reliability Engineer (AI SRE) role with DeWinter BH is listed in Campbell, CA, US, and is fully remote.
Is the AI Reliability Engineer (AI SRE) role remote?
Yes, the AI Reliability Engineer (AI SRE) position at DeWinter BH is remote. Candidates based in the US are preferred.
What does the AI Reliability Engineer (AI SRE) role pay?
DeWinter BH lists the AI Reliability Engineer (AI SRE) role at a pay range of USD 50 to USD 175 per hour.
Is the AI Reliability Engineer (AI SRE) role full-time or contract?
This is a 12-month contract position through DeWinter BH.
What experience level is the AI Reliability Engineer (AI SRE) role?
The AI Reliability Engineer (AI SRE) position is aimed at mid-level candidates.
How do I apply for the AI Reliability Engineer (AI SRE) role at DeWinter BH?
Apply directly on DeWinter BH's career page via the Apply button on this listing. ZammeJobs links straight through to the employer's ATS — no third-party form, no resume database.