Looking to join a remote team of high performers? Eager to be truly supported by AI and guided by purpose?
DevOps Lead
Location: Remote/Hybrid (East Coast US Preferred)
Department: Engineering
Employment Type: Full-Time
Reports To: CTO
About Baryons
Baryons is building the next generation of AI-driven flourishing and conversation platforms — intelligent systems that help people grow, reflect, and thrive through meaningful dialogue. Our products combine voice, memory, and human insight to create interactions that feel natural, personal, and alive.
As a science-backed, patent-pending organization, we’re defining a new category of human-AI connection — one that’s grounded in psychology, neuroscience, and the science of human flourishing.
Role Overview
As DevOps Lead, you’ll architect, implement, and maintain the cloud infrastructure and automation that power our AI-driven applications. You’ll be hands-on with Azure and Google Cloud (GCP) environments, focusing on Kubernetes orchestration, scalability, automation, cost optimization, and security. You’ll guide DevOps best practices, lead our CI/CD initiatives, and ensure our systems are secure, reliable, and built to scale. Experience with LiveKit and real-time agent infrastructure is a strong plus.
Responsibilities
Lead the design, deployment, and management of scalable, secure infrastructure in Azure and Google Cloud (GCP).
Architect and manage Kubernetes clusters, ensuring high availability, disaster recovery, and efficient orchestration of containerized workloads.
Build, configure, and maintain automation for infrastructure provisioning, application deployment, monitoring, and alerting, using tools such as Terraform and Helm.
Implement and refine CI/CD pipelines for all engineering teams, ensuring rapid, safe, and repeatable delivery of code and AI models.
Monitor and optimize cloud infrastructure for cost management and resource utilization.
Develop and maintain comprehensive observability and logging systems (metrics, logs, tracing) to enable real-time monitoring, alerting, and performance optimization.
Implement and enforce cloud security best practices, secrets management, and compliance protocols (SOC 2, HIPAA, or similar, as applicable).
Design and maintain disaster recovery, backup, and high-availability strategies for critical applications and data.
Champion DevOps best practices, automation, and a culture of ownership and operational excellence across the engineering organization.
Collaborate with software engineers, data scientists, and product leads to enable efficient development, deployment, and operation of AI- and voice-powered systems.
Ensure all infrastructure, automation, and deployment processes are well-documented and accessible.
Mentor and support junior DevOps and engineering team members.
Troubleshoot, resolve, and prevent production issues in a fast-paced environment.
Required Qualifications
5+ years of experience in DevOps, Site Reliability Engineering, or related roles.
Deep expertise with Azure and Google Cloud (GCP) environments, including network, security, and storage services.
Proven experience architecting, deploying, and managing Kubernetes clusters in production environments.
Strong automation skills: Terraform (or equivalent IaC tools), Helm, and scripting languages (Python, Bash, etc.).
Demonstrated experience building and maintaining robust CI/CD pipelines (GitHub Actions, GitLab CI, or similar).
Solid understanding of cost optimization strategies for cloud-native applications.
Experience with monitoring, alerting, and observability (Prometheus, Grafana, Datadog, etc.).
Experience implementing security best practices and compliance protocols.
Experience designing and maintaining disaster recovery and high-availability solutions.
Excellent troubleshooting, communication, and collaboration skills.
Nice-to-Have
Experience with LiveKit and real-time agent infrastructure (LiveKit Agents, voice/video, WebRTC, etc.).
Background working with AI/ML model deployment and scaling in production.
Familiarity with additional clouds (AWS, OCI) or hybrid cloud architectures.
Prior experience in a startup or high-growth SaaS environment.