Infrastructure Engineer
Wraithwatch
Location: United States
Employment Type: Full time
Location Type: Remote
Department: Engineering
About Wraithwatch
Wraithwatch was founded by alumni from SpaceX, Palantir, and Anduril to build the next generation of AI-powered cyber defense systems for the United States and its allies. We are deployed today to customers spanning the federal government, aerospace, defense, manufacturing, and emerging technology. Our core product is a continuously adaptive cyber defense platform. It uses generative AI agents to autonomously model and construct a digital twin of an organization's entire IT and cybersecurity environment, then analyzes that twin for weaknesses, misconfigurations, and possible attack chains.
We currently operate numerous live federal deployments with more coming online this year. Each one is a distinct environment with its own infrastructure, compliance requirements, and operational constraints. We need someone to make all of them run reliably at the same time.
The Role
You are responsible for the infrastructure that keeps Wraithwatch running in production across our federal and commercial deployments.
You will build and maintain the CI/CD pipelines, deployment automation, monitoring, and operational tooling that let a small team ship software reliably across all of these environments without things falling apart. When something breaks at 2am in a production federal environment, you're the person who gets it back up. When we need to roll a release across 20 deployments in a week, you're the person who makes that possible without manual SSH sessions into each one.
This is not a role where you're managing infrastructure for a single product in a single cloud account. You're building the deployment and operations backbone for a company that is scaling fast across highly constrained federal and Fortune 500 environments, and every shortcut you take today becomes a fire you fight next month.
Responsibilities
Build and maintain deployment automation and CI/CD pipelines that reliably ship software across 20+ federal environments in parallel.
Own the operational health of all production deployments — monitoring, alerting, incident response, and post-mortems.
Design infrastructure that accounts for the reality of federal environments: air-gapped networks, strict compliance requirements, variable network topologies, and limited access windows.
Build tooling that gives the engineering team visibility into the state of every deployment without requiring manual inspection.
Automate everything that can be automated. We cannot scale to hundreds of environments with manual processes.
Work with the product engineering team to ensure new features are deployable across all environments without environment-specific hacks.
Manage and improve cloud infrastructure (AWS, GCP) and on-prem deployments depending on customer requirements.
Participate in on-call rotations for production environments.
Qualifications
Basic:
4+ years of experience in infrastructure engineering, SRE, DevOps, or platform engineering
Strong experience with infrastructure-as-code (Terraform, Pulumi, or similar) and configuration management (Ansible, Salt, or similar)
Deep Linux systems knowledge — you should be comfortable debugging at the OS level, not just the application level
Experience building and maintaining CI/CD pipelines for complex multi-environment deployments
Proficiency with containerization and orchestration (Docker, Kubernetes)
Experience with at least one major cloud provider (AWS preferred)
Comfortable working in highly constrained or regulated environments where you can't just spin up whatever you want
Willingness to work extended hours when production issues demand it — we are a seed-stage startup with federal customers who do not tolerate downtime
Preferred:
Experience deploying software into federal, DoD, or IC environments
Familiarity with FedRAMP, STIG compliance, or ATO processes
Experience with air-gapped or disconnected network deployments
Ability to obtain and maintain a security clearance
Experience operating infrastructure at scale across many isolated environments simultaneously
Familiarity with monitoring and observability stacks (Prometheus, Grafana, Datadog, ELK, or similar)