Systems Engineer

VTG Defense
United States, Virginia, Chantilly
14291 Park Meadow Drive
Dec 16, 2025
Overview

We are seeking an experienced Senior Systems Engineer with a US Government Top Secret/SCI security clearance with Polygraph to support a small standalone system dedicated to high-performance computing (HPC) and artificial intelligence (AI) workloads. This role demands a blend of operational expertise and strategic technical vision, focusing on the management and optimization of our standalone HPC/AI system. The ideal candidate will manage the technical operation of our infrastructure, develop standardized procedures for hardware, network, and software management across the system, and oversee cluster management with tools such as NVIDIA BCM, including provisioning, optimization, and monitoring of clustered resources for HPC/AI workloads.

What will you do?

This position requires broad expertise in HPC/AI system administration, with a focus on:
- Refining infrastructure management frameworks
- Traditional infrastructure management (hardware, networking, directory services)
- Modern HPC/AI support (Linux/Ubuntu, Proxmox, NVIDIA BCM, WEKA storage)
- Designing scalable, secure, and highly available system architectures

Do you have what it takes?

- Active TS/SCI with Polygraph required
- Bachelor's degree in Engineering, Computer Science, Software Engineering, or a related field
- 7+ years' experience in systems engineering or a related field

Operating Systems & Infrastructure:
- Expert-level Linux systems engineering
- Windows client operating systems deployment/maintenance
- Linux (Ubuntu) server operating systems deployment/maintenance

Hardware & Networking:
- Server hardware
- Network hardware, wiring, and switching configurations

Virtualization & Containerization:
- Virtualization (ideally Proxmox)
- Containerization (ideally Docker/Podman with Ray or Kubernetes)

Management & Orchestration:
- Directory services and PKI infrastructure deployment/maintenance
- Configuration management (ideally Ansible, Puppet, Chef, or DSC)
- Cluster orchestration (ideally NVIDIA Base Command Manager (BCM))

Development Support & Software Management:
- Development support services (GitLab, Jenkins, Nexus)
- Operating system software repository synchronization (Apt, Snap, Yum)