SR HIGH PERFORMANCE COMPUTING SYSTEMS ENGINEER

H. Lee Moffitt Cancer Center

Tampa, FL

Job posting number: #7230494 (Ref:hlj_52659)

Posted: March 21, 2024

Application Deadline: Open Until Filled

Job Description

Position Highlights:

  • The Sr HPC Systems Engineer designs, develops, evaluates, and modifies software packages that solve scientific and engineering problems and support research and development at Moffitt Cancer Center. The Sr HPC Systems Engineer analyzes existing systems and formulates logic for new systems, devises logical procedures, prepares flow charts, and codes, tests, and debugs programs. The individual provides input for the documentation of new or existing programs and determines system specifications, input/output processes, and working parameters for hardware/software compatibility. The Sr HPC Systems Engineer also contributes to decisions on policies, procedures, expansion strategies, and product evaluations for the HPC resources. This position focuses primarily on the High Performance Computing (HPC) Cluster system. While the Director of Scientific Computing, the HPC steering committee, and the Cancer Informatics Core Scientific Director and Manager provide general project direction, the individual must exercise their own judgment in daily implementation and maintenance. The Sr HPC Systems Engineer applies technical expertise within a team of pure and applied scientists and software engineers to consult for and support the scientific researchers who use HPC resources at Moffitt Cancer Center, engaging with principal investigators and their labs, core facilities, and individual researchers. This position creates and optimizes computational solutions for the specific scientific computing needs of each constituency, ensuring that the appropriate technology resources are identified and used optimally. This position also advises on applicable software packages and algorithms and helps optimize them for scalability and, where needed, massive parallelization.
The Sr HPC Systems Engineer will also manage operational and automation tasks for the HPC Cluster system, including scheduling software, package management, and version control, and will assist the Director of Scientific Computing in reconfiguring HPC management systems to best suit researchers’ needs.

Responsibilities:

  • Configures, debugs and ensures stable operation of HPC cluster tools such as Bright Cluster Manager, OpenHPC and Warewulf.
  • Configures, optimizes and ensures stable operation of software such as MATLAB and open-source equivalents such as Octave and Scilab. In addition, ensures that any module or toolbox added to this software works properly and is properly licensed.
  • Configures, optimizes and ensures stable operation and availability of Message Passing Interface (MPI) libraries and utilities, such as Open MPI.
  • Configures, optimizes and ensures stable operation and availability of the R statistical package and the additional modules needed by HPC cluster users.
  • Configures, optimizes and ensures stable operation of GPU-related software and libraries, such as CUDA.
  • Configures, optimizes and ensures stable operation of specific software needed for mathematical oncology, bioinformatics, biostatistics and any other research groups that use the HPC cluster.
  • Creates scripts that automate the deployment of commercial off-the-shelf (COTS) and custom applications.
  • Estimates time and effort involved in realizing new cluster capabilities for enterprise-level resource allocation, project planning and forecasting purposes.
  • Automates routine tasks such as system deployments, database backups and open-source software provisioning.
  • Collaborates with software developers to build, maintain, test and deploy user-friendly web-based interfaces that simplify scientists' views of their data and workflows.
  • Educates the cluster user community on the optimal use of the cluster's computational resources via one-on-one collaboration, workshops and preparation of relevant documentation and tutorials.
  • Monitors and reports on system usage and load.
  • Investigates and implements new technologies within an HPC environment.
  • Participates in the cluster governance process.
  • Provides cluster training and education.
  • Performs other duties as assigned.

Credentials and Experience:

  • Bachelor’s degree in Computer Science, Information Technology, Management Information Systems, Biology/Chemistry/Physics or a related field
  • Minimum of seven (7) years' experience designing, developing and successfully administering or supporting Unix-based systems

  • Well-versed in cluster development methodologies, in particular open-source operating systems, tools, languages and frameworks for cluster environments.

Proficient in one or more of the following languages:

- Python

- Shell scripting (e.g., Bash, csh, ksh)

- C/C++

Proficient in one or more of the following tools:

- Git

- Trac

- Modules

- Fabric/Puppet or other code deployment tools

- Docker

- Singularity/Apptainer

*A high school diploma plus an additional four (4) years of relevant experience designing, developing and successfully administering or supporting Unix-based systems (for a total of eleven (11) years’ relevant experience) may be considered in lieu of a Bachelor's degree.

Preferred Experience

Familiarity with one or more of the following packages:

- R

- MATLAB (or similar open-source tools such as Octave and Scilab)

Also preferred:

- Familiarity with Slurm HPC scheduling

- Working knowledge of computer hardware, networking concepts and tools

- Working knowledge of protocols such as SSH, DNS, DHCP, and LDAP.



Mission: To create a Moffitt culture of diversity, equity, and inclusion as we strive to contribute to the prevention and cure of cancer.

Vision: To advance and accelerate a culture of access, equity, and inclusion.

Diversity is a priority at Moffitt and is meant "to promote a culture of diversity and inclusion as we contribute to the prevention and cure of cancer." The Enterprise Equity Department focuses its efforts on eliminating obstacles to an individual’s ability to exist within their personal comfort zone at the cancer center. Everyone is important to meeting this priority. Addressing and responding to diversity and inclusion fosters an environment where mutual respect for diverse cultures, communication styles, languages, customs, beliefs, values, traditions, experiences and other ways in which we identify ourselves is the expectation.


Please mention to the employer that you saw this ad on Sciencejobs.org