National Aerospace Science and Technology Park (NASTP)
HPC Cluster Admin
Posted date 10th January, 2024
Last date to apply 20th January, 2024
Category Engineering
Status Closed

 

Job Title: HPC Cluster Admin
Job Category: Management
Department/Group: Project GreenAI
Proposed Deployment: IT Division/Datacenter
Location: ISB / RWP
Travel Required: None
Level/Salary Range: PPS-06
Position Type: Contractual
HR Contact:
Date:

Job Description

QUALIFICATIONS AND EDUCATION REQUIREMENTS

  • BS/BE/Diploma (Networks, IT Infrastructure) with a minimum of 1 year (preferably 4 years) of relevant experience in datacenter and network operations and management.

ROLE AND RESPONSIBILITIES

  • HPC Infrastructure Maintenance: Manage the day-to-day system administration of Linux-based cluster computing and storage environments, associated network infrastructure and cloud services.
  • Perform system backups, and maintain system configuration files and recovery procedures.
  • User Support: Collaborate with colleagues and team members to understand their computing needs, provide technical assistance, and troubleshoot issues related to system performance and job execution. Provide user consultation and training.
  • Performance monitoring: Monitor system performance, diagnose bottlenecks, and take necessary actions to improve system performance.
  • Documentation: Maintain detailed documentation of system configurations, procedures, and troubleshooting guides to facilitate knowledge sharing and team collaboration. Develop user-facing documentation.
  • Planning: Meet regularly with stakeholders to understand existing challenges, anticipated needs, and opportunities for closer collaboration.
  • Develop a roadmap for system improvements and lifecycle management, and make recommendations to management.
  • Maintain security standards according to internal policies. Execute the day-to-day activities of the Incident Management process. Manage and support user requests in accordance with the SLA.

PREVIOUS EXPERIENCE

This position requires in-depth knowledge of, and extensive hands-on experience with:

  • Linux cluster system administration
  • SLURM configuration and management
  • ZFS management and configuration
  • Scripting for system management and task automation
  • Networking technologies (e.g., switch configuration, VLAN tagging)
  • Installing and repairing servers and associated cluster hardware
  • Complex technical problem solving and troubleshooting
  • Experience with stateless node management and provisioning (Warewulf) is a plus.

ADDITIONAL NOTES

  • The hired employee will undergo a 3-month probation period.
  • The final appointment will be confirmed after completion of the probation period.
  • Members of the Pakistani diaspora are highly encouraged to apply.
  • Incomplete applications and applications with fake/false documents will be rejected at any stage during or after the recruitment process.
  • Only shortlisted candidates will be invited for the interview and written test.
  • No TA/DA will be admissible for the test/interview.
  • The Organization reserves the right to withdraw/cancel the vacancies at any stage without assigning any reason.

 
