HPC Cluster Hosting

DCC is, together with DTU Physics, DTU CBS and DTU Risø, one of the four HPC operations centers at DTU and thus provides hosting services to DTU HPC cluster owners free of charge. We currently host 7 HPC clusters, including the central DTU HPC cluster, which is generally available to all staff and students at DTU. The list of currently hosted clusters is provided here.

The DCC hosting service entails assistance for:

  • Choice of architecture and configuration of HPC clusters (compute node architecture, make and model, interconnect and storage)
  • Purchase and tendering of HPC hardware
  • Server room rack space for HPC clusters (power, cooling, network and interconnect infrastructure) with fast connections to other HPC resources for resource pooling, I/O-optimized storage as well as central DTU storage
  • Installation, configuration and operation of the HPC cluster
  • Integration with central DTU infrastructure for authentication and authorization (using existing DTU accounts)
  • End-user support, including the possibility of local mailing lists for the cluster

The terms for hosting HPC clusters in DCC are as follows:

  • The hosting service is, like all other DCC services, free of charge to DTU researchers and students
  • The cluster owner (normally the holder of the grant used to purchase the cluster) must appoint a primary technical contact person for the cluster. The technical contact person is the main point of contact to DCC regarding changes to the cluster's system or application software stack. Furthermore, the cluster owner or the technical contact person is responsible for approving user access to the cluster.
  • Root access to the cluster is not granted.
  • The hosted hardware must be covered by a service and support agreement paid for by the cluster owner. If the HPC hardware is older than 5 years (counting from the invoice date), DCC cannot guarantee floor space in the server rooms and reserves the right to decommission the cluster.
  • The cluster owner is free to choose which architecture and manufacturer to purchase. However, DCC would like to decide the hardware brand in order to optimize daily operations and maintenance.
  • Compute nodes are required to have QDR/FDR InfiniBand as the interconnect.

The DCC system architecture is designed on the basis of the following parameters:

  • Cluster owners should be granted a high degree of flexibility and control over their clusters – i.e. choice of architecture and type of HPC cluster
  • Cluster owners (and the users they appoint) should always have full logical control, i.e. always have highest priority for access to their resources and control over how they are assigned and used
  • Provide the possibility of resource pooling while still meeting the above requirement
  • Ease of use and ease of moving computations to other HPC installations (primarily internal DTU installations, but also external ones)
  • Based on central DTU infrastructures for authentication and authorization – i.e. central DTU accounts for login and authorization control via central DTU infrastructures
  • Data should be available on all nodes and, if possible, on other DTU HPC installations as well
  • Agnostic toward choice of cluster hardware
  • Software stack governance model
  • Coherence between interactive and batch use of HPC systems
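The combination of owner priority and resource pooling described above is commonly realized in batch schedulers through partition-based preemption. As an illustrative sketch only – the document does not state which scheduler DCC uses, and all partition, node and group names below are hypothetical – a Slurm configuration could express "owners always have highest priority on their own nodes, idle capacity is pooled for others" like this:

```ini
# Hypothetical slurm.conf fragment – names and values are illustrative only.
# Jobs in a lower-priority partition are requeued when jobs arrive
# in a higher-priority partition sharing the same nodes.
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

# Owner partition: the cluster owner's group always has top priority
# on the nodes it purchased.
PartitionName=owner  Nodes=node[001-032] AllowGroups=ownergroup PriorityTier=10

# Pooled ("scavenger") partition: the same nodes are available to all
# users, but their jobs are preempted if owner jobs need the nodes.
PartitionName=pooled Nodes=node[001-032] AllowGroups=ALL        PriorityTier=1
```

With such a setup, pooled jobs only run on capacity the owner is not using, so the owner's logical control is preserved while idle resources are still shared.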