Taurus HPC Cluster
This guide walks you through the steps of connecting to the Taurus High Performance Computing (HPC) Cluster, installing Visual Studio Code on your computer and compiling and running programs in C/C++, OpenMP, CUDA, MPI, MATLAB and Python. Before connecting to the cluster, you will need a username and password. Contact the system administrator, Dr. Vincent Roberge (vincent.roberge(at)rmc.ca), to get an account created for you. If you intend to use MATLAB on the cluster, you will also need an OpenVPN config file which will be supplied by the system administrator.
The Taurus HPC Cluster’s configuration is based on the Linux Containers (LXC) virtualization technology. The host Linux operating systems offer minimal services besides the LXC virtualization, and all compute nodes are implemented in LXC containers. Containers run in unprivileged mode to allow direct access to the GPUs. A preconfigured set of containers has been deployed by the system administrator to allow users to program in C/C++, OpenMP, CUDA, MPI, MATLAB and Python. However, if your research project requires your own container(s) so that you can install your own software and libraries, this is possible. Discuss your requirements with the system administrator.
2. Cluster Specifications
The Taurus HPC Cluster is composed of four compute nodes (taurus1.local.net to taurus4.local.net) connected with a 10 Gbps low-latency converged Ethernet switch. Each node is a Dell Precision server equipped with dual Intel Xeon Silver 4214R CPUs with 12 hyper-threaded cores each, for a total of 24 cores or 48 virtual cores per node. Each node has 128 GB of RAM and an NVIDIA RTX 3080 graphics processing unit (GPU) with 8,960 cores and 10 GB of GDDR6X memory supporting CUDA compute capability 8.6. The nodes are configured with Linux Ubuntu 22.04 and MATLAB Parallel Server R2022b. The Taurus HPC Cluster provides a combined processing power of 1.23 TFLOPS on the CPUs and 119.2 TFLOPS on the GPUs.
3. Connecting to the Taurus HPC Cluster
To connect to the computer cluster, you need to access an SSH gateway:
- Host: tauruscluster.duckdns.org
- Port: 80
Once on the gateway, you can access the first node of the cluster also using SSH:
- Host: taurus1.local.net
- Port: 22
Note that you cannot open an SSH terminal directly on the SSH gateway server. The gateway is configured as an SSH jump host only. You do not need to worry about the details; simply follow the instructions below. These instructions are written for a Windows computer, but can easily be adjusted for Linux or Mac. We leave this as an exercise for the reader. Also, with the exception of Section 11, which discusses running MATLAB code on the cluster, these instructions can be used on a computer on the RMC network. To use MATLAB on the Taurus HPC Cluster, you must use your personal computer at home or on the RMC wireless network.
Important note: In this manual, commands that are to be run on Windows start with ">" and commands that are to be run on the compute cluster start with "$". Do not include the ">" or "$" when typing your commands.
On your Windows computer, start a command prompt (not PowerShell but cmd.exe) and type:
> mkdir %USERPROFILE%\.ssh
> type nul > %USERPROFILE%\.ssh\config
> notepad %USERPROFILE%\.ssh\config
Enter the following text and save the file. Make sure that you replace "username" with your actual username as assigned to you by the system administrator.
Host tauruscluster
    HostName tauruscluster.duckdns.org
    Port 80
    User username
Host taurus1
    HostName taurus1.local.net
    ProxyJump tauruscluster
    User username
From the command prompt, you can now connect to the taurus1 server using the following command. It should prompt you for your password twice and then ask you to change your password.
> ssh taurus1
Once you have successfully logged onto taurus1, it is recommended to change your password using the following command. You must use a complex password (more than 8 characters, with upper-case letters, lower-case letters and digits).
You can now log off.
From your Windows computer, generate an SSH public key that you will upload onto the taurus1 server. This will allow you to log onto the cluster without a password. This will save you quite a bit of time later on when programming in Visual Studio Code.
In the Windows command prompt, generate an RSA key pair and copy the public key to the taurus1 server:
> ssh-keygen
> type %USERPROFILE%\.ssh\id_rsa.pub | ssh taurus1 "cat > id_rsa.pub"
Now, log in to the server again and add the public key to your domain account:
> ssh taurus1
$ dos2unix id_rsa.pub
$ kinit
$ ipa user-mod --sshpubkey="$(cat id_rsa.pub)"
You can now log off and try logging into taurus1 again. It should not ask you for a password. Note that it may take a minute or two for your key to propagate to all the domain hosts. If the system prompts you for a password, do not worry; simply try again later.
> ssh taurus1
If you work from more than one computer (e.g., a computer at RMC and a computer at home) and want to add multiple SSH keys to your account, upload all your keys to your home directory on taurus1 and then use the ipa user-mod command with multiple instances of the --sshpubkey option to add your RSA keys. All keys must be added in a single command.
Create a key on the first computer and upload it to the server. Note that the first key is named id_rsa1.pub when uploaded onto the taurus1 server:
> type %USERPROFILE%\.ssh\id_rsa.pub | ssh taurus1 "cat > id_rsa1.pub"
Create a key on the second computer and upload it to the server. Now the key is named id_rsa2.pub:
> type %USERPROFILE%\.ssh\id_rsa.pub | ssh taurus1 "cat > id_rsa2.pub"
Log in to the server and add your keys to your domain account:
> ssh taurus1
$ dos2unix id_rsa1.pub id_rsa2.pub
$ kinit
$ ipa user-mod --sshpubkey="$(cat id_rsa1.pub)" --sshpubkey="$(cat id_rsa2.pub)"
If needed, you can delete your SSH public keys from your domain account using:
$ kinit
$ ipa user-mod --sshpubkey=
4. Connecting Locally using WiFi
In most cases, users should connect to the Taurus HPC Cluster over the Internet on port 80 as explained in the previous section. This works well for remote programming in C, C++, OpenMP, MPI and Python. However, to program in MATLAB, users need to establish a VPN connection to the cluster’s subnet. The process is explained in section 11.2. This VPN connection can only be established from a personal computer connected to the Internet or the RMC WiFi network. It cannot be done from an RMC lab computer as the port used for the VPN connection is not open on the RMC enterprise network.
Another option is to connect to the Taurus HPC Cluster locally using the Taurus WiFi local network. The networks are called TaurusWifi and TaurusWifi5G. Once connected to the network, you have direct access to the compute nodes. Contact the system administrator to get the network password. You will need to provide your MAC address so that an IP address can be assigned to your computer. A MAC (Media Access Control) address, sometimes referred to as a hardware or physical address, is a unique, 12-character alphanumeric attribute that is used to identify individual electronic devices on a network. An example of a MAC address is:
Physical Address. . . . . . . . . : B0-4C-A3-00-18-3C
You can get the MAC address of your wireless network adapter as follows. (Note: make sure that you record the MAC address of your WiFi adapter and not your wired adapter):
- Windows 11
  - Open the Command Prompt by pressing the Windows Key on your keyboard and typing cmd into the Search bar
  - The wireless MAC address will be listed under Wireless LAN adapter Wi-Fi next to Physical Address
- Mac
  - Open the Apple menu.
  - Click System Preferences > Network
  - Select your network connection and click Advanced
  - You will find the MAC address on the hardware tab.
- Linux
  - Open a terminal and type
5. Shared Folder, Daily Backups and USB Connection
The Taurus HPC Cluster is configured with a shared directory accessible from every taurus node. When you log onto a taurus node, the shared directory is accessible from your home directory at /home/username/shared. It is recommended to use this folder for all your work.
This shared folder is backed up every night for 7 days, every week for 4 weeks and every month for 3 months. If you accidentally delete files, browse the /mnt/backup directory to recover your files. Note that files stored directly in your home directory are local to the node and are not backed up.
Although the Taurus HPC Cluster is only accessible for remote programming, there is one workstation that is physically co-located with the cluster to allow users to connect a USB external drive and transfer data to and from the cluster. This is useful, for example, to upload and download machine learning datasets. The workstation is located in the ECE Tech Shop in s4100. You can log on using your Taurus HPC Cluster user account. The workstation is running Ubuntu Desktop and automatically mounts any FAT32, NTFS or Ext3/4 USB drives connected to it. You can then drag and drop your files to your shared directory on the cluster. The USB interface is 3.0 and allows for a fast transfer rate.
6. Installing Visual Studio Code
Visual Studio Code (VSCode) is a free Integrated Development Environment (IDE) that runs on several operating systems including Windows, Linux and Mac. It is highly configurable, lets you build code with a Makefile and supports remote debugging with little noticeable lag. This makes it well suited to programming on the Taurus cluster. You can download VSCode from the internet and install it on your computer. You can also install it on an RMC computer, as it does not require administrator privileges and the ECE department has received authorization from CIS to use this software.
Once installed, start VSCode. On the left toolbar, click on the Extensions icon and install the following extensions:
- Nsight Visual Studio Code Edition
- C/C++ IntelliSense
- Remote Development
- Makefile Tools
- Shell Debugger
Once these extensions are installed, you can use VSCode to connect to taurus1. Click on the remote connection button on the bottom left corner of the VSCode window and select Connect to Host. Type taurus1.
It may ask you for the operating system of the remote host; select Linux. It may also ask you for your password a few times (this happens if your SSH key has not propagated to the SSH gateway host yet); enter it each time. Once you are connected, click on Menu > Terminal > New Terminal. This gives you a bash terminal on taurus1. You can type pwd to confirm that you are connected as yourself and that you are in your home directory.
Important note: Once connected to a remote server, VSCode will automatically install the VSCode server in your home directory on the remote server. This may take a few seconds. You will also need to reinstall the extensions listed above in VSCode once you are running on the remote server.
7. Running a C/C++ Program
This section covers how to compile, run and debug a C/C++ program on the Taurus HPC Cluster using VSCode. First, use the following command in the VSCode terminal to download the example code in your home directory:
$ cd ~
$ wget --user cluster --password computing https://roberge.segfaults.net/wordpress/files/cluster/prime_cpp.zip
Unzip the start code:
$ unzip prime_cpp.zip
Go to Menu > File > Open Folder and open the prime_cpp directory. Inspect the content of the prime.cpp file. This code computes the number of prime numbers between 2 and n. To compile and run the code, click on the debug icon on the left toolbar to open the debug window and then select Run debug from the drop-down menu in the left window. Then click on the green arrow just left of the drop-down menu. The start code should run successfully. You should see the output of the program in the TERMINAL window.
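To make the task concrete, here is a minimal sketch of a sequential prime counter based on trial division. It is not the code shipped in prime_cpp.zip, whose algorithm may differ; the function names is_prime and count_primes are illustrative assumptions.

```cpp
#include <cstdint>

// Trial division: returns true when `value` is a prime number.
// (Illustrative helper, not taken from the example project.)
bool is_prime(std::uint64_t value) {
    if (value < 2) return false;
    if (value % 2 == 0) return value == 2;
    // Only odd divisors up to sqrt(value) need to be checked.
    for (std::uint64_t d = 3; d <= value / d; d += 2) {
        if (value % d == 0) return false;
    }
    return true;
}

// Counts the prime numbers in the closed range [2, n].
std::uint64_t count_primes(std::uint64_t n) {
    std::uint64_t count = 0;
    for (std::uint64_t candidate = 2; candidate <= n; ++candidate) {
        if (is_prime(candidate)) ++count;
    }
    return count;
}
```

For instance, count_primes(100) evaluates to 25. Each candidate is tested independently of the others, which is what makes this problem a convenient starting point for the parallel versions in the later sections.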
Important note: Using the debug icon on the left toolbar and selecting the run configuration from the drop-down menu in the debug window allows you to make use of the Makefile and the build and launch tasks that have been manually programmed in the tasks.json and launch.json files (more details about these two files are given below). If you use the debug or run buttons on the right-hand side of VSCode, you will be using default task and launch parameters and the project will fail.
This project has been configured to be compiled using a Makefile. Inspect the content of the source file and the Makefile. The Makefile is a bit complex and is very complete. It can be used as a starting point when creating your own C/C++ projects. Now, inspect the content of the .vscode directory which is used by VSCode to configure the project.
The c_cpp_properties.json file is used by VSCode to perform syntax highlighting, to let you jump to function definitions and to provide auto-completion. It is not used during compilation. If c_cpp_properties.json is not configured correctly, you may see errors highlighted in your code even though your code still compiles correctly. These errors are false positives caused by the misconfiguration.
The launch.json file configures the launch options for your program. In this example, there are the “Run debug” and “Run release” launch configurations. For both configurations, you can see the path to the binary being executed. You can also see the arguments used. Note that the launch configurations were configured with a pre-launch task which compiles the program before it is run.
The tasks.json file configures the compilation tasks. Here, it calls the Makefile with the appropriate target.
To debug your program, add breakpoints by clicking to the left of the line number in the .cpp source file and compile and run your program in debug mode. When the program stops at a breakpoint, you can see the content of variables on the left window. To see the content of arrays, add an entry such as the following in the watch window:
This entry would allow you to see the first 5 values of the array input. By the way, there is no array in this C++ example, but knowing this trick will prove invaluable when writing your own code in VSCode. To print the value of a single element of an array, you can type something like the following in the DEBUG CONSOLE tab:
This entry would allow you to see the 6th element of array input. Remember that C/C++ is zero indexed.
Another useful trick: VSCode has an auto-format feature that fixes the indentation of your code. To use it, press:
- On Windows: Shift + Alt + F
- On Mac: Shift + Option + F
- On Ubuntu: Ctrl + Shift + I
Once your code is working in debug mode, compile it and run it in release mode. This will ensure that it works correctly in release mode. However, to measure accurate runtimes, you must run the code without the GDB debugger attached. For this, go to the Terminal tab and call the program directly. In this case, call:
8. Running an OpenMP Program (multicore CPU)
Now that you have run a sequential program, let’s use OpenMP to take advantage of the multicore processors installed in the Taurus HPC Cluster. Download the example code:
$ cd ~
$ wget --user cluster --password computing https://roberge.segfaults.net/wordpress/files/cluster/prime_omp.zip
Unzip the start code:
$ unzip prime_omp.zip
Go to Menu > File > Open Folder and open the prime_omp directory. Inspect the content of the prime.cpp file. This code also computes the number of prime numbers between 2 and n, but now uses multiple threads. You can inspect the Makefile to see the options used to compile the program.
Adjust the number of threads and run the program. Run it outside of the debugger to get the real runtime of the program. As an example, the following command will run the program with 4 to 12 threads with increments of 4.
$ make release
$ release/prime_omp 4 4 12
Inspect the launch.json file in the .vscode directory to see how VSCode can be configured to pass arguments to the program.
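As an illustration of how a prime count can be spread over several threads, the following is a minimal OpenMP sketch. It is an assumption about the general approach, not the code in prime_omp.zip; the names is_prime and count_primes_omp, the dynamic schedule and the chunk size of 1024 are all illustrative choices.

```cpp
#include <cstdint>

// Trial division: returns true when `value` is a prime number.
// (Illustrative helper, not taken from the example project.)
bool is_prime(std::uint64_t value) {
    if (value < 2) return false;
    if (value % 2 == 0) return value == 2;
    for (std::uint64_t d = 3; d <= value / d; d += 2) {
        if (value % d == 0) return false;
    }
    return true;
}

// Counts the primes in [2, n] using up to `threads` OpenMP threads.
// Build with -fopenmp (GCC/Clang). Without that flag the pragma is
// ignored and the loop runs sequentially with the same result.
std::uint64_t count_primes_omp(std::int64_t n, int threads) {
    (void)threads;  // avoids an unused-parameter warning when OpenMP is off
    std::uint64_t count = 0;
    // Each thread accumulates a private count; the reduction clause sums
    // the private counts when the loop ends. Dynamic scheduling helps
    // because large candidates take longer to test than small ones.
    #pragma omp parallel for num_threads(threads) reduction(+ : count) schedule(dynamic, 1024)
    for (std::int64_t candidate = 2; candidate <= n; ++candidate) {
        if (is_prime(static_cast<std::uint64_t>(candidate))) ++count;
    }
    return count;
}
```

Calling count_primes_omp with 4, 8 and 12 threads in turn mirrors the thread sweep requested by the arguments 4 4 12 in the command above. The result is the same for any thread count; only the runtime changes.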
9. Running a CUDA Program (GPU)
The GPU contains a very large number of cores when compared to multicore CPUs, but each core is much simpler in design. GPUs are optimized for massively parallel programs that exploit data-level parallelism. To test the GPU installed on the Taurus HPC Cluster, download the CUDA example program using the following commands:
$ cd ~
$ wget --user cluster --password computing https://roberge.segfaults.net/wordpress/files/cluster/prime_cuda.zip
Unzip the start code:
$ unzip prime_cuda.zip
Go to Menu > File > Open Folder and open the prime_cuda directory. Inspect the content of the prime.cu file, the Makefile and the files in the .vscode directory. When you are ready, compile and run the example program using the launch button. You should note a much higher speedup compared to the OpenMP program.
10. Running an MPI Program (Distributed and Multicore)
For highly complex tasks, it is sometimes necessary to use the computing power of multiple computers connected together in a cluster. This can be achieved using a high-performance multi-process library for distributed systems such as Message Passing Interface (MPI). MPI programs must run from a directory that is shared between all the nodes in the cluster. For this reason, change directory to ~/shared and download the MPI example program there:
$ cd ~/shared
$ wget --user cluster --password computing https://roberge.segfaults.net/wordpress/files/cluster/prime_mpi.zip
Unzip the start code:
$ unzip prime_mpi.zip
Go to Menu > File > Open Folder and open the prime_mpi directory. Inspect the content of the various files in the directory.
Your MPI program will run on 4 nodes (taurus1 to taurus4) and will make use of all the cores on the CPUs. Before you run the program, you must acquire a Kerberos ticket from the domain controller which will allow you to access the other nodes without a password. You must also log in to the other 3 taurus nodes so that a soft link to your shared directory gets created in your home directory. The nodes have been configured so that the soft link gets created the first time you open a Bash shell on the node. To do this, run the following commands on taurus1. It can be done right from the terminal window of VSCode.
$ ssh $USER@taurus2.local.net
$ exit
$ ssh $USER@taurus3.local.net
$ exit
$ ssh $USER@taurus4.local.net
$ exit
This previous step only needs to be done the first time you use the Taurus HPC Cluster. However, at the beginning of every MPI programming session, you will need to enter the command below to reacquire your Kerberos ticket and to remount the shared drive on all taurus nodes:
$ make init
You are now ready to compile and run the MPI example program. Use the launch button and select mpirun release. Mpirun is the application that launches the multiple processes on the cluster computers.
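In a distributed prime count, each process usually works on its own slice of the candidates and the partial counts are combined at the end. The sketch below shows one way to compute the slice for a given process. It is an assumption about how the work could be divided, not a description of the prime_mpi code; rank_range is a hypothetical helper, and in a real MPI program the rank and size values would come from MPI_Comm_rank and MPI_Comm_size.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Hypothetical helper: returns the half-open range [begin, end) of
// candidates assigned to process `rank` out of `size` processes when the
// candidates 2..n are split into contiguous blocks of near-equal length.
// Assumes size >= 1 and 0 <= rank < size.
std::pair<std::uint64_t, std::uint64_t> rank_range(std::uint64_t n, int rank, int size) {
    const std::uint64_t total = (n >= 2) ? n - 1 : 0;  // number of candidates in 2..n
    const std::uint64_t p = static_cast<std::uint64_t>(size);
    const std::uint64_t r = static_cast<std::uint64_t>(rank);
    const std::uint64_t base = total / p;   // every rank gets at least this many
    const std::uint64_t extra = total % p;  // the first `extra` ranks get one more
    const std::uint64_t begin = 2 + r * base + std::min(r, extra);
    const std::uint64_t length = base + (r < extra ? 1 : 0);
    return {begin, begin + length};
}
```

With n = 10 and 4 processes, the ranges are [2, 5), [5, 7), [7, 9) and [9, 11). Contiguous blocks are simple but leave the process holding the largest candidates with the most work; an interleaved assignment is a common alternative when that imbalance matters.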
Debugging an MPI program is a bit tricky because mpirun is the first process to run which in turn starts several instances of your program. If you try debugging your program in the standard way, the debugger will start debugging mpirun and not your application. This will fail catastrophically. To debug your MPI program, compile your program in DEBUG mode using the Makefile. Note that there is a line of code at the beginning of the main() function that is included only in debug mode:
This line of code makes process 0 wait for the debugger to attach to it. Once the program is compiled in debug mode, insert at least one breakpoint in the source code and use the command prompt to launch the program:
$ make run-debug
Process 0 will spin lock until the debugger attaches to it. Now, from VSCode, use the launch button and select attach debug. You will be prompted to select the PID of the process to attach to. Typically, process 0 of your program will be the process named prime_mpi with the lowest PID. Once attached, process 0 will start running automatically and should stop at your first breakpoint. You can now use the debugger normally.
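The wait-for-debugger idea can be sketched as follows. This is not the line used in the example project, which is not reproduced here; it is one Linux-specific way to get the same behaviour, assuming the TracerPid field of /proc/self/status is used to detect an attached debugger. The function names are hypothetical, and in an MPI program the rank argument would come from MPI_Comm_rank.

```cpp
#include <chrono>
#include <fstream>
#include <string>
#include <thread>

// Parses one line of /proc/self/status. Returns the tracer PID when the
// line is the "TracerPid:" entry, or -1 for any other line.
// Assumes the entry is followed by a number, as it is in the real file.
int tracer_pid_from_line(const std::string& line) {
    const std::string key = "TracerPid:";
    if (line.compare(0, key.size(), key) != 0) return -1;
    return std::stoi(line.substr(key.size()));
}

// Returns true when a tracer such as GDB is attached to this process.
// Linux only: a non-zero TracerPid means a debugger is attached.
bool debugger_attached() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        const int pid = tracer_pid_from_line(line);
        if (pid >= 0) return pid != 0;
    }
    return false;
}

// Hypothetical helper: process 0 polls until a debugger attaches, while
// all other processes return immediately.
void wait_for_debugger(int rank) {
    if (rank != 0) return;
    while (!debugger_attached()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
}
```

Polling with a short sleep keeps the waiting process from using a full core while you select the PID in VSCode. In a debug build, a call such as wait_for_debugger(rank) would be placed right after the rank is known and wrapped in a preprocessor guard so that it is compiled out of the release build.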
11. Running a MATLAB Parallel Program (Distributed, Multicore and GPU)
The last exercise of this guide consists of running a MATLAB Parallel Server program on the Taurus HPC Cluster. This allows you to write parallel code that makes use of a very large number of processes and the GPUs present on all the compute nodes.
The compute server is configured with MATLAB R2022b. You must use this exact version on your personal computer. If you do not have this version, you must install it using your RMC email account. This may require you to create a MathWorks account. When installing MATLAB, make sure that you install the Parallel Computing Toolbox.
When connecting a MATLAB client to the MATLAB Parallel Server, several ports are used for the communication. For this reason, before you can run a job on the cluster, you must establish a VPN connection to the Taurus HPC Cluster subnet using the OpenVPN client.
Important note: This section can only be done on your personal computer at home or on the RMC wireless network as the RMC Enterprise Network does not open the destination port used by the VPN connection.
11.2. VPN Connection
The OpenVPN client can be downloaded from the OpenVPN website. Once downloaded and installed, the client will ask you for a profile file. Get the profile file from the system administrator. Each user has a unique profile file that identifies them when they connect to the network, so you cannot share your profile file with other users. Loading the profile file is the only configuration needed to establish the network connection.
To confirm that you are connected to the Taurus HPC Cluster, open a Windows command prompt and type:
> nslookup taurus1.local.net
You should receive a 192.168.90.x address.
11.3. Starting a MATLAB Job Scheduler
The next step is to start a MATLAB Job Scheduler and a group of worker processes. To do this, start the following executable (you can create a shortcut on your desktop) on your Windows computer:
If the <matlab>\toolbox\parallel directory does not exist, the Parallel Computing Toolbox is not installed on your computer; reinstall MATLAB with this toolbox selected.
Click on Add or Find host and add the taurus hosts (compute nodes):
Once added, in the MATLAB Job Scheduler window, click on Start to create a job manager on taurus1.local.net. Name your scheduler with your username. This will allow other users to identify your scheduler as yours. You can use any password as the admin user password. This password is only valid for the duration of the scheduler. You should stop your scheduler when you are done using the compute cluster.
Now that the scheduler is started, in the Workers window, click on Start to create workers. Since each host has 24 cores, it is recommended to create a maximum of 24 workers per host for a total of 96 workers. Creating 96 workers takes time; please be patient. You only need to create them at the beginning of your work day and destroy them at the end of the day or when you are done working on your project.
Once your workers are started, your Admin Center window should look like this:
You can now minimize Admin Center and start the normal MATLAB program.
11.4. Connecting to MATLAB Workers
In MATLAB, on the Home tab, go to Parallel > Create and Manage Clusters. Click on the Add Cluster Profile icon and select MATLAB Job Scheduler. This should create a profile named “MJSProfile1”. Right-click on the profile name and rename the profile to “TaurusCluster”. With “TaurusCluster” selected, click on the Edit icon and configure your cluster as follows:
- Description: Taurus Cluster
- Host name: taurus1.local.net
- Matlab Job Scheduler: vincent (this should be your username if you followed the instructions)
- Username for Matlab job scheduler access: vincent (this is your username)
Leave the rest as default.
Back to the MATLAB main window, on the Home tab, select Parallel > Select Parallel Environment > TaurusCluster.
11.5. Running Parallel Code
Move to a suitable working directory and run the following MATLAB command in the MATLAB command window to download the example code:
Open the prime_matlab.m file and run the code. MATLAB will prompt you for your password; enter your Taurus HPC Cluster password (the one used to SSH onto the Taurus HPC Cluster).
11.6. Terminating a MATLAB Session
Once you are done using MATLAB on the Taurus HPC Cluster, go to Admin Center and destroy all your workers AND THEN destroy the job scheduler. The workers must be destroyed before the scheduler. The order is very important. To destroy your workers, simply right-click on your workers and select Destroy. You can use Ctrl+A to select all your workers at once. Do the same to destroy your scheduler.
Important note: Forgetting to destroy your scheduler and workers will create unnecessary load on the cluster and will clutter the Admin Center console of other users. Make sure that you destroy your workers and your scheduler in the correct order. The workers must be destroyed first and the scheduler must be destroyed last.
Thank you for using the Taurus HPC Cluster. If you have comments or requests, do not hesitate to contact the system administrator. If you would like to have your own Linux container (LXC) on the cluster so that you can install personalized packages (e.g., Python with different machine learning or artificial intelligence libraries), this is possible. Discuss it with the system administrator.