[Image: Galilean moons around Jupiter]

Cluster guide

Welcome to the jupiter cluster guide!

This is the documentation of our cluster and its resources.

Basic usage

Ask Prof. Giovanni for a user account. Once one is granted, you can log in to the cluster at IP 150.162.31.2, port 2222, as follows:

$ ssh your_user@150.162.31.2 -p 2222

When logging in to the cluster for the first time, please update your password to a strong one.
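
You can do that with the standard passwd command once you are logged in:

$ passwd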

If you don't want to remember the port and the IP every time you log in, you can register this remote location in your ~/.ssh/config file (you can replace jupiter with any name of your choice):

Host jupiter
    HostName 150.162.31.2
    User your_user
    Port 2222

Now you can login with:

$ ssh jupiter

You can also set up a passwordless connection by creating an SSH key and making it available on jupiter.
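
For example, assuming the Host jupiter entry above, one common way is to generate a key pair and copy the public key to the cluster (the ed25519 key type here is just one typical choice):

$ ssh-keygen -t ed25519
$ ssh-copy-id jupiter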

Software

ORCA

Usage:

In your $HOME/bin folder on jupiter there should already be a job script that you can use directly on your .inp files.

Your .inp file should specify both the nprocs and maxcore keywords, e.g. for an h2o_opt.inp file:

! HF def2-SVP Opt            # method, basis set, and job type

%pal
  nprocs 8                   # number of parallel processes
end

%maxcore 2000                # memory per core, in MB

* xyzfile 0 1 h2o.xyz        # charge, multiplicity, geometry file

If you have any doubts about the input, please check the ORCA input library.

To submit this job, run:

$ job h2o.inp 1 8

where 1 is the number of machines you're using and 8 is the number of processors. Or, more generally:

$ job file.inp $nmachines $nprocs

Use the default values and you'll be fine (1 machine and 8 procs).

Checking a job status

$ qstat 

Or if you want information only on your submitted jobs:

$ qstat -u $USER

Where $USER is your cluster username.

These commands just print the queue usage, omitting some information that is useful for debugging failed jobs. To get more information (for instance, the node your job is running on), run:

$ qstat -as

Transferring files

The most common way of transferring files between local and remote locations (and the easiest if you're just starting) is the scp command.

For instance, if you want to send an opt.inp file located at ~/geem/projects to the /home/your_user/projects/ folder on jupiter:

$ scp ~/geem/projects/opt.inp jupiter:/home/your_user/projects/ 

This is shorthand for:

$ scp -P 2222 ~/geem/projects/opt.inp your_user@150.162.31.2:/home/your_user/projects/

Or the other way around, from jupiter to your local folder:

$ scp jupiter:/home/your_user/projects/opt.out ~/geem/projects/

There are other options (for instance, rclone) that can also be used to sync files between your local machine and jupiter; see the sketch below.
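
As a minimal sketch, rsync (another common option) can sync a whole folder over the same SSH config; the paths here are just the ones from the scp example above:

$ rsync -avz ~/geem/projects/ jupiter:/home/your_user/projects/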

If you're using a Windows machine, a good graphical interface that allows you to drag and drop files is MobaXterm.

Vinicius also has a script that syncs and submits jobs from your local machine; if you have any questions, talk to him.

Debugging errors and jobs not running

Sometimes your jobs will fail (I know, it sucks!). If you ever find yourself wondering why that happened, this may help you:

The first thing to do is to check whether your job failed because of some issue related to ORCA. To do that, open the output file in your favorite text editor and look at the last lines. A quick alternative is the tail command, which shows you the last lines of a file.
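
For example, assuming your output file is named h2o_opt.out:

$ tail -n 30 h2o_opt.out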

If some error happened in the SCF procedure (i.e., if the issue is related to ORCA itself), you should see an error message at the very bottom of the output file. This error should at least clarify what went wrong in your calculation.

If the output file doesn't contain any error at the end, then the issue might be hardware related. In this case, try the following:

  • Resubmit the job and check what node it is running on (issue the qstat -as command; once the job starts, you can see information on the node running it), for example:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
5090.ufsc       giliand* ufsc     tetrazol_*   7964   1   8   10gb 120:0 R 00:59
   Job run at Mon Apr 25 at 10:47 on (himalia:ncpus=8:mem=10485760kb)
5119.ufsc       giliand* ufsc     14_hess.i* 306745   1   8   10gb 120:0 R 32:23
   Job run at Tue Apr 26 at 22:13 on (io:ncpus=8:mem=10485760kb)

In this situation, the job 5090.ufsc is running on himalia and the job 5119.ufsc is running on io.

  • Connect through ssh to that node while logged in to jupiter (ssh himalia, for example) and check for temperature issues by running sensors, as in the example below. If you see the temperature of the CPU cores reaching 100 °C, please contact Matheus or Vinicius.
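
For example, assuming the qstat output above showed your job on himalia (sensors is provided by the lm-sensors package):

$ ssh himalia
$ sensors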

Asking for help

If you ever can't understand what the hell is going on, or if you ever identify some kind of issue in the cluster, please open an issue in our cluster repository. This way we can keep track of what is happening and also have an archive of the solutions, in case the problems ever happen again.

A first look into the linux world

You may soon realise that you're now entering a field where we're generally looking at a command line. Don't worry, you'll soon get very used to it and learn to love it. 😄

There are good reasons to use a terminal instead of GUIs (graphical user interfaces), but I will assume that if you're here you already have a slight idea about that.

For now I will only leave this quote here:

I once heard an author say that when you are a child you use a computer by looking at the pictures. When you grow up, you learn to read and write.

Also, here are some articles explaining the basics of using a Unix terminal, if you're new to this world:

  • https://ubuntu.com/tutorials/command-line-for-beginners#4-creating-folders-and-files
  • https://www.digitalocean.com/community/tutorials/an-introduction-to-the-linux-terminal
  • https://maker.pro/linux/tutorial/basic-linux-commands-for-beginners

Other resources

Right now we've only got access to jupiter and Tarsus (there will be a tutorial on using Tarsus too, don't worry), but soon we will have access to CESUP and Santos-Dumont.

  • CESUP
  • Tarsus (Franca-SP)
  • Santos-Dumont

Contributing

This site is open source and made by your dearest lab colleagues. If you feel like there should be more information, tutorials, and guides here, feel free to contribute. 😄