Starting a pipeline

Note

If you are running the deprecated genericpipeline version of the pipeline (prefactor 3.2 or older), please check the old instructions page.

Once you have the data and the input JSON file ready, you can run the pipeline, e.g., with cwltool or toil for the HBA calibrator pipeline:

$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl LINC.json

$ toil-cwl-runner <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl LINC.json

where LINC.json is the input JSON file as described in the chapter Configuring LINC and <install_dir> is the location of the LINC CWL description files.

Note

Instead of specifying all options in LINC.json, you can also use command-line options to override the defaults.

By default, LINC will execute the processing steps (like DP3, etc.) inside a Docker container. If you prefer to use Singularity instead, add the option --singularity to the cwltool command line (see options below).

Note

Do not run your cwltool or toil calls inside the Docker or Singularity container unless this is exactly what you intend to do (see next section).

The following list provides the workflows to call in the command above for standard LOFAR observations. These provide the proper pipeline with pre-defined parameters (defaults) for HBA and LBA observations:

LINC workflow         HBA                   LBA
LINC_calibrator.cwl   HBA_calibrator.cwl    LBA_calibrator.cwl
LINC_target.cwl       HBA_target.cwl        LBA_target.cwl

Note

The LBA target workflow is still experimental and thus may not provide the expected results.

If you have installed cwltool or toil locally on your system, LINC will automatically pull the right (u)Docker/Singularity image for you.

Running LINC from within a (u)Docker/Singularity image

If you do not want to install cwltool or toil locally on your system, you need to pull the software image first (this is not necessary for a local installation).

For Docker:

$ docker pull astronrd/linc

For uDocker:

$ udocker pull astronrd/linc

For Singularity:

$ singularity pull docker://astronrd/linc

To run LINC you only need to add the container-specific execution command and make sure that all necessary volumes (<mount_points>) are mounted read-write and are thus accessible from inside the container, e.g.:

$ singularity exec --bind <mount_points>:<mount_points> <linc.sif> cwltool --no-container <cwl_options> /usr/local/share/linc/workflows/HBA_calibrator.cwl LINC.json

where <linc.sif> is the location of the Singularity image, or

$ docker run --rm <docker_options> -v <mount_points>:<mount_points> -w $PWD astronrd/linc cwltool --no-container <cwl_options> /usr/local/share/linc/workflows/HBA_calibrator.cwl LINC.json

Since you are running LINC inside a container, do not forget to add the --no-container flag to your call, no matter whether you use Singularity or Docker. Do not use the --singularity flag.
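If several directories have to be visible inside the container, the bind specification can be assembled from a list. The following is a minimal sketch, assuming that every directory is mounted at the same path inside the container (as in the commands above) and relying on Singularity accepting a comma-separated list of src:dest pairs for --bind. The function name bind_spec and the paths /data and /scratch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build a comma-separated "dir:dir" list for `singularity exec --bind`.
# bind_spec is a hypothetical helper, not part of LINC or Singularity.
bind_spec() {
    local spec="" dir
    for dir in "$@"; do
        # Append "dir:dir", inserting a comma before every entry but the first.
        spec="${spec:+$spec,}$dir:$dir"
    done
    echo "$spec"
}

# Example with two hypothetical mount points:
echo "--bind $(bind_spec /data /scratch)"
# prints: --bind /data:/data,/scratch:/scratch
```

The printed option can be pasted in place of --bind <mount_points>:<mount_points> in the singularity exec call above.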

Pipeline options for cwltool

The following <cwl_options> are recommended for running LINC with cwltool. Please check carefully which options to choose depending on how you run LINC:

  • --outdir: specifies the location of the final pipeline output directory (results)

  • --tmpdir-prefix: specifies the location of the intermediate data products (should provide enough fast disk space, avoid using /tmp)

  • --log-dir: specifies the location of the intermediate logfiles captured from stdout or stderr

  • --leave-tmpdir: do not delete intermediate data products (use this if you need debugging)

  • --parallel: jobs will run in parallel (highly recommended to achieve decent processing speed)

  • --singularity: use Singularity instead of Docker (necessary if you want to use Singularity)

  • --user-space-docker-cmd udocker: use uDocker instead of Docker (necessary if you want to use uDocker)

  • --no-container: don’t use Docker container (only for manual installation and in case you are running from within a Docker/Singularity image)

  • --preserve-entire-environment: use system environment variables (only for manual installation and in case you are running from within a Docker/Singularity image)

  • --debug: more verbose output (use only for debugging the pipeline)

While the pipeline runs, the terminal will show the current state of the pipeline. For debugging it is recommended to run cwltool inside a screen session or to redirect the output into a runtime logfile:

$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl LINC.json > logfile 2>&1

A fairly typical run that uses Singularity can look similar to this:

$ cwltool \
  --singularity \
  --parallel \
  --outdir "/data/myproject/Linc-L628614" \
  --log-dir "/data/myproject/Log-L628614" \
  --tmpdir-prefix "/data/myproject/Tmp-L628614/" \
  ~/.local/share/linc/workflows/HBA_target.cwl \
  linc-L628614.json

All temporary folders and files are generated in the location specified with --tmpdir-prefix. At the end of the run those files can be deleted. Specifying --parallel will allocate one job per available computing thread. For I/O-intensive and computationally demanding steps the number of parallel jobs is reduced. If your system is not powerful enough to run many tasks in parallel, do not use this option, or switch to toil and use its --maxCores option.

Note

cwltool has no option to resume a failed/crashed run. If you need this option, have a look at toil.

Pipeline options for toil

The following <cwl_options> are recommended for running LINC with toil:

  • --workDir: specifies the location of toil-specific intermediate data products. The directory needs to exist

  • --bypass-file-store: do not use toil’s file store. This option is always required to run LINC properly!

  • --tmpdir-prefix: specifies the location of the intermediate data products. The directory needs to exist (should provide enough fast disk space, avoid using /tmp)

  • --log-dir: specifies the location of the intermediate logfiles captured from stdout or stderr

  • --outdir: specifies the location of the final data products

  • --leave-tmpdir: do not delete intermediate data products (use this if you need debugging)

  • --jobStore: location of the jobStore (“statefile”)

  • --logFile: location of the main pipeline logfile

  • --batchSystem: use specific batch system of an HPC cluster or similar, e.g. slurm or single_machine

  • --maxCores: maximum number of cores to be considered for allocating jobs

  • --singularity: use Singularity instead of Docker (necessary if you want to use Singularity)

  • --user-space-docker-cmd udocker: use uDocker instead of Docker (necessary if you want to use uDocker)

  • --preserve-entire-environment: use system environment variables (only for manual installation and in case you are running from within a Docker/Singularity image)

  • --no-container: don’t use Docker container (only for manual installation and in case you are running from within a Docker/Singularity image)

  • --restart: if specified, the pipeline will attempt to restart the existing workflow as saved in the jobStore
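Since some of the directories listed above need to exist before toil starts, it can be convenient to create them in one step. The following is a minimal sketch; the function name prepare_toil_dirs is hypothetical, and the Work-/Tmp-/Log-<id> naming simply mirrors the example run shown further down.

```shell
#!/usr/bin/env bash
# Sketch: create the directories that should exist before toil-cwl-runner starts.
# prepare_toil_dirs is a hypothetical helper, not part of LINC or toil.
prepare_toil_dirs() {
    local base="$1"    # project directory, e.g. /data/myproject
    local obsid="$2"   # observation ID, e.g. L628614
    # The jobStore is deliberately NOT created here. It is left to toil,
    # which is assumed to reject a fresh start on a pre-existing jobStore.
    mkdir -p "$base/Work-$obsid" "$base/Tmp-$obsid" "$base/Log-$obsid"
}

# Usage (paths are examples only):
#   prepare_toil_dirs /data/myproject L628614
```

Creating the log directory is not stated as a requirement above; it is included here only because doing so is harmless.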

A fairly typical run that uses Singularity can look similar to this:

$ toil-cwl-runner \
  --workDir "/data/myproject/Work-L628614" \
  --jobStore "/data/myproject/Work-L628614/JobStore" \
  --logFile "/data/myproject/Linc-L628614.log" \
  --batchSystem single_machine \
  --bypass-file-store \
  --singularity \
  --outdir "/data/myproject/Linc-L628614" \
  --log-dir "/data/myproject/Log-L628614" \
  --tmpdir-prefix "/data/myproject/Tmp-L628614/" \
  ~/.local/share/linc/workflows/HBA_target.cwl \
  linc-L628614.json

If you need to reduce the number of jobs run in parallel, use --maxCores to lower the number of computing threads considered for running the pipeline to a value below the number of available computing threads (as reported by nproc).

The following parameters may help to further control toil:
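As an illustration, a value for --maxCores can be derived from the output of nproc. This is only a sketch: using half of the available threads is an arbitrary choice, not a recommendation from LINC, and the function name half_cores is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: derive a --maxCores value that leaves some computing threads free.
# half_cores is a hypothetical helper; the factor 1/2 is an arbitrary assumption.
half_cores() {
    local total="$1"
    local half=$(( total / 2 ))
    # Never go below one core.
    if [ "$half" -lt 1 ]; then
        half=1
    fi
    echo "$half"
}

# Only query nproc if it is available on this system.
if command -v nproc >/dev/null 2>&1; then
    MAX_CORES="$(half_cores "$(nproc)")"
    echo "--maxCores ${MAX_CORES}"
fi
```

The printed option can then be added to the toil-cwl-runner call shown above.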

  • --writeLogsFromAllJobs: enable saving toil pipeline job logfiles

  • --writeLogs: location of the pipeline job logfiles

  • --logLevel: can be CRITICAL, ERROR, WARNING, INFO or DEBUG

  • --clean: determines the deletion of the jobStore upon completion of the program. Can be ‘always’, ‘onError’, ‘never’ or ‘onSuccess’

  • --stats: creates runtime statistics

  • --retryCount: number of retries for each failed pipeline job

For further information on how to use toil the user may want to read the toil documentation.

Stopping and restarting the pipeline

You can stop a pipeline run at any time by terminating the CWL process (typically by pressing CTRL-C in the terminal where you started it). If you use cwltool, the pipeline cannot be resumed from the stage where it was terminated. You will have to restart the full pipeline.

If you use toil, you can restart a pipeline by running the same command with which you started it and adding the parameter --restart. If you want to start from scratch, you should delete the directory created via --jobStore and all intermediate data products (usually specified via the --workDir parameter). After that you will start with a clean run.

Note

toil’s --restart option is only useful if a temporary error has occurred, since it will always reuse the workflow and the data it originally started with. If those need to be changed (software update, corrupted or missing data), you will have to delete the jobStore and start a fresh run.
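In a wrapper script, the decision whether to pass --restart can be tied to the presence of the jobStore directory. The following is a minimal sketch under the assumption that an existing jobStore always belongs to an interrupted run that should be resumed; the function name restart_flag is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print "--restart" if the given jobStore directory exists, else nothing.
# restart_flag is a hypothetical helper, not part of toil.
restart_flag() {
    local jobstore="$1"
    if [ -d "$jobstore" ]; then
        echo "--restart"
    fi
}

# Example path taken from the toil run shown earlier in this chapter.
JOBSTORE="/data/myproject/Work-L628614/JobStore"
echo "extra toil option: '$(restart_flag "$JOBSTORE")'"
```

If the workflow or the input data have changed, this check is not sufficient: the jobStore has to be deleted first, as described in the note above.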

Running LINC with a cached Singularity image

Typical reasons for running LINC with a cached Singularity image:

  • when running into a Docker pull limit.

  • when compute nodes do not have an outbound connection to the internet or any similar firewall restrictions.

In such cases you need to set a dedicated CWL environment variable so that the cached image is used:

$ export CWL_SINGULARITY_CACHE=<cachedir>
$ export SINGULARITY_TMPDIR=<cachedir>/tmp

where <cachedir> is your chosen location of the Singularity image. When running LINC, <cachedir> should be visible from all compute nodes or machines you want to use. Once set, simply restart your pipeline with --singularity; the image should then be put into the right directory and be used for all steps. Setting $SINGULARITY_TMPDIR is optional and avoids temporary mounting of the images in your /tmp directory. This is particularly helpful if your /tmp directory does not have much disk space. If you need to update your LINC Singularity image, simply remove the astronrd_linc.sif file in your <cachedir> and clean the Singularity cache:

$ singularity cache clean
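The two variables can be set together with the directories they point to. The following is a minimal sketch that only wraps the export lines shown above; the function name setup_singularity_cache is hypothetical, and creating the directories beforehand is an assumption made for convenience rather than a documented requirement.

```shell
#!/usr/bin/env bash
# Sketch: create the cache directories and export the variables described above.
# setup_singularity_cache is a hypothetical helper, not part of LINC or cwltool.
setup_singularity_cache() {
    local cachedir="$1"
    # Create <cachedir> and <cachedir>/tmp in one call.
    mkdir -p "$cachedir/tmp" || return 1
    export CWL_SINGULARITY_CACHE="$cachedir"
    export SINGULARITY_TMPDIR="$cachedir/tmp"
}

# Usage: source this file in the shell that launches the pipeline, then call
#   setup_singularity_cache <cachedir>
```

The function has to be called in the same shell that later starts cwltool or toil, since exported variables do not propagate back from a child process.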

Troubleshooting

With cwltool a pipeline crash is reported via this message:

WARNING Final process status is permanentFail

If you encounter such a permanent fail, it is highly recommended to pipe the output of the pipeline run into a logfile, to add --leave-tmpdir, and to specify --log-dir in the <cwl_options>:

$ cwltool --leave-tmpdir <cwl_options> --log-dir <logdir> <install_dir>/workflows/HBA_calibrator.cwl LINC.json > logfile 2>&1

In order to figure out at which step the pipeline failed, you can search for the term permanentFail in the toil or cwltool logfile:

$ grep "permanentFail" logfile

WARNING [job find_skymodel_cal] completed permanentFail
WARNING [step find_skymodel_cal] completed permanentFail
INFO [workflow prep] completed permanentFail
WARNING [step prep] completed permanentFail
INFO [workflow linc] completed permanentFail
WARNING [step linc] completed permanentFail
INFO [workflow ] completed permanentFail
WARNING [job check_ateam_separation] completed permanentFail
WARNING [step check_ateam_separation] completed permanentFail
WARNING Final process status is permanentFail

With that information it is possible to identify find_skymodel_cal as the first failed job/step. To find the part of the logfile where the step was launched, search for [job find_skymodel_cal]. The corresponding logfiles of this job/step can be found in the <tmpdir> (specified with --tmpdir-prefix) or <logdir> (if you have specified --log-dir):

$ find <tmpdir> | grep find_skymodel_cal

<tmpdir>/n6zgif6j/find_skymodel_cal.log
<tmpdir>/n6zgif6j/find_skymodel_cal_err.log

$ cat <tmpdir>/n6zgif6j/find_skymodel_cal.log <tmpdir>/n6zgif6j/find_skymodel_cal_err.log

Traceback (most recent call last):
File "find_sky.py", line 27, in <module>
    output = find_skymodel(mss, skymodels, max_separation_arcmin=max_separation_arcmin)
File "/usr/local/bin/find_skymodel_cal.py", line 130, in main
    ra, dec = grab_pointing(input2strlist_nomapfile(ms_input)[0])
File "/usr/local/bin/find_skymodel_cal.py", line 26, in grab_pointing
    [ra, dec] = pt.table(MS+'::FIELD', readonly=True, ack=False).getcol('PHASE_DIR')[0][0] * 180 / math.pi
File "/usr/lib/python3/dist-packages/casacore/tables/table.py", line 372, in __init__
    Table.__init__(self, tabname, lockopt, opt)
RuntimeError: Table /var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS::FIELD does not exist

In this example the data set /var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS seems to be missing. Since this example uses Docker, that path refers to a location inside the container, so we need to find the corresponding location on the hard disk of the host:

$ grep "/var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS" logfile

--mount=type=bind,source=/data/L667521_SB000_uv.MS,target=/var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS,readonly \

$ ls -d /data/L667521_SB000_uv.MS

ls: cannot access '/data/L667521_SB000_uv.MS': No such file or directory

So we have evidently specified a non-existing data set as an input in LINC.json.

In toil the main logfile is written to --logFile and logfiles from single jobs/steps are put into --writeLogs. If a job has failed, the corresponding logfile location is reported in the main logfile.

If there is no error message reported or no corresponding logfile available, check all lines starting with ERROR or error for additional information about the possible cause of the crash, or for diagnostic messages that tell you what exactly went wrong.
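This kind of mistake can be caught before the pipeline starts by checking the input file for paths that do not exist. The following is a rough sketch, assuming that inputs appear in the JSON file as "path": "<value>" entries without escaped quotes and that relative paths are meant relative to the current directory; the function name missing_paths is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list every "path" value in a JSON file that does not exist on disk.
# missing_paths is a hypothetical helper. It uses plain text matching, not a
# JSON parser, so it assumes simple "path": "<value>" entries.
missing_paths() {
    local json="$1"
    grep -o '"path"[[:space:]]*:[[:space:]]*"[^"]*"' "$json" \
        | sed -e 's/^"path"[[:space:]]*:[[:space:]]*"//' -e 's/"$//' \
        | while IFS= read -r path; do
              # Report only entries that are neither a file nor a directory.
              [ -e "$path" ] || echo "$path"
          done
}

# Usage:
#   missing_paths LINC.json
```

An empty output means that all listed paths were found; any printed path should be corrected in the input file before starting the run.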
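The two searches described in this section can be collected in small helper functions. The following is a minimal sketch that relies only on the log line format shown in the example output above; the function names first_failed_job and error_lines are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: helpers for inspecting a cwltool or toil logfile.
# Both function names are hypothetical and not part of LINC.

# Print the name of the first job reported as permanentFail.
first_failed_job() {
    local logfile="$1"
    grep -m 1 -o '\[job [^]]*] completed permanentFail' "$logfile" \
        | sed -e 's/^\[job //' -e 's/] completed permanentFail$//'
}

# Print all lines, with line numbers, that contain "error" in any
# capitalisation (a broader match than lines merely starting with it).
error_lines() {
    local logfile="$1"
    grep -n -i 'error' "$logfile"
}

# Usage:
#   first_failed_job logfile
#   error_lines logfile
```

Applied to the example output above, first_failed_job would print find_skymodel_cal.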

To get help on new or already known issues, please check Getting help for further support and information.