Starting a pipeline¶
Note
If you are running the deprecated genericpipeline version of the pipeline (prefactor 3.2 or older), please check the old instructions page.
Once you have the data and the input JSON file ready, you can run the pipeline, e.g., with cwltool or toil for the HBA calibrator pipeline:
$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl LINC.json
$ toil-cwl-runner <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl LINC.json
where LINC.json is the input JSON file as described in the chapter Configuring LINC and <install_dir> is the location of the LINC CWL description files.
Note
Instead of specifying all options in LINC.json, the user can also use command line options to override the defaults.
By default, LINC will execute the processing steps (like DP3, etc.) inside a Docker container. If you prefer to use Singularity instead, the option --singularity can be added to the cwltool command line (see options below).
Note
Do not run your cwltool or toil calls inside the Docker or Singularity container unless this is exactly what you intend to do (see next section).
The following list provides the workflows to call in the command above for standard LOFAR observations. These provide the proper pipeline with pre-defined parameters (defaults) for HBA and LBA observations:
LINC workflow | HBA | LBA
Calibrator | HBA_calibrator.cwl | LBA_calibrator.cwl
Target | HBA_target.cwl | LBA_target.cwl
Note
The LBA target workflow is still experimental and thus may not provide the expected results.
If you have installed cwltool or toil locally on your system, LINC will automatically pull the right (u)Docker/Singularity image for you.
Running LINC from within a (u)Docker/Singularity image¶
If you do not want to install cwltool or toil locally on your system, you need to pull the software images first (with a local installation this step is not necessary):
For Docker:
$ docker pull astronrd/linc
For uDocker:
$ udocker pull astronrd/linc
For Singularity:
$ singularity pull docker://astronrd/linc
To run LINC you only need to add the container-specific execution command and make sure that all necessary volumes are mounted read-write (<mount_points>) and are thus accessible from inside the container, e.g.:
$ singularity exec --bind <mount_points>:<mount_points> <linc.sif> cwltool --no-container <cwl_options> /usr/local/share/linc/workflows/HBA_calibrator.cwl LINC.json
where <linc.sif> is the location of the Singularity image, or
$ docker run --rm <docker_options> -v <mount_points>:<mount_points> -w $PWD astronrd/linc cwltool --no-container <cwl_options> /usr/local/share/linc/workflows/HBA_calibrator.cwl LINC.json
Since you are running LINC inside a container, do not forget to add the --no-container flag to your call, no matter whether you use Singularity or Docker. Do not use the --singularity flag.
Pipeline options for cwltool¶
The following <cwl_options> are recommended for running LINC with cwltool.
Please check carefully which options to choose, depending on how you run LINC:
--outdir: specifies the location of the final pipeline output directory (results)
--tmpdir-prefix: specifies the location of the intermediate data products (should provide enough fast disk space, avoid using /tmp)
--log-dir: specifies the location of the intermediate logfiles captured from stdout or stderr
--leave-tmpdir: do not delete intermediate data products (use this if you need debugging)
--parallel: jobs will run in parallel (highly recommended to achieve decent processing speed)
--singularity: use Singularity instead of Docker (necessary if you want to use Singularity)
--user-space-docker-cmd udocker: use uDocker instead of Docker (necessary if you want to use uDocker)
--no-container: don't use a Docker container (only for manual installation and in case you are running from within a Docker/Singularity image)
--preserve-entire-environment: use system environment variables (only for manual installation and in case you are running from within a Docker/Singularity image)
--debug: more verbose output (use only for debugging the pipeline)
While the pipeline runs, the terminal will output the current state of the pipeline. For debugging it is recommended to run cwltool inside a screen session or to redirect the output into a runtime logfile:
$ cwltool <cwl_options> <install_dir>/workflows/HBA_calibrator.cwl LINC.json > logfile 2>&1
A fairly typical run that uses Singularity can look similar to this:
$ cwltool \
--singularity \
--parallel \
--outdir "/data/myproject/Linc-L628614" \
--log-dir "/data/myproject/Log-L628614" \
--tmpdir-prefix "/data/myproject/Tmp-L628614/" \
~/.local/share/linc/workflows/HBA_target.cwl \
linc-L628614.json
All temporary folders and files are generated in the directory specified with --tmpdir-prefix. At the end of the run those files can be deleted. Specifying --parallel will allocate one job per available computing thread. For I/O-intensive and computationally demanding steps the number of parallel jobs is reduced. If your system is not powerful enough to run many tasks in parallel, do not use this option or switch to toil using the --maxCores option.
Note
cwltool has no option to resume a failed/crashed run. If you need this option, have a look at toil.
Pipeline options for toil¶
The following <cwl_options> are recommended for running LINC with toil:
--workDir: specifies the location of toil-specific intermediate data products. The directory needs to exist
--bypass-file-store: do not use toil's file store. This option is always required to run LINC properly!
--tmpdir-prefix: specifies the location of the intermediate data products. The directory needs to exist (should provide enough fast disk space, avoid using /tmp)
--log-dir: specifies the location of the intermediate logfiles captured from stdout or stderr
--outdir: specifies the location of the final data products
--leave-tmpdir: do not delete intermediate data products (use this if you need debugging)
--jobStore: location of the jobStore ("statefile")
--logFile: location of the main pipeline logfile
--batchSystem: use the specific batch system of an HPC cluster or similar, e.g. slurm or single_machine
--maxCores: maximum number of cores to be considered for allocating jobs
--singularity: use Singularity instead of Docker (necessary if you want to use Singularity)
--user-space-docker-cmd udocker: use uDocker instead of Docker (necessary if you want to use uDocker)
--preserve-entire-environment: use system environment variables (only for manual installation and in case you are running from within a Docker/Singularity image)
--no-container: don't use a Docker container (only for manual installation and in case you are running from within a Docker/Singularity image)
--restart: if specified, the pipeline will attempt to restart the existing workflow as saved in the jobStore.
A fairly typical run that uses Singularity can look similar to this:
$ toil-cwl-runner \
--workDir "/data/myproject/Work-L628614" \
--jobStore "/data/myproject/Work-L628614/JobStore" \
--logFile "/data/myproject/Linc-L628614.log" \
--batchSystem single_machine \
--bypass-file-store \
--singularity \
--outdir "/data/myproject/Linc-L628614" \
--log-dir "/data/myproject/Log-L628614" \
--tmpdir-prefix "/data/myproject/Tmp-L628614/" \
~/.local/share/linc/workflows/HBA_target.cwl \
linc-L628614.json
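Since toil expects the --workDir and --tmpdir-prefix directories to exist, it can help to create them before launching the run. The following is a minimal sketch, assuming a POSIX shell and the directory naming used in the example above. The function name prepare_run_dirs is hypothetical and not part of LINC or toil. The jobStore directory is deliberately not created here, on the assumption that toil sets it up itself for a fresh run.

```shell
# prepare_run_dirs BASE OBSID
# Creates the work, temporary, and log directories for one observation.
# BASE and OBSID are placeholders, e.g. /data/myproject and L628614.
prepare_run_dirs() {
    base="$1"
    obsid="$2"
    # -p creates missing parents and does not fail if a directory already exists
    mkdir -p "$base/Work-$obsid" "$base/Tmp-$obsid" "$base/Log-$obsid"
}

# Example (hypothetical paths):
#   prepare_run_dirs /data/myproject L628614
```

After calling the function, the paths can be passed to --workDir, --tmpdir-prefix, and --log-dir as in the run shown above.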
If you need to reduce the number of jobs that run in parallel, use --maxCores to set the number of computing threads considered for running the pipeline to a value lower than the number of available computing threads (as shown if you call nproc).
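As an illustration, the value for --maxCores can be derived from the output of nproc. The sketch below assumes a POSIX shell. The function name reduced_cores is hypothetical, and using half of the available threads is an arbitrary choice, not a LINC recommendation.

```shell
# reduced_cores
# Prints half of the available computing threads, but never less than 1.
reduced_cores() {
    # nproc is part of GNU coreutils; getconf is used as a fallback
    total=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
    half=$((total / 2))
    if [ "$half" -lt 1 ]; then
        half=1
    fi
    echo "$half"
}

# Example:
#   toil-cwl-runner --maxCores "$(reduced_cores)" <cwl_options> ...
```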
The following parameters may help to further control toil:
--writeLogsFromAllJobs: enable saving toil pipeline job logfiles
--writeLogs: location of the pipeline job logfiles
--logLevel: can be CRITICAL, ERROR, WARNING, INFO or DEBUG
--clean: determines the deletion of the jobStore upon completion of the program. Can be 'always', 'onError', 'never' or 'onSuccess'
--stats: creates runtime statistics
--retryCount: number of retries for each failed pipeline job
For further information on how to use toil, the user may want to read the toil documentation.
Stopping and restarting the pipeline¶
You can stop a pipeline run at any time by terminating the CWL process (typically by pressing Ctrl-C in the terminal where you started it).
If you use cwltool, the pipeline cannot be resumed from the stage where it was terminated. You will have to restart the full pipeline.
If you use toil, you can restart a pipeline by adding the parameter --restart to the command line. If you want to start from scratch, you should delete the directory specified via --jobStore and all intermediate data products (usually located in the directory specified via the --workDir parameter). You can then start a clean run by running the same command with which you originally started the pipeline.
Note
toil's --restart option is only useful if a temporary error has occurred, since it will always reuse the workflow and the data it originally started with. If those need to be changed (software update, corrupted or missing data), you will have to delete the jobStore and start a fresh run.
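The choice between a fresh run and a restart can be sketched as a small helper that only adds --restart when a jobStore directory is already present. This is a minimal sketch for a POSIX shell. The function name restart_flag and the example path are hypothetical, and the helper cannot tell whether the workflow or the data have changed, so the caveat in the note above still applies.

```shell
# restart_flag JOBSTORE
# Prints "--restart" if the jobStore directory already exists,
# and prints nothing otherwise (fresh run).
restart_flag() {
    if [ -d "$1" ]; then
        echo "--restart"
    fi
}

# Example (hypothetical path). The $(...) is left unquoted on purpose,
# so that an empty result does not add an empty argument:
#   toil-cwl-runner $(restart_flag /data/myproject/Work-L628614/JobStore) <cwl_options> ...
```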
Running LINC with a cached Singularity image¶
Typical reasons for which you would like to run LINC with a cached Singularity image:
when running into a Docker pull limit.
when compute nodes do not have an outbound connection to the internet or any similar firewall restrictions.
In such cases you need to set the CWL_SINGULARITY_CACHE environment variable (and optionally SINGULARITY_TMPDIR) so that the cached image is used:
$ export CWL_SINGULARITY_CACHE=<cachedir>
$ export SINGULARITY_TMPDIR=<cachedir>/tmp
where <cachedir> is your chosen location of the Singularity image. When running LINC, the location of the <cachedir> should be visible from all compute nodes or machines you want to use. Once the variable is set, simply restart your pipeline with --singularity; the image should then be put into the right directory and be used for all steps.
Setting $SINGULARITY_TMPDIR is optional and avoids temporary mounting of the images in your /tmp directory. This is particularly helpful if your /tmp directory does not have much disk space.
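The two variables can be set together with the directories they point to. The following is a minimal sketch for a POSIX shell. The function name setup_singularity_cache is hypothetical, and creating the directories in advance is a precaution assumed here, not a documented requirement.

```shell
# setup_singularity_cache CACHEDIR
# Creates CACHEDIR and CACHEDIR/tmp and exports the two variables
# into the current shell session.
setup_singularity_cache() {
    cachedir="$1"
    mkdir -p "$cachedir/tmp" || return 1
    export CWL_SINGULARITY_CACHE="$cachedir"
    export SINGULARITY_TMPDIR="$cachedir/tmp"
}

# Example (replace <cachedir> with a location visible from all compute nodes):
#   setup_singularity_cache <cachedir>
```

The function has to be called in the same shell session (or job script) that later starts cwltool or toil, since exported variables are only inherited by child processes.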
If you need to update your LINC Singularity image, simply remove the astronrd_linc.sif file in your <cachedir> and clean the Singularity cache:
$ singularity cache clean
Troubleshooting¶
With cwltool a pipeline crash is reported via this message:
WARNING Final process status is permanentFail
If you encounter such a permanent fail, it is highly recommended to redirect the output of the pipeline run into a logfile, and to add --leave-tmpdir and --log-dir to the <cwl_options>:
$ cwltool --leave-tmpdir <cwl_options> --log-dir <logdir> <install_dir>/workflows/HBA_calibrator.cwl LINC.json > logfile 2>&1
In order to figure out at which step the pipeline failed, you can search for the term permanentFail in the toil or cwltool logfile:
$ grep "permanentFail" logfile
WARNING [job find_skymodel_cal] completed permanentFail
WARNING [step find_skymodel_cal] completed permanentFail
INFO [workflow prep] completed permanentFail
WARNING [step prep] completed permanentFail
INFO [workflow linc] completed permanentFail
WARNING [step linc] completed permanentFail
INFO [workflow ] completed permanentFail
WARNING [job check_ateam_separation] completed permanentFail
WARNING [step check_ateam_separation] completed permanentFail
WARNING Final process status is permanentFail
With that information it is possible to identify the first failed job/step to be find_skymodel_cal. To find the corresponding part of the logfile where the step was launched, search for [job find_skymodel_cal].
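Extracting the name of the first failed job can also be scripted. The sketch below assumes a POSIX shell and log lines of the form shown in the listing above ("[job <name>] completed permanentFail"). The function name first_failed_job is hypothetical.

```shell
# first_failed_job LOGFILE
# Prints the name of the first job reported as permanentFail in LOGFILE.
first_failed_job() {
    # Capture the text between "[job " and "]" on matching lines,
    # then keep only the first match.
    sed -n 's/.*\[job \([^]]*\)] completed permanentFail.*/\1/p' "$1" | head -n 1
}

# Example:
#   first_failed_job logfile
#   grep -n "\[job $(first_failed_job logfile)\]" logfile
```

The second command in the usage example lists, with line numbers, every place in the logfile where that job is mentioned.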
The corresponding logfiles of this job/step can be found in the <tmpdir> (specified with --tmpdir-prefix) or <logdir> (if you have specified --log-dir):
$ find <tmpdir> | grep find_skymodel_cal
<tmpdir>/n6zgif6j/find_skymodel_cal.log
<tmpdir>/n6zgif6j/find_skymodel_cal_err.log
$ cat <tmpdir>/n6zgif6j/find_skymodel_cal.log <tmpdir>/n6zgif6j/find_skymodel_cal_err.log
Traceback (most recent call last):
File "find_sky.py", line 27, in <module>
output = find_skymodel(mss, skymodels, max_separation_arcmin=max_separation_arcmin)
File "/usr/local/bin/find_skymodel_cal.py", line 130, in main
ra, dec = grab_pointing(input2strlist_nomapfile(ms_input)[0])
File "/usr/local/bin/find_skymodel_cal.py", line 26, in grab_pointing
[ra, dec] = pt.table(MS+'::FIELD', readonly=True, ack=False).getcol('PHASE_DIR')[0][0] * 180 / math.pi
File "/usr/lib/python3/dist-packages/casacore/tables/table.py", line 372, in __init__
Table.__init__(self, tabname, lockopt, opt)
RuntimeError: Table /var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS::FIELD does not exist
In this example the table /var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS seems to be missing. Since we make use of Docker in this example, we need to find the location of this file on our hard disk:
$ more logfile | grep "/var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS"
--mount=type=bind,source=/data/L667521_SB000_uv.MS,target=/var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS,readonly \
$ ls -d /data/L667521_SB000_uv.MS
ls: cannot access '/data/L667521_SB000_uv.MS': No such file or directory
So evidently we have specified a non-existent data set as an input in LINC.json.
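This kind of mistake can be caught before the pipeline is started by checking that every path listed in the input file exists. The following is a rough sketch for a POSIX shell with grep -o available. It assumes that inputs appear in the JSON file as "path": "..." entries, which is an assumption about the layout of LINC.json and not taken from this page. The function name check_input_paths is hypothetical, and relative paths are checked relative to the current working directory.

```shell
# check_input_paths JSONFILE
# Prints "missing: <path>" for every "path" entry in JSONFILE
# that does not exist on disk. Prints nothing if all paths exist.
check_input_paths() {
    grep -o '"path"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" |
        sed 's/^"path"[[:space:]]*:[[:space:]]*"//; s/"$//' |
        while IFS= read -r path; do
            [ -e "$path" ] || echo "missing: $path"
        done
}

# Example:
#   check_input_paths LINC.json
```

A text-based match like this is only a quick check; it does not parse JSON and will miss paths that contain escaped quotes.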
In toil the main logfile is written to --logFile and logfiles from single jobs/steps are put into --writeLogs. If a job has failed, the corresponding logfile location is reported in the main logfile.
If there is no error message reported or no corresponding logfile available, check all lines starting with ERROR or error for additional information about the possible cause of the crash or for diagnostic messages that tell you what exactly went wrong.
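Such lines can be listed together with their line numbers so that the surrounding context is easy to locate in the logfile. This is a minimal sketch for a POSIX shell. The function name list_error_lines is hypothetical.

```shell
# list_error_lines LOGFILE
# Prints every line of LOGFILE that starts with ERROR or error,
# prefixed by its line number.
list_error_lines() {
    grep -n -E '^(ERROR|error)' "$1"
}

# Example:
#   list_error_lines logfile
```

If messages in your logfile carry a prefix such as a timestamp, dropping the leading ^ from the pattern matches ERROR or error anywhere in the line.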
To get help on new or already known issues, please check Getting help for further support and information.