.. toctree::
   :hidden:

   running_old

.. _runlinc:

.. role:: blue

Starting a pipeline
===================

.. note::

    If you are running the deprecated genericpipeline version of the pipeline (**prefactor** 3.2 or older), please check the :doc:`old instructions page <running_old>`.

Once you have the data and the input JSON file ready, you can run the pipeline, e.g., with ``cwltool`` or ``toil`` for the HBA calibrator pipeline::

    $ cwltool <linc_dir>/workflows/HBA_calibrator.cwl LINC.json
    $ toil-cwl-runner <linc_dir>/workflows/HBA_calibrator.cwl LINC.json

where ``LINC.json`` is the input JSON file as described in the chapter :doc:`parset` and ``<linc_dir>`` the location of the **LINC** CWL description files.

.. note::

    Instead of specifying all options in ``LINC.json``, the user can also use command-line options to override the defaults.

By default, **LINC** will execute the processing steps (like DP3, etc.) inside a Docker container. If you prefer to use Singularity instead, the option ``--singularity`` can be added to the ``cwltool`` command line (see options below).

.. note::

    Do **not** run your ``cwltool`` or ``toil`` calls inside the Docker or Singularity container unless this is exactly what you intend to do (see next section).

The following table provides the workflows to call in the command above for standard LOFAR observations. These provide the proper pipeline with pre-defined parameters (defaults) for **HBA** and **LBA** observations:

======================= ====================== ======================
**LINC workflow**       **HBA**                **LBA**
======================= ====================== ======================
``LINC_calibrator.cwl`` ``HBA_calibrator.cwl`` ``LBA_calibrator.cwl``
``LINC_target.cwl``     ``HBA_target.cwl``     ``LBA_target.cwl``
======================= ====================== ======================

.. note::

    The **LBA** target workflow is still experimental and thus may not provide the expected results.

If you have installed ``cwltool`` or ``toil`` locally on your system, **LINC** will automatically pull the right (u)Docker/Singularity image for you.

Running **LINC** from within a (u)Docker/Singularity image
-----------------------------------------------------------

If you do not want to install ``cwltool`` or ``toil`` locally on your system, you need to pull the software images first (otherwise you do **not** need to do this).

For Docker::

    $ docker pull astronrd/linc

for uDocker::

    $ udocker pull astronrd/linc

and for Singularity::

    $ singularity pull docker://astronrd/linc

To run **LINC** you only need to add the container-specific execution command and **make sure that all necessary volumes are mounted read-write** (``<data_path>``) and are thus accessible from inside the container, e.g.,::

    $ singularity exec --bind <data_path>:<data_path> <path_to_image> cwltool --no-container /usr/local/share/linc/workflows/HBA_calibrator.cwl LINC.json

where ``<path_to_image>`` is the location of the Singularity image, or::

    $ docker run --rm -v <data_path>:<data_path> -w $PWD astronrd/linc cwltool --no-container /usr/local/share/linc/workflows/HBA_calibrator.cwl LINC.json

Since you are running **LINC** inside a container, do not forget to add the ``--no-container`` flag to your call, no matter whether you use Singularity or Docker. Do **not** use the ``--singularity`` flag.
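
Before starting a long run inside the container it can be worth checking that the bind mounts actually expose your data. A minimal sanity check (assuming, as an example, that your measurement sets live under ``/data``) could look like this::

    $ singularity exec --bind /data:/data <path_to_image> ls /data
    $ docker run --rm -v /data:/data astronrd/linc ls /data

If the listing shows your measurement sets, the same paths should also be resolvable by the pipeline from inside the container.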

Pipeline options for ``cwltool``
--------------------------------

The following ``<options>`` are recommended for running **LINC** with ``cwltool``.
Please check carefully which options to choose depending on how you run **LINC**:

* ``--outdir``: specifies the location of the final pipeline output directory (results)
* ``--tmpdir-prefix``: specifies the location of the intermediate data products (should provide enough fast disk space; avoid using ``/tmp``)
* ``--log-dir``: specifies the location of the intermediate logfiles captured from ``stdout`` or ``stderr``
* ``--leave-tmpdir``: do not delete intermediate data products (use this if you need **debugging**)
* ``--parallel``: jobs will run in parallel (highly recommended to achieve decent processing speed)
* ``--singularity``: use Singularity instead of Docker (necessary if you **want** to use Singularity)
* ``--user-space-docker-cmd udocker``: use uDocker instead of Docker (necessary if you **want** to use uDocker)
* ``--no-container``: do not use a Docker container (only for manual installations and in case you are running from within a Docker/Singularity image)
* ``--preserve-entire-environment``: use system environment variables (only for manual installations and in case you are running from within a Docker/Singularity image)
* ``--debug``: more verbose output (use only for debugging the pipeline)

While the pipeline runs, the terminal will show the current state of the pipeline. For **debugging** it is recommended to run ``cwltool`` inside a ``screen`` session or to pipe the output into a runtime logfile::

    $ cwltool <linc_dir>/workflows/HBA_calibrator.cwl LINC.json > logfile 2>&1
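
If you prefer ``screen``, a minimal sketch could look like this (the session name ``linc`` is just an example)::

    $ screen -S linc
    $ cwltool <linc_dir>/workflows/HBA_calibrator.cwl LINC.json > logfile 2>&1

Detach from the session with ``Ctrl-a d`` and re-attach later with ``screen -r linc`` to check on the run.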

A fairly typical run that uses Singularity can look similar to this::

    $ cwltool \
        --singularity \
        --parallel \
        --outdir "/data/myproject/Linc-L628614" \
        --log-dir "/data/myproject/Log-L628614" \
        --tmpdir-prefix "/data/myproject/Tmp-L628614/" \
        ~/.local/share/linc/workflows/HBA_target.cwl \
        linc-L628614.json

All temporary folders and files are generated under the specified ``--tmpdir-prefix``; at the end of the run they can be deleted. Specifying ``--parallel`` will allocate one job per available computing thread; for I/O- and computationally demanding steps the number of parallel jobs is reduced. If your system is not powerful enough to run many tasks in parallel, do not use this option, or switch to ``toil`` and limit the number of cores with its ``--maxCores`` option.

.. note::

    ``cwltool`` has no option to resume a failed/crashed run. If you need this capability, have a look at ``toil``.

Pipeline options for ``toil``
-----------------------------

The following ``<options>`` are recommended for running **LINC** with ``toil``:

* ``--workDir``: specifies the location of ``toil``-specific intermediate data products; the directory needs to exist
* ``--bypass-file-store``: do not use ``toil``'s file store. This option is **always required** to run LINC properly!
* ``--tmpdir-prefix``: specifies the location of the intermediate data products; the directory needs to exist (should provide enough fast disk space; avoid using ``/tmp``)
* ``--log-dir``: specifies the location of the intermediate logfiles captured from ``stdout`` or ``stderr``
* ``--outdir``: specifies the location of the final data products
* ``--leave-tmpdir``: do not delete intermediate data products (use this if you need **debugging**)
* ``--jobStore``: location of the jobStore ("statefile")
* ``--logFile``: location of the main pipeline logfile
* ``--batchSystem``: use a specific batch system of an HPC cluster or similar, e.g. ``slurm`` or ``single_machine``
* ``--maxCores``: maximum number of cores to be considered for allocating jobs
* ``--singularity``: use Singularity instead of Docker (necessary if you **want** to use Singularity)
* ``--user-space-docker-cmd udocker``: use uDocker instead of Docker (necessary if you **want** to use uDocker)
* ``--preserve-entire-environment``: use system environment variables (only for manual installations and in case you are running from within a Docker/Singularity image)
* ``--no-container``: do not use a Docker container (only for manual installations and in case you are running from within a Docker/Singularity image)
* ``--restart``: if specified, the pipeline will attempt to restart the existing workflow as saved in the ``jobStore``

A fairly typical run that uses Singularity can look similar to this::

    $ toil-cwl-runner \
        --workDir "/data/myproject/Work-L628614" \
        --jobStore "/data/myproject/Work-L628614/JobStore" \
        --logFile "/data/myproject/Linc-L628614.log" \
        --batchSystem single_machine \
        --bypass-file-store \
        --singularity \
        --outdir "/data/myproject/Linc-L628614" \
        --log-dir "/data/myproject/Log-L628614" \
        --tmpdir-prefix "/data/myproject/Tmp-L628614/" \
        ~/.local/share/linc/workflows/HBA_target.cwl \
        linc-L628614.json

If you need to reduce the number of jobs run in parallel, set ``--maxCores`` to a number lower than the actual number of available computing threads (as reported by ``nproc``).

The following parameters may help to further control ``toil``:

* ``--writeLogsFromAllJobs``: enable saving ``toil`` pipeline job logfiles
* ``--writeLogs``: location of the pipeline job logfiles
* ``--logLevel``: can be **CRITICAL**, **ERROR**, **WARNING**, **INFO** or **DEBUG**
* ``--clean``: determines the deletion of the jobStore upon completion of the program; can be **'always'**, **'onError'**, **'never'** or **'onSuccess'**
* ``--stats``: creates runtime statistics
* ``--retryCount``: number of retries for each failed pipeline job

For further information on how to use ``toil`` the user may want to read the `toil documentation`_.

Stopping and restarting the pipeline
------------------------------------

You can stop a pipeline run at any time by terminating the CWL process (typically by pressing CTRL-C in the terminal where you started it).

If you use ``cwltool``, the pipeline **cannot** be resumed from the stage where it was terminated. You will have to restart the full pipeline.

If you use ``toil``, you can resume a pipeline by adding the parameter ``--restart`` to the command with which you originally started it. If you want to start from scratch, you should delete the ``jobStore`` directory and all intermediate data products (usually located under the directory specified via ``--workDir``). After that you will start with a clean run by re-running the same command with which you originally started the pipeline (see the sketch below).

.. note::

    ``toil``'s ``--restart`` option is only useful if a temporary error has occurred, since it will always reuse the workflow and the data it originally started with. If those need to be changed (software update, corrupted or missing data), you will have to delete the ``jobStore`` and start a fresh run.
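
As a sketch, following the directory layout of the ``toil`` example above (adapt the paths to your own setup), a complete cleanup before a fresh run could look like this::

    $ rm -rf "/data/myproject/Work-L628614" "/data/myproject/Tmp-L628614"
    $ mkdir "/data/myproject/Work-L628614" "/data/myproject/Tmp-L628614"

Both the ``--workDir`` and the ``--tmpdir-prefix`` directories need to exist again before you restart (hence the ``mkdir``); afterwards simply rerun the original ``toil-cwl-runner`` command without ``--restart``.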

Running **LINC** with a cached Singularity image
------------------------------------------------

Typical reasons for running **LINC** with a cached Singularity image are:

* when running into a Docker pull limit
* when compute nodes do not have an outbound connection to the internet or are subject to similar firewall restrictions

In such cases you need to set a special CWL environment variable to make use of a cached image::

    $ export CWL_SINGULARITY_CACHE=<cache_dir>
    $ export SINGULARITY_TMPDIR=<cache_dir>/tmp

where ``<cache_dir>`` is your chosen location of the Singularity image. When running **LINC**, the ``<cache_dir>`` should be visible from all compute nodes or machines you want to use. Once this is set, simply restart your pipeline with ``--singularity`` and the image should be put into the right directory and be used for all steps.

Setting ``$SINGULARITY_TMPDIR`` is optional and avoids temporary mounting of the images in your ``/tmp`` directory. This is particularly helpful if your ``/tmp`` directory does not have much disk space.

If you need to update your **LINC** Singularity image, simply remove the ``astronrd_linc.sif`` file in your ``<cache_dir>`` and clean the Singularity cache::

    $ singularity cache clean
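
Putting this together with the earlier ``cwltool`` example, a run using a cached image could look like this (``/software/containers`` is just an example location; the remaining paths are taken from the example above)::

    $ export CWL_SINGULARITY_CACHE=/software/containers
    $ cwltool \
        --singularity \
        --parallel \
        --outdir "/data/myproject/Linc-L628614" \
        ~/.local/share/linc/workflows/HBA_target.cwl \
        linc-L628614.json

After the first run the cache directory should contain the ``astronrd_linc.sif`` image, which is then reused by all steps and subsequent runs.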

Troubleshooting
---------------

With ``cwltool`` a pipeline crash is reported via this message::

    WARNING Final process status is permanentFail

If you encounter such a permanent fail it is highly recommended to pipe the output of the pipeline run into a logfile, add ``--leave-tmpdir``, and specify a log directory via ``--log-dir <logdir>``::

    $ cwltool --leave-tmpdir --log-dir <logdir> <linc_dir>/workflows/HBA_calibrator.cwl LINC.json > logfile 2>&1

In order to figure out at which step the pipeline failed you can search for the term ``permanentFail`` in the ``toil`` or ``cwltool`` logfile::

    $ grep "permanentFail" logfile
    WARNING [job find_skymodel_cal] completed permanentFail
    WARNING [step find_skymodel_cal] completed permanentFail
    INFO [workflow prep] completed permanentFail
    WARNING [step prep] completed permanentFail
    INFO [workflow linc] completed permanentFail
    WARNING [step linc] completed permanentFail
    INFO [workflow ] completed permanentFail
    WARNING [job check_ateam_separation] completed permanentFail
    WARNING [step check_ateam_separation] completed permanentFail
    WARNING Final process status is permanentFail

With that information it is possible to identify the first failed job/step, in this case ``find_skymodel_cal``. To find the part of the logfile where this step was launched, search for ``[job find_skymodel_cal]``. The corresponding logfiles of this job/step can be found in the ``<tmpdir>`` (specified with ``--tmpdir-prefix``) or ``<logdir>`` (if you have specified ``--log-dir``)::

    $ find <logdir> | grep find_skymodel_cal
    <logdir>/n6zgif6j/find_skymodel_cal.log
    <logdir>/n6zgif6j/find_skymodel_cal_err.log
    $ cat <logdir>/n6zgif6j/find_skymodel_cal.log <logdir>/n6zgif6j/find_skymodel_cal_err.log
    Traceback (most recent call last):
      File "find_sky.py", line 27, in <module>
        output = find_skymodel(mss, skymodels, max_separation_arcmin=max_separation_arcmin)
      File "/usr/local/bin/find_skymodel_cal.py", line 130, in main
        ra, dec = grab_pointing(input2strlist_nomapfile(ms_input)[0])
      File "/usr/local/bin/find_skymodel_cal.py", line 26, in grab_pointing
        [ra, dec] = pt.table(MS+'::FIELD', readonly=True, ack=False).getcol('PHASE_DIR')[0][0] * 180 / math.pi
      File "/usr/lib/python3/dist-packages/casacore/tables/table.py", line 372, in __init__
        Table.__init__(self, tabname, lockopt, opt)
    RuntimeError: Table /var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS::FIELD does not exist

In this example the table ``/var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS`` seems to be missing. Since this example makes use of Docker, we need to find the location of this file on our hard disk::

    $ grep "/var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS" logfile
    --mount=type=bind,source=/data/L667521_SB000_uv.MS,target=/var/lib/cwl/stg189d9fe5-93bd-46e9-92f7-24ff5819c6e2/L667521_SB000_uv.MS,readonly \
    $ ls -d /data/L667521_SB000_uv.MS
    ls: cannot access '/data/L667521_SB000_uv.MS': No such file or directory

So obviously we have specified a non-existent data set as input in ``LINC.json``.

In ``toil`` the main logfile is written to the location given by ``--logFile``, and logfiles from single jobs/steps are put into the directory given by ``--writeLogs``. If a job has failed, the corresponding logfile location is reported in the main logfile.

If there is no error message reported or no corresponding logfile available, check all lines starting with ``ERROR`` or ``error`` to get additional information about the possible cause of the crash or diagnostic messages that tell you what exactly went wrong.

To get help on new or already known issues, please check :ref:`help` for further support and information.

.. _toil documentation: https://toil.readthedocs.io/en/latest/running/cliOptions.html