.. _development.add-task:

=============================
Adding a task to the workflow
=============================

In this tutorial, we will see how to add a new task to the workflow. As an
example, we will use a task that extracts the number of scans from an mzML
file using the pyOpenMS library.

Adding a Python script to the workflow executables
==================================================

In :file:`cylc-src/bioreactor-workflow/bin/`, create a new file named
:file:`get-scans-number` and paste the following content:

.. code-block:: python
   :caption: :file:`bin/get-scans-number`

   #!/usr/bin/env python

   import os
   import sys
   from pathlib import Path

   from pyopenms import MzMLFile, MSExperiment

   # Path of the mzML file, provided by the task environment.
   MZML = os.getenv("mzml")


   def main():
       """
       Usage: ./get-scans-number

       Get number of scans from mzML file.
       `$mzml` shell environment variable must be set to the path of the file.
       """
       exp = MSExperiment()
       MzMLFile().load(MZML, exp)
       sys.stdout.write(str(exp.getNrSpectra()))


   if __name__ == "__main__":
       # Exit with a non-zero status on any error so that the task fails
       # instead of writing an empty output file.
       if len(sys.argv) > 1:
           # The script takes no arguments: print the usage and stop.
           sys.stderr.write(main.__doc__)
           sys.exit(1)
       elif not MZML:
           sys.stderr.write("$mzml environment variable not set.\n")
           sys.exit(1)
       elif not Path(MZML).exists():
           sys.stderr.write(f"mzML file not found: {MZML}\n")
           sys.exit(1)
       main()

Make the script executable:

.. code-block:: console

   $ chmod +x get-scans-number

Creating a new task in the [runtime] section
============================================

Open :file:`cylc-src/bioreactor-workflow/flow.cylc` and add the following task
definition at the end:

.. code-block:: cylc
   :caption: :file:`flow.cylc`
   :emphasize-lines: 3-

   [runtime]
       # ...
       [[get_scans_number]]
           # The task will run in the wf-openms conda environment
           # Adding None makes the task appear at the root in the TUI/GUI
           inherit = None, CONDA_OPENMS
           script = """
               echo "The script launched by this task will extract the number of scans from the mzML file."
               get-scans-number > ${output_file}
               echo "The number of scans has been saved to ${output_file}"
               echo "Number of scans: $(cat ${output_file})"
           """
           [[[environment]]]
               # The python script will use the $mzml environment
               # variable to get the path of the file.
               mzml = ${MAIN_RESULTS_DIR}/${RAWFILE_STEM}.mzML
               output_file = ${MAIN_RESULTS_DIR}/scans_number.txt

This task runs the :file:`get-scans-number` script and saves its output to a
file named :file:`scans_number.txt` in the main results directory. This
directory (:file:`share/cycle/n/dataflow/`) is specific to each cycle point
``n``.

Adding the task to the graph
============================

Add a new graph string to the :strong:`+P1/P1` recurrence, inside the
:strong:`[[graph]]` section of the workflow definition:

.. code-block:: cylc
   :caption: :file:`flow.cylc`
   :emphasize-lines: 8

   [[graph]]
       R1/^ = validate_cfg => validate_compounds_db & validate_met_model => is_setup
       R1/+P1 = convert_raw => get_instrument => extract_features
       +P1/P1 = """
           is_setup[^] => _catch_raw
           @catch_raw => _catch_raw => convert_raw => get_timestamp & trim_spectra => extract_features => annotate => quantify

           convert_raw => get_scans_number
       """

The task will be executed at every cycle point (/P1), starting from the second
one (+P1). It runs after the :strong:`convert_raw` task because it depends on
the mzML file that task generates. No other task depends on the one we just
added.

You can check that the task has been added correctly by running:

.. code-block:: console

   $ cylc graph bioreactor-workflow 0 1

.. figure:: /_static/graphs/added-task-graph.png
   :alt: Graph with the new task added
   :scale: 50%
   :align: center

Testing the new task
====================

Install and start a new run of the workflow, then add an mzML file to the
:file:`raws/` directory. The task should start immediately after the
:strong:`convert_raw` task and generate a :file:`scans_number.txt` file in the
:file:`cylc-run/your_run_name/share/cycle/1/dataflow/` directory.

.. code-block:: output
   :caption: :file:`job.out` in logs

   Workflow : bioreactor-workflow/task-added
   Job : 1/get_scans_number/01 (try 1)
   User@Host: elliotfontaine@MBP-Elliot.local
   2024-07-22T14:18:50+02:00 INFO - started
   The script launched by this task will extract the number of scans from the mzML file.
   The number of scans has been saved to /Users/elliotfontaine/cylc-run/bioreactor-workflow/task-added/share/cycle/1/dataflow/scans_number.txt
   Number of scans: 35
   2024-07-22T14:18:52+02:00 INFO - succeeded
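
Beyond reading the job log, the content of :file:`scans_number.txt` can also be
checked programmatically. The snippet below is a minimal sketch, not part of
the workflow: the ``read_scans_number`` helper is a hypothetical name
introduced here for illustration, and it only assumes that the file contains
the scan count as plain digits, as written by :file:`get-scans-number`.

.. code-block:: python

   from pathlib import Path


   def read_scans_number(path):
       """Return the scan count stored in ``path`` as an integer.

       Hypothetical helper: raises ``ValueError`` when the file is empty
       or contains anything other than a non-negative integer.
       """
       text = Path(path).read_text().strip()
       # An empty file usually means the script failed before writing.
       if not (text.isascii() and text.isdigit()):
           raise ValueError(f"unexpected content in {path}: {text!r}")
       return int(text)

For example, calling ``read_scans_number`` on the :file:`scans_number.txt` file
of the first cycle point of your run would return the value shown in
:file:`job.out`, or raise an error if the file is empty.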