Pipelines
A Pipeline is also a Step, so everything that applies to a Step in the For Users chapter also applies to Pipelines.
Configuring a Pipeline
This section describes how to set parameters on the individual steps in a pipeline. To change the order of steps in a pipeline, one must write a Pipeline subclass in Python. That is described in the Pipelines section of the developer documentation.
Just as with Steps, Pipelines can be configured either by a configuration file or directly from Python.
From a configuration file
A Pipeline configuration file follows the same format as a Step configuration file: the ini-file format used by the ConfigObj library.
Here is an example pipeline configuration file for a TestPipeline
class:
name = "TestPipeline"
class = "stpipe.test.test_pipeline.TestPipeline"
science_filename = "science.fits"
flat_filename = "flat.fits"
output_filename = "output.fits"
[steps]
[[flat_field]]
config_file = "flat_field.cfg"
threshold = 42.0
[[combine]]
skip = True
Just like a Step, it must have name and class values.
Here the class must refer to a subclass of stpipe.Pipeline.
Following name and class is the [steps] section. Under
this section is a subsection for each step in the pipeline. To figure
out what configuration parameters are available, use the stspec
script (just as with a regular step):
> stspec stpipe.test.test_pipeline.TestPipeline
start_step = string(default=None)# Start the pipeline at this step
end_step = string(default=None)# End the pipeline right before this step
science_filename = input_file() # The input science filename
flat_filename = input_file() # The input flat filename
skip = bool(default=False) # Skip this step
output_filename = output_file() # The output filename
[steps]
[[combine]]
config_file = string(default=None)
skip = bool(default=False) # Skip this step
[[flat_field]]
threshold = float(default=0.0)# The threshold below which to remove
multiplier = float(default=1.0)# Multiply by this number
skip = bool(default=False) # Skip this step
config_file = string(default=None)
Note that there are some additional optional configuration keys
(start_step and end_step) for controlling when the pipeline
starts and stops. This is covered in the section
Running partial Pipelines.
For each Step’s section, the parameters for that step may either be specified inline, or specified by referencing an external configuration file just for that step. For example, a pipeline configuration file that contains:
[steps]
[[flat_field]]
threshold = 42.0
multiplier = 2.0
is equivalent to:
[steps]
[[flat_field]]
config_file = myflatfield.cfg
with the file myflatfield.cfg in the same directory:
threshold = 42.0
multiplier = 2.0
If both a config_file and additional parameters are specified, the
config_file is loaded first, and the local parameters then override
the values it provides.
Any optional parameters for each Step may be omitted, in which case defaults will be used.
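The precedence described above (defaults, then config_file, then inline parameters) can be pictured as successive dictionary updates. The following is a minimal sketch only: resolve_step_params is a hypothetical helper written for illustration, not part of stpipe, and the default values are taken from the stspec output shown earlier.

```python
# Hypothetical helper, not part of stpipe: it illustrates the precedence only.
def resolve_step_params(defaults, file_params, inline_params):
    """Merge parameters so that later sources override earlier ones."""
    merged = dict(defaults)        # start from the step's defaults
    merged.update(file_params)     # values loaded from config_file
    merged.update(inline_params)   # inline values in the pipeline file win
    return merged


# Defaults as listed by stspec for the flat_field step.
defaults = {"threshold": 0.0, "multiplier": 1.0, "skip": False}
# Contents of myflatfield.cfg.
from_file = {"threshold": 42.0, "multiplier": 2.0}
# A parameter set inline, next to config_file, in the pipeline file.
inline = {"threshold": 48.0}

params = resolve_step_params(defaults, from_file, inline)
print(params)
```

Here threshold comes from the inline value, multiplier from the external file, and skip from the defaults.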
From Python
A pipeline may be configured from Python by passing a nested dictionary of parameters to the Pipeline’s constructor. Each key is the name of a step, and the value is another dictionary containing parameters for that step. For example, the following is the equivalent of the configuration file above:
from stpipe.test.test_pipeline import TestPipeline

steps = {
    'flat_field': {'threshold': 42.0}
}

pipe = TestPipeline(
    "TestPipeline",
    config_file=__file__,
    science_filename="science.fits",
    flat_filename="flat.fits",
    output_filename="output.fits",
    steps=steps)
Running a Pipeline
From the commandline
The same strun script used to run Steps from the commandline can
also run Pipelines.
The only wrinkle is that any step parameters overridden from the
commandline use dot notation to specify the parameter name. For
example, to override the threshold value on the flat_field
step in the example pipeline above, one can do:
> strun stpipe.test.test_pipeline.TestPipeline --steps.flat_field.threshold=48
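The dot notation maps directly onto the nested structure of the configuration file: each dot descends one level. As a rough illustration of that mapping, and not of how strun itself parses arguments, the hypothetical apply_override function below writes one dotted override into a nested dictionary.

```python
# Hypothetical helper for illustration; strun's real argument handling may differ.
def apply_override(config, argument):
    """Apply one '--a.b.c=value' style override to a nested dict."""
    dotted, _, value = argument.lstrip("-").partition("=")
    *parents, leaf = dotted.split(".")
    node = config
    for key in parents:
        # Descend one level per dotted component, creating sections as needed.
        node = node.setdefault(key, {})
    # The value is kept as a string; type conversion is out of scope here.
    node[leaf] = value
    return config


config = {"steps": {"flat_field": {"threshold": "42.0"}}}
apply_override(config, "--steps.flat_field.threshold=48")
print(config)
```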
From Python
Once the pipeline has been configured (as above), just call the instance to run it.
pipe()
Running partial Pipelines
There are two kinds of pipelines available:
1) Flexible pipelines are written in Python and may contain looping, conditionals and steps with more than one input or output.
2) Linear pipelines have a strict linear progression of steps and only have one input and output.
Linear pipelines have a feature that allows only a part of the
pipeline to be run. This is done through two additional configuration
parameters: start_step and end_step. start_step specifies
the first step to run. end_step specifies the last step to run.
Like all other configuration parameters, they may be either specified
in the Pipeline configuration file, or overridden at the commandline.
When start_step and end_step indicate that only part of the
pipeline will be run, the results of each step will be cached in the
current working directory. This allows the pipeline to pick up where
it left off later.
Note
In the present implementation, all this caching happens in the current working directory – we probably want a more sane way to manage these files going forward.
Each step may also be skipped by setting its configuration parameter
skip to True (either in the configuration file or at the command
line).
Caching details
The results of a Step are cached using Python pickles. This allows most of the standard Python data types to be cached. In addition, any FITS models that are the result of a step are saved as standalone FITS files to make them more easily used by external tools. The filenames are based on the name of the substep within the pipeline.
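As a rough picture of pickle-based caching, the sketch below stores one result per step name and reads it back. The helper names and the ".pickle" file naming are assumptions made for illustration, not stpipe's actual scheme, and a temporary directory stands in for the current working directory so the example leaves no files behind.

```python
import os
import pickle
import tempfile


def cache_path(directory, step_name):
    # Hypothetical naming scheme: one pickle file per step name.
    return os.path.join(directory, step_name + ".pickle")


def save_result(directory, step_name, result):
    """Write a step's result to its cache file."""
    with open(cache_path(directory, step_name), "wb") as fd:
        pickle.dump(result, fd)


def load_result(directory, step_name):
    """Read a previously cached result for the named step."""
    with open(cache_path(directory, step_name), "rb") as fd:
        return pickle.load(fd)


with tempfile.TemporaryDirectory() as workdir:
    save_result(workdir, "flat_field", {"threshold": 42.0, "pixels": [1, 2, 3]})
    restored = load_result(workdir, "flat_field")

print(restored)
```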
Hooks
Each Step in a pipeline can also have pre- and post-hooks associated. Hooks themselves are Step instances, but there are some conveniences provided to make them easier to specify in a configuration file.
Pre-hooks are run right before the Step. The inputs to a pre-hook are the same as the inputs to its parent Step. Post-hooks are run right after the Step. The inputs to a post-hook are the return value(s) from the parent Step. The return values are always passed as a list: if the parent Step returns a single item, a list containing that single item is passed to the post-hooks. This allows the post-hooks to modify the returned results, if necessary.
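The call order can be sketched with plain functions. This is an illustration under stated assumptions, not stpipe's implementation: run_with_hooks is a hypothetical driver, pre-hook return values are ignored, a list or tuple returned by the step is treated as multiple return values, and post-hooks are assumed to change the results by modifying the list in place.

```python
# Hypothetical driver for illustration; stpipe's real hook machinery may differ.
def run_with_hooks(step, inputs, pre_hooks=(), post_hooks=()):
    """Run pre-hooks, the step, then post-hooks, each in the order given."""
    for hook in pre_hooks:
        hook(*inputs)             # pre-hooks see the step's own inputs

    result = step(*inputs)

    # Return values always reach the post-hooks as a list.
    if isinstance(result, (list, tuple)):
        results = list(result)
    else:
        results = [result]

    for hook in post_hooks:
        hook(results)             # assumed: a post-hook edits the list in place
    return results


seen = []

def log_inputs(*args):
    seen.append(args)

def double_step(value):
    return value * 2

def add_one(results):
    results[0] += 1

out = run_with_hooks(double_step, (5,), [log_inputs], [add_one])
print(seen, out)
```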
Hooks are specified using the pre_hooks and post_hooks
configuration parameters associated with each step. More than one
pre- or post-hook may be assigned, and they are run in the order they
are given. There can also be pre_hooks and post_hooks on the
Pipeline as a whole (since a Pipeline is also a Step). Each of these
parameters is a list of strings, where each entry is one of:
- An external commandline application. The arguments can be accessed using {0}, {1} etc. (See stpipe.subproc.SystemCall).
- A dot-separated path to a Python Step class.
- A dot-separated path to a Python function.
For example, here’s a post_hook that will display a FITS file in
the ds9 FITS viewer after the flat_field step has done flat field
correction on it:
[steps]
[[flat_field]]
threshold = 42.0
post_hooks = "ds9 {0}",
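The {0}, {1} placeholders follow the numbering used by Python's str.format. As a sketch of how such a hook string could be expanded into an argument list, and not a description of what stpipe.subproc.SystemCall actually does, the hypothetical build_command function below splits the template first and then fills each piece, so a filename containing spaces stays a single argument.

```python
import shlex


# Hypothetical helper for illustration; not the SystemCall implementation.
def build_command(template, *arguments):
    """Split a hook string into argv form, then fill {0}, {1}, ... in each part."""
    return [part.format(*arguments) for part in shlex.split(template)]


argv = build_command("ds9 {0}", "output.fits")
print(argv)
```

The resulting list is in the form accepted by subprocess.run, should one want to launch the viewer.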