This page describes the details of Python modules intended for use in Catalyst. It covers the features supported by ParaView 5.9 and later, i.e. Catalyst Python script version 2.0.
Catalyst Python modules are simply Python scripts that use the `paraview` package to define data analysis and visualization pipelines. You use these modules, for example, to define the analysis pipelines to be executed in a Catalyst-instrumented simulation code.
Python defines a module as a file containing Python definitions and statements. The file name for a module is the name of the module suffixed by the extension `.py`. Thus, a module named `foo` will be in a file named `foo.py`.
Packages are a mechanism for structuring multiple modules. Essentially, if you place a bunch of `.py` files (i.e. modules) in a directory structure, that becomes a package! Of course, there's a little more to it, and the Python docs are a highly recommended read. Similar to the convention of naming a module using the file name, the name of the directory representing a Python package becomes the name of the package. A package can contain multiple directories (sub-packages) and Python files.
When a Python module is imported into the application, the Python statements in the module are executed. Since a Python module is a file, the execution starts with the first statement in the file and continues till the end of the file is reached or an error is encountered.
When a Python package is imported, the entry point is defined by a file named `__init__.py` within the package. Statements in this file are executed. In the simplest case, `__init__.py` can be an empty file, but it can also execute initialization code, including importing other modules, packages, sub-packages, etc.
Catalyst Python code can be structured as modules or packages. In either case, the module (or package) is imported and then functions or variables defined in the module (or package) are checked and invoked as described in the following sections.
Catalyst adaptors should use `vtkCPPythonScriptV2Pipeline` to execute scripts as described in this document. Older Catalyst Python scripts, i.e. those that use the `RequestDataDescription` and `DoCoProcessing` functions, should continue to use `vtkCPPythonScriptPipeline` and `vtkCPPythonStringPipeline` at this time. It is conceivable that we provide a wrapper that lets us use `vtkCPPythonScriptV2Pipeline` for these older scripts too; however, that has not been implemented at this time.
`vtkCPPythonPipeline::DetectScriptVersion` may be used to detect the version number for a script. The logic relies on the file type and the presence of certain comments in the header of the file to determine which version it may be. Adaptors may use `vtkCPPythonPipeline::CreatePipeline` to create the appropriate `vtkCPPythonPipeline` subclass based on the script version.
Now, let's look at the structure of a Catalyst Python module. The same is applicable to a Python package as well.
The ParaView Python API, exposed via the `paraview.simple` module, is used to define the visualization and data analysis pipeline. The API mimics the actions one would take in the GUI. For example, a simple script that creates a Wavelet source and slices it looks as follows:
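A minimal sketch along those lines is shown below; the specific slice settings are illustrative.

```python
from paraview.simple import *

# create a data producer: the Wavelet source
wavelet = Wavelet(registrationName='Wavelet1')

# slice it with a plane through the center of the dataset
slice1 = Slice(registrationName='Slice1', Input=wavelet)
slice1.SliceType = 'Plane'
slice1.SliceType.Normal = [0, 1, 0]
```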
This same code can be used in a Catalyst Python script to set up the visualization pipeline. The only thing to note is that we need a mechanism to indicate which of the data producers in the pipeline should be replaced by the data generated by the simulation. In Catalyst, the data produced by the simulation is made available on named channels. Thus, we need to identify which named channel corresponds to which data producer in the pipeline. For that, we use the `registrationName` attribute. In the Python script, when creating a source or a filter, one can pass in the optional argument `registrationName`. When the script is being executed within Catalyst, Catalyst checks whether the `registrationName` matches the name of a known channel. If so, the data producer type requested (in this case `Wavelet`) is ignored and is instead replaced by a producer that provides the simulation data as its output.
The `paraview.demos.wavelet_miniapp` is a miniapp that acts as a simulation producing a time-varying `vtkImageData` dataset. So we can use it to run the above script as follows:
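The invocation looks something like the following; the `--script` option name is an assumption here, so check `pvbatch -m paraview.demos.wavelet_miniapp --help` for the exact set of options.

```
pvbatch -m paraview.demos.wavelet_miniapp --script sample1.py
```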
Let's inspect the warnings to understand what's going on.
The `bridge.py:17` warning can be ignored for now. The `INFO` messages are produced by `wavelet_miniapp.py` to provide information about the timestep as it progresses through the mock-simulation timesteps.
The `Module 'sample1' missing Catalyst 'options', will use a default options object` warning lets us know that Catalyst will use a default `options` object. More on that later.
The `script may not depend on simulation data; is that expected?` warning tells us that the pipeline we've set up in this script does not use any data produced by the simulation at all. While there's nothing wrong with that, it's highly unlikely that that's the intent, and hence this warning is posted.
In our case, we want the `Wavelet` source to be replaced by the simulation data. As mentioned, this means the `registrationName` should match the simulation channel name. `wavelet_miniapp` uses `input` as the default channel name. It can be changed using the `-c` (or `--channel`) command-line argument. Let's rename the channel using `--channel`.
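Assuming the script registers its producer under the name `Wavelet1`, as in the sketch above, the run would look something like this:

```
pvbatch -m paraview.demos.wavelet_miniapp --script sample1.py --channel Wavelet1
```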
Great! Now the warnings about simulation data have disappeared.
The pipeline, however, still doesn't do anything significant.
Also, the code in the module seems to execute only once, for the first timestep. This is confirmed by the fact that the `print_info` statements in `sample1.py` only show up once in the output. This is because the code in the module gets executed only when it is imported for the first time. Once imported, which happens on the first timestep, the module is not re-imported and hence the code is not re-executed.
To add code that gets executed on each iteration, you can define a `catalyst_execute` function. Likewise, `catalyst_initialize` and `catalyst_finalize` functions may be defined too; they will get called on the first timestep and after the last timestep, respectively. Here's an example script to see that in action.
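A sketch of such a script (call it `sample2.py`, the name referenced later) might look like this; the pipeline is the illustrative Wavelet-plus-Slice setup from before, and `print_info` is assumed to be importable from the `paraview` package.

```python
from paraview.simple import *
from paraview import print_info

# producer to be replaced by simulation data; 'input' is the
# wavelet_miniapp's default channel name (adjust if you pass --channel)
wavelet = Wavelet(registrationName='input')
slice1 = Slice(Input=wavelet)

# module-level code: executed only once, when the module is first imported
print_info("executing module '%s'" % __name__)

def catalyst_initialize():
    print_info("in catalyst_initialize")

def catalyst_execute(info):
    print_info("in catalyst_execute")
    # bring the pipeline up-to-date for the current timestep
    slice1.UpdatePipeline()

def catalyst_finalize():
    print_info("in catalyst_finalize")
```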
When executed using the `wavelet_miniapp` (this time, let's execute for 3 timesteps just to make things a little clearer), we get the following:
Note that `catalyst_initialize` and `catalyst_finalize` only get called once, while `catalyst_execute` gets called for each iteration.
The `info` object passed to `catalyst_execute` can be used to obtain information about the current invocation, which may be useful when writing code to execute per iteration.
Here's a sample script that saves out an image each iteration.
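A sketch of such a script (call it `sample3.py`) is shown below; the render-view setup is illustrative, and the `info.timestep` attribute used for the filename is an assumption about what the `info` object provides.

```python
from paraview.simple import *

# producer to be replaced by simulation data; 'input' is the
# wavelet_miniapp's default channel name
wavelet = Wavelet(registrationName='input')

slice1 = Slice(Input=wavelet)
slice1.SliceType.Normal = [0, 1, 0]

# a render view showing the slice
view = CreateRenderView()
display = Show(slice1, view)

def catalyst_execute(info):
    # re-execute the pipeline for the current timestep
    slice1.UpdatePipeline()
    ResetCamera(view)

    # save an image; 'info.timestep' (the current timestep index) is an
    # assumption about the attributes available on the info object
    SaveScreenshot('/tmp/output-%d.png' % info.timestep, view)
```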
Here's the result of executing this script with the `wavelet_miniapp`:
In ParaView 5.9, we introduced a new pipeline object called Extractors. Extractors are items in the visualization pipeline that can save data or images per timestep. Extractors make it largely unnecessary to have any custom code to execute per iteration, since one can simply use extractors to save out image extracts from views or data extracts from filters and other data producers.
For example, `sample3.py` can be modified to use extractors as follows:
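A sketch of that modification follows; the extractor type name `'PNG'`, the trigger, and the filename pattern mirror what ParaView's exported Catalyst state typically uses, but treat the specific values as illustrative.

```python
from paraview.simple import *

# producer to be replaced by simulation data
wavelet = Wavelet(registrationName='input')

slice1 = Slice(Input=wavelet)
slice1.SliceType.Normal = [0, 1, 0]

view = CreateRenderView()
display = Show(slice1, view)

# an extractor that saves a PNG image of the view on every timestep;
# no catalyst_execute is needed for this anymore
png1 = CreateExtractor('PNG', view, registrationName='PNG1')
png1.Trigger = 'TimeStep'
png1.Writer.FileName = 'output-{timestep}.png'
```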
This will produce output like the following:
If you look at the generated results, `/tmp/output-0.png`, `/tmp/output-1.png`, etc. will be generated just as with `sample3.py`. However, this time, we didn't need to add any explicit code to execute per iteration by providing `catalyst_execute`. The extractor achieves the same effect.
Such Catalyst scripts that use extractors can easily be exported from the ParaView GUI using File > Save Catalyst State. Such an exported script will often have the following trailer:
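The trailer typically looks something like the following (the exact option values depend on the choices made in the dialog):

```python
# ------------------------------------------------------------------------------
# Catalyst options
from paraview import catalyst
options = catalyst.Options()
options.ExtractsOutputDirectory = '/tmp'
options.GlobalTrigger = 'TimeStep'

# ------------------------------------------------------------------------------
if __name__ == '__main__':
    from paraview.simple import SaveExtractsUsingCatalystOptions
    # when run outside Catalyst, i.e. in post-processing mode,
    # generate the extracts using the Catalyst options
    SaveExtractsUsingCatalystOptions(options)
```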
The `options` variable is set up with the values chosen in the Save Catalyst State dialog. This is the `options` variable that we have been seeing warnings for in our runs so far. If present, `options` must be of the type `paraview.catalyst.Options`. It is used to specify parameters that control how the code is executed, e.g. the location of the output directory, Catalyst Live connection information, etc. If not present, a default one is created.
The following lists several of the important properties on `options` and their default values.
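A sketch of the commonly used properties is shown below; the values shown are believed to be the defaults, so verify against `paraview.catalyst.Options` in your ParaView version.

```python
from paraview import catalyst
options = catalyst.Options()

# directory under which all generated extracts (images, data files) are saved
options.ExtractsOutputDirectory = '.'

# trigger controlling when the analysis pipeline is updated
options.GlobalTrigger = 'TimeStep'

# Catalyst Live connection settings
options.EnableCatalystLive = False
options.CatalystLiveTrigger = 'TimeStep'
options.CatalystLiveURL = 'localhost:22222'
```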
`options.ExtractsOutputDirectory` is handy for placing all generated extracts under a specific directory. To use it, ensure that all extractors use a relative filename. The filename is then evaluated to be relative to the directory provided for `options.ExtractsOutputDirectory`.
Catalyst's steering capability allows simulation parameters to be modified at runtime using the `CreateSteerableParameters` Python function in the script, and the `catalyst_results` function on the simulation side to retrieve the parameters associated with a proxy. A special kind of extractor, `'steering'`, allows the Python script to send feedback to the simulation in the form of a dataset. This extractor, unlike the others, is not associated with a trigger; instead, its input dataset is serialized into a Conduit node sent to the simulation only when the latter calls `catalyst_results`. A usual use case is performing a data reduction in the Python script and then sending the result back to the simulation. The following script presents an example of this workflow.
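A minimal sketch of such a script is shown below. The channel name `'grid'`, the steerable proxy name `'SteerableParameters'`, and the specific reduction filter are assumptions for illustration; the steerable proxy type must match one defined by the simulation.

```python
from paraview.simple import *

# data from the simulation; 'grid' is an assumed channel name
producer = TrivialProducer(registrationName='grid')

# expose simulation-defined parameters that the script (or Catalyst Live)
# can modify; 'SteerableParameters' is an assumed proxy name defined by
# the simulation's proxy definitions
steerable_parameters = CreateSteerableParameters('SteerableParameters')

# a simple data reduction whose result is sent back to the simulation
slice1 = Slice(Input=producer)
slice1.SliceType.Normal = [0, 0, 1]

# the 'steering' extractor serializes its input into the Conduit node that
# the simulation receives when it calls catalyst_results()
steering_extractor = CreateExtractor('steering', slice1, registrationName='steered_slice')
```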
In the previous sections, we used a single `.py` file to demonstrate various aspects of a Catalyst Python module. The same is applicable to a Python package. Take a simple package with a directory structure as follows:
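For instance, a minimal package consistent with the description below looks like this (the package name `sample` is the one used in the zip example later):

```
sample/
└── __init__.py
```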
The contents of `__init__.py` are the same as the `sample2.py` described earlier. Now, you can use this package together with the `wavelet_miniapp` as follows:
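The invocation is the same as before, except the script argument now points at the package directory (again, the exact option names are assumptions; see the miniapp's `--help`):

```
pvbatch -m paraview.demos.wavelet_miniapp --script /path/to/sample
```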
As you can see, it produces a very similar result to using `sample2.py`. This is, of course, a very simple script. For complex scripts, a package allows better code organization and reusability and hence can be very handy in those situations.
The `__init__.py` file is the entry point. You can import other modules, packages, or sub-packages in it as needed.
Python supports importing modules and packages from within a ZIP archive without having to unzip the contents. This can be a very handy feature for Catalyst Python scripts. You can simply archive a package directory using the same name as the directory (adding the extension `.zip`) and then use that instead of the `.py` file.
For example, we can zip the `sample` directory as `sample.zip` and use it as follows to produce an almost identical result:
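Something like the following (the `--script` option name is again an assumption):

```
pvbatch -m paraview.demos.wavelet_miniapp --script sample.zip
```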
Note that, currently, the name of the zip archive and the name of the package directory it contains are required to match.
The `pvpython`/`pvbatch` executables can be launched with the `-l` argument to generate detailed logs as follows:
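For example, something like the following; the `-l=<filename>,<verbosity>` syntax is an assumption about the logging option, so check `pvbatch --help` for the exact form.

```
pvbatch -l=/tmp/log.txt,TRACE -m paraview.demos.wavelet_miniapp --script sample1.py
```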
Here, a `/tmp/log.txt` file with all logging output will be generated. You can also elevate Catalyst-generated logs to a higher level and log those; for example, the following will elevate the Catalyst log level to INFO and log all INFO messages to `/tmp/infolog.txt`:
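A sketch of such an invocation, assuming the `PARAVIEW_LOG_CATALYST_VERBOSITY` environment variable and the `-l` syntax shown above:

```
env PARAVIEW_LOG_CATALYST_VERBOSITY=INFO pvbatch -l=/tmp/infolog.txt,INFO -m paraview.demos.wavelet_miniapp --script sample1.py
```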
The `wavelet_miniapp` that we used in several of the examples here is a simple stand-in for a simulation code. It uses the Wavelet source (`vtkRTAnalyticSource`) internally to produce a time-varying uniform rectilinear grid (`vtkImageData`). Command-line options can be used to customize the run, including the number of timesteps, etc.
To run in parallel using MPI, make sure that you use `pvbatch` in symmetric mode as follows:
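For example (the `--symmetric` flag and the `mpirun` launcher are assumptions here; adjust for your MPI setup):

```
mpirun -np 4 pvbatch --symmetric -m paraview.demos.wavelet_miniapp --script sample1.py
```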
The `wavelet_miniapp` is a good miniapp to use as a stand-in for a simulation code to debug issues with Catalyst scripts. Another useful miniapp is the `filedriver_miniapp`. Unlike the `wavelet_miniapp`, which can only produce a specific type of dataset, the `filedriver_miniapp` reads files on disk and serves them as if they were being generated by a simulation.