Reading Restarted Simulation Output
Latest revision as of 05:26, 26 November 2007
An increasingly important problem that we are facing is that of reading in the output files from simulations that have been restarted (note that this is different from reading restart files). When a simulation is restarted, it continues a simulation previously executed, but typically writes to a completely new set of files.
A simulation that was restarted multiple times will have multiple output file sets, each containing data over a different range of time. It is usually desirable to see all of these times as one contiguous group. So far this has been done by loading files independently (i.e. doing nothing), concatenating all the data into one file (or file group), or hacking up an animation to change filenames in the reader. None of these is a good solution.
It has long been proposed to simply change the readers to select file series or series of file series. The problem is that doing so can obfuscate the GUI, an issue that has stalled the implementation for the Exodus reader for years. Furthermore, the solution would have to be reworked independently for each reader. There is already a similar request for the SPCTH reader, and more may follow.
The new pipeline time support gives us more options for allowing us to read restarted simulation outputs. This page captures the design of such a solution.
Potential Designs
So far I can think of two potential designs using the new time support. The first is to use a Time Multiplexer Filter and the second is to use a Time File Series Reader.
Time Multiplexer Filter
A time multiplexer filter is a fairly simple filter that takes multiple inputs and has one output. Based on the time step requested downstream, this filter passes one of its inputs to the output and ignores the rest.
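The selection logic described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the actual filter: `TimedInput`, `select_input`, and `multiplex` are hypothetical names, each input is assumed to advertise a closed time range, and the first input whose range contains the requested time is assumed to win.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class TimedInput:
    """Hypothetical stand-in for one reader connected to the multiplexer."""
    t_min: float                      # first time covered by this input
    t_max: float                      # last time covered by this input
    fetch: Callable[[float], object]  # returns the data set for a given time


def select_input(inputs: Sequence[TimedInput], t: float) -> Optional[TimedInput]:
    """Return the first input whose time range contains t, or None."""
    for inp in inputs:
        if inp.t_min <= t <= inp.t_max:
            return inp
    return None


def multiplex(inputs: Sequence[TimedInput], t: float) -> object:
    """Pass through the data of the one matching input and ignore the rest."""
    chosen = select_input(inputs, t)
    return None if chosen is None else chosen.fetch(t)


# Two restarts of the same simulation, each covering its own time range.
runs = [
    TimedInput(0.0, 1.0, lambda t: ("run0", t)),
    TimedInput(1.1, 2.5, lambda t: ("run1", t)),
]
```

Calling `multiplex(runs, 2.0)` fetches data only from the second input; a time that no input covers yields `None`.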
Pros:
- Easy to implement, can be used over and over.
- No extra GUI.
The best part about this solution is its simplicity. Creating the filter is simple (I already have something that works pretty well), and it is completely agnostic to the type of reader or even the type of data. There is also no extra GUI to try to engineer or learn. Simply clamp on a filter and you are done.
Cons:
- Cumbersome to set up.
- Prone to user error.
- Requires modifications to the reader (which can affect its behavior).
As I have actually tried to use this method to load output from a simulation with 12 restarts, I was disappointed at how cumbersome it was to set up. Opening all the files individually is a pain. Scripting helps a lot (it is what I ended up using), but it is not the kind of thing you want your users to have to do.
The method is also prone to user error. Generally, you want all of the readers to have the same options so that the output of each one has the same type of data (e.g. all the same variables). Filters downstream of the multiplexer will behave badly if a variable that they are using suddenly disappears when the time changes. Mismatching the parameters of the reader is an easy thing to mess up. Again, scripting helps here, but is not a great solution for most users.
Ironically, the solution that breaks the functionality away from the reader is the one that requires the most modifications to the reader. The issue that comes up is that each reader is going to create its own copy of the data. What you really want is to only read in the copy of data that is valid for the given time step. To do this, you typically have to add a feature that has the reader clear its output when reading a timestep outside of its range. Besides the pain of adding this feature, it also changes the behavior slightly. Now when you read a timestep outside of the valid range, the data disappears.
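The reader change described above might look roughly like the following. This is a minimal sketch under stated assumptions: `RangeLimitedReader` is a hypothetical class, not an existing reader, and the tuple it stores stands in for data actually read from disk.

```python
class RangeLimitedReader:
    """Hypothetical reader that holds data only for times inside its own range."""

    def __init__(self, name, times):
        self.name = name
        self.times = sorted(times)  # time values stored in this reader's files
        self.output = None          # currently loaded data, if any
        self.reads = 0              # number of (simulated) disk reads

    def in_range(self, t):
        """True if t lies between this reader's first and last time value."""
        return self.times[0] <= t <= self.times[-1]

    def update(self, t):
        """Load data for time t, or clear the output if t is out of range."""
        if not self.in_range(t):
            self.output = None      # drop the copy instead of keeping stale data
            return None
        self.reads += 1
        self.output = (self.name, t)  # stand-in for reading from disk
        return self.output


# Two restarts; only the reader that owns the requested time keeps data.
first = RangeLimitedReader("run0", [0.0, 1.0, 2.0])
second = RangeLimitedReader("run1", [3.0, 4.0, 5.0])
for reader in (first, second):
    reader.update(4.0)
```

After the loop only `second` holds data, which is the memory saving the change is after. It also shows the behavior change: a reader asked for a time outside its range now has an empty output.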
--Biddisco 05:26, 26 November 2007 (EST) This is very interesting. I would like to add another use case to stimulate thoughts. A simulation consists of a number of time steps - however each time step may have a variable number of sub-iterations. The user would like to animate over time - traversing between primary time steps - but also dig down into the individual iterations which may have adaptive time steps depending on the current solution. We therefore have a slightly more complex time than just a step number or time value. (Some kind of hash might be useful).
Time File Series Reader
There is currently a special "meta" reader called vtkFileSeriesReader that takes in a series of files, associates each one with a time step, and then uses the actual reader to read one in. As the pipeline requests different time steps, the meta-reader changes the filename of the actual reader to load in a different data set. Here I propose to extend the file series reader so that each file it points to can contain its own time series, with the file selected based on the times defined in each.
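The proposed file selection can be sketched independently of any reader. The class below is hypothetical (it is not vtkFileSeriesReader) and assumes the time values in each file are already known. It also assumes that when a restart overlaps the previous run, the file that starts latest should win; the proposal itself does not specify this.

```python
import bisect


class TimeFileSeries:
    """Hypothetical index mapping a requested time to the file that holds it."""

    def __init__(self, file_times):
        # file_times: dict of filename -> list of time values in that file
        entries = sorted(
            (min(times), name) for name, times in file_times.items() if times
        )
        self._starts = [start for start, _ in entries]
        self._files = [name for _, name in entries]
        self._all_times = sorted({t for ts in file_times.values() for t in ts})

    def all_times(self):
        """Sorted union of every time value, as the meta-reader would report."""
        return list(self._all_times)

    def file_for_time(self, t):
        """Pick the latest-starting file whose first time is <= t.

        Times before the first file are clamped to the first file.
        """
        i = bisect.bisect_right(self._starts, t) - 1
        return self._files[max(i, 0)]


series = TimeFileSeries({
    "run0.dat": [0.0, 0.1, 0.2],
    "run1.dat": [0.2, 0.3, 0.4],  # restart repeats t = 0.2
    "run2.dat": [0.5, 0.6],
})
```

With this index, `series.file_for_time(0.2)` selects `run1.dat` because of the "latest start wins" assumption, and the pipeline sees one contiguous list of times from `series.all_times()`.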
Pros:
- Relatively easy to implement, can be used over and over.
- Behaves just like a single reader (once multiple files are selected).
- Can be used without changes to the reader.
Like the multiplexer filter solution, the time file series meta-reader is relatively easy to implement. The tricky part has already been implemented in vtkFileSeriesReader. Unlike the multiplexer, the meta-reader behaves (from the user's perspective) just like a single reader once the file series is selected. Thus it is less cumbersome to set up and less prone to errors.
Also, the meta-reader does not require any changes to the actual reader. However, the meta-reader may call ProcessRequest on the reader in a different order than the pipeline would, so there may be some robustness issues with some readers.
Cons:
- Reading time information may be slow.
- Selecting a file series.
One problem I foresee with this solution is the time it takes to read the time information. The meta-reader will have to iterate the actual reader over all the files to query for the time range in each file. This can take a while, especially if the reader does not handle the I/O of its metadata very well.
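To make the cost concrete, here is a minimal sketch of that scan. The function `scan_time_ranges` and the `query_times` callback are hypothetical; the callback stands in for pointing the actual reader at a file and asking for its time values, which is the potentially slow step.

```python
def scan_time_ranges(filenames, query_times):
    """Query every file once and collect its (t_min, t_max) range.

    query_times(filename) is a hypothetical callback that returns the list
    of time values in one file; each call may involve opening that file.
    """
    ranges = {}
    for name in filenames:
        times = query_times(name)  # one metadata read per file
        if times:
            ranges[name] = (min(times), max(times))
    return ranges


# Stand-in for real files: filename -> time values stored in it.
FAKE_FILES = {
    "run0.dat": [0.0, 0.1, 0.2],
    "run1.dat": [0.3, 0.4],
    "run2.dat": [0.5, 0.6, 0.7],
}
calls = []


def fake_query(name):
    """Record each metadata read so the number of file accesses is visible."""
    calls.append(name)
    return FAKE_FILES[name]


ranges = scan_time_ranges(sorted(FAKE_FILES), fake_query)
```

The scan performs exactly one metadata read per file, so its cost grows linearly with the number of files in the series.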
There needs to be a way to select the file series. The issue is further complicated by the fact that the actual reader may itself read in a file series (so you are actually reading a series of file series). There are several ways to select the series, each with its own quirks.
- Select multiple files in the file browser
- This is basically how the current vtkFileSeriesReader does it. There are special grouping operations in the file browser to help. However, since we might be reading a series of file series, this grouping becomes difficult and perhaps impossible. You could select multiple files by hand, but this can become cumbersome again. It is also a real problem if the files are located in separate directories, which may become necessary because output sets of 10,000 files are common.
- Have a flat metadata file of filenames
- The meta-reader could read in a file that is simply a text file containing a list of filenames of all the inputs. Although inelegant, such a file is very easy to create in a text editor or by simply piping the output of ls to a file. You can also navigate the metadata file much faster if it is placed in a directory away from the actual data files.
--Biddisco 05:21, 26 November 2007 (EST) This works reasonably well. I have an ASCII <sigh> reader which loads huge time series data and takes a very long time - In the ExecuteInformation pass, the time information is dumped into a flat txt file so that the second time the file is accessed, the time (and some other information about the data) is available immediately. It would not be hard to imagine adding such a capability to the FileSeriesReader to automatically generate a flat txt file containing the required information so that on subsequent accesses it would be present already (read-only file system caveat - would need a cache path for such information). Extending the concept to a formal XML metadata file would be great. If the XML is present, we read it. If not, we generate it when the data has been scanned for the first time.
- Have an XML metadata file
- We could also create an XML format for pointing at other files. The XML format would allow us to place more metadata than just the filename. For example, we could put the time steps for each file to speed up the reading of all the metadata. However, now the creation of the file becomes much more difficult. In fact, to add this extra metadata would probably require some extra scripts or programs to generate it for the user.
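As an illustration of the XML option, the sketch below writes and reads a small metadata document that stores the time steps alongside each filename. The element and attribute names (`FileSeries`, `File`, `name`, `times`) are invented for this example and do not correspond to any existing format; only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET


def write_series_xml(file_times):
    """Serialize {filename: [time, ...]} into a hypothetical XML layout."""
    root = ET.Element("FileSeries")
    for name, times in file_times.items():
        ET.SubElement(
            root, "File",
            name=name,
            times=" ".join(repr(t) for t in times),  # space-separated floats
        )
    return ET.tostring(root, encoding="unicode")


def read_series_xml(text):
    """Parse the layout above back into {filename: [time, ...]}."""
    root = ET.fromstring(text)
    return {
        el.get("name"): [float(v) for v in el.get("times", "").split()]
        for el in root.iter("File")
    }


# Round trip: generated once (for example by a script), read on later opens.
metadata = {"run0.dat": [0.0, 0.1, 0.2], "run1.dat": [0.3, 0.4]}
xml_text = write_series_xml(metadata)
restored = read_series_xml(xml_text)
```

Because the time values come straight from the XML, a meta-reader using such a file would not need to open every data file just to learn its time range. A small script like `write_series_xml` is the kind of helper that would be needed to generate the file for the user.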