armi.utils.textProcessors module
Utility classes and functions for manipulating text files.
- armi.utils.textProcessors.SCIENTIFIC_PATTERN = '[+-]?\\d*\\.\\d+[eEdD][+-]\\d+'
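Matches numbers in scientific notation with a signed exponent, for example 1.23e-10 and 1.23e+1. The original example list also includes 1.23e10, -1.23Ee10, +1.23d10 and .23D10; as written, however, the pattern requires an explicit + or - after the exponent letter, so those unsigned-exponent forms are not matched.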
- armi.utils.textProcessors.FLOATING_PATTERN = '[+-]?\\d+\\.*\\d*'
Matches 1, 100, 1.0, -1.2, +12.234
- armi.utils.textProcessors.DECIMAL_PATTERN = '[+-]?\\d*\\.\\d+'
Matches .1, 1.213423, -23.2342, +.023
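The three patterns can be exercised directly with the standard re module. The following is a minimal sketch: the pattern strings are copied from the entries above, while the helper fully_matches and the sample output line are hypothetical and exist only for illustration.

```python
import re

# Patterns copied from the module documentation above.
SCIENTIFIC_PATTERN = r"[+-]?\d*\.\d+[eEdD][+-]\d+"
FLOATING_PATTERN = r"[+-]?\d+\.*\d*"
DECIMAL_PATTERN = r"[+-]?\d*\.\d+"


def fully_matches(pattern, text):
    """Return True when the whole string matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# Pull every scientific-notation number out of an invented line of output.
line = "keff = 1.00213e+00  residual = -4.5D-07"
found = re.findall(SCIENTIFIC_PATTERN, line)
print(found)

# The exponent sign is mandatory in SCIENTIFIC_PATTERN.
print(fully_matches(SCIENTIFIC_PATTERN, "1.23e-10"))
print(fully_matches(SCIENTIFIC_PATTERN, "1.23e10"))
```

Note that DECIMAL_PATTERN requires a decimal point, so a bare integer such as 1 is only matched by FLOATING_PATTERN.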
- armi.utils.textProcessors.resolveMarkupInclusions(src: Union[TextIO, Path], root: Optional[Path] = None) StringIO [source]
Process a text stream, appropriately handling !include tags.
This will take the passed IO stream or file path, replacing any instances of !include [path] with the appropriate contents of the !include file.
What is returned is a new text stream, containing the contents of all of the files stitched together.
- Parameters:
src (StringIO or TextIOBase/Path) – If a Path is provided, read text from there. If a stream is provided, consume text from the stream; in that case, root must also be provided.
root (Optional Path) – The root directory to use for resolving relative paths in !include tags. If a stream is provided for src, root must be provided. Otherwise, the directory containing the src path will be used by default.
Notes
While the use of !include appears as though it would invoke some sort of special custom YAML constructor code, this does not do that. Processing these inclusions as part of the document parsing/composition that comes with ruamel.yaml could work, but has a number of prohibitive drawbacks (or at least reasons why it might not be worth doing). Using a custom constructor is more-or-less supported by ruamel.yaml (which we do use, as it is what underpins the yamlize package), but it carries limitations about how anchors and aliases can cross included-file boundaries. Getting around this requires either monkey-patching ruamel.yaml, or subclassing it, which in turn would require monkey-patching yamlize.
Instead, we treat the !includes as a sort of pre-processor directive, which essentially pastes the contents of the !included file into the location of the !include. The result is a text stream containing the entire contents, with all !includes resolved. The only degree of sophistication lies in how indentation is handled; since YAML cares about indentation to keep track of object hierarchy, care must be taken that the included file contents are indented appropriately.
To precisely describe how the indentation works, it helps to have some definitions:
Included file: The file specified in the !include [Included file]
Including line: The line that actually contains the !include [Included file]
Meaningful YAML content: Text in a YAML file that is not either indentation or a special character like “-”, “:” or “?”.
The contents of the included file will be indented such that the first character of each line in the included file will be found at the first column in the including line that contains meaningful YAML content. The only exception is the first line of the included file, which starts at the location of the !include itself and is not deliberately indented.
In the future, we may wish to do the more sophisticated processing of the !includes as part of the YAML parse. For future reference, there is some pure gold on that topic here: https://stackoverflow.com/questions/44910886/pyyaml-include-file-and-yaml-aliases-anchors-references
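The indentation rule described in these notes can be illustrated with a small, self-contained sketch. This is not the ARMI implementation: the function resolve_includes is hypothetical, included files are looked up in an in-memory dict rather than on disk, and no error handling is attempted. It only demonstrates how the column of the first meaningful content on the including line becomes the indentation of every included line after the first.

```python
import re

# Leading run of whitespace and YAML block-collection markers ("-", "?", ":").
_LEADER = re.compile(r"^[\s\-?:]*")
# Anything, then "!include", then the name of the included file.
_INCLUDE = re.compile(r"^(?P<prefix>.*)!include\s+(?P<name>\S+)\s*$")


def resolve_includes(text, files, indentation=0):
    """Paste !include'd content in place (hypothetical, in-memory sketch)."""
    pad = " " * indentation
    out = []
    for i, line in enumerate(text.splitlines()):
        # The first line sits where the !include was, so it gets no padding.
        lead = pad if i > 0 else ""
        m = _INCLUDE.match(line)
        if m is None:
            out.append(lead + line)
            continue
        # Column of the first meaningful YAML content on the including line.
        column = _LEADER.match(line).end()
        included = resolve_includes(
            files[m.group("name")], files, indentation + column
        )
        out.append(lead + m.group("prefix") + included)
    return "\n".join(out)


# Invented example content: one file included as a list item.
files = {"b.yaml": "x: 1\ny: 2"}
main = "items:\n  - !include b.yaml\n  - z: 3"
print(resolve_includes(main, files))
```

In this example the including line is `  - !include b.yaml`, whose first meaningful content starts at column 4, so the second line of b.yaml is indented by four spaces and lines up under the first.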
- armi.utils.textProcessors.findYamlInclusions(src: Union[TextIO, Path], root: Optional[Path] = None) List[Tuple[Path, FileMark]] [source]
Return a list containing all of the !included YAML files from a root file.
This will attempt to “normalize” relative paths to the passed root. If that is not possible, then an absolute path will be used instead. For example, if a file (A) !includes another file (B) by an absolute path, which in turn !includes more files relative to (B), all of (B)’s relative includes will be turned into absolute paths from the perspective of the root file (A).
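One plausible reading of the "normalize to root, else fall back to absolute" behavior is sketched below with pathlib. The helper normalize_to_root and the example paths are hypothetical, and the sketch makes no claim about how findYamlInclusions is actually implemented.

```python
from pathlib import PurePosixPath


def normalize_to_root(path, root):
    """Return path relative to root when possible, else the absolute path.

    Hypothetical helper, assuming both arguments are absolute paths.
    """
    try:
        return path.relative_to(root)
    except ValueError:
        # path does not live under root, so keep it absolute.
        return path


root = PurePosixPath("/project/inputs")
inside = normalize_to_root(PurePosixPath("/project/inputs/core/b.yaml"), root)
outside = normalize_to_root(PurePosixPath("/shared/lib/c.yaml"), root)
print(inside, outside)
```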
- class armi.utils.textProcessors.SequentialReader(filePath)[source]
Bases: object
Fast sequential reader that must be used within a with statement.
- Variables:
line (str) – value of the current line
match (re.match) – value of the current match
Notes
This reader will sequentially search a file for a regular expression pattern or string depending on the method used. When the pattern/string is matched/found, the reader will stop, return True, and set the attributes line and match.
This pattern makes it easy to cycle through repetitive output in a very fast manner. For example, if you had a text file with consistent chunks of information that always started with the same text followed by information, you could do something like this:
>>> with SequentialReader('somefile') as sr:
...     data = []
...     while sr.searchForText('start of data chunk'):
...         # this needs to repeat for as many chunks as there are.
...         if sr.searchForPatternOnNextLine(r'some-(?P<data>\w+)-pattern'):
...             data.append(sr.match['data'])
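To make the search-and-stop behavior concrete without needing ARMI or a file on disk, here is a toy stand-in. The class MiniSequentialReader, its method names, and the sample text are all hypothetical; it is a simplified illustration of the behavior described in these notes, not the real SequentialReader, and it omits the context-manager protocol and the warning/error hooks.

```python
import io
import re


class MiniSequentialReader:
    """Toy illustration of sequential searching (not ARMI's class)."""

    def __init__(self, stream):
        self._stream = stream
        self._pending = None  # line consumed by a search that did not match
        self.line = ""
        self.match = None

    def _next_line(self):
        # A line left over from a failed search is looked at first.
        if self._pending is not None:
            line, self._pending = self._pending, None
            return line
        return self._stream.readline()  # "" at end of stream

    def search_for_text(self, text):
        while True:
            line = self._next_line()
            if not line:
                return False
            if text in line:
                self.line = line
                return True

    def search_for_pattern_on_next_line(self, pattern):
        line = self._next_line()
        if not line:
            return False
        self.match = re.search(pattern, line)
        if self.match is None:
            self._pending = line  # let the next search see this line
            return False
        self.line = line
        return True


report = io.StringIO(
    "header\n"
    "start of data chunk\n"
    "some-alpha-pattern\n"
    "noise\n"
    "start of data chunk\n"
    "some-beta-pattern\n"
)
reader = MiniSequentialReader(report)
data = []
while reader.search_for_text("start of data chunk"):
    if reader.search_for_pattern_on_next_line(r"some-(?P<data>\w+)-pattern"):
        data.append(reader.match["data"])
print(data)
```

Because each search resumes where the previous one stopped, the file is read once from top to bottom, which is where the speed of this approach comes from.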
- issueWarningOnFindingText(text, warning)[source]
Add a text search that is applied to every line of the file; if the text is found, the specified warning will be issued.
This is important for determining if issues occurred while searching for text.
- raiseErrorOnFindingText(text, error)[source]
Add a text search that is applied to every line of the file; if the text is found, the specified error will be raised.
This is important for determining if errors occurred while searching for text.
- raiseErrorOnFindingPattern(pattern, error)[source]
Add a pattern search that is applied to every line of the file; if the pattern is found, the specified error will be raised.
This is important for determining if errors occurred while searching for text.
- searchForText(text)[source]
Search the file for the next occurrence of text, and set the self.line attribute to that line’s value if it matched.
Notes
This will search the file line by line until it finds the text. This sets the attribute self.line. If the previous _searchFor* method did not match, the last line it did not match will be searched first.
- Returns:
matched – Boolean indicating whether or not the text was found
- Return type:
bool
- searchForPattern(pattern)[source]
Search the file for the next occurrence of pattern, and set the self.line attribute to that line’s value if it matched.
Notes
This will search the file line by line until it finds the pattern. This sets the attribute self.line. If the previous _searchFor* method did not match, the last line it did not match will be searched first.
- Returns:
matched – Boolean indicating whether or not the pattern matched
- Return type:
bool
- searchForPatternOnNextLine(pattern)[source]
Search the next line for a given pattern, and set the self.line attribute to that line’s value if it matched.
Notes
This sets the attribute self.line. If the previous _searchFor* method did not match, the last line it did not match will be searched first.
- Returns:
matched – Boolean indicating whether or not the pattern matched
- Return type:
bool
- class armi.utils.textProcessors.SequentialStringIOReader(stringIO)[source]
Bases: SequentialReader
Fast sequential reader that must be used within a with statement.
- Variables:
line (str) – value of the current line
match (re.match) – value of the current match
Notes
This reader will sequentially search a file for a regular expression pattern or string depending on the method used. When the pattern/string is matched/found, the reader will stop, return True, and set the attributes line and match.
This pattern makes it easy to cycle through repetitive output in a very fast manner. For example, if you had a text file with consistent chunks of information that always started with the same text followed by information, you could do something like this:
>>> with SequentialReader('somefile') as sr:
...     data = []
...     while sr.searchForText('start of data chunk'):
...         # this needs to repeat for as many chunks as there are.
...         if sr.searchForPatternOnNextLine(r'some-(?P<data>\w+)-pattern'):
...             data.append(sr.match['data'])
- class armi.utils.textProcessors.TextProcessor(fname, highMem=False)[source]
Bases: object
A general text processing object that extends python’s abilities to scan through huge files.
Use this instead of a raw file object to read data out of output files, etc.
- scipat = '[+-]?\\d*\\.\\d+[eEdD][+-]\\d+'
- number = '[+-]?\\d+\\.*\\d*'
- decimal = '[+-]?\\d*\\.\\d+'
- fsearch(pattern, msg=None, killOn=None, textFlag=False)[source]
Searches file f for pattern and displays msg when found. Returns the line in which pattern is found, or False if pattern is not found. Stops searching if it finds killOn first.
If you specify textFlag=True, the search won’t use a regular expression (and can’t). The basic result is that you get less powerful matching capabilities at a large speedup (probably 10x or so, but that’s just a guess). pattern and killOn must be pure text if you do this.