armi.utils.textProcessors module

Utility classes and functions for manipulating text files.

armi.utils.textProcessors.SCIENTIFIC_PATTERN = '[+-]?\\d*\\.\\d+[eEdD][+-]\\d+'

Intended to match 1.23e10, -1.23E10, +1.23d10, .23D10, 1.23e-10, 1.23e+1. Note that the pattern as written requires an explicit sign on the exponent, so of these only 1.23e-10 and 1.23e+1 match in full.

armi.utils.textProcessors.FLOATING_PATTERN = '[+-]?\\d+\\.*\\d*'

Matches 1, 100, 1.0, -1.2, +12.234

armi.utils.textProcessors.DECIMAL_PATTERN = '[+-]?\\d*\\.\\d+'

Matches .1, 1.213423, -23.2342, +.023
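As a quick check of the three patterns above, the following sketch copies them as displayed and tests them against sample tokens with the standard `re` module. The helper name `matches` and the sample line are illustrative only and are not part of ARMI.

```python
import re

# Patterns copied from the module-level constants shown above.
SCIENTIFIC_PATTERN = r"[+-]?\d*\.\d+[eEdD][+-]\d+"
FLOATING_PATTERN = r"[+-]?\d+\.*\d*"
DECIMAL_PATTERN = r"[+-]?\d*\.\d+"


def matches(pattern, token):
    """Return True if the whole token matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, token) is not None


# The exponent sign is mandatory in the pattern as written.
signed_exponent = matches(SCIENTIFIC_PATTERN, "1.23e-10")
unsigned_exponent = matches(SCIENTIFIC_PATTERN, "1.23e10")

# Pull every scientific-notation number out of a line of output.
line = "flux = 1.23e-10  keff = +12.234  frac = .023  power = -4.5D+03"
scientific = re.findall(SCIENTIFIC_PATTERN, line)
```

Here `scientific` holds the two tokens that carry a signed exponent, `1.23e-10` and `-4.5D+03`; the plain decimals on the same line are left for `DECIMAL_PATTERN` or `FLOATING_PATTERN`.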

class armi.utils.textProcessors.FileMark(fName, line, column, relativeTo)[source]

Bases: object

armi.utils.textProcessors.resolveMarkupInclusions(src: Union[TextIO, Path], root: Optional[Path] = None) StringIO[source]

Process a text stream, appropriately handling !include tags.

This will take the passed IO stream or file path, replacing any instances of !include [path] with the appropriate contents of the !include file.

Returns a new text stream containing the contents of all of the files stitched together.

Parameters:
  • src (StringIO or TextIOBase/Path) – If a Path is provided, read text from there. If a stream is provided, consume text from the stream; in that case root must also be provided.

  • root (Optional Path) – The root directory to use for resolving relative paths in !include tags. If a stream is provided for src, root must be provided. Otherwise, the directory containing the src path will be used by default.

Notes

While the use of !include appears as though it would invoke some sort of special custom YAML constructor code, this does not do that. Processing these inclusions as part of the document parsing/composition that comes with ruamel.yaml could work, but has a number of prohibitive drawbacks (or at least reasons why it might not be worth doing). Using a custom constructor is more-or-less supported by ruamel.yaml (which we do use, as it is what underpins the yamlize package), but it carries limitations about how anchors and aliases can cross included-file boundaries. Getting around this requires either monkey-patching ruamel.yaml, or subclassing it, which in turn would require monkey-patching yamlize.

Instead, we treat the !includes as a sort of pre-processor directive, which essentially pastes the contents of the !included file into the location of the !include. The result is a text stream containing the entire contents, with all !includes resolved. The only degree of sophistication lies in how indentation is handled; since YAML cares about indentation to keep track of object hierarchy, care must be taken that the included file contents are indented appropriately.

To precisely describe how the indentation works, it helps to have some definitions:

  • Included file: The file specified in the !include [Included file]

  • Including line: The line that actually contains the !include [Included file]

  • Meaningful YAML content: Text in a YAML file that is neither indentation nor a special character like “-”, “:” or “?”.

The contents of the included file will be indented such that the first character of each line in the included file will be found at the first column in the including line that contains meaningful YAML content. The only exception is the first line of the included file, which starts at the location of the !include itself and is not deliberately indented.

In the future, we may wish to do the more sophisticated processing of the !includes as part of the YAML parse. For future reference, there is some pure gold on that topic here: https://stackoverflow.com/questions/44910886/pyyaml-include-file-and-yaml-aliases-anchors-references
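The indentation rule described in these notes can be sketched as a small text pre-processor. This is a simplified stand-in, not ARMI's implementation: the function name `resolve_includes` is hypothetical, and a plain dict of file contents replaces real files so that the example stays self-contained.

```python
import re

# An !include tag followed by a whitespace-free path (assumed syntax for this sketch).
INCLUDE_RE = re.compile(r"!include\s+(\S+)")
# Everything before the first "meaningful" character: whitespace, "-", ":" or "?".
LEADING_RE = re.compile(r"^[\s\-:?]*")


def resolve_includes(text, files):
    """Paste included contents in place of each !include tag (hypothetical helper).

    ``files`` maps an include path to that file's text.
    """
    out = []
    for line in text.splitlines():
        tag = INCLUDE_RE.search(line)
        if tag is None:
            out.append(line)
            continue
        # Column of the first meaningful YAML content on the including line.
        indent = " " * LEADING_RE.match(line).end()
        # Resolve nested includes first, then indent the result as a whole.
        included = resolve_includes(files[tag.group(1)], files).splitlines()
        first, *rest = included or [""]
        # The first included line starts where the !include tag was.
        out.append(line[: tag.start()] + first)
        out.extend(indent + inc for inc in rest)
    return "\n".join(out)


files = {"b.yaml": "name: b\nvalue: 2"}
root_text = "items:\n  - !include b.yaml\n  - name: c"
resolved = resolve_includes(root_text, files)
```

In this example the including line is `  - !include b.yaml`, whose first meaningful content sits at column 4, so `value: 2` is indented by four spaces and lines up under `name: b` inside the same list item.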

armi.utils.textProcessors.findYamlInclusions(src: Union[TextIO, Path], root: Optional[Path] = None) List[Tuple[Path, FileMark]][source]

Return a list containing all of the !included YAML files from a root file.

This will attempt to “normalize” relative paths to the passed root. If that is not possible, then an absolute path will be used instead. For example, if a file (A) !includes another file (B) by an absolute path, which in turn !includes more files relative to (B), all of (B)’s relative includes will be turned into absolute paths from the perspective of the root file (A).
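The path normalization described above can be illustrated with `pathlib`. This is a minimal sketch under assumptions: `normalize_include` is a hypothetical helper, and `PurePosixPath` is used only so the example behaves the same on every platform.

```python
from pathlib import PurePosixPath


def normalize_include(included, including_dir, root):
    """Express an include path relative to root when possible (hypothetical helper).

    ``included`` is the path written after !include, ``including_dir`` is the
    directory of the file that contains the tag, and ``root`` is the directory
    of the root file.
    """
    # Joining with an absolute ``included`` simply yields ``included``.
    full = including_dir / included
    try:
        return full.relative_to(root)
    except ValueError:
        # Not under root: fall back to the absolute path.
        return full


root = PurePosixPath("/proj/inputs")
# (A) includes a file next to itself: stays relative to the root.
inside = normalize_include("sub/b.yaml", root, root)
# (B) lives outside the root and includes a sibling: becomes absolute.
outside = normalize_include("grids.yaml", PurePosixPath("/shared/lib"), root)
```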

class armi.utils.textProcessors.SequentialReader(filePath)[source]

Bases: object

Fast sequential reader that must be used within a with statement.

Variables:
  • line (str) – value of the current line

  • match (re.match) – value of the current match

Notes

This reader will sequentially search a file for a regular expression pattern or string depending on the method used. When the pattern/string is matched/found, the reader will stop, return True, and set the attributes line and match.

This pattern makes it easy to cycle through repetitive output in a very fast manner. For example, if you had a text file with consistent chunks of information that always started with the same text followed by information, you could do something like this:

>>> with SequentialReader('somefile') as sr:
...     data = []
...     while sr.searchForText('start of data chunk'):
...         # this needs to repeat for as many chunks as there are.
...         if sr.searchForPatternOnNextLine(r'some-(?P<data>\w+)-pattern'):
...             data.append(sr.match['data'])
issueWarningOnFindingText(text, warning)[source]

Add a text search for every line of the file; if the text is found, the specified warning will be issued.

This is important for determining if issues occurred while searching for text.

Parameters:
  • text (str) – text to find within the file

  • warning (str) – A warning message to issue.

raiseErrorOnFindingText(text, error)[source]

Add a text search for every line of the file; if the text is found, the specified error will be raised.

This is important for determining if errors occurred while searching for text.

Parameters:
  • text (str) – text to find within the file

  • error (Exception) – An exception to raise.

raiseErrorOnFindingPattern(pattern, error)[source]

Add a pattern search for every line of the file; if the pattern is found, the specified error will be raised.

This is important for determining if errors occurred while searching for a pattern.

Parameters:
  • pattern (str) – regular expression pattern

  • error (Exception) – An exception to raise.

searchForText(text)[source]

Search the file for the next occurrence of text, and set the self.line attribute to that line’s value if it matched.

Notes

This will search the file line by line until it finds the text. This sets the attribute self.line. If the previous _searchFor* method did not match, the last line it did not match will be searched first.

Returns:

matched – Boolean indicating whether or not the text was found

Return type:

bool

searchForPattern(pattern)[source]

Search the file for the next occurrence of pattern, and set the self.line attribute to that line’s value if it matched.

Notes

This will search the file line by line until it finds the pattern. This sets the attribute self.line. If the previous _searchFor* method did not match, the last line it did not match will be searched first.

Returns:

matched – Boolean indicating whether or not the pattern matched

Return type:

bool

searchForPatternOnNextLine(pattern)[source]

Search the next line for a given pattern, and set the self.line attribute to that line’s value if it matched.

Notes

This sets the attribute self.line. If the previous _searchFor* method did not match, the last line it did not match will be searched first.

Returns:

matched – Boolean indicating whether or not the pattern matched

Return type:

bool

consumeLine()[source]

Consumes the line.

This is necessary when searching for the same pattern repetitively, because otherwise searchForPatternOnNextLine would not work.
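To make the chunk-scanning idea behind SequentialReader concrete, here is a standard-library approximation of the loop it supports. It is not ARMI's implementation: `read_chunks` is a hypothetical function, and the way an unmatched line is held back for the next search is one reading of the notes above.

```python
import io
import re


def read_chunks(stream, start_text, pattern):
    """Collect the 'data' group from the line after each chunk start (hypothetical)."""
    regex = re.compile(pattern)
    data = []
    pending = None  # a line that was read but did not match, to be examined first
    while True:
        line = pending if pending is not None else stream.readline()
        pending = None
        if not line:  # an empty string means end of file
            break
        if start_text not in line:
            continue
        next_line = stream.readline()
        found = regex.search(next_line)
        if found:
            data.append(found["data"])
        else:
            # Keep the unmatched line so the next text search sees it.
            pending = next_line
    return data


sample = io.StringIO(
    "header\n"
    "start of data chunk\n"
    "some-alpha-pattern\n"
    "noise\n"
    "start of data chunk\n"
    "some-beta-pattern\n"
)
values = read_chunks(sample, "start of data chunk", r"some-(?P<data>\w+)-pattern")
```

The stream is read exactly once, front to back, which is what makes this style of search fast on large output files.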

class armi.utils.textProcessors.SequentialStringIOReader(stringIO)[source]

Bases: SequentialReader

Fast sequential reader that must be used within a with statement.

Variables:
  • line (str) – value of the current line

  • match (re.match) – value of the current match

Notes

This reader will sequentially search a file for a regular expression pattern or string depending on the method used. When the pattern/string is matched/found, the reader will stop, return True, and set the attributes line and match.

This pattern makes it easy to cycle through repetitive output in a very fast manner. For example, if you had a text file with consistent chunks of information that always started with the same text followed by information, you could do something like this:

>>> with SequentialReader('somefile') as sr:
...     data = []
...     while sr.searchForText('start of data chunk'):
...         # this needs to repeat for as many chunks as there are.
...         if sr.searchForPatternOnNextLine(r'some-(?P<data>\w+)-pattern'):
...             data.append(sr.match['data'])
class armi.utils.textProcessors.TextProcessor(fname, highMem=False)[source]

Bases: object

A general text processing object that extends Python’s abilities to scan through huge files.

Use this instead of a raw file object to read data out of output files, etc.

scipat = '[+-]?\\d*\\.\\d+[eEdD][+-]\\d+'
number = '[+-]?\\d+\\.*\\d*'
decimal = '[+-]?\\d*\\.\\d+'
reset()[source]

Rewinds the file so you can search through it again.

errorChecking(checkForErrors)[source]
checkErrors(line)[source]
fsearch(pattern, msg=None, killOn=None, textFlag=False)[source]

Searches file f for pattern and displays msg when found. Returns the line in which pattern is found, or FALSE if the pattern is not found. Stops searching if killOn is found first.

If you specify textFlag=True, the search won’t use a regular expression (and can’t), so pattern and killOn must be pure text. The trade-off is less powerful matching in exchange for a large speedup (perhaps 10x, though that is only a guess).
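The difference between the two matching modes can be sketched as follows. This is an illustrative stand-in rather than the fsearch implementation: `search_lines` is a hypothetical function, it returns None where fsearch is documented to return FALSE, and checking the pattern before killOn on each line is an assumption.

```python
import re


def search_lines(lines, pattern, kill_on=None, text_flag=False):
    """Return the first line containing pattern, or None (hypothetical helper)."""
    if text_flag:
        # Plain substring test: no regular-expression features, but cheaper.
        def hit(target, line):
            return target in line
    else:
        def hit(target, line):
            return re.search(target, line) is not None

    for line in lines:
        if hit(pattern, line):
            return line
        if kill_on is not None and hit(kill_on, line):
            return None  # gave up: the kill text appeared first
    return None


lines = ["axb\n", "a.b\n", "STOP\n", "target\n"]
regex_hit = search_lines(lines, "a.b")                  # "." matches any character
text_hit = search_lines(lines, "a.b", text_flag=True)   # "." is a literal dot
killed = search_lines(lines, "target", kill_on="STOP")
```

With the regular expression, `a.b` already matches the first line, `axb`; in text mode only the literal `a.b` on the second line counts. The third call returns None because `STOP` is reached before `target`.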