armi.utils.textProcessors module

Utility classes and functions for manipulating text files.

armi.utils.textProcessors.SCIENTIFIC_PATTERN = '[+-]?\\d*\\.\\d+[eEdD][+-]\\d+'

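Matches numbers in scientific notation with an explicitly signed exponent, such as 1.23e-10, 1.23e+1, -1.23E+10, +1.23d-10, and .23D+10. The pattern as written requires a sign after the exponent letter, so a form such as 1.23e10 does not match.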

armi.utils.textProcessors.FLOATING_PATTERN = '[+-]?\\d+\\.*\\d*'

Matches 1, 100, 1.0, -1.2, +12.234

armi.utils.textProcessors.DECIMAL_PATTERN = '[+-]?\\d*\\.\\d+'

Matches .1, 1.213423, -23.2342, +.023
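As a minimal sketch of how these three patterns behave, the snippet below copies the pattern strings from the constants above and applies them to a made-up output line (the line contents and variable names are assumptions for illustration only). It also shows one way to convert Fortran-style `d`/`D` exponents to Python floats.

```python
import re

# Pattern strings copied from the module-level constants documented above.
SCIENTIFIC_PATTERN = r"[+-]?\d*\.\d+[eEdD][+-]\d+"
FLOATING_PATTERN = r"[+-]?\d+\.*\d*"
DECIMAL_PATTERN = r"[+-]?\d*\.\d+"

# Hypothetical line of solver output, used only for illustration.
line = "keff = 1.00234  flux = -2.5E+14  step 12"

scientific = re.findall(SCIENTIFIC_PATTERN, line)
decimals = re.findall(DECIMAL_PATTERN, line)
floats = re.findall(FLOATING_PATTERN, line)

# float() does not accept "d"/"D" exponents, so swap them for "e" first.
values = [float(re.sub("[dD]", "e", s)) for s in scientific]

print(scientific, decimals, floats, values)
```

Note that FLOATING_PATTERN and DECIMAL_PATTERN also pick up the mantissa and exponent of a scientific-notation number as separate pieces, so it is usually safer to search for SCIENTIFIC_PATTERN first when a line may contain both forms.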

class armi.utils.textProcessors.FileMark(fName, line, column, relativeTo)[source]

Bases: object

armi.utils.textProcessors._processIncludes(src, out, includes: List[Tuple[pathlib.Path, armi.utils.textProcessors.FileMark]], root, indentation=0, currentFile='<stream>')[source]

This is the workhorse of resolveMarkupInclusions and friends.

Recursively inserts the contents of !included YAML files into the output stream, keeping track of indentation and a list of included files along the way.

armi.utils.textProcessors.resolveMarkupInclusions(src: Union[TextIO, pathlib.Path], root: Optional[pathlib.Path] = None) → _io.StringIO[source]

Process a text stream, appropriately handling !include tags.

This will take the passed IO stream or file path, replacing any instances of !include [path] with the appropriate contents of the !include file.

What is returned is a new text stream, containing the contents of all of the files stitched together.

Parameters
  • src (TextIOBase or Path) – If a Path is provided, read text from there. If a stream is provided, consume text from the stream; in that case, root must also be provided.

  • root (Optional Path) – The root directory to use for resolving relative paths in !include tags. If a stream is provided for src, root must be provided. Otherwise, the directory containing the src path will be used by default.

Notes

While the use of !include appears as though it would invoke some sort of special custom YAML constructor code, this does not do that. Processing these inclusions as part of the document parsing/composition that comes with pyyaml or ruamel.yaml could work, but has a number of prohibitive drawbacks (or at least reasons why it might not be worth doing). Using a custom constructor is more-or-less supported by ruamel.yaml (which we do use, as it is what underpins the yamlize package), but it carries limitations about how anchors and aliases can cross included-file boundaries. Getting around this requires either monkey-patching ruamel.yaml, or subclassing it, which in turn would require monkey-patching yamlize.

Instead, we treat the !includes as a sort of pre-processor directive, which essentially pastes the contents of the !included file into the location of the !include. The result is a text stream containing the entire contents, with all !includes resolved. The only degree of sophistication lies in how indentation is handled; since YAML cares about indentation to keep track of object hierarchy, care must be taken that the included file contents are indented appropriately.

To precisely describe how the indentation works, it helps to have some definitions:

  • Included file: The file specified in the !include [Included file]

  • Including line: The line that actually contains the !include [Included file]

  • Meaningful YAML content: Text in a YAML file that is neither indentation nor a special character such as “-”, “:” or “?”.

The contents of the included file will be indented such that the first character of each line in the included file will be found at the first column in the including line that contains meaningful YAML content. The only exception is the first line of the included file, which starts at the location of the !include itself and is not deliberately indented.
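The indentation rule can be illustrated with a small, self-contained sketch. This is not the ARMI implementation: `resolve_includes`, the two regular expressions, and the in-memory `files` dictionary (standing in for files on disk) are all hypothetical names chosen for this example.

```python
import re

# Matches an "!include <path>" tag anywhere in a line (assumed tag syntax).
INCLUDE_RE = re.compile(r"!include\s+(\S+)")
# Indentation plus the block markers "-", ":" and "?" that precede content.
LEADING_RE = re.compile(r"[\s\-:?]*")


def resolve_includes(text, files):
    """Paste included files into ``text``; ``files`` maps name -> contents."""
    out = []
    for line in text.splitlines():
        tag = INCLUDE_RE.search(line)
        if tag is None:
            out.append(line)
            continue
        # Column of the first meaningful YAML content on the including line.
        indent = " " * LEADING_RE.match(line).end()
        included = resolve_includes(files[tag.group(1)], files).splitlines()
        if not included:
            out.append(line[: tag.start()].rstrip())
            continue
        # The first included line starts where the !include tag was.
        out.append(line[: tag.start()] + included[0])
        # Every later line is shifted to the content column.
        out.extend(indent + extra if extra else extra for extra in included[1:])
    return "\n".join(out)


files = {"fuel.yaml": "name: fuel\nheight: 10"}  # hypothetical included file
source = "blocks:\n  - !include fuel.yaml\n  - name: reflector"
resolved = resolve_includes(source, files)
print(resolved)
```

Because the function recurses before indenting, nested inclusions accumulate indentation: each level shifts the already-resolved text of the level below it.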

In the future, we may wish to do the more sophisticated processing of the !includes as part of the YAML parse. For future reference, there is some pure gold on that topic here: https://stackoverflow.com/questions/44910886/pyyaml-include-file-and-yaml-aliases-anchors-references

armi.utils.textProcessors._getRootFromSrc(src: Union[TextIO, pathlib.Path], root: Optional[pathlib.Path]) → pathlib.Path[source]
armi.utils.textProcessors.findYamlInclusions(src: Union[TextIO, pathlib.Path], root: Optional[pathlib.Path] = None) → List[Tuple[pathlib.Path, armi.utils.textProcessors.FileMark]][source]

Return a list containing all of the !included YAML files from a root file.

This will attempt to “normalize” relative paths to the passed root. If that is not possible, then an absolute path will be used instead. For example, if a file (A) !includes another file (B) by an absolute path, which in turn !includes more files relative to (B), all of (B)’s relative includes will be turned into absolute paths from the perspective of the root file (A).
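One plausible reading of this normalization, sketched with `pathlib`, is shown below. The helper name `normalize_to_root` and the example paths are assumptions; the actual function may differ in how it resolves and reports paths.

```python
from pathlib import PurePosixPath


def normalize_to_root(path, root):
    """Return ``path`` relative to ``root`` if possible, else unchanged."""
    try:
        return path.relative_to(root)
    except ValueError:
        # Not located under root, so keep the absolute form.
        return path


root = PurePosixPath("/project/inputs")           # directory of root file (A)
file_b = PurePosixPath("/shared/library/b.yaml")  # included by absolute path
# (B) includes "c.yaml" relative to itself; resolve it against B's directory.
file_c = file_b.parent / "c.yaml"

inside = normalize_to_root(PurePosixPath("/project/inputs/core/grid.yaml"), root)
outside = normalize_to_root(file_c, root)
print(inside, outside)
```

Here the file under the root directory is reported relative to it, while the file reached through (B) stays absolute because it cannot be expressed relative to the root.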

armi.utils.textProcessors._resolveMarkupInclusions(src: Union[TextIO, pathlib.Path], root: Optional[pathlib.Path] = None) → Tuple[_io.StringIO, List[Tuple[pathlib.Path, armi.utils.textProcessors.FileMark]]][source]
class armi.utils.textProcessors.SequentialReader(filePath)[source]

Bases: object

Fast sequential reader that must be used within a with statement.

line

value of the current line

Type

str

match

value of the current match

Type

re.match

Notes

This reader will sequentially search a file for a regular expression pattern or string depending on the method used. When the pattern/string is matched/found, the reader will stop, return True, and set the attributes line and match.

This pattern makes it easy to cycle through repetitive output very quickly. For example, if you had a text file with consistent chunks of information that always started with the same text followed by information, you could do something like this:

>>> with SequentialReader('somefile') as sr:
...     data = []
...     while sr.searchForText('start of data chunk'):
...         # This needs to repeat for as many chunks as there are.
...         # A raw string keeps the regex escape \w intact.
...         if sr.searchForPatternOnNextLine(r'some-(?P<data>\w+)-pattern'):
...             data.append(sr.match['data'])
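To try this loop without ARMI or a file on disk, the following simplified stand-in mimics only the two methods used above. `MiniSequentialReader` is a hypothetical class written for this sketch; it omits the context-manager protocol, the warning and error checks, and the re-searching of a previously unmatched line that the real reader describes.

```python
import io
import re


class MiniSequentialReader:
    """Simplified stand-in for illustration; not ARMI's SequentialReader."""

    def __init__(self, stream):
        self._stream = stream
        self.line = ""
        self.match = None

    def searchForText(self, text):
        # Advance line by line until the plain text is found.
        for self.line in self._stream:
            if text in self.line:
                return True
        return False

    def searchForPatternOnNextLine(self, pattern):
        # Read exactly one more line and test it against the regex.
        self.line = self._stream.readline()
        self.match = re.search(pattern, self.line)
        return self.match is not None


text = (
    "header\n"
    "start of data chunk\n"
    "some-alpha-pattern\n"
    "noise\n"
    "start of data chunk\n"
    "some-beta-pattern\n"
)
sr = MiniSequentialReader(io.StringIO(text))
data = []
while sr.searchForText("start of data chunk"):
    if sr.searchForPatternOnNextLine(r"some-(?P<data>\w+)-pattern"):
        data.append(sr.match["data"])
print(data)
```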
issueWarningOnFindingText(text, warning)[source]

Add a text search for every line of the file; if the text is found, the specified warning will be issued.

This is important for determining if issues occurred while searching for text.

Parameters
  • text (str) – text to find within the file

  • warning (str) – A warning message to issue.

raiseErrorOnFindingText(text, error)[source]

Add a text search for every line of the file; if the text is found, the specified error will be raised.

This is important for determining if errors occurred while searching for text.

Parameters
  • text (str) – text to find within the file

  • error (Exception) – An exception to raise.

raiseErrorOnFindingPattern(pattern, error)[source]

Add a pattern search for every line of the file; if the pattern is found, the specified error will be raised.

This is important for determining if errors occurred while searching for text.

Parameters
  • pattern (str) – regular expression pattern

  • error (Exception) – An exception to raise.
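The idea behind these per-line checks can be sketched as a reader that tests every line it consumes against a list of registered triggers. `CheckedLineReader` and its `readLine` method are hypothetical names for this illustration, and the sketch covers only the plain-text error case; how ARMI stores and applies its checks is not shown in this page.

```python
import io


class CheckedLineReader:
    """Illustrative stand-in: raises a registered error when text is seen."""

    def __init__(self, stream):
        self._stream = stream
        self._textErrors = []  # list of (text, exception) pairs

    def raiseErrorOnFindingText(self, text, error):
        self._textErrors.append((text, error))

    def readLine(self):
        line = self._stream.readline()
        # Every line read is checked against all registered texts.
        for text, error in self._textErrors:
            if text in line:
                raise error
        return line


reader = CheckedLineReader(io.StringIO("iteration 1 ok\nFATAL: solver diverged\n"))
reader.raiseErrorOnFindingText("FATAL", RuntimeError("solver reported a fatal error"))

first = reader.readLine()
try:
    reader.readLine()
    raised = False
except RuntimeError:
    raised = True
print(first.strip(), raised)
```

Registering the checks once and applying them on every read means a search loop does not need to look for failure messages itself.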

searchForText(text)[source]

Search the file for the next occurrence of text, and set the self.line attribute to that line’s value if it matched.

Notes

This will search the file line by line until it finds the text. This sets the attribute self.line. If the previous _searchFor* method did not match, the last line it did not match will be searched first.

Returns

matched – Boolean indicating whether or not the text was found

Return type

bool

searchForPattern(pattern)[source]

Search the file for the next occurrence of pattern, and set the self.line attribute to that line’s value if it matched.

Notes

This will search the file line by line until it finds the pattern. This sets the attribute self.line. If the previous _searchFor* method did not match, the last line it did not match will be searched first.

Returns

matched – Boolean indicating whether or not the pattern matched

Return type

bool

searchForPatternOnNextLine(pattern)[source]

Search the next line for a given pattern, and set the self.line attribute to that line’s value if it matched.

Notes

This sets the attribute self.line. If the previous _searchFor* method did not match, the last line it did not match will be searched first.

Returns

matched – Boolean indicating whether or not the pattern matched

Return type

bool

_readLine()[source]
consumeLine()[source]

Consumes the line.

This is necessary when searching for the same pattern repetitively, because otherwise searchForPatternOnNextLine would not work.

class armi.utils.textProcessors.SequentialStringIOReader(stringIO)[source]

Bases: armi.utils.textProcessors.SequentialReader

Fast sequential reader that must be used within a with statement.

line

value of the current line

Type

str

match

value of the current match

Type

re.match

Notes

This reader will sequentially search a file for a regular expression pattern or string depending on the method used. When the pattern/string is matched/found, the reader will stop, return True, and set the attributes line and match.

This pattern makes it easy to cycle through repetitive output very quickly. For example, if you had a text file with consistent chunks of information that always started with the same text followed by information, you could do something like this:

>>> with SequentialReader('somefile') as sr:
...     data = []
...     while sr.searchForText('start of data chunk'):
...         # This needs to repeat for as many chunks as there are.
...         # A raw string keeps the regex escape \w intact.
...         if sr.searchForPatternOnNextLine(r'some-(?P<data>\w+)-pattern'):
...             data.append(sr.match['data'])
__enter__()[source]

Override to prevent trying to open/reopen a StringIO object.

We don’t need to override __exit__, because it doesn’t care if closing the object fails.

class armi.utils.textProcessors.TextProcessor(fname, highMem=False)[source]

Bases: object

A general text processing object that extends python’s abilities to scan through huge files.

Use this instead of a raw file object to read data out of output files, etc.

scipat = '[+-]?\\d*\\.\\d+[eEdD][+-]\\d+'
number = '[+-]?\\d+\\.*\\d*'
decimal = '[+-]?\\d*\\.\\d+'
reset()[source]

Rewinds the file so you can search through it again.

errorChecking(checkForErrors)[source]
checkErrors(line)[source]
fsearch(pattern, msg=None, killOn=None, textFlag=False)[source]

Searches the file for pattern and displays msg when it is found. Returns the line in which pattern is found, or False if it is not found. Stops searching if killOn is found first.

If you specify textFlag=True, the search does not (and cannot) use a regular expression. The result is less powerful matching at a substantial speedup (roughly 10x, though that figure is only a guess). pattern and killOn must be plain text if you do this.
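The trade-off between the two modes can be sketched with a standalone function over a list of lines. `search_lines` and its snake_case parameters are hypothetical and do not reproduce fsearch itself; in particular, checking the kill text before the pattern on each line is an assumption of this sketch.

```python
import re


def search_lines(lines, pattern, kill_on=None, text_flag=False):
    """Return the first matching line, or False if none is found."""
    if text_flag:
        # Plain substring test: faster, but no regex features.
        def found(needle, line):
            return needle in line
    else:
        def found(needle, line):
            return re.search(needle, line) is not None

    for line in lines:
        # Assumption: the kill text is checked before the pattern.
        if kill_on is not None and found(kill_on, line):
            return False
        if found(pattern, line):
            return line
    return False


lines = ["a = 1", "keff = 1.002", "END OF CYCLE", "keff = 0.9"]
by_regex = search_lines(lines, r"keff\s*=\s*\d")
stopped = search_lines(lines, "keff = 0.9", kill_on="END", text_flag=True)
print(by_regex, stopped)
```

With text_flag set, characters such as `.` and `*` in the search string are treated literally, which is why both pattern and kill text must be plain text in that mode.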

class armi.utils.textProcessors.SmartList(f)[source]

Bases: object

A list that behaves like a file: it remembers its current position, can seek, and so on. In practice it is fairly slow, so the idea did not pay off as intended.

next()[source]
seek(line)[source]
close()[source]