armi.mpiActions module

This module provides an abstract class to be used to implement “MPI actions.”

MPI actions are tasks, activities, or work that can be executed on the worker nodes. The standard workflow is essentially that the master node creates an MpiAction, sends it to the workers, and then both the master and the workers invoke() together. For example:

Sample MPI Action Workflow

Step 1

  Code

    master: distributeState = DistributeStateAction()

    worker: action = armi.MPI_COMM.bcast(None, root=0)

  Notes

    master: Initializes a distribute state action.

    worker: Waits for something to do, as determined by the master. This happens within the worker’s workerOperate().

Step 2

  Code

    master: armi.MPI_COMM.bcast(distributeState, root=0)

    worker: action = armi.MPI_COMM.bcast(None, root=0)

  Notes

    master: Broadcasts a distribute state action to all the worker nodes.

    worker: Receives the action from the master, which is a DistributeStateAction.

Step 3

  Code

    master: distributeState.invoke(self.o, self.r, self.cs)

    worker: action.invoke(self.o, self.r, self.cs)

  Notes

    Both invoke the action and are in sync. Any broadcast or receive within the action should also be synced up.

In order to create a new, custom MPI Action, inherit from MpiAction, and override the invokeHook() method.
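The subclassing pattern described above can be sketched as follows. This is a minimal illustration that runs without ARMI or an MPI runtime: the `MpiAction` class here is a simplified stand-in, written on the assumption that invoke() stores the operator, reactor, and settings on the instance and then calls invokeHook(). `CountAssembliesAction` and the dictionary used as a "reactor" are hypothetical.

```python
class MpiAction:
    """Simplified stand-in for armi.mpiActions.MpiAction (illustration only)."""

    def __init__(self):
        self.o = None
        self.r = None
        self.cs = None

    def invoke(self, o, r, cs):
        # Make the operator, reactor, and settings available to the hook.
        self.o, self.r, self.cs = o, r, cs
        return self.invokeHook()

    def invokeHook(self):
        raise NotImplementedError("Subclasses must override invokeHook().")


class CountAssembliesAction(MpiAction):
    """Hypothetical custom action: count the entries in a toy 'reactor'."""

    def __init__(self):
        super().__init__()
        self.count = None

    def invokeHook(self):
        # Do the work using self.r, then report success.
        self.count = len(self.r["assemblies"])
        return True


action = CountAssembliesAction()
succeeded = action.invoke(None, {"assemblies": ["A1", "A2", "A3"]}, None)
print(succeeded, action.count)
```

In a real run, the same invoke() call would be made on the master and on every worker after the action has been broadcast.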

class armi.mpiActions.MpiAction[source]

Bases: object

Base of all MPI actions.

MPI Actions are tasks that can be executed without needing lots of other information. When a worker node sits in its main loop and receives an MPI Action, it will simply call invoke().

property parallel
classmethod invokeAsMaster(o, r, cs)[source]

Simplified method to call from the master process.

This can be used in place of:

    someInstance = MpiAction()
    someInstance = COMM_WORLD.bcast(someInstance, root=0)
    someInstance.invoke(o, r, cs)

Interestingly, the code above can be used in two ways:

  1. Both the master and worker can call the above code at the same time, or

  2. the master can run the above code, which will be handled by the worker’s main loop.

Option number 2 is the most common usage.

Warning

This method will not work if the constructor (i.e. __init__) requires additional arguments. Since the method body is so simple, it is strongly discouraged to add *args or **kwargs arguments to this method.

Parameters
  • o (armi.operators.Operator) – If an operator is not necessary, supply None.

  • r (armi.operators.Reactor) – If a reactor is not necessary, supply None.
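The constructor restriction in the warning can be illustrated with a small sketch. It assumes the classmethod builds the instance with a bare `cls()` call, as the three-line equivalent above suggests. `SketchAction` and `NeedsArgument` are hypothetical stand-ins, and no MPI communication takes place.

```python
class SketchAction:
    """Stand-in for an MpiAction subclass (illustration only, no MPI involved)."""

    def __init__(self):
        self.o = self.r = self.cs = None

    def broadcast(self):
        # A real implementation would send this action to the workers here.
        return self

    def invoke(self, o, r, cs):
        self.o, self.r, self.cs = o, r, cs
        return True

    @classmethod
    def invokeAsMaster(cls, o, r, cs):
        instance = cls()  # no constructor arguments are passed
        instance.broadcast()
        return instance.invoke(o, r, cs)


class NeedsArgument(SketchAction):
    """Hypothetical action whose constructor requires an extra argument."""

    def __init__(self, tolerance):
        super().__init__()
        self.tolerance = tolerance


ok = SketchAction.invokeAsMaster(None, None, None)

try:
    NeedsArgument.invokeAsMaster(None, None, None)
    failed = False
except TypeError:
    # cls() cannot supply the required 'tolerance' argument.
    failed = True

print(ok, failed)
```

An action that needs constructor arguments would instead be instantiated explicitly, broadcast, and invoked in three separate steps.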

_mpiOperationHelper(obj, mpiFunction)[source]

Strips off the operator, reactor, cs from the mpiAction before

broadcast(obj=None)[source]

A wrapper around bcast. On the master node, it can be called with an assignment (an equals sign) so that the same code is consistent on both master and worker nodes.

Parameters

obj – Any object that can be broadcast. If it is None, the action broadcasts itself, which triggers it to run on the workers (assuming the workers are in the worker main loop).

See also

armi.operators.operator.OperatorMPI.workerOperate()

receives this on the workers and calls invoke

Notes

The standard bcast method creates a new instance even for the root process. Consequently, when passing an object, references to the original object can be broken. Therefore, this method returns the original object when called by the master node, or the broadcasted object when called on the worker nodes.
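The identity-preserving behavior described in these notes can be sketched without an MPI runtime. In the sketch below, `FakeComm` is a hypothetical stand-in whose bcast returns a pickled copy, simulating the new-instance behavior the notes describe. `broadcast_keep_identity` is likewise an illustrative name, not the ARMI implementation.

```python
import pickle


class FakeComm:
    """Stand-in communicator: bcast hands back a pickled copy of the object."""

    def bcast(self, obj, root=0):
        return pickle.loads(pickle.dumps(obj))


def broadcast_keep_identity(comm, obj, rank, root=0):
    """Broadcast obj, but give the root process back its original object."""
    received = comm.bcast(obj, root=root)
    return obj if rank == root else received


settings = {"nTasks": 4}
comm = FakeComm()

# A plain bcast yields an equal but distinct object, so references break.
plain = comm.bcast(settings)

# The wrapper preserves identity on the root; workers get the received copy.
# (In this single-process simulation the "worker" call reuses the same input.)
on_master = broadcast_keep_identity(comm, settings, rank=0)
on_worker = broadcast_keep_identity(comm, settings, rank=1)

print(plain is settings, on_master is settings, on_worker is settings)
```

Returning the original on the root means that code holding other references to the object keeps seeing the same instance after the broadcast.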

gather(obj=None)[source]

A wrapper around MPI_COMM.gather.

Parameters

obj – Any object that can be gathered. If it is None, the action gathers itself.

Notes

The returned list will contain a reference to the original gathered object, without making a copy of it.

invoke(o, r, cs)[source]

This method is called by worker nodes and is passed the worker node’s operator, reactor, and settings file.

Parameters
Returns

result – result from invokeHook

Return type

object

static mpiFlatten(allCPUResults)[source]

Flatten results to the same order they were in before making a list of mpiIter results.

See also

mpiIter()

used for distributing objects/tasks

static mpiIter(objectsForAllCoresToIter)[source]

Generate the subset of objects one node is responsible for in MPI.

Notes

Each CPU will get a similar number of objects. For example, if there are 12 objects and 5 CPUs, the first 2 CPUs will get 3 objects each and the last 3 CPUs will get 2 each.

Parameters

objectsForAllCoresToIter (list) – List of all objects on which an MPI calculation needs to be performed. Note that since len() is needed, this method cannot accept a generator.

See also

mpiFlatten()

used for collecting results
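The 12-objects-on-5-CPUs split from the notes, and the matching flatten step, can be sketched as below. This is an illustration under the assumption that each rank receives a contiguous slice of the list. `split_for_ranks` and `flatten` are hypothetical helper names, and the sketch computes all ranks' shares in one process rather than one share per MPI rank.

```python
def split_for_ranks(objects, num_ranks):
    """Split a list into num_ranks contiguous chunks of near-equal size."""
    base, extra = divmod(len(objects), num_ranks)
    chunks = []
    start = 0
    for rank in range(num_ranks):
        # The first `extra` ranks each take one additional object.
        size = base + (1 if rank < extra else 0)
        chunks.append(objects[start:start + size])
        start += size
    return chunks


def flatten(per_rank_results):
    """Concatenate per-rank result lists back into the original order."""
    return [item for rank_results in per_rank_results for item in rank_results]


objects = list(range(12))
chunks = split_for_ranks(objects, 5)
sizes = [len(chunk) for chunk in chunks]
restored = flatten(chunks)

print(sizes)     # [3, 3, 2, 2, 2]
print(restored == objects)
```

Because the chunks are contiguous and kept in rank order, concatenating the gathered per-rank lists restores the original ordering.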

invokeHook()[source]

This method must be overridden in subclasses.

This method is called by worker nodes, and has access to the worker node’s operator, reactor, and settings (through self.o, self.r, and self.cs). It must return a boolean value of True or False; otherwise, the worker node will raise an exception and terminate execution.

Returns

result – Dependent on implementation

Return type

object

armi.mpiActions.runActions(o, r, cs, actions, numPerNode=None, serial=False)[source]

Run a series of MpiActions in parallel, or in series if serial=True.

Notes

The number of actions DOES NOT need to match armi.MPI_SIZE.

Calling this method may invoke MPI Split, which will change the MPI_SIZE during the action. This allows someone to call MPI operations without being blocked by tasks that are not doing the same thing.

armi.mpiActions.runActionsInSerial(o, r, cs, actions)[source]

Run a series of MpiActions in serial.

Notes

This will set the MpiAction.serial attribute to True, and the MpiAction.broadcast and MpiAction.gather methods will simply return the value being supplied.

class armi.mpiActions.DistributionAction(actions)[source]

Bases: armi.mpiActions.MpiAction

This MpiAction scatters the workload of multiple actions to available resources.

Notes

This currently only works from the root (of COMM_WORLD). Eventually, it would be nice to make it possible for sub-tasks to manage their own communicators and spawn their own work within some sub-communicator.

This performs an MPI Split operation and takes over the armi.MPI_COMM and associated variables. For this reason, it is possible that when someone thinks they have distributed information to all nodes, it may only have gone to the subset that was necessary to perform the number of actions needed by this DistributionAction.

invokeHook()[source]

Overrides invokeHook to distribute work amongst available resources as requested.

Notes

Two things about this method make it non-recursive

exception armi.mpiActions.MpiActionError[source]

Bases: Exception

Exception class raised when error conditions occur during an MpiAction.

class armi.mpiActions.DistributeStateAction(skipInterfaces=False)[source]

Bases: armi.mpiActions.MpiAction

invokeHook()[source]

Sync up all nodes with the reactor, the cs, and the interfaces.

Notes

This is run by all workers and the master any time the code needs to sync all processors.

_distributeSettings()[source]
_distributeReactor(cs)[source]
_distributeParamAssignments()[source]
_distributeInterfaces()[source]

Distribute the interfaces to all MPI nodes.

Interface copy description

Since interfaces store information that can influence a calculation, it is important in branch searches to make sure that no information is carried forward from these runs on either the master node or the workers. However, there are interfaces that cannot be distributed, making this a challenge. To solve this problem, any interface that cannot be distributed is simply re-initialized. If any information needs to be given to the worker nodes on a non-distributable interface, additional function definitions (and likely soul searching as to why needed distributable information is on a non-distributable interface) are required to pass the information around.

armi.mpiActions._diagnosePickleError(o)[source]

Scans through various parts of the reactor to identify which part cannot be pickled.

Notes

So, you’re having a pickle error and you don’t know why. This method will help you find the problem. It doesn’t always catch everything, but it does help.

We also find that modifying the Python library as documented here tells us which object can’t be pickled by printing it out.
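The scanning idea can be sketched in a few lines: try to pickle each attribute of an object separately and report the ones that fail. This is an illustrative sketch, not the ARMI implementation. `find_unpicklable_attributes` and `ToyReactor` are hypothetical names, and only top-level instance attributes are checked.

```python
import pickle
import threading


def find_unpicklable_attributes(obj):
    """Return the names of instance attributes that cannot be pickled."""
    failures = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception:
            # Pickling can fail with several exception types, so catch broadly.
            failures.append(name)
    return failures


class ToyReactor:
    """Hypothetical object with one attribute that pickle rejects."""

    def __init__(self):
        self.name = "core"
        self.powers = [1.0, 2.0, 3.0]
        self.lock = threading.Lock()  # lock objects cannot be pickled


bad = find_unpicklable_attributes(ToyReactor())
print(bad)
```

Applying the same check recursively to the offending attribute narrows the search down to the specific object that breaks pickling.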