3. Inference Array Design

3.1. Purpose

Figure (left): example of the placement of inference arrays in a small cosmology simulation.

Figure (right): example of the bubbles output by the inference method.

This page describes the design of the EnzoMethodInference method. Its purpose is to create a collection of regular arrays (“inference arrays”), each containing a subset of block field data to pass to an external Deep Learning (DL) inference method. After the inference method is invoked, the intersecting leaf blocks receive the pertinent output of the inference method, such as the locations of bubbles where star formation is expected to occur. A mock-up of inference arrays and generated bubbles is shown in the figures above, with inference arrays indicated by squares in the left figure and bubble locations added in the right figure.

Some characteristics of inference arrays include the following:

inference array sizes are typically about 64^3

all inference arrays have the same size and resolution

inference array positions are a regular 3D grid

inference arrays are only created and allocated where needed

Where inference arrays are created is determined by some relatively simple (local) criteria, such as density threshold, possibly coupled with a restriction on the block’s minimum refinement level. Field data are then copied from intersecting leaf block fields to the inference arrays, using either linear restriction or prolongation.

Some assumptions we make include the following:

a given block may intersect multiple inference arrays

a given inference array may intersect multiple blocks

inference arrays are aligned with blocks in some specific AMR level, level_array

inference array resolution matches that of blocks in some (finer) AMR level, level_infer.
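As a rough illustration of how the last two assumptions fix the array size, the sketch below computes the number of cells per axis of an inference array. It assumes cubic blocks and a refinement factor of 2 per level; the function name and the block size of 16 are hypothetical, chosen only so that the result matches the typical 64^3 size mentioned above.

```python
def inference_array_size(block_size, level_array, level_infer):
    """Cells per axis of an inference array (illustrative sketch only).

    An inference array covers the same region as a block in level_array,
    but at the cell width of blocks in level_infer.
    ASSUMPTION: each refinement level halves the cell width.
    """
    if level_infer < level_array:
        raise ValueError("level_infer is expected to be at least level_array")
    return block_size * 2 ** (level_infer - level_array)


# Hypothetical values: 16^3 blocks, arrays aligned with level 2,
# resolution taken from level 4.
n = inference_array_size(block_size=16, level_array=2, level_infer=4)
print(n, "cells per axis,", n ** 3, "cells in total")
```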

Since inference arrays are aligned with blocks in a specific refinement level, we use the term “level array” to refer to the sparse array of inference arrays. The level array is implemented as a sparse 3D Charm++ chare array, where each element of the chare array is a collection of inference arrays containing field data for its region.

Some synchronization and performance issues addressed in the design include the following:

multiple level array “create” requests may be received from intersecting leaf blocks, but the level array element can only be created once

inference arrays tend to be clustered, so level array elements should be distributed across compute nodes to reduce compute and memory load imbalances

level array elements do not know a priori the indices of intersecting leaf blocks, so these must be determined dynamically via tree traversal

the Charm++ doneInserting() method must be called on the level chare array, but only after all elements are created, so synchronization is required

3.2. Phases

Phases of the algorithm are enumerated below:

1. Evaluate: blocks apply local criteria to determine where to create inference arrays

2. Allocate: the “level array” chare array of inference arrays is created

3. Populate: inference arrays request and receive field data from intersecting leaf blocks

4. Apply inference: inference arrays apply the external DL inference method

5. Update blocks: inference arrays send results back to intersecting leaf blocks

These phases are described in detail below. Note that the method-specific prefix of entry method names is omitted below to keep the UML sequence diagram labels short; e.g. EnzoBlock::p_exit() in the documentation refers to EnzoBlock::p_method_infer_exit() in the code.

3.2.1. Phase 1. Evaluate

Below is a UML sequence diagram illustrating the evaluation phase in EnzoMethodInference. The blue columns on the left represent inference arrays, the red columns on the right represent all blocks in successive refinement levels (“B0” are all root-level blocks, “B1” all level-1 blocks, etc.), and the yellow column in the center represents the root-node Simulation object, used for synchronization and counting.

In the “Evaluate” phase, blocks apply local criteria to determine where to create inference arrays. Control enters the method at the block level, such that EnzoMethodInference::compute() is called on all blocks, which in turn call apply_criteria().

The criterion currently implemented is whether the point-wise density exceeds the block-local average by some specified threshold. (See the Inference Parameters section for the user parameters of the inference method, including the density threshold.)
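The density test can be sketched as follows. This is only an illustration, not the code behind apply_criteria(): it assumes the threshold is a multiplicative factor on the block-local average (the text above does not say whether it is a factor or an offset), and the function name and nested-list field layout are hypothetical.

```python
def density_criterion_sketch(density, threshold):
    """Return True if any cell exceeds threshold * (block-local average).

    density is a nested list indexed as density[i][j][k].
    ASSUMPTION: threshold is a multiplicative factor, not an offset.
    """
    cells = [rho for plane in density for row in plane for rho in row]
    average = sum(cells) / len(cells)
    return any(rho > threshold * average for rho in cells)


# A uniform 2x2x2 block never exceeds its own average by a factor of 4.
uniform = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]

# A single dense cell raises the average to 2.125, and 10.0 > 4 * 2.125.
peaked = [[[10.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]

print(density_criterion_sketch(uniform, 4.0))
print(density_criterion_sketch(peaked, 4.0))
```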

To improve performance, the criteria are applied only to blocks in “sufficiently fine” levels, as specified by level_base (level_base = 2 is typical). Inference arrays are guaranteed not to intersect leaf blocks in levels coarser than level_base; conversely, all blocks (leaf or non-leaf) in level_base that intersect inference arrays are guaranteed to exist. This property is used for communication from the level array to leaf blocks.

After a leaf block evaluates the criteria in apply_criteria(), if any cells satisfy them, the intersecting level array elements are tagged for creation. There may be multiple such elements, depending on whether the block is coarser or finer than level_array (the level at which blocks and inference arrays coincide). If multiple inference arrays intersect a block, a logical “mask” array keeps track of which inference arrays to create. If only one inference array intersects a leaf block, the mask size is 1.
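To make the relationship between a block's level and its mask size concrete, the sketch below lists the level array elements that a block overlaps. It assumes blocks are identified by an integer index triple within their level and that a child's index is twice its parent's plus an offset of 0 or 1 per axis; this indexing and the function name are hypothetical and are not Cello's own Index class.

```python
from itertools import product


def intersecting_array_indices(level, index, level_array):
    """Indices of the level array elements overlapping a block (sketch).

    ASSUMPTION: index is an (ix, iy, iz) triple within the block's level,
    with child index = 2 * parent index + offset along each axis.
    """
    if level >= level_array:
        # The block lies inside exactly one element: mask size 1.
        shift = level - level_array
        return [tuple(i >> shift for i in index)]
    # A coarser block covers factor^3 elements: mask size factor per axis.
    factor = 2 ** (level_array - level)
    return [tuple(i * factor + d for i, d in zip(index, offset))
            for offset in product(range(factor), repeat=3)]


fine = intersecting_array_indices(level=4, index=(5, 6, 7), level_array=3)
coarse = intersecting_array_indices(level=2, index=(1, 0, 1), level_array=3)
print(len(fine), "element for the finer block")
print(len(coarse), "elements for the coarser block")
```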

These masks are merged toward the coarser level_base level, using the p_merge_masks() entry method called on block parents. At each step, the child masks are merged in their parent block using logical-OR (if level >= level_array) or concatenation (if level < level_array).
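The two merge rules can be sketched as below. This is a minimal illustration, assuming masks are stored as nested lists of booleans and that child masks are keyed by the child's (0 or 1) offset along each axis; the function names and data layout are hypothetical and do not reflect how p_merge_masks() stores its data.

```python
def merge_or(child_masks):
    """Element-wise logical OR of equally sized child masks."""
    m = len(child_masks[0])
    return [[[any(mask[i][j][k] for mask in child_masks)
              for k in range(m)] for j in range(m)] for i in range(m)]


def merge_concat(children):
    """Assemble a parent mask of size 2m per axis from child masks of size m.

    children maps a child offset (cx, cy, cz), each 0 or 1, to its mask.
    Children that sent no mask leave their octant False.
    """
    m = len(next(iter(children.values())))
    size = 2 * m
    parent = [[[False] * size for _ in range(size)] for _ in range(size)]
    for (cx, cy, cz), mask in children.items():
        for i in range(m):
            for j in range(m):
                for k in range(m):
                    parent[cx * m + i][cy * m + j][cz * m + k] = mask[i][j][k]
    return parent


def merge_masks(parent_level, level_array, children):
    """Pick the merge rule from the parent's level, as described above."""
    if parent_level >= level_array:
        return merge_or(list(children.values()))
    return merge_concat(children)


# Two children of a level-3 parent, with level_array = 3: masks are OR-ed.
ored = merge_masks(3, 3, {(0, 0, 0): [[[False]]], (1, 0, 0): [[[True]]]})

# Two children of a level-2 parent, with level_array = 3: masks are tiled.
tiled = merge_masks(2, 3, {(0, 0, 0): [[[True]]], (1, 1, 1): [[[True]]]})
```

In this sketch the OR case keeps the mask size fixed, since all children lie inside the same inference array, while the concatenation case doubles the mask size per axis, since each child covers a different set of inference arrays.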

3.2.2. Phase 2. Allocate

When level_base is reached (level 2 in the figure), each block in the level_base level will have a mask specifying where each inference array needs to be created. At this step, the level array elements are created using p_create_level_array().

The reduction operation continues with counting the number of inference arrays created, using p_count_arrays(). This continues down to the root level blocks, which send the accumulated counts to the root Simulation object. After all root-level block counts have been received, the Simulation object will contain the total number of inference arrays to be created, which is used to initialize synchronization counters.

The count of inference arrays to create, determined in the previous phase, is used to determine when all level array elements have been created. (As a technicality, the counter is initialized to one more than this count to prevent the algorithm from hanging if no level array elements need to be created, which is possible. If no inference arrays are created, the method exits immediately.)

As level array elements are created, the constructor notifies the root Simulation object via p_level_array_created(), which decrements the counter. When zero is reached, all level array elements are guaranteed to have been created, and the Simulation object can then finalize the chare array by calling the Charm++ “doneInserting()” method, and proceed to the next phase.
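The countdown with its extra count can be sketched as follows. How the extra count is consumed is not spelled out above; this sketch assumes the Simulation object performs one decrement itself once the total is known, so that zero is reached even when no elements are created. The class and method names are hypothetical.

```python
class CreationCounter:
    """Counts down level array element creations (illustrative sketch)."""

    def __init__(self, num_arrays):
        # One extra count, so the zero check always has a triggering call.
        self.remaining = num_arrays + 1
        self.finalized = False

    def decrement(self):
        """Stand-in for p_level_array_created(); returns True at zero."""
        self.remaining -= 1
        if self.remaining == 0:
            # Here the real code would call doneInserting() and move on.
            self.finalized = True
        return self.finalized


# Three elements to create: the Simulation's own decrement (ASSUMPTION)
# plus one notification per element constructor.
counter = CreationCounter(3)
results = [counter.decrement() for _ in range(4)]

# No elements to create: the single extra decrement still reaches zero.
empty = CreationCounter(0)
empty_done = empty.decrement()
```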

3.2.3. Phase 3. Populate

After the level array chare array is created, the root Simulation object calls p_request_data() on all elements of the array. Each level array element sends a request to the unique block in level_base that it intersects. This request is then forwarded via child blocks to all intersecting leaf blocks using p_request_data().
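The forwarding of a request from the level_base block down to the intersecting leaf blocks amounts to a pruned tree traversal. The sketch below shows the idea on a toy octree. It is not the Charm++ entry method: the Block class, the integer index triples (child index = 2 * parent index + offset), and the function names are all hypothetical, and message passing is replaced by plain recursion.

```python
from itertools import product


class Block:
    """Toy octree node; a block with no children is a leaf."""

    def __init__(self, level, index):
        self.level = level
        self.index = index
        self.children = []

    def refine(self):
        self.children = [
            Block(self.level + 1,
                  tuple(2 * i + d for i, d in zip(self.index, offset)))
            for offset in product((0, 1), repeat=3)]
        return self.children


def intersects(block, array_index, level_array):
    """True if the block overlaps the given level array element."""
    if block.level >= level_array:
        shift = block.level - level_array
        return tuple(i >> shift for i in block.index) == array_index
    shift = level_array - block.level
    return tuple(a >> shift for a in array_index) == block.index


def request_data(block, array_index, level_array, leaves):
    """Forward a request down the tree, collecting intersecting leaves."""
    if not intersects(block, array_index, level_array):
        return
    if not block.children:
        leaves.append(block)
        return
    for child in block.children:
        request_data(child, array_index, level_array, leaves)


# level_base = 2 block with one of its eight children refined further.
base = Block(2, (0, 0, 0))
kids = base.refine()
kids[0].refine()

leaves_a, leaves_b = [], []
request_data(base, (0, 0, 0), 3, leaves_a)  # reaches eight level-4 leaves
request_data(base, (1, 1, 1), 3, leaves_b)  # reaches one level-3 leaf
```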

When an intersecting leaf block is reached, it serializes the required portion of its field data and sends it directly to the requesting level array element using EnzoLevelArray::p_transfer_data(). Data from blocks coarser than level_infer must be interpolated, which is done on the receiving end; blocks finer than level_infer restrict their data before sending it.
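For readers unfamiliar with the two operations, the sketch below shows restriction and linear prolongation on cell-centered data in one dimension. It is only an illustration of the general techniques: the operators actually used may differ, for example in 3D stencil or boundary handling, and the function names are hypothetical.

```python
def restrict(fine):
    """Average adjacent pairs of fine cells into one coarse cell each."""
    return [(fine[2 * i] + fine[2 * i + 1]) / 2
            for i in range(len(fine) // 2)]


def prolong(coarse):
    """Linearly interpolate coarse cell-centered values onto a 2x finer grid.

    Each fine cell center lies a quarter of a coarse cell width from its
    parent's center, hence the 3/4 and 1/4 weights.
    ASSUMPTION: values are clamped at the array ends instead of using
    neighbor (ghost) data.
    """
    n = len(coarse)
    fine = []
    for i, c in enumerate(coarse):
        left = coarse[max(i - 1, 0)]
        right = coarse[min(i + 1, n - 1)]
        fine.append(0.75 * c + 0.25 * left)
        fine.append(0.75 * c + 0.25 * right)
    return fine


coarse_from_fine = restrict([1.0, 3.0, 5.0, 7.0])
fine_from_coarse = prolong([2.0, 6.0])
```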

3.2.4. Phase 4. Apply inference

Level array elements keep track of incoming data, counting the relative volume of incoming data until the relative volume reaches 1.0. After the last piece of data is received and copied into the inference arrays, the level array element calls EnzoLevelArray::apply_inference(). After the DL inference method is applied, p_done() is called on the root-level Simulation object. The root Simulation object counts down the number of calls received, so it knows when all DL inference methods have completed.
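The volume bookkeeping can be sketched as below. The sketch assumes a refinement factor of 2, so that a leaf block finer than level_array by d levels covers 1/8^d of the element, while a leaf block at or coarser than level_array covers all of it. Exact fractions are used here only to keep the comparison with 1 obviously exact; the class and method names are hypothetical.

```python
from fractions import Fraction


class VolumeTracker:
    """Accumulates the relative volume of received data (sketch)."""

    def __init__(self, level_array):
        self.level_array = level_array
        self.received = Fraction(0)

    def receive(self, block_level):
        """Record data from one leaf block; return True when complete."""
        if block_level <= self.level_array:
            fraction = Fraction(1)  # the block covers the whole element
        else:
            fraction = Fraction(1, 8 ** (block_level - self.level_array))
        self.received += fraction
        return self.received == 1


# An element at level_array = 3 covered by seven level-4 leaves plus
# eight level-5 leaves (the eighth level-4 octant is refined once more).
tracker = VolumeTracker(level_array=3)
partial = [tracker.receive(4) for _ in range(7)]
final = [tracker.receive(5) for _ in range(8)]
```

In this sketch the last call to receive() is the point at which the element would go on to apply the inference method.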

3.2.5. Phase 5. Update blocks

After all DL inference methods have completed, level array elements forward the results to the intersecting leaf blocks. This uses the same communication pattern as p_request_data() in the populate phase: data is sent to the unique level_base block and then forwarded through intersecting child blocks to the intersecting leaf blocks.

For the method to end, all blocks must call compute_done(). This is done via successive calls to p_done() on the level array chare array, then on the root-level Simulation chare, and finally p_exit() on all blocks, which calls compute_done(). This seemingly roundabout approach ensures proper synchronization. First, each level array element sums the volumes of the blocks from which it receives p_done() calls. When this sum reaches the volume of the level array element, the element calls p_done() on the root-level Simulation chare. The root-level Simulation chare in turn counts these incoming p_done() calls. When the count reaches the total number of level array elements, it calls p_exit() on all blocks, which calls compute_done(), ending the method and returning control to Cello.