# 3. Inference Array Design

## 3.1. Purpose

This page describes the design of the `EnzoMethodInference` method.
The purpose of this method is to create a collection of regular arrays
(“inference arrays”), each containing a subset of block field data to
pass to an external Deep Learning (DL) inference method. After the
inference method is invoked, the intersecting leaf blocks are provided
with the pertinent output of the inference method, such as the
locations of bubbles where star formation is expected to occur. A
mock-up of inference arrays and generated bubbles is shown in the
above figures, with inference arrays indicated by squares in the left
figure, and bubble locations added in the right figure.

Some characteristics of inference arrays include the following:

- inference array sizes are typically about 64^3
- all inference arrays have the same size and resolution
- inference array positions are a regular 3D grid
- inference arrays are only created and allocated where needed

Where inference arrays are created is determined by some relatively simple (local) criteria, such as density threshold, possibly coupled with a restriction on the block’s minimum refinement level. Field data are then copied from intersecting leaf block fields to the inference arrays, using either linear restriction or prolongation.

Some assumptions we make include the following:

- a given block may intersect multiple inference arrays
- a given inference array may intersect multiple blocks
- inference arrays are aligned with blocks in some specific AMR level, `level_array`
- inference array resolution matches that of blocks in some (finer) AMR level, `level_infer`
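
The relationship between `level_array`, `level_infer`, and the inference array size can be sketched as below. This is only an illustration under stated assumptions: a refinement factor of 2 between levels, the same number of cells per block at every level, and `level_infer >= level_array`. The helper names and the `block_cells` and `root_blocks` parameters are hypothetical and do not come from the Enzo-E source.

```cpp
#include <cstddef>

// Hypothetical helper: cells per axis of one inference array.
// Assumes a refinement factor of 2 and level_infer >= level_array.
// An inference array spans one level_array block but is sampled at the
// cell width of level_infer, so each extra level doubles the cell count.
constexpr std::size_t inference_array_cells(std::size_t block_cells,
                                            int level_array,
                                            int level_infer)
{
  return block_cells << (level_infer - level_array);
}

// Hypothetical helper: number of level-array slots per axis, assuming
// root_blocks root-level blocks along that axis and level_array >= 0.
constexpr std::size_t level_array_extent(std::size_t root_blocks,
                                         int level_array)
{
  return root_blocks << level_array;
}
```

For example, under these assumptions blocks of 16 cells per axis with `level_array = 2` and `level_infer = 4` give inference arrays of 64 cells per axis, consistent with the typical 64^3 size mentioned above.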

Since inference arrays are aligned with blocks in a specific refinement level, we use the term “level array” to refer to the sparse array of inference arrays. The level array is implemented as a sparse 3D Charm++ chare array, where each element of the chare array is a collection of inference arrays containing field data for its region.

Some synchronization and performance issues addressed in the design include the following:

- multiple level array “create” requests may be received from intersecting leaf blocks, but the level array element can only be created once
- inference arrays tend to be clustered, so level array elements should be distributed across compute nodes to reduce compute and memory load imbalances
- level array elements won’t know a priori the indices of intersecting leaf blocks, so these must be determined dynamically via tree traversal
- the Charm++ `doneInserting()` method must be called on the level chare array, but only after all elements are created, so synchronization is required

## 3.2. Phases

Phases of the algorithm are enumerated below:

1. Evaluate: blocks apply local criteria to determine where to create inference arrays
2. Allocate: the “level array” chare array of inference arrays is created
3. Populate: inference arrays request and receive field data from intersecting leaf blocks
4. Apply inference: inference arrays apply the external DL inference method
5. Update blocks: inference arrays send results back to intersecting leaf blocks

These phases are described in detail below. Note that entry method
name prefixes are omitted below to keep the UML sequence diagram
labels short; e.g. `EnzoBlock::p_exit()` in the documentation
refers to `EnzoBlock::p_method_infer_exit()` in code.

### 3.2.1. Phase 1. Evaluate

Below is a UML sequence diagram illustrating the evaluation phase in
`EnzoMethodInference`. The left blue columns represent inference
arrays, the red right columns represent all blocks in successive
refinement levels (“B0” are all root-level blocks, “B1” all level-1
blocks, etc.), and the center yellow column represents the root-node
Simulation object, used for synchronization and counting.

In the “Evaluate” phase, blocks apply local criteria to determine
where to create inference arrays. Control enters the method at the
block level, such that `EnzoMethodInference::compute()` is called on
all blocks, which in turn call `apply_criteria()`.

The only criterion currently implemented is whether the point-wise density exceeds the block-local average by some specified threshold. (See the Inference Parameters section for the user parameters of the inference method, including the density threshold.)
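
A minimal sketch of such a density test is given below. It is not the actual `apply_criteria()` implementation. In particular, it assumes the threshold is a multiplicative factor applied to the block-local mean; the text does not say whether the comparison is a ratio or a difference. The function name and the flat `std::vector` field layout are hypothetical.

```cpp
#include <numeric>
#include <vector>

// Returns true if any cell's density exceeds threshold * (block-local mean).
// Assumption: "threshold" is a multiplicative over-density factor.
bool any_cell_overdense(const std::vector<double>& density, double threshold)
{
  if (density.empty()) return false;

  // Block-local average over all cells of this block.
  const double mean =
      std::accumulate(density.begin(), density.end(), 0.0) /
      static_cast<double>(density.size());

  // Point-wise test: a single qualifying cell is enough to tag the block.
  for (double d : density) {
    if (d > threshold * mean) return true;
  }
  return false;
}
```

In this sketch a block with cell densities 1, 1, 1, 9 has mean 3, so a threshold of 2 tags the block because 9 exceeds 6, while a uniform block is never tagged for any threshold of 1 or more.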

To improve performance, this is applied only on “sufficiently fine”
level blocks, specified by `level_base` (`level_base = 2` is
typical). Inference arrays are guaranteed not to intersect leaf blocks
in levels coarser than `level_base`; conversely, all blocks (leaf or
non-leaf) in `level = level_base` that intersect inference arrays are
guaranteed to exist. This property is used for communication from the
level array to leaf blocks.

After a leaf block evaluates the criteria with `apply_criteria()`, if
any cells satisfy them, the associated intersecting level array
elements are tagged for creation. Note there may be multiple such
elements, depending on whether the block is coarser or finer than
`level_array` (the level at which blocks and inference arrays
coincide). If there are multiple intersecting inference arrays for a
block, a logical “mask” array is used for keeping track of which
inference arrays to create. If only one inference array intersects a
leaf block, the mask size is 1.

These masks are merged toward the coarser `level_base` level, using
the `p_merge_masks()` entry method called on block parents. At each
step, the child masks are merged in their parent block using
logical-OR (if `level >= level_array`) or concatenation (if
`level < level_array`).
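
The two merge rules can be sketched as follows. This is a simplified illustration, not the `p_merge_masks()` implementation: masks are flattened to one dimension, concatenation simply follows child order rather than the 3D arrangement of child octants, and `level` is assumed to be the level compared against `level_array` exactly as in the text. The `Mask` alias and function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

using Mask = std::vector<bool>;  // one flag per intersecting inference array

// Children refer to the same inference array(s): combine flags with OR.
Mask merge_or(const std::vector<Mask>& children)
{
  Mask merged(children.empty() ? 0 : children.front().size(), false);
  for (const Mask& child : children) {
    for (std::size_t i = 0; i < merged.size() && i < child.size(); ++i) {
      merged[i] = merged[i] || child[i];
    }
  }
  return merged;
}

// Children cover disjoint sets of inference arrays: append their flags.
Mask merge_concat(const std::vector<Mask>& children)
{
  Mask merged;
  for (const Mask& child : children) {
    merged.insert(merged.end(), child.begin(), child.end());
  }
  return merged;
}

// Select the rule from the level comparison described in the text.
Mask merge_masks(const std::vector<Mask>& children, int level, int level_array)
{
  return (level >= level_array) ? merge_or(children) : merge_concat(children);
}
```

The logical-OR case also shows why duplicate “create” requests collapse into one: several fine blocks inside the same inference array contribute to a single mask entry.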

### 3.2.2. Phase 2. Allocate

When `level_base` is reached (level 2 in the figure), each block in
the `level_base` level will have a mask specifying where each
inference array needs to be created. At this step, the level array
elements are created using `p_create_level_array()`.

The reduction operation continues with counting the number of
inference arrays created, using `p_count_arrays()`. This continues
down to the root-level blocks, which send the accumulated counts to
the root Simulation object. After all root-level block counts have
been received, the Simulation object will contain the total number of
inference arrays to be created, which is used to initialize
synchronization counters.

The number of inference arrays to create, determined by the counting reduction above, is used to detect when all level array elements have been created. (As a technicality, the synchronization counter is initialized to one more than this number, which prevents the algorithm from hanging when no level array elements need to be created. If no inference arrays are created, the method exits immediately.)

As level array elements are created, the constructor notifies the root
Simulation object via `p_level_array_created()`, which decrements
the counter. When zero is reached, all level array elements are
guaranteed to have been created, and the Simulation object can then
finalize the chare array by calling the Charm++ `doneInserting()`
method, and proceed to the next phase.
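
The countdown with its extra initial count can be sketched as below. The class is a hypothetical stand-in for the counter held by the root Simulation object, not its actual implementation. It also assumes that the Simulation object itself issues the one extra decrement once the total is known, which the text implies but does not state.

```cpp
// Hypothetical stand-in for the creation counter on the root Simulation.
class CreationCounter {
public:
  // Start at one more than the number of arrays, so that zero is still
  // reached (via the extra decrement) when num_arrays == 0.
  explicit CreationCounter(int num_arrays) : remaining_(num_arrays + 1) {}

  // Called once per created element plus once by the owner (assumed).
  // Returns true exactly when the counter reaches zero, i.e. when it is
  // safe to finalize the chare array and start the next phase.
  bool decrement() { return --remaining_ == 0; }

private:
  int remaining_;
};
```

With two arrays to create, the first two decrements return false and the third returns true. With none, the single extra decrement returns true immediately, so the method does not hang.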

### 3.2.3. Phase 3. Populate

After the level array chare array is created, the root Simulation
object calls `p_request_data()` on all elements of the array. Each
level array element sends a request to the unique block in
`level_base` that it intersects. This request is then forwarded via
child blocks to all intersecting leaf blocks using
`p_request_data()`.

When an intersecting leaf block is reached, it serializes the required
portion of field data and sends it directly to the requesting level
array element using `EnzoLevelArray::p_transfer_data()`.
Blocks coarser than `level_infer` must interpolate the
data, which is done on the receiving end; blocks finer than
`level_infer` restrict data before sending it.
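
To illustrate the two transfer operators, here is a one-dimensional sketch with a refinement factor of 2. It is not the Cello or Enzo-E implementation: restriction is taken to be a plain average of fine-cell pairs, and prolongation uses a centered slope with zero slope in the first and last cells, which is one common conservative linear scheme chosen here as an assumption. Function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Restriction: each coarse cell is the average of its two fine cells.
// A trailing unpaired fine cell (odd size) is ignored.
std::vector<double> restrict_1d(const std::vector<double>& fine)
{
  std::vector<double> coarse(fine.size() / 2);
  for (std::size_t i = 0; i < coarse.size(); ++i) {
    coarse[i] = 0.5 * (fine[2 * i] + fine[2 * i + 1]);
  }
  return coarse;
}

// Linear prolongation: each coarse cell is split into two fine cells,
// offset by a quarter of the centered slope. Boundary cells use zero
// slope (assumption), which reduces to piecewise-constant there.
std::vector<double> prolong_1d(const std::vector<double>& coarse)
{
  const std::size_t n = coarse.size();
  std::vector<double> fine(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    const bool interior = (i > 0) && (i + 1 < n);
    const double slope =
        interior ? 0.5 * (coarse[i + 1] - coarse[i - 1]) : 0.0;
    fine[2 * i] = coarse[i] - 0.25 * slope;
    fine[2 * i + 1] = coarse[i] + 0.25 * slope;
  }
  return fine;
}
```

Because the two fine values are symmetric about the coarse value, restricting a prolonged array recovers the original coarse data in this sketch.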

### 3.2.4. Phase 4. Apply inference

Level array elements keep track of incoming data, accumulating the
relative volume of incoming data until it reaches 1.0. After the last
piece of data is received and copied into the inference arrays, the
level array element calls `EnzoLevelArray::apply_inference()`. After
the DL inference method is applied, `p_done()` is called on the
root-level Simulation object. The root Simulation object counts down
the number of calls received, so it knows when all DL inference
methods have completed.
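
The volume-based completion test can be sketched as follows. The sketch assumes a refinement factor of 2 in three dimensions, so a block that is `d` levels finer than `level_array` covers a fraction `8^-d` of a level array element, and a block at or coarser than `level_array` covers the element entirely. The function and class names are hypothetical.

```cpp
#include <cmath>

// Fraction of one level-array element covered by a block at block_level.
// Assumption: refinement factor 2 in 3D, so each finer level divides the
// covered volume by 8. Coarser or equal-level blocks cover it entirely.
double relative_volume(int block_level, int level_array)
{
  const int depth = block_level - level_array;
  return depth <= 0 ? 1.0 : std::ldexp(1.0, -3 * depth);  // 2^(-3*depth)
}

// Hypothetical tracker kept by each level array element.
class VolumeTracker {
public:
  // Adds the volume of one incoming message; returns true once the
  // element is completely covered and inference can be applied.
  bool receive(double volume)
  {
    filled_ += volume;
    return filled_ >= 1.0;
  }

private:
  double filled_ = 0.0;
};
```

Since every contribution is a power of two, the running sum is exact in floating point for realistic level differences, so the comparison against 1.0 is safe in this sketch. Counting volume rather than messages is what lets an element finish without knowing in advance how many leaf blocks it intersects.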

### 3.2.5. Phase 5. Update blocks

After all DL inference methods have completed, level array elements
forward the results to the intersecting leaf blocks. This is done
using the same communication pattern as in the **populate** phase with
`p_request_data()`, in which data is sent to the unique
`level_base` block and forwarded to the leaf blocks via
intersecting child blocks.

For the method to end, all blocks must call `compute_done()`. This
is done via successive calls to `p_done()` on the level array chare,
then the root-level simulation chare, and finally `p_exit()` on all
blocks, which calls `compute_done()`. This seemingly roundabout
approach is used to ensure proper synchronization. First, each level
array element sums up the block volumes of incoming `p_done()`
calls from its containing blocks. When this volume sum reaches the
volume associated with the level array element, it triggers a call to
`p_done()` on the root-level simulation chare. The root-level
simulation chare in turn counts the number of these incoming
`p_done()` calls from the level array chares. When the count reaches
the total number of level array chares, it triggers a call to
`p_exit()` on all blocks, which calls `compute_done()`, ending the
method and returning control to Cello.