Index of /download/dist/gasnet/ofi-conduit

[ICO]NameLast modifiedSize

[PARENTDIR]Parent Directory  -
[DIR]contrib/2022-10-28 13:56 -
[TXT]license.txt2022-10-28 13:54 2.8K
[TXT]gasnet_ofi.h2022-10-28 13:54 8.9K
[TXT]gasnet_ofi.c2022-10-28 13:54 99K
[TXT]gasnet_extended_fwd.h2022-10-28 13:54 3.6K
[TXT]gasnet_extended.c2022-10-28 13:54 10K
[TXT]gasnet_core_internal.h2022-10-28 13:54 7.3K
[TXT]gasnet_core_help.h2022-10-28 13:54 626
[TXT]gasnet_core_fwd.h2022-10-28 13:54 7.5K
[TXT]gasnet_core.h2022-10-28 13:54 5.5K
[TXT]gasnet_core.c2022-10-28 13:54 38K
[   ]conduit.mak.in2022-10-28 13:54 2.0K
[TXT]README2022-10-28 13:54 34K
[   ]Makefile.in2022-10-28 13:55 32K
[   ]Makefile.am2022-10-28 13:54 5.1K

GASNet ofi-conduit documentation
Copyright 2015-2017, Intel Corporation
Portions copyright 2018-2022, The Regents of the University of California.

User Information:
----------------
This is the README for ofi-conduit.

OpenFabrics Interfaces (OFI) is a framework focused on exporting fabric
communication services to applications. OFI is best described as a collection of
libraries and applications used to export fabric services. 

See more details at: https://ofiwg.github.io/libfabric/

This conduit is feature-complete and passes tests in recommended configurations.
Performance tuning is planned as future work.

Use of ofi-conduit is only recommended on certain networks (see next section).
Therefore, it is disabled by default in most systems.  It can be enabled
explicitly at configure time using either `--enable-ofi` or
`--enable-ofi=probe`.  Either option will enable ofi-conduit if `configure`
locates the prerequisites.  The difference is that the first option makes
failure to locate the prerequisites fatal in `configure`.

Where this conduit runs:
-----------------------

The GASNet OFI conduit can run on any OFI provider that matches its
requirements. However, at this time, it is recommended *only* for the
following networks.

  + Intel(R) Omni-Path Fabric  via the "psm2" provider
  + HPE Slingshot-10 (100 Gb/s ConnectX-5 NIC), via the "verbs;ofi_rxm" provider
  + HPE Slingshot-11 (200 Gb/s Cassini NIC), via the "cxi" provider

The following providers have been used successfully on networks where other
GASNet conduits are a more appropriate choice. Therefore we do not recommend
use of these providers in general.

  + TCP, via the "sockets" provider
  + UDP, via "udp;ofi_rxd", under libfabric 1.7 and higher
  + Cray Aries, via the "gni" provider, under libfabric 1.14 and higher

The following providers have not met our expectations (with respect to
performance or stability) when tested on the listed networks and therefore are
not currently recommended at all.

  + TCP, via "tcp;ofi_rxm"
  + InfiniBand, via "verbs;ofi_rxm"

The ofi-conduit is currently known to lack support for the following providers
as of libfabric v1.14.0. If these providers are important to you, please
contact us at [email protected] .

  + "psm1" provider for the (end-of-life) TrueScale products
  + "opx" provider, an alternative to "psm2" for the Omni-Path fabric

The fi_info command installed with libfabric can be used to query the available
OFI providers on your system.  It is worth noting that the available providers
on a cluster compute node may differ from those on a front-end node.

Users with InfiniBand hardware should use the GASNet ibv-conduit, rather than
ofi-conduit with the verbs OFI provider, as the former may provide better
performance.

Users with Ethernet hardware are encouraged to consider the GASNet udp-conduit
as an alternative to ofi-conduit with the sockets or udp;ofi_rxd providers, as
the former may provide more favorable performance.

Users with Cray XC hardware are encouraged to consider the GASNet aries-conduit
as an alternative to ofi-conduit with the gni provider, since the former may
provide better performance.  Additionally, while the GASNet maintainers have
seen good results from preliminary testing of this provider with GASNet's
ofi-conduit, nobody has thoroughly validated it.  Therefore, use of the gni
provider with ofi-conduit should be considered experimental.

The ofi-conduit includes support for memory kinds (`GEX_MK_CLASS_CUDA_UVA` and
`GASNET_HAVE_MK_CLASS_HIP`) for certain providers and subject to minimum
versions of libfabric.  See docs/memory_kinds.md in the GASNet-EX sources for
details.

OFI provider requirements to run this conduit:
-------------------------------------
Endpoint type: RDM
Capabilities: FI_RMA and FI_MSG; plus FI_HMEM if using `gex_MK_Create()`
Secondary Capabilities: FI_MULTI_RECV
Memory Registration Mode:
    * API 1.4 or earlier: FI_MR_SCALABLE (preferred) or FI_MR_BASIC
    * API 1.5 or newer: all or none (preferred) of FI_MR_{ALLOCATED,VIRT_ADDR,PROV_KEY}
    * There is currently no support for providers that require FI_LOCAL_MR / FI_MR_LOCAL
Other Modes:
    * Supports providers that require FI_MR_ENDPOINT for FI_RMA endpoints
    * Supports providers that require FI_CONTEXT for FI_MSG endpoints
    * There are currently no operations which would require supporting FI_RX_CQ_DATA
    * There are currently no operations which would require supporting FI_ASYNC_IOV
    * There is currently no support for providers that require FI_CONTEXT for FI_RMA endpoints
    * There is currently no support for providers that require FI_CONTEXT2
    * There is currently no support for providers that require FI_MSG_PREFIX
    * There is currently no support for providers that require FI_RESTRICTED_COMP
Threading mode: FI_THREAD_SAFE and/or FI_THREAD_DOMAIN
    * In GASNET_{SEQ,PARSYNC} mode, all providers use FI_THREAD_DOMAIN as only one thread makes
      calls into the GASNet library.
    * By default in GASNET_PAR mode, FI_THREAD_DOMAIN is used only for the psm2 provider (which
      does not currently support FI_THREAD_SAFE) while all other providers use FI_THREAD_SAFE.
      See the description of --{en,dis}able-ofi-thread-domain, below, for non-default behaviors.

Building ofi-conduit
--------------------

Libfabric is a core component of OFI. To use ofi-conduit, the user firsts needs to 
locate or build an install of libfabric. The source code of libfabric can be found at:

   https://github.com/ofiwg/libfabric

To build GASNet with ofi-conduit enabled, the minimum requirement is
  [path to]/configure --enable-ofi

The following configure options are also available:

    * --with-ofi-home=INSTALL_PREFIX
      Used to provide the libfabric install prefix.
      If not used, the default is based on the location of fi_info in $PATH.
      If fi_info is not found in $PATH, then this option is required.

    * --with-ofi-provider=PROVIDER_NAME
      Since different providers expose different feature sets, ofi-conduit will
      use compile-time knowledge of the intended provider to elide unneeded
      code.  This option can be used to name the provider to optimize for.
      If this option is not present, the configure script will attempt to detect
      an available provider using the fi_info utility, if available.
      If fi_info is not available, or no supported provider is listed by fi_info,
      or if the argument to this option is "generic", then provider features
      will be detected at runtime. While this is flexible as to which providers
      it supports, it will add branches in critical paths that may increase
      software overheads.  This case is referred to as "generic provider" below.

      Note that some providers, such as "verbs;ofi_rxm" include a semi-colon in
      their name.  To avoid issues with shell quoting, the configure script will
      allow substitution of a colon for the semi-colon (e.g. "verbs:ofi_rxm") as
      well as use of just the first word (e.g. "verbs") if no ambiguity would
      result.

    * --with-ofi-num-completions=VALUE
      Specifies the maximum number of transmit completions to be read from the
      transmit CQ during each call to the polling function.
      Default: 64

    * --with-ofi-max-medium=VALUE
      This option specifes the default value of the environment variable
      GASNET_OFI_MAX_MEDIUM, which in turn determines the value returned
      by gex_AM_LUB{Request,Reply}Medium().
      The value cannot be less than 512.
      Default: 8192

The following configure flags override options set by provider selection via
auto-detection or by the --with-ofi-provider=PROVIDER_NAME option. These are not
recommended to be used as tuning options.

    * --enable-ofi-thread-domain
      Forces the use of FI_THREAD_DOMAIN in GASNET_PAR mode.
      This results in one global lock to protect all calls into libfabric.
      This is the default for psm2 provider.
    * --disable-ofi-thread-domain
      Forces the use of FI_THREAD_SAFE in GASNET_PAR mode.
      This is the default for most providers and for the generic provider case.

      In GASNET_SEQ and GASNET_PARSYNC modes, the two options above have no
      effect.

    * --enable-ofi-mr-scalable
      Indicates that FI_MR_SCALABLE memory registration support will be
      compiled into the conduit, and FI_MR_BASIC support will not.
      This is the default (and recommended) option for most providers.
    * --disable-ofi-mr-scalable
      Indicates that FI_MR_BASIC memory registration support will be compiled
      into the conduit, and FI_MR_SCALABLE support will not.
      Currently only the "verbs;ofi_rxm" provider defaults to use of
      FI_MR_BASIC.

      For the generic provider case, both FI_MR_SCALABLE and FI_MR_BASIC
      memory registration support are compiled and the selection is made at
      runtime.

    * --enable-ofi-multi-cq
      Indicates that the provider requires the use of distinct completion
      queues for the transmit completions of each endpoint.
      This is the default option for the cxi provider.
    * --disable-ofi-multi-cq
      Indicates that the provider supports the use of a single completion
      queue for the transmit completions of multiple endpoints.
      This is the default (and recommended) option for most providers.

      For the generic provider case, the logic for Cq creation and polling
      adapts at runtime to use a single completion queue if possible, and uses
      multiple Cqs only if necessary.

    * --enable-ofi-retry-recvmsg
      Enables logic in ofi-conduit to handle the provider returning `-FI_EAGAIN`
      from `fi_recvmsg()`.
      This is the default option for the cxi and udp;ofi_rxd providers and for
      the generic provider case.
    * --disable-ofi-retry-recvmsg
      Disable logic to handle `-FI_EAGAIN` returns from `fi_recvmsg()`.
      This is the default (and recommended) option for most providers.


The default spawner to be used by the gasnetrun_ofi utility can be
selected by configuring '--with-ofi-spawner=VALUE', where VALUE is one
of 'mpi', 'pmi' or 'ssh'.  If this option is not used, mpi is the
default when available, and ssh otherwise.
Here are some things to consider when selecting a default spawner:
  + The choice of spawner only affects the protocol used for parallel job
    setup and teardown; in particular it is NOT used to implement any part
    of the steady-state GASNet communication operations. As such, the
    selected protocol needs to be stable and co-exist with GASNet
    communication, but its performance efficiency is usually not a
    practical consideration.
  + mpi-spawner is the default when MPI is available precisely because it
    is so frequently present on systems where GASNet is to be installed.
    Additionally, very little (if any) configuration is required and the
    behavior is highly reliable.
  + pmi-spawner uses the same "Process Management Interface" which forms
    the basis for many mpirun implementations.  When support is available,
    this spawner can be as easy to use and as reliable as mpi-spawner, but
    without the overheads of initializing an MPI runtime.
  + ssh-spawner depends only on the availability of a remote shell command
    such as ssh.  For this reason ssh-spawner support is always compiled.
    However, it can be difficult (or impossible) to use on a cluster which
    was not setup to allow ssh to (and among) its compute nodes.
For more information on configuration and use of these spawners, see
   README-{ssh,mpi,pmi}-spawner (installed)
or
   other/{ssh,mpi,pmi}-spawner/README (source).

Depending on the libfabric provider in use, there may be restrictions on how
mpi-based spawning is used. For instance, the psm2 provider has the property
that each process may only open the network adapter once. Additionally, if the
MPI implementation also uses libfabric for communication, then there is a risk
it may adjust settings in a way incompatible with ofi-conduit. If you wish to
use mpi-spawner, please consult its README for advice on how to set your
MPIRUN_CMD to use native TCP/IP to avoid these potential problems.

Job Spawning
------------

If using UPC++, Chapel, etc. the language-specific commands should be used
to launch applications.  Otherwise, applications can be launched using
the gasnetrun_ofi utility:
  + usage summary:
    gasnetrun_ofi -n <n> [options] [--] prog [program args]
    options:
      -n <n>                 number of processes to run (required)
      -N <N>                 number of nodes to run on (not supported by all MPIs)
      -E <VAR1[,VAR2...]>    list of environment vars to propagate
      -v                     be verbose about what is happening
      -t                     test only, don't execute anything (implies -v)
      -k                     keep any temporary files created (implies -v)
      -spawner=(ssh|mpi|pmi) force use of a specific spawner (if available)

There are as many as three possible methods (ssh, mpi and pmi) by which one
can launch an ofi-conduit application.  Ssh-based spawning is always
available, and mpi- and pmi-based spawning are available if the respective
support was located at configure time.  The default is established at
configure time (see section "Building ofi-conduit", above).

To select a non-default spawner one may either use the "-spawner=" command-
line argument or set the environment variable GASNET_OFI_SPAWNER to "ssh",
"mpi" or "pmi".  If both are used, then the command line argument takes
precedence.

Recognized environment variables:
---------------------------------

* GASNET_OFI_SPAWNER
  To override the default spawner for ofi-conduit jobs, one may set this
  environment variable as described in the section "Job Spawning", above.
  There are additional settings which control behaviors of the various
  spawners, as described in the respective READMEs (listed in section
  "Building ofi-conduit", above).

* GASNET_QUIET - set to 1 to silence the startup warning indicating
  the provider in use may deliver suboptimal performance.

* GASNET_OFI_RMA_POLL_FREQ - In order to ensure efficient progress, the conduit polls the RMA
  transmit completion queue once for every GASNET_OFI_RMA_POLL_FREQ RMA injections. Default: 32

* GASNET_OFI_NUM_BBUFS, GASNET_OFI_BBUF_SIZE, GASNET_OFI_BBUF_THRESHOLD - See the
  "Non-bulk, Non-blocking Put Functions" section for detail on these environment variables.

* GASNET_OFI_NUM_RECEIVE_BUFFS and GASNET_OFI_RECEIVE_BUFF_SIZE - Active Message
  receive resources. These two settings control, respectively, the number and
  size of the "multi-receive" buffers allocated for the reception of in-bound
  AMs.  Depending on the given provider's implementation of `FI_MULTI_RECV`,
  increasing these may improve AM throughput or have no impact.  On some
  providers, reducing these may severely reduce AM throughput or even lead to
  crashes.  Defaults: 8 and 1M
  See also: Bug 4179 and Bug 4461 under "Known problems"

* GASNET_OFI_AM_INJECT_LIMIT - limit on size of fi_inject() payload.
  GASNet Active Messages of total size (at the ofi level) no larger than this
  parameter value will be sent using fi_inject(), and larger messages are sent
  using fi_send().  Use of fi_inject() reduces the overall complexity of the
  operation by ensuring the data is consumed synchronously, usually at the
  cost of an extra copy.  Meanwhile, fi_send() consumes the data asynchronously,
  eliminating the extra copy at the expense of signaling an asynchronous local
  completion.  This setting does not affect payload local completion semantics
  at the GASNet API level.
  It too large a value is requested, it will be reduced to the maximum which
  the OFI provider can support.
  Default: largest size supported by the provider.
  NOTE: If this parameter is not set, then "GASNET_OFI_INJECT_LIMIT" is
  accepted as an alias.

* GASNET_OFI_RMA_INJECT_LIMIT - limit on size of RMA with FI_INJECT
  GASNet RMA put operations no larger than this size may be issued using the
  `FI_INJECT` flag to ensure the data is consumed synchronously.  This can
  reduce the overal complexity of operations using `GEX_EVENT_NOW`, typically
  at cost of an extra copy.  This setting does not affect local completion
  semantics at the GASNet API level.
  It too large a value is requested, it will be reduced to the maximum which
  the OFI provider can support.
  Default: largest size supported by the provider.

* GASNET_OFI_TX_CQ_SIZE and GASNET_OFI_RX_CQ_SIZE - limits on completion queue
  length. These settings control the length of the completion queues (CQs) used
  for transmit and receive operations, respectively.  The default of zero allows
  the provider to determine the length. Use of this default is recommended unless
  debugging a possible Cq overrun.

* GASNET_OFI_MAX_REQUEST_BUFFS and GASNET_OFI_MAX_REPLY_BUFFS - limits on the
  number of buffers allocated for the construction of out-bound AM Requests and
  Replies, respectively. The default for both settings is 1024.

* GASNET_OFI_NUM_INITIAL_REQUEST_BUFFS and GASNET_OFI_NUM_INITIAL_REPLY_BUFFS -
  the number of buffers to allocate at startup for the construction of
  out-bound AM Requests and Replies, respectively. If these values are lower
  than the corresponding "MAX" values (described immediately above) then the
  difference *may* be allocated dynamically, but only as needed.
  The default for each setting is the smaller of 256 or the respective MAX.
  In other words: with all defaults, 256 buffers are allocated to each pool and
  each poll is permitted to grow as large as 1024 buffers if the demand exists.

* GASNET_OFI_MAX_MEDIUM - maximum AM Medium payload size
  This setting determines the size of buffers used for AM Mediums, and thus the
  return value of the gex_AM_LUB{Request,Reply}Medium() queries.
  The value cannot be less than 512.
  Unless a different value was set using --with-ofi-max-medium=[value] at
  configure time, the default value is 8192.

* GASNET_OFI_DEVICE
  By default, ofi-conduit will open and use the first device enumerated by
  libfabric as matching the specified provider and required capabilities.
  This setting can be used to specify a device to be used in place of this
  default.
  See also 'GASNET_OFI_DEVICE_*', immediately below.

* GASNET_OFI_DEVICE_*
  The environment variable 'GASNET_OFI_DEVICE', described immediately above,
  provides only a single setting and unless one uses some external means to
  give per-process settings this cannot provide per-process control.  This
  can make it difficult to get the best performance from multi-rail systems
  with multiple processes per node and architectural locality properties that
  affect PCI/adapter access efficiency.
  However, if 'hwloc' is detected at configure time, then it is possible to
  give ofi-conduit values for 'GASNET_OFI_DEVICE' which will vary per-process
  based on cpu-binding and machine topology information as follows.
   1. The variable 'GASNET_OFI_DEVICE_TYPE' names an object type using hwloc's
      terminology, with the default being "Socket" (aka "Package").
      If the value is "None" (case-insensitive) then the logic below is
      disabled and the value of 'GASNET_OFI_DEVICE' is used by all processes.
   2. GASNet queries the set of objects of the given type which intersect the
      process's cpuset, to construct a variable name 'GASNET_OFI_DEVICE_[suff]'
      where '[suff]' is an underscore-delimited ordered list of logical
      object ids.  For example, with the default object type, a process bound
      only to cores in the first socket would have a variable name of
      'GASNET_OFI_DEVICE_0'.  Meanwhile, if the cpuset spans sockets 0 and 1
      (such as for an unbound process on a 2-socket system) then the variable
      'GASNET_OFI_DEVICE_0_1' is used.
   3. If the environment variable determined in step 2 is set, then it is
      used.  Otherwise the un-suffixed 'GASNET_OFI_DEVICE' is used.
  As a concrete example, on OLCF's Crusher there are four NICs -- one per NUMA
  Node.  Examining the corresponding Quick Start Guide one can construct the
  following settings to ensure processes bound to cores in a single NUMA Node
  will use the topologically nearest NIC:
      GASNET_OFI_DEVICE_TYPE=Node
      GASNET_OFI_DEVICE_0=cxi2
      GASNET_OFI_DEVICE_1=cxi1
      GASNET_OFI_DEVICE_2=cxi3
      GASNET_OFI_DEVICE_3=cxi0
  These specific recommendations are appropriate to the specific composition
  of a node of OLCF's Crusher, and should not be considered as generic advice
  for use of all multi-NIC systems.
  Of course, even on the same system, your mileage may vary.
  By default 'GASNET_OFI_DEVICE_TYPE' is "Socket" and all other variables in
  this family are unset.

* FI_PROVIDER - set to a provider name to override the default OFI provider selection
  Note that only in the "generic provider" case (described with the configure
  options) can this actually be used to change the provider to one different
  than selected at configure time.

* FI_UNIVERSE_SIZE - This libfabric variable conveys the number of ranks /
  peers a given process endpoint expects to communicate with (default: 256).
  This setting is not directly consumed by GASNet ofi-conduit, but affects the
  behavior of the underlying providers it relies upon.  Libfabric documentation
  encourages users to set this high enough to prevent internal CQ overruns for
  non-scalable communication patterns, but not so high as to waste
  provider-internal memory.  Unfortunately at the time of this writing the
  effect appears to be highly provider-specific, ranging from no apparent
  effect on some providers (sockets), to the intended scaling effect on others
  (verbs), and even outright crashes on other providers (cxi).

* FI_PSM2_LOCK_LEVEL - This variable only applies to the psm2 provider and only
  is available in libfabric v1.5.0. This environment variable controls the
  internal locking state of the provider and can be set to the following three
  values:
    - FI_PSM2_LOCK_LEVEL=0: All locks inside of the provider will be disabled.
    - FI_PSM2_LOCK_LEVEL=1: Some locks inside of the provider will be disabled,
      and is suitable for programs that limit the access to each PSM2 context to
      a single thread.
    - FI_PSM2_LOCK_LEVEL=2: All locks inside of the provider are enabled.
  The default setting of this variable is 2. The original author recommends using
  a value of 0 when using GASNet in GASNET_SEQ mode and a value of 1 when running
  in GASNET_PARSYNC mode. For GASNET_PAR mode, a value of 1 should only be used
  if GASNet was configured with --enable-ofi-thread-domain, which only will
  allow a single thread to make calls into libfabric at a time. Otherwise, the
  default value of 2 should be used. This variable should be used by power users
  only and should be used at your own risk.

* All the environment variables provided by libfabric (see `fi_info -e`)
  Note that many of these influence provider-specific behaviors, and in some
  cases setting non-default values may negatively affect the operation of
  ofi-conduit.

* The environment variables described in the "Non-bulk, Non-blocking Put Functions" 
  section of this file.

* All the standard GASNet environment variables (see top-level README)

Non-bulk, Non-blocking Put Functions
------------------------------------
For some calls to gex_RMA_Put{NB,NBI}() GASNet-EX requires the source buffer to be able to be reused as
soon as the function returns. The ofi-conduit implements this by a hybrid approach that uses
the FI_INJECT flag, bounce buffers, and blocking operations.

* If the nbytes parameter is less than or equal to the chosen provider's inject_size (see the 
  fi_endpoint(3) man page), the FI_INJECT flag will be used.
* If the nbytes parameter is greater than the inject size but less than or equal to a user specifiable
  threshold (default is 4 times the bounce buffer size) then bounce buffers will be used.
* If the nbytes parameter is greater than the threshold, the operation will be implemented as a 
  fully-blocking operation.

The following GASNet statistics counters show the number of times each of these code paths are entered:
NB_PUT_INJECT, NB_PUT_BOUNCE, NB_PUT_BLOCK.

The following environment variables may be used to tune this behavior:
* GASNET_OFI_BBUF_SIZE - The size of each bounce buffer. Default is GASNET_PAGESIZE.
* GASNET_OFI_NUM_BBUFS - The number of bounce buffers to be allocated at initialization. Default: 64
* GASNET_OFI_BBUF_THRESHOLD - Payload sizes above GASNET_OFI_BBUF_THRESHOLD will be transferred
  as a blocking operation. This is useful for when the overheads of bounce buffering become too great.
  Default: (4 * GASNET_OFI_BBUF_SIZE).
* The following condition must hold: GASNET_OFI_NUM_BBUFS >= GASNET_OFI_BBUF_THRESHOLD/GASNET_OFI_BBUF_SIZE
* These defaults are chosen to optimize for a 256K L2 cache, assuming 4K pages. It is recommended to 
  modify these variables so that GASNET_PAGESIZE * GASNET_OFI_NUM_BBUFS == L2-cache-size.
* These parameters only apply to gex_RMA_Put{NB,NBI}() with (lc_opt != GEX_EVENT_DEFER).
  They do not apply to gets, nor to blocking or value-based puts.

Known problems:
---------------

* Bug 4179
  In libfabric 1.11 and earlier, the verbs;ofi_rxm provider (used, for
  instance, on the Slingshot-10 network of some HPE Cray EX systems) has been
  observed to experience corruption of AMs, as at least sometimes diagnosed
  with a fatal error message like the following:
    GASNet node [...] received an AM message from node [...] for a handler
    index with no associated AM handler function registered
  At the time of this writing, the best known work-around is to update
  libfabric to version 1.12 or newer.  However, if that is not possible, then
  an alternative is to set
    GASNET_OFI_RECEIVE_BUFF_SIZE=single
  in your environment.  This setting automatically uses a receive buffer size
  that ensures only a single AM will be received per "multi-receive" buffer,
  and the default for GASNET_OFI_NUM_RECEIVE_BUFFS is adjusted to some
  unspecified larger value to compensate.
  Empirical evidence shows this prevents the failures in applications which
  exhibit them (which is *not* every application).  However, this measurably
  increases the latency of small AM operations.  So, this should only be used
  if it is suspected that bug 4179 is impacting your application and you cannot
  update libfabric to 1.12 or newer.
  The most up-to-date information on this bug is maintained at:
    https://gasnet-bugs.lbl.gov/bugzilla/show_bug.cgi?id=4179

* Bug 4267
  Multi-threaded applications in which one or more threads are using GASNet
  concurrent with another thread calling gasnet_exit() may experience "random
  crashes" as the exiting thread destroys resources in use by the others.
  This includes otherwise benign calls to gasnet_AMPoll().
  The most up-to-date information on this bug is maintained at:
    https://gasnet-bugs.lbl.gov/bugzilla/show_bug.cgi?id=4267

* Bug 4413
  At least cxi provider is known to crash at initialization if the environment
  variable "FI_UNIVERSE_SIZE" is set.  However, this is a supported variable in
  some providers, and is even *recommended* for use with verbs;ofi_rxm in runs
  with more than 256 processes.
  The most up-to-date information on this bug is maintained at:
    https://gasnet-bugs.lbl.gov/bugzilla/show_bug.cgi?id=4413

* Bugs 4420 and 4503
  At least verbs;ofi_rxm and cxi providers are known to fail when using
  read-only memory as the source of an RMA Put and/or AM Long operation.
  The most up-to-date information on these bugs are maintained at the follow:
    https://gasnet-bugs.lbl.gov/bugzilla/show_bug.cgi?id=4420 (verbs)
    https://gasnet-bugs.lbl.gov/bugzilla/show_bug.cgi?id=4503 (cxi)

* Bug 4422
  Multi-threaded applications run on a single core when using psm2 provider,
  unless some action is taken to pin the application to subset of the cores.
  The most up-to-date information on this bug is maintained at:
    https://gasnet-bugs.lbl.gov/bugzilla/show_bug.cgi?id=4422

* Bug 4427
  The "RDMADISSEM" barrier, used as the default in an RDMA-capable conduit
  without a conduit-specific barrier, has been observed to be incorrect when
  used with at least the sockets and udp;ofi_rxd providers.
  To be cautious, the less efficient "AMDISSEM" barrier is currently the default
  in ofi-conduit, *independent* of provider.
  The most up-to-date information on this bug is maintained at:
    https://gasnet-bugs.lbl.gov/bugzilla/show_bug.cgi?id=4427

* Bug 4461
  Under certain AM traffic patterns, the cxi provider (for the Slingshot-11
  network on some HPE Cray EX systems) has been observed to fail in various
  ways, including crashes and dropped messages.
  At the time of this writing, the best known work-around is to set
    GASNET_OFI_RECEIVE_BUFF_SIZE=single
  in your environment.  This setting automatically uses a receive buffer size
  that ensures only a single AM will be received per "multi-receive" buffer,
  and the default for GASNET_OFI_NUM_RECEIVE_BUFFS is adjusted to compensate.
  Empirical evidence shows this prevents the failures in applications which
  exhibit them (which is *not* every application).  However, this measurably
  increases the latency of small AM operations.  So, this should only be used
  if it is suspected that bug 4461 is impacting your application.
  The most up-to-date information on this bug is maintained at:
    https://gasnet-bugs.lbl.gov/bugzilla/show_bug.cgi?id=4461

* Bug 4507
  Currently, ofi-conduit's implementation of RMA and AM Long do not correctly
  honor the `max_msg_size` of the libfabric endpoint.  This can lead to fatal
  errors for very large RMA operations or AM Long payloads.  Currently, the
  the smallest known limit of any provider is 1GiB in a single transfer.
  There are no known work-arounds other than for the client code to break such
  transfers into smaller sizes.
  The most up-to-date information on this bug, including the observed limits
  for some providers, is maintained at:
    https://gasnet-bugs.lbl.gov/bugzilla/show_bug.cgi?id=4507

* See docs/memory_kinds.md in the GASNet-EX sources for current information
  regarding known issues with respect to memory kinds support in ofi-conduit.

* See the GASNet Bugzilla server for details on additional known bugs:
  https://gasnet-bugs.lbl.gov/

* Limits to MPI interoperability
  Depending on the libfabric provider in use, it may not be possible to have both
  MPI and GASNet using the native network API in the same application.  There
  are known issues with the psm2 and cxi providers, though the latter's issue
  also represents a general concern.

  The psm2 provider has the property that each process may only open
  the network adapter once.  If you wish to use MPI and GASNet in the same
  application on the Intel Omni-Path fabric, then there are two options:
    1. GASNet may be configured to use an alternative transport.
       Options include mpi- and udp-conduits.
    2. MPI may be configured to use an alternative transport (most likely TCP).
  The relative performance implications of these options depends strongly on the
  properties of each application.

  The cxi provider requires use of `FI_` environment variables to configure the
  provider to reliably handle Active Messages under heavy loads.  It has been
  observed that HPE Cray MPI uses the same settings, but in a way which is
  not compatible with ofi-conduit's needs.  As a result, initializing MPI
  prior to GASNet will yield less reliable Active Messages and possible hangs
  or crashes as a result.  Therefore, when an ofi-conduit application is
  launched using `srun`, care is taken to avoid initializing MPI as the spawer.
  To ensure those automatic measures are sufficient, we have the following
  recommendations for use of ofi-conduit on the HPE Cray EX platform:
    + Do not use MPI for job launch:
      Configure using `--with-ofi-spawner=pmi --with-pmi-version=cray` to
      ensure the `gasnetrun_ofi` script defaults to use of PMI-based job
      spawning, and that configure will fail if the necessary Cray PMI library
      cannot be located.  This latter point is important even if one is not
      using the `gasnetrun_ofi` script.
    + Hybrid applications should initialize GASNet before MPI.
  The most up-to-date information on this issue is maintained at:
    https://gasnet-bugs.lbl.gov/bugzilla/show_bug.cgi?id=4455

  With the gni provider, when Cray MPI is initialized prior to GASNet (including
  via use of MPI for job launch), a fatal error is seen at startup which
  includes the following text:
    fi_domain failed: -16(Device or resource busy)
  Meanwhile, hybrid applications which initialize GASNet before MPI have been
  observed to run correctly.
  The same recommendations given above for cxi provider apply to gni provider.

  As noted above, the specific issue with the cxi provider also represents a
  more general concern that an MPI implementation may configure libfabric in
  a way which is incompatible with GASNet's ofi-conduit OR that GASNet's
  settings could cause problems for the MPI implementation.  However, there are
  currently no known cases of either such interaction other than with the cxi
  and gni providers as described above.

Future work:
------------

The OFI community is working on increasing the number of providers that can support GASNet.

* Restore "RDMADISSEM" as the default barrier for those providers where it is
  safe to do so (see bug 4427) or deploy a better alternative (perhaps via
  FI_COLLECTIVE)

* Implement AMLong via target-side reassembly using fi_writedata()

* Implement asynchronous LC support for Long, most likely as part of the
  work to implement target-side reassembly.

* Improve non-trivial support for `GEX_FLAG_IMMEDIATE` with AM Long, once
  target-side reassembly removes the "sync" of "put-sync-send".

* Improve asynchronous LC support for Put.  Like libibverbs, libfabric
  does not provide distinct events for local versus remote completion.
  Therefore, the current support for `GEX_EVENT_{DEFER,GROUP}` signals local
  completion immediately before remote/operation completion. However, use of
  `FI_INJECT` could be used to achieve synchronous completion of small Puts
  in the same manner that ibv-conduit utilizes inline sends.

* Implement native support for Negotiated-Payload Active Message (NPAM)

* Eliminate a buffer copy for AM Medium (and AM Long below the RMA threshold)
  by using (iov_count > 1) on providers which support it, along with the
  corresponding asynchronous LC support for AM Medium.

* Implement native atomics via FI_ATOMIC

* Implement native collectives via FI_COLLECTIVE

* Implement Memory Kinds support using the FI_HMEM capability