INTRO_SHMEM(3)                     Open MPI                     INTRO_SHMEM(3)

NAME
       intro_shmem - Introduction to the OpenSHMEM programming model

DESCRIPTION
       The  SHMEM  programming model consists of library routines that provide
       low-latency, high-bandwidth communication  for  use  in  highly  paral-
       lelized  scalable  programs.  The routines in the OpenSHMEM application
       programming interface (API) provide a programming model for  exchanging
       data between cooperating parallel processes. The resulting programs are
       similar in style to Message Passing Interface (MPI) programs. The SHMEM
       API can be used either alone or in combination with MPI routines in the
       same parallel program.

       An OpenSHMEM program is SPMD (single program, multiple data) in  style.
       The  SHMEM  processes,  called processing elements or PEs, all start at
       the same time and they all run the same program. Usually the  PEs  per-
       form  computation on their own subdomains of the larger problem and pe-
       riodically communicate with other PEs to exchange information on  which
       the next computation phase depends.

       The  OpenSHMEM  routines  minimize  the  overhead  associated with data
       transfer requests, maximize bandwidth and minimize data  latency.  Data
       latency  is the period of time that starts when a PE initiates a trans-
       fer of data and ends when a PE can use the  data.   OpenSHMEM  routines
       support  remote  data  transfer  through put operations, which transfer
       data to a different PE, get operations, which transfer data from a dif-
       ferent  PE,  and remote pointers, which allow direct references to data
       objects owned by another PE. Other operations supported are  collective
       broadcast and reduction, barrier synchronization, and atomic memory op-
       erations. An atomic memory operation is an atomic read-and-update oper-
       ation, such as a fetch-and-increment, on a remote or local data object.
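
       The read-and-update semantics of an atomic memory operation can be
       modeled on a single process with C11 atomics. The plain-C sketch
       below does not call the OpenSHMEM library, and the helper names
       model_finc and model_fadd are hypothetical. As with shmem_int_finc(3)
       and shmem_int_fadd(3), each helper returns the value the object held
       before the update.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of an atomic fetch-and-increment such as
 * shmem_int_finc(3): add 1 to *obj and return the value it held before. */
int model_finc(atomic_int *obj)
{
    return atomic_fetch_add(obj, 1);
}

/* Hypothetical model of an atomic fetch-and-add such as shmem_int_fadd(3):
 * add value to *obj and return the previous contents. */
int model_fadd(atomic_int *obj, int value)
{
    return atomic_fetch_add(obj, value);
}
```

       Because the old value is returned atomically, several PEs that
       fetch-and-increment the same counter each receive a distinct value.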

OPENSHMEM ROUTINES
       This section lists the significant OpenSHMEM routines, grouped by function.

       PE queries

              *      C/C++ only:

                     *      _num_pes(3)

                     *      _my_pe(3)

              *      Fortran only:

                     *      NUM_PES(3)

                     *      MY_PE(3)

       Elemental data put routines

              *      C/C++ only:

                     *      shmem_double_p(3)

                     *      shmem_float_p(3)

                     *      shmem_int_p(3)

                     *      shmem_long_p(3)

                     *      shmem_short_p(3)

       Block data put routines

              *      C/C++ and Fortran:

                     *      shmem_put32(3)

                     *      shmem_put64(3)

                     *      shmem_put128(3)

              *      C/C++ only:

                     *      shmem_double_put(3)

                     *      shmem_float_put(3)

                     *      shmem_int_put(3)

                     *      shmem_long_put(3)

                     *      shmem_short_put(3)

              *      Fortran only:

                     *      shmem_complex_put(3)

                     *      shmem_integer_put(3)

                     *      shmem_logical_put(3)

                     *      shmem_real_put(3)

       Elemental data get routines

              *      C/C++ only:

                     *      shmem_double_g(3)

                     *      shmem_float_g(3)

                     *      shmem_int_g(3)

                     *      shmem_long_g(3)

                     *      shmem_short_g(3)

       Block data get routines

              *      C/C++ and Fortran:

                     *      shmem_get32(3)

                     *      shmem_get64(3)

                     *      shmem_get128(3)

              *      C/C++ only:

                     *      shmem_double_get(3)

                     *      shmem_float_get(3)

                     *      shmem_int_get(3)

                     *      shmem_long_get(3)

                     *      shmem_short_get(3)

              *      Fortran only:

                     *      shmem_complex_get(3)

                     *      shmem_integer_get(3)

                     *      shmem_logical_get(3)

                     *      shmem_real_get(3)

       Strided put routines

              *      C/C++ and Fortran:

                     *      shmem_iput32(3)

                     *      shmem_iput64(3)

                     *      shmem_iput128(3)

              *      C/C++ only:

                     *      shmem_double_iput(3)

                     *      shmem_float_iput(3)

                     *      shmem_int_iput(3)

                     *      shmem_long_iput(3)

                     *      shmem_short_iput(3)

              *      Fortran only:

                     *      shmem_complex_iput(3)

                     *      shmem_integer_iput(3)

                     *      shmem_logical_iput(3)

                     *      shmem_real_iput(3)
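
       A strided put copies nelems elements, reading every sst-th element of
       source and writing every tst-th element of target. The following
       plain-C sketch models that index mapping on a single process; both
       arrays are local, whereas a real shmem_int_iput(3) writes target on a
       remote PE. The name model_int_iput is hypothetical, and the model
       assumes positive strides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-process model of a strided put such as
 * shmem_int_iput(3): element i of the transfer is read from
 * source[i * sst] and written to target[i * tst]. */
void model_int_iput(int *target, const int *source,
                    ptrdiff_t tst, ptrdiff_t sst, size_t nelems)
{
    assert(tst > 0 && sst > 0); /* this model handles positive strides only */
    for (size_t i = 0; i < nelems; i++)
        target[(ptrdiff_t)i * tst] = source[(ptrdiff_t)i * sst];
}
```

       For example, tst = 2 and sst = 1 scatters three contiguous source
       elements into every other element of target. The strided get
       routines use the same mapping with source on the remote PE.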

       Strided get routines

              *      C/C++ and Fortran:

                     *      shmem_iget32(3)

                     *      shmem_iget64(3)

                     *      shmem_iget128(3)

              *      C/C++ only:

                     *      shmem_double_iget(3)

                     *      shmem_float_iget(3)

                     *      shmem_int_iget(3)

                     *      shmem_long_iget(3)

                     *      shmem_short_iget(3)

              *      Fortran only:

                     *      shmem_complex_iget(3)

                     *      shmem_integer_iget(3)

                     *      shmem_logical_iget(3)

                     *      shmem_real_iget(3)

       Point-to-point synchronization routines

              *      C/C++ only:

                     *      shmem_int_wait(3)

                     *      shmem_int_wait_until(3)

                     *      shmem_long_wait(3)

                     *      shmem_long_wait_until(3)

                     *      shmem_longlong_wait(3)

                     *      shmem_longlong_wait_until(3)

                     *      shmem_short_wait(3)

                     *      shmem_short_wait_until(3)

              *      Fortran only:

                     *      shmem_int4_wait(3)

                     *      shmem_int4_wait_until(3)

                     *      shmem_int8_wait(3)

                     *      shmem_int8_wait_until(3)
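
       The wait_until routines block until a local symmetric variable,
       typically updated by another PE, satisfies a comparison with a given
       value. The plain-C sketch below models that polling on a single
       process with an atomic variable. The enum model_cmp is a hypothetical
       stand-in for the SHMEM_CMP_* constants accepted by the real routines,
       and no OpenSHMEM library call is made.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical comparison codes mirroring the SHMEM_CMP_* constants. */
enum model_cmp { MODEL_CMP_EQ, MODEL_CMP_NE, MODEL_CMP_GT,
                 MODEL_CMP_LE, MODEL_CMP_LT, MODEL_CMP_GE };

/* True when "current cmp value" holds. */
bool model_cmp_holds(int current, enum model_cmp cmp, int value)
{
    switch (cmp) {
    case MODEL_CMP_EQ: return current == value;
    case MODEL_CMP_NE: return current != value;
    case MODEL_CMP_GT: return current > value;
    case MODEL_CMP_LE: return current <= value;
    case MODEL_CMP_LT: return current < value;
    case MODEL_CMP_GE: return current >= value;
    }
    assert(!"unknown comparison");
    return false;
}

/* Spin until *ivar satisfies the comparison and return the value seen.
 * In a real program another PE changes *ivar with a put or an atomic. */
int model_wait_until(atomic_int *ivar, enum model_cmp cmp, int value)
{
    int seen;
    while (!model_cmp_holds(seen = atomic_load(ivar), cmp, value))
        ; /* busy-wait until the condition holds */
    return seen;
}
```

       In this model, shmem_int_wait(ivar, value), which returns once *ivar
       differs from value, corresponds to the MODEL_CMP_NE case.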

       Barrier synchronization routines

              *      C/C++ and Fortran:

                     *      shmem_barrier_all(3)

                     *      shmem_barrier(3)

       Atomic memory fetch-and-operate (fetch-op) routines

              *      C/C++ and Fortran:

                     *      shmem_swap(3)

       Reduction routines

              *      C/C++ only:

                     *      shmem_int_and_to_all(3)

                     *      shmem_long_and_to_all(3)

                     *      shmem_longlong_and_to_all(3)

                     *      shmem_short_and_to_all(3)

                     *      shmem_double_max_to_all(3)

                     *      shmem_float_max_to_all(3)

                     *      shmem_int_max_to_all(3)

                     *      shmem_long_max_to_all(3)

                     *      shmem_longlong_max_to_all(3)

                     *      shmem_short_max_to_all(3)

                     *      shmem_double_min_to_all(3)

                     *      shmem_float_min_to_all(3)

                     *      shmem_int_min_to_all(3)

                     *      shmem_long_min_to_all(3)

                     *      shmem_longlong_min_to_all(3)

                     *      shmem_short_min_to_all(3)

                     *      shmem_double_sum_to_all(3)

                     *      shmem_float_sum_to_all(3)

                     *      shmem_int_sum_to_all(3)

                     *      shmem_long_sum_to_all(3)

                     *      shmem_longlong_sum_to_all(3)

                     *      shmem_short_sum_to_all(3)

                     *      shmem_double_prod_to_all(3)

                     *      shmem_float_prod_to_all(3)

                     *      shmem_int_prod_to_all(3)

                     *      shmem_long_prod_to_all(3)

                     *      shmem_longlong_prod_to_all(3)

                     *      shmem_short_prod_to_all(3)

                     *      shmem_int_or_to_all(3)

                     *      shmem_long_or_to_all(3)

                     *      shmem_longlong_or_to_all(3)

                     *      shmem_short_or_to_all(3)

                     *      shmem_int_xor_to_all(3)

                     *      shmem_long_xor_to_all(3)

                     *      shmem_longlong_xor_to_all(3)

                     *      shmem_short_xor_to_all(3)

              *      Fortran only:

                     *      shmem_int4_and_to_all(3)

                     *      shmem_int8_and_to_all(3)

                     *      shmem_real4_max_to_all(3)

                     *      shmem_real8_max_to_all(3)

                     *      shmem_int4_max_to_all(3)

                     *      shmem_int8_max_to_all(3)

                     *      shmem_real4_min_to_all(3)

                     *      shmem_real8_min_to_all(3)

                     *      shmem_int4_min_to_all(3)

                     *      shmem_int8_min_to_all(3)

                     *      shmem_real4_sum_to_all(3)

                     *      shmem_real8_sum_to_all(3)

                     *      shmem_int4_sum_to_all(3)

                     *      shmem_int8_sum_to_all(3)

                     *      shmem_real4_prod_to_all(3)

                     *      shmem_real8_prod_to_all(3)

                     *      shmem_int4_prod_to_all(3)

                     *      shmem_int8_prod_to_all(3)

                     *      shmem_int4_or_to_all(3)

                     *      shmem_int8_or_to_all(3)

                     *      shmem_int4_xor_to_all(3)

                     *      shmem_int8_xor_to_all(3)
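
       Each *_sum_to_all routine combines element j of source from every PE
       in the active set and leaves the result in element j of target on
       all of those PEs. The plain-C sketch below models only the
       arithmetic of shmem_int_sum_to_all(3) on a single process: the
       per-PE source arrays are assumed to be rows of a local matrix, and
       the name model_int_sum_to_all is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the result of shmem_int_sum_to_all(3).
 * sources holds npes rows of nreduce ints (row-major), one row per PE;
 * target[j] receives the sum of column j. */
void model_int_sum_to_all(int *target, const int *sources,
                          size_t npes, size_t nreduce)
{
    assert(npes > 0);
    for (size_t j = 0; j < nreduce; j++) {
        int sum = 0;
        for (size_t pe = 0; pe < npes; pe++)
            sum += sources[pe * nreduce + j];
        target[j] = sum;
    }
}
```

       The and, or, xor, max, min and prod variants differ only in the
       operator used to combine elements.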

       Broadcast routines

              *      C/C++ and Fortran:

                     *      shmem_broadcast32(3)

                     *      shmem_broadcast64(3)

       Cache management routines

              *      C/C++ and Fortran:

                     *      shmem_udcflush(3)

                     *      shmem_udcflush_line(3)

       Byte-granularity block put and get routines

              *      C/C++ and Fortran:

                     *      shmem_putmem(3)

                     *      shmem_getmem(3)

              *      Fortran only:

                     *      shmem_character_put(3)

                     *      shmem_character_get(3)

       Collect routines

              *      C/C++ and Fortran:

                     *      shmem_collect32(3)

                     *      shmem_collect64(3)

                     *      shmem_fcollect32(3)

                     *      shmem_fcollect64(3)

       Atomic memory fetch-and-operate (fetch-op) routines

              *      C/C++ only:

                     *      shmem_double_swap(3)

                     *      shmem_float_swap(3)

                     *      shmem_int_cswap(3)

                     *      shmem_int_fadd(3)

                     *      shmem_int_finc(3)

                     *      shmem_int_swap(3)

                     *      shmem_long_cswap(3)

                     *      shmem_long_fadd(3)

                     *      shmem_long_finc(3)

                     *      shmem_long_swap(3)

                     *      shmem_longlong_cswap(3)

                     *      shmem_longlong_fadd(3)

                     *      shmem_longlong_finc(3)

                     *      shmem_longlong_swap(3)

              *      Fortran only:

                     *      shmem_int4_cswap(3)

                     *      shmem_int4_fadd(3)

                     *      shmem_int4_finc(3)

                     *      shmem_int4_swap(3)

                     *      shmem_int8_swap(3)

                     *      shmem_real4_swap(3)

                     *      shmem_real8_swap(3)

                     *      shmem_int8_cswap(3)
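
       The swap routines store a new value and return the old one; the
       conditional swap (cswap) routines store the new value only when the
       current contents equal a given condition value, and return the prior
       contents in either case. The plain-C sketch below models both on a
       single process with C11 atomics; model_swap and model_cswap are
       hypothetical names, not OpenSHMEM routines.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of shmem_int_swap(3): store value, return old. */
int model_swap(atomic_int *target, int value)
{
    return atomic_exchange(target, value);
}

/* Hypothetical model of shmem_int_cswap(3): store value only if the
 * current contents equal cond; always return the prior contents. */
int model_cswap(atomic_int *target, int cond, int value)
{
    int expected = cond;
    /* On success expected keeps cond (the old value); on failure it is
     * overwritten with the current value. Either way it is the old value. */
    atomic_compare_exchange_strong(target, &expected, value);
    return expected;
}
```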

       Atomic memory operation routines

              *      Fortran only:

                     *      shmem_int4_add(3)

                     *      shmem_int4_inc(3)

       Remote memory pointer function

              *      C/C++ and Fortran:

                     *      shmem_ptr(3)

       Reduction routines

              *      C/C++ only:

                     *      shmem_longdouble_max_to_all(3)

                     *      shmem_longdouble_min_to_all(3)

                     *      shmem_longdouble_prod_to_all(3)

                     *      shmem_longdouble_sum_to_all(3)

              *      Fortran only:

                     *      shmem_real16_max_to_all(3)

                     *      shmem_real16_min_to_all(3)

                     *      shmem_real16_prod_to_all(3)

                     *      shmem_real16_sum_to_all(3)

       Accessibility query routines

              *      C/C++ and Fortran:

                     *      shmem_pe_accessible(3)

                     *      shmem_addr_accessible(3)

       Symmetric Data Objects

       Consistent  with  the SPMD nature of the OpenSHMEM programming model is
       the concept of symmetric data objects. These are  arrays  or  variables
       that  exist  with the same size, type, and relative address on all PEs.
       Another term for symmetric data objects is  "remotely  accessible  data
       objects".   In  the  interface  definitions for OpenSHMEM data transfer
       routines, one or more of the parameters are typically  required  to  be
       symmetric or remotely accessible.

       The following kinds of data objects are symmetric:

              *      Fortran  data  objects  in common blocks or with the SAVE
                     attribute. These data objects must not be  defined  in  a
                     dynamic shared object (DSO).

              *      Non-stack  C  and  C++ variables. These data objects must
                     not be defined in a DSO.

              *      Fortran arrays allocated with shpalloc(3F).

              *      C and C++ data allocated by shmalloc(3C).
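
       Because a symmetric object has the same relative address on every PE,
       the remote copy of an object can be located from its offset within a
       symmetric region. The plain-C sketch below assumes that strategy,
       which is one possible implementation approach rather than documented
       Open MPI behavior; the region bases and the name model_remote_addr
       are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation of a local symmetric address into the address
 * of the same object in another PE's copy of the region. Assumes every
 * PE's region has the same layout, so only the base address differs. */
void *model_remote_addr(const void *local_addr, const void *local_base,
                        void *remote_base, size_t region_size)
{
    uintptr_t addr = (uintptr_t)local_addr;
    uintptr_t base = (uintptr_t)local_base;
    assert(addr >= base && addr - base < region_size); /* inside the region */
    return (char *)remote_base + (addr - base);
}
```

       A stack variable has no such fixed offset, which is why stack data is
       not symmetric.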

       Collective Routines
              Some  SHMEM  routines,  for  example,   shmem_broadcast(3)   and
              shmem_float_sum_to_all(3), are classified as collective routines
              because they distribute work across a set of PEs.  They must  be
              called  concurrently by all PEs in the active set defined by the
              PE_start, logPE_stride, PE_size argument triplet. The  following
              man pages describe the OpenSHMEM collective routines:

              *      shmem_and(3)

              *      shmem_barrier(3)

              *      shmem_broadcast(3)

              *      shmem_collect(3)

              *      shmem_max(3)

              *      shmem_min(3)

              *      shmem_or(3)

              *      shmem_prod(3)

              *      shmem_sum(3)

              *      shmem_xor(3)
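
       The active set consists of PE_size PEs, starting at PE_start and
       separated by a stride of 2 raised to the power logPE_stride. The
       plain-C sketch below computes the members of an active set from the
       triplet; the helper names are hypothetical and no OpenSHMEM routine
       is called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the i-th PE (0 <= i < PE_size) of the active set
 * described by the (PE_start, logPE_stride, PE_size) triplet. */
int model_active_set_pe(int PE_start, int logPE_stride, int PE_size, int i)
{
    assert(logPE_stride >= 0 && i >= 0 && i < PE_size);
    return PE_start + i * (1 << logPE_stride);
}

/* Hypothetical helper: true if pe belongs to the active set. */
bool model_in_active_set(int pe, int PE_start, int logPE_stride, int PE_size)
{
    assert(logPE_stride >= 0);
    int stride = 1 << logPE_stride;
    if (pe < PE_start || (pe - PE_start) % stride != 0)
        return false;
    return (pe - PE_start) / stride < PE_size;
}
```

       For example, PE_start = 1, logPE_stride = 1 and PE_size = 4 select
       PEs 1, 3, 5 and 7; every one of them must make the collective call.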

USING THE SYMMETRIC WORK ARRAY, PSYNC
       Multiple pSync arrays are often needed if a particular PE calls an
       OpenSHMEM collective routine twice without intervening barrier
       synchronization. Problems would occur if some PEs in the active set
       of the second call arrive at that call before all PEs in the active
       set of the first call have finished processing it, because the second
       call could then modify a pSync array that the first call is still
       using. You can use shmem_barrier(3) or shmem_barrier_all(3) to perform
       a barrier synchronization between consecutive calls to OpenSHMEM
       collective routines.

       There are two special cases:

       *      The  shmem_barrier(3)  routine allows the same pSync array to be
              used on consecutive calls as long as the active PE set does  not
              change.

       *      If the same collective routine is called multiple times with the
              same active set, the calls may alternate between two  pSync  ar-
              rays.  The  SHMEM  routines  guarantee that a first call is com-
              pletely finished by all PEs by the time processing  of  a  third
              call begins on any PE.

       Because the SHMEM routines restore pSync to its original contents, mul-
       tiple calls that use the same pSync array do not require that pSync  be
       reinitialized after the first call.
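
       The alternation described above can be sketched in plain C as two
       work arrays, initialized once and then selected by the parity of the
       call count. The size, the initial value, and the helper names are
       hypothetical; a real program uses the implementation's constants,
       such as SHMEM_BCAST_SYNC_SIZE and SHMEM_SYNC_VALUE, and must declare
       the arrays symmetric. No OpenSHMEM routine is called here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the implementation's sync-size and
 * initial-value constants. */
#define MODEL_SYNC_SIZE  8
#define MODEL_SYNC_VALUE (-1L)

/* File-scope (non-stack) arrays, as symmetric pSync arrays would be. */
static long psync_a[MODEL_SYNC_SIZE];
static long psync_b[MODEL_SYNC_SIZE];

/* Fill both work arrays with the initial value once, before first use. */
void model_init_psync(void)
{
    for (size_t i = 0; i < MODEL_SYNC_SIZE; i++)
        psync_a[i] = psync_b[i] = MODEL_SYNC_VALUE;
}

/* Pick the pSync array for the n-th consecutive call of the same
 * collective over the same active set: even calls use A, odd calls B,
 * so call n never reuses the array of call n - 1. */
long *model_psync_for_call(unsigned n)
{
    return (n % 2 == 0) ? psync_a : psync_b;
}
```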

SHMEM ENVIRONMENT VARIABLES
       This section lists the significant SHMEM environment variables.

       *      SMA_VERSION: print the library version at start-up.

       *      SMA_INFO: print helpful text about all these environment
              variables.

       *      SMA_SYMMETRIC_SIZE: number of bytes to allocate for the
              symmetric heap.

       *      SMA_DEBUG: enable debugging messages.

       The first call to SHMEM must be start_pes(3), or shmem_init(3), which
       replaces start_pes(3) as of OpenSHMEM 1.2. This routine initializes
       the SHMEM runtime.

       Calling any other SHMEM routine beforehand has undefined behavior.
       Multiple calls to this routine are not allowed.

COMPILING AND RUNNING OPENSHMEM PROGRAMS
       The  OpenSHMEM specification is silent regarding how OpenSHMEM programs
       are compiled, linked and run. This section shows some examples  of  how
       wrapper  programs could be utilized to compile and launch applications.
       The commands are styled after wrapper programs found in many MPI imple-
       mentations.

       The following sample command lines demonstrate compiling an OpenSHMEM
       program using a wrapper compiler (oshcc or oshfort in this case):

       *      C/C++:
              oshcc c_program.c

       *      Fortran:
              oshfort fortran_program.f

       The following sample command line demonstrates running an OpenSHMEM
       program, assuming that the library provides a wrapper script for this
       purpose (named oshrun for this example):

       oshrun -np 32 ./a.out

EXAMPLES
       Example 1: The following Fortran OpenSHMEM program directs all  PEs  to
       sum simultaneously the numbers in the VALUES variable across all PEs:
       PROGRAM REDUCTION
         REAL VALUES, SUM
         COMMON /C/ VALUES
         REAL WORK

         CALL START_PES(0)
         VALUES = MY_PE()
         CALL SHMEM_BARRIER_ALL ! Synchronize all PEs
         SUM = 0.0
         DO I = 0, NUM_PES()-1
           CALL SHMEM_REAL_GET(WORK, VALUES, 1, I) ! Get next value
           SUM = SUM + WORK                ! Sum it
         ENDDO
         PRINT *, 'PE ', MY_PE(), ' COMPUTED SUM=', SUM
         CALL SHMEM_BARRIER_ALL
       END

       Example  2:  The following C OpenSHMEM program transfers an array of 10
       longs from PE 0 to PE 1:
       #include <stdio.h>
       #include <shmem.h>

       int main(void) {
         long source[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
         static long target[10]; /* static, so symmetric on every PE */

         shmem_init();
         if (shmem_my_pe() == 0) {
           /* put 10 elements into target on PE 1 */
           shmem_long_put(target, source, 10, 1);
         }
         shmem_barrier_all(); /* sync sender and receiver */
         if (shmem_my_pe() == 1)
           printf("target[0] on PE %d is %ld\n", shmem_my_pe(), target[0]);
         shmem_finalize();
         return 0;
       }

SEE ALSO
       The following man pages also contain information on OpenSHMEM routines.
       See the specific man pages for implementation information.

       shmem_add(3),   shmem_and(3),  shmem_barrier(3),  shmem_barrier_all(3),
       shmem_broadcast(3), shmem_cache(3),  shmem_collect(3),  shmem_cswap(3),
       shmem_fadd(3),     shmem_fence(3),     shmem_finc(3),     shmem_get(3),
       shmem_iget(3),     shmem_inc(3),     shmem_iput(3),      shmem_lock(3),
       shmem_max(3), shmem_min(3), shmem_my_pe(3), shmem_or(3), shmem_prod(3),
       shmem_put(3),   shmem_quiet(3),   shmem_short_g(3),   shmem_short_p(3),
       shmem_sum(3), shmem_swap(3), shmem_wait(3), shmem_xor(3),
       shmem_pe_accessible(3), shmem_addr_accessible(3), shmem_init(3),
       shmem_malloc(3C), shmem_my_pe(3I), shmem_n_pes(3I)

4.1.2                            Nov 24, 2021                   INTRO_SHMEM(3)