checkpoint_here, CHECKPOINT_HERE

checkpoint_here (for C programs) and CHECKPOINT_HERE (for FORTRAN programs) initiate a user-directed checkpoint.

Synopsis

For C programs:

int checkpoint_here(int flag)

err = checkpoint_here(flag)


For FORTRAN programs:

EXTERNAL CHECKPOINT_HERE
INTEGER CHECKPOINT_HERE
INTEGER FLAG, ERR
INTEGER CKPT_IMMEDIATE, CKPT_PERIODIC
PARAMETER (CKPT_IMMEDIATE=0, CKPT_PERIODIC=1)

ERR = CHECKPOINT_HERE(FLAG)

Description

checkpoint_here must be called synchronously by every node in an application. Depending on the value of flag, it then initiates a concurrent checkpoint. flag must be either CKPT_IMMEDIATE or CKPT_PERIODIC.

If CKPT_IMMEDIATE is passed to checkpoint_here, a checkpoint is started immediately.

If CKPT_PERIODIC is passed instead, all nodes first coordinate to check whether a timer has expired since the last checkpoint or, if no checkpoint has been taken yet, since the program started. This timer is maintained by the node 0 process of the application. If the timer has expired, checkpoint_here initiates a checkpoint.
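The periodic test can be modeled roughly as follows. This is a minimal sketch, not the library's implementation: the names ckpt_timer, ckpt_timer_init and ckpt_timer_expired are hypothetical, times are plain seconds supplied by the caller, and the checkpoint interval is an assumed parameter because this page does not say how it is configured. In the real library, node 0 would presumably make this decision and share it with the other nodes during the coordination step.

```c
#include <stdbool.h>

/* Hypothetical model of the periodic-checkpoint timer kept on node 0.
 * The interval and how it is set are assumptions, not documented. */
typedef struct {
    double interval; /* seconds between periodic checkpoints (assumed) */
    double last;     /* time of the last checkpoint, or program start */
} ckpt_timer;

/* Start the timer when the program starts. */
static void ckpt_timer_init(ckpt_timer *t, double start, double interval) {
    t->interval = interval;
    t->last = start;
}

/* Evaluated on node 0 when all nodes reach checkpoint_here(CKPT_PERIODIC).
 * Returns true if a checkpoint should be taken now; in that case the
 * timer is restarted from this checkpoint. */
static bool ckpt_timer_expired(ckpt_timer *t, double now) {
    if (now - t->last >= t->interval) {
        t->last = now;
        return true;
    }
    return false;
}
```

With a 60-second interval started at time 0, a call at 30 s does not checkpoint, a call at 60 s does, and the next checkpoint is then due no earlier than 120 s.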

Return Value

Both checkpoint_here and CHECKPOINT_HERE return 0 on success and a negative value on failure. If the program is later recovered from a checkpoint initiated by a particular checkpoint_here call, execution resumes as if that call had just returned, this time with a return value of 1.
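Because the same call site can return either after taking a checkpoint or after a recovery, it can help to classify the return value explicitly. The sketch below only encodes the rules stated above; the enum and function names are hypothetical, and values other than 0, 1 and negative numbers are treated as unknown because this page does not document them.

```c
/* Hypothetical classification of a checkpoint_here return value. */
enum ckpt_outcome {
    CKPT_OUTCOME_FAILED,    /* negative: the call failed */
    CKPT_OUTCOME_OK,        /* 0: the call succeeded */
    CKPT_OUTCOME_RECOVERED, /* 1: execution resumed here after recovery */
    CKPT_OUTCOME_UNKNOWN    /* any other value: not documented */
};

static enum ckpt_outcome classify_ckpt_result(int err) {
    if (err < 0)
        return CKPT_OUTCOME_FAILED;
    if (err == 0)
        return CKPT_OUTCOME_OK;
    if (err == 1)
        return CKPT_OUTCOME_RECOVERED;
    return CKPT_OUTCOME_UNKNOWN;
}
```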

Example

      CALL MULT_MATRIX(A, B, C)
      ERR = CHECKPOINT_HERE(CKPT_IMMEDIATE)
C     Any negative value means failure; stop on every node,
C     but report the error only from node 0.
      IF (ERR .LT. 0) THEN
         IF (MYNODE() .EQ. 0) THEN
            WRITE (*,*) 'CHECKPOINT_HERE FAILED with ', ERR
         ENDIF
         STOP
      ENDIF
      IF (ERR .EQ. 1 .AND. MYNODE() .EQ. 0) THEN
         WRITE (*,*) 'WE JUST RECOVERED FROM A CHECKPOINT'
      ENDIF
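A C program can follow the same pattern. The sketch below factors the result handling into a hypothetical helper, handle_ckpt_result, so that it compiles without the checkpointing library; mynode stands for the node number that MYNODE() returns in the FORTRAN example, and how a C program obtains it is an assumption not covered by this page.

```c
#include <stdio.h>

/* Hypothetical handler for the value returned by checkpoint_here.
 * err:    the return value of checkpoint_here
 * mynode: this process's node number (MYNODE() in the FORTRAN example)
 * Returns 1 if every node should stop, 0 if execution should continue. */
static int handle_ckpt_result(int err, int mynode) {
    if (err < 0) {
        /* Failure: only node 0 reports it, but all nodes should stop. */
        if (mynode == 0)
            fprintf(stderr, "checkpoint_here failed with %d\n", err);
        return 1;
    }
    if (err == 1 && mynode == 0)
        printf("we just recovered from a checkpoint\n");
    return 0;
}
```

In a real program the call site would look like err = checkpoint_here(CKPT_IMMEDIATE); followed by if (handle_ckpt_result(err, mynode)) exit(EXIT_FAILURE);, with CKPT_IMMEDIATE taken from the library's own definitions.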


Copyright, Princeton University