[Cfp-interest] Berkeley feedback - substitution

Ian McIntosh ianm at ca.ibm.com
Fri May 23 19:15:46 PDT 2014


I understand the interest in presubstitution, but I'm not aware of any
existing hardware to do it automatically.  That means it has to be done by
software, presumably mostly by compilers in response to user code.  I think
it will be challenging to get it approved or implemented.

For PowerPC there are several possible implementations.  These are slightly
simplified:

   Trap to a signal handler:
   Before the monitored code, activate a signal handler for SIGTRAP or
   SIGFPE (which one depends on whether the system is AIX or Linux and, on
   AIX, on the mechanism the user chose to detect exceptions) if it isn't
   already active; this requires a system call.  Store the substitution
   values in some agreed-on place, probably an extern or thread-local
   struct variable, with the old value saved and later restored in case of
   recursion.  Save the Floating Point Status and Control Register (FPSCR)
   and the FE0+FE1 bits in the Machine State Register that enable traps,
   then change them to enable exception trapping for the relevant
   exceptions; changing trap enablement requires a system call.
   After the monitored code, restore the FPSCR control but not status bits,
   the FE0+FE1 trap enablement (via a system call) and the substitution
   struct's previous values and deactivate the signal handler if necessary
   (via a system call).
   During the monitored code, an exception of interest would trap to the
   operating system which after many instructions would activate the signal
   handler.  That would analyze the situation, find which instruction
   trapped instead of executing, what its result register was, deduce
   somehow what exception would have occurred (maybe by temporarily
   disabling exception trapping then executing just that instruction?),
   find the appropriate substitution value in the agreed on place, set the
   result register to the substitution value, adjust the instruction
   address register (PC) to point at the next instruction, then return.
   Obviously this is expensive.

   Test after each operation:
   Before the monitored code, store the substitution values in some agreed
   on place (which in this case can be a local auto variable), save the
   FPSCR and change it to enable exception reporting for the relevant
   exceptions and disable FE0+FE1 exception trapping (via a system call).
   After the monitored code, restore the FPSCR control but not status bits
   and FE0+FE1 exception trapping (via a system call).
   During the monitored code, an exception of interest would not trap.
   Each floating-point instruction would be a "record form instruction",
   containing a control bit telling the CPU to copy 4 specific flags from
   the FPSCR to a condition register.  Also each function call would be
   followed by a floating-point copy instruction to detect any exceptions
   that occurred in the function.  Each floating-point instruction would be
   followed by a conditional branch checking a specific condition register
   bit (copied from the FPSCR) indicating whether any enabled exception
   status bit was set.  If yes, code would be executed to analyze the FPSCR
   to see what exception occurred, find the appropriate substitution value
   in the agreed on place and set the result register to the substitution
   value, then continue.
   If unsubstituted exceptions are supposed to trap to a signal handler
   then things get more complicated.
   Unfortunately adding a check after each instruction destroys
   parallelism.
   This approach is expensive but for small blocks much less expensive than
   trapping to a signal handler.

   Test after the code block:
   This would be similar to testing after each operation, but would test
   only at the end of the code block.
   Depending on the semantics it may be necessary to either test before
   each store or to defer all stores until after the single test at the
   end.
   This would be less expensive than trapping to a signal handler or
   testing after each operation.

   Test instruction results:
   It may be cheaper to explicitly test the result of every floating-point
   instruction in the block, especially if only NaNs or Infinities or Zeros
   are being substituted.
   If the program had enabled trapping on exceptions, that would have to be
   disabled then restored, but that could be done by changing the FPSCR to
   not report the monitored exceptions.
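
Testing instruction results can be sketched in C without touching the
flags at all.  The helper and function names below are hypothetical.  Note
one limitation of this sketch: a NaN result looks the same whether the
operation just raised invalid or merely propagated a quiet NaN operand, so
it substitutes in both cases, and it treats +Infinity and -Infinity alike.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Replace a NaN or infinite result by the corresponding substitution. */
static inline double subst_if_special(double r, double for_nan, double for_inf)
{
    if (isnan(r))
        return for_nan;
    if (isinf(r))
        return for_inf;   /* either sign of infinity */
    return r;
}

/* Sum of num[i] / den[i], testing the result of every operation. */
double sum_of_ratios(const double *num, const double *den, size_t n,
                     double for_nan, double for_inf)
{
    assert(num != NULL && den != NULL);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double q = subst_if_special(num[i] / den[i], for_nan, for_inf);
        sum = subst_if_special(sum + q, for_nan, for_inf);
    }
    return sum;
}
```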

   Let the user handle it:
   In many applications the least expensive way is to let the program do it
   itself.  None of the above can meet the stated requirement of being no
   more expensive than a [compare and] conditional branch.

Assuming substitution would apply to all precisions, the substitution
values must either be translated to all precisions in advance, or more
likely translated to the required precision on demand.  Or would the user
have to specify the appropriate value for each precision in the block?

Combining substitution with other special handling adds a little more
complexity.

Most of the exceptions are mutually exclusive, but inexact can occur with
some of the others.  If inexact is handled we will have to define which
substitution wins.

The PowerPC has specific invalid operation flags ORed together into the
IEEE Invalid Operation, so users could distinguish between oo-oo, oo/oo,
0/0, 0*oo, sqrt(-1), (int)(1.E200), comparing to an SNaN or a library
function domain or range error.
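
For reference, a small C sketch that decodes those specific invalid
operation bits from a 32-bit FPSCR image.  The bit positions follow the
Power ISA layout of the FPSCR (IBM numbering, bit 0 most significant); the
struct, table and function names are hypothetical.  The sketch takes the
register value as a parameter and does not read the register itself, which
would need a PowerPC-specific instruction such as mffs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IBM bit n of the 32-bit FPSCR image corresponds to 1 << (31 - n). */
#define FPSCR_BIT(n) (UINT32_C(1) << (31 - (n)))

typedef struct {
    uint32_t    mask;
    const char *name;
    const char *cause;
} vx_cause;

static const vx_cause vx_causes[] = {
    { FPSCR_BIT(7),  "VXSNAN", "operation on a signaling NaN"   },
    { FPSCR_BIT(8),  "VXISI",  "infinity - infinity"            },
    { FPSCR_BIT(9),  "VXIDI",  "infinity / infinity"            },
    { FPSCR_BIT(10), "VXZDZ",  "0 / 0"                          },
    { FPSCR_BIT(11), "VXIMZ",  "infinity * 0"                   },
    { FPSCR_BIT(12), "VXVC",   "invalid compare"                },
    { FPSCR_BIT(21), "VXSOFT", "software-requested condition"   },
    { FPSCR_BIT(22), "VXSQRT", "invalid square root"            },
    { FPSCR_BIT(23), "VXCVI",  "invalid conversion to integer"  },
};

/*
 * Store pointers to the table entries whose bits are set in fpscr,
 * up to cap entries, and return how many were stored.
 */
size_t decode_invalid(uint32_t fpscr, const vx_cause *out[], size_t cap)
{
    assert(out != NULL || cap == 0);

    size_t count = 0;
    for (size_t i = 0; i < sizeof vx_causes / sizeof vx_causes[0]; i++) {
        if ((fpscr & vx_causes[i].mask) != 0 && count < cap)
            out[count++] = &vx_causes[i];
    }
    return count;
}
```

A substitution mechanism that wanted a different value for 0/0 than for
oo-oo could key its lookup on these entries instead of on the single IEEE
invalid flag.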

On some systems SIMD operations might add additional complexity.


How would this be implemented on other architectures that you're familiar
with?

- Ian McIntosh
  IBM Canada Lab
  Compiler Back End Support and Development



From:     David Hough CFP <pcfp at oakapple.net>
To:       cfp-interest at ucbtest.org
Date:     2014-05-23 05:21 PM
Subject:  [Cfp-interest] Berkeley feedback
Sent by:  cfp-interest-bounces at oakapple.net






I talked about my ideas for alternate exception handling yesterday at
Berkeley.  I'd say that the main feedback, especially from Kahan, is that
the macro syntax is too complicated.  Others pointed out that nested
macros will look especially unattractive.  And the pragma syntax
isn't much better.

Kahan was particularly adamant that some kind of presubstitution mechanism
be available, particularly because, lacking explicit branches, it made
automated (and perhaps manual) theorem proving more tractable.

We agreed that the continued fraction with derivative example was not a
very compelling argument for presubstitution, because the presubstituted
value changes in the inner loop, and it's very doubtful that any
presubstitution mechanism can be as fast as conditional branches
on operands or results in the source code, especially if they get
compiled into correctly-predicted branches or conditional moves.

One of the other attendees pointed out that the continued fraction example
is a good candidate for executing the entire loop under FP_EXCEPTIONAL
as if no invalid would occur, and then interrupting and recomputing more
carefully in the infrequent occasions when invalid does occur.
That corresponds to the way I imagine FP_EXCEPTIONAL could be applied
to the dnrm2 problem, although Kahan has another algorithm for dnrm2
that is supposed to avoid exceptions and only invoke scaling if
necessary.


_______________________________________________
Cfp-interest mailing list
Cfp-interest at oakapple.net
http://mailman.oakapple.net/mailman/listinfo/cfp-interest

