Writing Sequential Code
=======================
In the software :doc:`introduction`, we showed how ordinary-looking Python
code could automatically be parallelized. For example,

.. testsetup:: introduction1

   # Patch Dfmux objects so we can fake board calls here.
   import pydfmux
   pydfmux.Dfmux.get_fir_stage = lambda s: 6
   pydfmux.ReadoutChannel.set_frequency = lambda *k, **a: None

   class UNITS(object):
       HZ = 'HZ'

   class TARGET(object):
       CARRIER = 'CARRIER'
       NULLER = 'NULLER'
       DEMOD = 'DEMOD'

   pydfmux.UNITS = UNITS
   pydfmux.TARGET = TARGET

.. doctest:: introduction1

   >>> import pydfmux
   >>> hwm = pydfmux.load_session('''
   ... !HardwareMap
   ... - !Dfmux { serial: "004" }
   ... - !Dfmux { serial: "019" }
   ... ''')
   >>> dfmuxes = hwm.query(pydfmux.Dfmux)
   >>> print dfmuxes.get_fir_stage()
   [6, 6]

In this example, we dispatched the same `get_fir_stage()` call to two
IceBoards in parallel. This is a simple but ideal case for dispatching
calls. The same example could have been coded sequentially:

.. doctest:: introduction1

   >>> dfmuxes = hwm.query(pydfmux.Dfmux)
   >>> for d in dfmuxes:
   ...     print d.get_fir_stage()
   6
   6

As you can imagine, the parallel version performs better with a large number
of Dfmuxes.
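
To measure the difference on real hardware, a rough timing harness like the
following sketch will do; it assumes the `hwm` object from the doctest above
and uses only the standard-library `time` module. The sequential loop costs
one full round-trip per board, so it grows linearly with board count, while
the parallel dispatch should stay roughly flat:

.. code-block:: python

   import time

   dfmuxes = hwm.query(pydfmux.Dfmux)

   # Parallel: one dispatch fans out to every board at once.
   start = time.time()
   dfmuxes.get_fir_stage()
   print("parallel: %.2fs" % (time.time() - start))

   # Sequential: one full round-trip per board, end to end.
   start = time.time()
   for d in dfmuxes:
       d.get_fir_stage()
   print("sequential: %.2fs" % (time.time() - start))
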
This `get_fir_stage()` example, however, is too trivial to be really useful.
In :doc:`algorithms`, we described how to succinctly *parallelize* code,
dispatching it asynchronously to a number of IceBoard resources. In the
following sections, we focus on the semantics of *sequential* dispatch, which
is often more efficient for single-board interactions than naive
parallelization.

Sequential Calls
----------------
Issuing calls in parallel is not always the best approach. For example: let's
say we wish to reset a bunch of channel parameters to 0 on one or more Dfmux
boards. In this case, we're likely to make a *huge* number of calls to each
board, and it's most efficient to arrange these calls board-by-board.

.. important:: Blindly parallelizing is not optimal because it stresses the
   ARM. There are several possible bottlenecks in the system, but one of the
   prime suspects is the ability of the ARM to manage huge bursts of network
   traffic. The ARM's application server uses a small number of threads, and
   serves one request at a time. It is *much* better to minimize the number
   of requests, and to make each one as useful as possible.

For example, consider the following sequential code:

.. doctest:: sequential1

   >>> import pydfmux
   >>> hwm = pydfmux.load_session('!HardwareMap [ !Dfmux { serial: "004" } ]')
   >>> d = hwm.query(pydfmux.Dfmux).one()
   >>> def clear_channel(d):
   ...     for mezz in d.mezzanines:
   ...         for m in mezz.modules:
   ...             for c in m.channels:
   ...                 c.set_frequency(0, c.iceboard.UNITS.HZ, d.TARGET.CARRIER)
   ...                 c.set_frequency(0, c.iceboard.UNITS.HZ, d.TARGET.DEMOD)
   ...                 c.set_amplitude(0, c.iceboard.UNITS.NORMALIZED, d.TARGET.CARRIER)
   ...                 c.set_amplitude(0, c.iceboard.UNITS.NORMALIZED, d.TARGET.NULLER)
   >>> clear_channel(d)

This invocation takes 3.7s on a single board. It could have been dispatched on
multiple boards as follows:

.. code-block:: python

   >>> dfmuxes = hwm.query(pydfmux.Dfmux)
   >>> dfmuxes.call_with(clear_channel)

When called against multiple boards, we would still get parallel behaviour.
However, in both single- and multiple-board cases, the calls are extremely
inefficient.

Why is this example inefficient? Simple: it initiates 4,096 calls *per board*.
Each of these calls involves setting up and tearing down a short-lived,
call-specific HTTP session, incurring significant overhead on the PC and
especially on the IceBoard's ARM core.
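
To put numbers on it: 3.7s spread across 4,096 requests works out to roughly
0.9ms per call, end to end. The batched version below performs exactly the
same work in 1.2s, so at least 2.5s of the original 3.7s -- about two thirds
-- was pure per-request overhead.
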
It's also possible to re-use a single HTTP session for multiple commands. This
can be done as follows:

.. doctest:: sequential1

   >>> import pydfmux
   >>> hwm = pydfmux.load_session('!HardwareMap [ !Dfmux { serial: "004" } ]')
   >>> d = hwm.query(pydfmux.Dfmux).one()
   >>> def clear_channel(d):
   ...     with d.tuber_context() as ctx:
   ...         for (i, mezz) in d.mezzanine.items():
   ...             for (j, m) in mezz.module.items():
   ...                 for (k, c) in m.channel.items():
   ...                     ctx.set_frequency(0, c.iceboard.UNITS.HZ, d.TARGET.CARRIER, k, j, i)
   ...                     ctx.set_frequency(0, c.iceboard.UNITS.HZ, d.TARGET.DEMOD, k, j, i)
   ...                     ctx.set_amplitude(0, c.iceboard.UNITS.NORMALIZED, d.TARGET.CARRIER, k, j, i)
   ...                     ctx.set_amplitude(0, c.iceboard.UNITS.NORMALIZED, d.TARGET.NULLER, k, j, i)
   >>> clear_channel(d)

The `with` invocation creates a `context manager
<https://docs.python.org/2/reference/datamodel.html#context-managers>`_, which
Python provides as a mechanism to "wrap" blocks of code with entry and exit
behaviour. Within this
context block, any function calls made against the context variable `ctx` are
not actually executed -- they're merely queued, returning a placeholder
variable. When we *leave* the context, the calls are dispatched as a single
HTTP interaction, with much lower overhead.
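
As a rough mental model (this is an illustrative sketch, *not* pydfmux's
actual implementation; the class name `FakeContext` and its `flush()` method
are invented), a batching context manager can be expressed in a few lines:

.. code-block:: python

   class FakeContext(object):
       """Queue attribute calls; dispatch the whole batch at context exit."""

       def __init__(self):
           self._queue = []

       def __getattr__(self, name):
           # Invoked for any method we haven't defined: capture the call's
           # name and arguments instead of executing anything. (The real
           # implementation would also hand back a placeholder result.)
           def queue_call(*args, **kwargs):
               self._queue.append((name, args, kwargs))
           return queue_call

       def __enter__(self):
           return self

       def __exit__(self, exc_type, exc_value, traceback):
           # On a clean exit, ship the entire queue in one interaction.
           if exc_type is None:
               self.flush()

       def flush(self):
           # In pydfmux, this would be a single HTTP request carrying every
           # queued call; here we just report the batch size.
           print("dispatching %d queued calls in one request" % len(self._queue))

   with FakeContext() as ctx:
       ctx.set_frequency(0, 'HZ', 'CARRIER')
       ctx.set_amplitude(0, 'NORMALIZED', 'NULLER')
   # prints: dispatching 2 queued calls in one request
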
This batched invocation of `clear_channel` takes 1.2s, for a speedup of
roughly 3x. More importantly, it imposes less load on each ARM, the network,
and the dispatching PC, all of which are potential bottlenecks as the
experiment grows.

You can, of course, still issue the top-level call in parallel as before:

.. code-block:: python

   >>> dfmuxes = hwm.query(pydfmux.Dfmux)
   >>> dfmuxes.call_with(clear_channel)

This combination (parallel calls at the top level, with a context manager
batching the serial calls on each dfmux) is efficient. Other considerations
aside, it's best to parallelize calls across boards, and to combine calls on
a single board using a context manager. Of course, it's *more* important to
produce legible code that performs adequately.

Using Tuber Contexts
--------------------
Above, we used a context manager to speed up calls. However, in this example,
none of the calls actually returned anything. What if we needed the result?
For example, imagine querying temperatures from a number of on-board sensors:

.. doctest:: context1

   >>> import pydfmux
   >>> hwm = pydfmux.load_session('!HardwareMap [ !Dfmux { serial: "004" } ]')
   >>> d = hwm.query(pydfmux.Dfmux).one()
   >>> sensors = (
   ...     d.TEMPERATURE_SENSOR.MB_PHY, d.TEMPERATURE_SENSOR.MB_ARM,
   ...     d.TEMPERATURE_SENSOR.MB_FPGA, d.TEMPERATURE_SENSOR.MB_FPGA_DIE,
   ...     d.TEMPERATURE_SENSOR.MB_POWER)
   >>> with d.tuber_context() as ctx:
   ...     results = {s: ctx.get_motherboard_temperature(s) for s in sensors}

What does "results" contain in this case? It can't be a dictionary of
temperatures, since the temperatures themselves weren't actually when the
dictionary was created.
.. doctest:: context1

   >>> print results
   {'MOTHERBOARD_TEMPERATURE_ARM': <Future at 0x...>,
    'MOTHERBOARD_TEMPERATURE_FPGA': <Future at 0x...>,
    'MOTHERBOARD_TEMPERATURE_FPGA_DIE': <Future at 0x...>,
    'MOTHERBOARD_TEMPERATURE_PHY': <Future at 0x...>,
    'MOTHERBOARD_TEMPERATURE_POWER': <Future at 0x...>}

Each function call returns a `Future` object instead of the expected numeric
type. Futures are a standard way (used by `tornado
<http://www.tornadoweb.org/en/stable/concurrent.html>`_,
`concurrent.futures
<https://docs.python.org/3/library/concurrent.futures.html>`_, and Python
3.4's `asyncio <https://docs.python.org/3/library/asyncio.html>`_
package) of providing a placeholder for a result that isn't available yet.
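
Outside of pydfmux, the same semantics are easy to demonstrate with the
standard `concurrent.futures` module (part of Python 3, and available for
Python 2 as the `futures` backport); the `slow_read` function below is a
made-up stand-in for a network round-trip:

.. code-block:: python

   import time
   from concurrent.futures import ThreadPoolExecutor

   def slow_read():
       # Stand-in for a slow call to a board.
       time.sleep(1.0)
       return 41.0

   with ThreadPoolExecutor(max_workers=1) as pool:
       future = pool.submit(slow_read)    # returns a Future immediately
       # ... other useful work can happen here ...
       print(future.result())             # blocks until the value arrives
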
The *numeric* results can be retrieved by querying the `Future` objects:

.. doctest:: context1

   >>> actual_results = {k: v.result() for k, v in results.items()}
   >>> print actual_results
   {'MOTHERBOARD_TEMPERATURE_FPGA': 30.5, 'MOTHERBOARD_TEMPERATURE_FPGA_DIE': 57.7989151000977, 'MOTHERBOARD_TEMPERATURE_PHY': 39.5, 'MOTHERBOARD_TEMPERATURE_ARM': 41.0, 'MOTHERBOARD_TEMPERATURE_POWER': 31.0}

If a call generates an exception, this exception will be raised either when
the context closes, or when the corresponding Future's `result()` method is
invoked (whichever happens first). The exception can also be retrieved using
the Future's `exception()` method.

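As a sketch of the `exception()` pattern, a cautious caller could walk the
`results` dictionary from the doctest above like this (assuming the usual
`concurrent.futures` convention that `exception()` returns `None` for a call
that succeeded):

.. code-block:: python

   for (sensor, future) in results.items():
       if future.exception() is not None:
           # Inspect the failure without re-raising it.
           print("%s failed: %r" % (sensor, future.exception()))
       else:
           print("%s: %.1f" % (sensor, future.result()))
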
Advanced Tuber Contexts
-----------------------

Behind the scenes, our asynchronous code (e.g. Tuber contexts, and parallel
dispatch within the HardwareMap code) makes heavy use of Tornado event
loops. Occasionally, it is useful to expose these event loops directly. The
following example is taken from the on-board event-monitoring code.

.. tip:: In general, you *do not* have to write code like this directly.
   However, at least some developers within the collaboration need to be
   aware that it exists and is occasionally well-motivated.

.. code-block:: python

   @tornado.gen.coroutine
   def check_motherboard():
       try:
           # It might eventually be useful to have multiple set points per rail here.
           MB_TEMPERATURE_ACTIONS = {
               ib.TEMPERATURE_SENSOR.MB_PHY: (80, None),
               ib.TEMPERATURE_SENSOR.MB_ARM: (80, None),
               ib.TEMPERATURE_SENSOR.MB_FPGA: (80, fpga_panic),
               ib.TEMPERATURE_SENSOR.MB_FPGA_DIE: (80, fpga_panic),
               ib.TEMPERATURE_SENSOR.MB_POWER: (80, power_panic),
           }

           # Build 'results': for each sensor, a pending temperature Future
           # plus the limit and action associated with it.
           results = {}
           with ib.tuber_context() as ctx:
               for (sensor, (limit, action)) in MB_TEMPERATURE_ACTIONS.items():
                   results[sensor] = {
                       "future": ctx.get_motherboard_temperature(sensor),
                       "sensor": sensor,
                       "limit": limit,
                       "action": action,
                   }
               yield ctx._tuber_flush_async()

           # Check against permitted limits
           for (sensor, d) in results.items():
               temp = d['future'].result()
               limit = d['limit']
               action = d['action']
               if temp > limit:
                   print "Eek! %f > %f" % (temp, limit)
                   if action:
                       action()
       except Exception as e:
           logger.critical("Eek! %r" % e)

   @coroutine
   def power_panic():
       logger.critical("Power supply panic!")

       # Make sure mezzanines are powered off.
       with ib.tuber_context() as ctx:
           ctx.set_mezzanine_power(False, 1)
           ctx.set_mezzanine_power(False, 2)
           yield ctx._tuber_flush_async()

   if __name__ == '__main__':
       iol = tornado.ioloop.IOLoop.instance()

       # Install a temperature checker
       temp_cb = tornado.ioloop.PeriodicCallback(check_motherboard, 10000)
       temp_cb.start()

       # Enter event loop (never returns)
       iol.start()

There are two critical features in this example:

#. First, the `__main__` block is structured asynchronously; it creates an
   I/O loop, and calls methods decorated with the `@coroutine` decorator.
   This is the conventional way of writing asynchronous code, common to most
   such Python frameworks.

   .. important:: We go to some trouble to hide event loops, even though it
      is recommended that an asynchronous program's top level conform to the
      above pattern (with `iol.start()` controlling dispatch, and with
      `@coroutine`-decorated methods reaching down wherever an asynchronous
      control path exists). It is unclear to me whether we would be better
      off requiring top-level scripts to be explicitly asynchronous -- I
      expect it would be an uphill battle.

#. Second, the context manager code includes odd-looking `yield` statements.
   Rather than hiding the context manager magic, these yields expose it to
   higher-level code.

   In this example, the `_tuber_flush_async()` method converts the context
   manager's queue of pending functions into an asynchronous call, and
   returns control to the top-level Python dispatcher (where it can continue
   running other scheduled tasks, e.g. monitoring voltage rails).

   The top-level dispatcher will resume execution of our code block once
   another piece of code yields control and our asynchronous results are
   available. (A stripped-down illustration of this yield-and-resume cycle
   follows below.)

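To see the yield-and-resume cycle in isolation, here is a minimal,
self-contained sketch against the same Tornado 4-era API used above. The
function names `slow_answer` and `main` are invented for illustration, and
`tornado.gen.sleep` stands in for a real asynchronous operation such as
`_tuber_flush_async()`:

.. code-block:: python

   import tornado.gen
   import tornado.ioloop

   @tornado.gen.coroutine
   def slow_answer():
       # Yielding a sleep future returns control to the IOLoop; any other
       # scheduled callbacks (e.g. a voltage-rail monitor) run meanwhile.
       yield tornado.gen.sleep(1.0)
       raise tornado.gen.Return(42)

   @tornado.gen.coroutine
   def main():
       answer = yield slow_answer()    # resumed once the result is ready
       print("answer: %d" % answer)

   if __name__ == '__main__':
       tornado.ioloop.IOLoop.instance().run_sync(main)

While `main` is suspended at its `yield`, the IOLoop is free to run any other
scheduled callback, which is exactly how the temperature checker above
coexists with the rest of the on-board monitoring tasks.
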
.. vim: sts=3 ts=3 sw=3 tw=78 smarttab expandtab