Writing Sequential Code
=======================
In the software :doc:`introduction`, we showed how ordinary-looking Python
code could automatically be parallelized. For example,

.. testsetup:: introduction1

   # Patch Dfmux objects so we can fake board calls here.
   import pydfmux
   pydfmux.Dfmux.get_fir_stage = lambda s: 6
   pydfmux.ReadoutChannel.set_frequency = lambda *k, **a: None

   class UNITS(object):
       HZ = 'HZ'

   class TARGET(object):
       CARRIER = 'CARRIER'
       NULLER = 'NULLER'
       DEMOD = 'DEMOD'

   pydfmux.UNITS = UNITS
   pydfmux.TARGET = TARGET

.. doctest:: introduction1

   >>> import pydfmux
   >>> hwm = pydfmux.load_session('''
   ... !HardwareMap
   ... - !Dfmux { serial: "004" }
   ... - !Dfmux { serial: "019" }
   ... ''')
   >>> dfmuxes = hwm.query(pydfmux.Dfmux)
   >>> print dfmuxes.get_fir_stage()
   [6, 6]

In this example, we dispatched the same `get_fir_stage()` call to two
IceBoards in parallel. This is a simple but ideal case for dispatching
calls. The same example could have been coded sequentially:

.. doctest:: introduction1

   >>> dfmuxes = hwm.query(pydfmux.Dfmux)
   >>> for d in dfmuxes:
   ...     print d.get_fir_stage()
   6
   6

As you can imagine, the parallel version performs better with a large number
of Dfmuxes.
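
To measure the difference on real hardware, a rough timing harness like the
following sketch will do; it assumes the `hwm` object from the doctest above
and uses only the standard-library `time` module. The sequential loop costs
one full round-trip per board, so it grows linearly with board count, while
the parallel dispatch should stay roughly flat:

.. code-block:: python

   import time

   dfmuxes = hwm.query(pydfmux.Dfmux)

   # Parallel: one dispatch fans out to every board at once.
   start = time.time()
   dfmuxes.get_fir_stage()
   print("parallel: %.2fs" % (time.time() - start))

   # Sequential: one full round-trip per board, end to end.
   start = time.time()
   for d in dfmuxes:
       d.get_fir_stage()
   print("sequential: %.2fs" % (time.time() - start))
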
This `get_fir_stage()` example, however, is too trivial to be really useful.
In :doc:`algorithms`, we described how to succinctly *parallelize* code,
dispatching it asynchronously to a number of IceBoard resources. In the
following sections, we focus on the semantics of *sequential* dispatch, which
is often more efficient for single-board interactions than naive
parallelization.

Sequential Calls
----------------
Issuing calls in parallel is not always the best approach. For example: let's
say we wish to reset a bunch of channel parameters to 0 on one or more Dfmux
boards. In this case, we're likely to make a *huge* number of calls to each
board, and it's most efficient to arrange these calls board-by-board.

.. important:: Blindly parallelizing is not optimal because it stresses the
   ARM. There are several possible bottlenecks in the system, but one of the
   prime suspects is the ability of the ARM to manage huge bursts of network
   traffic. The ARM's application server uses a small number of threads, and
   serves one request at a time. It is *much* better to minimize the number
   of requests, and to make each one as useful as possible.

For example, consider the following sequential code:

.. doctest:: sequential1

   >>> import pydfmux
   >>> hwm = pydfmux.load_session('!HardwareMap [ !Dfmux { serial: "004" } ]')
   >>> d = hwm.query(pydfmux.Dfmux).one()
   >>> def clear_channel(d):
   ...     for mezz in d.mezzanines:
   ...         for m in mezz.modules:
   ...             for c in m.channels:
   ...                 c.set_frequency(0, c.iceboard.UNITS.HZ, d.TARGET.CARRIER)
   ...                 c.set_frequency(0, c.iceboard.UNITS.HZ, d.TARGET.DEMOD)
   ...                 c.set_amplitude(0, c.iceboard.UNITS.NORMALIZED, d.TARGET.CARRIER)
   ...                 c.set_amplitude(0, c.iceboard.UNITS.NORMALIZED, d.TARGET.NULLER)
   >>> clear_channel(d)

This invocation takes 3.7s on a single board. It could have been dispatched on
multiple boards as follows:

.. code-block:: python

   >>> dfmuxes = hwm.query(pydfmux.Dfmux)
   >>> dfmuxes.call_with(clear_channel)

When called against multiple boards, we would still get parallel behaviour.
However, in both single- and multiple-board cases, the calls are extremely
inefficient.

Why is this example inefficient? Simple: it initiates 4,096 calls *per board*.
Each of these calls involves setting up and tearing down a short-lived,
call-specific HTTP session, incurring significant overhead on the PC and
especially on the IceBoard's ARM core.
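
To put numbers on it: 3.7s spread across 4,096 requests works out to roughly
0.9ms per call, end to end. The batched version below performs exactly the
same work in 1.2s, so at least 2.5s of the original 3.7s -- about two thirds
-- was pure per-request overhead.
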
It's also possible to re-use a single HTTP session for multiple commands. This
can be done as follows:

.. doctest:: sequential1

   >>> import pydfmux
   >>> hwm = pydfmux.load_session('!HardwareMap [ !Dfmux { serial: "004" } ]')
   >>> d = hwm.query(pydfmux.Dfmux).one()
   >>> def clear_channel(d):
   ...     with d.tuber_context() as ctx:
   ...         for (i, mezz) in d.mezzanine.items():
   ...             for (j, m) in mezz.module.items():
   ...                 for (k, c) in m.channel.items():
   ...                     ctx.set_frequency(0, c.iceboard.UNITS.HZ, d.TARGET.CARRIER, k, j, i)
   ...                     ctx.set_frequency(0, c.iceboard.UNITS.HZ, d.TARGET.DEMOD, k, j, i)
   ...                     ctx.set_amplitude(0, c.iceboard.UNITS.NORMALIZED, d.TARGET.CARRIER, k, j, i)
   ...                     ctx.set_amplitude(0, c.iceboard.UNITS.NORMALIZED, d.TARGET.NULLER, k, j, i)
   >>> clear_channel(d)

The `with` invocation creates a `context manager
<https://docs.python.org/2/reference/datamodel.html#context-managers>`_, which
Python provides as a mechanism to "wrap" blocks of code with entry and exit
behaviour. Within this
context block, any function calls made against the context variable `ctx` are
not actually executed -- they're merely queued, returning a placeholder
variable. When we *leave* the context, the calls are dispatched as a single
HTTP interaction, with much lower overhead.
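
As a rough mental model (this is an illustrative sketch, *not* pydfmux's
actual implementation; the class name `FakeContext` and its `flush()` method
are invented), a batching context manager can be expressed in a few lines:

.. code-block:: python

   class FakeContext(object):
       """Queue attribute calls; dispatch the whole batch at context exit."""

       def __init__(self):
           self._queue = []

       def __getattr__(self, name):
           # Invoked for any method we haven't defined: capture the call's
           # name and arguments instead of executing anything. (The real
           # implementation would also hand back a placeholder result.)
           def queue_call(*args, **kwargs):
               self._queue.append((name, args, kwargs))
           return queue_call

       def __enter__(self):
           return self

       def __exit__(self, exc_type, exc_value, traceback):
           # On a clean exit, ship the entire queue in one interaction.
           if exc_type is None:
               self.flush()

       def flush(self):
           # In pydfmux, this would be a single HTTP request carrying every
           # queued call; here we just report the batch size.
           print("dispatching %d queued calls in one request" % len(self._queue))

   with FakeContext() as ctx:
       ctx.set_frequency(0, 'HZ', 'CARRIER')
       ctx.set_amplitude(0, 'NORMALIZED', 'NULLER')
   # prints: dispatching 2 queued calls in one request
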
This batched invocation of `clear_channel` takes 1.2s, for a speedup of
roughly 3x. More importantly, it imposes less load on each ARM, the network,
and the dispatching PC, all of which are potential bottlenecks as the
experiment grows.

You can, of course, still issue the top-level call in parallel as before:

.. code-block:: python

   >>> dfmuxes = hwm.query(pydfmux.Dfmux)
   >>> dfmuxes.call_with(clear_channel)

This combination (parallel calls at the top level, with a context manager
batching the serial calls on each dfmux) is efficient. Other considerations
aside, it's best to parallelize calls across boards, and to combine calls on
a single board using a context manager. Of course, it's *more* important to
produce legible code that performs adequately.

Using Tuber Contexts
--------------------
Above, we used a context manager to speed up calls. However, in this example,
none of the calls actually returned anything. What if we needed the result?
For example, imagine querying temperatures from a number of on-board sensors:

.. doctest:: context1

   >>> import pydfmux
   >>> hwm = pydfmux.load_session('!HardwareMap [ !Dfmux { serial: "004" } ]')
   >>> d = hwm.query(pydfmux.Dfmux).one()
   >>> sensors = (
   ...     d.TEMPERATURE_SENSOR.MB_PHY, d.TEMPERATURE_SENSOR.MB_ARM,
   ...     d.TEMPERATURE_SENSOR.MB_FPGA, d.TEMPERATURE_SENSOR.MB_FPGA_DIE,
   ...     d.TEMPERATURE_SENSOR.MB_POWER)
   >>> with d.tuber_context() as ctx:
   ...     results = {s: ctx.get_motherboard_temperature(s) for s in sensors}

What does "results" contain in this case? It can't be a dictionary of
temperatures, since the temperatures themselves weren't actually when the
dictionary was created.
.. doctest:: context1

   >>> print results
   {'MOTHERBOARD_TEMPERATURE_ARM': <Future at 0x...>,
    'MOTHERBOARD_TEMPERATURE_FPGA': <Future at 0x...>,
    'MOTHERBOARD_TEMPERATURE_FPGA_DIE': <Future at 0x...>,
    'MOTHERBOARD_TEMPERATURE_PHY': <Future at 0x...>,
    'MOTHERBOARD_TEMPERATURE_POWER': <Future at 0x...>}

Each function call returns a `Future` object instead of the expected numeric
type. Futures are a standard way (used by `tornado
<http://www.tornadoweb.org/en/stable/concurrent.html>`_,
`concurrent.futures
<https://docs.python.org/3/library/concurrent.futures.html>`_, and Python
3.4's `asyncio <https://docs.python.org/3/library/asyncio.html>`_
package) of providing a placeholder for a result that isn't available yet.
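
Outside of pydfmux, the same semantics are easy to demonstrate with the
standard `concurrent.futures` module (part of Python 3, and available for
Python 2 as the `futures` backport); the `slow_read` function below is a
made-up stand-in for a network round-trip:

.. code-block:: python

   import time
   from concurrent.futures import ThreadPoolExecutor

   def slow_read():
       # Stand-in for a slow call to a board.
       time.sleep(1.0)
       return 41.0

   with ThreadPoolExecutor(max_workers=1) as pool:
       future = pool.submit(slow_read)    # returns a Future immediately
       # ... other useful work can happen here ...
       print(future.result())             # blocks until the value arrives
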
The *numeric* results can be retrieved by querying the `Future` objects:

.. doctest:: context1

   >>> actual_results = {k: v.result() for k, v in results.items()}
   >>> print actual_results
   {'MOTHERBOARD_TEMPERATURE_FPGA': 30.5, 'MOTHERBOARD_TEMPERATURE_FPGA_DIE': 57.7989151000977, 'MOTHERBOARD_TEMPERATURE_PHY': 39.5, 'MOTHERBOARD_TEMPERATURE_ARM': 41.0, 'MOTHERBOARD_TEMPERATURE_POWER': 31.0}

If a call generates an exception, this exception will be raised either when
the context closes, or when the corresponding Future's `result()` method is
invoked (whichever happens first). The exception can also be retrieved using
the Future's `exception()` method.

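As a sketch of the `exception()` pattern, a cautious caller could walk the
`results` dictionary from the doctest above like this (assuming the usual
`concurrent.futures` convention that `exception()` returns `None` for a call
that succeeded):

.. code-block:: python

   for (sensor, future) in results.items():
       if future.exception() is not None:
           # Inspect the failure without re-raising it.
           print("%s failed: %r" % (sensor, future.exception()))
       else:
           print("%s: %.1f" % (sensor, future.result()))
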
Advanced Tuber Contexts
-----------------------

Behind the scenes, our asynchronous code (e.g. Tuber contexts, and parallel
dispatch within the HardwareMap code) makes heavy use of Tornado event
loops. Occasionally, it is useful to expose these event loops directly. The
following example is taken from the on-board event-monitoring code.

.. tip:: In general, you *do not* have to write code like this directly.
   However, at least some developers within the collaboration need to be
   aware that it exists and is occasionally well-motivated.

.. code-block:: python

   @tornado.gen.coroutine
   def check_motherboard():
       try:
           # It might eventually be useful to have multiple set points per rail here.
           MB_TEMPERATURE_ACTIONS = {
               ib.TEMPERATURE_SENSOR.MB_PHY: (80, None),
               ib.TEMPERATURE_SENSOR.MB_ARM: (80, None),
               ib.TEMPERATURE_SENSOR.MB_FPGA: (80, fpga_panic),
               ib.TEMPERATURE_SENSOR.MB_FPGA_DIE: (80, fpga_panic),
               ib.TEMPERATURE_SENSOR.MB_POWER: (80, power_panic),
           }

           # Build 'results': for each sensor, a pending temperature Future
           # plus the limit and action associated with it.
           results = {}
           with ib.tuber_context() as ctx:
               for (sensor, (limit, action)) in MB_TEMPERATURE_ACTIONS.items():
                   results[sensor] = {
                       "future": ctx.get_motherboard_temperature(sensor),
                       "sensor": sensor,
                       "limit": limit,
                       "action": action,
                   }
               yield ctx._tuber_flush_async()

           # Check against permitted limits
           for (sensor, d) in results.items():
               temp = d['future'].result()
               limit = d['limit']
               action = d['action']
               if temp > limit:
                   print "Eek! %f > %f" % (temp, limit)
                   if action:
                       action()
       except Exception as e:
           logger.critical("Eek! %r" % e)

   @coroutine
   def power_panic():
       logger.critical("Power supply panic!")

       # Make sure mezzanines are powered off.
       with ib.tuber_context() as ctx:
           ctx.set_mezzanine_power(False, 1)
           ctx.set_mezzanine_power(False, 2)
           yield ctx._tuber_flush_async()

   if __name__ == '__main__':
       iol = tornado.ioloop.IOLoop.instance()

       # Install a temperature checker
       temp_cb = tornado.ioloop.PeriodicCallback(check_motherboard, 10000)
       temp_cb.start()

       # Enter event loop (never returns)
       iol.start()

There are two critical features in this example:

#. First, the `__main__` block is structured asynchronously; it creates an
   I/O loop, and calls methods decorated with the `@coroutine` decorator.
   This is the conventional way of writing asynchronous code, common to most
   such Python frameworks.

   .. important:: We go to some trouble to hide event loops, even though it
      is recommended that an asynchronous program's top level conform to the
      above pattern (with `iol.start()` controlling dispatch, and with
      `@coroutine`-decorated methods reaching down wherever an asynchronous
      control path exists). It is unclear to me whether we would be better
      off requiring top-level scripts to be explicitly asynchronous -- I
      expect it would be an uphill battle.

#. Second, the context manager code includes odd-looking `yield` statements.
   Rather than hiding the context manager magic, these yields expose it to
   higher-level code.

   In this example, the `_tuber_flush_async()` method converts the context
   manager's queue of pending functions into an asynchronous call, and
   returns control to the top-level Python dispatcher (where it can continue
   running other scheduled tasks, e.g. monitoring voltage rails).

   The top-level dispatcher will resume execution of our code block once
   another piece of code yields control and our asynchronous results are
   available. (A stripped-down illustration of this yield-and-resume cycle
   follows below.)

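To see the yield-and-resume cycle in isolation, here is a minimal,
self-contained sketch against the same Tornado 4-era API used above. The
function names `slow_answer` and `main` are invented for illustration, and
`tornado.gen.sleep` stands in for a real asynchronous operation such as
`_tuber_flush_async()`:

.. code-block:: python

   import tornado.gen
   import tornado.ioloop

   @tornado.gen.coroutine
   def slow_answer():
       # Yielding a sleep future returns control to the IOLoop; any other
       # scheduled callbacks (e.g. a voltage-rail monitor) run meanwhile.
       yield tornado.gen.sleep(1.0)
       raise tornado.gen.Return(42)

   @tornado.gen.coroutine
   def main():
       answer = yield slow_answer()    # resumed once the result is ready
       print("answer: %d" % answer)

   if __name__ == '__main__':
       tornado.ioloop.IOLoop.instance().run_sync(main)

While `main` is suspended at its `yield`, the IOLoop is free to run any other
scheduled callback, which is exactly how the temperature checker above
coexists with the rest of the on-board monitoring tasks.
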
.. vim: sts=3 ts=3 sw=3 tw=78 smarttab expandtab