Daisy chain wiring of functions of no return
10 Oct 2024 - John Z. Li
Assume you are working on a toy trading system that consists of the following modules:
- A client order validation module (let us call it RequestValidator);
- A risk check module (let us call it RiskChecker);
- An order book module that does book-keeping (let us call it OrderBook);
- An Exchange session management module that translates an order into the
Exchange's format and sends it to the Exchange (let us call it ExchangeClient).
Each order flows through the above modules sequentially until it is either
sent to the Exchange successfully, or rejected because an error occurs
while one of the modules is processing it.
To put it into pseudo code, it will be like below:
if(RequestValidator.process(order)){
    if(RiskChecker.process(order)){
        if(OrderBook.process(order)){
            if(ExchangeClient.process(order)){
                // order successfully sent to Exchange
            } else {
                // handle the error that the order can't be sent to Exchange
            }
        } else {
            // handle OrderBook error
        }
    } else {
        // handle risk checking error
    }
} else {
    // handle validation error
}

If we want to flatten the nested if-else cases, we can transform the code into its equivalent form as below (assuming the code is wrapped in a function that returns void):

bool success = false;
success = RequestValidator.process(order);
if(not success){
    // handle validation error
    return;
}
success = RiskChecker.process(order);
if(not success){
    // handle risk checking error
    return;
}
success = OrderBook.process(order);
if(not success){
    // handle OrderBook error
    return;
}
success = ExchangeClient.process(order);
if(not success){
    // handle the error that the order can't be sent to Exchange
    return;
}

Now, instead of nested if-else cases, we have a cascade of if checks, each of which handles an error that leads to an early return. In real-world systems, this kind of conditional progression is commonplace, and the depth (the number of if's in the data flow) can be large. For low-latency software, such code limits performance optimization with respect to branch prediction and inlining. (Note that [[likely]] and [[unlikely]] won't help with branch prediction, as they only affect the layout of the generated machine code, and modern CPUs usually use dynamic branch predictors that ignore static hints at the source code level.) It is also hard for the compiler to statically determine which function calls should be inlined for better performance. Profile-Guided Optimization (PGO) can help here, but that is another topic.
In this post, the author wants to discuss a simple trick that makes our intent obvious at the call-graph level. It allows fine-grained tuning of latency-critical code afterwards. The idea is: instead of mixing happy-path code and error-handling code in the same function, we split them into separate functions. Instead of doing
RequestValidator <----> RiskChecker <----> OrderBook <----> ExchangeClient
We do the below as an alternative:
RequestValidator ----> RiskChecker ----> OrderBook ----> ExchangeClient
       |                    |                |                  |
    (error)  <----       (error)  <----   (error)  <----     (error)
where an arrow denotes the direction of data flow:
a double arrow <----> denotes a function call that returns,
and a single arrow ----> or <---- denotes a function call that does not return.
Because the function call graph looks like a daisy chain, we call it daisy chain
wiring of no-return functions.
In pseudo code, it looks something like below:
class RequestValidator{
    void process(Order& order){
        // do order validation
        if(error)[[unlikely]]{
            handle_error(order, error);
            return;
        }
        RiskChecker.process(order); // function call of no return
    }
    void handle_error(Order& order, Error error){
        if(error is not module internal error){
            // this function is called by RiskChecker::handle_error();
        }
        else{
            // handle validation error
        }
    }
};
class RiskChecker{
    void process(Order& order){
        // do risk checking
        if(error)[[unlikely]]{
            handle_error(order, error);
            return;
        }
        OrderBook.process(order); // function call of no return
    }
    void handle_error(Order& order, Error error){
        if(error is not module internal error){
            // this function is called by OrderBook::handle_error();
        }
        else{
            // handle risk checking error
        }
        // notify RequestValidator that an error has happened inside RiskChecker
        RequestValidator.handle_error(order, error);
    }
};
class OrderBook{
    void process(Order& order){
        // do OrderBook book-keeping
        if(error)[[unlikely]]{
            handle_error(order, error);
            return;
        }
        ExchangeClient.process(order); // function call of no return
    }
    void handle_error(Order& order, Error error){
        if(error is not module internal error){
            // this function is called by ExchangeClient::handle_error();
        }
        else{
            // handle OrderBook error
        }
        // notify RiskChecker that an error has happened inside OrderBook
        RiskChecker.handle_error(order, error);
    }
};
class ExchangeClient{
    void process(Order& order){
        // try to send the order to Exchange
        if(error)[[unlikely]]{
            handle_error(order, error);
        }
    }
    void handle_error(Order& order, Error error){
        // notify OrderBook that an error has happened inside ExchangeClient
        OrderBook.handle_error(order, error);
    }
};
We can see that the error-handling logic is now completely separated from the happy-path code.
We can use compiler intrinsics to mark these error-handling functions as
not to be inlined (for GCC, it is __attribute__((noinline))),
because we do not want the error-handling code to increase the latency of the happy-path code.
A code snippet like if(error)[[unlikely]] do_function_call(), followed by unconditional code,
is a clear indication that the if branch is the slow path.
The trailing no-return function call at the end of a function can be
translated by the compiler into a simple jump instruction if the called function is not inlined.
This is already faster than a normal function call with return values.
In the case that the called function is inlined,
the generated code will be as if the called function were expanded in place.
Since we explicitly forbid inlining of error-handling code, caches and registers are less likely to
be clobbered on the happy path, leading to lower latency.
Furthermore, modern C++ compilers can optimize a trailing no-return function call
at the end of the enclosing function body even further
(this is called Tail Call Optimization, or TCO) by reusing
the stack frame of the calling function. This further reduces the latency of the happy path.
We can even do fine-grained manual optimization with Daisy Chain Wiring
by forcing inlining for happy-path functions (the functions named "process" in the above example).
(For GCC, this can be done with __attribute__((always_inline)).) Keep in mind that forcing
inlining does not necessarily improve latency.
To find the optimal arrangement, some experimentation, or PGO, is needed
to determine which functions should be inlined.
From a data-flow perspective, if data flows from the process() function of module A to module B,
it never flows back to the same function of A.
The result is that any code path that involves error handling is guaranteed to have
a deeper call graph than the corresponding happy-path code, up to the point
where the error is triggered.
For example, an order that is eventually successfully sent to the Exchange
involves 4 function calls, while handling an error inside ExchangeClient
involves 8 function calls.
A deeper call graph is a clear indication that the corresponding code path is
less likely to be taken. Sadly, there is no compiler option to tell the compiler
that the code path with the lowest function-call depth should be the most aggressively optimized,
though I suspect this is already part of the heuristics compilers use when determining
which code is hot.