Self-taught Jump Trading software engineer's tips for optimizing C++ in HFT

19 December 2023

Writing C++ code worthy of an elite high-frequency trading firm is no easy feat. When every picosecond matters, a few small tweaks can make all the difference. In a series of blog posts on Sunday, Jump Trading software engineer David Gorski talks about coding issues that make him "wake up in the middle of the night, sweating profusely"... and how to fix them.

Embedding SQL in your C++ code can be a pain, as queries are "much more human-digestible in a multi-line format," but C++ "treats adjacent string literals as one," so every line of the query needs its own pair of quotes. Gorski says that usually the best way to fix it is "the raw string literal," which lets you wrap the whole query in a single pair of delimiters and "just focus on the SQL."
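To illustrate the difference, here is a minimal sketch of both styles. The query, table and column names are hypothetical and are not taken from Gorski's posts.

```cpp
#include <string_view>

// Hypothetical query; the table and column names are illustrative only.

// Adjacent string literals are concatenated by the compiler, so every
// line needs its own quotes and an explicit trailing space.
inline constexpr std::string_view kConcatenated =
    "SELECT id, price "
    "FROM trades "
    "WHERE symbol = 'ABC'";

// A raw string literal keeps the SQL readable, but the newlines and the
// indentation become part of the string.
inline constexpr std::string_view kRaw = R"(
    SELECT id, price
    FROM trades
    WHERE symbol = 'ABC'
)";

// The raw version is longer because of the embedded whitespace.
static_assert(kRaw.size() > kConcatenated.size());
```

The extra characters in kRaw are what the next step tries to get rid of.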

However, because a raw string literal keeps every newline and indent, this creates a query "that is almost double in length to any query parsing function/module." While the performance difference isn't too noticeable, it can still be improved. Gorski says he's been experimenting with the constexpr specifier "to provide compile-time utilities." Using this functionality, he can check each character in the string at compile time and remove the extra whitespace and newlines.
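Gorski's own utility is not shown here, so the following is only a sketch of how such a compile-time pass might look in C++17. The names Compacted, is_space and compact are hypothetical. As an assumption of this sketch, each run of whitespace is collapsed to a single space rather than removed outright, since SQL keywords still need separating, and whitespace inside quoted SQL strings is not treated specially.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical helper type: holds the compacted characters and their count.
template <std::size_t N>
struct Compacted {
    std::array<char, N> data{};  // zero-filled, so the result stays null-terminated
    std::size_t size = 0;
    constexpr std::string_view view() const { return {data.data(), size}; }
};

constexpr bool is_space(char c) {
    return c == ' ' || c == '\n' || c == '\t' || c == '\r';
}

// Collapses each run of whitespace to one space and trims both ends.
template <std::size_t N>
constexpr Compacted<N> compact(const char (&src)[N]) {
    Compacted<N> out{};
    bool pending_space = false;
    for (std::size_t i = 0; i + 1 < N; ++i) {  // skip the trailing '\0'
        if (is_space(src[i])) {
            pending_space = out.size > 0;  // drop leading whitespace
        } else {
            if (pending_space) out.data[out.size++] = ' ';
            pending_space = false;
            out.data[out.size++] = src[i];
        }
    }
    return out;  // a trailing pending space is never written
}

// The work happens during compilation; only the short form is stored.
inline constexpr auto kQuery = compact(R"(
    SELECT id, price
    FROM trades
    WHERE symbol = 'ABC'
)");

static_assert(kQuery.view() ==
              "SELECT id, price FROM trades WHERE symbol = 'ABC'");
```

Because the array is zero-filled and the compacted text is never longer than the input, kQuery.data.data() can also be handed to an API that expects a null-terminated string.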


Elsewhere, he calls unnecessary branching a "nightmare." To fix this in a binary search implementation, he uses the code below:

[Image: code listing for Gorski's binary search]

This approach is particularly useful for small to medium-sized arrays. Its optimal use case is around a size of 4096 elements, where it is over four times faster than standard code. By a size of 535870912 elements, using it becomes inefficient.
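The listing itself is an image in the original and is not reproduced here. As a rough sketch of the general technique, assuming the formulation used in the Algorithmica article linked in the comment below rather than Gorski's exact code, a branchless lower bound can look like this. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Returns the index of the first element >= x in the sorted range
// data[0..n), or n if there is no such element.
std::size_t branchless_lower_bound(const int* data, std::size_t n, int x) {
    if (n == 0) return 0;
    const int* base = data;
    std::size_t len = n;
    while (len > 1) {
        std::size_t half = len / 2;
        // The comparison result (0 or 1) is used arithmetically instead of
        // in an if/else; compilers often lower this to a cmov, but that is
        // not guaranteed and is worth checking in the generated assembly.
        base += (base[half - 1] < x) * half;
        len -= half;
    }
    return static_cast<std::size_t>(base - data) + (*base < x);
}

// Usage sketch: a few lookups checked against hand-computed answers.
inline void demo_branchless_lower_bound() {
    const int sorted[] = {1, 3, 5, 7};
    assert(branchless_lower_bound(sorted, 4, 5) == 2u);  // exact hit
    assert(branchless_lower_bound(sorted, 4, 4) == 2u);  // between 3 and 5
    assert(branchless_lower_bound(sorted, 4, 8) == 4u);  // past the end
}
```

The loop condition is still a branch, but its trip count depends only on the array size, so it is easy to predict.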


Author: Alex McMurray, Reporter

Bernie B.
19 December 2023
I came across the article the day it was published, and a few things are worth noting.

David Gorski wrote a very simplified version of what can be found in https://en.algorithmica.org/hpc/data-structures/binary-search/ (the link was provided in the article).

There are two branches in the code: the first one is the loop, and the second one is the condition for the binary search.

The outer branch (the loop) is not addressed, and it could be: the number of iterations is bounded, so at least part of the loop could be unrolled.

The branch for the binary search is a simple rewriting of the code to guide the compiler into using a cmov.
As the Algorithmica article rightly points out, gcc is known for its "random" handling of cmov (some versions of the compiler use it, some don't), so one has to be very careful when expecting cmov to be used (basically, always check the generated code).
Should cmov be an issue, an alternative could be to build a bit mask with an arithmetic right shift and some logical operations.
There is a lot of controversy around cmov and its performance, and none of the articles addresses it.
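A minimal sketch of that mask idea, assuming 32-bit signed keys and a hypothetical function name: the difference is widened to 64 bits so it cannot overflow, and an arithmetic right shift turns its sign bit into an all-ones or all-zeros mask. Right-shifting a negative value is only guaranteed to be arithmetic from C++20 onwards, although mainstream compilers behave that way in earlier modes too.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the index of the first element >= x in the sorted range
// data[0..n), or n if there is no such element.
std::size_t masked_lower_bound(const std::int32_t* data, std::size_t n,
                               std::int32_t x) {
    if (n == 0) return 0;
    std::size_t pos = 0;
    std::size_t len = n;
    while (len > 1) {
        std::size_t half = len / 2;
        // Widen to 64 bits so the subtraction cannot overflow.
        std::int64_t diff =
            std::int64_t{data[pos + half - 1]} - std::int64_t{x};
        // Arithmetic shift smears the sign bit: all ones if diff < 0
        // (the probed element is less than x), all zeros otherwise.
        std::size_t mask = static_cast<std::size_t>(diff >> 63);
        pos += half & mask;  // advance by half only when the mask is set
        len -= half;
    }
    return pos + (data[pos] < x);
}

// Usage sketch: a few lookups checked against hand-computed answers.
inline void demo_masked_lower_bound() {
    const std::int32_t sorted[] = {1, 3, 5, 7};
    assert(masked_lower_bound(sorted, 4, 5) == 2u);
    assert(masked_lower_bound(sorted, 4, INT32_MIN) == 0u);
    assert(masked_lower_bound(sorted, 4, INT32_MAX) == 4u);
}
```

Whether this beats a cmov depends on the compiler and the target, so the generated code still has to be checked and measured.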

The article is rather imprecise when it writes "there will be no prediction to preload the next search space mid-points and at larger array sizes".
This is mostly about speculative execution of a particular predicted branch, which could be "on the wrong side" of the binary search.
In the much better Algorithmica article, the author issues two prefetches, one for each side, but this has some drawbacks (it brings additional problems related to memory accesses).
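As a sketch of what prefetching both sides can look like, assuming GCC or Clang (for the __builtin_prefetch intrinsic) and a hypothetical function name: the offset of the next probe is computed once and prefetched relative to both possible next bases. This follows the general shape described in the Algorithmica article, not its exact code.

```cpp
#include <cassert>
#include <cstddef>

// Returns the index of the first element >= x in the sorted range
// data[0..n), or n if there is no such element.
std::size_t prefetching_lower_bound(const int* data, std::size_t n, int x) {
    if (n == 0) return 0;
    const int* base = data;
    std::size_t len = n;
    while (len > 1) {
        std::size_t half = len / 2;
        len -= half;
        // Offset of the next probe relative to the next base; it is
        // clamped to 0 when len == 1 so the address stays inside the array.
        std::size_t next = len / 2 - (len > 1);
        __builtin_prefetch(base + next);         // if the search stays low
        __builtin_prefetch(base + half + next);  // if the search moves high
        base += (base[half - 1] < x) * half;
    }
    return static_cast<std::size_t>(base - data) + (*base < x);
}

// Usage sketch: a few lookups checked against hand-computed answers.
inline void demo_prefetching_lower_bound() {
    const int sorted[] = {1, 3, 5, 7, 9};
    assert(prefetching_lower_bound(sorted, 5, 6) == 3u);
    assert(prefetching_lower_bound(sorted, 5, 9) == 4u);
    assert(prefetching_lower_bound(sorted, 5, 10) == 5u);
}
```

One of the two prefetched lines is always wasted, which is the kind of extra memory traffic the drawback above refers to.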