Double DQN is a deep reinforcement learning approach, more specifically a deep Q-learning variant, that relies on two neural networks, as we shall see shortly (in Section 4.1.7). In this paper we present a double DQN applied to the market-making decision process. Typically, at the outset the agent does not know the transition and reward functions; it must explore actions in different states and record how the environment responds in each case.
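As a minimal sketch of the double DQN idea (toy Q-tables stand in for the two networks; all shapes and names here are illustrative, not the paper's code), the online network selects the next action while the target network evaluates it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Q-tables standing in for the online and target networks
# (in the paper these are neural networks; shapes are hypothetical).
n_states, n_actions = 5, 3
q_online = rng.normal(size=(n_states, n_actions))
q_target = rng.normal(size=(n_states, n_actions))

def double_dqn_target(reward, next_state, gamma=0.99, done=False):
    """Double DQN target: the online network SELECTS the action,
    the target network EVALUATES it, reducing overestimation bias."""
    if done:
        return reward
    a_star = int(np.argmax(q_online[next_state]))          # selection
    return reward + gamma * q_target[next_state, a_star]   # evaluation

y = double_dqn_target(reward=1.0, next_state=2)
```

Decoupling selection from evaluation is what distinguishes double DQN from vanilla DQN, which uses the same network for both and tends to overestimate action values.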
Are they scaled by some scaling factor beforehand, and what data is this parameter estimated from? If not, how much data is lost by only using the price differences with absolute values smaller than 1? Also, if the market candle features are "divided by the open mid-price for the candle", does this mean that all of those higher than the mid-price would be truncated to 1? The methodology may be sounder than this, but the text simply does not answer these questions.
Depending on inventory sizes, P&L targets and expected price moves, to name a few variables, a market maker can asymmetrically skew the bid and ask prices of their quotes. This fine-tuning introduces feedback mechanisms leading to non-linear behaviour in the market. In essence, successful market making not only depends on formal constraints but is the result of a delicate mix containing additional creative considerations.
- Here the single best-performing model was Alpha-AS-2, winning on 16 days and coming second on 10 (on 9 of which it lost to Alpha-AS-1).
- By default its value is 0, so the strategy places orders at the optimal bid and ask prices.
- The original Avellaneda-Stoikov model was designed to be used for market making on stock markets, which have defined trading hours.
Inventory management is therefore central to market making strategies, and particularly important in high-frequency algorithmic trading. In an influential paper, Avellaneda and Stoikov expounded a strategy addressing market maker inventory risk. The optimal bid and ask quotes are obtained from a set of formulas built around these parameters. The rationale behind the strategy is, in Avellaneda and Stoikov’s words, to perform a ‘balancing act between the dealer’s personal risk considerations and the market environment’ [ibid.].
High-frequency trading in a limit order book
Reinforcement learning algorithms have been shown to be well-suited for use in high frequency trading contexts [16, 24–26, 37, 45, 46], which require low latency in placing orders together with a dynamic logic that is able to adapt to a rapidly changing environment. In the literature, reinforcement learning approaches to market making typically employ models that act directly on the agent’s order prices, without taking advantage of knowledge we may have of market behaviour or indeed findings in market-making theory. These models, therefore, must learn everything about the problem at hand, and the learning curve is steeper and slower to surmount than if relevant available knowledge were to be leveraged to guide them. We were able to achieve some parallelisation by running five backtests simultaneously on different CPU cores. Upon finalization of the five parallel backtests, the five respective memory replay buffers were merged.
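The parallel-backtest scheme described above can be sketched as follows (a thread pool and the transition layout are illustrative assumptions, not the paper's code; `multiprocessing.Pool` would give true CPU parallelism for the five cores):

```python
from multiprocessing.dummy import Pool  # thread pool; swap for multiprocessing.Pool
import random

def run_backtest(seed):
    """Hypothetical stand-in for one backtest on a CPU core: returns the
    replay transitions (state, action, reward, next_state) it collected."""
    rng = random.Random(seed)
    return [(rng.random(), rng.randrange(3), rng.random(), rng.random())
            for _ in range(100)]

# Run five backtests concurrently, then merge their replay buffers.
with Pool(5) as pool:
    buffers = pool.map(run_backtest, range(5))
replay_buffer = [transition for buf in buffers for buf_t in [buf] for transition in buf_t]
```

After the merge, the combined buffer can be sampled uniformly for training, exactly as a single-process replay buffer would be.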
Table 8 provides further insight combining the results for Max DD and P&L-to-MAP. From the negative values in the Max DD columns, we see that Alpha-AS-1 had a larger Max DD (i.e., performed worse) than Gen-AS on 16 of the 30 test days. However, on 13 of those days Alpha-AS-1 achieved a better P&L-to-MAP score than Gen-AS, substantially so in many instances. Only on one day was the trend reversed, with Gen-AS performing slightly worse than Alpha-AS-1 on Max DD but better on P&L-to-MAP. On the whole, the Alpha-AS models do the better job of accruing gains while keeping inventory levels under control. This measure is obtained from the algorithm’s P&L, discounting the losses from speculative positions.
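Maximum drawdown, as used in these comparisons, is the largest peak-to-trough decline of the equity (cumulative P&L) curve. A generic implementation (not the paper's code):

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity / cumulative P&L curve."""
    peak = float("-inf")
    max_dd = 0.0
    for x in equity:
        peak = max(peak, x)            # running high-water mark
        max_dd = max(max_dd, peak - x) # deepest fall from that mark so far
    return max_dd

curve = [100, 104, 101, 107, 98, 103]
assert max_drawdown(curve) == 9  # from the 107 peak down to 98
```

A lower Max DD means the strategy's equity never fell far below its best level, which is why larger (more negative) values count against the Alpha-AS models here.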
Such control is provided by the seminal framework of Avellaneda and Stoikov, which solves analytically the problem of maximizing a utility function that includes the effects of inventory risk. The solution is an equation for the so-called reservation price: a mid-price adjusted in proportion to a risk aversion parameter, the position, the volatility and the time remaining to the terminal time. The trades are placed symmetrically around the reservation price using the optimal spread, not around the current mid-price.
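The closed-form expressions can be sketched directly, using the usual symbols (γ for risk aversion, σ for volatility, κ for order-book liquidity, T for the terminal time); this is a sketch of the published formulas, not the authors' implementation:

```python
from math import log

def as_quotes(mid, q, gamma, sigma, kappa, t, T=1.0):
    """Avellaneda-Stoikov closed-form quotes.
    mid: current mid-price; q: signed inventory; gamma: risk aversion;
    sigma: mid-price volatility; kappa: order-book liquidity; t in [0, T]."""
    tau = T - t
    # Reservation price: mid-price shifted against the current inventory.
    r = mid - q * gamma * sigma**2 * tau
    # Optimal total spread: inventory-risk term plus liquidity term.
    spread = gamma * sigma**2 * tau + (2.0 / gamma) * log(1.0 + gamma / kappa)
    return r - spread / 2.0, r + spread / 2.0  # (bid, ask)

bid, ask = as_quotes(mid=100.0, q=2, gamma=0.1, sigma=2.0, kappa=1.5, t=0.0)
# With positive inventory the pair of quotes sits below the mid-price,
# making a sell fill more likely than a buy fill.
```

Note that the quotes are symmetric around the reservation price `r`, not around `mid`: the whole quote pair shifts down when the market maker is long and up when short.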
Nevertheless, the prices 4 and 8 orderbook movements prior to the action-setting instant also make a fairly strong appearance in the importance indicator lists, suggesting the existence of a slightly longer-term predictive component that may be tapped into profitably. The AS algorithm is static in its reliance on analytical formulas to generate bid and ask quotes based on the real-time input values for the market mid-price of the security and the current stock inventory held by the market maker. These formulas have fixed parameters to model the market maker’s aversion to risk and the statistical properties of market orders. We introduce an expert deep-learning system for limit order book trading for markets in which the stock tick frequency is longer than or close to 0.5 s, such as the Chinese A-share market. This half a second enables our system, which is trained with a deep-learning architecture, to integrate price prediction, trading signal generation, and optimization for capital allocation on trading signals altogether. It also leaves sufficient time to submit and execute orders before the next tick-report.
Setting the bid and ask quotes requires computation of the future spread or, alternatively, discovery of the future mid-price. The MM mechanisms also include procedures for inventory management, which are necessary to control the exposure risk arising from large price movements. In a previous article entitled “Simplified Avellaneda-Stoikov Market Making” we discussed how to use such a strategy to manage and control a portfolio’s inventory risk. Avellaneda & Stoikov taught us that we should adjust our bid/ask prices in order to hedge against inventory risk, and the amount to be adjusted should be linearly proportional to the one-side inventory excess. For small market swings this is a decent weapon to protect ourselves from running out of inventory. However, crypto markets are famous for their extreme volatility and seem to be constantly manipulated by the hands of some invisible devils.
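The linear inventory adjustment described above can be sketched as a simple quote skew (all parameter names here are illustrative, not from any particular library):

```python
def skewed_quotes(mid, half_spread, inventory, target=0, skew_per_unit=0.01):
    """Shift both quotes against the inventory excess, linearly,
    as the text attributes to Avellaneda & Stoikov.
    skew_per_unit is a hypothetical tuning knob: price shift per unit of excess."""
    excess = inventory - target
    shift = skew_per_unit * excess   # linear in the one-side inventory excess
    bid = mid - half_spread - shift  # long inventory -> both quotes move down,
    ask = mid + half_spread - shift  # making sells more likely than buys
    return bid, ask

# Long 10 units above target: both quotes shift down by 0.1.
bid, ask = skewed_quotes(mid=100.0, half_spread=0.5, inventory=10)
```

For small swings this mean-reverts the inventory toward the target; in a sustained trend, as the paragraph warns, a linear skew alone may not shed inventory fast enough.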
- Other indicators, such as the Sortino ratio, can also be used in the reward function itself.
- For the optimization of the parameters involved in the model, a distributed adaptive proximal Newton gradient descent learning strategy is proposed to accelerate the convergence.
- If the user wishes the spread between these two prices to be wider, the risk factor should be set to a higher value.
- Our research developed a hybrid market making algorithm using Bayesian learning to infer the expected mid-price as suggested by information-based models; the mid-price is then plugged into an inventory-based model to calculate the bid and ask quotes.
On the other hand, using a smaller κ you are assuming the order book has low liquidity, so you can use a wider spread. Please inspect the strategy code in Trading Logic above to understand exactly how it works. A spread tolerance (from the mid-price) can be set to defer the order refresh process to the next cycle. When placing orders, if the order’s size determined by the order price and quantity is below the exchange’s minimum order size, the orders will not be created. You will need to hold a sufficient inventory of quote and/or base currencies on the exchange to place orders of the exchange’s minimum order size.
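The effect of κ on the spread comes from the liquidity term of the AS half-spread, (1/γ)·ln(1 + γ/κ); a quick numeric check (a sketch of that one term, not the full strategy logic):

```python
from math import log

def liquidity_half_spread(gamma, kappa):
    """Liquidity term of the AS optimal half-spread: (1/gamma) * ln(1 + gamma/kappa)."""
    return (1.0 / gamma) * log(1.0 + gamma / kappa)

# Smaller kappa models a thinner book, which justifies quoting wider.
wide  = liquidity_half_spread(gamma=0.1, kappa=0.5)
tight = liquidity_half_spread(gamma=0.1, kappa=5.0)
```

Since ln(1 + γ/κ) grows as κ shrinks, a low-liquidity assumption (small κ) directly widens the quoted spread, which is the behaviour described above.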
The Asymmetric dampened P&L penalizes speculative positions: speculative profits are not added, while losses are discounted. The 0 subscript denotes the best orderbook price level on the ask and on the bid side, i.e., the price levels of the lowest ask and of the highest bid, respectively. The procedure, therefore, has two steps, which are applied at each time increment as follows. For asymptotic expansions when T is large, see the paper by Guéant, Lehalle and Fernandez-Tapia, or Guéant’s book The Financial Mathematics of Market Liquidity. The model was created before Satoshi Nakamoto mined the first Bitcoin block, before the creation of trading markets that are open 24/7.
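One common formulation of the asymmetric dampened P&L reward matches the description above (this is an assumption inferred from the text, not a quotation of the paper's exact equation): the inventory-driven ("speculative") component of the P&L change is stripped out when it is a gain but kept when it is a loss.

```python
def asym_dampened_pnl(d_pnl, inventory, d_mid):
    """Asymmetric dampened P&L reward (one common formulation).
    d_pnl: total P&L change over the step, which includes the speculative
    mark-to-market term inventory * d_mid; that term is removed when
    positive (no reward for speculation) but retained when negative."""
    speculative = inventory * d_mid
    return d_pnl - max(0.0, speculative)

# Long 5 units, mid-price rises 0.2: the 1.0 speculative gain is not rewarded.
r_up = asym_dampened_pnl(d_pnl=1.0, inventory=5, d_mid=0.2)
# Same position, mid-price falls 0.2: the loss hits the reward in full.
r_down = asym_dampened_pnl(d_pnl=-0.5, inventory=5, d_mid=-0.2)
```

This asymmetry pushes the agent toward earning the spread rather than holding directional inventory, which is exactly the behaviour the metric is meant to encourage.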
The performance of the Alpha-AS models in terms of the Sharpe, Sortino and P&L-to-MAP ratios was substantially superior to that of the Gen-AS model, which in turn was superior to that of the two standard baselines. On the other hand, the performance of the Alpha-AS models on maximum drawdown varied significantly on different test days, losing to Gen-AS on over half of them, a reflection of their greater aggressiveness, made possible by their relative freedom of action. Overall, however, days of substantially better performance relative to the non-Alpha-AS models far outweigh those with poorer results, and at the end of the day the Alpha-AS models clearly achieved the best and least exposed P&L profiles.