Quantitative Day Trading from Natural Language using Reinforcement Learning

It is challenging to design profitable and practical trading strategies, as stock price movements are highly stochastic and the market is heavily influenced by chaotic data from sources such as news and social media. Existing NLP approaches largely treat stock prediction as a classification or regression problem and are not optimized to make profitable investment decisions. Further, they do not model the temporal dynamics of large volumes of diversely influential text to which the market responds quickly. Addressing these shortcomings, we propose a deep reinforcement learning approach that makes time-aware decisions to trade stocks while optimizing profit using textual data. Our method outperforms state-of-the-art methods in terms of risk-adjusted returns in trading simulations on two benchmarks: tweets (English) and financial news (Chinese) pertaining to two major indexes and four global stock markets. Through extensive experiments and studies, we build the case for our method as a tool for quantitative trading.


Introduction
The stock market, a financial ecosystem involving quantitative trading and investing, observed a market capitalization exceeding US$60 trillion as of 2019. Stock trading presents lucrative opportunities for investors to utilize the market as a platform for investing funds and maximizing profits. However, making profitable investment decisions is challenging due to the market's volatile, noisy, and chaotic nature (Tsay, 2005; Adam et al., 2016). Research at the intersection of Natural Language Processing (NLP) and finance presents encouraging prospects in stock prediction (Jiang, 2020). Conventional work forecasts future trends by modeling numerical historical stock data (Lu et al.; Bao et al., 2017). However, price signals alone cannot capture market surprises, mergers, acquisitions, and company announcements. Such events, often reported across financial news and social media, have a strong influence over market dynamics (Laakkonen, 2004). For instance, prices immediately react to breaking news about the related company (Busse and Green, 2002). Such reactions conform to the Efficient Market Hypothesis (EMH), which states that financial markets are informationally efficient and prices reflect all available market information at any given time (Malkiel, 1989).

Figure 1: How tweets about Tesla and Moderna influence investors' opinions and impact the stocks over a day and the upcoming week. The tweets by the Tesla CEO Elon Musk lead to massive price drops in Tesla's stock, while Moderna's positive news attracts investments in its stock. A profitable trading decision would entail selling off Tesla's shares (if already held) and buying Moderna's stock in such a scenario.
The abundance of stock affecting information across news and social media online inspires the adoption of natural language processing to study the interplay between textual data and stock prices (Oliveira et al., 2017;Xu and Cohen, 2018). However, unlike numerical data, the study of natural language is more challenging. Individual tweets or news headlines may not be informative enough, and analyzing them together can provide a greater context, as shown in Figure 1. Moreover, the timing of their release plays a critical role as stock markets rapidly react to new information (Foucault et al., 2016). Furthermore, not each news story or tweet holds the potential to influence stock trends as texts have a diverse influence on prices (Hu et al., 2017). These observations suggest benefits in factoring in the time-aware dependence and diverse influence of text while analyzing natural language.
Despite profitability being the prime objective of quantitative trading, existing natural language processing methods for stock prediction (Hu et al., 2017; Xu and Cohen, 2018; Du and Tanaka-Ishii, 2020) are commonly formulated as classification or regression tasks, and are not directly optimized towards profit generation. Such methods face fundamental drawbacks. First, they do not innately incorporate the decision making and strategies involved in quantitative trading, in turn limiting potential profitability. Second, they have limited practical applicability, as they do not factor in the monetary resources available to and the financial assets (stocks) held by a trader at each trading time-step. This gap presents a new research direction where profit generation can be directly optimized by modeling the complex sequential decision-making process in quantitative trading as a Reinforcement Learning (RL) task. An RL formulation is naturally suited to quantitative trading, as it provides the potential to automatically learn to adjust investment budgets across stocks in a portfolio while taking into account the configuration of investments made in the past.
Contributions: We formulate stock prediction as a reinforcement learning problem ( §3) and present PROFIT: Policy for Return Optimization using FInancial news and online Text, a deep reinforcement learning approach that leverages financial news and tweets to model stock-affecting signals and optimize trading decisions for increased profitability. PROFIT accounts for the monetary resources available and the existing portfolio to execute profitable trades at any given time. Through extensive experiments ( §5) on English and Chinese text corresponding to the NASDAQ, Shanghai, Shenzhen, and Hong Kong markets, we show that PROFIT outperforms state-of-the-art methods in terms of risk-adjusted returns by over 13% and minimizes extreme losses by over 16% ( §6.1, §6.2). Using exploratory analyses ( §6.3), we show PROFIT's practical and real-world applicability.

Background
Reinforcement Learning and Natural Language Processing Lately, reinforcement learning has influenced solutions for a wide variety of natural language processing tasks and applications. These include, but are not limited to, information extraction (Qin et al., 2018), social media analysis (Zhou and Wang, 2018), text classification (Wu et al., 2018a), extractive (Narayan et al., 2018) and abstractive (Chen and Bansal, 2018) text summarization, neural machine translation (Wu et al., 2018b), text-based games (He et al., 2016a; Ammanabrolu and Riedl, 2019), and knowledge-based question answering (Hua et al., 2020). For these tasks and applications, deep reinforcement learning methods have been more successful in modeling the complexities involved in natural language, such as the processing of large vocabularies and phrases that otherwise make action selection (He et al., 2016a,b) arduous for RL methods that do not exploit deep networks as function approximators. However, most existing methods face a fundamental drawback: they do not take into account the inherent temporal irregularities and the variably influential nature of text when modeling a time series of language data for action selection and sequential decision making.
Reinforcement Learning in Finance Recent years have witnessed the adoption of reinforcement learning in the financial realm to solve tasks such as portfolio management (Filos, 2019; Almahdi and Yang, 2019), equity asset reallocation (Meng and Khushi, 2019; Katongo and Bhattacharyya, 2021), and cryptocurrency trading (Lucarelli and Borrotti, 2019; Ye et al., 2020). Existing work relies heavily on factors such as technical indicators to model price signals, or uses simple numeric features such as sentiment scores to model stock-affecting information reflected across news items. However, these methods suffer two significant drawbacks. First, despite their success, the performance of such methods depends largely on the quality of external feature representations of text (for instance, sentence embeddings (Ye et al., 2020)). Second, methods that use only prices exhibit lower practical applicability to real-world trading, owing to the lack of information in prices alone.

Problem Description
We formulate stock trading as a reinforcement learning problem. Let S = {s_1, s_2, ..., s_N} denote a set of N stocks. We aim to design a trading agent that learns to interact with the stock market environment by leveraging stock-affecting signals present across financial news items and tweets to trade stocks. In the context of an agent, an interaction comprises observing the environment state at a particular time-step to generate an action, and reaching the next time-step to receive a reward along with the next state. The typical Markov Decision Process (MDP) description is widely adopted for RL tasks where environments are fully observable. However, in the stock market, prices are influenced by numerous macro- and micro-economic factors, investor opinions about stocks formed through social media, financial news, and countless other sources. Thus, it is pragmatically and computationally impractical to observe and incorporate stock-affecting information from all possible sources to make trading decisions. As the stock markets and the underlying factors that drive stock prices are not fully observable, a Partially Observable MDP (POMDP) provides a natural generalization of the MDP to model the stock trading environment (Jaakkola et al., 1995). Hence, the key components of the stock trading environment considered and developed in this study are as follows. State observations: At a time-step τ, the state s_τ comprises a trading-account observation o_τ and a market-information observation o_m. The trading-account observation o_τ comprises the account balance and the number of shares owned corresponding to each stock at time-step τ. The market-information observation o_m comprises stock-relevant news or tweets released during a T-day lookback period (days ∈ [τ − T + 1, τ]). The text input in o_m is structured hierarchically, within and across days, so that it comprises all stock-relevant text in a lookback window of length T.
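The two observations above can be sketched as a simple container. This is a minimal illustration, not PROFIT's actual environment code; the helper name and array shapes (7-day lookback, up to 30 texts per day, 768-dim encodings) are assumptions for the example:

```python
import numpy as np

def build_state(balance, holdings, text_embeddings):
    """Assemble an illustrative POMDP state at time-step tau.

    balance:         scalar account balance b_tau
    holdings:        shares held per stock, length N
    text_embeddings: shape (T, K, d) -- encoded news/tweets over a T-day
                     lookback with up to K texts per day (zero-padded)
    """
    trading_account_obs = np.concatenate([[balance], holdings])  # o_tau
    market_info_obs = text_embeddings                            # o_m
    return {"account": trading_account_obs, "market": market_info_obs}

state = build_state(10_000.0, np.array([5.0, 0.0, 12.0]),
                    np.zeros((7, 30, 768)))
print(state["account"].shape, state["market"].shape)  # (4,) (7, 30, 768)
```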
The orders made through the trading actions taken by the reinforcement learning agent would have minute impacts on the overall market trends, thus having little to no direct influence on the market-information observations.
Trading actions: The agent can buy, sell, or hold the shares of each stock at time-step τ. We compute a vector of actions a_τ over the set of stocks S as the decisions made by the agent, which result in an increase, a decrease, or no change in the number of shares h held: one of these three actions is taken on each stock s. Note that the trading actions at time-step τ directly impact the trading-account observation at time-step τ + 1, o_τ+1.
Rewards: We define the reward as the change in the portfolio value when an action is taken at state s_τ to arrive at the new state s_τ+1. Corresponding to each state change, we define a return r as:

r(s_τ, a_τ, s_τ+1) = (b_τ+1 + p_τ+1 · h_τ+1) − (b_τ + p_τ · h_τ) − c_τ

where b_τ is the account balance, p_τ is a vector of stock prices, h_τ denotes the stock shares in the trading account, and c_τ denotes the transaction costs incurred at time-step τ. To maximize the earned profit, we aim to design a reinforcement learning agent that maximizes the cumulative return r(s_τ, a_τ, s_τ+1).
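The return definition above, change in total account value (cash plus holdings) net of transaction costs, can be computed directly; a minimal sketch, not the training code:

```python
import numpy as np

def reward(b_prev, p_prev, h_prev, b_next, p_next, h_next, cost):
    """r(s, a, s') = (b' + p'.h') - (b + p.h) - c: the change in total
    account value between consecutive states, net of transaction costs."""
    value_prev = b_prev + np.dot(p_prev, h_prev)
    value_next = b_next + np.dot(p_next, h_next)
    return value_next - value_prev - cost

# Example: buy 5 extra shares of the first stock while prices move.
r = reward(1000.0, np.array([10.0, 20.0]), np.array([5.0, 2.0]),
           950.0, np.array([11.0, 19.0]), np.array([10.0, 2.0]), cost=2.0)
print(r)  # 6.0
```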

Proposed Approach: PROFIT
We adopt reinforcement learning to optimize profitability in quantitative trading. To this end, we introduce PROFIT, a deep reinforcement learning approach for text-based stock trading, as shown in Figure 2. For this study, we make use of a custom policy network that hierarchically and attentively learns time-aware representations of news and tweets to trade stocks. In practice, PROFIT's proposed policy network is generalizable across various actor-critic reinforcement learning methods that exploit neural networks as function approximators. Moreover, PROFIT is compatible with any custom policy network of the same nature that can handle textual time-series data.

Deep Reinforcement Learning
We base PROFIT on the Deep Deterministic Policy Gradient (DDPG) framework (Lillicrap et al., 2015), which bridges the gap between policy gradient (Sutton et al., 2000) and value approximation (Watkins and Dayan, 1992) methods for RL. DDPG decouples the trading action selection and trading action evaluation processes into two separate, jointly learned networks: the actor network and the critic network. The actor network μ, parameterized by θ, takes the observations at state s_τ as input and outputs the trading actions a_τ. The critic network Q, parameterized by φ, takes the observations at state s_τ and the trading actions a_τ from the actor as input. It then outputs a scalar Q(s_τ, a_τ) to evaluate the action a_τ. For each state s_τ, the agent performs an action a_τ, receives a reward r_τ, and reaches the next state s_τ+1. These transitions, represented as (s_τ, a_τ, s_τ+1, r_τ), are stored in a replay buffer D. Subsequently, a mini-batch B comprising N transitions is sampled from D for updating the model. For each batch B, PROFIT minimizes the following loss L with respect to φ to update the critic:

L(φ) = (1/N) Σ_τ (y_τ − Q_φ(s_τ, a_τ))², with y_τ = r_τ + γ Q_φ′(s_τ+1, μ_θ′(s_τ+1))

where y_τ is the updated Q-value and γ is a discount factor. The actor is updated using the policy gradient ∇_θ J via backpropagation through time:

∇_θ J ≈ (1/N) Σ_τ ∇_a Q_φ(s_τ, a)|_{a=μ_θ(s_τ)} ∇_θ μ_θ(s_τ)

In the above equations, θ and θ′, and φ and φ′, are the parameters of the policy μ and the value function Q together with their target-network copies, respectively. For a detailed explanation of the framework, we refer readers to Lillicrap et al. (2015). Next, we define the trading policy network, which takes the observations at state s_τ as input to generate stock trading actions a_τ. We use the same architecture for defining the actor and the critic networks.
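One DDPG update step, critic regression onto the target value, then the deterministic policy gradient for the actor, can be sketched in PyTorch. The MLPs below are stand-ins for PROFIT's actual policy network, and all dimensions and learning rates are illustrative:

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
obs_dim, act_dim, gamma = 8, 3, 0.99

# Stand-in MLPs for the actor mu_theta and critic Q_phi.
actor = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                      nn.Linear(32, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.ReLU(),
                       nn.Linear(32, 1))
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)  # theta', phi'

opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

# A mini-batch of transitions (s, a, r, s') sampled from the replay buffer D
# (random tensors here, for illustration only).
s, a = torch.randn(16, obs_dim), torch.randn(16, act_dim).tanh()
r, s2 = torch.randn(16, 1), torch.randn(16, obs_dim)

# Critic update: regress Q(s, a) onto y = r + gamma * Q'(s', mu'(s')).
with torch.no_grad():
    y = r + gamma * critic_t(torch.cat([s2, actor_t(s2)], dim=1))
critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), y)
opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

# Actor update: ascend Q(s, mu(s)) by minimizing its negation.
actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
```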

Trading Policy Network
To generate trading actions, we first learn representations for each stock s ∈ S using the T-day market-information observation o_m and the trading-account observation o_τ at time-step τ. For this study, we derive inspiration from Hu et al. (2017) and Sawhney et al. (2020, 2021) to design the policy network. However, it is important to note that PROFIT is compatible with any general deep network capable of handling time series of textual data. We specifically adopt the following network as it inherently covers a breadth of components proven beneficial for designing language-based systems for stock trading. First, PROFIT's policy encodes the texts t corresponding to a stock s released in a day using BERT (Devlin et al., 2019). We tokenize and truncate the input text t for each news item or tweet and feed it to BERT. We then aggregate the final hidden states (the final-layer transformer outputs) of the input to obtain the encoded representation m = BERT(t) ∈ R^d, d = 768. We also experiment with the [CLS] token and other pooling techniques, such as the maximum of the hidden states and the concatenation of the mean and maximum of the hidden states, but do not obtain better results.
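The pooling alternatives mentioned above can be sketched with stand-in arrays; real final-layer outputs would come from BERT, so the random tensor here is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.normal(size=(12, 768))   # final-layer outputs for 12 tokens

mean_pool = hidden.mean(axis=0)                    # aggregation used here
max_pool = hidden.max(axis=0)                      # tried alternative
mean_max = np.concatenate([mean_pool, max_pool])   # tried alternative

print(mean_pool.shape, mean_max.shape)  # (768,) (1536,)
```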
For each stock s on a day i, a variable number K of tweets t are posted at irregular times k. LSTMs, though able to capture sequential context dependencies in text over time, assume inputs to be equally spaced in time. However, the intervals between the release of consecutive news items or tweets can vary widely, from a few seconds to many hours, and this can have a drastic impact on their influence on the market (O'Hara, 2015). Thus, we use a time-aware LSTM (TLSTM) (Baytas et al., 2017) to capture the irregularities in the release of text, and encode the texts for a stock s on a day i.
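The core TLSTM adjustment, discounting only the short-term component of the previous cell state by the elapsed time, can be sketched as follows, following Baytas et al. (2017). W_d and b_d stand in for the learned decomposition parameters, and the remaining LSTM gates are omitted:

```python
import numpy as np

def time_decay(delta_t):
    """Monotonically non-increasing discount g(dt) = 1 / log(e + dt),
    the heuristic decay used by TLSTM (Baytas et al., 2017)."""
    return 1.0 / np.log(np.e + delta_t)

def adjust_cell_state(c_prev, delta_t, W_d, b_d):
    """Split the previous cell state into short- and long-term memory,
    then decay only the short-term part by the elapsed time delta_t."""
    c_short = np.tanh(W_d @ c_prev + b_d)   # learned short-term component
    c_long = c_prev - c_short               # long-term component, kept intact
    return c_long + time_decay(delta_t) * c_short

rng = np.random.default_rng(0)
d = 4
c_star = adjust_cell_state(rng.normal(size=d), delta_t=45.0,
                           W_d=rng.normal(size=(d, d)), b_d=np.zeros(d))
```

The adjusted cell state c_star then feeds into the standard LSTM gate computations, so texts separated by long gaps contribute less short-term memory.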
All news and tweets in a day might not be equally informative, and may have a diverse influence over a stock's trend (Barber and Odean, 2007). We use an intra-day attention mechanism (Qin et al., 2017) that allows the trading agent to emphasize texts likely to have a more substantial impact on the price. The attention mechanism learns to adaptively aggregate the variable number of hidden states of the TLSTM into an intra-day text information vector. We combine these representations across days in a hierarchical fashion using an LSTM.
We use attention again over the outputs of the LSTM to obtain a market-information vector p τ comprising financial signals across tweets or news items released over the lookback. Lastly, we concatenate the trading-account observation o τ at state s τ , with the market-information vector p τ to form an overall stock-level representation z τ = [o τ , p τ ].
Trading actions: We concatenate the stock representations z_τ to form a feature vector Z across stocks for day τ. We then feed Z to a feed-forward network, followed by a tanh activation function, which outputs actions a_τ to buy, hold, or sell the shares of each stock s ∈ S at time-step τ.

Pre-processing: We pre-process English tweets using NLTK (Twitter mode) for the treatment of URLs, identifiers (@), and hashtags (#). We adopt the BertTokenizer for tokenization of tweets. For the English tweets, we use the pre-trained BERT-base-cased model; for the Chinese news, we adopt the Chinese-BERT-base model, having 12 layers and 110M parameters. We use character-based tokenization for the Chinese headlines. We collect prices from Yahoo Finance. We align trading days by dropping data samples that do not possess tweets for a consecutive 7-day window, and further align the data across windows to ensure that data is available for all days in the window for the same set of stocks. We split the US S&P 500 dataset temporally: January 01, 2014 to July 31, 2015 for training, August 01, 2015 to September 30, 2015 for validation, and October 01, 2015 to January 01, 2016 for testing. We split the China & Hong Kong dataset temporally: January 01, 2015 to August 31, 2015 for training, September 01, 2015 to September 30, 2015 for validation, and October 01, 2015 to January 01, 2016 for testing, for all models and experiments.

PROFIT Training Setup
We conduct all experiments on a Tesla P100 GPU. We use grid search to find optimal hyperparameters based on the validation Sharpe Ratio ( §5.3) for all 4023 models. We build the RL agent in Python using PyTorch and employ OpenAI Gym to implement the stock trading environment. We explore the length of the lookback period T ∈ [2, 10] days. Across both datasets, the model performs best for a week-long lookback, i.e., 7 days. We explore the hidden state dimension d ∈ {32, 64, 128} for both the TLSTM and the LSTM across both datasets, and achieve the best performance with d = 64 for both. We factor the time elapsed between the successive posting of texts at the finest common granularity available across the datasets, i.e., 1-minute intervals. We use Xavier initialization (Glorot and Bengio, 2010) to initialize all network weights. We use an exponential learning rate scheduler (Li and Arora, 2019) with a decay rate of 0.001 and an initial learning rate of 7e−5. For each dataset, we train PROFIT using the Adam optimizer (Kingma and Ba, 2014).

Evaluation Metrics
To assess the profitability and trading performance of all methods, we compute the Sharpe Ratio (SR), its variant the Sortino Ratio (StR), the Cumulative Return (CR), and the Maximum Drawdown (MDD). The Sharpe Ratio is a measure of the return of a portfolio compared to its risk (Sharpe, 1994). We calculate SR by computing the earned return R_a in excess of the risk-free return R_f (T-Bill rates: https://home.treasury.gov/), defined as: SR = (R_a − R_f) / σ_a, where σ_a is the standard deviation of the excess return. The Sortino Ratio is a variation of the Sharpe Ratio which uses an asset's standard deviation of negative portfolio returns (downside deviation, σ_d): StR = (R_a − R_f) / σ_d. The StR is a useful way to evaluate an investment's return for a given level of bad risk, and provides a better view of the risk-adjusted return, as positive volatility is essentially considered beneficial. The CR is the change in the investment over time, computed using the initial (b_0) and final (b_f) account balances as: CR = (b_f − b_0) / b_0 × 100. The MDD measures the maximum loss from a peak r_p to a trough r_t of a portfolio, and is defined as: MDD = (r_t − r_p) / r_p × 100. Larger values (in magnitude) of MDD indicate higher volatility. MDD is an indicator used to assess the relative riskiness of one stock trading strategy versus another, as it focuses on capital preservation, a key concern for most investors. For instance, two trading strategies may have the same volatility, average outperformance, and tracking error, but their maximum drawdowns compared to the benchmark can differ drastically. Investors typically prefer the strategy with the lower maximum drawdown.
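The four metrics above can be computed from a series of per-period returns and account balances; a minimal per-period (unannualized) sketch, which differs from conventions that annualize SR:

```python
import numpy as np

def sharpe(returns, rf=0.0):
    """SR = (R_a - R_f) / sigma_a over a series of per-period returns."""
    excess = np.asarray(returns) - rf
    return excess.mean() / excess.std()

def sortino(returns, rf=0.0):
    """StR = (R_a - R_f) / sigma_d, penalizing only downside deviation."""
    excess = np.asarray(returns) - rf
    downside = np.minimum(excess, 0.0)
    return excess.mean() / np.sqrt((downside ** 2).mean())

def cumulative_return(b0, bf):
    """CR: percentage change in account balance over the episode."""
    return (bf - b0) / b0 * 100

def max_drawdown(balances):
    """MDD: largest peak-to-trough loss in percent (a negative value)."""
    balances = np.asarray(balances, dtype=float)
    peaks = np.maximum.accumulate(balances)   # running peak so far
    return ((balances - peaks) / peaks).min() * 100

print(cumulative_return(100.0, 110.0))       # 10.0
print(max_drawdown([100, 120, 90, 130]))     # -25.0
```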

Practical Trading Constraints
The following assumptions and constraints reflect concerns of practical stock trading. PROFIT accounts for various elements of the trading process and financial aspects such as transaction costs, market liquidity, and risk aversion.
Non-negative account balance: The allowed trading actions should not result in a negative account balance. Based on the stock-level actions generated at time τ, the stocks are divided into non-overlapping sets for selling, buying, and holding. The constraint is that, at any given time-step τ, the account balance b_τ, plus the money gained by selling the stocks in the selling set, minus the money spent acquiring the stocks in the buying set, must remain non-negative.
Transaction costs: For each trade, various transaction costs are incurred, such as exchange fees, execution fees, and SEC fees. Further, in practice, different brokers charge different commission fees; despite these variations, we assume transaction costs of 0.1% of the value of each trade (either buy or sell).
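Together, the balance constraint and the 0.1% cost assumption can be sketched as follows. The all-or-nothing fallback to holding is an illustrative simplification, not PROFIT's exact order logic:

```python
import numpy as np

TRANSACTION_COST = 0.001  # assumed 0.1% of trade value, per buy or sell

def execute(balance, holdings, prices, share_deltas):
    """Apply buy/sell share changes only if the resulting balance stays
    non-negative; otherwise hold everything (illustrative policy)."""
    trade_value = prices * share_deltas                 # >0 buy, <0 sell
    cash_flow = -trade_value.sum()                      # sells add cash
    costs = TRANSACTION_COST * np.abs(trade_value).sum()
    new_balance = balance + cash_flow - costs
    if new_balance < 0:                                 # constraint violated
        return balance, holdings
    return new_balance, holdings + share_deltas

# Buy 5 shares of stock 1 at $10, sell 2 shares of stock 2 at $20.
b, h = execute(1000.0, np.array([5.0, 2.0]),
               np.array([10.0, 20.0]), np.array([5.0, -2.0]))
print(b, h)  # 989.91 [10. 0.]
```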

Baseline Approaches
We compare PROFIT with baselines spanning different formulations: regression, classification, ranking, and reinforcement learning. We follow the same preprocessing protocols as the original works and adopt their implementations where publicly available.
Regression (REG) These methods regress return ratios from past data and trade the top stocks.
• AZFinText: Proper noun-based text representations fed to Support Vector Regression for forecasting returns (Schumaker and Chen, 2009).

Classification (CLF)
The following methods classify movements as [up, down, neutral] and trade the stocks whose prices are expected to rise.

Table 1: Trading performance over different problem formulations (mean of 5 runs). All formulations use the same base architecture defined in PROFIT's policy network to model stock-affecting text over the lookback period.
• TSLDA: Topic Sentiment Latent Dirichlet Allocation, a generative model jointly exploiting topics and sentiments in textual data (Nguyen and Shirai, 2015).
• StockEmb: Stock embeddings acquired using prices, and dual vector (word-level vectors and context-level vectors) representation of texts (Du and Tanaka-Ishii, 2020).
• MAN-SF (text only): BERT based hierarchical encoder for financial text using hierarchical temporal attention (Sawhney et al., 2020).
• Chaotic: A Hierarchical Attention Network using GRU encoders with temporal attention applied on text within days, and the days in the lookback period (Hu et al., 2017).

Ranking (RAN)
The following methods rank stocks to select the most profitable trading candidates.
• RankNet: A DNN that utilizes sentiment-based shock and trend scores to optimize a probabilistic ranking function.

Reinforcement Learning (RL)
The following approaches optimize quantitative trading through reinforcement learning.
• iRDPG: An imitative Recurrent Deterministic Policy Gradient (RDPG) algorithm exploiting temporal stock price features, while optimizing the Sharpe Ratio as the reward.
• AlphaStock: An LSTM-based network to model prices, comprising attention to model inter-stock cross relations.
• S-Reward: An inverse reinforcement learning method to model relations between sentiments and returns.
• SARL: A Deterministic Policy Gradient with augmented states, comprising stock prices and encoded news (Ye et al., 2020).
Results and Discussion

Stock Trading Problem Formulation
We experiment with four different formulations for neural stock trading in Table 1. For each formulation, we treat our custom trading policy network as the base architecture for modeling stock-affecting textual information over the lookback period. We find that the classification and regression formulations generate relatively low profits compared to the others, likely because trades in such methods are not optimized for overall profit as a reward. Moreover, another limitation of the classification and regression approaches is that the trading strategy needs to be defined manually. Next, we find that reinforcement learning provides the best performance, as it allows PROFIT more granular control over trading actions and lets it learn to optimize the strategy directly for making profitable trades using text. Further, trading under the RL formulation experiences the lowest MDD, likely because the agent has more flexibility in selecting trades, which leads to lower losses. Next, we study how different baseline stock trading networks across the four formulations perform compared to PROFIT.

Performance Comparison with Baselines
We now compare PROFIT's profitability (Sharpe Ratio) and risk in investment (Maximum Drawdown) against baseline approaches in Table 2. PROFIT generates higher risk-adjusted returns and experiences lower losses than all methods, as we show in Figure 3. We find that methods which incorporate stock-affecting information from textual sources generate profits higher than or comparable to price-only methods. These results indicate that textual sources can augment neural stock prediction, as they potentially help capture classic financial anomalies such as the over- and under-reaction of asset prices to news (Bondt and Thaler, 1985; Corgnet et al., 2013). This observation also follows prior research showing that financial text is generally a better indicator of market volatility than price signals (Atkins et al., 2018). In general, we observe that ranking and reinforcement learning methods generate high returns as they are directly optimized towards profit generation. Further, reinforcement learning approaches are typically more profitable, as the trading agents optimize every trading action for profit generation directly, unlike ranking, where the task is only to select profitable stocks to trade. These observations validate the premise of formulating quantitative trading as a reinforcement learning problem, compared to the conventionally adopted regression and classification formulations. Despite the 2015-16 Chinese market turbulence (Liu et al., 2016), the lower MDD of PROFIT indicates the trading agent's ability to respond to bearish markets (markets that experience prolonged price declines, high volatility, and high risk on investments), and its performance is attributable to the following reasons.
Amongst competitive baselines, PROFIT's policy design differentiates it from others, as it captures the hierarchical dependencies in the news and attentively learns to emphasize crucial trading indicators during such turbulent economies. The attention mechanisms potentially account for financial phenomena such as the calendar (Jacobs and Levy, 1988) and day-of-the-week (Halil, 2001) effects, and better distinguish noise-inducing text from relevant market signals to minimize false evaluations and overreactions (De Long et al., 1989). Further, Jiao et al. (2020) show that frequent news media coverage is an indicator of a decrease in stock volatility. Through its time-aware mechanism, the agent can incorporate such frequencies and learn to trade less volatile stocks, executing low-risk and high-profit trades even in bearish market scenarios.

Parameter Analysis: Probing Sensitivity
Lookback period length T: Here, we study how PROFIT's performance varies with the length of the lookback period T ∈ [2, 10] days in Figure 4. Lower performance for shorter lookbacks indicates their inability to capture stock-affecting market information, as public information requires time to be absorbed into price movements (Luss and D'Aspremont, 2015).
As we increase T, we observe a deterioration in trading performance (Figure 4: sensitivity to parameters T and b_0). This indicates that larger lookbacks allow the inclusion of stale information from older days having a relatively lower influence on prices (Bernhardt and Miao, 2004). We observe optimal performance for mid-sized lookbacks.
Initial trading balance b_0: To further analyze PROFIT's trading performance, we simulate the cumulative returns for different initial trading amounts. Financial studies highlight that larger investments are prone to higher risk (Stout, 1995), as higher budgets allow increased risk-taking. Ghysels et al. (2005) find significantly positive relations between larger risk and higher returns (the risk-return tradeoff). PROFIT's performance is akin to this phenomenon: we observe generally high rewards even for riskier decisions taken on larger investments, as shown in Figure 4. We attribute PROFIT's versatility to its policy design, which allows diverse trading choices based on resource availability. These results indicate that PROFIT holds practical applicability for investors across diverse economic milieus: from individual traders to larger firms with greater investment margins.

Conclusion
We propose PROFIT, a deep RL approach for quantitative trading using textual data from online news and tweets. To model market information, PROFIT hierarchically learns temporally relevant signals from texts in a time-aware fashion, and directly optimizes trading actions towards profit generation. Through extensive analyses on English tweets and Chinese news spanning four markets, we highlight PROFIT's real-world applicability. In trading simulations on the S&P 500 and China A-shares indexes, PROFIT outperforms baselines in terms of profitability and risk in investment.

Ethical Considerations
There is an ethical imperative implicit in the growing influence of automation on market behavior, and it is worthy of serious study (Hurlburt et al., 2009; Cooper et al., 2020). Since financial markets are transparent (Bloomfield and O'Hara, 1999) and heavily regulated (Edwards, 1996), we discuss the ethical considerations pertaining to our work. Following Cooper et al. (2016), we emphasize three ethical criteria for automated trading systems and discuss PROFIT's design with respect to these criteria.
Prudent System A prudent system "demands adherence to processes that reliably produce strategies with desirable characteristics such as minimizing risk, and generating revenue in excess of its costs over a period acceptable to its investors" (Longstreth, 1986). PROFIT is directly optimized towards profit-generation and minimizing investor risk by selectively investing in the less volatile stocks ( §6.2), and generates risk-adjusted returns: Sharpe Ratio, as shown in Table 2.
Blocking Price Discovery A trading system should not block price discovery or interfere with the ability of other market participants to add to their own information (Angel and McCabe, 2013). Examples include placing an extremely large volume of orders to block competitors' messages (quote stuffing) or intentionally trading with itself to create the illusion of market activity (wash trading). PROFIT does not block price discovery in any form.
Circumventing Price Discovery A trading system should not hide information, such as by participating in dark pools or placing hidden orders (Zhu, 2014). We evaluate PROFIT only on public data in highly regulated stock markets. Despite these considerations, it is possible for PROFIT, just as any other automated trading system, to be exploited to hinder market fairness. We follow broad ethical guidelines to design and evaluate PROFIT, and encourage readers to follow both regulatory and ethical considerations pertaining to the stock market.