Deep Learning Framework for Measuring the Digital Strategy of Companies from Earnings Calls

Companies today are racing to leverage the latest digital technologies, such as artificial intelligence, blockchain, and cloud computing. However, many companies report that their strategies did not achieve the anticipated business results. This study is the first to apply state-of-the-art NLP models to unstructured data to understand the different clusters of digital strategy patterns that companies are adopting. We achieve this by analyzing earnings calls from Fortune's Global 500 companies between 2015 and 2019. We use a Transformer-based architecture for text classification, which shows a better understanding of the conversation context. We then investigate digital strategy patterns by applying clustering analysis. Our findings suggest that Fortune's Global 500 companies use four distinct strategies: product-led, customer experience-led, service-led, and efficiency-led. This work provides an empirical baseline for companies and researchers to enhance our understanding of the field.


Introduction
The use of digital technologies is transforming the modern economy with significant implications for businesses. As a result, organizations are perceiving digital technologies to present both growth opportunities and existential threats (Sebastian et al., 2017). Despite efforts to adopt such technologies, research demonstrates that the success rate in improving business performance is very low due to the lack of a coherent digital strategy (Correani et al., 2020). A positive research step to address this challenge is enhancing the understanding of current approaches to digital strategy.
Corporate documents are increasingly being used to understand the performance of organizations for various purposes. Examples include predicting expected returns from financial reports (Theil et al., 2019) and measuring compliance from sustainability reports (Smeuninx et al., 2020).
In the present study, we use earnings calls transcripts, as they provide rich insights into companies' activities, specifically on their digital strategy. We quantify the approach taken by companies based on the progress made on the following three components related to the digital strategy (Vial, 2019):
• Business value: the expressed value of using a specific digital solution (e.g., enhancing customer experience and increasing operational efficiency)
• Strategy management: the policies and practices in place to support the implementation of digital solutions (e.g., setup of innovation lab, acquisition of startups, and use of agile methods)
• Digital technology: the technology used as a part of the identified digital solution (e.g., artificial intelligence, Internet of things, and robotics)
In this work, we offer two contributions. First, we set the baseline for using deep learning based NLP models to measure the digital strategy of companies from earnings calls transcripts. Second, we apply this framework to investigate the digital strategy of Fortune's Global 500 companies.

Related Work
There have been a few recommendations for the digital strategy to be customer focused, product focused (Sebastian et al., 2017), or business model focused (Vial, 2019). However, an empirical investigation of digital strategy archetypes is still needed (Tekic & Koroteev, 2019).
A few NLP-driven publications have investigated the digital strategy of companies. These include network analytics of website tags (Stoehr et al., 2019), document clustering of companies' descriptions (Riasanow et al., 2020), and keyword analysis of financial reports (Pramanik et al., 2019). While all these studies revealed insights into the digital strategy of companies, they were exploratory in nature with limited quantitative evaluation of digital strategy components.
Earnings calls transcripts are known to be rich data sources and are considered to be more informative and insightful than company filings (Frankel et al., 1999). Moreover, they can be leveraged for text classification to identify decisions and activities that companies take (Keith & Stent, 2019). Therefore, we propose text classification of digital strategy related topics from earnings calls transcripts as detailed in Section 3.

Methodology
There is no generic framework to measure company performance, as it can be topic specific. Measuring digital strategy retrospectively is accomplished by identifying the progress made on its components, also known as digital maturity (Gurbaxani & Dunkle, 2019). From an NLP perspective, this requires identifying topics of interest in the text and then assigning a maturity score to each of them. The closest text classification task to this is aspect-based sentiment analysis, which has proven effective for mining aspects related to customers' opinions (Jiang et al., 2019). Therefore, we adapt this task to measure companies' digital strategy and refer to it as Aspect-based Maturity Analysis (ABMA).
In this case, we propose Aspects that refer to 17 coarse-grained topics from the three components of digital strategy presented in the introduction (business value [4 aspects], strategy management [2 aspects], and digital technology [11 aspects]). The design and selection of the 17 topics was based on an extensive literature review of various publications on digital strategy components (Al-Ali, 2020). We find multi-label classification suitable as the labels are not mutually exclusive. Maturity, on the other hand, refers to the progress made with a given aspect over four discrete steps: (1) plan, (2) pilot, (3) release, and (4) pioneer (EY & Microsoft, 2019). We propose the use of multi-class classification given that maturity runs on a discrete scale. Model labels are shown in Table 1. See Appendix B for the list of label definitions. The following is an example of the classification process from the earnings calls dataset:
TEXT: "We're putting most of our efforts right now-are continuing to-into our robotics program. We think it's been a great addition to our fulfillment capacity.

Experiment Setup
The objective of the experiment is to demonstrate the utility of ABMA in measuring the digital strategy of Fortune 500 companies using earnings calls transcripts. The process included pre-processing and structuring the text, filtering irrelevant content, classifying the text, and aggregating the results to the company level. We then clustered the results in Section 5 to identify common digital strategy patterns.

Data and Pre-processing
We chose Fortune's Global 500 companies as a suitable sample based on their diversity and scale of digital activities (Fortune.com, 2019). Our data consists of 4,911 earnings calls transcripts for 304 companies covering the five years between January 1, 2015 and December 31, 2019. 195 companies were excluded due to missing or inconsistent data from the source. The data was then tabularized with features including company name, ticker symbol, date of call, and call transcript. Performing sentence-level splitting resulted in approximately 3.2 million sentences. We conducted a keyword search to select relevant sentences and maintain a dense dataset. The keywords included 275 domain-specific terms from the 17 topics related to the three digital strategy components. This resulted in 46,277 sentences, showing that around 1.46% of earnings calls discussions are digital related. We also added the previous and following sentences to each selected sentence to capture sufficient context. We refer to each resulting text block as a document.
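The filtering step described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the keyword list is a hypothetical three-term stand-in for the 275 domain-specific terms, the regular-expression sentence splitter is a naive assumption, and the function names are illustrative only.

```python
import re

# Hypothetical stand-in for the 275 domain-specific terms (not the paper's list).
KEYWORDS = ["robotics", "artificial intelligence", "cloud"]


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_documents(text, keywords=KEYWORDS, window=1):
    """Return one text block per keyword hit, padded with neighbouring sentences."""
    sentences = split_sentences(text)
    pattern = re.compile(
        "|".join(r"\b" + re.escape(k) + r"\b" for k in keywords),
        re.IGNORECASE,
    )
    documents = []
    for i, sentence in enumerate(sentences):
        if pattern.search(sentence):
            # Add the previous and following sentence(s) for context.
            start = max(0, i - window)
            end = min(len(sentences), i + window + 1)
            documents.append(" ".join(sentences[start:end]))
    return documents


transcript = (
    "Revenue grew this quarter. We expanded our robotics program. "
    "It improved fulfillment capacity. Margins were stable."
)
docs = build_documents(transcript)
print(docs)
```

In this toy transcript only the second sentence contains a keyword, so a single document of three sentences is produced.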

Text Classification
Based on the proposed approach, we designed a two-stage text classification architecture. The first model is multi-label and detects the occurrence of an aspect in a document, whereas the second model assigns a maturity class to each detected aspect. As an experiment, we hand-annotated 1,300 examples from a random sample with aspects and their respective maturity levels, as shown by the illustrative example in Section 3. We split the labeled data 80/10/10 between training, validation, and testing, respectively.
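For illustration, an 80/10/10 split of 1,300 labeled examples can be sketched as follows. The sketch assumes a simple seeded random shuffle; whether the original split was random, stratified by label, or produced in another way is not stated, and the function name is hypothetical.

```python
import random


def split_80_10_10(examples, seed=0):
    """Shuffle deterministically, then cut into train/validation/test lists."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    # Integer arithmetic avoids floating-point rounding at the cut points.
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Integers stand in for the 1,300 hand-annotated documents.
train, val, test = split_80_10_10(range(1300))
print(len(train), len(val), len(test))  # 1040 130 130
```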
Given the absence of a benchmark, we compared the performance of several text classification models, including transformer architectures. We trained the transformer models by applying discriminative fine-tuning for classification to maximize the utilization of the language model pre-training (Howard & Ruder, 2018). The training process also included gradual unfreezing of each layer with a slanted triangular learning rate to aid the convergence of the model parameters towards task-specific features (Howard & Ruder, 2018). We found that the pretrained RoBERTa (Liu et al., 2019) performed best without further language model fine-tuning, as shown in Table 2. The main difference was that RoBERTa was able to achieve higher accuracy on scarce labels. Moreover, we found that RoBERTa generalized better from context to unseen terminology. The qualitative evaluation of the output showed reasonable performance given the training dataset size, with errors commonly attributed to imprecisely expressed sentences.
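The two learning-rate techniques referenced above can be written out as a short sketch following the formulas in Howard & Ruder (2018). The default values (cut_frac = 0.1, ratio = 32, a maximum rate of 0.01, and a per-layer decay of 2.6) are the ones reported in that paper for its own experiments; they are not the hyperparameters of this study, which are not stated here, and the function names are illustrative.

```python
import math


def slanted_triangular_lr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at step t: short linear warm-up, then long linear decay."""
    # Guard (our addition): keep at least one warm-up step for very short runs.
    cut = max(1, math.floor(total_steps * cut_frac))
    if t < cut:
        p = t / cut
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))
    return lr_max * (1 + p * (ratio - 1)) / ratio


def discriminative_lrs(top_lr, num_layers, decay=2.6):
    """Per-layer rates; index 0 is the lowest layer, the last index the top layer."""
    return [top_lr / decay ** (num_layers - 1 - i) for i in range(num_layers)]


schedule = [slanted_triangular_lr(t, 1000) for t in (0, 100, 1000)]
layer_rates = discriminative_lrs(2e-5, 3)
print(schedule)
print(layer_rates)
```

The schedule starts at lr_max / ratio, peaks at lr_max after the first 10% of steps, and returns to lr_max / ratio at the end, while each lower layer trains with a rate 2.6 times smaller than the layer above it.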

Processing Output
Using the described text classification approach, we detected 61,872 aspect occurrences in 27,198 documents across 295 companies. This shows that 58.7% of the filtered dataset referred to a specific digital strategy-related activity by the company. To aggregate results from the documents to the company level, we applied several transformation steps. First, we one-hot encoded all the aspects to obtain a binary feature vector. Second, we multiplied each document's vector by its respective maturity class (1-4).
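The two transformation steps can be sketched as follows. The aspect names are a hypothetical four-item subset of the 17 aspects, and the sketch assumes that each detected aspect in a document carries its own maturity class, which is one reading of the two-stage design rather than a confirmed detail.

```python
# Hypothetical subset of aspect names; the study uses 17 aspects.
ASPECTS = [
    "customer_experience",
    "operational_efficiency",
    "robotics",
    "artificial_intelligence",
]


def one_hot(detected, aspects=ASPECTS):
    """Binary indicator vector: 1 where an aspect was detected in the document."""
    detected = set(detected)
    return [1 if aspect in detected else 0 for aspect in aspects]


def weight_by_maturity(aspect_maturity, aspects=ASPECTS):
    """Multiply the indicator vector by each detected aspect's maturity class."""
    for maturity in aspect_maturity.values():
        if not 1 <= maturity <= 4:
            raise ValueError("maturity class must be between 1 and 4")
    indicator = one_hot(aspect_maturity, aspects)
    return [
        flag * aspect_maturity.get(aspect, 0)
        for flag, aspect in zip(indicator, aspects)
    ]


# Example document: robotics at "release" (3), efficiency at "pilot" (2).
document = {"robotics": 3, "operational_efficiency": 2}
print(one_hot(document))             # [0, 1, 1, 0]
print(weight_by_maturity(document))  # [0, 2, 3, 0]
```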
It was important to differentiate companies that exhibit multiple maturity levels of an aspect in the same year. Taking a weighted average penalized companies for simultaneous maturity levels. Therefore, we treat the maturity of an aspect as a checklist, in which the score is calculated by summing each identified maturity class within the same year for a given aspect. We then calculated the mean across the five years, which resulted in a maturity score between 0 and 10 for each aspect. As a result, the dataset was a matrix of 295 companies × 18 features (17 aspect maturity scores + mean maturity).
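A minimal sketch of the checklist scoring for a single company is given below. It assumes that each distinct maturity class counts once per aspect and year, and that years without any mention of an aspect contribute zero to the five-year mean; the record layout and function name are illustrative.

```python
from collections import defaultdict

YEARS = range(2015, 2020)  # the five years covered by the dataset


def checklist_scores(records, years=YEARS):
    """records: iterable of (year, aspect, maturity_class) for one company."""
    seen = defaultdict(set)  # (aspect, year) -> distinct maturity classes
    for year, aspect, maturity in records:
        seen[(aspect, year)].add(maturity)
    years = list(years)
    aspects = {aspect for aspect, _ in seen}
    # Sum the distinct classes per year, then average over all years.
    return {
        aspect: sum(sum(seen.get((aspect, y), ())) for y in years) / len(years)
        for aspect in aspects
    }


records = [
    (2018, "robotics", 1),
    (2018, "robotics", 2),
    (2018, "robotics", 2),  # repeated class counts once
    (2019, "robotics", 3),
    (2019, "robotics", 4),
]
scores = checklist_scores(records)
print(scores)  # {'robotics': 2.0}
```

A company that shows all four maturity classes in every year reaches the maximum of (1 + 2 + 3 + 4) = 10, which matches the stated score range of 0 to 10.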

Clustering of Results
Companies may adopt various digital strategies based on sector, digital maturity, and business scope. To identify the common digital strategy archetypes, we clustered the data. As k-means clustering assumes features with comparable mean and variance, we standardized the data by applying MinMax scaling and log transformation. There were a few sparse features, so we chose the 12 dense features (non-zero values for more than 40% of all companies). We also applied t-SNE for dimensionality reduction, as it significantly improved clustering performance (Maaten & Hinton, 2008). Plotting the Silhouette score for clustering showed multiple peaks, suggesting an inherent hierarchy in the data. Upon visual inspection, we found 10 clusters to be a meaningful representation. The cluster map is illustrated in Figure 1 (a).
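The feature preparation that precedes clustering can be sketched in plain Python as follows. The sketch covers only dense-feature selection, log transformation, and MinMax scaling; the order of the last two steps (log first, then scaling) and the use of log(1 + x) are assumptions, and the toy matrix is invented for illustration.

```python
import math


def select_dense(matrix, min_share=0.4):
    """Indices of columns whose share of non-zero values exceeds min_share."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    return [
        j for j in range(n_cols)
        if sum(1 for row in matrix if row[j] != 0) / n_rows > min_share
    ]


def log_minmax(matrix, columns):
    """Apply log(1 + x) to the chosen columns, then rescale each to [0, 1]."""
    logged = [[math.log1p(row[j]) for j in columns] for row in matrix]
    scaled_columns = []
    for column in zip(*logged):
        low, high = min(column), max(column)
        span = high - low
        # Constant columns are mapped to 0.0 to avoid division by zero.
        scaled_columns.append([(v - low) / span if span else 0.0 for v in column])
    return [list(row) for row in zip(*scaled_columns)]


# Toy company-by-feature maturity matrix (invented values).
matrix = [
    [0, 2, 0],
    [4, 0, 0],
    [10, 6, 1],
    [3, 1, 0],
]
dense = select_dense(matrix)
features = log_minmax(matrix, dense)
print(dense)  # [0, 1]
```

The resulting matrix would then be passed to a dimensionality-reduction and clustering library; for example, scikit-learn provides TSNE, KMeans, and silhouette_score, although the tooling used in this study is not specified.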
Investigating the cluster map revealed two main insights. First, clusters generally included companies in the same or a related sector. This indicates that the digital strategy can be sector specific. Second, we found a few companies from various sectors clustered with predominantly digital native companies from the technology sector. For example, ABB and Siemens from the industrial sector are in cluster 2 due to their capabilities in automation and robotics technologies, while the majority of their peer companies are in cluster 6. This indicates that some companies have made significant progress in digitally transforming their business.
We investigated the clusters by calculating the mean maturity value for each feature across the 10 clusters. We found four distinct digital strategies shared between eight clusters, whereas the remaining two had very limited digital maturity. The four digital strategy archetypes are product-led (cluster 2), customer experience-led (cluster 4), service-led (clusters 0, 7, 8, 9), and efficiency-led (clusters 3, 6), as illustrated by the radar charts in Figure 1 (b). We also found that while companies lead with a specific business value, such as customer experience, the vast majority demonstrate some level of progress across all areas. Therefore, focusing on a single area, as some authors argue, might not be practical due to the interdependencies between business functions (Sebastian et al., 2017; Westerman et al., 2014).
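The per-cluster profiling can be sketched as follows. The feature names, values, and cluster assignments are invented for illustration, each company is assumed to have a value for every feature, and picking the feature with the highest mean is only a rough proxy for the archetype labels, which in this study were derived by inspecting the radar charts.

```python
from collections import defaultdict


def cluster_profiles(rows, labels):
    """Mean value of every feature within each cluster.

    rows: one feature dict per company; labels: one cluster id per company.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        counts[label] += 1
        for feature, value in row.items():
            sums[label][feature] += value
    return {
        label: {f: total / counts[label] for f, total in features.items()}
        for label, features in sums.items()
    }


def leading_feature(profile):
    """Feature with the highest mean maturity in a cluster profile."""
    return max(profile, key=profile.get)


# Invented toy data: three companies, two features, two clusters.
rows = [
    {"customer_experience": 4.0, "operational_efficiency": 1.0},
    {"customer_experience": 2.0, "operational_efficiency": 1.0},
    {"customer_experience": 0.5, "operational_efficiency": 3.0},
]
labels = [4, 4, 3]
profiles = cluster_profiles(rows, labels)
print(profiles[4])                   # {'customer_experience': 3.0, 'operational_efficiency': 1.0}
print(leading_feature(profiles[3]))  # operational_efficiency
```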
Our findings have two main implications. First, managers can use the identified archetypes as a baseline for digital strategy formulation and as a tool to benchmark the progress made against relevant companies. Second, researchers can build on our findings to investigate digital strategy further.

Conclusion
In this study, we present a deep learning framework for measuring the digital strategy of companies by using earnings calls transcripts. In the process, we demonstrate the practical value of state-of-the-art NLP models beyond inference. The results from our experiment enhance the understanding of digital strategy by identifying four distinct digital strategy archetypes. Our findings may serve as a baseline for companies attempting to formulate or benchmark their digital strategy as well as an empirical foundation for further research. For future studies, we recommend investigating the causal relationship between aspect maturity and financial performance.