Paper Abstracts
Paper 1: The Art and AI of Stock Analyses by Sean Cao, Junbo Wang, Wei Jiang and Baozhong Yang. Abstract: We train an AI analyst that digests corporate disclosures, industry trends, and macroeconomic indicators to the extent that it beats most analysts. Humans win the “Man vs. Machine” contest when a firm is complex and has intangible assets, and AI wins when information is transparent but voluminous. Analysts catch up with machines over time, especially after firms are covered by alternative data and the analysts' institutions build AI capabilities. AI power and human wisdom are complementary in generating accurate forecasts and mitigating extreme errors, portraying a future of “Man + Machine” (rather than human displacement) in financial analysis, and likely in other high-skill professions.
Paper 2: Mutual Fund Ratings by Analysts vs. Machine Learning Technique by Si Cheng, Ruichang Lu and Xiaojun Zhang. Abstract: We examine two forward-looking mutual fund ratings: the analyst rating produced by human analysts and the quantitative rating generated by a machine learning technique. The analyst rating identifies outperforming funds, while the quantitative rating does not; this difference derives mostly from which funds analysts select for coverage. Moreover, the tone of the analyst report contains incremental soft information that predicts fund performance. Finally, retail investors do not follow analyst recommendations; instead, they chase the quantitative rating. Our results highlight the importance of mutual fund analysts in information production and imply a capital misallocation problem in mutual fund investment.
Paper 3: When Markowitz Meets Machine: Optimization of Large Portfolios with High-Dimensional Stock Characteristics by Can Yang, Haifeng You, Fa Zhang and Xinghua Zheng. Abstract: We transform the classic Markowitz mean-variance optimization (MVO) problem into an unconstrained machine learning task that directly takes (potentially high-dimensional) firm characteristics as inputs, without the need to estimate the expected returns and the covariance matrix for a large number of stocks. The resulting model, MMM, is theoretically equivalent to the original Markowitz mean-variance optimization but overcomes many challenges in its real-world implementation. Using China A-share market data, we apply the model to construct long-only optimal portfolios that aim to outperform a benchmark under a tracking-error constraint. Our results show that the MMM optimal portfolio outperforms the benchmark (CSI 500) by approximately 20% per year with an annual tracking error of approximately 5%. The outperformance is consistent over time and remains stable in recent years, despite the substantial deterioration in performance of other models.
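The abstract's central claim, that mean-variance optimization can be recast as a learning problem that skips separate estimates of expected returns and the covariance matrix, has a well-known small-scale analogue: regressing a constant 1 on stock excess returns, with no intercept, yields weights proportional to the sample mean-variance-efficient portfolio Σ⁻¹μ. The sketch below checks that equivalence on toy data in plain Python. It is not the MMM model (which, per the abstract, takes firm characteristics as inputs), and the function names and data are hypothetical.

```python
# Sketch only: a classic regression form of mean-variance optimization,
# NOT the authors' MMM model. All data below is hypothetical.

def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def normalize(weights):
    """Rescale weights so they sum to one (assumes a nonzero sum)."""
    total = sum(weights)
    return [w / total for w in weights]


def regression_weights(excess_returns):
    """Regress a constant 1 on excess returns with no intercept: b = (R'R)^-1 R'1."""
    k = len(excess_returns[0])
    rtr = [[sum(row[i] * row[j] for row in excess_returns) for j in range(k)]
           for i in range(k)]
    rt1 = [sum(row[i] for row in excess_returns) for i in range(k)]
    return normalize(solve(rtr, rt1))


def mvo_weights(excess_returns):
    """Sample tangency weights Sigma^-1 mu, using the 1/T covariance estimator."""
    t, k = len(excess_returns), len(excess_returns[0])
    mu = [sum(row[i] for row in excess_returns) / t for i in range(k)]
    cov = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in excess_returns) / t
            for j in range(k)] for i in range(k)]
    return normalize(solve(cov, mu))


# Hypothetical excess returns: 5 periods x 2 stocks.
excess_returns = [
    [0.02, 0.01],
    [0.01, 0.03],
    [-0.01, 0.00],
    [0.03, 0.02],
    [0.00, -0.02],
]
print(regression_weights(excess_returns))
print(mvo_weights(excess_returns))
```

Both calls print the same normalized weights, about 0.902 and 0.098. The regression route never estimates μ and Σ separately; the abstract does not say how MMM maps high-dimensional characteristics to weights or imposes the long-only and tracking-error constraints, so those steps are left out here.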
Paper 4: The Fast and the Circuitous: Semantic Progression as a Type of Disclosure Complexity by Jiawen Yan and Nicholas Guest. Abstract: This paper examines linguistic complexity in financial reporting using three measures of semantic progression: speed reflects how fast the narrative moves, volume reflects how much ground it covers, and circuitousness reflects how much it goes in circles. We find that speed and volume (circuitousness) in the 10-K are positively (negatively) associated with reporting timeliness and accuracy, earnings quality, analyst forecast accuracy, and price discovery. Thus, speed and volume seem beneficial and circuitousness seems detrimental, at least in the context of financial reporting. Moreover, the explanatory power of the progression complexity variables is incremental to a host of traditional factors, including alternative measures of complexity such as the fog index and document length.
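The progression measures can be made concrete with a toy calculation. The sketch below assumes each 10-K has already been split into chunks and each chunk embedded as a vector (the embedding step is not shown), and it uses simplified proxies assumed for illustration: speed as the mean distance between consecutive chunks, and circuitousness as total path length divided by the straight-line distance from the first to the last chunk. Volume is omitted, and the paper's exact definitions may differ.

```python
import math

# Simplified, hypothetical proxies for semantic progression; chunk embeddings
# are assumed to be given as equal-length coordinate tuples.

def step_lengths(points):
    """Euclidean distance between each pair of consecutive chunk embeddings."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]


def speed(points):
    """Average distance the narrative moves per step between chunks."""
    steps = step_lengths(points)
    return sum(steps) / len(steps)


def circuitousness(points):
    """Total path length over the direct start-to-end distance (1.0 = no detours)."""
    direct = math.dist(points[0], points[-1])
    return sum(step_lengths(points)) / direct if direct > 0 else math.inf


straight = [(0, 0), (1, 0), (2, 0), (3, 0)]  # steady forward progress
looping = [(0, 0), (1, 0), (0, 0), (1, 0)]   # same step size, but doubles back
print(speed(straight), circuitousness(straight))  # 1.0 1.0
print(speed(looping), circuitousness(looping))    # 1.0 3.0
```

Both toy texts move at the same speed, but the looping one scores 3.0 on circuitousness versus 1.0 for the straight one, which mirrors the distinction the abstract draws between moving forward and going in circles.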
Paper 5: Neural Network Translated into Bag-of-Words Using Attentions by Hitoshi Iwasaki, Ying Chen, Allen H. Huang and Hui Wang. Abstract: We propose methods that use a neural network to extract contextual information from texts and translate it into a word importance weighting system and a lexicon. We first train an interpretable neural network, a hierarchical attention network (HAN), that takes analyst reports as input and explains cumulative abnormal returns. We then construct a word importance weight function from the trained attentions, and compile a lexicon of sentiment words by relating variations in attentions to HAN's outputs through our novel document representation, bag-of-attentions. In empirical studies with earnings call transcripts, we show that our methods do not overfit to the analyst reports and can improve the performance of bag-of-words models on financial texts beyond the original training corpus. Based on this supporting evidence, we argue that the proposed methods effectively materialize the contextual information captured by HAN and bolster bag-of-words models by restoring some of the contextual information that is lost when word order is ignored.
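The step from a trained attention model to a bag-of-words weighting can be illustrated with a toy example. The sketch assumes a trained HAN has already produced a word-level attention value for every token (the network itself is not shown), averages each word's attention across its occurrences as an importance weight, and rescales ordinary word counts by those weights. The function names and the simple averaging rule are hypothetical choices, not the authors' weight function or their bag-of-attentions representation.

```python
from collections import Counter, defaultdict

# Hypothetical sketch: attention values are assumed to come from an
# already-trained attention model; averaging is an illustrative choice.

def word_importance(attended_docs):
    """Mean attention each word receives across all of its occurrences."""
    total = defaultdict(float)
    seen = defaultdict(int)
    for doc in attended_docs:
        for word, attention in doc:
            total[word] += attention
            seen[word] += 1
    return {word: total[word] / seen[word] for word in total}


def weighted_bow(tokens, weights, default=0.0):
    """Word counts scaled by importance weights; unseen words get `default`."""
    counts = Counter(tokens)
    return {word: n * weights.get(word, default) for word, n in counts.items()}


# Toy (word, attention) pairs standing in for model output on two reports.
attended_docs = [
    [("revenue", 0.5), ("grew", 0.3), ("the", 0.2)],
    [("revenue", 0.3), ("fell", 0.6), ("the", 0.1)],
]
weights = word_importance(attended_docs)
print(weights)
print(weighted_bow(["revenue", "revenue", "fell", "profit"], weights))
```

In this toy run "the" ends up with a low weight (about 0.15) and "fell" a high one (0.6), so the rescaled counts emphasize the words the model attended to, while a word it never saw ("profit") contributes nothing.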
Paper 6: Machine Learning-Based Financial Statement Analysis by Amir Amel-Zadeh, Jan-Peter Calliess, Daniel Kaiser and Stephen Roberts. Abstract: This paper explores the application of machine learning methods to financial statement analysis. We compare a range of models in the machine learning repertoire on their ability to predict the sign and magnitude of abnormal stock returns around earnings announcements based on past financial statement data. (Non-)linear models perform relatively better for predictions of moderate (extreme) market reactions. Random forests produce the most accurate forecasts and the highest abnormal returns, which decline over time despite the algorithms learning from more data. Abnormal returns are robust to various risk factors and load in expected ways on size, value and accruals. Analysing the underlying economic drivers of abnormal returns, we find that the models select as the most important predictors the financial variables required to forecast free cash flows and firm characteristics that are known cross-sectional predictors of stock returns.
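Comparing models on how well they predict the sign of abnormal returns, and on the abnormal returns their forecasts earn, comes down to scoring out-of-sample predictions. The sketch below shows two such scores under simple assumptions: sign accuracy, and the mean realized abnormal return of a long-short position that buys stocks with a positive forecast and sells those with a negative one. The forecasts and returns are hypothetical, the model producing the forecasts (a random forest, say) is not shown, and the paper's exact portfolio construction may differ.

```python
# Hypothetical evaluation of abnormal-return forecasts; the forecasting model
# itself is not shown and the numbers are made up for illustration.

def sign_accuracy(predicted, realized):
    """Share of events where forecast and outcome are both positive or both non-positive."""
    hits = sum((p > 0) == (r > 0) for p, r in zip(predicted, realized))
    return hits / len(realized)


def long_short_return(predicted, realized):
    """Mean outcome of predicted winners minus mean outcome of predicted losers.

    Events with a forecast of exactly zero are left out of both legs.
    """
    longs = [r for p, r in zip(predicted, realized) if p > 0]
    shorts = [r for p, r in zip(predicted, realized) if p < 0]
    if not longs or not shorts:
        raise ValueError("need at least one long and one short position")
    return sum(longs) / len(longs) - sum(shorts) / len(shorts)


# Forecast vs. realized abnormal returns around four earnings announcements.
predicted = [0.02, -0.01, 0.03, -0.02]
realized = [0.01, -0.03, -0.01, -0.02]
print(sign_accuracy(predicted, realized))      # 0.75
print(long_short_return(predicted, realized))  # about 0.025
```

Here three of four signs are right and the long-short spread is about 0.025. In a real model comparison, these scores would be computed for each candidate model on the same hold-out announcements.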