
Chapter specific application exercises will help you think about research design in practice or have you explore a relevant resource.

Exercise 1: APA Style

Under the heading “Primary Versus Secondary Sources” above, locate the referenced article “Prevalence and Characteristics of Youth Sexting” taken from the journal Pediatrics. Rewrite the citation so that it conforms to APA style.

Exercise 2: Comparing Primary and Secondary Sources

Locate the original Pediatrics and New York Times articles on sexting cited above under the heading “Primary Versus Secondary Sources.” Carefully compare these two articles and answer the following questions:

  • What information in the original Pediatrics article is missing from the New York Times article? What content, if any, in the New York Times article cannot be found in the Pediatrics article?
  • Assuming the New York Times article is written to help and inform parents, what writing techniques can you identify that are used to interest and motivate such readers and maintain their interest?

Exercise 3: Search Terms and Boolean Operators

Write down all of the search terms you might use to get a comprehensive listing of scholarly papers on social media.

Combine these search terms using Boolean operators to then focus your research on:

  • Social media and youth
  • Social media and romantic relationships
  • Social media and public relations

Note that this is not just an exercise in Boolean operators; it is also an exercise in vocabulary. For example, social media may prove to be too broad a search term. What terms other than social media could you use to narrow your search? Note also that youth, romantic, and public relations all have analogous terms that might give you better or worse search results. For each of these three terms, identify an alternative word or words that you might substitute to narrow or expand your search.

Exercise 4: Writing a Literature Review

Search the Pew Research Center Internet, Science & Tech Project website—www.pewinternet.org—for the topic “elections and campaigns.” You will find a series of reports on U.S. elections for the years 2000 through 2015. Write a brief literature review summarizing how the role of the Internet and social media in U.S. elections has changed since 2000. Cite all sources correctly using APA style. For most years, you will find several different reports, so you will need to be selective in your reading and writing.

This book is intended to cover some advanced modelling techniques applied to equity investment strategies that are built on firm characteristics. The content is threefold. First, we try to simply explain the ideas behind most mainstream machine learning algorithms that are used in equity asset allocation. Second, we mention a wide range of academic references for the readers who wish to push a little further. Finally, we provide hands-on R code samples that show how to apply the concepts and tools on a realistic dataset which we share to encourage reproducibility.

The printed version can be bought on the publisher's website or on Amazon (US), on Amazon (France) and Amazon (UK).

1.1 What this book is not about

This book deals with machine learning (ML) tools and their applications in factor investing. Factor investing is a subfield of a large discipline that encompasses asset allocation, quantitative trading and wealth management. Its premise is that differences in the returns of firms can be explained by the characteristics of these firms. Thus, it departs from traditional analyses which rely on price and volume data only, like classical portfolio theory à la Markowitz (1952), or high frequency trading. For a general and broad treatment of Machine Learning in Finance, we refer to Matthew F. Dixon, Halperin, and Bilokon (2020).

The topics we discuss are related to other themes that will not be covered in the monograph. These themes include:

  • Applications of ML in other financial fields, such as fraud detection or credit scoring. We refer to Ngai et al. (2011) and Baesens, Van Vlasselaer, and Verbeke (2015) for general purpose fraud detection, to Bhattacharyya et al. (2011) for a focus on credit cards and to Ravisankar et al. (2011) and Abbasi et al. (2012) for studies on fraudulent financial reporting. On the topic of credit scoring, Wang et al. (2011) and Brown and Mues (2012) provide overviews of methods and some empirical results. Also, we do not cover ML algorithms for data sampled at higher (daily or intraday) frequencies (microstructure models, limit order book). The chapter from Kearns and Nevmyvaka (2013) and the recent paper by Sirignano and Cont (2019) are good introductions on this topic.
  • Use cases of alternative datasets that show how to leverage textual data from social media, satellite imagery, or credit card logs to predict sales, earning reports, and, ultimately, future returns. The literature on this topic is still emerging (see, e.g., Blank, Davis, and Greene (2019), Jha (2019) and Ke, Kelly, and Xiu (2019)) but will likely blossom in the near future.
  • Technical details of machine learning tools. While we do provide some insights on specificities of some approaches (those we believe are important), the purpose of the book is not to serve as a reference manual on statistical learning. We refer to Hastie, Tibshirani, and Friedman (2009), Cornuejols, Miclet, and Barra (2018) (written in French), James et al. (2013) (coded in R!) and Mohri, Rostamizadeh, and Talwalkar (2018) for a general treatment on the subject. Moreover, Du and Swamy (2013) and Goodfellow et al. (2016) are solid monographs on neural networks in particular, and Sutton and Barto (2018) provide a self-contained and comprehensive tour of reinforcement learning.
  • Finally, the book does not cover methods of natural language processing (NLP) that can be used to evaluate sentiment which can in turn be translated into investment decisions. This topic has nonetheless been trending lately and we refer to Loughran and McDonald (2016), Cong, Liang, and Zhang (2019a), Cong, Liang, and Zhang (2019b) and Gentzkow, Kelly, and Taddy (2019) for recent advances on the matter.

The targeted audience

Who should read this book? This book is intended for two types of audiences. The first consists of postgraduate students who wish to pursue their studies in quantitative finance with a view towards investment and asset management. The second consists of professionals from the money management industry who either seek to pivot towards allocation methods that are based on machine learning or are simply interested in these new tools and want to upgrade their set of competences. To a lesser extent, the book can serve scholars or researchers who need a manual with a broad spectrum of references both on recent asset pricing issues and on machine learning algorithms applied to money management. While the book covers mostly common methods, it also shows how to implement more exotic models, like causal graphs (Chapter 14), Bayesian additive trees (Chapter 9), and hybrid autoencoders (Chapter 7).

The book assumes basic knowledge in algebra (matrix manipulation), analysis (function differentiation, gradients), optimization (first and second order conditions, dual forms), and statistics (distributions, moments, tests, simple estimation methods like maximum likelihood). A minimal financial culture is also required: simple notions like stocks and accounting quantities (e.g., book value) will not be defined in this book. Lastly, all examples and illustrations are coded in R. A minimal culture of the language is sufficient to understand the code snippets, which rely heavily on the most common functions of the tidyverse (Wickham et al. (2019), www.tidyverse.org) and on piping (Bache and Wickham (2014), Mailund (2019)).

How this book is structured

The book is divided into four parts.


Part I gathers preparatory material and starts with notations and data presentation (Chapter 1), followed by introductory remarks (Chapter 2). Chapter 3 outlines the economic foundations (theoretical and empirical) of factor investing and briefly sums up the dedicated recent literature. Chapter 4 deals with data preparation. It rapidly recalls the basic tips and warns about some major issues.


Part II of the book is dedicated to predictive algorithms in supervised learning. Those are the most common tools that are used to forecast financial quantities (returns, volatilities, Sharpe ratios, etc.). They range from penalized regressions (Chapter 5), to tree methods (Chapter 6), encompassing neural networks (Chapter 7), support vector machines (Chapter 8) and Bayesian approaches (Chapter 9).

The next portion of the book bridges the gap between these tools and their applications in finance. Chapter 10 details how to assess and improve the ML engines defined beforehand. Chapter 11 explains how models can be combined and why that may often not be a good idea. Finally, one of the most important chapters (Chapter 12) reviews the critical steps of portfolio backtesting and mentions the mistakes that are frequently encountered at this stage.

The end of the book covers a range of advanced topics connected to machine learning more specifically. The first one is interpretability. ML models are often considered to be black boxes and this raises trust issues: how and why should one trust ML-based predictions? Chapter 13 is intended to present methods that help understand what is happening under the hood. Chapter 14 is focused on causality, which is both a much more powerful concept than correlation and also at the heart of many recent discussions in Artificial Intelligence (AI). Most ML tools rely on correlation-like patterns and it is important to underline the benefits of techniques related to causality. Finally, Chapters 15 and 16 are dedicated to non-supervised methods. The latter can be useful, but their financial applications should be wisely and cautiously motivated.

Companion website

This book is entirely available at http://www.mlfactor.com. It is important that not only the content of the book be accessible, but also the data and code that are used throughout the chapters. They can be found at https://github.com/shokru/mlfactor.github.io/tree/master/material. The online version of the book will be updated beyond the publication of the printed version.

Why R?

The supremacy of Python as the dominant ML programming language is a widespread belief. This is because almost all applications of deep learning (which is as of 2020 one of the most fashionable branches of ML) are coded in Python via TensorFlow or PyTorch. The fact is that R has a lot to offer as well. First of all, let us not forget that one of the most influential textbooks in ML (Hastie, Tibshirani, and Friedman (2009)) is written by statisticians who code in R. Moreover, many statistics-oriented algorithms (e.g., BARTs in Section 9.5) are primarily coded in R and not always in Python. The R offering in Bayesian packages in general (https://cran.r-project.org/web/views/Bayesian.html) and in Bayesian learning in particular is probably unmatched.

There are currently several ML frameworks available in R.

  • caret: https://topepo.github.io/caret/index.html, a compilation of more than 200 ML models (originally developed by Max Kuhn);
  • tidymodels: https://github.com/tidymodels, a recent collection of packages for ML workflow (another package developed - more recently - by Max Kuhn at RStudio, which is a token of high quality material!);
  • rtemis: https://rtemis.netlify.com, a general purpose package for ML and visualization;
  • mlr3: https://mlr3.mlr-org.com/index.html, also a simple framework for ML models;
  • h2o: https://github.com/h2oai/h2o-3/tree/master/h2o-r, a large set of tools provided by h2o (coded in Java);
  • Open ML: https://github.com/openml/openml-r, the R version of the OpenML (www.openml.org) community.

Moreover, via the reticulate package, it is possible (but not always easy) to benefit from Python tools as well. The most prominent example is the adaptation of the tensorflow and keras libraries to R. Thus, some very advanced Python material is readily available to R users. This is also true for other resources, like Stanford’s CoreNLP library (in Java) which was adapted to R in the package coreNLP (which we will not use in this book).

Coding instructions

One of the purposes of the book is to propose a large-scale tutorial of ML applications in financial predictions and portfolio selection. Thus, one keyword is REPRODUCIBILITY! In order to duplicate our results (up to possible randomness in some learning algorithms), you will need running versions of R and RStudio on your computer. The best books to learn R are also often freely available online. A short list can be found at https://rstudio.com/resources/books/. The monograph R for Data Science is probably the most crucial.


In terms of coding requirements, we rely heavily on the tidyverse, which is a collection of packages (or libraries). The three packages we use most are dplyr, which implements simple data manipulations (filter, select, arrange), tidyr, which formats data in a tidy fashion, and ggplot2, for graphical outputs.
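As a rough illustration of what these three packages do, the sketch below works on a small made-up data frame. The column names stock_id, date, ret and mkt_cap are hypothetical and are not taken from the book's dataset.

```r
library(dplyr)    # data manipulation: filter(), select(), arrange()
library(tidyr)    # reshaping data into a tidy format
library(ggplot2)  # graphical outputs

# Hypothetical toy data: two stocks observed on two dates
toy <- tibble::tibble(
  stock_id = c(1, 1, 2, 2),
  date     = as.Date(c("2020-01-31", "2020-02-29", "2020-01-31", "2020-02-29")),
  ret      = c(0.02, -0.01, 0.03, 0.01),
  mkt_cap  = c(100, 98, 250, 255)
)

# dplyr: keep positive returns, keep three columns, sort by decreasing return
positive <- toy %>%
  filter(ret > 0) %>%
  select(stock_id, date, ret) %>%
  arrange(desc(ret))

# tidyr: stack the two numeric columns into (variable, value) pairs
long <- toy %>%
  pivot_longer(cols = c(ret, mkt_cap), names_to = "variable", values_to = "value")

# ggplot2: one line per stock (the plot object is built but not printed here)
p <- ggplot(toy, aes(x = date, y = ret, color = factor(stock_id))) +
  geom_line()
```

Typing p in the console would render the plot; positive and long are ordinary tibbles that can be inspected directly.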

A list of the packages we use can be found in Table 0.1 below. Packages with a star (*) need to be installed via Bioconductor. Packages with a plus (+) need to be installed manually.


TABLE 0.1: List of all packages used in the book.
Package                 Purpose                                       Chapter(s)
BART                    Bayesian additive trees                       10
broom                   Tidy regression output                        5
CAM(+)                  Causal Additive Models                        15
caTools                 AUC curves                                    11
CausalImpact            Causal inference with structural time series  15
cowplot                 Stacking plots                                4 & 13
breakDown               Breakdown interpretability                    14
dummies                 One-hot encoding                              8
e1071                   Support Vector Machines                       9
factoextra              PCA visualization                             16
fastAdaboost            Boosted trees                                 7
forecast                Autocorrelation function                      4
FNN                     Nearest Neighbors detection                   16
ggpubr                  Combining plots                               11
glmnet                  Penalized regressions                         6
iml                     Interpretability tools                        14
keras                   Neural networks                               8
lime                    Interpretability                              14
lmtest                  Granger causality                             15
lubridate               Handling dates                                All (or many)
naivebayes              Naive Bayes classifier                        10
pcalg                   Causal graphs                                 15
quadprog                Quadratic programming                         12
quantmod                Data extraction                               4, 12
randomForest            Random forests                                7
rBayesianOptimization   Bayesian hyperparameter tuning                11
ReinforcementLearning   Reinforcement Learning                        17
Rgraphviz(*)            Causal graphs                                 15
rpart and rpart.plot    Simple decision trees                         7
spBayes                 Bayesian linear regression                    10
tidyverse               Environment for data science, data wrangling  All
xgboost                 Boosted trees                                 7
xtable                  Table formatting                              4

Of all of these packages (or collections thereof), the tidyverse and lubridate are compulsory in almost all sections of the book. To install a new package in R, just type

install.packages("name_of_the_package")

in the console. Sometimes, because of function name conflicts (especially with the select() function), we use the syntax package::function() to make sure the function call is from the right source. The exact version of the packages used to compile the book is listed in the “renv.lock” file available on the book’s GitHub web page https://github.com/shokru/mlfactor.github.io. One minor comment is the following: while the functions gather() and spread() from the tidyr package have been superseded by pivot_longer() and pivot_wider(), we still use them because of their much more compact syntax.
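The two points above can be sketched on a toy example. The table below is hypothetical (it is not the book's data); it only serves to show the package::function() syntax and to compare the superseded reshaping functions with their current equivalents.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per stock, one column per year
wide <- tibble::tibble(
  stock_id = c(1, 2),
  `2019`   = c(0.05, 0.10),
  `2020`   = c(-0.02, 0.04)
)

# Explicit namespace: guarantees that dplyr's select() is called,
# even if another loaded package (e.g., MASS) also defines select()
returns_only <- dplyr::select(wide, -stock_id)

# Superseded but compact syntax
long_old <- tidyr::gather(wide, key = "year", value = "ret", -stock_id)

# Current equivalent
long_new <- tidyr::pivot_longer(wide, cols = -stock_id,
                                names_to = "year", values_to = "ret")

# Back to the wide format, both ways
wide_old <- tidyr::spread(long_old, key = year, value = ret)
wide_new <- tidyr::pivot_wider(long_new, names_from = year, values_from = ret)
```

Both routes produce the same observations. Only the row order of the long tables differs: gather() stacks the data column by column, whereas pivot_longer() proceeds row by row.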


As much as we could, we created short code chunks and commented each line whenever we felt it was useful. Comments are displayed at the end of a row and preceded with a single hashtag #.

The book is constructed as a very big notebook, thus results are often presented below code chunks. They can be graphs or tables. Sometimes, they are simple numbers and are preceded with two hashtags ##. The example below illustrates this formatting.
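A minimal sketch of this convention follows. The expression is arbitrary and chosen only for illustration: the trailing comment starts with a single #, and the printed result appears on the next row after ##.

```r
1 + 2   # A comment: the sum of two numbers
## [1] 3
```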


The book can be viewed as a very big tutorial. Therefore, most of the chunks depend on previously defined variables. When replicating parts of the code (via online code), please make sure that the environment includes all relevant variables. One best practice is to always start by running all code chunks from Chapter 1. For the exercises, we often resort to variables created in the corresponding chapters.
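One possible way to guard against a missing variable is sketched below. The helper check_env() and the variable name data_example are hypothetical and are not part of the book's code; the idea is simply to stop early, with an explicit message, when a chunk is run before the objects it depends on have been created.

```r
# Hypothetical helper: stop early if variables from earlier chunks are missing
check_env <- function(needed, envir = globalenv()) {
  # TRUE for each name that is defined directly in the given environment
  found <- vapply(needed, exists, logical(1), envir = envir, inherits = FALSE)
  missing_vars <- needed[!found]
  if (length(missing_vars) > 0) {
    stop("Run the earlier chunks first; missing: ",
         paste(missing_vars, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage on a separate environment, so that the sketch is self-contained
demo_env <- new.env()
assign("data_example", 1:3, envir = demo_env)  # stands in for an earlier variable
check_env("data_example", envir = demo_env)    # passes silently
```

In an interactive session, calling check_env() with the default envir argument inspects the global environment, which is where the variables created by previous chunks normally live.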

Acknowledgments


The core of the book was prepared for a series of lectures given by one of the authors to students of master’s degrees in finance at EMLYON Business School and at the Imperial College Business School in the Spring of 2019. We are grateful to those students who asked fruitful questions and thereby helped improve the content of the book.


We are grateful to Bertrand Tavin, Gautier Marti and @fayolle for their thorough screening of the book. We also thank Eric André, Aurélie Brossard, Alban Cousin, Frédérique Girod, Philippe Huber, Jean-Michel Maeso, Javier Nogales and for friendly reviews; Christophe Dervieux for his help with bookdown; Mislav Sagovac and Vu Tran for their early feedback; John Kimmel for making this happen and Jonathan Regenstein for his availability, no matter the topic. Lastly, we are grateful for the anonymous reviews collected by John.

Future developments

Machine learning and factor investing are two immense research domains and the overlap between the two is also quite substantial and developing at a fast pace. The content of this book will always constitute a solid background, but it is naturally destined to obsolescence. Moreover, by construction, some subtopics and many references will have escaped our scrutiny. Our intent is to progressively improve the content of the book and update it with the latest ongoing research. We will be grateful for any comment that helps correct or update the monograph. Thank you for sending your feedback directly (via pull requests) on the book’s website which is hosted at https://github.com/shokru/mlfactor.github.io.