Stock Market Fama-French Three Factor Model

Rate of return is a very important statistic when it comes to judging the performance of a stock portfolio. If we want to calculate the present value of an asset, we need the rate of return at which to discount its future cash flows. In this post I am going to discuss how to build the factor models that are used to estimate that rate of return. We will build the Fama-French Three Factor Model using R. R is a powerful open source statistical language that is very popular with the quant community. Did you read the post on how to use Neural Networks in stock trading?

[Chart: S&P 500 index]

There are two market models. One is the Capital Asset Pricing Model, also known as CAPM, and the other is the Arbitrage Pricing Model. The Capital Asset Pricing Model is an equilibrium model that holds in the long run. The Arbitrage Pricing Model is based on Arbitrage Pricing Theory, which stipulates that prices respond immediately to any arbitrage opportunity in the market: massive cash flows in the direction of the arbitrage drive the price differential to zero and take the arbitrage opportunity away with it. Arbitrage Pricing Theory says that asset returns depend on macroeconomic factors plus firm-specific factors, so we try to find the macroeconomic and firm-specific factors that best predict the asset return over the near-term future. In short, this model assumes that the asset return is determined by a linear relationship between the different random factors. Our job is to find that linear relationship using stock market data. Read the post on how to download options data from Yahoo Finance.
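To make the linear factor idea concrete, here is a tiny simulated sketch (all numbers and names are invented for illustration): one asset's return is generated from three hypothetical factors, and an ordinary regression recovers the loadings.

# Toy illustration of the APT linear factor structure (all values are invented)
set.seed(1)
n_obs   <- 60                                  # e.g. 60 months of observations
factors <- matrix(rnorm(n_obs * 3), ncol = 3)  # three hypothetical common factors
betas   <- c(0.9, 0.4, -0.2)                   # the asset's factor loadings
asset_ret <- drop(0.001 + factors %*% betas) + rnorm(n_obs, sd = 0.02)

# Estimating the linear relationship from the data is just a regression
coef(lm(asset_ret ~ factors))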

Arbitrage Pricing Theory further assumes that there are a large number of investors in the market who are constantly trying to optimize their stock portfolios. These investors are well informed and have no market power, meaning they cannot control the market. All of them are rational, and if they spot an arbitrage opportunity they immediately pounce on it, buying large quantities of the underpriced asset and selling large quantities of the overpriced asset. This results in massive cash flows in the direction of the arbitrage opportunity, so within a very short period of time the price differential vanishes. There is a risk-free asset in the market and a massive number of risky assets that are continuously traded. The risk-free asset is most of the time a US Treasury bill. Watch this 50 minute documentary on Quants.

When we build a factor model, we have to identify the factors that affect the asset return. These factors are usually macroeconomic variables such as the stock market return, interest rates, inflation, the business cycle, oil prices, and so on. In 1993, Fama and French proposed their three factor model, in which they used firm-level indicators as factors instead of macroeconomic ones, as they believed these could better predict the asset return. The two additional factors they considered, alongside the market excess return, were firm size and the firm's book-to-market ratio. As said above, in this post we will build this Fama-French Three Factor stock market model.
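In regression form, the model explains a stock's excess return with the market excess return plus the SMB (small minus big) and HML (high minus low) factors. Here is a minimal sketch with lm(), where the data frame ff and its column names are hypothetical placeholders:

# Fama-French three factor regression (hypothetical data frame and column names):
#   ex_ret = stock return minus the risk free rate
#   mkt_rf = market return minus the risk free rate
#   smb    = small-minus-big size factor
#   hml    = high-minus-low book-to-market factor
ff_model <- lm(ex_ret ~ mkt_rf + smb + hml, data = ff)
summary(ff_model)  # factor betas plus the intercept (alpha)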

> library("quantmod")
Loading required package: xts
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Loading required package: TTR
Version 0.4-0 included new data defaults. See ?getSymbols.
Learn from a quantmod author: https://www.datacamp.com/courses/importing-and-managing-financial-data-in-r
> stocks <- stockSymbols()
Fetching AMEX symbols...
Fetching NASDAQ symbols...
Fetching NYSE symbols...
> str(stocks)
'data.frame':	6707 obs. of  8 variables:
 $ Symbol   : chr  "AAMC" "AAU" "ABE" "ACU" ...
 $ Name     : chr  "Altisource Asset Management Corp" "Almaden Minerals, Ltd." "Aberdeen Emerging Markets Smaller Company Opportunities Fund I" "Acme United Corporation." ...
 $ LastSale : num  92 1.18 14.3 27.67 12.8 ...
 $ MarketCap: chr  "$143.01M" "$118.17M" "$135.63M" "$92.82M" ...
 $ IPOyear  : int  NA 2015 NA 1988 NA NA NA NA 2014 NA ...
 $ Sector   : chr  "Finance" "Basic Industries" NA "Capital Goods" ...
 $ Industry : chr  "Real Estate" "Precious Metals" NA "Industrial Machinery/Components" ...
 $ Exchange : chr  "AMEX" "AMEX" "AMEX" "AMEX" ...

We have a list of around 6700 stocks that are traded on AMEX, NASDAQ and NYSE. We need two more variables: the market capitalization of each firm and its book-to-market ratio.

> stocks[1:5, c(1, 3:4)]
  Symbol LastSale MarketCap
1   AAMC    92.00  $143.01M
2    AAU     1.18  $118.17M
3    ABE    14.30  $135.63M
4    ACU    27.67   $92.82M
5    ACY    12.80   $18.13M
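The MarketCap column comes back as text like "$143.01M" or "$10.79B". Here is a hedged sketch of converting it to a numeric value, assuming the only suffixes are M and B as in the output above:

# Convert market cap strings such as "$143.01M" or "$10.79B" to numeric dollars.
# Assumes only M (millions) and B (billions) suffixes, as seen in the data above.
cap_to_numeric <- function(x) {
  mult <- ifelse(grepl("B", x), 1e9, ifelse(grepl("M", x), 1e6, 1))
  as.numeric(gsub("[$MB]", "", x)) * mult
}
stocks$MarketCapNum <- cap_to_numeric(stocks$MarketCap)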

We need to find book value per share. We have more than 6700 stocks in our data.

> tickers <- stocks$Symbol
> stocks$BookValue <- getQuote(tickers, what = yahooQF(c("Book Value")))
downloading set: 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , ...done

We have the book value.

> stocks1 <- cbind(stocks[ , c(1, 3:4)], stocks$BookValue[,2])
> tail(stocks1)
     Symbol LastSale MarketCap stocks$BookValue[, 2]
6697   ZPIN    18.90    $1.05B                  4.06
6698    ZTO    14.98   $10.79B                  4.15
6699    ZTR    13.12  $320.05M                  0.00
6700    ZTS    62.52   $30.68B                  3.28
6701     ZX     1.70   $87.76M                  7.31
6702   ZYME     7.33   $185.7M                 -0.68
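The book-to-market ratio is now straightforward: it is the book value per share divided by the last trade price. A quick sketch using the stocks1 columns shown above (stocks with zero or negative book value, such as ZYME, would normally be filtered out):

# Book-to-market ratio = book value per share / last trade price.
names(stocks1)[4] <- "BookValuePerShare"   # rename the merged book value column
stocks1$BookToMarket <- stocks1$BookValuePerShare / stocks1$LastSale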

You can see we have the book value data now. We also need a risk-free rate time series, which we download below for the last five years.

> library(Quandl)
> LIBOR <- Quandl('FED/RILSPDEPM01_N_B',
+                 start_date = '2012-06-01', end_date = '2017-06-01')
> tail(LIBOR)
           Date Value
1113 2012-06-08   0.3
1114 2012-06-07   0.3
1115 2012-06-06   0.3
1116 2012-06-05   0.3
1117 2012-06-04   0.3
1118 2012-06-01   0.3
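If we later need a per-month risk-free rate for monthly return regressions, a rough conversion looks like this; it assumes the Value column is an annualized percentage, which is the usual LIBOR quote convention:

# Rough conversion of the annualized LIBOR percentage to a simple monthly rate
# (assumes Value is an annualized percent, the usual quote convention).
LIBOR$MonthlyRf <- (LIBOR$Value / 100) / 12
head(LIBOR)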

You can see above that we have the risk-free US LIBOR rate, downloaded from the Federal Reserve database using the Quandl package. A few months ago Yahoo Finance changed its API, so the quantmod package can no longer download stock market data with getSymbols. But there is a workaround with the following code, which pulls historical prices from Google Finance instead.

> google <- function(sym, current = TRUE, sy = 2005, sm = 1, sd = 1, ey, em, ed)
+ {
+   
+   if(current){
+     system_time <- as.character(Sys.time())
+     ey <- as.numeric(substr(system_time, start = 1, stop = 4))
+     em <- as.numeric(substr(system_time, start = 6, stop = 7))
+     ed <- as.numeric(substr(system_time, start = 9, stop = 10))
+   }
+   
+   require(data.table)
+   
+   google_out = tryCatch(
+     suppressWarnings(
+       fread(paste0("http://www.google.com/finance/historical",
+                    "?q=", sym,
+                    "&startdate=", paste(sm, sd, sy, sep = "+"),
+                    "&enddate=", paste(em, ed, ey, sep = "+"),
+                    "&output=csv"), sep = ",")),
+     error = function(e) NULL
+   )
+   
+   if(!is.null(google_out)){
+     names(google_out)[1] = "Date"
+   }
+   
+   return(google_out)
+ }
> google_data = google('GOOGL')
Loading required package: data.table
data.table 1.10.4
  The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
  Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
  Release notes, videos and slides: http://r-datatable.com

Attaching package: ‘data.table’

The following objects are masked from ‘package:xts’:

    first, last

trying URL 'http://www.google.com/finance/historical?q=GOOGL&startdate=1+1+2005&enddate=8+1+2017&output=csv'
Content type 'application/vnd.ms-excel' length unknown
downloaded 141 KB

> tail(google_data)
        Date   Open   High   Low  Close   Volume
1: 10-Jan-05  97.35  99.15 96.01  97.63  7554795
2:  7-Jan-05  95.42  97.22 94.48  97.02  9666175
3:  6-Jan-05  97.72  98.05 93.95  94.37 10389803
4:  5-Jan-05  96.82  98.55 96.21  96.85  8239545
5:  4-Jan-05 100.77 101.57 96.84  97.35 13762396
6:  3-Jan-05  98.80 101.92 97.83 101.46 15860692

Yahoo Finance has made life difficult for quants like us.
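For the factor regressions we will need returns rather than raw prices. Here is a hedged sketch of turning the google_data table above into a monthly log-return series; it assumes the Date and Close columns shown above, and the %d-%b-%y date format relies on an English locale:

# Turn the Google Finance download into a monthly log-return series.
# xts/zoo are already loaded as dependencies of quantmod.
price_xts <- xts(google_data$Close,
                 order.by = as.Date(google_data$Date, format = "%d-%b-%y"))
monthly_close <- to.monthly(price_xts, OHLC = FALSE)  # last close of each month
monthly_ret   <- diff(log(monthly_close))             # monthly log returns
head(monthly_ret)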

Using Principal Component Analysis For Estimating Arbitrage Pricing Theory

As said above, we will be using Arbitrage Pricing Theory to build our factor model, and we will use Principal Component Analysis (PCA) to estimate it. It would be difficult to run a principal component analysis on all 6700 stocks in our list, so this is what we will do: we will randomly select about 10% of the stocks from the list and build the model on that sample.
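Before we get to the sampling and the price downloads below, here is what the PCA step itself will look like once a return matrix has been assembled. returns_mat is a hypothetical placeholder with dates in rows, the sampled stocks in columns, and no missing values:

# PCA on a hypothetical returns matrix returns_mat (rows = dates, columns = stocks).
pca <- prcomp(returns_mat, center = TRUE, scale. = TRUE)
summary(pca)                   # proportion of variance explained by each component
stat_factors <- pca$x[, 1:3]   # first few principal components as statistical factors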

> tickers1 <- tickers[runif(length(tickers)) < 0.1]
> tickers1
  [1] "ABE"      "ACU"      "ATNM"     "AUMN"     "AWX"      "BCV"      "BDR"     
  [8] "CH"       "CVM"      "EMI"      "ENRJ"     "FCO"      "GGO-PA"   "GLOW"    
 [15] "GST"      "HLTH"     "INS"      "LLEX"     "LTS"      "LTS-PA"   "MICR"    
 [22] "MTNB"     "NJV"      "PW"       "SGA"      "TEUM"     "TXMD"     "VGZ"     
 [29] "VMM"      "ACOR"     "ACTA"     "ADP"      "ADVM"     "ADXS"     "AFSI"    
 [36] "AGII"     "AHGP"     "AHPAW"    "AIMT"     "AINV"     "AKAO"     "AKTX"    
 [43] "ALBO"     "ALGN"     "ALOG"     "AMCN"     "AMRB"     "AMRS"     "ANIK"    
 [50] "ANY"      "APOP"     "ARGX"     "ARWR"     "ASCMA"    "ATAX"     "ATNI"    
 [57] "ATRI"     "AVXS"     "AXDX"     "AXTI"     "BATRK"    "BBSI"     "BDGE"    
 [64] "BIOP"     "BIOS"     "BJRI"     "BKCC"     "BKSC"     "BLDP"     "BLPH"    
 [71] "BNTC"     "BNTCW"    "BPFHP"    "BPFHW"    "BPRN"     "BRKS"     "BSQR"    
 [78] "BUR"      "BVSN"     "CAAS"     "CAC"      "CACC"     "CALL"     "CASM"    
 [85] "CATM"     "CBAY"     "CBFV"     "CCNE"     "CDOR"     "CENX"     "CEZ"     
 [92] "CFO"      "CHKE"     "CHUBA"    "CIDM"     "CIVBP"    "CLBS"     "CLIRW"   
 [99] "CLMT"     "CLVS"     "CME"      "CNTF"     "CODA"     "CPSS"     "CPTAG"   
[106] "CRDS"     "CRESY"    "CSA"      "CVCY"     "CYAN"     "CYCC"     "CZFC"    
[113] "CZR"      "DBVT"     "DCIX"     "DCTH"     "DEST"     "DMLP"     "DRRX"    
[120] "DRWI"     "DXCM"     "DXYN"     "EBIX"     "EGOV"     "EHTH"     "EIGI"    
[127] "ELEC"     "ENG"      "ENTG"     "ESBK"     "ESEA"     "ESGE"     "ESPR"    
[134] "EUFN"     "EVK"      "EXPO"     "EYES"     "EYESW"    "FAB"      "FATE"    
[141] "FBNC"     "FBSS"     "FEM"      "FEYE"     "FFWM"     "FHCO"     "FIVN"    
[148] "FLAG"     "FLEX"     "FNGN"     "FNJN"     "FOMX"     "FOXF"     "FSBW"    
[155] "FSFR"     "FTCS"     "FTHI"     "FTSL"     "FTXL"     "FUEL"     "FWP"     
[162] "GAIA"     "GALE"     "GCBC"     "GEVO"     "GFNCP"    "GGAL"     "GIGA"    
[169] "GIII"     "GLAD"     "GLBR"     "GLBZ"     "GNCMA"    "GPIC"     "GRFS"    
[176] "GULF"     "GWPH"     "HAFC"     "HBANN"    "HBIO"     "HBNC"     "HCOM"    
[183] "HIBB"     "HIFS"     "HOLX"     "HOTR"     "HRTX"     "HSGX"     "HUNTW"   
[190] "HYGS"     "IBB"      "IBCP"     "IBKCO"    "IDCC"     "IDRA"     "III"     
[197] "IMGN"     "INVA"     "INVE"     "IONS"     "IRMD"     "ISNS"     "ISRL"    
[204] "IXYS"     "JAGX"     "JSMD"     "JTPY"     "KALA"     "KALV"     "KBWR"    
[211] "KEYW"     "KLIC"     "KTWO"     "LAUR"     "LBRDK"    "LECO"     "LIVN"    
[218] "LMB"      "LMRKO"    "LOAN"     "LONE"     "LPNT"     "LPSN"     "LSCC"    
[225] "LSXMA"    "LTRPA"    "LVHD"     "MASI"     "MAT"      "MBUU"     "MBVX"    
[232] "MCRI"     "MDCO"     "MDSO"     "METC"     "MFINL"    "MFSF"     "MGEN"    
[239] "MGYR"     "MIRN"     "MNTA"     "MRDNW"    "MRSN"     "MRUS"     "MRVC"    
[246] "MSBF"     "MTBCP"    "MTEX"     "MXWL"     "MYND"     "NBIX"     "NCTY"    
[253] "NDSN"     "NRCIB"    "NSYS"     "NTGR"     "NUVA"     "NVEC"     "OACQU"   
[260] "OACQW"    "OASM"     "OFIX"     "OFLX"     "OHAI"     "OKSB"     "ONTXW"   
[267] "ONVO"     "OPHT"     "ORBC"     "ORBK"     "OSTK"     "OXLCO"    "PAHC"    
[274] "PATI"     "PBCTP"    "PDCE"     "PERF"     "PFBC"     "PFI"      "PFIS"    
[281] "PHIIK"    "PICO"     "PIH"      "PKW"      "PNBK"     "PPC"      "PRGX"    
[288] "PRMW"     "PRQR"     "PSCH"     "PTGX"     "QURE"     "RBCN"     "RCKY"    
[295] "RDWR"     "REPH"     "RGLS"     "RNDB"     "RNET"     "RNLC"     "RNMC"    
[302] "ROSEU"    "RTTR"     "RUSHB"    "RXIIW"    "SABR"     "SAVE"     "SBCF"    
[309] "SBNY"     "SBRAP"    "SCAC"     "SCMP"     "SEIC"     "SELF"     "SFLY"    
[316] "SGBK"     "SGH"      "SGMA"     "SGQI"     "SHLDW"    "SHOR"     "SIEN"    
[323] "SILC"     "SKIS"     "SMED"     "SMIT"     "SND"      "SNHNI"    "SNI"     
[330] "SNSS"     "SOHOM"    "SONC"     "SONS"     "SPTN"     "SRAX"     "SRCE"    
[337] "SRRA"     "STAF"     "STLR"     "STRT"     "SUMR"     "SUNS"     "SYBT"    
[344] "TEAM"     "THRM"     "TICC"     "TLT"      "TRCB"     "TREE"     "TROW"    
[351] "TSBK"     "TUSK"     "TUTI"     "TVIA"     "UCBI"     "UDBI"     "UG"      
[358] "URRE"     "VCSH"     "VEACW"    "VIDI"     "VRNT"     "VSAR"     "VSMV"    
[365] "VTHR"     "VTL"      "VVPR"     "VWOB"     "WAFDW"    "WLDN"     "WLTW"    
[372] "WVVI"     "WYIGW"    "ZAGG"     "ZEUS"     "ZNGA"     "ZUMZ"     "ABR-PC"  
[379] "ABX"      "ACP"      "AEE"      "AEK"      "AF-PC"    "AGN"      "AGX"     
[386] "AHT-PF"   "AIF"      "AJXA"     "ALL-PE"   "ALV"      "AMID"     "AMOV"    
[393] "ANTX"     "AP"       "ARD"      "ARES"     "ARI"      "ARR-PA"   "ASG"     
[400] "AT"       "AVD"      "AYI"      "AYR"      "AZN"      "BA"       "BAC-PY"  
[407] "BAC.WS.B" "BBX"      "BDC"      "BEL"      "BHE"      "BITA"     "BKH"     
[414] "BML-PH"   "BMO"      "BMY"      "BNJ"      "BPK"      "BSD"      "BT"      
[421] "BTE"      "BVN"      "BWA"      "BXP-PB"   "CAJ"      "CHU"      "CIM"     
[428] "CIM-PA"   "CIO-PA"   "CIT"      "CLB"      "CLF"      "CLW"      "CNNX"    
[435] "COF-PP"   "COG"      "COO"      "CPA"      "CPG"      "CPS"      "CRD.A"   
[442] "CRM"      "CSLT"     "CTX"      "CUBE"     "CUZ"      "CXE"      "DAN"     
[449] "DCM"      "DD-PB"    "DDR-PK"   "DFS-PB"   "DG"       "DHG"      "DNOW"    
[456] "DST"      "DSX-PB"   "DTJ"      "EDD"      "EE"       "EGL"      "EHIC"    
[463] "ELC"      "ELF"      "ELP"      "EMF"      "ENS"      "EOS"      "EP-PC"   
[470] "EPR-PC"   "EPR-PF"   "EQCO"     "ESRT"     "ETG"      "ETX"      "EVR"     
[477] "EXD"      "EXG"      "FAC"      "FCN"      "FET"      "FFA"      "FGL"     
[484] "FICO"     "FLT"      "FRT"      "FTK"      "FTS"      "G"        "GCP"     
[491] "GEB"      "GEH"      "GJS"      "GLOB"     "GM"       "GNRC"     "HES"     
[498] "HLT"      "HLX"      "HNP"      "HRB"      "HRG"      "HRS"      "HT-PC"   
[505] "HTF"      "HVT.A"    "IBM"      "ICD"      "IFF"      "IFN"      "INN-PD"  
[512] "INSI"     "IQI"      "IX"       "JBK"      "JCI"      "JCP"      "JPS"     
[519] "JPT"      "KGC"      "KIM-PJ"   "KMPR"     "KORS"     "KSM"      "KSS"     
[526] "KT"       "KYO"      "LCII"     "LFGR"     "LITB"     "LLY"      "LTM"     
[533] "MAN"      "MCC"      "MCD"      "MCR"      "MD"       "MER-PK"   "MH-PD"   
[540] "MLM"      "MNI"      "MNK"      "MRC"      "MRIN"     "MSD"      "MSI"     
[547] "MT"       "MTB.WS"   "MXF"      "MXL"      "MYCC"     "NAZ"      "NC"      
[554] "NEE-PI"   "NEM"      "NGL-PB"   "NMFC"     "NNN"      "NOAH"     "NS"      
[561] "NW-PC"    "NYCB-PA"  "NYLD.A"   "NYRT"     "OA"       "OAKS-PA"  "OBE"     
[568] "OFC"      "OFG-PA"   "OI"       "OII"      "OIS"      "ORA"      "OUT"     
[575] "PBFX"     "PBT"      "PEB"      "PFL"      "PGZ"      "PHX"      "PIM"     
[582] "PKO"      "PN"       "PPG"      "PRGO"     "PRO"      "PSA-PT"   "PSB-PU"  
[589] "PSB-PW"   "PTY"      "PYN"      "QTS"      "QUOT"     "RBS-PL"   "RF-PA"   
[596] "RFP"      "RLI"      "RNR-PE"   "ROP"      "RSPP"     "SAB"      "SAM"     
[603] "SCE-PG"   "SCHW"     "SEM"      "SLB"      "SLD"      "SNN"      "SNR"     
[610] "SNV-PC"   "SPLP"     "SRC"      "SRE"      "SRG"      "SRT"      "SRV"     
[617] "SSTK"     "SSW-PG"   "STAG"     "STAR-PG"  "STK"      "STNG"     "STOR"    
[624] "SXI"      "T"        "TCO-PK"   "TDE"      "TDG"      "TDJ"      "TDOC"    
[631] "TGP-PA"   "TK"       "TMST"     "TNP-PC"   "TOO-PA"   "TPGE"     "TPRE"    
[638] "TRGP"     "TRNO"     "TY-P"     "UFS"      "UPS"      "URI"      "VC"      
[645] "VHI"      "VLY.WS"   "VNTV"     "VPV"      "VTA"      "VVC"      "W"       
[652] "WCG"      "WFC-PL"   "WGP"      "WPC"      "WR"       "WRB"      "WSM"     
[659] "WSO.B"    "WTTR"     "WU"       "XPO"      "XRM"      "YELP"     "ZB-PA"   
[666] "ZBH"      "ZEN"

Above is the list of stocks that we have chosen randomly. Below is the code to download the closing prices of these stocks from Google Finance.

# Load list of symbols (Updated May 2017)
SYM <- tickers1


# Hold stock data and vector of invalid requests
DATA <- list()
INVALID <- c()

# Attempt to fetch each symbol
for(sym in SYM){
  google_out <- google(sym)
  
  if(!is.null(google_out)) {
    DATA[[sym]] <- google_out
  } else {
    INVALID <- c(INVALID, sym)
  }
}

# Overwrite with only valid symbols
SYM <- names(DATA)

# Remove iteration variables
rm(google_out, sym)

cat("Successfully download", length(DATA), "symbols.")
cat(length(INVALID), "invalid symbols requested.\n", paste(INVALID, collapse = "\n\t"))
cat("We now have a list of data frames of each symbol.")
cat("e.g. access MMM price history with DATA[['MMM']]")

In the past, downloading historical stock market data was very simple and easy with the Yahoo Finance API. But the Yahoo Finance team suddenly changed their API and it no longer works. Perhaps in a few months the quantmod development team will come up with a solution. Watch this documentary on Money & Speed.

Data pre-processing is the most difficult part of building the factor model. As said above, Yahoo Finance has shut off its old API, blocking the download of stock data; Yahoo wants to sell the data. So right now I am facing difficulty downloading the required stock data, and we will need data for thousands of stocks. This is what I am going to do: instead of downloading all 6000 stocks, I will only download data for a random 10% of the stock tickers, which means around 700 stocks. Downloading data for 700 stocks takes around 20-30 minutes. You can download all 6000 stocks, but it can take a few hours. Once you download the data, save it to a CSV file and then reuse it repeatedly to build factor models.
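As a closing sketch, here is one way the downloaded data frames in DATA could be merged into a single close-price table and written to a CSV file for reuse. Column names follow the google() output shown earlier, and the file name is just an example:

# Merge the closing prices of every downloaded symbol into one wide table
# and save it so the 20-30 minute download does not have to be repeated.
library(data.table)

close_list <- lapply(names(DATA), function(s) {
  d <- DATA[[s]][, c("Date", "Close"), with = FALSE]  # keep Date and Close only
  setnames(d, "Close", s)                             # name the column after the ticker
  d
})
prices <- Reduce(function(a, b) merge(a, b, by = "Date", all = TRUE), close_list)
write.csv(prices, "close_prices.csv", row.names = FALSE)

# Later sessions can simply start from the saved file:
# prices <- read.csv("close_prices.csv")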
