Just like many people, I also own a very limited number of stocks. Although I tried my best to massage my portfolio(based on my guess, feeling and luck) last year, its year-end return was only about 3.5%. Compared to the S&P500’s 12% return, the poor performance shocked me. So I decided to study some formal computational stock analysis techniques. This post is the summary of what I learnt and developed. For the complete code, please refer to stock_analysis in my Github.

All of the methods/strategies mentioned in this post are collected from Internet or books, so there’s no guarantee for the correctness and validness.


Data Preparation

For personal use, the most popular language for stock analysis is Python, and the most convenient library is Pandas, which mainly depends on Numpy. So all the following discussions are based on Python Pandas.

To analyze stocks, we need a lot of data, like the history quotes, stock statistics, financial analysis, and etc. For my personal use, only free data sources are involved.

History Quotes

The history stock quotes can be downloaded using pandas_datareader, which was part of the Pandas before but now an independent library. The pandas_datareader.data.DataReader() could download the history quotes for one or a list of stock ticker(s). When passing one stock ticker, the DataReader() returns quotes as a Pandas DataFrame (2-D). When using a list of tickers as parameter, DataReader() returns Pandas Panel (3-D).

Following is a simple example of how it works:

import pandas as pd
import datetime as dt
import pandas_datareader.data as web

start_date = pd.to_datetime('2016-01-01').date()
end_date = dt.datetime.today().date()
quotes = web.DataReader('AAPL', 'yahoo', start_date, end_date)

Then the quotes should contain the DataFrame of Apple’s history quotes:

In [27]: print(quotes.iloc[:10,].to_string())
                  Open        High         Low       Close    Volume   Adj Close
Date                                                                            
2016-01-04  102.610001  105.370003  102.000000  105.349998  67649400  102.612183
2016-01-05  105.750000  105.849998  102.410004  102.709999  55791000  100.040792
2016-01-06  100.559998  102.370003   99.870003  100.699997  68457400   98.083025
2016-01-07   98.680000  100.129997   96.430000   96.449997  81094400   93.943473
2016-01-08   98.550003   99.110001   96.760002   96.959999  70798000   94.440222
2016-01-11   98.970001   99.059998   97.339996   98.529999  49739400   95.969420
2016-01-12  100.550003  100.690002   98.839996   99.959999  49154200   97.362258
2016-01-13  100.320000  101.190002   97.300003   97.389999  62439600   94.859047
2016-01-14   97.959999  100.480003   95.739998   99.519997  63170100   96.933690
2016-01-15   96.199997   97.709999   95.360001   97.129997  79010000   94.605802

Note that since the market closed on 01/01/2016, the start date is automatically rounded to the first business day followed.

Stock Statistics

The stock statistics from Yahoo Finance are also very useful. Basically there are two options to download them - parsing from Yahoo Finance URL or querying using Yahoo Query Language(YQL). For the former case, webpage Downloading Yahoo data explains the URL format and tags. And herval/yahoo-finance lists more tags. For the YQL case, lukaszbanasiak/yahoo-finance integrates it into a library.

Examples of downloading stats via URL and YQL can be found from functions get_symbol_yahoo_stats_url and get_symbol_yahoo_stats_yql in my stock analysis library.

Financial Analysis

My current implmentation downloads financial statements from Google Finance using selenium. The selenium webdriver will bring up a browser(e.g. Chrome, Firefox, etc.) window and open the Google Financials page. I wrote a function parse_google_financial_table to parse the financial tables.

However, there are too many drawbacks to the current selenium + Google Finance solution, because (1) selenium needs start web browser, which is pretty slow; (2) Google Finance only provides statements for the past 4-5 quarters or 4 years; and (3) many raw fiancial data are not available in Google Finance…

A more promising candidate is pystock-crawler, which can directly parse and download 10-Q and 10-K filings from SEC EDGAR. Unfortunately, pystock-crawler still doesn’t support Python 3.x, and I don’t want to switch to Python 2.7. So I haven’t tried it by myself.

In addition to the aforementioned data, the insider trading and trading events are also very helpful. But I haven’t got chance to try them.


Data Processing

As stock-related data can be analyzed from many different aspects, a lot of indicators/features have been invented. My superficial understanding is that, technical analysis focuses on history quotes, while fundamental analysis pays more attention to the financial statements. Actually the stats data downloaded from Yahoo Finance and the financial data from Google Finance are already processed data by experts. So this part only covers the commonly used indicators of stock quotes.

Return On Investiment(ROI)

ROI is also called Rate Of Return, which measures the gain/loss of your stock investment in a given period. It’s defined as:

Total Stock Return = ((P1 - P0) + D) / P0
where
	P0 = Initial Stock Price
    P1 = Ending Stock Price
    D  = Dividends

Source code: return_on_investment

Moving Average

Moving average is widely used for smoothing stock data. The Simple Moving Average(SMA) calculates the mean prive of the previous n days. And the Exponential Moving Average(EMA) gives more credits to the most recent quote. In my implementation, the following algorithm is used to calculate EMA:

n: time periods in days
SMA := mean(n-day quotes)
multiplier := (2 / (n + 1))
EMA := (ClosePrice - EMA(previous day)) * multiplier + EMA(previous day).

The corresponding Python code is:

def moving_average(x, n=10, type='simple'):
    x = np.asarray(x)
    if type == 'simple':
        # SMA
        w = np.ones(n)
        w /= w.sum() # weights
        avg = np.convolve(x, w, mode='full')[:len(x)]
        avg[:n] = avg[n]
    else:
        # EMA
        avg = np.zeros_like(x)
        avg[:n] = x[:n].mean() # initialization
        m = 2/(n+1) # multiplier
        for i in np.arange(n,len(x)):
            avg[i] = (x[i] - avg[i-1]) * m + avg[i-1]
    return avg

Moving Average Convergence/Divergence(MACD)

The MACD series is the difference between a “fast” (short period) exponential moving average (EMA), and a “slow” (longer period) EMA of the price series. The average series is an EMA of the MACD series itself. The MACD indicator (or “oscillator”) is a collection of three time series calculated from historical price data: the MACD series proper, the “signal” or “average” series, and the “divergence” series which is the difference between the two.

The most commonly used days for MACD are 12, 26, and 9, that is, MACD(12,26,9), which means that

MACD Line = (12-period EMA – 26-period EMA)
Signal Line = 9-period EMA
Histogram = MACD Line – Signal Line

Source code: macd

Momentum

Momentum measures the rate of the trending(rise or fall) in stock prices. It’s defined as

Momentum = (Today's closing price) - (Closing price n days ago)

A similiar trending indicator is Rate of Change(ROC), which is defined as

ROC = ((current value) / (previous value) - 1) * 100

Source code: momentum and roc

Relative Strength Index(RSI)

RSI is an oscillator to identify the trend, which ranges from 0 to 100, with a value greater than 70 indicating an overbought condition and a value lower than 30 indicating an oversold condition. And a RSI greater than 50 can indicate an uptrend and a RSI less than 50 indicates a downtrend.

The standard algorithm of calculating RSI is:

              100
RSI = 100 - --------
             1 + RS
RS = Average Gain / Average Loss

The very first calculations for average gain and average loss are simple 14 period averages.

First Average Gain = Sum of Gains over the past 14 periods / 14.
First Average Loss = Sum of Losses over the past 14 periods / 14.

The second, and subsequent, calculations are based on the prior averages and the current gain loss:

Average Gain = [(previous Average Gain) x 13 + current Gain] / 14.
Average Loss = [(previous Average Loss) x 13 + current Loss] / 14.

Source code: rsi

Stochastic Oscillator (%K)

The stochastic oscillator, represented by the symbol %K, is based on a window period that typically spans 14 trading days, where the highest high and the lowest low are selected from the range, and the last closing price are used to calculate %K. %K = 0 when the last close is also the low for the window period.

The stochastic oscillator can be calculated by:

Stochastic Oscillator(%K) = (Close Price - Lowest Low) / (Highest High - Lowest Low) * 100
Fast %D = 3-day SMA of %K
Slow %D = 3-day SMA of fast %D

Source code: stochastic

All of these indicators can be plotted into one graph:

from stock_analysis import Symbol

tdg = Symbol('TDG')
tdg.plot()

tdg_plot


Strategies

As a newbie, I don’t know many stock trading strategies. Most of my time were spent on finding good and cheap stocks. So I sitll don’t know anything about stock prediction.

The most practicle strategy suitable(I believe) for personal stock trading that I have found, is the Magic Formula Investing from The Little Book That Beats the Market. I also implemented this strategy in my stock analysis library. My version doesn’t strictly follow Greenblatt’s strategy, because I still cannot find all the needed data. But I believe it already has the core of this strategy.

Example of how to use this strategy:

from stock_analysis import *
sp500 = SP500()   # define an index
sp500.get_financials() # download financial data Google Finance, a bit slow
sp500.get_stats() # calculate key statistic features
sp500_value = value_analysis(sp500)  # do the value analysis

Summary

I want to restate what Greenblatt explained in The Little Book That Beats the Market here:

Stock prices move around wildly over very short periods of time, but this does not mean that the values of the underlying companies have changed very much during that same period.

So always buy shares of a company only when they trade at a large discount to TRUE VALUE as investing with a margin of the safety.

Companies that achieve a high return on capital are likely to have a special advantage of some kind. That special advantage keeps competitors from destroying the ability to earn above-average profits.

But I also agree another point that Greenblatt discussed:

Puting all the time and effort into stock market investing isn’t a very productive use of time.

More than 95% of the stock trading back and forth each day is probably unnecessary.

Do not waste too much of the future potential. There are higher and better uses of your time and intellect.