Predicting the direction of stock market prices using Random Forest
#1

Predicting stock market direction is an area of interest for many researchers and traders. We can predict candlestick direction from a handful of features with the Random Forest algorithm. The key to success here is applying Exponential Smoothing to our features. Exponential Smoothing gives more weight to recent observations and exponentially decreasing weights to past observations.
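
As a small illustration of the smoothing step, here is a sketch using pandas `ewm`. The span of 3 and the toy values are assumptions chosen so the numbers are easy to follow; they are not the settings behind any result in this thread.

```python
import pandas as pd

# Toy feature series; the values are made up for illustration.
feature = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0])

span = 3                      # assumed smoothing span
alpha = 2.0 / (span + 1)      # pandas converts span to alpha this way

# adjust=False gives the recursive form: s[t] = alpha*x[t] + (1-alpha)*s[t-1]
smoothed = feature.ewm(span=span, adjust=False).mean()

# The same recursion written out by hand, to show where the weights come from.
manual = [feature.iloc[0]]
for x in feature.iloc[1:]:
    manual.append(alpha * x + (1 - alpha) * manual[-1])

print(smoothed.tolist())
print(manual)
```

With `adjust=False` the most recent observation always gets weight `alpha`, and each older one is discounted by a further factor of `1 - alpha`. The pandas default is `adjust=True`, which differs in the first few rows.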
Features we used:

Code:
Relative Strength Index (RSI)
RSI = 100 - 100/(1+RS)
RS = Average Gain Over past 14 days / Average Loss Over past 14 days

RSI is a momentum indicator that shows whether the stock is overbought or oversold.





Code:
Stochastic Oscillator

%K = 100 * (C - L14)/(H14 - L14)

C = Current Closing Price
L14 = Lowest Low over the past 14 days
H14 = Highest High over the past 14 days

The Stochastic Oscillator follows the speed, or momentum, of the price.



Code:
Williams %R
%R = (H14 - C) / (H14 - L14) * -100

Values are between -100 and 0.
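
A quick way to sanity-check the two formulas above is to compute both from the same inputs. This is a minimal sketch; the function names and the sample prices are made up for illustration.

```python
def stochastic_k(close, low_n, high_n):
    # %K = 100 * (C - L14) / (H14 - L14)
    return 100.0 * (close - low_n) / (high_n - low_n)


def williams_r(close, low_n, high_n):
    # %R = (H14 - C) / (H14 - L14) * -100
    return (high_n - close) / (high_n - low_n) * -100.0


# Hypothetical 14-bar low/high and current close.
low_14, high_14 = 10.0, 20.0
for close in (15.0, 18.0):
    k = stochastic_k(close, low_14, high_14)
    r = williams_r(close, low_14, high_14)
    print(close, k, r)
```

When both use the same window, %R is just %K shifted down by 100, so the two features carry the same information on different scales.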


Code:
Moving Average Convergence Divergence
MACD = EMA12(C) - EMA26(C)
Signal Line = EMA9(MACD)

EMAn =  n day Exponential Moving Average


Code:
Price Rate Of Change:
PROC (t) = (C(t) - C(t-n) )/ C(t-n)



Code:
On Balance Volume:

If C(t) > C(t-1) => OBV(t) = OBV(t-1) + Vol(t)
If C(t) < C(t-1) => OBV(t) = OBV(t-1) - Vol(t)
If C(t) = C(t-1) => OBV(t) = OBV(t-1)
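
The three OBV rules can be sketched in a few lines of plain Python. The function name, the sample data and the starting value of 0 are assumptions for illustration; a library implementation may seed the first value differently.

```python
def on_balance_volume(closes, volumes):
    """Running OBV; starts at 0 on the first bar (an assumed convention)."""
    obv = [0]
    for t in range(1, len(closes)):
        if closes[t] > closes[t - 1]:
            obv.append(obv[-1] + volumes[t])   # up close: add volume
        elif closes[t] < closes[t - 1]:
            obv.append(obv[-1] - volumes[t])   # down close: subtract volume
        else:
            obv.append(obv[-1])                # unchanged close: carry forward
    return obv


# Made-up closes and volumes.
closes = [10.0, 11.0, 10.5, 10.5, 12.0]
volumes = [100, 200, 150, 300, 250]
print(on_balance_volume(closes, volumes))
```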


Code:
Our target (Y) is Sign(Close(t+d) - Close(t))


The above features can be calculated easily in Python with the TA-Lib library (imported as talib).

So we used these features:

RSI(14), MACD(12,26,9) , Williams(14), %K of Stochastic(14) , OBV, PROC

This method has accuracy of about 90%.
Reply
#2

Develop a trading strategy based on this algorithm. Developing a trading strategy means developing BUY/SELL rules that use this algorithm. The trading rules should tell you the stop loss and take profit target when you open a trade. Once you have developed the trading rules based on this algorithm, you should backtest it on historical data using Python.

Subscribe My YouTube Channel:
https://www.youtube.com/channel/UCUE7VPo...F_BCoxFXIw

Join Our Million Dollar Trading Challenge:
https://www.doubledoji.com/million-dolla...challenge/
Reply
#3

My trading strategy here is based only on the BUY/SELL output of the classifier. When the system returns BUY, it buys and holds until the opposite signal. But I think this can't be done on the EURUSD 1M time frame because of the high volatility and noise. Am I right?
The code I wrote for the model is:

Code:
import pandas as pd   #importing Pandas library for creating our dataframe
import numpy as np  #importing Numpy library , it's used for numerical and array usage in Python
from sklearn.model_selection import train_test_split
import joblib #joblib for dumping and loading our classifier (sklearn.externals.joblib has been removed from scikit-learn)
from sklearn.ensemble import RandomForestClassifier  #importing RandomForest classifier
from sklearn.metrics import classification_report,roc_auc_score,accuracy_score  
import talib   #importing this library to calculate indicators


#Now we create a dataframe from our OHLC data and name it df
df = pd.read_csv('Data.csv')

#We name the columns
df.columns = ['DateTime','Open','High','Low','Close','Volume']

#We used these features only, and DateTime is not important
df = df[['Open','High','Low','Close','Volume']]

#At first we want to calculate Stochastic %K, I prefer to calculate it manually, so we define L14 column and H14 which are based on Lowest and Highest of past 14 days

df['L14'] = df['Low'].rolling(window=14).min()
df['H14'] = df['High'].rolling(window=14).max()
#Now we can calculate %K based on these above columns
df['%K'] = 100 * ((df['Close'] - df['L14']) / (df['H14'] - df['L14']))

#Now we used ta_lib library to calculate MACD,PROC,RSI, Williams R,OBV

#Price rate of Change
df['PROC'] = talib.ROCP(np.array(df.Close),1)

#RSI(14)
df['RSI'] = talib.RSI(np.array(df.Close))

#Williams R percentage
df['%R'] = talib.WILLR(np.array(df.High),np.array(df.Low),np.array(df.Close))

#MACD (12,26,9) NOTICE: WE NEED ONLY MAIN LINE,NOT SIGNAL LINE
df['MACD'] = talib.MACDFIX(np.array(df.Close))[0]

#On Balance Volume
df['OBV'] = talib.OBV(np.array(df.Close),np.array(df.Volume))


#Now we define our target variable, We want to predict next candlestick direction,so shift here is 1
df['Target'] = np.sign(df.Close.shift(1) - df.Close)

#Now we need to drop the columns which we have no interest in
df = df.drop(['Open','Low','Close','High','Volume','L14','H14'],axis=1)

#And also We drop all the NaN values
df = df.dropna()

#We create another dataframe which only has our features from the original dataframe and named it X_df
#And create another dataframe from the original dataframe which only has our target

X_df = df[['PROC','RSI','%R','MACD','OBV','%K']]
Y_df = df[['Target']]

#Y_val has values of dataframe in NdArray type
Y_val = Y_df.values
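#Now we want to apply exponential smoothing to our dataframe and get its data as a matrix, we named it dataset
#If we remove the smoothing lines, we will get 50% accuracy!!
windowSize = 10  #ASSUMPTION: the value used is not stated in this post; set it to the span you want to test
dataset = X_df.ewm(span=windowSize, min_periods=windowSize).mean().values  #as_matrix() was removed in pandas 1.0
dataset = dataset[windowSize:]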

#It creates a window with lookback of 10
def create_dataset(dataset, look_back=10):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0:6]
        dataX.append(a)
        dataY.append(Y_val[i + look_back, 0])

    return np.array(dataX), np.array(dataY)


#X,Y are the NdArray values which window lookback applied

X,Y = create_dataset(dataset)

#We reshape X to have 2 dimension to input it to our algorithm
X = X.reshape(len(X),-1)

#Making sure Y is integer
Y = Y.astype('int')

#Spliting the X,Y with test size of 0.2 ,  It splits the X,Y values to 20% for testing, 80% for training
trainX, testX,  trainY, testY = train_test_split(X,Y,test_size=0.2)

#Defining classifier variable as one instance of RandomForestClassifer
classifier = RandomForestClassifier(n_jobs=-1,n_estimators=100,verbose=True,oob_score=True)
classifier.fit(trainX,trainY)
predicted = classifier.predict(testX)
print(classifier.score(testX,testY))
print(classification_report(testY,predicted))
print(classifier.oob_score_)


Result :

Accuracy Score : 0.878622197922362
             precision    recall  f1-score   support

         -1       0.87      0.95      0.91     14395
          0       0.84      0.04      0.08      2239
          1       0.88      0.94      0.91     14459

avg / total       0.88      0.88      0.85     31093

OOB Score: 0.8701154600713987

With some tuning, and by reducing the window and look back, we can get accuracy as high as 94%.


If we remove Doji from our dataframe we get:


Accuracy Score: 0.9407559087911707
             precision    recall  f1-score   support

         -1       0.94      0.95      0.94     14411
          1       0.94      0.94      0.94     14402

avg / total       0.94      0.94      0.94     28813

OOB Score: 0.9349929283551267
ROC score: 0.9407544394884746

Am I doing this right? When I tested it on a demo account, it didn't work as expected. Maybe EURUSD on 1M is too noisy for this to work. What is the problem here?
Reply
#4

You can test with different window sizes and see if it improves the accuracy. As I had said you need to develop a trading strategy based on this algorithm. Using the trading strategy rules you will generate trading signals that tell you when to buy and when to sell. In a trading signal, there are two things stop loss and take profit target. Stop loss gives you the risk for the trading signal and take profit target gives you the potential reward. What we need is a trading strategy that has above 70% winrate. When you open a trade, you don't open another trade as long as that trade is open. More on it later. You will need to code your trading strategy in python and then backtest it.

This is what you need to do. Predict price after n bars. Use that prediction to develop your trading strategy rules like when you will buy and when you will sell and what will be the stop loss and what will be the take profit for each trade. When developing your trading strategy rules, you can use additional filters like candlestick patterns and see if it improves the performance of your trading strategy. Once you have developed the buy and sell functions in python, you also need to make sure that once you open a trade, your trading strategy doesn't open a new trade.

While coding make sure if price goes below the stop loss, the trade gets closed and your trading strategy starts looking for a new trade. In the same manner when price hits the profit target, your trading strategy closes the trade. You need counters like number of trades, winners, losers, total profit and total loss, net profit and other trading statistics. You can develop a Python Class that does this. The code should be general so that you can use it when you want to backtest a new trading strategy.
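
A minimal sketch of such a class is below. Everything in it is an assumption for illustration: the class name, fixed stop loss and take profit distances in price units, entry at the bar close, and the rule that the stop is treated as hit first when both levels fall inside one bar. The `signal` argument stands in for whatever your classifier returns.

```python
class SimpleBacktest:
    """One trade at a time, fixed stop loss / take profit in price units."""

    def __init__(self, stop_loss, take_profit):
        self.stop_loss = stop_loss
        self.take_profit = take_profit
        self.position = None      # None when flat, else dict with side and entry
        self.trades = 0
        self.winners = 0
        self.losers = 0
        self.total_profit = 0.0
        self.total_loss = 0.0

    def on_bar(self, high, low, close, signal):
        """signal: +1 buy, -1 sell, 0 nothing (hypothetical classifier output)."""
        if self.position is not None:
            self._check_exit(high, low)
        # Only look for a new trade when no trade is open.
        if self.position is None and signal != 0:
            self.position = {"side": signal, "entry": close}

    def _check_exit(self, high, low):
        side, entry = self.position["side"], self.position["entry"]
        if side == 1:
            hit_stop = low <= entry - self.stop_loss
            hit_target = high >= entry + self.take_profit
        else:
            hit_stop = high >= entry + self.stop_loss
            hit_target = low <= entry - self.take_profit
        # If both levels are inside one bar, assume the stop came first.
        if hit_stop:
            self._close(-self.stop_loss)
        elif hit_target:
            self._close(self.take_profit)

    def _close(self, pnl):
        self.trades += 1
        if pnl > 0:
            self.winners += 1
            self.total_profit += pnl
        else:
            self.losers += 1
            self.total_loss += -pnl
        self.position = None

    @property
    def net_profit(self):
        return self.total_profit - self.total_loss


bt = SimpleBacktest(stop_loss=0.0010, take_profit=0.0020)
bars = [
    # (high, low, close, signal), made-up values
    (1.1005, 1.0995, 1.1000, 1),   # buy at 1.1000
    (1.1010, 1.0998, 1.1008, 1),   # already in a trade, signal ignored
    (1.1025, 1.1005, 1.1020, 0),   # high passes the 1.1020 target
    (1.1022, 1.1015, 1.1018, -1),  # sell at 1.1018
    (1.1030, 1.1016, 1.1029, 0),   # high passes the 1.1028 stop
]
for bar in bars:
    bt.on_bar(*bar)
print(bt.trades, bt.winners, bt.losers, round(bt.net_profit, 4))
```

Spread, commission and slippage are left out here, so treat the numbers from a sketch like this as optimistic.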

Reply
#5

I forgot to answer your question about why the trading strategy is not working on the 1 minute timeframe. As I said, the algorithm predicts market direction after n bars. So if you want to predict price after 1 minute, you will need price data at 5 second or 10 second intervals. The original paper, if you have read it, says the prediction works well for 44 bars. Your window size is just 10. Look into it. Most brokers don't provide price data below 1 minute. However, Dukascopy and Oanda are two brokers that provide 5 second and 10 second data. All brokers provide tick data. You can resample it into 5 second or 10 second bars.
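
Resampling ticks into 10 second bars could look something like this with pandas. The tick values, timestamps and the `price` column name are made up; real tick files usually carry bid and ask columns, so adapt the column you aggregate.

```python
import pandas as pd

# Hypothetical tick data: timestamp index and a single 'price' column.
ticks = pd.DataFrame(
    {"price": [1.1000, 1.1002, 1.0999, 1.1003, 1.1005, 1.1001]},
    index=pd.to_datetime([
        "2018-06-26 05:39:01", "2018-06-26 05:39:04", "2018-06-26 05:39:09",
        "2018-06-26 05:39:12", "2018-06-26 05:39:15", "2018-06-26 05:39:21",
    ]),
)

# Group ticks into 10-second buckets and build OHLC bars from the prices.
bars = ticks["price"].resample("10s").ohlc()

# Buckets that contain no ticks come back as NaN rows; drop them here.
bars = bars.dropna()
print(bars)
```

Change `"10s"` to `"5s"` for 5 second bars.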

At this stage you should avoid extra work. Use 1 minute price data to predict price after 60 minutes. In the same manner, use 5 minute data to predict 5 hours later. Test your trading strategy on different timeframes like 1 minute, 5 minute, 15 minute, 30 minute, 60 minute, 240 minute etc. As I said, use additional filters like candlestick patterns and see if they improve the performance. You can also combine 2-3 trading strategies and see if that improves performance. As I said, you will have to develop a few backtest classes in Python that you can use to test different trading strategies.

Reply
#6

Based on the result of algorithm ( OOB score, ROC score and classifier score) I found out the best look back parameter is 5. Honestly, I got a little confused. I train the algorithm with 1Min data of EURUSD. I run it on 10sec timeframe. Each 50 seconds(5 bars of data), I input them to classifier to predict next bar direction. Will it predict 50 seconds movement direction? Or only 10 seconds movement direction?
My Target variable in the classifier code is : df['Target'] = np.sign(df.Close.shift(1) - df.Close)
Is it right?
The OOB score, ROC score and classifier score are all above 94%. Because of this, I didn't try any more filters such as candlestick patterns, or any noise filter such as CCI.
I just want to input it 5 bars of data and predict 6th bars direction. As you mentioned before, I have to use 10 sec bars. Am I right?

For backtesting, I found BackTrader more suitable. What is your choice? Would you mind suggesting me your framework you use for backtesting?

Thank you so much, I really appreciate it.
Reply
#7

df['Target'] = np.sign(df.Close.shift(1) - df.Close) means you have lagged the Close and then taken the sign of the difference. So if you are using 10 second data, it is predicting the next 10 seconds, and if you are using 1 minute data, it is predicting the next 1 minute Close. You should also test by using df['Target'] = np.sign(df.Close.shift(10) - df.Close), for example. Change the shift and see how it affects the results. The paper focuses on predicting after n bars. You should use df.Close.shift(n) and see how it affects the results. OOB error is not going to tell you how good the trading strategy is. For that you will have to do backtesting.

Python is the best language for backtesting. You can use backtrader framework for doing the backtesting. You can develop your own python functions and classes that you can reuse for testing new trading strategies. This is what you should do. Use 5 second data and use df['Target'] = np.sign(df.Close.shift(12) - df.Close) and use 10 second data and use df['Target'] = np.sign(df.Close.shift(6) - df.Close). Check what is the OOB error for each case. Machine learning is all about experimenting and developing new features that can make better predictions. Building machine learning models can take a few weeks. Experiment as much as possible. First we have a rough idea. We build a quick prototype model in python and test our rough idea.

If we get encouraging results in the first prototype model, we need to further refine the features and build a better prototype model. After a few iterations, once you have a prototype model that is giving the results you wanted, you should start planning a production model. In the production model, you need to use optimized code that reduces latency as much as possible. You need to connect the live data stream to the model. The model should make predictions n periods ahead, where n is your choice. Ultimately, once you have thoroughly tested the trading model and it works very well, you can build it in C++ or Java and reduce the latency to the minimum.

Reply
#8

Hello again,
I achieved above 90% accuracy, but I have a question about the definition of the target label.
I defined it as :
Code:
df['Target'] = np.sign(df.Close.shift(60) - df.Close)

I'm using 1Sec data to predict the next 1 min. The window is 60 bars.
The target here is the sign of the difference between the close 60 bars ago and the most recent close. Did I define it right?
Shouldn't it be the difference between the close 60 bars ahead and the most recent close?

Thank you
Reply
#9

I think you have made a mistake in the code. It should be other way around. Let's check. Sorry I didn't check the forum for a few days. I found your post today. I will try to answer your question. I will be using EURUSD 1 minute data to make prediction for the next 60 minutes ( 1 hour) since I don't have data in seconds. The Python code is same so it doesn't matter. First I read the data with pandas:

>>> df=pd.read_csv('D:/Shared/MarketData/EURUSD1.csv', header=None)
>>> df.columns=['Date', 'Time', 'Open', 'High', 'Low',
...                 'Close', 'Volume']
>>> df.shape
(58893, 7)
>>>
>>> df.head()
         Date   Time     Open     High      Low    Close  Volume
0  2018.01.10  22:22  1.19508  1.19511  1.19506  1.19509      14
1  2018.01.10  22:23  1.19512  1.19515  1.19495  1.19497      23
2  2018.01.10  22:24  1.19494  1.19494  1.19477  1.19491      26
3  2018.01.10  22:25  1.19487  1.19487  1.19481  1.19485      10
4  2018.01.10  22:26  1.19484  1.19484  1.19481  1.19483      12

Now let's check your Target variable:
>>> df['Target'] = np.sign(df.Close.shift(60) - df.Close)
__main__:1: RuntimeWarning: invalid value encountered in sign

>>> df.head()
         Date   Time     Open     High      Low    Close  Volume  Target
0  2018.01.10  22:22  1.19508  1.19511  1.19506  1.19509      14     NaN
1  2018.01.10  22:23  1.19512  1.19515  1.19495  1.19497      23     NaN
2  2018.01.10  22:24  1.19494  1.19494  1.19477  1.19491      26     NaN
3  2018.01.10  22:25  1.19487  1.19487  1.19481  1.19485      10     NaN
4  2018.01.10  22:26  1.19484  1.19484  1.19481  1.19483      12     NaN
>>> df.tail()
             Date   Time     Open     High      Low    Close  Volume  Target
58888  2018.06.26  05:35  1.17134  1.17143  1.17132  1.17143      20    -1.0
58889  2018.06.26  05:36  1.17142  1.17142  1.17132  1.17137      25     1.0
58890  2018.06.26  05:37  1.17135  1.17142  1.17123  1.17142      40    -1.0
58891  2018.06.26  05:38  1.17140  1.17156  1.17140  1.17155      16    -1.0
58892  2018.06.26  05:39  1.17154  1.17156  1.17154  1.17156       2    -1.0
>>>
This is what is happening. You have shifted the Close by 60 bars.  The last entry above is row 58892. Let's shift the close 60 bars now:
>>> df.Close.shift(60).tail(100)
The last entries are:
58881    1.17128
58882    1.17133
58883    1.17134
58884    1.17130
58885    1.17136
58886    1.17141
58887    1.17128
58888    1.17133
58889    1.17139
58890    1.17134
58891    1.17143
58892    1.17144
Name: Close, Length: 100, dtype: float64
>>> 58892-60
58832
>>> df.Close.iloc[58832]
1.17144
So row 58832 of the Close column is row 58892 of the df.Close.shift(60) column. This is what you are doing: each row's Target compares its close with the close from 60 bars earlier, so it describes a move that has already happened rather than the move over the next 60 bars. Your Target should be like this:
>>> df['Target'] = np.sign(df.Close.shift(-60)-df.Close)
>>> df.tail()
             Date   Time     Open     High      Low    Close  Volume  Target
58888  2018.06.26  05:35  1.17134  1.17143  1.17132  1.17143      20     NaN
58889  2018.06.26  05:36  1.17142  1.17142  1.17132  1.17137      25     NaN
58890  2018.06.26  05:37  1.17135  1.17142  1.17123  1.17142      40     NaN
58891  2018.06.26  05:38  1.17140  1.17156  1.17140  1.17155      16     NaN
58892  2018.06.26  05:39  1.17154  1.17156  1.17154  1.17156       2     NaN
>>> df.Close.shift(-60).iloc[58832]
1.17156
>>>
So shift(-60) brings the close from 60 bars ahead onto the current row, and then you take the difference with the close at that time. This is the correct Target:
>>> df['Target'] = np.sign(df.Close.shift(-60)-df.Close)

Yes, your question was correct. We need the close from 60 bars ahead and then difference it with the recent close.
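
The same check on a tiny series makes the two directions easy to compare side by side. This is only a sketch: the prices are made up and n = 2 is assumed so the whole table fits on screen.

```python
import numpy as np
import pandas as pd

close = pd.Series([1.0, 2.0, 3.0, 2.5, 2.0, 4.0])
n = 2  # horizon in bars (assumed small for readability)

# shift(n): row t holds Close[t-n], a value from the past.
backward = np.sign(close.shift(n) - close)

# shift(-n): row t holds Close[t+n], the value n bars ahead.
forward = np.sign(close.shift(-n) - close)

print(pd.DataFrame({"close": close, "backward": backward, "forward": forward}))

# The last n rows have no future close yet, so drop them before training.
target = forward.dropna().astype(int)
print(target.tolist())
```

With `shift(n)` the NaN rows are at the top of the frame; with `shift(-n)` they are at the bottom, which is the pattern visible in the `df.tail()` output above.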

Reply
#10

Thank you so much. I found the Python code of the paper. In their code, there was no rolling window (look back) at all. Their feature array was just 2-dimensional, with the second dimension being the number of features (6). Their input shape was `len(dataset),6`. Don't we need a look back window? I'm really confused.
And another question: when I run the model for testing in real time, I use 1 second bars from MT4. Is MT4 data reliable?
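
On the look back question, the difference between the two input layouts can be made concrete with a small sketch. The array sizes and names are assumptions for illustration, and this does not settle which layout predicts better.

```python
import numpy as np

n_rows, n_features, look_back = 100, 6, 10

# Stand-in for the smoothed feature matrix: one row per bar, 6 columns.
features = np.arange(n_rows * n_features, dtype=float).reshape(n_rows, n_features)

# No look back: each sample is one bar's 6 features.
X_flat = features                       # shape (100, 6)

# With look back: each sample stacks the last `look_back` bars side by side.
X_windowed = np.array([
    features[i:i + look_back].ravel()
    for i in range(n_rows - look_back + 1)
])                                      # shape (91, 60)

print(X_flat.shape, X_windowed.shape)
```

Each indicator already summarizes a run of past bars (RSI(14) looks at 14 of them), which may be why a single row per sample is enough in the paper's code.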

Thank you so much.
Reply

