Million Dollar Challenge

Defining Target Label in Binary Option Machine Learning Solution
Hi,
When I define the target label as:

Code:
df['Target'] = (df.Close.shift(-1) / df.Close) - 1
df['Target'] = np.where(df.Target > 0,1,0)
I get 50% to 53% accuracy. But if I define it as:

Code:
df['Target'] = (df.Close.shift(-1) / df.Close.shift(2)) - 1
df['Target'] = np.where(df.Target > 0,1,0)
or
Code:
df['Target'] = (df.Close.shift(-1) / df.Close.rolling(2).mean()) - 1
df['Target'] = np.where(df.Target > 0,1,0)
I get better accuracy, around 75%.
Why does the classification algorithm work better with these two definitions?
Thank you so much.
Let me check. First let's read the dataframe using pandas:

>>> df = get_data('GBPUSD', 1440)
>>> df.shape[0]
2273
>>> df.shape[1]
5
>>> df.head()
              Open    High     Low   Close  Volume
Datetime
2011-05-19  1.6156  1.6241  1.6129  1.6227   12390
2011-05-20  1.6226  1.6303  1.6165  1.6229   11879
2011-05-22  1.6224  1.6228  1.6209  1.6227     663
2011-05-23  1.6226  1.6232  1.6057  1.6072   12852
2011-05-24  1.6073  1.6207  1.6067  1.6177   12635
>>> df.tail()
               Open     High      Low    Close  Volume
Datetime
2018-08-31  1.30102  1.30277  1.29440  1.29563   42080
2018-09-02  1.29130  1.29305  1.29110  1.29213    2115
2018-09-03  1.29207  1.29329  1.28542  1.28675   33859
2018-09-04  1.28644  1.28701  1.28097  1.28568   42009
2018-09-05  1.28580  1.28693  1.27847  1.28169   21295

I am using GBPUSD Daily data. Now let me check the first target:
>>> df['Target'] = (df.Close.shift(-1) / df.Close) - 1
>>> df.tail()
               Open     High      Low    Close  Volume    Target
Datetime
2018-08-31  1.30102  1.30277  1.29440  1.29563   42080 -0.002701
2018-09-02  1.29130  1.29305  1.29110  1.29213    2115 -0.004164
2018-09-03  1.29207  1.29329  1.28542  1.28675   33859 -0.000832
2018-09-04  1.28644  1.28701  1.28097  1.28568   42009 -0.003103
2018-09-05  1.28580  1.28693  1.27847  1.28169   21295       NaN
>>>
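One detail about the NaN in the last row above: np.where(NaN > 0, 1, 0) evaluates to 0, so the labelling line from the question silently marks the last bar as "down" even though its next close is unknown. A minimal sketch of dropping that row before labelling, using made-up closes (the values below are hypothetical, not the GBPUSD data):

```python
import numpy as np
import pandas as pd

# Hypothetical closes, used only to illustrate the labelling step.
df = pd.DataFrame({"Close": [1.2956, 1.2921, 1.2968, 1.2857, 1.2917]})

# One-step-ahead return; the last row has no next close, so it is NaN.
ret = df.Close.shift(-1) / df.Close - 1

# Labelling directly turns the NaN row into a 0 ("down") label.
naive = np.where(ret > 0, 1, 0)

# Safer: keep only rows whose next close exists, then label them.
labelled = df.loc[ret.notna()].copy()
labelled["Target"] = (ret.dropna() > 0).astype(int)

print(naive)
print(labelled)
```

With only one bad row out of a couple of thousand this barely moves the accuracy, but it keeps a label that was never observed out of the training set.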
This is the one-step-ahead prediction target. I think this is correct. Now let's check the second target:
>>> df['Target'] = (df.Close.shift(-1) / df.Close.shift(2)) - 1
>>> df.tail()
               Open     High      Low    Close  Volume    Target
Datetime
2018-08-31  1.30102  1.30277  1.29440  1.29563   42080 -0.008662
2018-09-02  1.29130  1.29305  1.29110  1.29213    2115 -0.010900
2018-09-03  1.29207  1.29329  1.28542  1.28675   33859 -0.007680
2018-09-04  1.28644  1.28701  1.28097  1.28568   42009 -0.008080
2018-09-05  1.28580  1.28693  1.27847  1.28169   21295       NaN
>>>
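Here you have shifted Close up one step (df.Close.shift(-1)) and divided it by Close shifted two steps down (df.Close.shift(2)), which produces a ratio that is meaningless as a prediction target. Let me explain how you are wrong. In the above dataframe the last row is 2018-09-05. df.Close.shift(-1) shifts the 2018-09-05 close up to 2018-09-04. df.Close.shift(2) shifts the 2018-09-02 close two steps down to 2018-09-04. So the Target on 2018-09-04 is the return from the 2018-09-02 close to the 2018-09-05 close. Two of those three bars have already closed by 2018-09-04, so most of that return is already known when the prediction is made, and the classifier only has to read it back from past prices. That is why the accuracy looks higher. The Target is simply meaningless.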
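To see why such a target inflates accuracy, here is a small sketch on a synthetic random walk, where the next move is unpredictable by construction. The series, the seed and the hit_rate helper are assumptions made up for this illustration; nothing here uses the GBPUSD data. The "model" is just the sign of the return over the last two bars, which is fully known at prediction time.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic closes: a geometric random walk, so the next bar is a coin flip.
log_steps = rng.normal(0.0, 0.005, 20_000)
close = pd.Series(1.5 * np.exp(np.cumsum(log_steps)))

def hit_rate(target_return, known_return):
    """Share of bars where the sign of an already-known return matches the label."""
    valid = target_return.notna() & known_return.notna()
    label = target_return[valid] > 0
    guess = known_return[valid] > 0
    return float((label == guess).mean())

# Return over the last two bars: known at bar t, no forecasting involved.
known = close / close.shift(2) - 1

target1 = close.shift(-1) / close - 1            # next bar only
target2 = close.shift(-1) / close.shift(2) - 1   # next bar plus two known bars

acc1 = hit_rate(target1, known)
acc2 = hit_rate(target2, known)
print(f"one-step target: {acc1:.2f}   shift(2) target: {acc2:.2f}")
```

On the one-step target the guess should land near 50%. On the shift(2) target it should score far above 50% without predicting anything, because the label already contains the two bars the guess is built from.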

In the same manner, the last target is:
>>> df['Target'] = (df.Close.shift(-1) /\
...  df.Close.rolling(2).mean()) - 1
>>> df.tail()
               Open     High      Low    Close  Volume    Target
Datetime
2018-08-31  1.30102  1.30277  1.29440  1.29563   42080 -0.004737
2018-09-02  1.29130  1.29305  1.29110  1.29213    2115 -0.005511
2018-09-03  1.29207  1.29329  1.28542  1.28675   33859 -0.002916
2018-09-04  1.28644  1.28701  1.28097  1.28568   42009 -0.003518
2018-09-05  1.28580  1.28693  1.27847  1.28169   21295       NaN
>>>

This is what you are doing: shifting the close one step up and dividing it by the rolling mean of 2, which is simply SMA 2. Most of the time we take the rolling mean to smooth the price. Taking a rolling mean of 2 means we are just averaging two adjacent prices. The first target is correct. It predicts one step ahead. The predictive accuracy of about 52% that you achieved with it is correct. The predictive accuracy of 75% for the other targets is meaningless. I have written a post on one of my blogs that you can read. In the blog post I explain how to backtest your trading strategy. I got fantastic results.

http://binary.tradingninja.com/randomfor...5-winrate/

This was due to a minor mistake. You will have to figure out what the mistake was. It will be a good education for you, as it illustrates an important principle that will always help you in developing algorithmic trading strategies. Despite the 85% accuracy claimed in the paper, the Random Forest strategy failed miserably in backtesting. I explain how in detail in the blog post.
Thank you so much. I checked the blog and rewrote the code. The mistake occurred when the data was split into train and test sets. I see many papers (most of them using technical indicators) that claim a high accuracy but make the mistake of using `train_test_split(shuffle=True)`.
The part of the blog post where you calculate the Profit/Loss was amazing. Now I know how to test models like that.
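For anyone reading along, a minimal sketch of a split that keeps time order, so the model is only tested on bars that come after everything it was trained on. The feature table below is a hypothetical stand-in for the real data. With scikit-learn, passing `shuffle=False` to `train_test_split` should give the same kind of split.

```python
import numpy as np
import pandas as pd

# Hypothetical feature table indexed by date, standing in for the real data.
dates = pd.date_range("2018-01-01", periods=10, freq="D")
data = pd.DataFrame(
    {"feature": np.arange(10.0), "Target": [0, 1] * 5},
    index=dates,
)

# Chronological split: first 80% of bars for training, the rest for testing.
split = int(len(data) * 0.8)
train = data.iloc[:split]   # older bars only
test = data.iloc[split:]    # strictly later bars

# Every training bar is older than every test bar.
print(train.index.max() < test.index.min())
```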

Can we apply an EMA to our dataset and calculate the target from the smoothed data? What features do you suggest?




Thank you so much
Using two EMAs is a good trading strategy. Many professional traders use moving averages in their trading strategy. A trading system that comprises two moving averages is a robust system. You should use two moving averages. For example, you can use EMA 21 and EMA 55. EMA 21 is the short term trend and EMA 55 is the long term trend. When EMA 21 goes above EMA 55, you open a buy trade, and when EMA 21 goes below EMA 55, you open a short trade. You should code this trading strategy and test how much profit/loss it will make.
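A minimal sketch of the crossover part, assuming pandas' `ewm(span=..., adjust=False)` as the EMA definition (charting platforms may seed their EMAs slightly differently). The function name ema_cross_signals and the synthetic price path are made up for this illustration.

```python
import numpy as np
import pandas as pd

def ema_cross_signals(close: pd.Series, fast: int = 21, slow: int = 55) -> pd.DataFrame:
    """Mark the bars where the fast EMA crosses the slow EMA."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()

    # 1 while the fast EMA is above the slow EMA, 0 otherwise.
    above = (ema_fast > ema_slow).astype(int)
    change = above.diff()  # +1 on a cross up, -1 on a cross down

    return pd.DataFrame({
        "ema_fast": ema_fast,
        "ema_slow": ema_slow,
        "cross_up": change == 1,     # buy signal bar
        "cross_down": change == -1,  # short signal bar
    })

# Synthetic closes: a steady fall followed by a steady rise.
close = pd.Series(np.concatenate([
    np.linspace(1.30, 1.20, 150),
    np.linspace(1.20, 1.35, 150),
]))
signals = ema_cross_signals(close)
print(signals[signals.cross_up | signals.cross_down])
```

The signal bars only mark where a trade would be opened. Profit/loss still has to be computed from the prices that follow each signal.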

EMA Trading Strategy Whipsaw Filter
The problem with moving averages is whipsaw. They whipsaw a lot, meaning the moving averages cross for a short time and then recross each other. To avoid this problem you should add a filter: when price closes above EMA 21, you open the buy trade. In the same manner, when price closes below EMA 21, you open a sell trade. You can add another rule. Price should close above EMA 21 for 3 consecutive bars for a valid buy trade. Similarly, price should close below EMA 21 for 3 consecutive bars for a valid short trade.

These are the trading strategy rules you will code and test:

EMA Trading Strategy Buy Rules:
When EMA 21 crosses above EMA 55 from below and price closes above EMA 21 for three consecutive bars, you will open a buy trade. Place the stop loss below the recent swing low. Take profit 100 pips. Timeframe 30 minutes.

EMA Trading Strategy Sell Rules:
When EMA 21 crosses below EMA 55 from above and price closes below EMA 21 for three consecutive bars, you will open a sell trade. Place the stop loss above the recent swing high. Take profit 100 pips. Timeframe 30 minutes.
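One possible reading of the entry rules above, sketched in pandas. The assumption here is that a buy fires on the first bar where EMA 21 is above EMA 55 and the last three closes were all above EMA 21 (mirrored for sells). The function name entry_signals and the synthetic prices are hypothetical, and the stop loss and take profit are not modelled.

```python
import numpy as np
import pandas as pd

def entry_signals(close: pd.Series, fast: int = 21, slow: int = 55,
                  confirm_bars: int = 3) -> pd.DataFrame:
    """Entry bars for the EMA rules; exits (stop loss, take profit) not included."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()

    # True when the last `confirm_bars` closes were all above / below the fast EMA.
    above_run = (close > ema_fast).astype(int).rolling(confirm_bars).sum() == confirm_bars
    below_run = (close < ema_fast).astype(int).rolling(confirm_bars).sum() == confirm_bars

    buy_state = (ema_fast > ema_slow) & above_run
    sell_state = (ema_fast < ema_slow) & below_run

    # Fire only on the first bar where the combined condition becomes true.
    return pd.DataFrame({
        "buy": buy_state.astype(int).diff() == 1,
        "sell": sell_state.astype(int).diff() == 1,
    })

# Synthetic closes: a steady fall followed by a steady rise.
close = pd.Series(np.concatenate([
    np.linspace(1.30, 1.20, 150),
    np.linspace(1.20, 1.35, 150),
]))
sig = entry_signals(close)
print(sig[sig.buy | sig.sell])
```

Under this reading a new entry can also fire when price dips below EMA 21 and comes back while the EMAs stay crossed. If you want only one trade per crossover, that has to be added as a separate condition.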

Change the stop loss rules and the take profit rules as well as the timeframe, and find the optimal timeframe as well as the optimal rules. Just as I showed in the blog post, first you will convert the buy and sell trading rules into impulse buy/sell rules, and then you can easily test your trading strategy. Keep in mind that most machine learning algorithms have an accuracy of around 50-52%, so they are only like flipping a coin. Simple trading strategies have a higher chance of working. Work on this EMA trading strategy, and if you have any questions, I will happily answer them here in this forum.