
Fuzzy candlestick and Deep learning in Python
#1

I made a dataframe of Fuzzified Candlesticks from a paper. My dataframe looks like this:

Code:
 Lupper  Lbody Llower           OStyle            CStyle       Var  Color
    0  equal  short  equal  open_equal_high        close_high  larg_inc    red
    1  equal  short  equal  open_equal_high       close_equal   ext_dec  green
    2  equal  equal  equal  open_equal_high  close_equal_high    sm_inc    red
    3  equal  equal  short        open_high  close_equal_high  norm_dec  green
    4  equal  equal  equal  open_equal_high  close_equal_high    sm_dec  green
    5  equal  short  equal   open_equal_low   close_equal_low   ext_dec  green
    6  equal  equal  equal       open_equal       close_equal   ext_dec  green
    7  equal  equal  equal  open_equal_high       close_equal    sm_dec  green
    8  short  short  equal   open_equal_low   close_equal_low    sm_inc    red
    9  short  short  equal   open_equal_low       close_equal   ext_dec  green
The input is 5 candlesticks and the output is the color of the 6th candlestick. The target (label) is Color, and the features are Lupper, Lbody, Llower, OStyle, and CStyle.
I'm trying to train it with the Keras library on a TensorFlow backend.
My code is:



Code:
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Activation
from keras.utils import to_categorical

# Encode the target: cross=0, green=1, red=2.
df['Color'].replace('green', 1, inplace=True)
df['Color'].replace('red', 2, inplace=True)
df['Color'].replace('cross', 0, inplace=True)

# One-hot encode the fuzzy features; Color stays as column 0 of df.values.
cols_to_transform = ['Lupper', 'Lbody', 'Llower', 'OStyle', 'CStyle', 'Var']
df = pd.get_dummies(df, columns=cols_to_transform)

def create_dataset(dataset, look_back=1):
    # Slide a window of `look_back` rows over the data:
    # features are columns 1:29 (the dummies), target is column 0 (Color).
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        dataX.append(dataset[i:(i + look_back), 1:29])
        dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)

X, Y = create_dataset(df.values, look_back=5)
Y = to_categorical(Y, num_classes=3)

model = Sequential()
model.add(LSTM(64, input_shape=(5, 28), return_sequences=True))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(LSTM(64, return_sequences=True))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(LSTM(32))
model.add(Dropout(0.2))
model.add(Dense(3))
model.add(Activation('softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
model.fit(X, Y, epochs=100, batch_size=32, validation_split=0.2)

Accuracy doesn't improve and is stuck between 40% and 42%. I also tried an SVM and got the same result. What's the problem with my code/model?
#2

Why have you selected Color as the target variable? I think the more appropriate target variable is Variation. Just lag it and then use it as the target. Variation tells you how much the price is expected to move, and you can use that as a trading signal: if the variation is expected to be large, you can open a buy/sell trade in the predicted direction; if the predicted variation is small, you can decide not to trade the signal, since the expected price movement is small.
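The lagging step can be sketched in pandas as follows. This is a minimal toy example, assuming a dataframe with a `Var` column like the one posted; the `Target` column name and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the fuzzified candlestick frame, keeping only Var.
df = pd.DataFrame({"Var": ["larg_inc", "ext_dec", "sm_inc", "norm_dec", "sm_dec"]})

# Shift Var up one row so the features at row i line up with row i+1's variation.
df["Target"] = df["Var"].shift(-1)

# The last row has no future variation to predict, so drop it before training.
df = df.dropna(subset=["Target"]).reset_index(drop=True)
print(df)
```

After the shift, each row's features predict the *next* period's variation, which is what makes the lagged column usable as a forward-looking trading signal.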

As far as deep learning goes, LSTM networks were developed for sequence tasks such as speech recognition. Deep learning has worked well in speech recognition and image classification, but I haven't come across a good deep learning model for financial markets. I could be wrong. The main reason is the utmost secrecy practiced by quants at hedge funds and big financial firms about their algorithms: the belief among them is that revealing an algorithm makes its effectiveness vanish. So if people do have good deep learning models working in financial markets, they are not revealing them.

You should build a few more models using logistic regression, support vector machines, decision trees, and other classification algorithms, and check whether that boosts performance. In the end, we can combine these models using ensemble machine learning. The problem with financial data is that it is non-stationary, which makes it unsuitable for most data mining algorithms.
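A minimal sketch of combining these classifiers with scikit-learn's `VotingClassifier`. The data here is random placeholder data (200 windows of 28 features, 3 classes, mirroring the shapes in the original post), so the fitted model is illustrative only.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 28))    # stands in for the 28 dummy-encoded features
y = rng.integers(0, 3, size=200)  # three color classes: cross/green/red

# Hard-voting ensemble: each base model gets one vote per sample.
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", SVC()),
    ("tree", DecisionTreeClassifier(max_depth=5)),
])
ensemble.fit(X, y)
pred = ensemble.predict(X[:5])
print(pred.shape)
```

Hard voting is the simplest combination rule; soft voting (averaging predicted probabilities) is also available when every base estimator supports `predict_proba`.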

Subscribe My YouTube Channel:
https://www.youtube.com/channel/UCUE7VPo...F_BCoxFXIw

Join Our Million Dollar Trading Challenge:
https://www.doubledoji.com/million-dolla...challenge/
#3

I selected the Color variable as the target because I want to classify the next candlestick's movement direction for use in a scalping system. When Color is green, the system tries to buy and hold until the next opposing signal, then closes the position, and vice versa. I found many papers that classify Variation as the target rather than Color, as you said. (And thank you for giving me the fuzzy candlestick paper I asked about by email.) I used an LSTM because its input can be 3-dimensional: (batch_size, steps, features). I read in `Deep Learning with Keras` that LSTMs are used for forecasting financial markets because of this input shape (the importance of time series).
I remember your post on the DoubleDoji website where you mentioned that with deep learning we can achieve better accuracy, so I tried it with Keras.
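The 3-D input shape mentioned above can be illustrated with a tiny NumPy sketch; the window count of 100 is hypothetical, while the (5, 28) part matches the `input_shape=(5, 28)` from the model in post #1.

```python
import numpy as np

# Keras LSTMs expect 3-D input: (batch_size, timesteps, features).
# Here: 5 past candlesticks, each one-hot encoded into 28 dummy columns.
X = np.zeros((100, 5, 28))  # 100 hypothetical training windows
print(X.shape)
```

Any 2-D tabular data has to be re-windowed into this (samples, timesteps, features) layout, which is exactly what `create_dataset` in the original code does.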


Now I will try SVM, MLP, decision tree, and logistic regression models with Variation as the target variable. I will post my models' progress and performance in this thread.
Thank you so much. You are the first and only one who motivated and led me to use machine learning in financial markets.
#4

I forgot to mention the Naive Bayes and Nearest Neighbors algorithms. Nearest Neighbors might help. What these algorithms do is predict based on past history; if history doesn't repeat itself, the predictions will be wide of the mark. When building these models you should take care of class imbalance, meaning one or more classes occur much less or much more often than the others. With class imbalance, these algorithms will favor the majority classes, so the results will be wrong most of the time. You can deal with class imbalance by binning the target variable into bins with roughly equal counts.
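The equal-count binning suggested above maps directly onto pandas' `qcut`, which splits on quantiles. This sketch uses a random stand-in for the raw variation values; the three bin labels are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
variation = pd.Series(rng.normal(size=1000))  # stand-in for raw price variation

# qcut cuts at quantiles, so each bin receives roughly the same number of
# samples -- unlike pd.cut, which cuts at equal-width value intervals.
bins = pd.qcut(variation, q=3, labels=["dec", "flat", "inc"])
print(bins.value_counts())
```

With a balanced target like this, a majority-class classifier gains nothing, so any accuracy above 1/3 reflects real signal rather than class imbalance.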

I think you haven't given enough importance to the non-stationary nature of financial time series. The probability distribution generating the data is constantly changing, and the above classification algorithms fail when the underlying distribution is non-stationary. Non-stationary in simple terms means that the mean and volatility of the series are constantly changing; if the series is non-stationary, then the Law of Large Numbers cannot be used to calculate its mean. Building these models can be a time-consuming process requiring a lot of backtesting.
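A quick way to see this non-stationarity is to compare summary statistics over different stretches of a simulated random walk, a textbook non-stationary series. The data here is synthetic, not market data.

```python
import numpy as np

rng = np.random.default_rng(2)
# A random walk: cumulative sum of random shocks, so mean and variance drift.
prices = np.cumsum(rng.normal(size=2000))

# For a stationary series the two halves would have similar mean/std;
# for a random walk they usually do not.
first, second = prices[:1000], prices[1000:]
print(first.mean(), second.mean(), first.std(), second.std())

# Differencing (working with returns instead of price levels) is the
# usual remedy, giving a series much closer to stationary.
returns = np.diff(prices)
```

This is also why most practitioners feed returns or variations, rather than raw prices, into classification models.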
