
Quadratic Discriminant Analysis With R For Algorithmic Trading

Most of the time, as traders, we want to predict market movement. It is difficult to predict the precise market movement. However, if we divide the market movement into classes like UP or DOWN, we can predict the market direction. This approach only tells us the direction; it doesn't give us the magnitude. We can further refine the classification by splitting each movement into BIG and SMALL, so the classes become BIG UP, SMALL UP, SMALL DOWN and BIG DOWN.

This is what I do as a forex trader. I divide currency market movement like this: a movement of less than 50 pips is considered SMALL, and a movement above 50 pips is considered BIG. While reading the book Introduction to Statistical Learning, I came across Quadratic Discriminant Analysis. In one example the authors use 2 lags of Dow Jones daily returns to predict tomorrow's market direction and get an accuracy of 61%.
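
As a rough illustration of that four-class labelling idea, here is a minimal sketch (it is not part of the script further down). It assumes the daily move is already available in pips in a numeric vector called pips (positive = up, negative = down); the example values and the handling of a move exactly on the 50-pip boundary are arbitrary choices for the sketch.

Code:
#Sketch: label daily moves into four classes using the 50 pip threshold
pips <- c(73, -12, 35, -80, 5, -49)      #made-up example values in pips
move <- cut(pips,
            breaks = c(-Inf, -50, 0, 50, Inf),
            labels = c("BIG DOWN", "SMALL DOWN", "SMALL UP", "BIG UP"))
table(move)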

Quadratic Discriminant Analysis (QDA) is a more flexible version of the Linear Discriminant Analysis (LDA) algorithm. In LDA, each class has its own mean but all classes are assumed to share the same covariance matrix. In QDA, that assumption is relaxed: each class is allowed its own covariance matrix. In both cases each class is modelled with a Gaussian distribution. I decided to test QDA on EURUSD, GBPUSD and USDJPY daily returns.
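
To make the difference concrete, here is a small toy sketch on simulated data (nothing to do with the trading script below). Two classes are drawn with different covariance matrices and both classifiers from the MASS package are fitted; the object names toy, lda.toy and qda.toy are just made up for this example.

Code:
#Toy comparison of LDA (shared covariance) vs QDA (class-specific covariance)
library(MASS)
set.seed(1)
n  <- 500
x1 <- mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(1,  0.8,  0.8, 1), 2))
x2 <- mvrnorm(n, mu = c(1, 1), Sigma = matrix(c(1, -0.8, -0.8, 1), 2))
toy <- data.frame(rbind(x1, x2),
                  class = factor(rep(c("A", "B"), each = n)))
lda.toy <- lda(class ~ X1 + X2, data = toy)
qda.toy <- qda(class ~ X1 + X2, data = toy)
mean(predict(lda.toy)$class == toy$class)   #in-sample accuracy, shared covariance
mean(predict(qda.toy)$class == toy$class)   #in-sample accuracy, separate covariances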


Code:
#Quadratic Discriminant Model
# Import the csv file
data1 <- read.csv("D:/Shared/MarketData/GBPUSD1440.csv",
                 header=FALSE)
colnames(data1) <- c("Date", "Time", "Open", "High",
                    "Low", "Close", "Volume")
library(quantmod)
data1 <- as.xts(data1[,-(1:2)],
               as.POSIXct(paste(data1[,1],data1[,2]),
                          format='%Y.%m.%d %H:%M'))

tail(data1)
#number of rows
x1 <- nrow(data1)

#calculate daily log returns
data1$lr <- diff(log(data1$Close))

#build a 3-column object: today's return plus two lagged copies
data3 <- cbind(data1$lr, data1$lr, data1$lr)
colnames(data3) <- c("lr", "lr.1", "lr.2")
data3$lr.1 <- lag(data3$lr.1, k=1)   #previous day's return
data3$lr.2 <- lag(data3$lr.2, k=2)   #return two days back

tail(data3)
data3 <- as.data.frame(data3)
data3$Direction <- ifelse(data3[, 1] >=0, 1, 0)

data3$Direction <- factor(data3$Direction, levels=c(1,0),
                         labels=c("UP","DOWN"))
tail(data3)
data3 <- na.omit(data3)
is.factor(data3$Direction)
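#hold out the last 500 observations as the out-of-sample test set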
train <- data3[1:(nrow(data3)-500),]
test <- data3[-(1:(nrow(data3)-500)),]
library(MASS)
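#fit QDA on the two lagged returns and predict on the held-out test data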
qda.fit <- qda(Direction ~ lr.1 + lr.2, data=train)
qda.class <- predict(qda.fit, test[,2:3])$class
table(qda.class, test$Direction)
mean(qda.class==test$Direction)

#fit logistic regression model
logistic.fit<-glm(Direction~lr.1+lr.2,
                 family=binomial(), data=train)
summary(logistic.fit)
logistic.prob <- predict(logistic.fit, test[,2:3],
                        type="response")
logistic.prob[2:4]
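#check the dummy coding: glm with a binomial family models P(second level), i.e. P(DOWN) here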
contrasts(train$Direction)
#test$Logistic <-rep("UP")
#test$Logistic[logistic.prob > 0.5] <- "DOWN"
logistic.pred <- ifelse(logistic.prob > 0.5, 1, 0)
test$Logistic <- factor(logistic.pred, levels=c(0,1),
                       labels=c("UP","DOWN"))
table(logistic.pred)
table(test$Logistic, test$Direction)
#calculate accuracy on the test data
library(caret)
confusionMatrix(test$Logistic, test$Direction)

Above I have posted the QDA algorithm code. I tested this QDA R script and found that it achieved 57% accuracy on EURUSD, 54% on GBPUSD and 51% on USDJPY, so I was unable to get above a 60% win rate on any of the currency pairs. If we use logistic regression with the same 2 lags of daily returns, we achieve a win rate of just 49%. So QDA did improve the predictive accuracy, to around 54% on average. This is what I am thinking: improve this QDA algorithm's win rate to above 65%, then use ensemble learning to build a few models, each with above 65% predictive accuracy, and combine them into one model.
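
As a starting point for that ensemble idea, here is a rough sketch of a simple majority vote. It assumes the objects train, test, qda.class and test$Logistic from the script above already exist, and it adds an LDA model purely as a third voter so ties cannot occur; the names lda.fit, votes and ensemble.pred are made up for this sketch, which is only one possible way to combine the models, not a finished system.

Code:
#Sketch: majority vote over QDA, LDA and logistic regression predictions
lda.fit   <- lda(Direction ~ lr.1 + lr.2, data=train)
lda.class <- predict(lda.fit, test[,2:3])$class

votes <- data.frame(qda = qda.class,
                    lda = lda.class,
                    log = test$Logistic)
#pick the class predicted by the majority of the three models, row by row
ensemble.pred <- apply(votes, 1, function(v) names(which.max(table(v))))
ensemble.pred <- factor(ensemble.pred, levels=levels(test$Direction))
mean(ensemble.pred == test$Direction)   #ensemble accuracy on the test set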
Good forum with a lot of information about many things.