Latest Threads

Forum Statistics
  • Forum posts: 226
  • Forum threads: 59
  • Members: 13,493
  • Latest member: g6syqjd295


Posted by: Hassam
11-12-2018, 04:30 PM
Forum: Probability And Statistics
- Replies (4)

Probability is the logic of randomness and uncertainty. If you are interested in algorithmic trading or want to become a professional quant, you must master probability theory. I find probability theory very fascinating. Probability theory started its journey many centuries ago, when Gerolamo Cardano wrote The Book on Games of Chance. He was trying to calculate the probabilities of dice throws, which had become very important for the professional gamblers of his day. You can read the history of probability theory online; just Google it. First we discuss probability theory, and then we discuss statistics.

Let's start with sampling. I hope you are familiar with the R language. R is a very powerful language specifically developed for statistical data analysis. If you are not familiar with R, you can refer to the thread Introduction to R in the Algorithmic Trading with R forum, where I have introduced the basic R commands for those who are new and don't know R.

Sampling and Simulation
The heart of probabilistic analysis is sampling and simulation. Most of the time we would want to draw samples from a distribution. R can help a lot in drawing samples from all sorts of probability distributions.

> sample(10,5)
[1]  8  2 10  6  4
> sample(2:8,12, replace=TRUE)
 [1] 3 6 8 3 8 7 8 5 2 3 4 8

In R we use the sample() command to sample randomly from a set of numbers with equal probability, also known as the discrete uniform distribution. In the first sample command, we told R to sample randomly with equal probability between the numbers 1 and 10, drawing 5 values without replacement. In the second sample command, we told R that we want sampling with replacement (replace=TRUE) from the numbers 2 to 8, and that we want 12 samples. We can also sample from the letters of the English alphabet:

> sample(letters, 8)

[1] "g" "r" "m" "k" "p" "y" "q" "x"

> sample(1:5,12, replace=TRUE, prob=c(0.2,0.1,0.3,0.2,0.2))

 [1] 3 2 3 1 2 2 5 2 5 1 5 1

In the above sample command, we sampled the numbers 1, 2, 3, 4, 5 with the unequal probabilities given by the prob vector. Many books on probability discuss the birthday matching problem: suppose there are 23 people in a room; what is the probability that two of them share a birthday? R has built-in functions for these types of birthday problems:

> pbirthday(23)

[1] 0.5072972

As you can see, with 23 people there is roughly a 50% probability that two of them share a birthday. If you invite a lot of people to your birthday party, say 43 friends, there is better than a 90% chance that two of them will share a birthday:

> pbirthday(43)

[1] 0.9239229



Posted by: behdad
11-05-2018, 08:05 PM
Forum: Machine Learning
- Replies (3)

Hi,
When I define the target label as:

Code:
# assumes: import numpy as np; df is a pandas DataFrame with a Close column
df['Target'] = (df.Close.shift(-1) / df.Close) - 1
df['Target'] = np.where(df.Target > 0,1,0)
I get 50% to 53% accuracy. But if I define it as:

Code:
df['Target'] = (df.Close.shift(-1) / df.Close.shift(2)) - 1
df['Target'] = np.where(df.Target > 0,1,0)
or
Code:
df['Target'] = (df.Close.shift(-1) / df.Close.rolling(2).mean()) - 1
df['Target'] = np.where(df.Target > 0,1,0)
I get better accuracy, around 75%.
Why does the classification algorithm work better if we use these two definitions?
Thank you so much



Posted by: Hassam
10-15-2018, 01:38 PM
Forum: Algorithmic Trading With MATLAB
- No Replies

Many traders are now using MATLAB for developing algorithmic trading strategies. If you have read Dr. Ernie Chan's books on quantitative trading, then you know that he uses MATLAB in his quantitative trading strategies. His books Algorithmic Trading: Winning Strategies and Their Rationale and Machine Trading: Deploying Computer Algorithms to Conquer the Markets contain many MATLAB-based quantitative trading strategies. MATLAB is ideally suited for algorithmic trading strategy development. But it is not free like R and Python; MATLAB is commercial software. A few years ago, buying MATLAB was very expensive: you had to pay something like $2K for a license. But last year MathWorks reduced the price drastically, and now you can buy a MATLAB license for something like $97.

MATLAB is a solid product. It is used extensively for advanced research in universities and at many technology companies. MATLAB is solely focused on mathematical, scientific and engineering applications. There are many advantages to using MATLAB now, keeping the reduced price in view. You get solid software that you can use to develop algorithmic trading strategies with machine learning, deep learning and other quantitative methods. MathWorks, the parent company, keeps developing new modules for the product and keeps updating the existing ones, so you don't have to worry about things that can distract you from your trading.

This is something that makes MATLAB superior to Python. Python has problems when you try to install modules. For example, recently I tried very hard to install the Zipline algorithmic trading library. It didn't work. I even installed a fresh Python 3.5 environment, since the Zipline GitHub page says it works on Python 2.7 and 3.5. On 2.7 it worked, but on 3.5 I am getting the error: Can't Import RLOCK. With MATLAB you won't face these types of difficulties. Python is effectively single-threaded due to the GIL, while MATLAB can be multithreaded, so you can use parallel programming to speed up the execution of your MATLAB scripts. The community support is also excellent. If you read books on probability, statistics, Monte Carlo methods and the like, you will find many of them providing MATLAB code. I have started this thread so that we can learn basic MATLAB.



Posted by: Hassam
09-17-2018, 11:31 AM
Forum: Bayesian Statistics
- Replies (1)

Bayesian statistics is very important for traders to learn. In Bayesian statistics, we start with an assumption and then use the data sequentially to improve upon that assumption. Trading is also like that: as data arrives, we improve our prediction of the market direction. More on that in the thread as we progress. I believe every trader should have some understanding of Bayesian statistics. Data arrives sequentially, and we are interested in knowing the bar at which we should enter a trade and the bar at which we should close it. Bayesian statistics is perfectly suited to help us achieve that. At the end of each bar, we can use the Bayes formula to determine whether we have a buy signal, a sell signal or no signal.

Likelihood: I am starting this thread to discuss Bayesian statistics using R. R is a powerful statistical and data science programming language. As we learn Bayesian statistics, we will also learn how to implement Bayesian statistical models in R. Just as in frequentist statistics, in Bayesian statistics the likelihood is very important. The likelihood is the probability of observing the data under the given model assumptions. The likelihood function matters in both Bayesian and non-Bayesian statistics, as it influences the inferences that we draw from the data.

Prior: This is something important for you to understand from the very beginning. In Bayesian statistics, the data are considered fixed and non-random, while the model parameters are considered random and uncertain. This makes sense: once we have the data, there is no uncertainty left in it. Now we need to fit a model to that data using a number of parameters. These parameters are not known at the start of the model-fitting process, which forces us to specify a prior distribution for them. This prior distribution works as an initial guess about the model parameters. Sometimes we use a flat prior, meaning the parameter value is uniformly distributed over an interval and we consider it equally likely anywhere in that interval.

Flat priors, however, are not the best priors. Think of the Bayesian model as an information-processing machine that works sequentially. Weakly informative priors are better when it comes to nudging the Bayesian machine in the right direction. This is another advantage of Bayesian statistics: we can use our expert knowledge in the model to improve our predictions. This is precisely what we traders do. Most of the time we have an opinion about the market, such as its direction. Bayesian statistics allows us to build models that incorporate our opinion as market experts. If our opinion is wrong, the model should be able to filter that opinion out. We will see how to do that as we progress in the thread. So basically priors are assumptions, and like other assumptions they need to be tested.

Posterior: The posterior is what we get when we combine the prior with the likelihood, using the Bayes formula. You can think of the posterior as the update of our belief with the arrival of new data. The Bayes formula lets us flip the probability and calculate the inverse probability. Below is the Bayes formula in simple terms:

Posterior = (Likelihood x Prior) / Average Likelihood

Average Likelihood is also known as Marginal Likelihood.
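
To make the formula concrete, here is a minimal numeric sketch of a single Bayes update (written in Python just for illustration, even though this thread will use R; all the probabilities below are made-up values):

Code:
# Minimal sketch of one Bayes update; every number here is hypothetical.
prior_up = 0.5          # prior P(market goes up on the next bar)
lik_signal_up = 0.7     # P(we see a buy signal | market goes up)
lik_signal_down = 0.4   # P(we see a buy signal | market goes down)

# Average (marginal) likelihood: P(buy signal) over both outcomes
avg_lik = lik_signal_up * prior_up + lik_signal_down * (1 - prior_up)

posterior_up = lik_signal_up * prior_up / avg_lik
print(posterior_up)     # about 0.64

The buy signal lifts the probability of an up move from 0.50 to about 0.64; on the next bar the posterior becomes the new prior and the update repeats.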



Posted by: Bilal
08-09-2018, 06:54 PM
Forum: Quantitative Trading Strategies
- No Replies

Hi

I am new to divergence but I find it simple to use. I have a setup for it, but I would like to find out if there's anything extra I can add to make it more accurate, like entry and exit points and signals for divergence. Here's a screenshot of what it looks like so far:

[Image: screenshot of the divergence setup]



Posted by: behdad
07-26-2018, 07:30 PM
Forum: Machine Learning
- Replies (13)

Predicting stock market direction is an area of interest for many researchers and traders. We can predict candlestick direction from a set of features using the Random Forest algorithm. The key to success here is applying exponential smoothing to the features: exponential smoothing gives the most weight to the most recent observation and exponentially decreasing weights to older observations.
Features we used:

Code:
Relative Strength Index (RSI)
RSI = 100 - 100/(1+RS)
RS = Average Gain Over past 14 days / Average Loss Over past 14 days

RSI is a momentum indicator which determines whether a stock is overbought or oversold.


Code:
Stochastic Oscillator

%K = 100 * (C - L14)/(H14 - L14)

C = Current Closing Price
L14 = Lowest Low over the past 14 days
H14 = Highest High over the past 14 days

The stochastic oscillator tracks the speed, or momentum, of price.



Code:
Williams %R
%R = (H14 - C) / (H14 - L14) * -100

Values are between -100 and 0.


Code:
Moving Average Convergence Divergence
MACD = EMA12(C) - EMA26(C)
Signal Line = EMA9(MACD)

EMAn =  n day Exponential Moving Average


Code:
Price Rate Of Change:
PROC(t) = (C(t) - C(t-n)) / C(t-n)



Code:
On Balance Volume:

If C(t) > C(t-1) => OBV(t) = OBV(t-1) + Vol(t)
If C(t) < C(t-1) => OBV(t) = OBV(t-1) - Vol(t)
If C(t) = C(t-1) => OBV(t) = OBV(t-1)


Code:
Our target (Y) is Sign(Close(t+d) - Close(t))


The above features can be calculated easily in Python with the TA-Lib library.

So we used these features:

RSI(14), MACD(12,26,9), Williams %R(14), %K of Stochastic(14), OBV, PROC
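
As a rough sketch (not from the original post), and assuming an OHLCV pandas DataFrame df with Open, High, Low, Close and Volume columns, these features might be computed with TA-Lib along these lines:

Code:
# Sketch only: `df` and its column names are assumptions, not the poster's data.
import numpy as np
import pandas as pd
import talib

close  = df['Close'].values.astype(float)
high   = df['High'].values.astype(float)
low    = df['Low'].values.astype(float)
volume = df['Volume'].values.astype(float)

features = pd.DataFrame(index=df.index)
features['RSI14'] = talib.RSI(close, timeperiod=14)
macd, signal, hist = talib.MACD(close, fastperiod=12, slowperiod=26, signalperiod=9)
features['MACD'] = macd
features['WILLR14'] = talib.WILLR(high, low, close, timeperiod=14)
fastk, fastd = talib.STOCHF(high, low, close, fastk_period=14,
                            fastd_period=3, fastd_matype=0)
features['STOCH_K14'] = fastk                  # raw %K
features['OBV'] = talib.OBV(close, volume)
features['PROC14'] = talib.ROC(close, timeperiod=14) / 100.0  # talib.ROC is in percent

# Target: sign of the close-to-close change d bars ahead
d = 1
features['Target'] = np.sign(np.roll(close, -d) - close)
features = features.iloc[:-d]                  # last d rows have no future close

Exponential smoothing of the features (for example with pandas' ewm) would then be applied before training the Random Forest, as described above.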

This method has an accuracy of about 90%.



Posted by: behdad
07-20-2018, 04:56 PM
Forum: Machine Learning
- Replies (3)

I made a model and tested it on unseen data. I am using the Random Forest algorithm. The classifier gets the OHLC data of 3 candlesticks and predicts the direction of the 4th candlestick. The evaluation of the model is:

Accuracy : 89.8 %

Classification Report:
    precision    recall  f1-score   support

          0       0.89      0.91      0.90     15987
          1       0.91      0.89      0.90     16099

avg / total       0.90      0.90      0.90     32086



ROC AUC: 89.8 %
Cohen Kappa Score: 79.60 %

Given the imbalance of the dataset, is this a good classifier? When I tested it on a demo account, it gave many false signals; sometimes it gives many signals in the same direction. I really doubt these metrics.
The train dataset is OHLC from 2/6/2017 until 31/10/2017.
The test dataset is OHLC from 1/5/2018 until 1/6/2018.
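
For reference, metrics like these are typically produced with scikit-learn along the following lines (a sketch with placeholder arrays, not the poster's actual data):

Code:
# Sketch only: y_true / y_pred are hypothetical placeholders.
from sklearn.metrics import (accuracy_score, classification_report,
                             roc_auc_score, cohen_kappa_score)

y_true = [0, 1, 1, 0, 1, 0, 1, 0]   # actual candle directions (hypothetical)
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]   # model predictions (hypothetical)

print('Accuracy:', accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred))
print('ROC AUC :', roc_auc_score(y_true, y_pred))   # ideally pass predicted probabilities
print('Kappa   :', cohen_kappa_score(y_true, y_pred))

Note that when roc_auc_score is given hard 0/1 predictions instead of probabilities, it reduces to balanced accuracy, which may be why Accuracy and ROC AUC match here.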



Posted by: Hassam
07-20-2018, 06:20 AM
Forum: Machine Learning
- No Replies

Today we live in the Age of Algorithms. Computer algorithms touch the things we do on a daily basis. But you may be surprised to discover that humans have been using algorithms for thousands of years. In simple terms, algorithms are recipes that tell us, in a step-by-step manner, how to solve a problem. The only difference is that in the past executing an algorithm could be very time-consuming, while today computers can run most algorithms in seconds. Algorithms will play a major role in the 21st century; indeed, the 21st century is being called the Century of Algorithms. I have started this thread for beginners who fear algorithms as if they were something supernatural. Nothing of the sort is true. Algorithms are mathematical solutions to problems that we use to tell computers how to solve those problems in real time.

Did you notice I said algorithms are mathematical solutions to problems? What if the mathematical solution to a problem is wrong? This is precisely the risk we take when we let computer algorithms make important decisions. These algorithms can be wrong and can result in major losses, so always keep in mind that algorithms can make wrong decisions that carry heavy costs. In simple terms, then, algorithms are recipes that show how to solve a problem in a sequence of steps. This is important: an algorithm must be finite and must actually solve the particular problem, so that we can evaluate whether its solution is correct.

Euclid was the first Greek mathematician to develop an algorithm for finding the greatest common divisor (GCD) of two natural numbers. Later on, better algorithms were found that solved the GCD problem, so there is always a race to develop a better algorithm that solves the problem precisely in less time. A modern example is sequencing the human genome. There can be multiple algorithms that solve the same problem, and we need to decide which one is most suitable for our needs.
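
As a small illustrative example (added here, not part of the original post), Euclid's algorithm in Python repeatedly replaces the pair (a, b) with (b, a mod b) until the remainder is zero:

Code:
def gcd(a, b):
    # Euclid's algorithm: gcd(a, b) is unchanged when we replace (a, b)
    # with (b, a % b); when the remainder hits zero, a is the GCD.
    while b:
        a, b = b, a % b
    return a

print(gcd(48, 18))  # prints 6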

The steps required by an algorithm should be precise and well defined so that the computer can carry them out, and the algorithm should be able to handle all cases of the problem. This was a simple introduction to algorithms. Today algorithms are increasingly used in almost all fields: finance, medicine, communications, science, aviation, weapons technology, agriculture, industrial production and so on. As said above, algorithms affect all aspects of our daily life, so we need to know what these algorithms are and how they affect us.



Posted by: behdad
07-16-2018, 02:35 PM
Forum: Fuzzy Logic
- Replies (3)

I made a dataframe of fuzzified candlesticks based on a paper. My dataframe looks like this:

Code:
 Lupper  Lbody Llower           OStyle            CStyle       Var  Color
    0  equal  short  equal  open_equal_high        close_high  larg_inc    red
    1  equal  short  equal  open_equal_high       close_equal   ext_dec  green
    2  equal  equal  equal  open_equal_high  close_equal_high    sm_inc    red
    3  equal  equal  short        open_high  close_equal_high  norm_dec  green
    4  equal  equal  equal  open_equal_high  close_equal_high    sm_dec  green
    5  equal  short  equal   open_equal_low   close_equal_low   ext_dec  green
    6  equal  equal  equal       open_equal       close_equal   ext_dec  green
    7  equal  equal  equal  open_equal_high       close_equal    sm_dec  green
    8  short  short  equal   open_equal_low   close_equal_low    sm_inc    red
    9  short  short  equal   open_equal_low       close_equal   ext_dec  green
The input is 5 candlesticks and the output is the color of the 6th candlestick. The target (label) is Color and the features are Lbody, Lupper, Llower, OStyle and CStyle.
I'm trying to train it with the Keras library with a TensorFlow back end.
My code is:



Code:
# assumes: df is the fuzzified-candlestick DataFrame shown above
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Activation
from keras.utils import to_categorical

# encode the label column as integers
df['Color'].replace('green', 1, inplace=True)
df['Color'].replace('red', 2, inplace=True)
df['Color'].replace('cross', 0, inplace=True)

# one-hot encode the categorical feature columns
cols_to_transform = ['Lupper', 'Lbody', 'Llower', 'OStyle', 'CStyle', 'Var']
df = pd.get_dummies(df, columns=cols_to_transform)

def create_dataset(dataset, look_back=1):
    # build (samples, look_back, features) windows;
    # column 0 is assumed to be the Color label
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 1:29]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)

X, Y = create_dataset(df.values, look_back=5)
Y = to_categorical(Y, num_classes=3)

model = Sequential()
model.add(LSTM(64, input_shape=(5, 28), return_sequences=True))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(LSTM(64, return_sequences=True))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(LSTM(32))
model.add(Dropout(0.2))
model.add(Dense(3))
model.add(Activation('softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])

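The training call isn't shown in the post; presumably it is something along these lines (the epochs, batch size and validation split here are placeholders):

Code:
# hypothetical training call; these hyperparameters are placeholders
model.fit(X, Y, epochs=100, batch_size=32, validation_split=0.2)
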
acc does not improve and is stuck between 40% and 42%. I also tried it with SVM and got the same result. What's the problem with my code/model?



Posted by: Hassam
06-26-2018, 05:24 AM
Forum: Algorithmic Trading With Python
- Replies (1)

Digital signal processing is an important subject taught in electrical engineering degree courses. There are many books that try to apply digital signal processing to financial market prediction and algorithmic trading. I have started this thread to discuss whether we can apply digital signal processing techniques to predicting the financial market. First we read the 1-minute GBPUSD price data and then use the scipy.signal package to determine the peaks/valleys.


Code:
# assumes: df is a pandas DataFrame of 1-minute GBPUSD bars with a 'Close' column
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal
from peakutils.plot import plot as pplot

print('Detect peaks without any filters.')
df2 = np.array(df['Close'])
df2 = df2[-300:]   # last 300 bars
#indexes = scipy.signal.find_peaks_cwt(df2, np.arange(1, 30),\
#    max_distances=np.arange(1, 30)*2)
# negating the series makes find_peaks_cwt return the valleys
indexes = scipy.signal.find_peaks_cwt(-df2, np.arange(1, 20))
indexes = np.array(indexes) - 1
print('Peaks are: %s' % (indexes))
x = np.arange(0, len(df2))
y = df2
plt.figure(figsize=(40, 40))
plt.title("Price")
plt.plot(x, y)
plt.show()
print(x[indexes], y[indexes])
plt.figure(figsize=(20, 20))
pplot(x, y, indexes)
plt.title('Peaks')

# determine the peaks (rather than the valleys) with the unnegated series
indexes = scipy.signal.find_peaks_cwt(df2, np.arange(1, 20))
Below is a screenshot of the valleys identified by the scipy.signal.find_peaks_cwt function, which first applies a wavelet transform and then determines the peaks.
[Image: peak1.png]
These were the valleys. We can also determine the peaks very easily with the above code.
[Image: peak2.png]
Looking at the above two screenshots, you can judge how good the algorithm is at determining the peaks/valleys.
