
Using a Keras LSTM to predict Cryptocurrency prices

September 1st, 2018 crypto-trading

LSTMs, or long short-term memory units, are a type of recurrent neural network that is useful for processing sequential data. They carry a hidden state forward through time, which gives them a form of memory.

If you learn one thing from this post, it is that a long short-term memory network as simple as the one in this article is not that great at predicting something as random as the swings, and oftentimes the machinations, of the cryptocurrency market. Apologies if you thought this might be your ticket to a life of leisure trading in the crypto markets. It does provide an interesting dataset to use with an LSTM, however.

Ethereum Dataset

In the dataset below, I have included hourly ETH/USD prices covering the last several months.

Columns: Date, Symbol, Open, High, Low, Close, Volume From, Volume To

You can click the export button in the top right to download this dataset to follow along.

Dependencies

Let's load our project dependencies. In this tutorial I will be using Keras, scikit-learn, NumPy, and pandas.


import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

Load in the data with Pandas


data = pd.read_csv("ETH_USD.csv")

That's all there is to loading our dataset into memory. Pandas makes quick work of small datasets like this one. Now we have some tidying up to do. First, we are going to make sure our dataset is ready for time series work; I have written a separate blog post detailing the preparations required for time series forecasting.
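
Before preparing the data, it is worth a quick sanity check on what pandas loaded. A minimal sketch; the date format string is an assumption based on timestamps like "2017-11-12 05-AM" and may need adjusting for your export:

# Quick look at what was loaded
print(data.shape)
print(data.dtypes)

# Optional: parse the timestamps (format assumed from values like "2017-11-12 05-AM")
print(pd.to_datetime(data['Date'], format='%Y-%m-%d %I-%p').head())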

Prepare the data

We need to flip the dataset into chronological order and add a time-shifted column. We will be predicting prices one hour into the future.


data = data.iloc[::-1]

data['EthBtcFuture'] = data['Close']
data['EthBtcFuture'] = data['EthBtcFuture'].shift(-1)
data.head()

# Results of the .head() method

                  Date  Symbol    Open    High     Low   Close  Volume From  \
6388  2017-09-22 07-PM  ETHUSD  729.73  755.00  604.19  639.99     52863.64   
6387  2017-11-12 05-AM  ETHUSD  312.34  316.34  308.00  310.16      3263.16   
6386  2017-11-12 06-AM  ETHUSD  310.16  310.16  301.12  307.00      3463.15   
6385  2017-11-12 07-AM  ETHUSD  307.00  314.37  301.37  307.16      6681.07   
6384  2017-11-12 08-AM  ETHUSD  307.16  310.00  302.48  304.79      2152.84   

        Volume To  EthBtcFuture  
6388  35707764.33        310.16  
6387   1017470.49        307.00  
6386   1056163.67        307.16  
6385   2066540.88        304.79  
6384    656841.26        302.40  
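
A quick way to convince yourself the shift lines up correctly: every row's EthBtcFuture value should match the Close of the row that follows it (the next hour). A minimal check:

# Each row's EthBtcFuture should equal the next row's Close; should print True
print((data['EthBtcFuture'].iloc[:-1].values == data['Close'].iloc[1:].values).all())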

Simplifying the dataset

Let's remove the unnecessary features to simplify our model for training.



if 'Symbol' in data.columns:
    data = data.drop('Symbol', axis=1)
if 'Date' in data.columns:
    data = data.drop('Date', axis=1)

data = data.dropna()

I have also gone ahead and dropped the NaN row created by shifting the future price column.

Scaling our data

Next we will preprocess our data and pick our target feature.


y = data['EthBtcFuture']
X = data.drop('EthBtcFuture', axis=1)

scaler = StandardScaler()
X = scaler.fit_transform(X)

In this case, we are interested in predicting the future Ethereum price, so we set that as our y variable. It is generally not necessary to scale your target values the way you scale your training features. Leaving the target unscaled also saves us the headache of inverse transforming our results when predicting.
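
For reference, if you did want to scale the target as well, it would take a second scaler and an inverse transform at prediction time. A minimal sketch; y_scaler is just an illustrative name:

# Hypothetical alternative: scaling the target with its own scaler
y_scaler = StandardScaler()
y_scaled = y_scaler.fit_transform(y.values.reshape(-1, 1))

# You would then train on y_scaled and undo the scaling on predictions:
# predictions = y_scaler.inverse_transform(model.predict(X_test))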

Splitting the dataset


# X is already a NumPy array after scaling, so only y needs converting
y = y.values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, shuffle=False)

Here we use scikit-learn's train_test_split method to hold out the last 15% of the data as a test set; shuffle=False keeps the rows in chronological order.
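
Because the rows stay in order, the test set is simply the most recent 15% of hours, which is the right way to evaluate a time series model. A quick look at the resulting shapes:

# Roughly 85% of rows for training, the most recent 15% for testing
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)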

3D input layer

LSTMs require 3D input, shaped as samples, time steps, and features.


X_train = X_train.reshape((X_train.shape[0], 1, X_train.shape[1]))
X_test = X_test.reshape((X_test.shape[0], 1, X_test.shape[1]))
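
Since each sample here is a single hour, we use one time step and let the price and volume columns be the features; printing the shapes confirms the (samples, time steps, features) layout:

print(X_train.shape)  # (number of training rows, 1, number of features)
print(X_test.shape)   # (number of test rows, 1, number of features)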

Setting up Keras

This is a simple example, and I have only set up 100 iterations of training for this network. In my tests, MAE significantly outperforms MSE as the loss function.


model = Sequential()
batch_size = 1

model.add(LSTM(45, return_sequences=False, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')

for i in range(100):
    model.fit(X_train, y_train, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
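
Because this model is not stateful, looping over single epochs like this behaves the same as one call with epochs=100; the loop only matters if you want to do something between epochs, such as calling model.reset_states() on a stateful LSTM. The equivalent single call would be:

# Equivalent for this non-stateful model
model.fit(X_train, y_train, epochs=100, batch_size=batch_size, verbose=2, shuffle=False)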

Training


Using TensorFlow backend.
Epoch 1/1
 - 14s - loss: 616.2416
Epoch 1/1
 - 13s - loss: 501.9811
Epoch 1/1
 - 13s - loss: 392.9617
Epoch 1/1
 - 13s - loss: 287.9550
Epoch 1/1
 - 13s - loss: 210.8598
Epoch 1/1
 - 13s - loss: 165.0787
Epoch 1/1
 - 13s - loss: 130.2502
Epoch 1/1
.
.
.
Epoch 1/1
 - 14s - loss: 4.0201
Epoch 1/1
 - 14s - loss: 3.9662
Epoch 1/1
 - 14s - loss: 3.8949
Epoch 1/1
 - 14s - loss: 3.8206
Epoch 1/1
 - 14s - loss: 3.7651
Epoch 1/1
 - 14s - loss: 3.6901
Epoch 1/1
 - 14s - loss: 3.6118
Epoch 1/1
 - 14s - loss: 3.5896

As you can see, it makes decent progress with just 100 iterations.

Evaluating the model


test_error_rate = model.evaluate(X_test, y_test, verbose=2)

#8.43927387400638

As you can see, the model overfit the training data. Since MAE is reported in the same units as the target (here, USD), you can see our predictions are off by more than would be profitable to trade, hence the warning at the beginning of the tutorial.
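
To put that error in context, you can generate predictions on the held-out hours and compare them to the actual closing prices. A minimal sketch:

predictions = model.predict(X_test)

# Mean absolute error in USD, computed by hand; it should match model.evaluate above
print(np.mean(np.abs(predictions.flatten() - y_test)))

# Peek at the first few predicted vs. actual prices
for predicted, actual in zip(predictions[:5].flatten(), y_test[:5]):
    print(round(float(predicted), 2), actual)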

Multi layer LSTM

If you are interested in adding more than one layer when testing out an LSTM, the Keras syntax is as follows:


# return_sequences=True is required on every LSTM layer except the last
model.add(LSTM(100, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(LSTM(300))
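
Put together, a complete stacked version of the model above would look something like this (the layer sizes are just starting points to experiment with):

model = Sequential()
# Every LSTM layer except the last must return the full sequence
model.add(LSTM(100, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(LSTM(300))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')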

Conclusion

I think these numbers could be significantly improved by tweaking the batch size, the number of neurons in the LSTM layer, the number of LSTM layers, and the number of training iterations. You may also be interested in modifying the data before preprocessing; one idea would be taking the log of the volume features, as sketched below. Ideally, you would also add more predictive features to the model.
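
For example, taking the log of the heavy-tailed volume columns is a one-liner with NumPy and would go before the scaling step; log1p is used here to sidestep any zero-volume hours:

# Compress the volume features before scaling
data['Volume From'] = np.log1p(data['Volume From'])
data['Volume To'] = np.log1p(data['Volume To'])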
