Using a Keras LSTM to predict Cryptocurrency prices
LSTMs, or long short-term memory networks, are a type of recurrent neural network that is useful for processing sequential data. They carry their hidden state forward through time, and so have a form of memory.
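As a quick illustration of that contract (a minimal sketch with made-up shapes, separate from the model we build below), a Keras LSTM consumes batches shaped (samples, time steps, features) and emits one output vector per sample:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

# 8 samples, each a sequence of 5 time steps with 3 features per step
dummy = np.random.rand(8, 5, 3)

model = Sequential()
model.add(LSTM(16, input_shape=(5, 3)))  # 16 hidden units
print(model.predict(dummy).shape)        # (8, 16): one output vector per sample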
If you learn one thing from this post, it is that a long short-term memory network as simple as the one in this article is not that great at predicting something as random as the swings, and often the machinations, of the cryptocurrency market. Apologies if you thought this might be your ticket to a life of leisure trading in the crypto markets. It does provide an interesting dataset to use with an LSTM, however.
In the dataset below, I have included hourly USD to Ethereum prices starting in late 2017.
Date | Symbol | Open | High | Low | Close | Volume From | Volume To
You can download this dataset to follow along.
Let's load our project dependencies. In this tutorial I will be using Keras, scikit-learn, NumPy, and pandas.
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
Load in the data with Pandas
data = pd.read_csv("ETH_USD.csv")
That's all there is to loading our dataset into memory. pandas makes quick work of small datasets like this one. Now we have some tidying up to do. The first thing is to make sure our dataset is ready for time series modeling. I have written a blog post detailing the preparations required for time series forecasting.
Prepare the data
We need to flip the dataset so it runs oldest to newest and add a time-shifted column. We will be predicting prices one hour into the future.
data = data.iloc[::-1]  # reverse so the rows run oldest to newest
data['EthBtcFuture'] = data['Close'].shift(-1)  # next hour's closing price
data.head()

# Results of the head() method
                  Date  Symbol    Open    High     Low   Close  Volume From  \
6388  2017-09-22 07-PM  ETHUSD  729.73  755.00  604.19  639.99     52863.64
6387  2017-11-12 05-AM  ETHUSD  312.34  316.34  308.00  310.16      3263.16
6386  2017-11-12 06-AM  ETHUSD  310.16  310.16  301.12  307.00      3463.15
6385  2017-11-12 07-AM  ETHUSD  307.00  314.37  301.37  307.16      6681.07
6384  2017-11-12 08-AM  ETHUSD  307.16  310.00  302.48  304.79      2152.84

        Volume To  EthBtcFuture
6388  35707764.33        310.16
6387   1017470.49        307.00
6386   1056163.67        307.16
6385   2066540.88        304.79
6384    656841.26        302.40
Simplifying the dataset
Let's remove the unnecessary features to simplify our model for training.
if 'Symbol' in data.columns:
    data = data.drop('Symbol', axis=1)
if 'Date' in data.columns:
    data = data.drop('Date', axis=1)
data = data.dropna()
I have also gone ahead and dropped the NaN value created from shifting our time series column back.
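If you want to see that NaN for yourself, here is a quick sanity check (my addition; run it before the dropna() call above):

data['EthBtcFuture'].tail()        # the final row has no next hour to pull from
data['EthBtcFuture'].isna().sum()  # 1 before dropna(), 0 afterwards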
Scaling our data
Next we will preprocess our data and pick our target feature.
y = data['EthBtcFuture']
X = data.drop('EthBtcFuture', axis=1)

scaler = StandardScaler()
X = scaler.fit_transform(X)  # fit_transform returns a NumPy array
In this case, we are interested in predicting the future Ethereum price, so we set that as our y variable. It is generally not necessary to scale your target values the way you scale your training features. Leaving these values unscaled also saves us the headache of inverse transforming our results when predicting.
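For completeness, if you did want to scale the target as well, the sketch below shows what that would look like; the second scaler (y_scaler) is my own addition, not part of this tutorial's pipeline:

# Hypothetical alternative: scale y too, then invert every prediction
y_scaler = StandardScaler()
y_scaled = y_scaler.fit_transform(y.values.reshape(-1, 1))  # scalers expect 2D input

# You would train on y_scaled, and predictions would come back in scaled
# units, so they must be mapped back to dollars before they mean anything:
# predictions_usd = y_scaler.inverse_transform(model.predict(X_test))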
Splitting the dataset
y = y.values  # X is already a NumPy array after fit_transform
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, shuffle=False)
Here we use scikit-learn's train_test_split method and hold out the last 15% of the dataset as our validation set. Passing shuffle=False keeps the rows in chronological order, which matters for time series data.
3D input layer
LSTMs require a 3D input shape broken up into samples, time steps, and features.
X_train = X_train.reshape((X_train.shape[0], 1, X_train.shape[1]))
X_test = X_test.reshape((X_test.shape[0], 1, X_test.shape[1]))
Setting up Keras
This is a simple example, and I have only set up 100 iterations of training for this network. In my tests, MAE significantly outperforms MSE as the loss function.
model = Sequential()
batch_size = 1

model.add(LSTM(45, return_sequences=False, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')

for i in range(100):
    model.fit(X_train, y_train, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
Using TensorFlow backend.
Epoch 1/1
 - 14s - loss: 616.2416
Epoch 1/1
 - 13s - loss: 501.9811
Epoch 1/1
 - 13s - loss: 392.9617
Epoch 1/1
 - 13s - loss: 287.9550
Epoch 1/1
 - 13s - loss: 210.8598
Epoch 1/1
 - 13s - loss: 165.0787
Epoch 1/1
 - 13s - loss: 130.2502
...
Epoch 1/1
 - 14s - loss: 4.0201
Epoch 1/1
 - 14s - loss: 3.9662
Epoch 1/1
 - 14s - loss: 3.8949
Epoch 1/1
 - 14s - loss: 3.8206
Epoch 1/1
 - 14s - loss: 3.7651
Epoch 1/1
 - 14s - loss: 3.6901
Epoch 1/1
 - 14s - loss: 3.6118
Epoch 1/1
 - 14s - loss: 3.5896
As you can see, it makes decent progress with just 100 iterations.
Evaluating the model
test_error_rate = model.evaluate(X_test, y_test, verbose=2)
print(test_error_rate)  # 8.43927387400638
As you can see, the model overfit the training data: the final training loss was around 3.6, while the test loss is about 8.4. Since MAE returns the error in our original units (dollars), our predictions are off by a larger amount than would be profitable to trade, hence the warning at the beginning of the tutorial.
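To get a feel for where that error comes from, it can help to eyeball a few predictions next to the actual prices. This inspection snippet is my addition rather than part of the original walkthrough:

preds = model.predict(X_test)

# Compare the first few predicted closes against the real next-hour closes
for predicted, actual in zip(preds[:5].flatten(), y_test[:5]):
    print("predicted: %.2f  actual: %.2f" % (predicted, actual))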
Multi-layer LSTM
If you are interested in testing out an LSTM with more than one layer, the syntax in Keras is as follows:
# return_sequences=True is required on every LSTM layer except the last,
# so each layer passes its full sequence of outputs to the next
model.add(LSTM(100, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(LSTM(300))
I think these numbers could be significantly improved by tweaking the batch size, the number of neurons in the LSTM layer, the number of LSTM layers, and the training iterations. You may also be interested in modifying the data before preprocessing; one idea would be taking the log of the volume features, as sketched below. Ideally, you would also have additional predictive features to add to the model.
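As a starting point for that last idea, here is a sketch of the log transform, applied before the scaling step. Using np.log1p rather than np.log is my choice here, since log1p(x) = log(1 + x) stays defined for hours with zero volume:

# Compress the heavy-tailed volume columns before scaling
for col in ['Volume From', 'Volume To']:
    data[col] = np.log1p(data[col])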