TensorFlow in a Nutshell — Part One: Basics

The fast and easy guide to the most popular Deep Learning framework in the world.

TensorFlow is a framework created by Google for building Deep Learning models. Deep Learning is a category of machine learning models that use multi-layer neural networks. The idea of deep learning has been around since 1943, when neurophysiologist Warren McCulloch and mathematician Walter Pitts wrote a paper on how neurons might work and modeled a simple neural network using electrical circuits.

Many, many developments have occurred since then. These highly accurate mathematical models are extremely computationally expensive, but with recent advances in GPU processing power and ever-increasing CPU power, Deep Learning has been exploding in popularity.

TensorFlow was created with processing power limitations in mind. Open sourced in November 2015, the library can be run on computers of all kinds, including smartphones, and it makes it straightforward to take a trained model into production. At the time of writing this article it is the number 1 Deep Learning framework.

Created by François Chollet (@fchollet on Twitter)

Basic Computational Graph

Everything in TensorFlow is based on creating a computational graph. If you’ve ever used Theano then this section will look familiar. Think of a computational graph as a network of nodes, where each node, known as an operation, runs some function that can be as simple as addition or subtraction or as complex as some multivariate equation.

An Operation, also referred to as an op, can return zero or more tensors, which can be used later on in the graph. For example, here are a few operations and their outputs.
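
A minimal sketch (op names from the 0.x-era API used throughout this article; the values are just illustrative):

import tensorflow as tf

a = tf.constant(3)                     # a constant op returns one tensor
b = tf.constant(4)
added = tf.add(a, b)                   # returns one tensor holding 7
multiplied = tf.mul(a, b)              # returns one tensor holding 12
grouped = tf.group(added, multiplied)  # returns zero tensors, just an op to run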

Each operation can be handed a constant, array, matrix or n-dimensional matrix. Another word for an n-dimensional matrix is a tensor; a 2-dimensional tensor is equivalent to an m x n matrix.

Our computational graph
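
A minimal sketch of such a graph (the matrix values here are just placeholders):

import tensorflow as tf

# two constant tensors: a 1x2 and a 2x1 matrix
matrix1 = tf.constant([[3., 3.]])
matrix2 = tf.constant([[2.], [2.]])

# a matmul op that multiplies them
product = tf.matmul(matrix1, matrix2)

# launch the graph in a session and fetch the result
sess = tf.Session()
print(sess.run(product))  # [[ 12.]]
sess.close()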

The code above creates two constant tensors, multiplies them together and outputs the result. This is a trivial example that demonstrates how you can create a graph and run a session. All inputs needed by an op are run automatically, and they’re typically run in parallel. This session run actually causes the execution of three operations in the graph: creating the two constants, then the matrix multiplication.

Graph

The constants and operation that we created above were automagically added to the graph in TensorFlow. The default graph is instantiated when the library is imported. Creating a Graph object instead of using the default graph is useful when creating multiple models in one file that do not depend on each other.

new_graph = tf.Graph()
with new_graph.as_default():
    new_g_const = tf.constant([1., 2.])

Any variables or operations used outside of the with new_graph.as_default() block will be added to the default graph that is created when the library is loaded. You can even get a handle to the default graph with

default_g = tf.get_default_graph()

For most cases it’s best to stick with the default graph.

Session

There are two kinds of Session objects in TensorFlow:

tf.Session()

This encapsulates the environment in which operations are executed and tensors are evaluated. Sessions can have their own variables, queues and readers allocated, so it’s important to use the close() method when the session is over. There are three arguments for a Session, all of which are optional.

  1. target — The execution engine to connect to.
  2. graph — The Graph to be launched.
  3. config — A ConfigProto protocol buffer with configuration options for the session

To run one “step” of the TensorFlow computation, you call the session’s run() method, and all of the necessary dependencies for the graph to execute are run.
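
A minimal sketch of the usual pattern (the ops here are just placeholders):

import tensorflow as tf

a = tf.constant(5)
b = tf.constant(6)
product = tf.mul(a, b)

sess = tf.Session()       # optionally pass graph=... or config=tf.ConfigProto(...)
print(sess.run(product))  # 30, running product and everything it depends on
sess.close()              # free the session's variables, queues and readers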

tf.InteractiveSession()

This is the exact same as tf.Session() but is targeted at IPython and Jupyter Notebooks. It installs itself as the default session, which lets you use Tensor.eval() and Operation.run() instead of having to call Session.run() every time you want something computed.

sess = tf.InteractiveSession()
a = tf.constant(1)
b = tf.constant(2)
c = a + b
# instead of sess.run(c)
c.eval()

InteractiveSession means you don’t have to explicitly pass a Session object.

Variables

Variables in TensorFlow are managed by the Session. They persist across multiple runs of the graph, which is useful because Tensor and Operation objects are immutable. Variables can be created with tf.Variable().

tensorflow_var = tf.Variable(1, name="my_variable")

Most of the time you will want to create these variables as tensors of zeros, ones or random values:

  • tf.zeros() — creates a matrix full of zeros
  • tf.ones() — creates a matrix full of ones
  • tf.random_normal() — a matrix of normally distributed random values
  • tf.random_uniform() — a matrix of random values drawn uniformly from an interval
  • tf.truncated_normal() — same as random normal but doesn’t include any values more than two standard deviations from the mean.

These functions take an initial shape parameter that defines the dimensions of the matrix. For example:

# 4x4x4 matrix, normally distributed, mean 0, std 1
normal = tf.truncated_normal([4, 4, 4], mean=0.0, stddev=1.0)

To have your variable set to one of these matrix helper functions:

normal_var = tf.Variable(tf.truncated_normal([4, 4, 4], mean=0.0, stddev=1.0))

To have these variables initialized you must use TensorFlow’s variable initialization function and run it in the session. Each new session needs to run this initialization op before the variables can be used.

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

If you’d like to completely change the value of a variable you can use the Variable.assign() operation; this must be run in a session to update the value.

initial_var = tf.Variable(1)
# each run of this op doubles the variable's current value
changed_var = initial_var.assign(initial_var + initial_var)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
sess.run(changed_var)
# 2
sess.run(changed_var)
# 4
sess.run(changed_var)
# 8
# .... and so on

Sometimes you’d like to add a counter inside your model; this is where you can use the Variable.assign_add() method, which takes a numeric parameter and increments the variable by it. Similarly, there is Variable.assign_sub().

counter = tf.Variable(0)
sess.run(tf.initialize_all_variables())  # the new variable needs initializing too
sess.run(counter.assign_add(1))
# 1
sess.run(counter.assign_sub(1))
# 0

Scope

To control the complexity of models and make them easier to break down into individual pieces, TensorFlow has scopes. Scopes are very simple and even help break down your model when using TensorBoard (which will be covered in Part 2). Scopes can even be nested inside other scopes.

with tf.name_scope("Scope1"):
    with tf.name_scope("Scope_nested"):
        nested_var = tf.mul(5, 5)
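
The op created inside those scopes picks up the scope names as a prefix, something along these lines:

print(nested_var.name)
# Scope1/Scope_nested/Mul:0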

Scopes may not seem that powerful right now, but used in combination with TensorBoard they’re very useful.

Conclusion

I’ve demonstrated many of the building blocks that TensorFlow offers. These individual pieces added together can create very complicated models. There is much more to TensorFlow; if there are any requests for features in upcoming parts, let me know.

Recurrent Neural Networks for Beginners

What are Recurrent Neural Networks and how can you use them?

In this post I discuss the basics of Recurrent Neural Networks (RNNs) which are deep learning models that are becoming increasingly popular. I don’t intend to get too heavily into the math and proofs behind why these work and am aiming for a more abstract understanding.

General Recurrent Neural Network information

Recurrent Neural Networks were created in the 1980s but have only recently been gaining popularity, thanks to advances in network design and the increased computational power of graphics processing units. They’re especially useful with sequential data because each neuron or unit can use its internal memory to maintain information about the previous input. This is great because in the case of language, “I had washed my house” means something quite different from “I had my house washed”. This allows the network to gain a deeper understanding of the statement.

This is important to note because, even as a human reading through a sentence, you’re picking up the context of each word from the words before it.

A rolled-up RNN

An RNN has loops in it that allow information to be carried across steps while reading in input.

An unrolled RNN

In these diagrams x_t is some input, A is a part of the RNN and h_t is the output. Essentially you can feed in words from a sentence or even characters from a string as x_t, and the RNN will come up with an h_t.
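
Conceptually, one step of a plain RNN just mixes the current input x_t with the previous output; a rough numpy sketch with made-up weight names (not any particular library’s API):

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b):
    # new hidden state / output: combine the current input with the previous state
    return np.tanh(np.dot(W_xh, x_t) + np.dot(W_hh, h_prev) + b)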

The goal is to use h_t as output and compare it to your test data (which is usually a small subset of the original data). You will then get your error rate. After comparing your output to your test data, with error rate in hand, you can use a technique called Back Propagation Through Time (BPTT). BPTT works back through the network and adjusts the weights based on your error rate. This adjusts the network and makes it learn to do better.

Theoretically RNNs can handle context from the beginning of the sentence, which allows more accurate predictions of a word at the end of the sentence. In practice this isn’t necessarily true for vanilla RNNs. This is a major reason why RNNs faded from practice for a while, until some great results were achieved by using a Long Short Term Memory (LSTM) unit inside the neural network. Adding the LSTM to the network is like adding a memory unit that can remember context from the very beginning of the input.


These little memory units allow RNNs to be much more accurate, and have been the recent cause of the popularity around this model. These memory units allow context to be remembered across inputs. Two of these units are widely used today, LSTMs and Gated Recurrent Units (GRUs), the latter of which is more computationally efficient because it takes up less computer memory.
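
In the 0.x-era TensorFlow API this article links to, both kinds of unit are available as ready-made cells; a rough sketch (the hidden size is just an illustrative choice):

import tensorflow as tf

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=128)
gru_cell = tf.nn.rnn_cell.GRUCell(num_units=128)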

Applications of Recurrent Neural Networks

There are many different applications of RNNs. A great application is in collaboration with Natural Language Processing (NLP). RNNs have been demonstrated by many people on the internet who created amazing models that can represent a language. These language models can take an input such as a large set of Shakespeare’s poems, and after training they can generate their own Shakespearean poems that are very hard to differentiate from the originals!

Below is some Shakespeare

PANDARUS:
Alas, I think he shall be come approached and the day
When little srain would be attain'd into being never fed,
And who is but a chain and subjects of his death,
I should not sleep.

Second Senator:
They are away this miseries, produced upon my soul,
Breaking and strongly should be buried, when I perish
The earth and thoughts of many states.

DUKE VINCENTIO:
Well, your wit is in the care of side and that.

Second Lord:
They would be ruled after this chamber, and
my fair nues begun out of the fact, to be conveyed,
Whose noble souls I'll have the heart of the wars.

Clown:
Come, sir, I will make did behold your worship.

VIOLA:
I'll drink it.

This poem was actually written by an RNN. It is from an awesome article, http://karpathy.github.io/2015/05/21/rnn-effectiveness/, that goes more in depth on char-RNNs.

This particular type of RNN is fed a dataset of text and reads the input in character by character. The amazing thing about these networks, in comparison to feeding in a word at a time, is that the network can create its own unique words that were not in the vocabulary you trained it on.

A char-RNN predicting “hello” one character at a time

This diagram, taken from the article referenced above, shows how the model would predict “hello”. This gives a good visualization of how these networks take in a word character by character and predict the likelihood of the next probable character.

Another amazing application of RNNs is machine translation. This method is interesting because it involves training two RNNs simultaneously. In these networks the inputs are pairs of sentences in different languages. For example, you can feed the network an English sentence paired with its French translation. With enough training you can give the network an English sentence and it will translate it to French! This model is called a Sequence to Sequence (seq2seq) model or an Encoder-Decoder model.

An encoder-decoder neural machine translation system

This diagram shows how information flows through the Encoder-Decoder model, which here uses a word embedding layer to get a better word representation. A word embedding layer is usually built with the GloVe or Word2Vec algorithm, which takes a bunch of words and creates a weighted matrix that allows similar words to be correlated with each other. Using an embedding layer generally makes your RNN more accurate because it is a better representation of how similar words are, so the net has less to infer.
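
In TensorFlow an embedding layer boils down to a lookup into a trainable matrix; a rough sketch with made-up sizes and word indices:

import tensorflow as tf

vocab_size, embed_dim = 10000, 128    # illustrative sizes
embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_dim], -1.0, 1.0))

word_ids = tf.constant([42, 7, 1337])                          # hypothetical word indices
word_vectors = tf.nn.embedding_lookup(embeddings, word_ids)    # shape [3, 128]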

Conclusion

Recurrent Neural Networks have become very popular recently, and for very good reason. They’re one of the most effective models out there for natural language processing. New applications of these models are coming out all the time and it’s exciting to see what researchers come up with.

To play around with some RNNs, check out these awesome libraries:

TensorFlow — Google’s machine learning framework; RNN example: https://www.tensorflow.org/versions/r0.10/tutorials/recurrent/index.html

Keras — a high-level machine learning package that runs on top of TensorFlow or Theano: https://keras.io

Torch — a machine learning framework in Lua, used heavily at Facebook: http://torch.ch