Simple Optimization in TensorFlow

Actually using TensorFlow to optimize/fit a model is similar to the workflow we outlined in the Basics section, but with a few crucial additions (a minimal sketch follows this list):

  • Create placeholder variables for X and y
  • Define a loss function
  • Select an Optimizer object you want to use
  • Make a train node that uses the Optimizer to minimize the loss
  • Run your Session() to fetch the train node, passing your placeholders X and y with feed_dict
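
Here's a minimal end-to-end sketch of that workflow on toy data; everything in it (the data, the squared-error loss, the learning rate) is illustrative and separate from the iris example below.

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=None)        # placeholder for inputs
y_true = tf.placeholder(tf.float32, shape=None)   # placeholder for targets

w = tf.Variable(0.0, dtype=tf.float32)
loss = tf.reduce_mean(tf.square(w * x - y_true))  # squared-error loss

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train = optimizer.minimize(loss)                  # train node minimizes the loss

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train, feed_dict={x: [1., 2., 3.], y_true: [2., 4., 6.]})
    print(sess.run(w))                            # converges toward 2.0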

Another Iris Example

Assuming you're comfortable with the general intuition behind Logistic Regression, we'll spin up a trivial example to demonstrate setting up the problem in TensorFlow.

from sklearn.datasets import load_iris
import tensorflow as tf
data = load_iris()

X = data.data
X[:5]
array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [ 4.6,  3.1,  1.5,  0.2],
       [ 5. ,  3.6,  1.4,  0.2]])
y = data.target
y = (y == 0).astype(float)  # binary target: 1.0 for setosa, 0.0 for the other two classes
y[48:54]
array([ 1.,  1.,  0.,  0.,  0.,  0.])
print(X.shape, y.shape)
(150, 4) (150,)

The model

We use tf.placeholder() to set up nodes that we'll use to pass in our observations at run time.

x = tf.placeholder(tf.float32, shape=[None, 4])
y_true = tf.placeholder(tf.float32, shape=None)

The weights and bias terms will update in each iteration. We’ll initialize them to zeros and let TensorFlow do the rest.

y_pred applies the sigmoid function to the linear combination of the features with w and b at each step; we also keep the raw, pre-sigmoid value around as y_logits, since the loss function below expects it.

w = tf.Variable([[0, 0, 0, 0]], dtype=tf.float32, name='weights')
b = tf.Variable(0, dtype=tf.float32, name='bias')

y_logits = tf.matmul(w, tf.transpose(x)) + b  # raw linear output (logits)
y_pred = tf.sigmoid(y_logits)                 # predicted probabilities

We need to define a loss function for TensorFlow to evaluate against.

A standard cost function for binary classification is tf.nn.sigmoid_cross_entropy_with_logits, with labels set to your targets and logits set to the raw, pre-sigmoid node in your execution graph (the function applies the sigmoid internally, so we pass y_logits rather than the already-squashed y_pred).

The reduce_mean() gives us the “one over m, times the sum” averaging in our cost function.

loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=y_logits)
loss = tf.reduce_mean(loss)
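
For intuition, the built-in function computes (in a numerically stable way) the same averaged binary cross-entropy you could write out by hand. A rough hand-rolled equivalent, shown purely for illustration, might look like this:

eps = 1e-7                                  # guard against log(0)
p = tf.clip_by_value(y_pred, eps, 1 - eps)
manual_loss = -tf.reduce_mean(y_true * tf.log(p) + (1 - y_true) * tf.log(1 - p))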

Finally, we define an optimization strategy and use it to build a train node.

learning_rate = 0.5
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train = optimizer.minimize(loss)
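
Any other optimizer in tf.train can be swapped in here without changing the rest of the graph. For example, plain gradient descent (the learning rate below is just an illustrative value):

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)  # vanilla gradient descent instead of Adam
train = optimizer.minimize(loss)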

Execute the graph

All told, actually running this model requires initializing the global variables and calling Session.run() to fetch the train node, passing in our training observations through feed_dict.

NUM_STEPS = 25

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for step in range(NUM_STEPS):
        sess.run(train, feed_dict={x: X, y_true: y})  # one optimization step over the full dataset

        if step % 5 == 0:
            print(step, sess.run([w, b]))             # peek at the weights every 5 steps

    print(NUM_STEPS, sess.run([w, b]))                # final weights and bias
0 [array([[-0.49999967, -0.49999914, -0.49999964, -0.49999908]], dtype=float32), -0.49999782]
5 [array([[-1.63306212, -1.6268971 , -1.63869369, -1.6400671 ]], dtype=float32), -1.6299324]
10 [array([[-2.16804147, -2.15893531, -2.17636108, -2.1783905 ]], dtype=float32), -2.1634178]
15 [array([[-2.47912383, -2.4683075 , -2.48900652, -2.49141765]], dtype=float32), -2.4736314]
20 [array([[-2.66975784, -2.65789342, -2.68059874, -2.68324351]], dtype=float32), -2.6637332]
25 [array([[-2.76915216, -2.75674129, -2.78049254, -2.78325939]], dtype=float32), -2.7628503]

Printing the weights, we can see that within a few quick iterations the model is already learning better values for our features to minimize the loss.
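
As a quick sanity check (a hypothetical follow-up, not part of the run above), you could fetch y_pred inside the same session after the training loop, threshold the probabilities at 0.5, and compare against y:

probs = sess.run(y_pred, feed_dict={x: X})        # predicted probabilities, shape (1, 150)
predicted = (probs >= 0.5).astype(float).ravel()  # threshold at 0.5
print('training accuracy:', (predicted == y).mean())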