{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Gradient\n", "\n", "* The vector that collects the partial derivatives with respect to all variables\n", "* The negative gradient points in the direction in which the function value decreases fastest at each point" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def numerical_gradient(f, x):\n", "    \"\"\"\n", "    Compute the gradient numerically (central difference)\n", "    \n", "    Parameters\n", "    ----------\n", "    f : function\n", "        Function of a single argument whose gradient is wanted\n", "    x : np.array\n", "        Array of values passed to f as that argument\n", "    \"\"\"\n", "    h = 1e-4 # 0.0001\n", "    grad = np.zeros_like(x) # zero-filled array with the same length and shape as x\n", "    \n", "    for idx in range(x.size):\n", "        tmp_val = float(x[idx]) # save the original value\n", "        # compute f(x+h)\n", "        x[idx] = tmp_val + h\n", "        fxh1 = f(x)\n", "        \n", "        # compute f(x-h)\n", "        x[idx] = tmp_val - h\n", "        fxh2 = f(x)\n", "        \n", "        grad[idx] = (fxh1 - fxh2) / (2*h)\n", "        x[idx] = tmp_val # restore the original value\n", "    \n", "    return grad" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def function_2(x):\n", "    return x[0]**2 + x[1]**2" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([6., 8.])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numerical_gradient(function_2, np.array([3.0, 4.0]))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0., 4.])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numerical_gradient(function_2, np.array([0.0, 2.0]))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([6., 0.])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numerical_gradient(function_2, np.array([3.0, 0.0]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing the gradient" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def 
numerical_gradient_batch(f, X):\n", "    if X.ndim == 1:\n", "        return numerical_gradient(f, X)\n", "    else:\n", "        grad = np.zeros_like(X)\n", "        \n", "        for idx, x in enumerate(X):\n", "            grad[idx] = numerical_gradient(f, x)\n", "        \n", "        return grad" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "x0 = np.arange(-2, 2.5, 0.25)\n", "x1 = np.arange(-2, 2.5, 0.25)\n", "X, Y = np.meshgrid(x0, x1)\n", "\n", "X = X.flatten()\n", "Y = Y.flatten()\n", "\n", "grad = numerical_gradient_batch(function_2, np.array([X, Y]).T).T\n", "\n", "plt.figure()\n", "plt.quiver(X, Y, -grad[0], -grad[1], angles=\"xy\", color=\"#666666\")\n", "plt.xlim([-2, 2])\n", "plt.ylim([-2, 2])\n", "plt.xlabel('x0')\n", "plt.ylabel('x1')\n", "plt.grid()\n", "plt.draw()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gradient method\n", "\n", "* In machine learning, training is a search for the optimal parameters\n", "* In a neural network, the parameters searched for during training are the weights and biases\n", "* The optimal parameters of a neural network are those at which **the loss function takes its minimum value**\n", "* The gradient method uses the gradient to search for the minimum of a function\n", "    * In practice, for a complicated function the direction indicated by the gradient does not necessarily lead to the minimum\n", "    * A point where the gradient is zero may be the minimum, a local minimum (the smallest value only within a limited range), or a saddle point\n", "    * Training can also stall on a \"plateau\", a flat region where the values hardly change\n", "* Searching for a minimum is called the **gradient descent method**\n", "* Searching for a maximum is called the **gradient ascent method**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The update rule of the gradient method (gradient descent) is given below, where\n", "\n", "* $\\eta$ (eta) is the size of the update\n", "* $\\frac{\\partial f}{\\partial x_0}$ and $\\frac{\\partial f}{\\partial x_1}$ are the partial derivatives of the function $f(x)$\n", "\n", "$$\n", " x_0 = x_0 - \\eta\\frac{\\partial f}{\\partial x_0}\n", "$$\n", "$$\n", " x_1 = x_1 - \\eta\\frac{\\partial f}{\\partial x_1}\n", "$$\n", "\n", "* In neural network training, the update size $\\eta$ is called the **learning rate**\n", "* The learning rate must be chosen in advance\n", "    * Neither too large nor too small a value works well\n", "    * Try different values and check each time whether training is actually progressing\n", "* Parameters such as the learning rate, which are set by hand rather than learned, are called **hyperparameters**" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def gradient_descent(f, init_x, lr=0.01, step_num=100):\n", "    \"\"\"\n", "    Find a local minimum (or the minimum) of a function with the gradient method\n", "    \n", "    \n", "    Parameters\n", "    ----------\n", "    f : function\n", "        Function of a single argument to be minimized\n", "    
init_x : np.array\n", "        Initial value of the argument\n", "    lr : float\n", "        Learning rate\n", "    step_num : int\n", "        Number of gradient-method iterations\n", "    \"\"\"\n", "    x = init_x.copy() # work on a copy so the caller's init_x is not modified in place\n", "    \n", "    for i in range(step_num):\n", "        grad = numerical_gradient_batch(f, x)\n", "        x -= lr * grad\n", "    \n", "    return x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find the minimum of $f(x_0, x_1) = x_0^2 + x_1^2$ with the gradient method:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def function_2(x): # the minimum of this function is 0\n", "    return x[0]**2 + x[1]**2" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-6.11110793e-10, 8.14814391e-10])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "init_x = np.array([-3.0, 4.0])\n", "gradient_descent(function_2, init_x, lr=0.1, step_num=100)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# What happens when the learning rate is too large? The updates overshoot and the values diverge\n", "gradient_descent(function_2, init_x, lr=10.0, step_num=100)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# What happens when the learning rate is too small? The values barely move from the initial point\n", "gradient_descent(function_2, init_x, lr=1e-10, step_num=100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gradient for a neural network" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "import sys, os\n", "sys.path.append(os.path.abspath(os.path.join('..', 'sample')))\n", "import numpy as np\n", "from common.functions import softmax, cross_entropy_error\n", "from common.gradient import numerical_gradient\n", "\n", "class simpleNet:\n", "    \"\"\"\n", "    Simple neural network for training with the gradient method\n", "    
\n", "    Attributes\n", "    ----------\n", "    W : np.array\n", "        2 × 3 weight parameters\n", "    \"\"\"\n", "    def __init__(self):\n", "        self.W = np.random.randn(2,3)\n", "\n", "    def predict(self, x):\n", "        \"\"\"\n", "        Run a prediction\n", "        \n", "        Parameters\n", "        ----------\n", "        x : np.array\n", "            Input data\n", "        \"\"\"\n", "        return np.dot(x, self.W)\n", "\n", "    def loss(self, x, t):\n", "        \"\"\"\n", "        Compute the value of the loss function\n", "        \n", "        Parameters\n", "        ----------\n", "        x : np.array\n", "            Input data\n", "        t : np.array\n", "            Correct label\n", "        \"\"\"\n", "        z = self.predict(x)\n", "        y = softmax(z)\n", "        loss = cross_entropy_error(y, t)\n", "\n", "        return loss" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0.24055208 -0.08417877 0.10270625]\n", " [ 0.57949381 0.17237454 0.73158598]]\n" ] } ], "source": [ "net = simpleNet()\n", "print(net.W)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.66587567 0.10462982 0.72005113]\n" ] } ], "source": [ "x = np.array([0.6, 0.9])\n", "p = net.predict(x)\n", "print(p)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.argmax(p) # index of the maximum value" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9113499186284006" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t = np.array([0, 0, 1]) # correct label\n", "net.loss(x, t)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0.22846974 0.1303415 -0.35881124]\n", " [ 0.34270461 0.19551225 -0.53821686]]\n" ] } ], "source": [ "# compute the gradient\n", "# w is a dummy argument: numerical_gradient changes net.W in place, and f recomputes the loss with the changed weights\n", "f = lambda w: net.loss(x, t)\n", "dW = numerical_gradient(f, net.W)\n", 
"print(dW)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implementing the learning algorithm\n", "\n", "1. Randomly select some of the training data; the selected data is called a mini-batch\n", "2. Compute the gradient of each weight parameter in order to reduce the loss function of the mini-batch\n", "3. Update the weight parameters by a small amount in the direction that reduces the loss, that is, opposite to the gradient\n", "\n", "Repeat these steps.\n", "\n", "* Because the mini-batch consists of randomly selected data, this method is called **stochastic gradient descent**\n", "* It is commonly implemented as a function named **SGD**, after its initials" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "from common.functions import *\n", "from common.gradient import numerical_gradient\n", "\n", "class TwoLayerNet:\n", "    \"\"\"\n", "    Two-layer neural network\n", "    \n", "    Attributes\n", "    ----------\n", "    params : dictionary object\n", "        Weights and biases held by the neural network\n", "    \"\"\"\n", "    def __init__(self, input_size, hidden_size, output_size, weight_init_std=0.01):\n", "        \"\"\"\n", "        Parameters\n", "        ----------\n", "        input_size : int\n", "            Number of neurons in the input layer\n", "        hidden_size : int\n", "            Number of neurons in the hidden layer\n", "        output_size : int\n", "            Number of neurons in the output layer\n", "        weight_init_std : float\n", "            Standard deviation of the random initial weights\n", "        \"\"\"\n", "        # initialize the weights\n", "        self.params = {}\n", "        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)\n", "        self.params['b1'] = np.zeros(hidden_size)\n", "        self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size)\n", "        self.params['b2'] = np.zeros(output_size)\n", "\n", "    def predict(self, x):\n", "        \"\"\"\n", "        Run inference\n", "        \n", "        Parameters\n", "        ----------\n", "        x : np.array\n", "            Image data\n", "        \"\"\"\n", "        W1, W2 = self.params['W1'], self.params['W2']\n", "        b1, b2 = self.params['b1'], self.params['b2']\n", "        \n", "        a1 = np.dot(x, W1) + b1\n", "        z1 = sigmoid(a1)\n", "        a2 = np.dot(z1, W2) + b2\n", "        y = softmax(a2)\n", "        \n", "        return y\n", "    \n", "    def loss(self, x, t):\n", "        \"\"\"\n", "        Compute the value of the loss function\n", "        \n", "        Parameters\n", "        ----------\n", "        x : np.array\n", "            Image data\n", "        t : np.array\n", "            Correct labels for the image data\n", "        \"\"\"\n", "        y = self.predict(x)\n", "        \n", "        return cross_entropy_error(y, t)\n", "    \n", "    def accuracy(self, x, t):\n", "        \"\"\"\n", "        
Compute the recognition accuracy\n", "        \n", "        Parameters\n", "        ----------\n", "        x : np.array\n", "            Image data\n", "        t : np.array\n", "            Correct labels for the image data\n", "        \"\"\"\n", "        y = self.predict(x)\n", "        y = np.argmax(y, axis=1)\n", "        t = np.argmax(t, axis=1)\n", "        \n", "        accuracy = np.sum(y == t) / float(x.shape[0])\n", "        return accuracy\n", "    \n", "    def numerical_gradient(self, x, t):\n", "        \"\"\"\n", "        Compute the gradients with respect to the weight parameters (numerical differentiation)\n", "        \n", "        Parameters\n", "        ----------\n", "        x : np.array\n", "            Image data\n", "        t : np.array\n", "            Correct labels for the image data\n", "        \"\"\"\n", "        loss_W = lambda W: self.loss(x, t)\n", "        \n", "        grads = {}\n", "        grads['W1'] = numerical_gradient(loss_W, self.params['W1'])\n", "        grads['b1'] = numerical_gradient(loss_W, self.params['b1'])\n", "        grads['W2'] = numerical_gradient(loss_W, self.params['W2'])\n", "        grads['b2'] = numerical_gradient(loss_W, self.params['b2'])\n", "        \n", "        return grads\n", "    \n", "    def gradient(self, x, t):\n", "        \"\"\"\n", "        Compute the gradients with respect to the weight parameters\n", "        Faster version of numerical_gradient (forward pass followed by a backward pass)\n", "        \n", "        Parameters\n", "        ----------\n", "        x : np.array\n", "            Image data\n", "        t : np.array\n", "            Correct labels for the image data\n", "        \"\"\"\n", "        W1, W2 = self.params['W1'], self.params['W2']\n", "        b1, b2 = self.params['b1'], self.params['b2']\n", "        grads = {}\n", "        \n", "        batch_num = x.shape[0]\n", "        \n", "        # forward\n", "        a1 = np.dot(x, W1) + b1\n", "        z1 = sigmoid(a1)\n", "        a2 = np.dot(z1, W2) + b2\n", "        y = softmax(a2)\n", "        \n", "        # backward\n", "        dy = (y - t) / batch_num\n", "        grads['W2'] = np.dot(z1.T, dy)\n", "        grads['b2'] = np.sum(dy, axis=0)\n", "        \n", "        dz1 = np.dot(dy, W2.T)\n", "        da1 = sigmoid_grad(a1) * dz1\n", "        grads['W1'] = np.dot(x.T, da1)\n", "        grads['b1'] = np.sum(da1, axis=0)\n", "\n", "        return grads" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Copying the sample code by hand as practice" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "W1.shape = (784, 100)\n", "b1.shape = (100,)\n", "W2.shape = (100, 10)\n", "b2.shape = (10,)\n", "grads[W1].shape 
= (784, 100)\n", "grads[b1].shape = (100,)\n", "grads[W2].shape = (100, 10)\n", "grads[b2].shape = (10,)\n" ] } ], "source": [ "net = TwoLayerNet(input_size=784, hidden_size=100, output_size=10)\n", "\n", "print(\"{0}.shape = {1}\".format('W1', net.params['W1'].shape))\n", "print(\"{0}.shape = {1}\".format('b1', net.params['b1'].shape))\n", "print(\"{0}.shape = {1}\".format('W2', net.params['W2'].shape))\n", "print(\"{0}.shape = {1}\".format('b2', net.params['b2'].shape))\n", "\n", "\n", "# dummy input data\n", "x = np.random.rand(100, 784)\n", "\n", "# dummy correct labels\n", "t = np.random.rand(100, 10)\n", "\n", "# compute the gradients (numerical differentiation evaluates the loss twice per parameter, so this takes a while)\n", "grads = net.numerical_gradient(x, t)\n", "\n", "print(\"{0}.shape = {1}\".format('grads[W1]', grads['W1'].shape))\n", "print(\"{0}.shape = {1}\".format('grads[b1]', grads['b1'].shape))\n", "print(\"{0}.shape = {1}\".format('grads[W2]', grads['W2'].shape))\n", "print(\"{0}.shape = {1}\".format('grads[b2]', grads['b2'].shape))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implementing mini-batch training" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "train acc, test acc | 0.18345, 0.1865\n", "train acc, test acc | 0.7884666666666666, 0.7935\n", "train acc, test acc | 0.8772833333333333, 0.881\n", "train acc, test acc | 0.8993, 0.9007\n", "train acc, test acc | 0.9088666666666667, 0.9115\n", "train acc, test acc | 0.9153833333333333, 0.9174\n", "train acc, test acc | 0.9207166666666666, 0.9226\n", "train acc, test acc | 0.9244, 0.9265\n", "train acc, test acc | 0.9281, 0.9301\n", "train acc, test acc | 0.9309, 0.9333\n", "train acc, test acc | 0.93425, 0.9352\n", "train acc, test acc | 0.9372, 0.9355\n", "train acc, test acc | 0.9397666666666666, 0.9385\n", "train acc, test acc | 0.94195, 0.9407\n", "train acc, test acc | 0.94375, 0.9413\n", "train acc, test acc | 0.9455, 0.9445\n", "train acc, test acc | 0.9468, 0.9455\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import sys, os\n", "sys.path.append(os.path.abspath(os.path.join('..', 'sample')))\n", "sys.path.append(os.path.abspath(os.path.join('..', 'sample', 'ch04')))\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from dataset.mnist import load_mnist\n", "from two_layer_net import TwoLayerNet\n", "\n", "# load the data\n", "(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)\n", "\n", "network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)\n", "\n", "iters_num = 10000 # number of iterations (adjust as needed)\n", "train_size = x_train.shape[0]\n", "batch_size = 100\n", "learning_rate = 0.1\n", "\n", "train_loss_list = [] # loss values\n", "train_acc_list = [] # accuracy on the training data\n", "test_acc_list = [] # accuracy on the test data\n", "\n", "iter_per_epoch = max(train_size // batch_size, 1) # integer number of iterations per epoch\n", "\n", "for i in range(iters_num):\n", "    batch_mask = np.random.choice(train_size, batch_size)\n", "    x_batch = x_train[batch_mask]\n", "    t_batch = t_train[batch_mask]\n", "    \n", "    # compute the gradients\n", "    #grad = network.numerical_gradient(x_batch, t_batch)\n", "    grad = network.gradient(x_batch, t_batch)\n", "    \n", "    # update the parameters\n", "    for key in ('W1', 'b1', 'W2', 'b2'):\n", "        network.params[key] -= learning_rate * grad[key]\n", "    \n", "    # record the loss to track training progress\n", "    loss = network.loss(x_batch, t_batch)\n", "    train_loss_list.append(loss)\n", "    \n", "    # compute the accuracy once per epoch (one epoch = the number of iterations needed to use up the whole training set)\n", "    if i % iter_per_epoch == 0:\n", "        train_acc = network.accuracy(x_train, t_train)\n", "        test_acc = network.accuracy(x_test, t_test)\n", "        train_acc_list.append(train_acc)\n", "        test_acc_list.append(test_acc)\n", "        print(\"train acc, test acc | \" + str(train_acc) + \", \" + str(test_acc))\n", "\n", "# plot the graph\n", "markers = {'train': 'o', 'test': 's'}\n", "x = np.arange(len(train_acc_list))\n", "plt.plot(x, train_acc_list, label='train acc')\n", "plt.plot(x, test_acc_list, label='test acc', 
linestyle='--')\n", "plt.xlabel(\"epochs\")\n", "plt.ylabel(\"accuracy\")\n", "plt.ylim(0, 1.0)\n", "plt.legend(loc='lower right')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The recognition accuracy on both the training data and the test data improves as training proceeds\n", "* The two accuracies stay close to each other, which indicates that overfitting is not occurring" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }