{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Affine / Softmax レイヤの実装"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Affineレイヤ\n",
    "\n",
    "バッチ処理などで行列の積を使ったので、その計算を行うAffineレイヤを作成する\n",
    "\n",
    "![Affine](../draw.io/images.drawio-2-Affine.svg)\n",
    "\n",
    "\n",
    "\n",
    "## バッチ版Affineレイヤ\n",
    "\n",
    "バッチ版の場合、入力$X$の要素数が増えたり減ったりする。がバイアス$B$の要素数は固定なので、このバイアス$B$の逆要素をどうやって計算するのかがカギとなる。\n",
    "\n",
    "例えば、入力$x$のデータ数は10個だとしても、バイアス$b$は1個で、結果$y$は10個である。  \n",
    "しかし逆伝播は、入力$x$は10個に戻るが、バイアス$b$は1個に戻らなければならない。\n",
    "\n",
    "まず順伝播の計算から。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 0  0  0]\n",
      " [10 10 10]]\n"
     ]
    }
   ],
   "source": [
    "import sys, os\n",
    "sys.path.append(os.path.abspath(os.path.join('..', 'sample')))\n",
    "import numpy as np\n",
    "\n",
    "X_dot_W = np.array([[0, 0, 0,], [10, 10, 10]])\n",
    "B = np.array([1, 2, 3])\n",
    "\n",
    "print(X_dot_W)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 1,  2,  3],\n",
       "       [11, 12, 13]])"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_dot_W + B"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "ここから、逆伝播を考える。\n",
    "\n",
    "バイアス$B$は、入力$X$のそれぞれの要素に対して加算される。そのため、逆伝播の値がバイアスの要素に集約される必要がある。\n",
    "\n",
    "まず、逆伝播による、前のノードの値を$dY$、計算したデータが2個だとすると"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1 2 3]\n",
      " [4 5 6]]\n"
     ]
    }
   ],
   "source": [
    "dY = np.array([[1, 2, 3], [4, 5, 6]])\n",
    "print(dY)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "次に、$dY$をバイアス$B$からの逆伝播として、$dY$の結果を集約する。  \n",
    "データは2個だが、そのデータ内のそれぞれの要素の合計を求める。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[5 7 9]\n"
     ]
    }
   ],
   "source": [
    "dB = np.sum(dY, axis=0)\n",
    "print(dB)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Affine:\n",
    "    def __init__(self, W, b):\n",
    "        \"\"\"\n",
    "        重みとバイアスを初期化します\n",
    "        Attributes\n",
    "        ----------\n",
    "        W : numpy.array\n",
    "            重み\n",
    "        b : numpy.array\n",
    "            バイアス\n",
    "        x : numpy.array\n",
    "            順伝播で入力されるx\n",
    "        original_x_shape : numpy.array\n",
    "            xのデータ数\n",
    "        dW : numpy.array\n",
    "            Wからの逆伝搬\n",
    "        db : numpy.array\n",
    "            bからの逆伝播\n",
    "        \"\"\"\n",
    "        self.W =W\n",
    "        self.b = b\n",
    "        \n",
    "        self.x = None\n",
    "        self.original_x_shape = None\n",
    "        # 重み・バイアスパラメータの微分\n",
    "        self.dW = None\n",
    "        self.db = None\n",
    "\n",
    "    def forward(self, x):\n",
    "        \"\"\"\n",
    "        順伝播の計算をします\n",
    "        \n",
    "        Parameters\n",
    "        ----------\n",
    "        x : numpy.array\n",
    "            入力1\n",
    "\n",
    "        Returns\n",
    "        -------\n",
    "        out : numpy.array\n",
    "            計算結果\n",
    "        \"\"\"\n",
    "        # テンソル対応\n",
    "        self.original_x_shape = x.shape\n",
    "        x = x.reshape(x.shape[0], -1)\n",
    "        self.x = x\n",
    "\n",
    "        out = np.dot(self.x, self.W) + self.b\n",
    "\n",
    "        return out\n",
    "\n",
    "    def backward(self, dout):\n",
    "        \"\"\"\n",
    "        逆伝播の計算をします(乗算)\n",
    "        \n",
    "        Parameters\n",
    "        ----------\n",
    "        dout : int\n",
    "            逆伝播で上流から伝わってきた微分\n",
    "\n",
    "        Returns\n",
    "        -------\n",
    "        dx : int\n",
    "            入力2の微分\n",
    "        \"\"\"\n",
    "        dx = np.dot(dout, self.W.T)\n",
    "        self.dW = np.dot(self.x.T, dout)\n",
    "        self.db = np.sum(dout, axis=0)\n",
    "        \n",
    "        dx = dx.reshape(*self.original_x_shape)  # 入力データの形状に戻す（テンソル対応）\n",
    "        return dx"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `x.reshape(x.shape[0], -1)` の -1?\n",
    "\n",
    "* `reshape`は、次元数を書くとその次元数に合わせて変換した行列を作ってくれる\n",
    "* 何かの次元数と`-1`を組み合わせると、指定してくれた次元数以外の次元を自動的に生成してくれる"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]\n"
     ]
    }
   ],
   "source": [
    "a = np.arange(24) # 0~23の一次元配列\n",
    "print(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],\n",
       "       [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a.reshape([2, -1]) # (2, 12)のshape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "ename": "ValueError",
     "evalue": "cannot reshape array of size 24 into shape (2,0)",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-8-b6375f3eef96>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0ma\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mreshape\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m2\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m0\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;31m# (2, 0)のshape…はつくれない\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;31mValueError\u001b[0m: cannot reshape array of size 24 into shape (2,0)"
     ]
    }
   ],
   "source": [
    "a.reshape([2, 0]) # (2, 0)のshape…はつくれない"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2,  3,  4,  5,  6,  7],\n",
       "       [ 8,  9, 10, 11, 12, 13, 14, 15],\n",
       "       [16, 17, 18, 19, 20, 21, 22, 23]])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a.reshape([3, -1]) # (3, 8)のshape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Softmax-with-Lossレイヤ\n",
    "\n",
    "Softmax-with-Lossレイヤーは、Softmax関数と交差エントロピー誤差(cross entropy error)の二つが合体したレイヤー。\n",
    "\n",
    "Softmax関数は、出力層で使われる関数で、n個の出力結果の合計が「1.0」になるように正規化してくれます。(3章の話)\n",
    "\n",
    "- ニューラルネットワークの「学習」フェーズで使う\n",
    " - 推論では、結果が高い、低いだけわかればよかった\n",
    " - 学習では、高い、低いの正しいのかを機械的に比較する必要があるので使う\n",
    " \n",
    "\n",
    "一方、交差エントロピー誤差(cross entropy error)は、「教師データが示す答え」と「学習によって出力した答え」を比較して、「学習によって出力した答え」がどれだけ間違っているのかを示すための方法です。(4章)\n",
    "\n",
    "交差エントロピー誤差を出す前に正規化しないと、わけわかんない結果になるよ！たぶん！\n",
    "\n",
    "(計算グラフについては時間がないので省略しまする…大事なところだけど)\n",
    "\n",
    "Softmax関数の逆伝播は、**前の交差エントロピー誤差によって出た結果をキレイに伝えてくれる**。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "class SoftmaxWithLoss:\n",
    "    def __init__(self):\n",
    "        \"\"\"\n",
    "        損失、softmaxの出力、教師データを入れる場所を作ります\n",
    "        Attributes\n",
    "        ----------\n",
    "        loss : numpy.array\n",
    "            損失\n",
    "        y : numpy.array\n",
    "            softmaxの出力\n",
    "        t : numpy.array\n",
    "            教師データの答え\n",
    "        \"\"\"\n",
    "        self.loss = None\n",
    "        self.y = None # softmaxの出力\n",
    "        self.t = None # 教師データ\n",
    "\n",
    "    def forward(self, x, t):\n",
    "        \"\"\"\n",
    "        順伝搬の計算をします\n",
    "        \n",
    "        Parameters\n",
    "        ----------\n",
    "        x : numpy.array\n",
    "            前のレイヤーから出力した結果\n",
    "        t : numpy.array\n",
    "            教師データの答え\n",
    "        Returns\n",
    "        -------\n",
    "        dx : float\n",
    "            損失\n",
    "        \"\"\"\n",
    "        self.t = t\n",
    "        self.y = softmax(x)\n",
    "        self.loss = cross_entropy_error(self.y, self.t)\n",
    "        \n",
    "        return self.loss\n",
    "\n",
    "    def backward(self, dout=1):\n",
    "        \"\"\"\n",
    "        逆伝播の計算をします\n",
    "        \n",
    "        Parameters\n",
    "        ----------\n",
    "        dout : int\n",
    "            正規化の値(1)\n",
    "\n",
    "        Returns\n",
    "        -------\n",
    "        dx : int\n",
    "            損失から計算した差\n",
    "        \"\"\"\n",
    "        batch_size = self.t.shape[0]\n",
    "        if self.t.size == self.y.size: # 教師データがone-hot-vectorの場合\n",
    "            dx = (self.y - self.t) / batch_size\n",
    "        else:\n",
    "            dx = self.y.copy()\n",
    "            dx[np.arange(batch_size), self.t] -= 1\n",
    "            dx = dx / batch_size\n",
    "        \n",
    "        return dx"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}