{ "cells": [ { "cell_type": "markdown", "source": [ "# Simple linear regression model\r\n", "\r\n", "We will implement a simple linear regression model in Python." ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Model\r\n", "$$ \\begin{equation}\r\n", "\\tag{1}\r\n", "Y = β_0 + β_1 X + ε\r\n", "\\end{equation}$$\r\n", "\r\n", "$β_0$: intercept\r\n", "\r\n", "$β_1$: slope\r\n", "\r\n", "$X$: independent variable, also called predictor, regressor, explanatory variable, covariate, or feature\r\n", "\r\n", "$Y$: dependent variable, also called response\r\n", "\r\n", "$ε$: random variable for the difference between the observed value of $Y$ and the line $(β_0 + β_1 X)$, i.e. the statistical error\r\n", "\r\n", "Because equation (1) involves only one regressor variable, it is called **simple linear regression**." ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Notation\r\n", "$$\\overline{X} = \\frac{\\sum_{i=1}^n X_i}{n}; \\overline{Y} = \\frac{\\sum_{i=1}^n Y_i}{n}$$\r\n", "$$S_{XX} = \\sum_{i=1}^n (X_i - \\overline X)^2 = \\sum_{i=1}^n X_i^2 - n\\overline{X}^2$$\r\n", "$$S_{YY} = \\sum_{i=1}^n (Y_i - \\overline Y)^2 = \\sum_{i=1}^n Y_i^2 - n\\overline{Y}^2$$\r\n", "$$S_{XY} = \\sum_{i=1}^n (X_i - \\overline X) (Y_i - \\overline Y) = \\sum_{i=1}^n X_i Y_i - n \\overline{X}\\,\\overline{Y} = \\sum_{i=1}^n (X_i - \\overline X) Y_i$$" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "The method of least squares consists of minimizing the sum of squared residuals:\r\n", "$$S(β_0, β_1) = \\sum_{i=1}^n (Y_i - β_0 - β_1 X_i)^2$$\r\n", "\r\n", "Differentiating with respect to $β_0$ and $β_1$ and setting the derivatives to zero gives the minimum:\r\n", "$$\\hat{β}_1 = \\frac{S_{XY}}{S_{XX}}$$\r\n", "$$\\hat{β}_0 = \\overline Y - \\hat{β}_1 \\overline X$$" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Implementing in Python" ], "metadata": {} }, { "cell_type": "code", "execution_count": 7, "source": [ "import numpy as np\r\n", "import matplotlib.pyplot as plt\r\n", "from sklearn.linear_model import LinearRegression" ], "outputs": [], "metadata": {} }, { "cell_type": 
"code", "execution_count": 23, "source": [ "def least_squares2(X,Y):\r\n", " '''\r\n", " X: Vetor com valores de x\r\n", " Y: Vetor com valores de y\r\n", " Retorna os coeficientes m e b da equação\r\n", " '''\r\n", " \r\n", " #Valores para facilitar\r\n", " x_a = sum(X)/len(X)\r\n", " y_a = sum(Y)/len(Y)\r\n", " x2_a = sum(np.power(X,2))/len(X)\r\n", " xy_a = np.dot(X,Y)/len(X)\r\n", " \r\n", " #coeficiente angular (m) e variavel idenpendente b : y = mx + b\r\n", " m = (xy_a - x_a * y_a)/(x2_a - x_a**2)\r\n", " \r\n", " b = (x2_a * y_a - x_a * xy_a)/(x2_a - x_a**2)\r\n", " \r\n", " #vetor com predição de y nos pontos do vetor x\r\n", " X = np.array(X)\r\n", " y_pred = X*m + b\r\n", " \r\n", " #plotando o grafico\r\n", " plt.scatter(X, Y) #plotando pontos \r\n", " plt.plot(X, y_pred, color='red')\r\n", " plt.show()\r\n", "\r\n", " return m,b" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Conjunto de pontos para a regressão" ], "metadata": {} }, { "cell_type": "code", "execution_count": 24, "source": [ "x = np.array([58,105,88,118,117,137,157,169,149,202])\r\n", "y = np.array([2,6,8,8,12,16,20,20,22,26])" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": 33, "source": [ "m,b = least_squares2(x,y)\r\n", "print('Coeficiente m:{}'.format(m))\r\n", "print('Coeficient b:{}'.format(b))" ], "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/svg+xml": "\r\n\r\n\r\n \r\n \r\n \r\n \r\n 2021-08-27T20:42:32.857194\r\n image/svg+xml\r\n \r\n \r\n Matplotlib v3.4.2, https://matplotlib.org/\r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\n\r\n", "image/png": "" }, "metadata": { "needs_background": "light" } }, { "output_type": "stream", "name": "stdout", "text": [ "Coeficiente m:0.18054672600127145\n", "Coeficient b:-9.47107438016529\n" ] } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Utilizando a biblioteca do sklearn" ], "metadata": {} }, { "cell_type": "code", "execution_count": 35, "source": [ "reg = 
LinearRegression().fit(x.reshape(-1,1),y)\r\n", "print('Coeficiente m:{}'.format(reg.coef_[0]))\r\n", "print('Coeficient b:{}'.format(reg.intercept_))" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Coeficiente m:0.1805467260012714\n", "Coeficient b:-9.47107438016528\n" ] } ], "metadata": {} } ], "metadata": { "orig_nbformat": 4, "language_info": { "name": "python", "version": "3.9.5", "mimetype": "text/x-python", "codemirror_mode": { "name": "ipython", "version": 3 }, "pygments_lexer": "ipython3", "nbconvert_exporter": "python", "file_extension": ".py" }, "kernelspec": { "name": "python3", "display_name": "Python 3.9.5 64-bit" }, "interpreter": { "hash": "69bb2fd2ce50414ec0a459f55a2c459ce8f6b88c78656cb9bb8bcc7015cc73a0" } }, "nbformat": 4, "nbformat_minor": 2 }