{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Car Price"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2021-09-17T23:20:54.071563Z",
"start_time": "2021-09-17T23:20:41.130582Z"
},
"id": "cS-op1_XQzR4"
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "zT5EsjTLU71z"
},
"outputs": [],
"source": [
"df_test = pd.read_csv(\"test_car_details.csv\")\n",
"df_train = pd.read_csv(\"train_car_details.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "YMjgguNTjtxK",
"outputId": "00a9fc66-1681-4294-ae9c-22f277486ef7"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['Id', 'name', 'year', 'selling_price', 'km_driven', 'fuel',\n",
" 'seller_type', 'transmission', 'owner', 'mileage', 'engine',\n",
" 'max_power', 'torque', 'seats'],\n",
" dtype='object')\n",
"Index(['Id', 'name', 'year', 'km_driven', 'fuel', 'seller_type',\n",
" 'transmission', 'owner', 'mileage', 'engine', 'max_power', 'torque',\n",
" 'seats'],\n",
" dtype='object')\n"
]
}
],
"source": [
"print(df_train.columns)\n",
"print(df_test.columns)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nIMoUKctL0xx"
},
"source": [
"## Objective"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Rvi3gtTaSqdU"
},
"source": [
"One of the problems OLX India faces, because of its low data volume, is that the company cannot estimate a selling price for a customer's car from the vehicle's characteristics. The objective is to estimate that price using data from a competitor (CarDekho)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nKnbPT9VUKmw"
},
"source": [
"## Qualitative and quantitative analysis of the data"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 289
},
"id": "vSgDhqnuUaxY",
"outputId": "147325c1-0008-4356-c041-f79d79c9acd2"
},
"outputs": [
{
"data": {
"text/plain": [
" Id name year selling_price km_driven \\\n",
"0 1 Hyundai Santro GLS I - Euro I 1999 80000 110000 \n",
"1 2 Maruti Ertiga VDI 2012 459999 87000 \n",
"2 3 BMW 3 Series 320d Luxury Line 2010 1100000 102000 \n",
"3 4 Tata New Safari DICOR 2.2 EX 4x2 2009 229999 212000 \n",
"4 5 Toyota Fortuner 3.0 Diesel 2010 800000 125000 \n",
"\n",
" fuel seller_type transmission owner mileage engine \\\n",
"0 Petrol Individual Manual Second Owner NaN NaN \n",
"1 Diesel Individual Manual First Owner 20.77 kmpl 1248 CC \n",
"2 Diesel Dealer Automatic First Owner 19.62 kmpl 1995 CC \n",
"3 Diesel Individual Manual Third Owner 11.57 kmpl 2179 CC \n",
"4 Diesel Individual Manual Second Owner 11.5 kmpl 2982 CC \n",
"\n",
" max_power torque seats \n",
"0 NaN NaN NaN \n",
"1 88.76 bhp 200Nm@ 1750rpm 7.0 \n",
"2 187.74 bhp 400Nm@ 1750-2500rpm 5.0 \n",
"3 138.1 bhp 320Nm@ 1700-2700rpm 7.0 \n",
"4 171 bhp 343Nm@ 1400-3400rpm 7.0 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Inspect the variables\n",
"df_train.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "DXkxkyyDa2aj"
},
"outputs": [],
"source": [
"df_train = df_train.iloc[:, 1:]  # drop the Id column\n",
"\n",
"Id = df_test.Id  # keep a copy of the test Ids\n",
"df_test = df_test.iloc[:, 1:]  # drop the Id column"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "O8lv_YnOO_6P"
},
"source": [
"### Missing-value analysis"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "zP8597KtW7pX",
"outputId": "43539f2c-5b17-4da0-885d-678841f07fa3"
},
"outputs": [
{
"data": {
"text/plain": [
"name 0\n",
"year 0\n",
"selling_price 0\n",
"km_driven 0\n",
"fuel 0\n",
"seller_type 0\n",
"transmission 0\n",
"owner 0\n",
"mileage 157\n",
"engine 157\n",
"max_power 151\n",
"torque 158\n",
"seats 157\n",
"dtype: int64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Count the NaN values per attribute\n",
"df_train.isna().sum()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "EtZj5WU1PME8",
"outputId": "3f68afd7-1bf8-477e-cc13-e6a615c3b00d"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"name 0.000000\n",
"year 0.000000\n",
"selling_price 0.000000\n",
"km_driven 0.000000\n",
"fuel 0.000000\n",
"seller_type 0.000000\n",
"transmission 0.000000\n",
"owner 0.000000\n",
"mileage 2.759712\n",
"engine 2.759712\n",
"max_power 2.654245\n",
"torque 2.777290\n",
"seats 2.759712\n",
"dtype: float64\n"
]
}
],
"source": [
"# Percentage of NaN values per attribute\n",
"print(100*df_train.isna().sum()/len(df_train))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "DghC0QGiq_np",
"outputId": "15cd8e5a-73a1-45d9-aeab-a17a36e4ea81"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"name 0.0\n",
"year 0.0\n",
"km_driven 0.0\n",
"fuel 0.0\n",
"seller_type 0.0\n",
"transmission 0.0\n",
"owner 0.0\n",
"mileage 0.0\n",
"engine 0.0\n",
"max_power 0.0\n",
"torque 0.0\n",
"seats 0.0\n",
"dtype: float64\n"
]
}
],
"source": [
"# Percentage of NaN values per attribute in the test set\n",
"print(100*df_test.isna().sum()/len(df_test))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cPeACfqLXdCL"
},
"source": [
"Because the NaNs occur mostly in the same rows and make up a small share of the total (less than 3%), these rows will be removed."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "SLNlNI1vYz3g",
"outputId": "e32eb487-fbe8-4c06-df6f-2c5eb59c239a"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total number of rows: 5689\n",
"Number of rows after removing NaNs: 5531\n"
]
}
],
"source": [
"print(f'Total number of rows: {df_train.shape[0]}')\n",
"# Drop the rows that contain NaN\n",
"df_train = df_train.dropna(axis=0)\n",
"print(f'Number of rows after removing NaNs: {df_train.shape[0]}')\n",
"# About 3% of the rows are removed"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2MhTDUWnQlvV"
},
"source": [
"### Variable types"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jLjrd98qlWMR"
},
"source": [
"The data contain the following variables:\n",
"\n",
"\n",
"* Discrete quantitative variables:\n",
"  * Year the car was manufactured (year)\n",
"  * Kilometres driven (km_driven)\n",
"  * Number of seats (seats)\n",
"* Continuous quantitative variables:\n",
"  * Fuel economy in kilometres per litre (mileage)\n",
"  * Engine displacement in CC (engine)\n",
"  * Maximum engine power in bhp (max_power)\n",
"  * Selling price (selling_price) **Value to be predicted**\n",
"* Nominal qualitative variables:\n",
"  * Car name (name)\n",
"  * Fuel type (fuel)\n",
"  * Seller type (seller_type)\n",
"  * Transmission (transmission)\n",
"  * Torque, the engine's capacity to produce rotational driving force (torque)\n",
"* Ordinal qualitative variables:\n",
"  * Number of previous owners (owner)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qX0b_8ToVmFq"
},
"source": [
"Note that 4 variables are numeric but need preprocessing to strip the strings that hold the unit of measure. In total, counting the variables that need this treatment, there are 7 numeric variables and 6 categorical variables."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1E4O0qOfncSS"
},
"source": [
"### Stripping the word Owner from the owner column, the units from mileage, engine and max_power, and the rpm part of torque (keeping only the Nm value)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"id": "3PJ68h7Jp0SC"
},
"outputs": [],
"source": [
"df1 = df_train.copy()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"id": "Lyn3uW_sxna0"
},
"outputs": [],
"source": [
"colunas = ['owner', 'mileage', 'engine',\n",
" 'max_power', 'torque']"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"id": "HKyNC-IdyKVT"
},
"outputs": [],
"source": [
"# Training set: keep only the first space-separated token of each column\n",
"for col in colunas:\n",
"    df1[col] = df1[col].str.split(' ').str[0]\n",
"\n",
"# Test set\n",
"for col in colunas:\n",
"    df_test[col] = df_test[col].str.split(' ').str[0]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HEWJoQeioxpp"
},
"source": [
"### Removing the unit of measure from torque"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"id": "MjhqBrFRcqIJ"
},
"outputs": [],
"source": [
"# Remove the unit-of-measure substrings from torque (train and test)\n",
"for unit in ['Nm@', 'nm@', '@', 'Nm', 'NM', 'kgm']:\n",
"    df1['torque'] = df1['torque'].str.replace(unit, '', regex=False)\n",
"    df_test['torque'] = df_test['torque'].str.replace(unit, '', regex=False)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 80
},
"id": "fLkK6JyRwC4c",
"outputId": "11aeb842-76c6-46cc-d22d-d8eaacfed914"
},
"outputs": [
{
"data": {
"text/plain": [
" name year selling_price km_driven \\\n",
"1954 Honda Jazz Select Edition Active 2011 350000 80000 \n",
"\n",
" fuel seller_type transmission owner mileage engine max_power \\\n",
"1954 Petrol Individual Manual Second 16.0 1198 90 \n",
"\n",
" torque seats \n",
"1954 110(11.2) 5.0 "
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1[df1['torque'] == '110(11.2)']"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "K4aVGdITrz6A"
},
"source": [
"This exact value does not occur in the test set."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "D-3Zv5-ZpZoh"
},
"source": [
"#### Handling the row with torque 110(11.2) in the training set"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"id": "xte0Egxcyveu"
},
"outputs": [],
"source": [
"# Drop the row with torque = 110(11.2)\n",
"df1.drop(df1.loc[df1['torque'] == '110(11.2)'].index, inplace=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rSGU10uWswh3"
},
"source": [
"#### Handling the row with torque 380(38.7) in the test set"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 80
},
"id": "xPJSLw2asuuu",
"outputId": "d41e780d-4dab-4faa-9d40-e143cdd31d90"
},
"outputs": [
{
"data": {
"text/plain": [
" name year km_driven fuel \\\n",
"885 Ford Endeavour Hurricane Limited Edition 2013 110000 Diesel \n",
"\n",
" seller_type transmission owner mileage engine max_power torque seats \n",
"885 Individual Automatic Third 12.8 2953 156 380(38.7) 7.0 "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_test[df_test['torque'] == '380(38.7)']"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3fPNDEkcyB1U"
},
"source": [
"Rows cannot be dropped from the test set, so the value is corrected instead.\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 80
},
"id": "7HKaZDOutACC",
"outputId": "1e9ccc6c-c15d-44f8-b01e-2e515d9e7831"
},
"outputs": [
{
"data": {
"text/plain": [
" name year km_driven fuel \\\n",
"885 Ford Endeavour Hurricane Limited Edition 2013 110000 Diesel \n",
"\n",
" seller_type transmission owner mileage engine max_power torque seats \n",
"885 Individual Automatic Third 12.8 2953 156 380(38.7) 7.0 "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_test[df_test['torque'] == '380(38.7)']"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"id": "GMBQG03SzPAW"
},
"outputs": [],
"source": [
"df_test['torque'] = df_test['torque'].replace('380(38.7)', '380')"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 49
},
"id": "MZEIkXmWs9No",
"outputId": "28560560-4a0a-4c39-9ca4-fc3631b8848f"
},
"outputs": [
{
"data": {
"text/plain": [
"Empty DataFrame\n",
"Columns: [name, year, km_driven, fuel, seller_type, transmission, owner, mileage, engine, max_power, torque, seats]\n",
"Index: []"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_test[df_test['torque'] == '380(38.7)']"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dgAw8_SktMk4"
},
"source": [
"#### Analyzing the car names"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"id": "CgRJNSmz-tGZ"
},
"outputs": [],
"source": [
"# Count how often each car name (brand and model) occurs in the training set\n",
"import collections\n",
"agrupamento = df1['name']\n",
"counter = collections.Counter(agrupamento)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ytRnpNuj_fhA",
"outputId": "dd1a59f2-3e5f-47d3-e3fb-14dd06f06a61"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of distinct car models: 1706\n",
"Largest count for a single car model: 92\n"
]
}
],
"source": [
"# Collect the frequencies in a list to count the distinct models\n",
"contador = []\n",
"for i in sorted(counter, key = counter.get, reverse = True):\n",
"    contador.append(counter[i])\n",
"print(\"Number of distinct car models:\", len(contador))\n",
"print(\"Largest count for a single car model:\", max(contador))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0f3mIjEDdie7"
},
"source": [
"Because the car names vary so widely, the name itself is not a useful model feature. The brand, however, may be informative, as may other features derived from the data."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mRrCRfsLd18B"
},
"source": [
"## Feature engineering"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "sbV_L_k5qi2w"
},
"source": [
"### Creating the brand feature"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"id": "TwQOwSUnnoC_"
},
"outputs": [],
"source": [
"# Create the brand column from the first word of the name\n",
"df1['brand'] = df1['name'].str.split(' ').str[0]\n",
"\n",
"# Same for the test set\n",
"df_test['brand'] = df_test['name'].str.split(' ').str[0]"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "P7fQUSM60PJP",
"outputId": "65ac56c1-2770-4116-f349-ca9bda84668c"
},
"outputs": [
{
"data": {
"text/plain": [
"array(['Maruti', 'BMW', 'Tata', 'Toyota', 'Hyundai', 'Chevrolet', 'Honda',\n",
" 'Jaguar', 'Renault', 'Mahindra', 'Volkswagen', 'Ford', 'Skoda',\n",
" 'Datsun', 'Fiat', 'Volvo', 'Nissan', 'Mercedes-Benz', 'Kia',\n",
" 'Jeep', 'Audi', 'Isuzu', 'Lexus', 'Land', 'Force', 'Mitsubishi',\n",
" 'Ambassador', 'Daewoo', 'MG', 'Ashok'], dtype=object)"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1['brand'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "TX_PMhJb0LBo",
"outputId": "cbd5435f-44f9-44d4-e3fb-6637494942c6"
},
"outputs": [
{
"data": {
"text/plain": [
"array(['Tata', 'Maruti', 'Mahindra', 'Hyundai', 'Volvo', 'Jaguar',\n",
" 'Chevrolet', 'Jeep', 'Honda', 'Toyota', 'Kia', 'Ford', 'Lexus',\n",
" 'Skoda', 'BMW', 'Fiat', 'Renault', 'Nissan', 'Datsun',\n",
" 'Mercedes-Benz', 'Volkswagen', 'Opel', 'Mitsubishi', 'Ambassador',\n",
" 'Audi', 'Land', 'Isuzu', 'Force'], dtype=object)"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_test['brand'].unique()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KVJN1Gybqnn-"
},
"source": [
"### Creating the car age column"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"id": "5zSaXZGGoBrr"
},
"outputs": [],
"source": [
"# Create the car age column (relative to 2021)\n",
"df1['age'] = 2021 - df1.year\n",
"\n",
"# Same for the test set\n",
"df_test['age'] = 2021 - df_test.year"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3_6Au6o-q_U7"
},
"source": [
"### Dropping the name, year and torque columns"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"id": "hNozwzdSogMy"
},
"outputs": [],
"source": [
"del df1[\"name\"]  # drop the name column\n",
"del df1[\"year\"]  # drop the year column\n",
"del df1[\"torque\"]  # drop the torque column ## 0.9705529686331869 (without torque) gradient\n",
"#del df1[\"seats\"]  # drop the seats column ## 0.9719490890165687 (without seats and torque) gradient\n",
"#del df1[\"engine\"]  # drop the engine column ### 0.9723490022839645 (without engine, seats and torque) gradient\n",
"\n",
"\n",
"del df_test[\"name\"]  # drop the name column\n",
"del df_test[\"year\"]  # drop the year column\n",
"del df_test[\"torque\"]  # drop the torque column\n",
"#del df_test[\"seats\"]  # drop the seats column\n",
"#del df_test[\"engine\"]  # drop the engine column"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bTGXKQpVrKB2"
},
"source": [
"### Converting some variables to the appropriate data type"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"id": "TGUv4w3NrE_T"
},
"outputs": [],
"source": [
"# Convert some variables to numeric types\n",
"df1['mileage'] = pd.to_numeric(df1['mileage'])\n",
"df1['max_power'] = pd.to_numeric(df1['max_power'])\n",
"df1['engine'] = pd.to_numeric(df1['engine'])\n",
"df1['seats'] = pd.to_numeric(df1['seats'])\n",
"#df1['torque'] = pd.to_numeric(df1['torque'])\n",
"\n",
"# Test set\n",
"df_test['mileage'] = pd.to_numeric(df_test['mileage'])\n",
"df_test['max_power'] = pd.to_numeric(df_test['max_power'])\n",
"df_test['engine'] = pd.to_numeric(df_test['engine'])\n",
"df_test['seats'] = pd.to_numeric(df_test['seats'])\n",
"#df_test['torque'] = pd.to_numeric(df_test['torque'])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aHiZuHmTz4AS"
},
"source": [
"## Transforming the categorical variables for the regression"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PA4-FryE4wwt"
},
"source": [
"### Replacing categorical variables with numeric labels\n"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"id": "0Ee4glR8IbfB"
},
"outputs": [],
"source": [
"from sklearn import preprocessing\n",
"le = preprocessing.LabelEncoder()\n",
"# Training set: replace each text column with its label codes\n",
"for col in df1.select_dtypes(include='object').columns:\n",
"    df1[col] = le.fit_transform(df1[col]).astype('str')\n",
"\n",
"# Test set. Caveat: the encoder is refit here, so a category can get a\n",
"# different code than in training (e.g. 'Opel' appears only in the test set).\n",
"for col in df_test.select_dtypes(include='object').columns:\n",
"    df_test[col] = le.fit_transform(df_test[col]).astype('str')"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "wLSbwCkY8IuW",
"outputId": "fc85f43b-276c-43ee-bc1b-85d831c8b5b4"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 5530 entries, 1 to 5688\n",
"Data columns (total 12 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 selling_price 5530 non-null int64 \n",
" 1 km_driven 5530 non-null int64 \n",
" 2 fuel 5530 non-null object \n",
" 3 seller_type 5530 non-null object \n",
" 4 transmission 5530 non-null object \n",
" 5 owner 5530 non-null object \n",
" 6 mileage 5530 non-null float64\n",
" 7 engine 5530 non-null int64 \n",
" 8 max_power 5530 non-null float64\n",
" 9 seats 5530 non-null float64\n",
" 10 brand 5530 non-null object \n",
" 11 age 5530 non-null int64 \n",
"dtypes: float64(3), int64(4), object(5)\n",
"memory usage: 561.6+ KB\n"
]
}
],
"source": [
"df1.info()"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "hKDzE8yj9I3N",
"outputId": "5e844c57-e8a8-47b4-d118-6da1238d7e24"
},
"outputs": [
{
"data": {
"text/plain": [
" selling_price km_driven fuel seller_type transmission owner mileage \\\n",
"1 459999 87000 1 1 1 0 20.77 \n",
"2 1100000 102000 1 0 0 0 19.62 \n",
"3 229999 212000 1 1 1 4 11.57 \n",
"4 800000 125000 1 1 1 2 11.50 \n",
"5 180000 25000 3 1 1 2 19.70 \n",
"\n",
" engine max_power seats brand age \n",
"1 1248 88.76 7.0 20 9 \n",
"2 1995 187.74 5.0 3 11 \n",
"3 2179 138.10 7.0 26 12 \n",
"4 2982 171.00 7.0 27 11 \n",
"5 796 46.30 5.0 20 11 "
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "R0QjbNF25Ok-"
},
"source": [
"All variables are now numerically encoded and ready to be used in the regression"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 407
},
"id": "v_gj0KH_5R8T",
"outputId": "06b9f4ed-7c7d-4999-c796-7d241102f83e"
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(16, 6))\n",
"# Correlate the numeric columns only (the label-encoded columns are stored as strings)\n",
"corr = df1.select_dtypes(include='number').corr()\n",
"# Mask the upper triangle so that each pair is shown once\n",
"mask = np.triu(np.ones_like(corr, dtype=bool))\n",
"heatmap = sns.heatmap(corr, mask=mask, vmin=-1, vmax=1, annot=True, cmap='BrBG')\n",
"heatmap.set_title('Correlation Heatmap', fontdict={'fontsize':18}, pad=16);"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "We1FtfYS6cmW"
},
"source": [
"Very high correlations between training variables would indicate redundant features; we observe that this is not the case."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VUI9wNF-t52a"
},
"source": [
"### Analysis of the quantitative variables\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "VUSsXDLnEK9R",
"outputId": "a8c66c0e-c57e-4fe0-bba4-f9a22f92c988"
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAEmCAYAAACJXlw1AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAxvUlEQVR4nO3de1wU5f4H8M8CAqIZaVxKFLEEFUmUOpW3QhFQQQRveMPUvISXDueYoKBUpJJReky7mJ4stWNmImaG17QM83YsU0FMBJVcQFARubPP7w9/O4d1lmVRYUf5vF8vXy9m5tmZz+z68N258IxKCCFARERUjZmpAxARkfKwOBARkQyLAxERybA4EBGRDIsDERHJsDgQEZEMi0MjcfnyZXTq1AlBQUEICgpCYGAghg8fjuPHj9fL9tzc3FBQUGCwzf79+/Gvf/3rnrYTFRWFNWvW1Lj9P/74A7NmzTK4jpMnT2LBggX3lEMp1qxZg6ioqDq9pqb38H4rKCiAm5ub3mV79+7FO++8U+8ZyHgWpg5ADcfa2hpJSUnS9I4dOzB37lzs2rXLJHn++OMP3Lhxo1634eHhgeXLlxts8+effyInJ6dec5Bh/fr1Q79+/Uwdg6phcWjErl+/Djs7O2n666+/xrp162BmZobHH38c8+fPh7OzMyZMmAB3d3fMmTMHKSkpiIqKwpYtW5CQkAArKyukpaUhPz8fPXv2RExMDJo0aaKznZUrV+L777+Hubk5XFxcMH/+fPz111/YuHEjqqqq8MgjjyAiIkLnNefPn8fChQtx/fp1VFVVYdy4cRg2bFid9/Hw4cOIi4vD9u3bcezYMcTHx0Oj0QAApk6dimeeeQbLly/HzZs3MXfuXCxevFjv++Di4oKCggLMnTsXFy9ehK2tLezs7NChQwfMnDkTXbp0Qb9+/ZCWloaEhAScPXsWX3/9NSoqKnDjxg1MnjwZo0ePxpYtW7Br1y5oNBr89ddfcHBwwIgRI7B+/XpkZmZiwoQJmDhxIoqLi/Hmm28iKysL169fR7NmzZCQkID27dvr7F9FRQXeeecdpKSkoFWrVmjVqhUeeeQRAMDNmzexcOFCpKeno6KiAi+++CLmzJkDC4uau/2iRYtw9uxZfPTRR4iLi4O1tTXS09ORn5+Pvn37wtbWFj/++CPy8vLwzjvv4MUXXzT4/u/atQtLly5F06ZN0aVLF2n+li1bsHnzZpSUlKB58+YIDg7Gzp07ERUVhdDQUPz888+wtLREVVUVXn75Zaxduxb29vY17o+HhwemTJmCX375Bbm5uXj11VcxevToOv9/oWoENQqXLl0SHTt2FIMHDxaDBw8WL7/8snB3dxf79+8XQgiRkpIifHx8RH5+vhBCiG+//VYMGDBAaDQakZOTI3r06CF2794tevfuLY4cOSKEECIyMlIMGTJEFBUVibKyMjFmzBixbt06IYQQrq6uIj8/X2zevFmMHDlS3Lp1SwghxPLly8XEiROln9966y1Z1oqKCjFw4EBx6tQpIYQQhYWFYsCAAeLEiROytpGRkaJXr17Sfmn/abf/66+/ikGDBgkhhAgLCxPbt28XQgiRmpoq3nzzTWlfp0yZUuv7EBERIZYsWSKEECInJ0f07NlTLF++XNrfxMREIYQQRUVFYsSIEaKgoEAIIcSJEyeEp6entD4vLy/x119/iaqqKjFw4EAxc+ZMUVVVJVJTU4WHh4eoqqoSP/zwg4iLi5P2c/78+eLtt9+W7f/atWtFWFiYKCsrE7du3RLBwcEiMjJSCCFEVFSU+PLLL4UQQlRWVorZs2eLVatW6X0PP/vsM/HWW2+J6dOni7KyMmn+8OHDRXl5ucjNzRWurq7S+tauXSsmTJggW1d1eXl5wsvLS5w7d04IIcQnn3wiXF1dpffhueeeEzdv3pR9BmPGjBE//PCDEEKI/fv3i9DQ0Fr3x9XVVfq/98cff4guXbqI0tJSg/nIMB45NC
J3nlZKSUnB9OnTsW3bNvz8888YOHAgWrZsCQAICQnBwoULcfnyZbRp0wZxcXEIDw/HzJkz8dxzz0nrCA4ORrNmzQAAQUFB2Lt3L8aOHSst/+mnnxASEgIbGxsAQFhYGD755BOUl5fXmDMzMxMXL17EvHnzpHmlpaU4c+YMPD09Ze1feeUVTJo0SWeevnPbAwYMwNtvv419+/ahR48e+Mc//iFrY+h9OHDgABITEwEA9vb28Pf313nts88+CwBo1qwZPvnkExw4cACZmZlIS0tDcXGx1M7DwwNPPPEEAMDJyQm9evWCmZkZ2rRpg7KyMpSUlMDf3x9t2rTBunXrkJWVhSNHjqBbt26yvIcOHUJAQAAsLS1haWmJwMBAnD17FsDtazp//PEHNm/eLL2HNVm7di3y8/OxdetWWFpaSvO9vb3RpEkT2NnZwcbGBr179wYAtG3bFtevX69xfQBw/PhxuLq64umnnwYAjBw5Eh988IG03M3NDc2bN5e9btiwYUhMTIS/vz+2bNmCESNGGLU/2tNS7u7uKC8vR3FxMaysrAxmpJqxODRiPXr0QNu2bfHHH39Ip1qqE0KgsrISwO3z8o8//jhOnjyp08bc3FynvZmZ7j0OGo0GKpVKZ1q7zppoTzVVL2RXr16VTpfcrdDQUHh7e+OXX37Bzz//jBUrViA5OVmW907a98HCwgKi2lBkd+6rtgCq1WqMHDkSI0aMgJeXF/z9/fHjjz9K7ar/8gWg9zTPV199hU2bNmHMmDEIDAyEra0tLl++XOs+Vv88NBoN/vWvf+Gpp54CABQWFup8FtU999xz6N69O+bOnYuvv/5aOjVoTFZDqr9fd75W+37dacCAAYiPj8f58+dx9OhRxMfHG7U/2kKgnSc4bNw94d1KjdiFCxeQnZ2NTp06oXfv3tixY4d0h9G3334LW1tbODs74+TJk/jyyy/x7bff4ubNm/jiiy+kdfzwww8oLy9HWVkZEhMT4e3trbON3r1749tvv5W+Oa9btw7PPfccLC0tYW5urrdQuLi46BzlXLlyBQEBATh16tQ97W9oaChSU1MREhKCuLg4FBYWIi8vTyeHoffhpZdekr61Xrt2DXv27NH7y/bUqVNo2bIlwsPD0atXL6kwVFVVGZ314MGDCA4OxvDhw+Hi4oJ9+/bpfX3v3r2xdetWlJWVoaysDDt27JCW9erVC2vXroUQAuXl5Xjttdewfv16vdvr0qULxo4di0ceeQQrVqwwOqchzz33HP7880+kpaUBuH2dwRhWVlYYNGgQoqKi4Ovri6ZNm9Z5f+je8cihESktLUVQUJA0rdFo8Pbbb8PFxQUuLi545ZVXMH78eGg0GrRs2RKffvopiouL8Y9//AMxMTFwcHBAfHw8hg8fLp1asra2xujRo1FYWAg/Pz8MHTpUZ5vDhg3DlStXMHz4cGg0Gjg7OyMhIQEA8MILL2D27NmIi4vD/PnzpddYWlrio48+wsKFC7F69WpUVlbi9ddfh5eX1z3t/+zZs7Fo0SIsW7YMKpUKM2bMgJOTE6qqqrBy5UrMmDEDK1as0Ps+mJmZYe7cuYiJiZG+yT/55JOwtraWbadnz57YvHkz/P39oVKp8Le//Q0tW7ZEVlaW0VknTpyIBQsWSMXI09MT6enpsnahoaG4ePEiAgICpCKmFR0djYULFyIwMBAVFRXo0aMHXn311Rq3qVKpsGjRIgwZMgQvvfSS0Vlr0rJlSyQkJGD27Nlo0qSJzunI2gwfPhzr16/Hm2++Kc2r6/7QvVEJHnvRXYqKikKHDh1k5/sfVhs2bEDnzp3RrVs3lJeXY/To0Zg5c+Z9+UVKpDQ8ciAy0tNPP424uDhoNBpUVFTA39+/0ReG1atX47vvvtO7bNKkSRg8eHADJ6L7hUcOREQkwwvSREQkw+JAREQyLA5ERCTD4kAm8/3332PAgAHw9PSEj48Pjh07Ji07f/48wsLC4OXlhf79+2P37t0AgPLycs
ybNw/e3t7o1q0bhgwZggMHDhi97m7duun869SpE+Li4mrMuH79eoSEhKBLly6y0U6NzdKQxo0bBw8PD2n//Pz8dJYb2h+tzMxMeHh4YPbs2Trzr1+/junTp8PT0xPe3t6yC9HGrFurps+3OkP/P6gBmGbUDmrsDh48KF5++WVx4sQJUVVVJdRqtVCr1UKI22Mr+fr6in//+9+isrJSpKSkiK5du4qMjAxx69YtsXz5cnHp0iVRVVUl9u3bJzw9PcWlS5eMWnd1t27dEp6entJYUfrs3LlT7N69WyxYsEAas6j662vL0tDGjh0rNm3aVONyQ/ujNWHCBDFq1Cjxz3/+U2d+RESEeP3110VRUZE4evSo6N69u0hPT6/TuoUw/PlqGfsZUv3hkQMZ9PHHHyM2NlaavnHjBtzd3VFWVnZP6/3www8RHh4OT09PmJmZwcHBAQ4ODgCAjIwM5Obm4pVXXoG5uTlefPFFdO/eHUlJSbCxscHMmTPh5OQEMzMzeHt7w8nJCadPnzZq3dXt3LkTLVu2lMZE0sfX1xc+Pj6wtbWVLTMmS00qKiqwdOlS9O3bF+7u7nBzc4Obm1u93/ppaH+A29/WH3nkEdloq8XFxdi1axdef/11NGvWDM8++yz69u2rM8RJbevWMvT5ahn7GVL94d85kEHp6el4/vnnpenU1FS4uLjIBjSbOnVqjQ8O8vLywqeffipNV1VV4dSpU+jbty/69++PsrIy+Pj4YM6cObC2ttY7Jo4QAufOnZPNv3r1KjIzM6XB3Wpbd3WJiYkYMmRIjeMN1dWdWQxZtmwZjh49ig0bNuDRRx9FeHg4mjdvjsjISJ12dXlftd5//30kJCTAxcUFEREROp+fIUVFRVi+fDnWrl0r/WW2VmZmJszMzODi4iLN69ixI44ePWrUuqur7fOty2dI9YfFgQxKT0/HK6+8Ik2npaXpHfFU3y+pmly9ehUVFRVITk7Ghg0bYGFhgfDwcHz88ceIiIhA+/bt0bJlS6xevRqvvPIKDh8+jKNHj8p+yVVUVGD27NkIDg6WBmOrbd1af/31F44ePYqFCxfW8R3RT1+WmhQVFWHdunXYtm2bNDqrr68vfvjhB7Rp00anbV3eV+D2ECFPPfUULC0t8f3332PatGlISkpC27Zta33tsmXLMHToUClTdcXFxbKBDx955BHcunWrTvkA1Pr5GvsZUv3iaSWqUXl5OS5evAhXV1dpXlpaGjp16nRP69V++xs3bhzs7e3RsmVLTJgwQbqY26RJE6xcuRIHDhxAr1698Pnnn8Pf31/ntIJGo8GcOXPQpEkTnXGZalu31tatW+Hl5SX7ZXw3aspSk2PHjqFNmzZo166dNK+wsBCPP/74PWfp2rUrmjdvDktLSwQHB6N79+5GXSRPTU3FoUOHdL4IVGdjY4OioiKdeUVFRdJw7XVR2+dr7GdI9YtHDlSj8+fPw8HBQRoVUwiBI0eOICAgQNb21VdfNXj6Y/Xq1dL0o48+CkdHR4Onczp27Kgz4mZoaCiGDBki5YiOjsbVq1fx2Wef6Tx5zph1A0BSUhImT55ssI0xDGWpSUFBAVq0aKGzjt27dyMsLEzWti7vqz4qlcqooasPHz6M7OxsaVTd4uJiVFVVITg4GImJiWjXrh2qqqqQmZkpFbW0tDSjTqHpY+jzNfYzpPrF4kA1Onv2LPLz83Hx4kXY29vj448/RnZ2Nlq3bi1rW9svqTuFhIRg3bp16N27NywsLPDFF1/g5ZdflpanpaXBxcUFGo0GX331FXJzcxESEgIAiI2Nxfnz5/H555/rPQdd27r/+9//IicnR/awHn0qKytRVVUFjUaDqqoqlJWVwdzcXHo2QW1Z9OnQoQPOnDkjXb9ZsWIFVCoVBg4cKGtbl/e1sLAQv//+O/72t7/B3NwcO3bswLFjx3QemlTT/owcORKDBg2S2v373/9Gdna2NCqqjY0N+vfvj+XLl+Odd95Bamoq9u7di4
0bNxr9XlVn6PMFav8MqQGY7D4pUrx3331XzJw5U/j6+opevXqJL7/8UvTr10/MmTPnntddXl4uYmNjhZeXl+jRo4eIi4vTeaxjfHy8ePbZZ4Wnp6eYNGmSyMzMFEIIcfnyZeHq6iq6dOkiPD09pX9JSUlGr3v+/Pli9uzZenNNmjRJfPzxx9L08uXLhaurq84/7aNBa8vy6quvij179ujdzkcffSR69uwpevbsKSIjI6XHkt6L/Px8ERISIjw9PYWXl5cYPny4OHjwoE4bQ/tzZ7s7b2W9du2aeO2110TXrl3FSy+9JLZt22b0uu98X2v6fLVq+wyp/nHgParRq6++iuHDh8v+kIqMs2nTJjg6OqJPnz6mjkJUZ7wgTTVKT0+v9c4bqpn2Hn6iBxGPHEivGzduoGfPnjhx4oRRF1mJ6OHC4kBERDI8rURERDIsDkREJMPiQEREMiwOREQkw+JAREQyLA5ERCTD4kBERDIsDkREJMPiQEREMiwOREQkw+JAREQyfNhPDT777DNkZGTozLt27RoA4LHHHqvxde3bt78vTxgjIjIlFocaZGRk4NSZszC3tpXmVZVeBwCor5XrfY12ORHRg47FwQBza1vYOPeTpouz9gKAzrzqtMuJiB50vOZAREQyLA5ERCTD4kBERDIsDkREJMPiQEREMiwOREQkw+JAREQyLA5ERCTD4kBERDIsDkREJMPiQEREMiwOREQkw+JAREQyLA5ERCTD4kBERDIsDkREJMPiQEREMo26OOzbtw/79u0zdQwZpeYiosajUT8mdPfu3QCAvn37mjiJLqXmIqLGo1EfORARkX4sDkREJMPiQEREMiwOREQkw+JAREQyLA5ERCTD4kBERDIsDkREJMPiQEREMiwOREQkw+JAREQyLA5ERCTD4kBERDIsDkREJNOoh+xWqmvXriE7OxuBgYG1tu3Tpw9++uknODs7IysrC08//TT+/PNPaXlkZCS2bt2KkpIS5ObmokWLFsjNzYWTkxMWLVoEIQSWLFmC0NBQLF68GPHx8XBxcZFtp6CgAAsXLkRZWRlyc3Px7rvv6m1Xvf2SJUsQGRkpbSMyMhKPPfZYje20y/TNIyK5+uwrPHJQoOzsbKPb/vTTTwCArKwsANApDADw/vvv4+zZs7h48SJKS0uRm5sLALh8+TI2btyIjRs34syZM3j33XdRXFyMhIQEvdvZuHEj0tPTkZWVhZKSkhrbVW9/5swZnW1s3LjRYDtD84hIrj77CouDwmh/2d8vlZWVNS5LTk7Gnj17IIRAUVERAODixYu4cOGCTruCggLs2bNHZ56+dtXb7927F0II7N69W9rGnj17cO3aNb3ttMv0zSMiufruK436tNL169dRUFCAuXPnypZlZGRAU2lep/VpKkuRkZGhd33GOnXq1F2/tq40Gg2EELL5CQkJWLlypTS9ceNGvUXmznbV22s0GgC6xUmj0WDjxo147bXXZO20y4QQsnna9kT0P/r6z/3sKzxyaOT0FYeLFy/qTO/fv9+odtXba4uCEEJ6bWVlJX788Ue97bTL9M0jIrn67iuN+sjB1tYWtra2WLx4sWzZ3LlzkZqRU6f1mVlYo317B73rM1ZwcLDBU0H3m0qlkv3ib9u2rc70yy+/jOTk5FrbVW+/e/duVFZWQqVSAbhdJCwsLODt7a23nXaZ9lRU9XlEJKev/9xPPHJQmIiIiAbblpmZGSws5N8PZs+erTMdGhpqVLvq7c3Mbv/XsrCwkF5rZmaG0NBQve20y/TNIyK5+u4rLA4K06dPn/u6Pn2/1LX8/f3h4+MDlUqF5s2bA7h9NHDnLaotW7aEj4+Pzjx97aq379evH1QqFfr37y9tw8fHR+d2u+rttMv0zSMiufruKywOCtS6dWuj22qLibOzMwDg6aef1ln+z3/+E25ubmjbti2sra1hb28PAHBycpK+qXfu3BmRkZGwsbExeDTg6uoKZ2dnNG3atMZ21dt37t
xZZxv6vtnoW2aoPRH9T332FZXQd6WxkdDeVWTomoONcz9pXnHWXgDQmVddcdZedLrHaw615SIiagg8ciAiIhkWByIikmFxICIiGRYHIiKSYXEgIiIZFgciIpJhcSAiIhkWByIikmFxICIiGRYHIiKSYXEgIiIZFgciIpJhcSAiIhkWByIikmnUjwnt37+/qSPopdRcRNR4NOri0LdvX1NH0EupuYio8eBpJSIikmFxICIiGRYHIiKSYXEgIiIZFgciIpJhcSAiIhkWByIikmFxICIiGRYHIiKSYXEgIiIZFgciIpJhcSAiIhkWByIikmFxICIiGRYHIiKSYXEgIiIZFgciIpJp1E+Cq01V6XUUZ+3VmQagM+/O9oBD/QcjIqpnLA41aN++vWzetWuWAIDHHnushlc56H0dEdGDRiWEEKYOQUREysJrDkREJMPiQEREMiwOREQkw+JAREQyLA5ERCTD4kBERDIsDkREJPPA/xFcZWUl1Gq1qWMQET2QHB0dYWEhLwUPfHFQq9Xo16+fqWMQET2Q9u7dCycnJ9n8B/4vpO/lyEGtVmPMmDHYsGEDHB0d73Oy+0PpGZWeD2DG+0Hp+QDlZ1Rqvof2yMHCwkJv1asLR0fHe15HfVN6RqXnA5jxflB6PkD5GZWeT4sXpImISIbFgYiIZFgciIhIplEXhxYtWmDGjBlo0aKFqaPUSOkZlZ4PYMb7Qen5AOVnVHq+Oz3wdysREdH916iPHIiISD8WByIikmk0xeG7777DwIED4evriw0bNsiWp6amIiQkBH5+foiOjkZlZaXiMu7ZswdBQUEYPHgwwsPDcePGDUXl09q/fz/69u3bgMn+p7aMGRkZGDduHAYPHoxJkyYp7j08ffo0hg4disGDB2Pq1KkoLCxs0HxaRUVFCAgIwOXLl2XLlNBXDOUzdT/RMpRRy5R9pVaiEVCr1cLb21tcu3ZN3Lp1SwQGBopz587ptBk0aJA4ceKEEEKIuXPnig0bNigq482bN0XPnj2FWq0WQgixbNkyERcXp5h8Wnl5ecLf3194e3s3WDZjM2o0GuHr6ysOHDgghBDivffeE0uWLFFMPiGEGDVqlNi/f78QQojFixeLDz74oMHyaf32228iICBAuLu7i0uXLsmWm7qvGMpn6n5iTEYtU/YVYzSKI4eUlBS88MILsLW1hY2NDfz8/JCcnCwtz87ORmlpKTw9PQEAISEhOsuVkLGiogKxsbFwcHAAALi5ueHKlSuKyacVExODGTNmNFiu6mrLePr0adjY2KBPnz4AgGnTpmHMmDGKyQcAGo0Gt27dAgCUlJTA2tq6wfJpbdq0CbGxsbC3t5ctU0JfMZTP1P1Ey1BGLVP2FWM88MNnGCM3Nxd2dnbStL29PU6ePFnjcjs7O+Tk5Cgq42OPPYb+/fsDAEpLS7Fq1SqMGzdOMfkA4Msvv0Tnzp3RtWvXBstVXW0ZL168iMcffxzz5s1Damoq2rdvj/nz5ysmHwBERUVh4sSJWLRoEZo2bYpNmzY1WD6thQsX1rhMCX3FUD5T9xMtQxkB0/cVYzSKIweNRgOVSiVNCyF0pmtbroSMWjdv3sSUKVPQsWNHBAcHKyZfeno6du3ahfDw8AbLdKfaMlZWVuLIkSMYNWoUEhMT0aZNG8THxysmX2lpKaKjo7F27VocPHgQo0ePRmRkZIPlM4YS+ooxTNVPjKGEvmKMRlEcHB0dkZeXJ03n5eXpHO7dufzq1asGDwdNkRG4/a1t9OjRcHNzq/WbSUPnS05ORl5eHoYOHYopU6ZIWZWU0c7ODs7OzvDw8AAABAQEyL65mzJfeno6rKys8MwzzwAARo4ciSNHjjRYPmMooa/UxpT9xBhK6CvGaBTFoUePHjh06BAKCgpQUlKCXbt2SeedAaB169awsrLC8ePHAQBJSUk6y5WQsaqqCtOmTcOAAQMQHR3d4N/Wass3a9Ys7Ny5E0lJSVi1ahXs7e3x1VdfKSpjt2
7dUFBQgLS0NADAvn374O7urph8zs7OUKvVyMjIAHB7nH1tIVMKJfQVQ0zdT4yhhL5ijEZxzcHBwQEREREICwtDRUUFhg0bhmeeeQaTJ0/GrFmz4OHhgYSEBMTExKCoqAju7u4ICwtTVEa1Wo0zZ86gqqoKO3fuBAB06dKlwb4ZGfMempoxGVeuXImYmBiUlJTA0dERS5YsUVS+xYsX4+9//zuEEGjVqhUWLVrUYPkMUVJfMZTP1P3EECX1FWM88MNnaB/2U9MDK4iIqO4e+NNK2seE8jnSRET3zwNfHIiI6P5jcSAiIhkWByIikmFxICIiGRYHAxLWH0fC+uOmjkFE1OB476cBJWUNPxQxEZES8MihDngkQUSNBY8c6qD6kYS2SMwe62WqOERE9YbF4S7xlBMRPcx4WomIiGRYHIiISIbFgYiIZOq9OLz77ruIiooCcPsZuoGBgfD19cXSpUulNqmpqQgJCYGfnx+io6NRWcnz+UREplSvxeHQoUNITEwEcPsRiPPmzcNHH32EHTt24NSpUzhw4AAA4I033sCCBQuwc+dOCCFM8txcIiL6n3orDtevX8fSpUsxbdo0AMDJkyfh7OyMNm3awMLCAoGBgUhOTkZ2djZKS0vh6ekJAAgJCUFycrLedRYWFuLy5cs6/xpqqO7VW0/J5llbmmPlN783yPaJiBpSvd3KumDBAkRERODKlSsAbj/X1c7OTlpub2+PnJwc2Xw7Ozvk5OToXecXX3yBFStW1Fdkg0rL9Z/q4i2tRPQwqpfi8M033+CJJ57Aiy++iC1btgAANBqNzvNchRBQqVQ1ztdn/PjxCA4O1pmnVqsxZsyYetgLIqLGq16Kw44dO5CXl4egoCDcuHEDxcXFyM7Ohrm5udQmLy8P9vb2cHR0RF5enjT/6tWrsLe317veFi1aoEWLFvURmYiIqqmX4vD5559LP2/ZsgVHjhzBW2+9BV9fX2RlZcHJyQnbt2/H0KFD0bp1a1hZWeH48ePw8vJCUlIS+vTpUx+xiIjISA32dw5WVlaIj4/HzJkzMXDgQLRv3x7+/v4AgISEBCxevBj+/v4oLi5GWFhYQ8Wqd7xgTUQPonofWykkJAQhISEAgBdffBHbtm2TtenYsSM2b95c31FMghesiehBxL+QJiIiGRYHIiKSYXEgIiIZFgciIpJhcSAiIhkWh2p42ykR0W0sDtXwtlMiottYHIiISIbFgYiIZFgciIhIhsWBiIhkWByIiEiGxYGIiGRYHIiISMao4jBv3jzZvFmzZt33MEREpAwGn+cQGxuLnJwcHD9+HAUFBdL8yspKXLp0qd7DERGRaRgsDsOGDcO5c+dw9uxZ+Pn5SfPNzc3h6elZ39mIiMhEDBYHDw8PeHh4oEePHnB0dGyoTCaxeuspU0cgIlIMox4TeuXKFbzxxhu4ceMGhBDS/O+++67egjW00nLdcZVYLIioMTOqOCxYsAAhISHo3LkzVCpVfWdShDuLBRFRY2JUcbCwsMCECRPqO8sDLWH9cQDA7LFeJk5CRHTvjLqVtUOHDjh79mx9Z3mglZRVQgjBZ0IQ0UPBqCOHS5cuYejQoXjyySdhZWUlzX+YrjkAgLWlORLWH4e1pfldr4PPhCCih4FRxSEiIqK+cyiG9gigqZVRbw0R0UPJqN+Arq6u9Z2DiIgUxKji8MILL0ClUkEIId2tZGdnh59++qlewxERkWkYVRzS0tKkn8vLy7F9+3ZcuHCh3kIREZFp1XlUVktLS4SEhOCXX36pjzwPPO1Fbe2trUREDyKjjhyuX78u/SyEwKlTp1BYWFhfmR54vGOJiB50db7mAACtWrVCdHR0vQYjIiLTqfM1ByIievgZVRw0Gg3WrFmDn376CZWVlejZsyemTZsGCwv+LQAR0cPIqAvS77//Pn799VeMHz8eEyZMwIkTJ7BkyZJaX7dixQoMGjQIgwYNktqnpKQgMDAQvr6+WLp0qdQ2NTUVISEh8PPzQ3
R0NCored6eiMhUjCoOP//8Mz755BP4+PjA19cXH3/8ca1/45CSkoKDBw8iMTERW7duxenTp7F9+3bMmzcPH330EXbs2IFTp07hwIEDAIA33ngDCxYswM6dOyGEwKZNm+5974iI6K4YVRyEEGjSpIk0bWlpqTOtj52dHaKioqS2Tz31FDIzM+Hs7Iw2bdrAwsICgYGBSE5ORnZ2NkpLS6Wny4WEhCA5Ofnu94qIiO6JURcNOnbsiEWLFmHs2LFQqVRYt25drUNqdOjQQfo5MzMTP/zwA8aOHQs7Oztpvr29PXJycpCbm6sz387ODjk5ObJ1FhYWym6hVavVxuwCERHVgVHFITY2Fu+88w5CQ0Oh0WjQu3dvzJ8/36gNnDt3DlOnTsWcOXNgbm6OzMxMaZl2OA6NRqPzEKHqw3RU98UXX2DFihVGbZeIiO6eweJQXl6O+fPnw8fHB/Hx8QCAKVOmwNzcHM2bN6915cePH8esWbMwb948DBo0CEeOHEFeXp60PC8vD/b29nB0dNSZf/XqVdjb28vWN378eAQHB+vMU6vVGDNmTK1ZiIjIeAavOSxfvhxFRUXo3r27NC8uLg6FhYX48MMPDa74ypUrmD59OhISEjBo0CAAQNeuXXHhwgVkZWWhqqoK27dvR58+fdC6dWtYWVnh+PHbQ04kJSWhT58+snW2aNECTk5OOv8cHR3rvNNERGSYwSOH/fv3Y/PmzbC2tpbmOTg4YMmSJRg5cqTB5zysWbMGZWVl0hEHAISGhiI+Ph4zZ85EWVkZXnrpJfj7+wMAEhISEBMTg6KiIri7uyMsLOxe942IiO6SweLQpEkTncKg1bx5c1haWhpccUxMDGJiYvQu27Ztm2xex44dsXnzZoPrJCKihmHwtJKZmRmKiopk84uKivhHakREDzGDxSEgIAAxMTEoLi6W5hUXFyMmJga+vr71Hk4JrC3NsXrrKVPHICJqUAZPK40fPx6xsbHo2bMnOnToAI1Gg/PnzyMwMBDTp09vqIz1KmH9cVhbmhtsU1rOoyQialwMFgczMzPExcVh2rRpOH36NMzMzPDMM8/ovc30QVVSVgkhBJpacRBBIiIto34jtm7dGq1bt67vLEREpBB1fkwoERE9/Fgc6ogXqImoMWBxuAu8QE1EDzsWByIikmFxICIiGRYHIiKSYXEgIiIZFgciIpJhcSAiIhkWByIikmFxICIiGRYHIiKSYXEgIiIZFgciIpJhcSAiIhkWByIikmFxICIiGRYHIiKSYXG4R3z4DxE9jFgc7gM+/IeIHjYWpg7Q2CSsPw5rS3OoVCpMH97V1HGIiPRicahn1pbmSFh/HAAwe6wXSsoqIYSASqUycTIiopqxODSAkjKediKiBwuvORARkQyLAxERyfC0Uj258xbXmm555QVqIlIiHjnUoztvcdV3y2tJWaX0ryYrv/n9vmcjIjKExcFE9B1ZJKw/Lt3ZVL0g8II2ETU0RRWH7777DgMHDoSvry82bNhg6jj17s4jCe1trqu3nmJBICKTUsw1h5ycHCxduhRbtmyBpaUlQkND8fzzz+Ppp582dbQGV1peKR1JWFua6/xs6NrEym9+53ULIrovFFMcUlJS8MILL8DW1hYA4Ofnh+TkZMyYMUNqU1hYiMLCQp3XZWdnAwDUavVdbbey5BoqKs1gVmHx/9Pl0nT1n02x7GZxOcqbmMGqiYX0s0qlwuXLl7FuRyoAYNzATv97f67l1rhMy9AyImp8HB0dYWEhLwUqIYQwQR6ZTz/9FMXFxYiIiAAAfPPNNzh58iTi4uKkNh9++CFWrFhhqohERA+dvXv3wsnJSTZfMUcOGo1GZ0gJfUNMjB8/HsHBwTrzysvLcenSJbRr1w7m5uZ12qZarcaYMWOwYcMGODo63n34eqT0jErPBzDj/aD0fIDyMyo1X01ZFFMcHB0dcezYMWk6Ly8P9vb2Om1atGiBFi1ayF7bvn37e962vsqpJErPqPR8AD
PeD0rPByg/o9LzaSnmbqUePXrg0KFDKCgoQElJCXbt2oU+ffqYOhYRUaOkmCMHBwcHREREICwsDBUVFRg2bBieeeYZU8ciImqUFFMcACAwMBCBgYGmjkFE1Ogp5rSSKbRo0QIzZszQex1DKZSeUen5AGa8H5SeD1B+RqXnu5NibmUlIiLlaNRHDkREpB+LAxERyTSa4lDboH6pqakICQmBn58foqOjUVnZ8APf1ZZxz549CAoKwuDBgxEeHo4bN24oKp/W/v370bdv3wZM9j+1ZczIyMC4ceMwePBgTJo0SXHv4enTpzF06FAMHjwYU6dOlQ0X01CKiooQEBCAy5cvy5Ypoa8YymfqfqJlKKOWKftKrUQjoFarhbe3t7h27Zq4deuWCAwMFOfOndNpM2jQIHHixAkhhBBz584VGzZsUFTGmzdvip49ewq1Wi2EEGLZsmUiLi5OMfm08vLyhL+/v/D29m6wbMZm1Gg0wtfXVxw4cEAIIcR7770nlixZoph8QggxatQosX//fiGEEIsXLxYffPBBg+XT+u2330RAQIBwd3cXly5dki03dV8xlM/U/cSYjFqm7CvGaBRHDtUH9bOxsZEG9dPKzs5GaWkpPD09AQAhISE6y5WQsaKiArGxsXBwcAAAuLm54cqVK4rJpxUTE6MzWGJDqi3j6dOnYWNjI/1x5bRp0zBmzBjF5ANuDyNz69YtAEBJSQmsra0bLJ/Wpk2bEBsbKxuhAFBGXzGUz9T9RMtQRi1T9hVjKOrvHOpLbm4u7OzspGl7e3ucPHmyxuV2dnbIyclRVMbHHnsM/fv3BwCUlpZi1apVGDdunGLyAcCXX36Jzp07o2tX0wwbXlvGixcv4vHHH8e8efOQmpqK9u3bY/78+YrJBwBRUVGYOHEiFi1ahKZNm2LTpk0Nlk9r4cKFNS5TQl8xlM/U/UTLUEbA9H3FGI3iyKG2Qf2MGfTP1Bm1bt68iSlTpqBjx46yQQhNmS89PR27du1CeHh4g2W6U20ZKysrceTIEYwaNQqJiYlo06YN4uPjFZOvtLQU0dHRWLt2LQ4ePIjRo0cjMjKywfIZQwl9xRim6ifGUEJfMUajKA6Ojo7Iy8uTpu8c1O/O5VevXjV4OGiKjMDtb22jR4+Gm5tbrd9MGjpfcnIy8vLyMHToUEyZMkXKqqSMdnZ2cHZ2hoeHBwAgICBA9s3dlPnS09NhZWUlDRszcuRIHDlypMHyGUMJfaU2puwnxlBCXzFGoygOtQ3q17p1a1hZWeH48dvPb05KSmrwQf9qy1hVVYVp06ZhwIABiI6ObvBva7XlmzVrFnbu3ImkpCSsWrUK9vb2+OqrrxSVsVu3bigoKEBaWhoAYN++fXB3d1dMPmdnZ6jVamRkZAC4Pc6+tpAphRL6iiGm7ifGUEJfMUajuOZQ06B+kydPxqxZs+Dh4YGEhATExMSgqKgI7u7uCAsLU1RGtVqNM2fOoKqqCjt37gQAdOnSpcG+GRnzHpqaMRlXrlyJmJgYlJSUwNHREUuWLFFUvsWLF+Pvf/87hBBo1aoVFi1a1GD5DFFSXzGUz9T9xBAl9RVjcPgMIiKSaRSnlYiIqG5YHIiISIbFgYiIZFgciIhIhsWBiOgBZczgfsDdDTjJ4kBE9AD6/fffMWrUKGRmZhpsJ4TAa6+9hsmTJ2Pbtm3o1KkTVq1aVev6WRyoUTp8+DACAgLqfTtTp07Fli1b9C4LCgoy2ZDc9ODTN7jf1q1bERwcjKCgIMybNw9lZWV3PeBko/gjOCIlSkpKMnUEeoDd+Yd9586dw6ZNm7Bx40ZYWVnh/fffx5o1a9CuXbu7GnCSxYEavWPHjmH27NkIDQ3Fjz/+iCeeeAIXLlxA06ZNMWXKFKxbtw4XLlyAr68v5s2bZ3BdOTk5iIqKQm5uLp588knk5+dLy7p06YJ+/fohLS0NCQkJGDZsGA4dOoTw8HBMmDABfn5+AID33nsPAPDGG2/gm2++wX/+8x
9oNBrY2tpi/vz5eOqppxAVFYXmzZvj7NmzUKvVcHNzw7vvvotmzZrV3xtFinb48GFkZWVhxIgRAG4PX965c2c4OTnhyJEjWL9+PTw8PLBs2TLEx8fXPuikiZ4jQWRSv/76qxg0aJA4dOiQ8PHxEampqeLXX38VnTp1EqdPnxZCCDFp0iQxcuRIUVZWJvLz84W7u7v0EJmahIeHi6VLlwohhMjMzBSenp7i22+/FUII4erqKhITE6W2rq6uIj8/X2zevFlMmTJFCCFEZWWl6NWrl7hw4YI4fPiwGD16tCguLhZCCPHzzz8Lf39/IYQQkZGRUrby8nIxZMgQsXnz5vv5FtEDwtvbW1y6dEmsXbtW58FGRUVF4saNGyIlJUUEBgZK88+dOycGDBhQ63p5zYEaLbVajWnTpsHHxwcdO3YEADg5OaFz584AgLZt2+L555+HpaUlWrZsiWbNmtV6l0dKSgpCQkIA3B5I7/nnn9dZ/uyzz8peM3DgQPz222/Iy8vDwYMH0a5dO7Rr1w779+9HVlYWQkNDERQUhPfeew+FhYW4fv06AKB3796wtLREkyZN4OrqarLHYZIyPP/889i9ezfy8/MhhMCbb76JL7744q4HnORpJWq0zM3NsWrVKoSHh8Pf3x8AYGlpqdPGwqJuXUSlUkFUG67sztfb2NjIXtO0aVP4+flh+/btOHHiBIYPHw7g9rMTgoKC8MYbb0jTubm5ePTRRwFA5ylxd26XGp+OHTtixowZGD9+PDQaDTp16oQpU6bAysrqrgac5JEDNVp2dnbo3r07IiMjMWfOHJSWlt7zOnv37o2vv/4aAPDXX3/h8OHDRr1uxIgRSExMxH//+1/p2kOvXr3w/fffIzc3FwDwn//8B+PHj7/njPRw2bdvH5ycnAAAw4cPx/bt27Fjxw68//77sLKyAgB07doVmzdvxvfff481a9agVatWta6XRw7U6AUHB2Pnzp2Ij4+Hubn5Pa0rNjYWc+fOxYABA+Do6CidrqpNly5dYG5uDn9/f6lD9+rVC5MnT8bEiROhUqnQvHlzrFixQpHPKKCHD4fsJiIiGR45ENVBRkYGIiIi9C5zcXHBsmXLGjYQUT3hkQMREcnwgjQREcmwOBARkQyLAxERybA4EBGRDIsDERHJ/B8FQ/2acnchpQAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"c = ['selling_price','km_driven','mileage','engine', 'max_power', 'seats', 'age']\n",
"\n",
"for i in c:\n",
"\n",
" sns.set(style=\"ticks\")\n",
"\n",
" x = df1[i]\n",
" coluna = i\n",
" mu = round(x.mean(),2) # mean of distribution\n",
" sigma = round(x.std(),2) # standard deviation of distribution\n",
"\n",
" f, (ax_box, ax_hist) = plt.subplots(2)\n",
"\n",
" sns.boxplot(x=x, ax=ax_box)\n",
" sns.histplot(x=x, ax=ax_hist)\n",
"\n",
" ax_box.set(yticks=[])\n",
" sns.despine(ax=ax_hist)\n",
" sns.despine(ax=ax_box, left=True)\n",
" ax_box.set_title('Boxplot e Histograma de {}\\n $\\mu={}$, $\\sigma={}$'.format(coluna, mu,sigma))\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1N0OOpfAPInk"
},
"source": [
"Nota se que a variavel que devemos prever, selling price tem uma altissima variancia e quase todas as variaveis quantitativas possuem outliers"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "COvVJeHx7O7J"
},
"source": [
"## Preparacao dos dados"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Ix5D0hVs7Sat"
},
"source": [
"antes de fazer a feature selection vamos normalizar e separar o que deve ser predito das variaveis
\n",
"duvida: deve ser normalizado antes da feature selection?"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "ZgSy4H1A6bix",
"outputId": "7fbd4c89-9af6-4192-b1fa-ccfb0408f4d3"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" selling_price | \n",
" km_driven | \n",
" fuel | \n",
" seller_type | \n",
" transmission | \n",
" owner | \n",
" mileage | \n",
" engine | \n",
" max_power | \n",
" seats | \n",
" brand | \n",
" age | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 459999 | \n",
" 87000 | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 20.77 | \n",
" 1248 | \n",
" 88.76 | \n",
" 7.0 | \n",
" 20 | \n",
" 9 | \n",
"
\n",
" \n",
" 2 | \n",
" 1100000 | \n",
" 102000 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 19.62 | \n",
" 1995 | \n",
" 187.74 | \n",
" 5.0 | \n",
" 3 | \n",
" 11 | \n",
"
\n",
" \n",
" 3 | \n",
" 229999 | \n",
" 212000 | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" 4 | \n",
" 11.57 | \n",
" 2179 | \n",
" 138.10 | \n",
" 7.0 | \n",
" 26 | \n",
" 12 | \n",
"
\n",
" \n",
" 4 | \n",
" 800000 | \n",
" 125000 | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" 2 | \n",
" 11.50 | \n",
" 2982 | \n",
" 171.00 | \n",
" 7.0 | \n",
" 27 | \n",
" 11 | \n",
"
\n",
" \n",
" 5 | \n",
" 180000 | \n",
" 25000 | \n",
" 3 | \n",
" 1 | \n",
" 1 | \n",
" 2 | \n",
" 19.70 | \n",
" 796 | \n",
" 46.30 | \n",
" 5.0 | \n",
" 20 | \n",
" 11 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" selling_price km_driven fuel seller_type transmission owner mileage \\\n",
"1 459999 87000 1 1 1 0 20.77 \n",
"2 1100000 102000 1 0 0 0 19.62 \n",
"3 229999 212000 1 1 1 4 11.57 \n",
"4 800000 125000 1 1 1 2 11.50 \n",
"5 180000 25000 3 1 1 2 19.70 \n",
"\n",
" engine max_power seats brand age \n",
"1 1248 88.76 7.0 20 9 \n",
"2 1995 187.74 5.0 3 11 \n",
"3 2179 138.10 7.0 26 12 \n",
"4 2982 171.00 7.0 27 11 \n",
"5 796 46.30 5.0 20 11 "
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1.head()"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"id": "LQQKL0gmd18L"
},
"outputs": [],
"source": [
"colunas = df1.iloc[:,1:].columns"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "_SeReaNcd18L",
"outputId": "f1f6102a-b648-4da9-b871-bfb84d6f7e56"
},
"outputs": [
{
"data": {
"text/plain": [
"Index(['km_driven', 'fuel', 'seller_type', 'transmission', 'owner', 'mileage',\n",
" 'engine', 'max_power', 'seats', 'brand', 'age'],\n",
" dtype='object')"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"colunas"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ext0cWqB8cgt"
},
"source": [
"### Padronizacao"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CZ4C5y6O9aA2"
},
"source": [
"Vamos padronizar a base a ser predita tbm"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IsvhHjXn-o5R"
},
"source": [
"Transformar para formato numpy para nao termos erro na normalizacao"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"id": "ACl7ENAn-n9A"
},
"outputs": [],
"source": [
"data = df1.to_numpy()\n",
"nrow,ncol = df1.shape\n",
"y = data[:,:1]\n",
"X = data[:,1:]"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"id": "iJsZnUw78OMB"
},
"outputs": [],
"source": [
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.preprocessing import MinMaxScaler\n",
"\n",
"scaler_train = StandardScaler()\n",
"#scaler_train = MinMaxScaler()\n",
"X = scaler_train.fit_transform(X)\n",
"\n",
"#Padronizando os precos tbm\n",
"#scaler_train = StandardScaler()\n",
"#scaler_train = MinMaxScaler()\n",
"#y = scaler_train.fit_transform(y)\n",
"\n",
"#Vamos padronizar o teste tbm\n",
"scaler_train = StandardScaler()\n",
"#scaler_train = MinMaxScaler()\n",
"df_test1 = scaler_train.fit_transform(df_test)"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "NLazgycV_lMt",
"outputId": "a5ce6fec-1ce7-47f3-c70b-0e20bca0c086"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(5530, 1)\n",
"(5530, 11)\n"
]
}
],
"source": [
"print(y.shape)\n",
"print(X.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JYaEaiFVABkO"
},
"source": [
"### Separacao treino e teste"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"id": "YXq9T9u6_nPr"
},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 4)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JjpYugqyXiqi"
},
"source": [
"## Catboost"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"id": "d7sVQJfTd18O"
},
"outputs": [],
"source": [
"from catboost import CatBoostRegressor\n",
"from sklearn.model_selection import GridSearchCV"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"id": "-09m2Xk4VaE5"
},
"outputs": [],
"source": [
" parameters = {'depth' : [6,8,10],\n",
" 'learning_rate' : [0.01, 0.05, 0.1],\n",
" 'iterations' : [30, 50, 100]\n",
" }\n",
"model_CBR = CatBoostRegressor(logging_level='Silent')\n",
"\n",
"eval_set=[(X_train, y_train), (X_test, y_test)]"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "0YdBmrweVXEL",
"outputId": "3972120d-5206-49d7-8b7e-357a8c2b5888"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Melhor modelo: \n",
"Melhor score: 0.9636405747742944\n",
"Wall time: 4min\n"
]
}
],
"source": [
"%%time\n",
"\n",
"grid = GridSearchCV(estimator=model_CBR, param_grid = parameters, cv = 10, n_jobs=-1, scoring='r2')\n",
"grid.fit(X_train, y_train, eval_set=eval_set, early_stopping_rounds=10)\n",
"#grid.fit(X_train, y_train)\n",
"\n",
"\n",
"\n",
"print(\"Melhor modelo: {}\".format(grid.best_estimator_))\n",
"print(\"Melhor score: {}\".format(grid.best_score_))"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "IU8Un4NLdD1q",
"outputId": "624e751a-7ca1-472a-fd94-52649df8585f"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.9525834925856915\n"
]
}
],
"source": [
"from sklearn.metrics import r2_score\n",
"y_predict = grid.predict(X_test)\n",
"\n",
"#rmse = np.sqrt(mean_squared_error(y_test,y_linear_pred))\n",
"r2 = r2_score(y_test,y_predict)\n",
"print(r2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LTxJ8eIPd18Q"
},
"source": [
"## Feature Selection"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RAfusSJJd18Q"
},
"source": [
"https://www.analyseup.com/learn-python-for-data-science/python-random-forest-feature-importance-plot.html"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"id": "DB-oGTwUd18R"
},
"outputs": [],
"source": [
"def plot_feature_importance(importance,names,model_type):\n",
" \n",
" #Create arrays from feature importance and feature names\n",
" feature_importance = np.array(importance)\n",
" feature_names = np.array(names)\n",
" \n",
" #Create a DataFrame using a Dictionary\n",
" data={'feature_names':feature_names,'feature_importance':feature_importance}\n",
" fi_df = pd.DataFrame(data)\n",
" \n",
" #Sort the DataFrame in order decreasing feature importance\n",
" fi_df.sort_values(by=['feature_importance'], ascending=False,inplace=True)\n",
" \n",
" #Define size of bar plot\n",
" plt.figure(figsize=(10,8))\n",
" #Plot Searborn bar chart\n",
" sns.barplot(x=fi_df['feature_importance'], y=fi_df['feature_names'])\n",
" #Add chart labels\n",
" plt.title(model_type + 'FEATURE IMPORTANCE')\n",
" plt.xlabel('FEATURE IMPORTANCE')\n",
" plt.ylabel('FEATURE NAMES')"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 518
},
"id": "1MBOGDiYd18R",
"outputId": "f9bd20d3-d7ee-4be8-aa9f-eddcb40308ff"
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot_feature_importance(grid.best_estimator_.get_feature_importance(),colunas,'CATBOOST ')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "18Bt_fdad18S"
},
"source": [
"### Vamos tirar as colunas com menos importancia"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"id": "_pPo9ZGVd18S"
},
"outputs": [],
"source": [
"df1.head()\n",
"c = ['owner', 'seats', 'seller_type']\n",
"df2 = df1.drop(labels = c, axis = 1)\n",
"\n",
"df_test2 = df_test.drop(labels = c, axis = 1)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 419
},
"id": "-Vtj_hPed18T",
"outputId": "09a8690d-fdd8-470f-eb60-a1db5b23939e"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" selling_price | \n",
" km_driven | \n",
" fuel | \n",
" transmission | \n",
" mileage | \n",
" engine | \n",
" max_power | \n",
" brand | \n",
" age | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 459999 | \n",
" 87000 | \n",
" 1 | \n",
" 1 | \n",
" 20.77 | \n",
" 1248 | \n",
" 88.76 | \n",
" 20 | \n",
" 9 | \n",
"
\n",
" \n",
" 2 | \n",
" 1100000 | \n",
" 102000 | \n",
" 1 | \n",
" 0 | \n",
" 19.62 | \n",
" 1995 | \n",
" 187.74 | \n",
" 3 | \n",
" 11 | \n",
"
\n",
" \n",
" 3 | \n",
" 229999 | \n",
" 212000 | \n",
" 1 | \n",
" 1 | \n",
" 11.57 | \n",
" 2179 | \n",
" 138.10 | \n",
" 26 | \n",
" 12 | \n",
"
\n",
" \n",
" 4 | \n",
" 800000 | \n",
" 125000 | \n",
" 1 | \n",
" 1 | \n",
" 11.50 | \n",
" 2982 | \n",
" 171.00 | \n",
" 27 | \n",
" 11 | \n",
"
\n",
" \n",
" 5 | \n",
" 180000 | \n",
" 25000 | \n",
" 3 | \n",
" 1 | \n",
" 19.70 | \n",
" 796 | \n",
" 46.30 | \n",
" 20 | \n",
" 11 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 5684 | \n",
" 550000 | \n",
" 20000 | \n",
" 3 | \n",
" 1 | \n",
" 18.90 | \n",
" 1197 | \n",
" 82.00 | \n",
" 11 | \n",
" 4 | \n",
"
\n",
" \n",
" 5685 | \n",
" 360000 | \n",
" 81000 | \n",
" 1 | \n",
" 1 | \n",
" 19.01 | \n",
" 1461 | \n",
" 108.45 | \n",
" 24 | \n",
" 8 | \n",
"
\n",
" \n",
" 5686 | \n",
" 310000 | \n",
" 70000 | \n",
" 1 | \n",
" 1 | \n",
" 19.30 | \n",
" 1248 | \n",
" 73.90 | \n",
" 20 | \n",
" 10 | \n",
"
\n",
" \n",
" 5687 | \n",
" 650000 | \n",
" 57000 | \n",
" 1 | \n",
" 1 | \n",
" 23.65 | \n",
" 1248 | \n",
" 88.50 | \n",
" 20 | \n",
" 6 | \n",
"
\n",
" \n",
" 5688 | \n",
" 420000 | \n",
" 90000 | \n",
" 1 | \n",
" 1 | \n",
" 24.40 | \n",
" 1120 | \n",
" 71.01 | \n",
" 11 | \n",
" 7 | \n",
"
\n",
" \n",
"
\n",
"
5530 rows × 9 columns
\n",
"
"
],
"text/plain": [
" selling_price km_driven fuel transmission mileage engine max_power \\\n",
"1 459999 87000 1 1 20.77 1248 88.76 \n",
"2 1100000 102000 1 0 19.62 1995 187.74 \n",
"3 229999 212000 1 1 11.57 2179 138.10 \n",
"4 800000 125000 1 1 11.50 2982 171.00 \n",
"5 180000 25000 3 1 19.70 796 46.30 \n",
"... ... ... ... ... ... ... ... \n",
"5684 550000 20000 3 1 18.90 1197 82.00 \n",
"5685 360000 81000 1 1 19.01 1461 108.45 \n",
"5686 310000 70000 1 1 19.30 1248 73.90 \n",
"5687 650000 57000 1 1 23.65 1248 88.50 \n",
"5688 420000 90000 1 1 24.40 1120 71.01 \n",
"\n",
" brand age \n",
"1 20 9 \n",
"2 3 11 \n",
"3 26 12 \n",
"4 27 11 \n",
"5 20 11 \n",
"... ... ... \n",
"5684 11 4 \n",
"5685 24 8 \n",
"5686 20 10 \n",
"5687 20 6 \n",
"5688 11 7 \n",
"\n",
"[5530 rows x 9 columns]"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"id": "BtCfx2SUd18T"
},
"outputs": [],
"source": [
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.preprocessing import MinMaxScaler\n",
"\n",
"data = df2.to_numpy()\n",
"nrow,ncol = df2.shape\n",
"y = data[:,:1]\n",
"X = data[:,1:]\n",
"\n",
"scaler_train = StandardScaler()\n",
"#scaler_train = MinMaxScaler()\n",
"X = scaler_train.fit_transform(X)\n",
"\n",
"#Vamos padronizar o teste tbm\n",
"scaler_train = StandardScaler()\n",
"#scaler_train = MinMaxScaler()\n",
"df_test2 = scaler_train.fit_transform(df_test2)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"id": "SWexPDOSd18T"
},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 4)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"id": "C4brA94xd18U"
},
"outputs": [],
"source": [
"parameters = {'depth' : [6,8,10],\n",
" 'learning_rate' : [0.01, 0.05, 0.1],\n",
" 'iterations' : [30, 50, 100]\n",
" }\n",
"model_CBR = CatBoostRegressor()\n",
"\n",
"eval_set=[(X_train, y_train), (X_test, y_test)]"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "_vwzeXXid18U",
"outputId": "1d1613d5-b01b-422e-8fdf-61f836fdf0e2"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0:\tlearn: 725099.3360888\ttest: 725099.3360888\ttest1: 725838.2268251\tbest: 725838.2268251 (0)\ttotal: 41.3ms\tremaining: 4.09s\n",
"1:\tlearn: 667962.7502607\ttest: 667962.7502607\ttest1: 670005.4649577\tbest: 670005.4649577 (1)\ttotal: 47.4ms\tremaining: 2.32s\n",
"2:\tlearn: 618026.2240742\ttest: 618026.2240742\ttest1: 621886.9502452\tbest: 621886.9502452 (2)\ttotal: 54.6ms\tremaining: 1.76s\n",
"3:\tlearn: 570637.0627416\ttest: 570637.0627416\ttest1: 578186.2612178\tbest: 578186.2612178 (3)\ttotal: 60.7ms\tremaining: 1.46s\n",
"4:\tlearn: 527755.3300552\ttest: 527755.3300552\ttest1: 536947.9038419\tbest: 536947.9038419 (4)\ttotal: 67.1ms\tremaining: 1.27s\n",
"5:\tlearn: 490277.1990580\ttest: 490277.1990580\ttest1: 502595.3096877\tbest: 502595.3096877 (5)\ttotal: 73.5ms\tremaining: 1.15s\n",
"6:\tlearn: 453374.2521619\ttest: 453374.2521619\ttest1: 467777.7143207\tbest: 467777.7143207 (6)\ttotal: 80.3ms\tremaining: 1.07s\n",
"7:\tlearn: 420792.5779912\ttest: 420792.5779912\ttest1: 437861.6946392\tbest: 437861.6946392 (7)\ttotal: 90.2ms\tremaining: 1.04s\n",
"8:\tlearn: 393043.0440251\ttest: 393043.0440251\ttest1: 411090.6830783\tbest: 411090.6830783 (8)\ttotal: 97.3ms\tremaining: 984ms\n",
"9:\tlearn: 367221.8236830\ttest: 367221.8236830\ttest1: 386919.3392056\tbest: 386919.3392056 (9)\ttotal: 107ms\tremaining: 966ms\n",
"10:\tlearn: 343181.0694443\ttest: 343181.0694443\ttest1: 364792.0635716\tbest: 364792.0635716 (10)\ttotal: 118ms\tremaining: 957ms\n",
"11:\tlearn: 322896.6381826\ttest: 322896.6381826\ttest1: 345040.7307973\tbest: 345040.7307973 (11)\ttotal: 125ms\tremaining: 916ms\n",
"12:\tlearn: 302845.9618204\ttest: 302845.9618204\ttest1: 326860.5565415\tbest: 326860.5565415 (12)\ttotal: 135ms\tremaining: 902ms\n",
"13:\tlearn: 286641.4715800\ttest: 286641.4715800\ttest1: 312219.6312316\tbest: 312219.6312316 (13)\ttotal: 143ms\tremaining: 879ms\n",
"14:\tlearn: 270952.0744893\ttest: 270952.0744893\ttest1: 297170.7323797\tbest: 297170.7323797 (14)\ttotal: 153ms\tremaining: 869ms\n",
"15:\tlearn: 258475.3925918\ttest: 258475.3925918\ttest1: 286340.8212579\tbest: 286340.8212579 (15)\ttotal: 164ms\tremaining: 863ms\n",
"16:\tlearn: 246371.0312132\ttest: 246371.0312132\ttest1: 275079.9896599\tbest: 275079.9896599 (16)\ttotal: 172ms\tremaining: 840ms\n",
"17:\tlearn: 233887.0889817\ttest: 233887.0889817\ttest1: 264771.1551038\tbest: 264771.1551038 (17)\ttotal: 183ms\tremaining: 832ms\n",
"18:\tlearn: 222216.5469460\ttest: 222216.5469460\ttest1: 255136.6229553\tbest: 255136.6229553 (18)\ttotal: 189ms\tremaining: 807ms\n",
"19:\tlearn: 212494.4651008\ttest: 212494.4651008\ttest1: 246835.1105830\tbest: 246835.1105830 (19)\ttotal: 198ms\tremaining: 793ms\n",
"20:\tlearn: 204382.3914991\ttest: 204382.3914991\ttest1: 240594.7171910\tbest: 240594.7171910 (20)\ttotal: 206ms\tremaining: 775ms\n",
"21:\tlearn: 197076.2687288\ttest: 197076.2687288\ttest1: 234047.3517937\tbest: 234047.3517937 (21)\ttotal: 214ms\tremaining: 759ms\n",
"22:\tlearn: 190717.9316182\ttest: 190717.9316182\ttest1: 229118.7773587\tbest: 229118.7773587 (22)\ttotal: 224ms\tremaining: 750ms\n",
"23:\tlearn: 184265.7333922\ttest: 184265.7333922\ttest1: 224184.7455733\tbest: 224184.7455733 (23)\ttotal: 233ms\tremaining: 738ms\n",
"24:\tlearn: 179223.7916875\ttest: 179223.7916875\ttest1: 219812.4763748\tbest: 219812.4763748 (24)\ttotal: 240ms\tremaining: 719ms\n",
"25:\tlearn: 174527.9822117\ttest: 174527.9822117\ttest1: 214888.9196695\tbest: 214888.9196695 (25)\ttotal: 247ms\tremaining: 704ms\n",
"26:\tlearn: 170404.2989815\ttest: 170404.2989815\ttest1: 211053.6780763\tbest: 211053.6780763 (26)\ttotal: 254ms\tremaining: 687ms\n",
"27:\tlearn: 166863.7350093\ttest: 166863.7350093\ttest1: 208784.3367464\tbest: 208784.3367464 (27)\ttotal: 261ms\tremaining: 672ms\n",
"28:\tlearn: 163260.1751759\ttest: 163260.1751759\ttest1: 205808.8396433\tbest: 205808.8396433 (28)\ttotal: 270ms\tremaining: 661ms\n",
"29:\tlearn: 158926.3923587\ttest: 158926.3923587\ttest1: 202478.9809685\tbest: 202478.9809685 (29)\ttotal: 277ms\tremaining: 647ms\n",
"30:\tlearn: 155785.3469715\ttest: 155785.3469715\ttest1: 200400.2700177\tbest: 200400.2700177 (30)\ttotal: 285ms\tremaining: 635ms\n",
"31:\tlearn: 152749.5027848\ttest: 152749.5027848\ttest1: 198254.6641147\tbest: 198254.6641147 (31)\ttotal: 293ms\tremaining: 623ms\n",
"32:\tlearn: 150483.6658493\ttest: 150483.6658493\ttest1: 196047.9345613\tbest: 196047.9345613 (32)\ttotal: 300ms\tremaining: 609ms\n",
"33:\tlearn: 148229.8724341\ttest: 148229.8724341\ttest1: 194514.8600443\tbest: 194514.8600443 (33)\ttotal: 308ms\tremaining: 597ms\n",
"34:\tlearn: 146027.7013355\ttest: 146027.7013355\ttest1: 192266.2622987\tbest: 192266.2622987 (34)\ttotal: 314ms\tremaining: 584ms\n",
"35:\tlearn: 144245.4701191\ttest: 144245.4701191\ttest1: 191552.0168309\tbest: 191552.0168309 (35)\ttotal: 322ms\tremaining: 572ms\n",
"36:\tlearn: 142413.2536117\ttest: 142413.2536117\ttest1: 190117.7821396\tbest: 190117.7821396 (36)\ttotal: 331ms\tremaining: 563ms\n",
"37:\tlearn: 140688.3204281\ttest: 140688.3204281\ttest1: 188455.4328949\tbest: 188455.4328949 (37)\ttotal: 340ms\tremaining: 554ms\n",
"38:\tlearn: 139066.2852420\ttest: 139066.2852420\ttest1: 187131.2301431\tbest: 187131.2301431 (38)\ttotal: 346ms\tremaining: 541ms\n",
"39:\tlearn: 137603.2707736\ttest: 137603.2707736\ttest1: 185678.7582810\tbest: 185678.7582810 (39)\ttotal: 353ms\tremaining: 529ms\n",
"40:\tlearn: 136068.0198434\ttest: 136068.0198434\ttest1: 184205.7321537\tbest: 184205.7321537 (40)\ttotal: 362ms\tremaining: 521ms\n",
"41:\tlearn: 134378.2331538\ttest: 134378.2331538\ttest1: 183290.2839965\tbest: 183290.2839965 (41)\ttotal: 371ms\tremaining: 512ms\n",
"42:\tlearn: 132782.4541325\ttest: 132782.4541325\ttest1: 182169.0617821\tbest: 182169.0617821 (42)\ttotal: 378ms\tremaining: 501ms\n",
"43:\tlearn: 131790.3338328\ttest: 131790.3338328\ttest1: 181371.1566799\tbest: 181371.1566799 (43)\ttotal: 386ms\tremaining: 492ms\n",
"44:\tlearn: 130880.2599549\ttest: 130880.2599549\ttest1: 180683.8795768\tbest: 180683.8795768 (44)\ttotal: 393ms\tremaining: 480ms\n",
"45:\tlearn: 129808.2123748\ttest: 129808.2123748\ttest1: 180036.1907090\tbest: 180036.1907090 (45)\ttotal: 399ms\tremaining: 468ms\n",
"46:\tlearn: 129303.5785381\ttest: 129303.5785381\ttest1: 179738.2600447\tbest: 179738.2600447 (46)\ttotal: 407ms\tremaining: 459ms\n",
"47:\tlearn: 128048.3689670\ttest: 128048.3689670\ttest1: 178556.9712466\tbest: 178556.9712466 (47)\ttotal: 415ms\tremaining: 449ms\n",
"48:\tlearn: 127037.1538879\ttest: 127037.1538879\ttest1: 178171.7057705\tbest: 178171.7057705 (48)\ttotal: 421ms\tremaining: 438ms\n",
"49:\tlearn: 126176.0556530\ttest: 126176.0556530\ttest1: 177706.5702217\tbest: 177706.5702217 (49)\ttotal: 427ms\tremaining: 427ms\n",
"50:\tlearn: 125570.3422714\ttest: 125570.3422714\ttest1: 177407.1423650\tbest: 177407.1423650 (50)\ttotal: 439ms\tremaining: 421ms\n",
"51:\tlearn: 124650.7426271\ttest: 124650.7426271\ttest1: 176796.2265874\tbest: 176796.2265874 (51)\ttotal: 450ms\tremaining: 415ms\n",
"52:\tlearn: 123882.3428112\ttest: 123882.3428112\ttest1: 176625.5441358\tbest: 176625.5441358 (52)\ttotal: 457ms\tremaining: 406ms\n",
"53:\tlearn: 122924.4547779\ttest: 122924.4547779\ttest1: 175859.5014073\tbest: 175859.5014073 (53)\ttotal: 466ms\tremaining: 397ms\n",
"54:\tlearn: 122129.0599356\ttest: 122129.0599356\ttest1: 175180.7398020\tbest: 175180.7398020 (54)\ttotal: 473ms\tremaining: 387ms\n",
"55:\tlearn: 121511.6989019\ttest: 121511.6989019\ttest1: 174846.2723380\tbest: 174846.2723380 (55)\ttotal: 481ms\tremaining: 378ms\n",
"56:\tlearn: 120589.4171994\ttest: 120589.4171994\ttest1: 174736.4785205\tbest: 174736.4785205 (56)\ttotal: 488ms\tremaining: 368ms\n",
"57:\tlearn: 120008.4701474\ttest: 120008.4701474\ttest1: 174249.8404187\tbest: 174249.8404187 (57)\ttotal: 497ms\tremaining: 360ms\n",
"58:\tlearn: 119430.6644494\ttest: 119430.6644494\ttest1: 174265.5090162\tbest: 174249.8404187 (57)\ttotal: 504ms\tremaining: 350ms\n",
"59:\tlearn: 118630.3058181\ttest: 118630.3058181\ttest1: 173976.2184429\tbest: 173976.2184429 (59)\ttotal: 512ms\tremaining: 341ms\n",
"60:\tlearn: 118182.1997115\ttest: 118182.1997115\ttest1: 173876.2073501\tbest: 173876.2073501 (60)\ttotal: 519ms\tremaining: 332ms\n",
"61:\tlearn: 117570.8979635\ttest: 117570.8979635\ttest1: 173793.8082586\tbest: 173793.8082586 (61)\ttotal: 526ms\tremaining: 323ms\n",
"62:\tlearn: 116938.2068202\ttest: 116938.2068202\ttest1: 173418.3256343\tbest: 173418.3256343 (62)\ttotal: 534ms\tremaining: 313ms\n",
"63:\tlearn: 116403.4637366\ttest: 116403.4637366\ttest1: 173307.5139834\tbest: 173307.5139834 (63)\ttotal: 541ms\tremaining: 304ms\n",
"64:\tlearn: 115943.1552822\ttest: 115943.1552822\ttest1: 172843.7260659\tbest: 172843.7260659 (64)\ttotal: 548ms\tremaining: 295ms\n",
"65:\tlearn: 115501.0159642\ttest: 115501.0159642\ttest1: 172739.4278919\tbest: 172739.4278919 (65)\ttotal: 554ms\tremaining: 286ms\n",
"66:\tlearn: 114896.8633806\ttest: 114896.8633806\ttest1: 172469.6259225\tbest: 172469.6259225 (66)\ttotal: 561ms\tremaining: 276ms\n",
"67:\tlearn: 114371.1003208\ttest: 114371.1003208\ttest1: 172089.9094769\tbest: 172089.9094769 (67)\ttotal: 567ms\tremaining: 267ms\n",
"68:\tlearn: 113769.1552686\ttest: 113769.1552686\ttest1: 171975.3847819\tbest: 171975.3847819 (68)\ttotal: 573ms\tremaining: 257ms\n",
"69:\tlearn: 113367.6082125\ttest: 113367.6082125\ttest1: 171941.0547624\tbest: 171941.0547624 (69)\ttotal: 579ms\tremaining: 248ms\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"70:\tlearn: 112807.8812827\ttest: 112807.8812827\ttest1: 171750.1811411\tbest: 171750.1811411 (70)\ttotal: 587ms\tremaining: 240ms\n",
"71:\tlearn: 112486.8281573\ttest: 112486.8281573\ttest1: 171731.0786620\tbest: 171731.0786620 (71)\ttotal: 594ms\tremaining: 231ms\n",
"72:\tlearn: 111984.3409624\ttest: 111984.3409624\ttest1: 171211.1273708\tbest: 171211.1273708 (72)\ttotal: 600ms\tremaining: 222ms\n",
"73:\tlearn: 111271.7262892\ttest: 111271.7262892\ttest1: 171116.4885800\tbest: 171116.4885800 (73)\ttotal: 607ms\tremaining: 213ms\n",
"74:\tlearn: 110412.4742720\ttest: 110412.4742720\ttest1: 170661.2066491\tbest: 170661.2066491 (74)\ttotal: 613ms\tremaining: 204ms\n",
"75:\tlearn: 109833.7743343\ttest: 109833.7743343\ttest1: 170522.0560090\tbest: 170522.0560090 (75)\ttotal: 621ms\tremaining: 196ms\n",
"76:\tlearn: 109327.5292193\ttest: 109327.5292193\ttest1: 170162.1461219\tbest: 170162.1461219 (76)\ttotal: 629ms\tremaining: 188ms\n",
"77:\tlearn: 108938.8797941\ttest: 108938.8797941\ttest1: 170065.9735423\tbest: 170065.9735423 (77)\ttotal: 637ms\tremaining: 180ms\n",
"78:\tlearn: 108377.0521836\ttest: 108377.0521836\ttest1: 169758.9858351\tbest: 169758.9858351 (78)\ttotal: 643ms\tremaining: 171ms\n",
"79:\tlearn: 107920.0212175\ttest: 107920.0212175\ttest1: 169823.8961036\tbest: 169758.9858351 (78)\ttotal: 649ms\tremaining: 162ms\n",
"80:\tlearn: 107605.6619060\ttest: 107605.6619060\ttest1: 169663.5161577\tbest: 169663.5161577 (80)\ttotal: 657ms\tremaining: 154ms\n",
"81:\tlearn: 107217.1526180\ttest: 107217.1526180\ttest1: 169521.3574243\tbest: 169521.3574243 (81)\ttotal: 665ms\tremaining: 146ms\n",
"82:\tlearn: 106703.3143505\ttest: 106703.3143505\ttest1: 169183.0635164\tbest: 169183.0635164 (82)\ttotal: 672ms\tremaining: 138ms\n",
"83:\tlearn: 106394.4399996\ttest: 106394.4399996\ttest1: 168881.9310923\tbest: 168881.9310923 (83)\ttotal: 678ms\tremaining: 129ms\n",
"84:\tlearn: 105989.4218876\ttest: 105989.4218876\ttest1: 168980.7043083\tbest: 168881.9310923 (83)\ttotal: 686ms\tremaining: 121ms\n",
"85:\tlearn: 105583.7281376\ttest: 105583.7281376\ttest1: 168676.7258927\tbest: 168676.7258927 (85)\ttotal: 692ms\tremaining: 113ms\n",
"86:\tlearn: 104871.9827146\ttest: 104871.9827146\ttest1: 168297.8732113\tbest: 168297.8732113 (86)\ttotal: 698ms\tremaining: 104ms\n",
"87:\tlearn: 104424.8228558\ttest: 104424.8228558\ttest1: 168045.4120648\tbest: 168045.4120648 (87)\ttotal: 705ms\tremaining: 96.1ms\n",
"88:\tlearn: 103889.3089230\ttest: 103889.3089230\ttest1: 167956.6055735\tbest: 167956.6055735 (88)\ttotal: 711ms\tremaining: 87.9ms\n",
"89:\tlearn: 103616.4776767\ttest: 103616.4776767\ttest1: 167977.5844983\tbest: 167956.6055735 (88)\ttotal: 719ms\tremaining: 79.8ms\n",
"90:\tlearn: 103253.3979466\ttest: 103253.3979466\ttest1: 167773.6376465\tbest: 167773.6376465 (90)\ttotal: 725ms\tremaining: 71.7ms\n",
"91:\tlearn: 102986.4402999\ttest: 102986.4402999\ttest1: 167711.6607700\tbest: 167711.6607700 (91)\ttotal: 733ms\tremaining: 63.7ms\n",
"92:\tlearn: 102701.8595845\ttest: 102701.8595845\ttest1: 167547.6694351\tbest: 167547.6694351 (92)\ttotal: 740ms\tremaining: 55.7ms\n",
"93:\tlearn: 102343.2845205\ttest: 102343.2845205\ttest1: 167279.3629962\tbest: 167279.3629962 (93)\ttotal: 748ms\tremaining: 47.8ms\n",
"94:\tlearn: 101981.3584052\ttest: 101981.3584052\ttest1: 167225.8302052\tbest: 167225.8302052 (94)\ttotal: 755ms\tremaining: 39.7ms\n",
"95:\tlearn: 101336.8314230\ttest: 101336.8314230\ttest1: 166898.4084543\tbest: 166898.4084543 (95)\ttotal: 763ms\tremaining: 31.8ms\n",
"96:\tlearn: 100996.5880829\ttest: 100996.5880829\ttest1: 166986.0832879\tbest: 166898.4084543 (95)\ttotal: 771ms\tremaining: 23.8ms\n",
"97:\tlearn: 100782.5086486\ttest: 100782.5086486\ttest1: 166796.3737366\tbest: 166796.3737366 (97)\ttotal: 778ms\tremaining: 15.9ms\n",
"98:\tlearn: 100484.1626654\ttest: 100484.1626654\ttest1: 166742.1681268\tbest: 166742.1681268 (98)\ttotal: 785ms\tremaining: 7.93ms\n",
"99:\tlearn: 100220.7995788\ttest: 100220.7995788\ttest1: 166791.4282489\tbest: 166742.1681268 (98)\ttotal: 793ms\tremaining: 0us\n",
"\n",
"bestTest = 166742.1681\n",
"bestIteration = 98\n",
"\n",
"Shrink model to first 99 iterations.\n",
"Best model: \n",
"Best score: 0.9645632608480236\n",
"Wall time: 3min 52s\n"
]
}
],
"source": [
"%%time\n",
"\n",
"# Grid search over `parameters` with 10-fold CV, scored by R^2.\n",
"# eval_set and early_stopping_rounds are forwarded to the estimator's fit().\n",
"grid = GridSearchCV(estimator=model_CBR, param_grid=parameters, cv=10, n_jobs=-1, scoring='r2')\n",
"grid.fit(X_train, y_train, eval_set=eval_set, early_stopping_rounds=10)\n",
"\n",
"# Predict on X_test with the refitted best estimator.\n",
"y_predict = grid.predict(X_test)\n",
"\n",
"print(\"Best model: {}\".format(grid.best_estimator_))\n",
"print(\"Best score: {}\".format(grid.best_score_))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aqDxFIpQ8LdQ"
},
"source": [
"## Gradient Boosting"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"id": "YWE88gKW7Ysv"
},
"outputs": [],
"source": [
"from sklearn.ensemble import GradientBoostingRegressor\n",
"from sklearn.model_selection import GridSearchCV"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "xipv09JM7LhF",
"outputId": "ebdb542a-9b60-489c-8163-6a3188003372"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fitting 10 folds for each of 1 candidates, totalling 10 fits\n",
"Best model: GradientBoostingRegressor(learning_rate=0.02, max_depth=4, n_estimators=3000)\n",
"Best score: 0.973730974769129\n",
"Wall time: 1min 43s\n"
]
}
],
"source": [
"%%time\n",
"\n",
"# Single-candidate grid: effectively a 10-fold cross-validation of one configuration.\n",
"# The loss is left at its default (squared error); the explicit \"ls\" alias\n",
"# was deprecated in scikit-learn 1.0 and removed in 1.2.\n",
"parameters = {'max_depth':[4], 'learning_rate':[0.02],\n",
"              \"n_estimators\":[3000],\n",
"              \"criterion\":[\"friedman_mse\"]}\n",
"\n",
"grb_model = GridSearchCV(GradientBoostingRegressor(), parameters,\n",
"                         cv = 10, scoring = \"r2\", n_jobs = -1, verbose = 3,\n",
"                         refit = True)\n",
"\n",
"grb_model.fit(X_train, y_train.ravel())\n",
"y_pred_train = grb_model.predict(X_train)\n",
"\n",
"print(\"Best model: {}\".format(grb_model.best_estimator_))\n",
"print(\"Best score: {}\".format(grb_model.best_score_))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TJt_gxp4fgZ7"
},
"source": [
"## Submitting the prediction"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"id": "kVXOOB3TYOnc"
},
"outputs": [],
"source": [
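"# Predict with the best estimator from `grid` (the search fitted before the\n",
"# Gradient Boosting section), not with `grb_model`.\n",
"y_pred = grid.best_estimator_.predict(df_test2)\n",
"# Casting to int truncates the fractional part of each predicted price.\n",
"y_pred = np.array(y_pred, dtype = int)\n",
"prediction = pd.DataFrame()\n",
"prediction['Id'] = Id\n",
"prediction['selling_price'] = y_pred\n",
"\n",
"# Write the submission file without the DataFrame index.\n",
"prediction.to_csv('catboost2.csv', index = False)"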
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 359
},
"id": "mG3IJCgwdpde",
"outputId": "dc3e1aa3-8ac7-44de-a384-b993ccc3a210"
},
"outputs": [
{
"data": {
"text/html": [
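"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>Id</th>\n",
"      <th>selling_price</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>1</td>\n",
"      <td>193067</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>2</td>\n",
"      <td>556582</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>3</td>\n",
"      <td>616575</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>4</td>\n",
"      <td>1429865</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>5</td>\n",
"      <td>554934</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>5</th>\n",
"      <td>6</td>\n",
"      <td>457523</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>6</th>\n",
"      <td>7</td>\n",
"      <td>652911</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>7</th>\n",
"      <td>8</td>\n",
"      <td>1964215</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>8</th>\n",
"      <td>9</td>\n",
"      <td>2390847</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>9</th>\n",
"      <td>10</td>\n",
"      <td>512478</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"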
],
"text/plain": [
" Id selling_price\n",
"0 1 193067\n",
"1 2 556582\n",
"2 3 616575\n",
"3 4 1429865\n",
"4 5 554934\n",
"5 6 457523\n",
"6 7 652911\n",
"7 8 1964215\n",
"8 9 2390847\n",
"9 10 512478"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prediction.head(10)"
]
}
],
"metadata": {
"colab": {
"name": "Car Price Final 2.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autoclose": false,
"autocomplete": false,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}