We have three sources of data:
Our sensor data: indoor air quality and indoor environmental data.
SINAICA: outdoor air quality monitoring data from the government.
OpenWeatherData: outdoor environmental data.
We need to make this data available to the models we plan to train. The following sections detail that process.
import os, gzip, json, re, stan, dplython, asyncio, nest_asyncio
#nest_asyncio.apply()
import warnings
from matplotlib import pyplot as plt
warnings.filterwarnings("ignore", category=DeprecationWarning)
from dplython import (DplyFrame, X, diamonds, select, sift,
sample_n, sample_frac, head, arrange, mutate, group_by,
summarize, DelayFunction, dfilter)
import seaborn as sns
from plotnine import *
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import (mean_squared_error,
r2_score,
mean_absolute_error)
import pandas as pd
import numpy as np
from IPython.display import display, Markdown, update_display
DEBUG=True
if DEBUG:
    display(Markdown("Default Values:"))
    display(Markdown(f"* pandas max_columns={pd.options.display.max_columns}\n" +
                     f"* pandas max_rows={pd.options.display.max_rows}"))
pd.options.display.max_columns=35
#pd.options.display.max_rows=100
if DEBUG:
    display(Markdown("New Values:"))
    display(Markdown(f"* pandas max_columns={pd.options.display.max_columns}\n" +
                     f"* pandas max_rows={pd.options.display.max_rows}"))
/home/jaa6766/.conda/envs/cuda/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
Default Values:
New Values:
airdata = pd.read_pickle('data/airdata/air.pickle')
airdata["year"] = [dt.year for dt in airdata["datetime"]]
airdata["month"] = [dt.month for dt in airdata["datetime"]]
airdata["day"] = [dt.day for dt in airdata["datetime"]]
airdata["hour"] = [dt.hour for dt in airdata["datetime"]]
airdata["minute"] = [dt.minute for dt in airdata["datetime"]]
airdata["second"] = [dt.second for dt in airdata["datetime"]]
airdata.set_index("datetime", inplace=True)
airdata.sort_index(inplace=True)
airdata
datetime | temperature | pressure | humidity | gasResistance | IAQ | iaqAccuracy | year | month | day | hour | minute | second |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-02-12 06:04:09.089621067 | 21.54 | 777.41 | 43.93 | 151328 | 37.5 | 1 | 2021 | 2 | 12 | 6 | 4 | 9 |
2021-02-12 06:04:12.087778807 | 21.56 | 777.41 | 43.89 | 152702 | 35.6 | 1 | 2021 | 2 | 12 | 6 | 4 | 12 |
2021-02-12 06:04:15.072475433 | 21.53 | 777.41 | 43.97 | 151328 | 37.5 | 1 | 2021 | 2 | 12 | 6 | 4 | 15 |
2021-02-12 06:04:18.070170164 | 21.51 | 777.41 | 44.03 | 151464 | 38.5 | 1 | 2021 | 2 | 12 | 6 | 4 | 18 |
2021-02-12 06:04:21.061994791 | 21.51 | 777.41 | 44.05 | 152425 | 36.9 | 1 | 2021 | 2 | 12 | 6 | 4 | 21 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-09-18 01:20:38.889113188 | 25.84 | 782.96 | 56.64 | 928867 | 130.8 | 1 | 2021 | 9 | 18 | 1 | 20 | 38 |
2021-09-18 01:20:41.882042885 | 25.83 | 782.94 | 56.66 | 923130 | 131.5 | 1 | 2021 | 9 | 18 | 1 | 20 | 41 |
2021-09-18 01:20:44.877856970 | 25.83 | 782.94 | 56.63 | 925034 | 131.3 | 1 | 2021 | 9 | 18 | 1 | 20 | 44 |
2021-09-18 01:20:47.872255564 | 25.83 | 782.94 | 56.62 | 923130 | 131.9 | 1 | 2021 | 9 | 18 | 1 | 20 | 47 |
2021-09-18 01:20:50.866486311 | 25.83 | 782.96 | 56.63 | 925034 | 131.6 | 1 | 2021 | 9 | 18 | 1 | 20 | 50 |
6285103 rows × 12 columns
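The calendar columns above are built with Python list comprehensions, which iterate over all 6285103 timestamps one by one. As a minimal sketch, assuming the `datetime` column has a datetime64 dtype, the same columns could be derived with the vectorized `.dt` accessor. The `toy` frame and its values are made up for illustration and are not part of the dataset.

```python
import pandas as pd

# Toy stand-in for `airdata`; the values are illustrative only.
toy = pd.DataFrame({
    "datetime": pd.to_datetime(["2021-09-18 01:20:50", "2021-02-12 06:04:09"]),
    "IAQ": [131.6, 37.5],
})

# The .dt accessor extracts each component for the whole column at once.
for part in ["year", "month", "day", "hour", "minute", "second"]:
    toy[part] = getattr(toy["datetime"].dt, part)

# Same final steps as the notebook: index by timestamp and sort.
toy = toy.set_index("datetime").sort_index()
print(toy)
```

On a frame of this size the difference is negligible; the benefit would only show on the full dataset.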
sinaica = pd.read_pickle('data/sinaica2/dsinaica.pickle')
sinaica.rename(mapper={
"Merced_CO": "CO",
"Camarones_NO": "NO",
"Merced_NO2": "NO2",
"Merced_NOx": "NOx",
"Merced_O3": "O3",
"Merced_PM10": "PM10",
"Merced_PM2.5": "PM2.5",
"Merced_SO2": "SO2"
}, axis=1, inplace=True)
sinaica.drop(columns=[col
for col in sinaica.columns
if re.match('^(Camaron|Gustavo|Miguel|Tlalne|FES|Merced|La Pre)', col)],
inplace=True
)
sinaica["year"] = [dt.year for dt in sinaica["Fecha"]]
sinaica["month"] = [dt.month for dt in sinaica["Fecha"]]
sinaica["day"] = [dt.day for dt in sinaica["Fecha"]]
sinaica["hour"] = [dt.hour for dt in sinaica["Fecha"]]
sinaica["minute"] = [dt.minute for dt in sinaica["Fecha"]]
sinaica.set_index("Fecha", inplace=True)
sinaica.sort_index(inplace=True)
sinaica = sinaica.copy()
sinaica
Fecha | NO | CO | NO2 | NOx | O3 | PM10 | PM2.5 | SO2 | year | month | day | hour | minute |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-01-01 00:00:00 | 0.006000 | 1.000000 | 0.032000 | 0.036000 | 0.006000 | 31.000000 | 19.000000 | 0.003000 | 2021 | 1 | 1 | 0 | 0 |
2021-01-01 01:00:00 | 0.021000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021 | 1 | 1 | 1 | 0 |
2021-01-01 02:00:00 | 0.013000 | 1.100000 | 0.032000 | 0.039000 | 0.004000 | 37.000000 | 24.000000 | 0.003000 | 2021 | 1 | 1 | 2 | 0 |
2021-01-01 03:00:00 | 0.031000 | 1.200000 | 0.033000 | 0.043000 | 0.001000 | 49.000000 | 39.000000 | 0.003000 | 2021 | 1 | 1 | 3 | 0 |
2021-01-01 04:00:00 | 0.005000 | 1.200000 | 0.031000 | 0.039000 | 0.002000 | 80.000000 | 65.000000 | 0.003000 | 2021 | 1 | 1 | 4 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-10-04 00:00:00 | 0.008292 | 0.545833 | 0.019083 | 0.026875 | 0.015833 | 11.826087 | 7.913043 | 0.000750 | 2021 | 10 | 4 | 0 | 0 |
2021-10-05 00:00:00 | 0.010000 | 0.563158 | 0.019722 | 0.030500 | 0.012278 | 11.090909 | 6.772727 | 0.000556 | 2021 | 10 | 5 | 0 | 0 |
2021-10-06 00:00:00 | 0.007571 | 0.672222 | 0.026111 | 0.035611 | 0.011000 | 18.722222 | 11.833333 | 0.000111 | 2021 | 10 | 6 | 0 | 0 |
2021-10-07 00:00:00 | 0.011565 | 0.713636 | 0.028636 | 0.040318 | 0.017909 | 26.772727 | 17.000000 | 0.001045 | 2021 | 10 | 7 | 0 | 0 |
2021-10-08 00:00:00 | 0.023778 | 0.758824 | 0.029412 | 0.050588 | 0.017941 | 29.000000 | 17.705882 | 0.008176 | 2021 | 10 | 8 | 0 | 0 |
2352 rows × 13 columns
weather = pd.read_pickle("data/openweathermap/weather.pickle.gz")
#weather["year"] = [dt.year for dt in weather["dt"]]
#weather["month"] = [dt.month for dt in weather["dt"]]
#weather["day"] = [dt.day for dt in weather["dt"]]
#weather["hour"] = [dt.hour for dt in weather["dt"]]
#weather["minute"] = [dt.minute for dt in weather["dt"]]
weather.rename(columns={'temp': 'temperature'},
inplace=True)
weather.set_index("dt", inplace=True)
weather.sort_index(inplace=True)
weather.drop(columns=['clouds_all', "weather_id", 'rain_1h', 'rain_3h',
'temp_max', 'temp_min'], inplace=True)
weather
dt | temperature | feels_like | pressure | humidity | wind_speed | wind_deg | weather_main |
---|---|---|---|---|---|---|---|
2021-02-12 07:00:00 | 13.87 | 12.46 | 1020 | 44 | 0.00 | 0 | Clear |
2021-02-12 08:00:00 | 12.81 | 11.37 | 1020 | 47 | 0.00 | 0 | Clear |
2021-02-12 09:00:00 | 10.83 | 9.35 | 1019 | 53 | 1.54 | 60 | Clear |
2021-02-12 10:00:00 | 6.40 | 3.51 | 1019 | 61 | 4.12 | 40 | Clear |
2021-02-12 11:00:00 | 6.23 | 6.23 | 1019 | 57 | 0.00 | 0 | Clear |
... | ... | ... | ... | ... | ... | ... | ... |
2021-09-27 19:00:00 | 21.51 | 20.89 | 1006 | 45 | 0.89 | 139 | Clear |
2021-09-27 20:00:00 | 23.18 | 22.81 | 1005 | 48 | 0.45 | 224 | Rain |
2021-09-27 21:00:00 | 22.21 | 21.69 | 1025 | 46 | 6.17 | 220 | Rain |
2021-09-27 22:00:00 | 21.03 | 20.68 | 1004 | 57 | 0.45 | 242 | Rain |
2021-09-27 23:00:00 | 20.17 | 19.81 | 1004 | 60 | 5.66 | 140 | Clouds |
5601 rows × 7 columns
outdoor = sinaica.drop(columns=['year', 'month', 'day', 'hour', 'minute']).join(weather,
rsuffix='_weather').copy()
outdoor = outdoor[(outdoor.index >= airdata.index.min()) &
(outdoor.index <= airdata.index.max())]
outdoor.sort_index(inplace=True)
outdoor
| | NO | CO | NO2 | NOx | O3 | PM10 | PM2.5 | SO2 | temperature | feels_like | pressure | humidity | wind_speed | wind_deg | weather_main |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-02-12 07:00:00 | 0.244000 | 2.500000 | 0.035000 | 0.205000 | 0.002000 | 57.000000 | 25.000000 | 0.005000 | 13.87 | 12.46 | 1020.0 | 44.0 | 0.00 | 0.0 | Clear |
2021-02-12 08:00:00 | 0.146000 | 1.600000 | 0.030000 | 0.089000 | 0.004000 | 67.000000 | 33.000000 | 0.003000 | 12.81 | 11.37 | 1020.0 | 47.0 | 0.00 | 0.0 | Clear |
2021-02-12 09:00:00 | 0.099000 | 1.500000 | 0.039000 | 0.072000 | 0.012000 | 50.000000 | 28.000000 | 0.002000 | 10.83 | 9.35 | 1019.0 | 53.0 | 1.54 | 60.0 | Clear |
2021-02-12 10:00:00 | 0.024000 | 1.200000 | 0.030000 | 0.047000 | 0.025000 | 40.000000 | 21.000000 | 0.002000 | 6.40 | 3.51 | 1019.0 | 61.0 | 4.12 | 40.0 | Clear |
2021-02-12 11:00:00 | 0.009000 | 0.900000 | 0.016000 | 0.026000 | 0.033000 | 33.000000 | 19.000000 | 0.001000 | 6.23 | 6.23 | 1019.0 | 57.0 | 0.00 | 0.0 | Clear |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-09-14 00:00:00 | 0.017000 | 0.716667 | 0.015125 | 0.024625 | 0.013333 | 7.047619 | 4.500000 | 0.000125 | 15.85 | 14.95 | 1025.0 | 56.0 | 4.12 | 170.0 | Rain |
2021-09-15 00:00:00 | 0.027458 | 0.954167 | 0.028167 | 0.048625 | 0.019375 | 24.416667 | 17.333333 | 0.000875 | 17.95 | 17.08 | 1023.0 | 49.0 | 7.72 | 130.0 | Clouds |
2021-09-16 00:00:00 | 0.006875 | 0.883333 | 0.028000 | 0.034458 | 0.022792 | 46.833333 | 40.041667 | 0.001542 | 18.45 | 17.63 | 1022.0 | 49.0 | 6.17 | 130.0 | Smoke |
2021-09-17 00:00:00 | 0.010250 | 0.947619 | 0.032700 | 0.042750 | 0.026650 | 29.666667 | 24.875000 | 0.003350 | 18.34 | 17.69 | 1024.0 | 56.0 | 4.63 | 300.0 | Rain |
2021-09-18 00:00:00 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 17.61 | 17.85 | 1015.0 | 93.0 | 1.37 | 199.0 | Rain |
1341 rows × 15 columns
outdoor[outdoor.index >= "2021-09-15 23:59"]
| | NO | CO | NO2 | NOx | O3 | PM10 | PM2.5 | SO2 | temperature | feels_like | pressure | humidity | wind_speed | wind_deg | weather_main |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-09-16 | 0.006875 | 0.883333 | 0.028000 | 0.034458 | 0.022792 | 46.833333 | 40.041667 | 0.001542 | 18.45 | 17.63 | 1022.0 | 49.0 | 6.17 | 130.0 | Smoke |
2021-09-17 | 0.010250 | 0.947619 | 0.032700 | 0.042750 | 0.026650 | 29.666667 | 24.875000 | 0.003350 | 18.34 | 17.69 | 1024.0 | 56.0 | 4.63 | 300.0 | Rain |
2021-09-18 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 17.61 | 17.85 | 1015.0 | 93.0 | 1.37 | 199.0 | Rain |
data = pd.merge_asof(airdata,
outdoor,
left_index=True, right_index=True,
suffixes=('', '_outdoor'),
tolerance=pd.Timedelta('3 seconds'),
direction="backward"
)
data[data.index >= "2021-09-16 23:59:40"].head(10)
datetime | temperature | pressure | humidity | gasResistance | IAQ | iaqAccuracy | year | month | day | hour | minute | second | NO | CO | NO2 | NOx | O3 | PM10 | PM2.5 | SO2 | temperature_outdoor | feels_like | pressure_outdoor | humidity_outdoor | wind_speed | wind_deg | weather_main |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-09-16 23:59:42.445474625 | 25.86 | 780.44 | 57.00 | 877365 | 246.4 | 1 | 2021 | 9 | 16 | 23 | 59 | 42 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-09-16 23:59:45.439809322 | 25.85 | 780.40 | 57.04 | 874512 | 246.7 | 1 | 2021 | 9 | 16 | 23 | 59 | 45 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-09-16 23:59:48.434415102 | 25.85 | 780.42 | 57.07 | 877365 | 245.9 | 1 | 2021 | 9 | 16 | 23 | 59 | 48 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-09-16 23:59:51.428925753 | 25.84 | 780.42 | 57.07 | 872810 | 246.9 | 1 | 2021 | 9 | 16 | 23 | 59 | 51 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-09-16 23:59:54.423571348 | 25.84 | 780.42 | 57.07 | 872244 | 247.7 | 1 | 2021 | 9 | 16 | 23 | 59 | 54 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-09-16 23:59:57.418201685 | 25.84 | 780.44 | 57.06 | 868302 | 249.6 | 1 | 2021 | 9 | 16 | 23 | 59 | 57 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-09-17 00:00:00.412962675 | 25.84 | 780.42 | 57.02 | 869987 | 250.0 | 1 | 2021 | 9 | 17 | 0 | 0 | 0 | 0.01025 | 0.947619 | 0.0327 | 0.04275 | 0.02665 | 29.666667 | 24.875 | 0.00335 | 18.34 | 17.69 | 1024.0 | 56.0 | 4.63 | 300.0 | Rain |
2021-09-17 00:00:03.407538652 | 25.84 | 780.44 | 57.02 | 877365 | 248.1 | 1 | 2021 | 9 | 17 | 0 | 0 | 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-09-17 00:00:06.402314186 | 25.83 | 780.42 | 57.05 | 868302 | 249.8 | 1 | 2021 | 9 | 17 | 0 | 0 | 6 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-09-17 00:00:09.396840096 | 25.84 | 780.42 | 57.00 | 875651 | 248.6 | 1 | 2021 | 9 | 17 | 0 | 0 | 9 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
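The merge above relies on `pd.merge_asof` with `direction="backward"` and a 3-second tolerance: each indoor record is paired with the most recent outdoor record, but only if that record is at most 3 seconds older, which is why most rows show NaN in the outdoor columns. The following sketch reproduces that behaviour on a made-up three-row example; the frame names `indoor` and `outdoor_toy` and all values are hypothetical.

```python
import pandas as pd

# Toy indoor readings every 3 seconds around the top of an hour (made-up values).
indoor = pd.DataFrame(
    {"IAQ": [249.6, 250.0, 248.1]},
    index=pd.to_datetime([
        "2021-09-16 23:59:57.4",
        "2021-09-17 00:00:00.4",
        "2021-09-17 00:00:03.4",
    ]),
)
# Toy hourly outdoor reading (made-up value).
outdoor_toy = pd.DataFrame(
    {"PM10": [29.7]},
    index=pd.to_datetime(["2021-09-17 00:00:00"]),
)

merged = pd.merge_asof(
    indoor, outdoor_toy,
    left_index=True, right_index=True,
    tolerance=pd.Timedelta("3 seconds"),  # maximum age of the matched record
    direction="backward",                 # only look at earlier-or-equal records
)
print(merged)
```

Only the middle row receives a `PM10` value: the first row precedes the outdoor record, and the third is more than 3 seconds after it.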
Markdown("Dataset with Indoor and Outdoor Data:\n* %d Rows\n* %d Columns."%(data.shape))
Dataset with Indoor and Outdoor Data:
data[~data.isna().any(axis=1)]
datetime | temperature | pressure | humidity | gasResistance | IAQ | iaqAccuracy | year | month | day | hour | minute | second | NO | CO | NO2 | NOx | O3 | PM10 | PM2.5 | SO2 | temperature_outdoor | feels_like | pressure_outdoor | humidity_outdoor | wind_speed | wind_deg | weather_main |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-02-12 07:00:02.979657173 | 21.51 | 777.30 | 43.74 | 144561 | 95.2 | 1 | 2021 | 2 | 12 | 7 | 0 | 2 | 0.244000 | 2.500000 | 0.035000 | 0.205000 | 0.002000 | 57.000000 | 25.000000 | 0.005000 | 13.87 | 12.46 | 1020.0 | 44.0 | 0.00 | 0.0 | Clear |
2021-02-12 08:00:01.982832432 | 21.04 | 776.92 | 42.35 | 153539 | 78.3 | 1 | 2021 | 2 | 12 | 8 | 0 | 1 | 0.146000 | 1.600000 | 0.030000 | 0.089000 | 0.004000 | 67.000000 | 33.000000 | 0.003000 | 12.81 | 11.37 | 1020.0 | 47.0 | 0.00 | 0.0 | Clear |
2021-02-12 09:00:00.729691744 | 20.41 | 776.33 | 42.56 | 153820 | 99.0 | 1 | 2021 | 2 | 12 | 9 | 0 | 0 | 0.099000 | 1.500000 | 0.039000 | 0.072000 | 0.012000 | 50.000000 | 28.000000 | 0.002000 | 10.83 | 9.35 | 1019.0 | 53.0 | 1.54 | 60.0 | Clear |
2021-02-12 10:00:02.449775934 | 20.27 | 776.20 | 42.21 | 144066 | 178.7 | 1 | 2021 | 2 | 12 | 10 | 0 | 2 | 0.024000 | 1.200000 | 0.030000 | 0.047000 | 0.025000 | 40.000000 | 21.000000 | 0.002000 | 6.40 | 3.51 | 1019.0 | 61.0 | 4.12 | 40.0 | Clear |
2021-02-12 11:00:01.044736862 | 19.91 | 776.25 | 42.26 | 142117 | 212.6 | 1 | 2021 | 2 | 12 | 11 | 0 | 1 | 0.009000 | 0.900000 | 0.016000 | 0.026000 | 0.033000 | 33.000000 | 19.000000 | 0.001000 | 6.23 | 6.23 | 1019.0 | 57.0 | 0.00 | 0.0 | Clear |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-09-14 00:00:02.855808020 | 24.31 | 780.42 | 55.46 | 873944 | 230.3 | 3 | 2021 | 9 | 14 | 0 | 0 | 2 | 0.017000 | 0.716667 | 0.015125 | 0.024625 | 0.013333 | 7.047619 | 4.500000 | 0.000125 | 15.85 | 14.95 | 1025.0 | 56.0 | 4.12 | 170.0 | Rain |
2021-09-15 00:00:01.543255568 | 25.52 | 779.38 | 54.92 | 938590 | 125.5 | 1 | 2021 | 9 | 15 | 0 | 0 | 1 | 0.027458 | 0.954167 | 0.028167 | 0.048625 | 0.019375 | 24.416667 | 17.333333 | 0.000875 | 17.95 | 17.08 | 1023.0 | 49.0 | 7.72 | 130.0 | Clouds |
2021-09-16 00:00:01.135869265 | 27.09 | 778.30 | 48.41 | 1222727 | 148.3 | 3 | 2021 | 9 | 16 | 0 | 0 | 1 | 0.006875 | 0.883333 | 0.028000 | 0.034458 | 0.022792 | 46.833333 | 40.041667 | 0.001542 | 18.45 | 17.63 | 1022.0 | 49.0 | 6.17 | 130.0 | Smoke |
2021-09-17 00:00:00.412962675 | 25.84 | 780.42 | 57.02 | 869987 | 250.0 | 1 | 2021 | 9 | 17 | 0 | 0 | 0 | 0.010250 | 0.947619 | 0.032700 | 0.042750 | 0.026650 | 29.666667 | 24.875000 | 0.003350 | 18.34 | 17.69 | 1024.0 | 56.0 | 4.63 | 300.0 | Rain |
2021-09-18 00:00:02.824432135 | 26.37 | 782.04 | 54.60 | 950530 | 111.9 | 1 | 2021 | 9 | 18 | 0 | 0 | 2 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 17.61 | 17.85 | 1015.0 | 93.0 | 1.37 | 199.0 | Rain |
1186 rows × 27 columns
Merging the two datasets (outdoor data sampled every hour and indoor data sampled every 3 seconds) means that each hourly value from SINAICA (government air quality monitoring) and OpenWeatherData must either be repeated across every indoor record in that hour or, as with the 3-second tolerance used above, left as NaN.
We think the repeated data can be an issue, because the values jump abruptly between consecutive records, say from the record at 10:57 to the one at 11:00. That does not represent the real world correctly: temperature, pressure, and other natural variables change gradually from one value to the next. We do not have data at that resolution, and it is not easy to obtain.
Therefore, we propose an approach similar to imputation by interpolation with added noise, which could help avoid overfitting when training our machine learning and deep learning models.
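One way the proposed interpolation with noise could look is sketched below. It assumes Gaussian noise with a hypothetical scale `noise_scale` and a fixed seed, neither of which comes from the notebook; the toy series is also made up.

```python
import numpy as np
import pandas as pd

# Toy series on the fine (indoor) grid; NaN marks records without
# an outdoor measurement (made-up values).
s = pd.Series([10.0, np.nan, np.nan, np.nan, 14.0, np.nan, 13.0])

observed = s.notna()                     # remember which points were measured
filled = s.interpolate(method="linear")  # straight line between observations

# Hypothetical noise level; it would need to be chosen per variable.
noise_scale = 0.05
rng = np.random.default_rng(0)
noise = rng.normal(0.0, noise_scale, size=len(s))

# Perturb only the imputed points so the real observations stay untouched.
noisy = filled.where(observed, filled + noise)
print(noisy)
```

Only the imputed points are perturbed, so the measured values are preserved exactly.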
%%time
# The first and last records need outdoor values to anchor the imputation,
# so take the hour boundaries that enclose the indoor data.
first_data = data.index[0].replace(minute=0, second=0, microsecond=0, nanosecond=0)
# Adding a Timedelta (rather than hour + 1) also works when the last hour is 23.
last_data = (data.index[-1].replace(minute=0, second=0, microsecond=0, nanosecond=0)
             + pd.Timedelta(hours=1))
weather2 = pd.read_csv("data/openweathermap/2f101ea00e7759ea8723b848ac8b18d0.csv")
weather2["dt"] = pd.to_datetime(weather2["dt"], unit='s')
weather2.set_index("dt", drop=True, inplace=True)
weather2 = weather2[["temp", "feels_like", "temp_min", "temp_max",
"pressure", "humidity", "wind_speed", "wind_deg", "rain_1h", "rain_3h",
"clouds_all", "weather_id", "weather_main"]]
weather2 = weather2.loc[[first_data, last_data]]
#display(Markdown("Weather data:"))
#display(weather2)
sinaica2 = sinaica.loc[[first_data,
sinaica.loc[sinaica.index <= last_data].iloc[-1].name]]
#display(Markdown("SINAICA data:"))
#display(sinaica2)
CPU times: user 988 ms, sys: 151 ms, total: 1.14 s Wall time: 1.14 s
%%time
# Fill the outdoor columns of the first and last records so that the
# interpolation has a known value at both ends.
sinaica_cols = ["NO", "CO", "NO2", "NOx", "O3", "PM10", "PM2.5", "SO2"]
# Outdoor column in `data` -> source column in `weather2`.
weather_cols = {
    "temperature_outdoor": "temp",
    "feels_like": "feels_like",
    "pressure_outdoor": "pressure",
    "humidity_outdoor": "humidity",
    "wind_speed": "wind_speed",
    "wind_deg": "wind_deg",
    "weather_main": "weather_main",
}
for pos in (0, -1):  # 0: first record, -1: last record
    d = data.iloc[pos].copy()  # copy to avoid SettingWithCopyWarning
    for col in sinaica_cols:
        d[col] = sinaica2.iloc[pos][col]
    for dst, src in weather_cols.items():
        d[dst] = weather2.iloc[pos][src]
    data.iloc[pos] = d
data
CPU times: user 1.9 s, sys: 1.05 s, total: 2.95 s Wall time: 2.95 s
datetime | temperature | pressure | humidity | gasResistance | IAQ | iaqAccuracy | year | month | day | hour | minute | second | NO | CO | NO2 | NOx | O3 | PM10 | PM2.5 | SO2 | temperature_outdoor | feels_like | pressure_outdoor | humidity_outdoor | wind_speed | wind_deg | weather_main |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-02-12 06:04:09.089621067 | 21.54 | 777.41 | 43.93 | 151328 | 37.5 | 1 | 2021 | 2 | 12 | 6 | 4 | 9 | 0.205000 | 2.200000 | 0.031000 | 0.207000 | 0.002000 | 45.000000 | 22.000000 | 0.004000 | 14.93 | 13.6 | 1021.0 | 43.0 | 2.57 | 110.0 | Clear |
2021-02-12 06:04:12.087778807 | 21.56 | 777.41 | 43.89 | 152702 | 35.6 | 1 | 2021 | 2 | 12 | 6 | 4 | 12 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-02-12 06:04:15.072475433 | 21.53 | 777.41 | 43.97 | 151328 | 37.5 | 1 | 2021 | 2 | 12 | 6 | 4 | 15 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-02-12 06:04:18.070170164 | 21.51 | 777.41 | 44.03 | 151464 | 38.5 | 1 | 2021 | 2 | 12 | 6 | 4 | 18 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-02-12 06:04:21.061994791 | 21.51 | 777.41 | 44.05 | 152425 | 36.9 | 1 | 2021 | 2 | 12 | 6 | 4 | 21 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-09-18 01:20:38.889113188 | 25.84 | 782.96 | 56.64 | 928867 | 130.8 | 1 | 2021 | 9 | 18 | 1 | 20 | 38 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-09-18 01:20:41.882042885 | 25.83 | 782.94 | 56.66 | 923130 | 131.5 | 1 | 2021 | 9 | 18 | 1 | 20 | 41 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-09-18 01:20:44.877856970 | 25.83 | 782.94 | 56.63 | 925034 | 131.3 | 1 | 2021 | 9 | 18 | 1 | 20 | 44 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-09-18 01:20:47.872255564 | 25.83 | 782.94 | 56.62 | 923130 | 131.9 | 1 | 2021 | 9 | 18 | 1 | 20 | 47 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-09-18 01:20:50.866486311 | 25.83 | 782.96 | 56.63 | 925034 | 131.6 | 1 | 2021 | 9 | 18 | 1 | 20 | 50 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.29 | 14.1 | 1018.0 | 89.0 | 2.11 | 350.0 | Rain |
6285103 rows × 27 columns
The first and last records now carry outdoor values, which give the interpolation a known starting and ending point:
%%time
df2 = data.copy()
#df2 = data[["temperature_outdoor", "feels_like", "pressure_outdoor",
# "humidity_outdoor", "wind_speed", "wind_deg", "weather_main"]].copy()
#df2[~df2.isna().any(axis=1)]
df2
CPU times: user 327 ms, sys: 254 ms, total: 581 ms Wall time: 580 ms
datetime | temperature | pressure | humidity | gasResistance | IAQ | iaqAccuracy | year | month | day | hour | minute | second | NO | CO | NO2 | NOx | O3 | PM10 | PM2.5 | SO2 | temperature_outdoor | feels_like | pressure_outdoor | humidity_outdoor | wind_speed | wind_deg | weather_main |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-02-12 06:04:09.089621067 | 21.54 | 777.41 | 43.93 | 151328 | 37.5 | 1 | 2021 | 2 | 12 | 6 | 4 | 9 | 0.205000 | 2.200000 | 0.031000 | 0.207000 | 0.002000 | 45.000000 | 22.000000 | 0.004000 | 14.93 | 13.6 | 1021.0 | 43.0 | 2.57 | 110.0 | Clear |
2021-02-12 06:04:12.087778807 | 21.56 | 777.41 | 43.89 | 152702 | 35.6 | 1 | 2021 | 2 | 12 | 6 | 4 | 12 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-02-12 06:04:15.072475433 | 21.53 | 777.41 | 43.97 | 151328 | 37.5 | 1 | 2021 | 2 | 12 | 6 | 4 | 15 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-02-12 06:04:18.070170164 | 21.51 | 777.41 | 44.03 | 151464 | 38.5 | 1 | 2021 | 2 | 12 | 6 | 4 | 18 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-02-12 06:04:21.061994791 | 21.51 | 777.41 | 44.05 | 152425 | 36.9 | 1 | 2021 | 2 | 12 | 6 | 4 | 21 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-09-18 01:20:38.889113188 | 25.84 | 782.96 | 56.64 | 928867 | 130.8 | 1 | 2021 | 9 | 18 | 1 | 20 | 38 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-09-18 01:20:41.882042885 | 25.83 | 782.94 | 56.66 | 923130 | 131.5 | 1 | 2021 | 9 | 18 | 1 | 20 | 41 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-09-18 01:20:44.877856970 | 25.83 | 782.94 | 56.63 | 925034 | 131.3 | 1 | 2021 | 9 | 18 | 1 | 20 | 44 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-09-18 01:20:47.872255564 | 25.83 | 782.94 | 56.62 | 923130 | 131.9 | 1 | 2021 | 9 | 18 | 1 | 20 | 47 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2021-09-18 01:20:50.866486311 | 25.83 | 782.96 | 56.63 | 925034 | 131.6 | 1 | 2021 | 9 | 18 | 1 | 20 | 50 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.29 | 14.1 | 1018.0 | 89.0 | 2.11 | 350.0 | Rain |
6285103 rows × 27 columns
%%time
# Outdoor numeric columns: linear interpolation between the hourly values.
numeric_cols = ["NO", "CO", "NO2", "NOx", "O3", "PM10", "PM2.5", "SO2",
                "temperature_outdoor", "feels_like", "pressure_outdoor",
                "humidity_outdoor", "wind_speed", "wind_deg"]
for col in numeric_cols:
    df2[col] = df2[col].interpolate(method="linear", limit_direction="forward")
# weather_main is categorical, so carry the last known label forward
# (equivalent to interpolate(method="pad")).
df2["weather_main"] = df2["weather_main"].ffill()
display(df2.head(3600))
display(df2.tail(3600))
datetime | temperature | pressure | humidity | gasResistance | IAQ | iaqAccuracy | year | month | day | hour | minute | second | NO | CO | NO2 | NOx | O3 | PM10 | PM2.5 | SO2 | temperature_outdoor | feels_like | pressure_outdoor | humidity_outdoor | wind_speed | wind_deg | weather_main |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-02-12 06:04:09.089621067 | 21.54 | 777.41 | 43.93 | 151328 | 37.5 | 1 | 2021 | 2 | 12 | 6 | 4 | 9 | 0.205000 | 2.200000 | 0.031000 | 0.207000 | 0.002000 | 45.000000 | 22.000000 | 0.004000 | 14.930000 | 13.600000 | 1021.000000 | 43.000000 | 2.570000 | 110.000000 | Clear |
2021-02-12 06:04:12.087778807 | 21.56 | 777.41 | 43.89 | 152702 | 35.6 | 1 | 2021 | 2 | 12 | 6 | 4 | 12 | 0.205036 | 2.200274 | 0.031004 | 0.206998 | 0.002000 | 45.010949 | 22.002737 | 0.004001 | 14.929033 | 13.598960 | 1020.999088 | 43.000912 | 2.567655 | 109.899635 | Clear |
2021-02-12 06:04:15.072475433 | 21.53 | 777.41 | 43.97 | 151328 | 37.5 | 1 | 2021 | 2 | 12 | 6 | 4 | 15 | 0.205071 | 2.200547 | 0.031007 | 0.206996 | 0.002000 | 45.021898 | 22.005474 | 0.004002 | 14.928066 | 13.597920 | 1020.998175 | 43.001825 | 2.565310 | 109.799270 | Clear |
2021-02-12 06:04:18.070170164 | 21.51 | 777.41 | 44.03 | 151464 | 38.5 | 1 | 2021 | 2 | 12 | 6 | 4 | 18 | 0.205107 | 2.200821 | 0.031011 | 0.206995 | 0.002000 | 45.032847 | 22.008212 | 0.004003 | 14.927099 | 13.596880 | 1020.997263 | 43.002737 | 2.562965 | 109.698905 | Clear |
2021-02-12 06:04:21.061994791 | 21.51 | 777.41 | 44.05 | 152425 | 36.9 | 1 | 2021 | 2 | 12 | 6 | 4 | 21 | 0.205142 | 2.201095 | 0.031015 | 0.206993 | 0.002000 | 45.043796 | 22.010949 | 0.004004 | 14.926131 | 13.595839 | 1020.996350 | 43.003650 | 2.560620 | 109.598540 | Clear |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-02-12 09:04:38.933268309 | 20.65 | 776.29 | 42.03 | 152564 | 107.7 | 1 | 2021 | 2 | 12 | 9 | 4 | 38 | 0.093207 | 1.476827 | 0.038305 | 0.070069 | 0.013004 | 49.227575 | 27.459302 | 0.002000 | 10.487816 | 8.898904 | 1019.000000 | 53.617940 | 1.739286 | 58.455150 | Clear |
2021-02-12 09:04:41.924828053 | 20.66 | 776.29 | 42.01 | 152149 | 107.9 | 1 | 2021 | 2 | 12 | 9 | 4 | 41 | 0.093145 | 1.476578 | 0.038297 | 0.070048 | 0.013015 | 49.219269 | 27.453488 | 0.002000 | 10.484136 | 8.894053 | 1019.000000 | 53.624585 | 1.741429 | 58.438538 | Clear |
2021-02-12 09:04:44.916538477 | 20.62 | 776.29 | 42.08 | 151737 | 109.2 | 1 | 2021 | 2 | 12 | 9 | 4 | 44 | 0.093082 | 1.476329 | 0.038290 | 0.070027 | 0.013026 | 49.210963 | 27.447674 | 0.002000 | 10.480457 | 8.889203 | 1019.000000 | 53.631229 | 1.743571 | 58.421927 | Clear |
2021-02-12 09:04:47.907913446 | 20.63 | 776.29 | 42.05 | 151464 | 110.7 | 1 | 2021 | 2 | 12 | 9 | 4 | 47 | 0.093020 | 1.476080 | 0.038282 | 0.070007 | 0.013037 | 49.202658 | 27.441860 | 0.002000 | 10.476777 | 8.884352 | 1019.000000 | 53.637874 | 1.745714 | 58.405316 | Clear |
2021-02-12 09:04:50.899550915 | 20.59 | 776.25 | 42.12 | 151601 | 111.5 | 1 | 2021 | 2 | 12 | 9 | 4 | 50 | 0.092958 | 1.475831 | 0.038275 | 0.069986 | 0.013047 | 49.194352 | 27.436047 | 0.002000 | 10.473098 | 8.879502 | 1019.000000 | 53.644518 | 1.747857 | 58.388704 | Clear |
3600 rows × 27 columns
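`method="linear"` treats the rows as equally spaced and ignores the timestamps in the index. That is close to exact when the indoor records arrive every 3 seconds, but it may distort the result across any gaps in the sensor data. As a hedged alternative, pandas also offers `method="time"`, which weights by elapsed time; the sketch below compares both on a made-up series with uneven spacing.

```python
import numpy as np
import pandas as pd

# Made-up series: the missing point is 15 minutes after the first
# observation and 45 minutes before the second one.
idx = pd.to_datetime([
    "2021-02-12 06:00:00",
    "2021-02-12 06:15:00",
    "2021-02-12 07:00:00",
])
s = pd.Series([10.0, np.nan, 14.0], index=idx)

by_position = s.interpolate(method="linear")  # rows treated as equally spaced
by_time = s.interpolate(method="time")        # weighted by elapsed time

print(by_position.iloc[1], by_time.iloc[1])
```

With positional interpolation the missing point lands halfway between the two observations; with time-based interpolation it lands a quarter of the way, matching the 15 of 60 minutes elapsed.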
datetime | temperature | pressure | humidity | gasResistance | IAQ | iaqAccuracy | year | month | day | hour | minute | second | NO | CO | NO2 | NOx | O3 | PM10 | PM2.5 | SO2 | temperature_outdoor | feels_like | pressure_outdoor | humidity_outdoor | wind_speed | wind_deg | weather_main |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-09-17 22:21:13.529673576 | 27.02 | 779.99 | 49.73 | 1049961 | 32.3 | 1 | 2021 | 9 | 17 | 22 | 21 | 13 | 0.009248 | 0.777732 | 0.028160 | 0.039298 | 0.016083 | 20.946691 | 16.973366 | 0.001445 | 17.660085 | 17.839022 | 1015.617485 | 90.461451 | 1.593667 | 205.929554 | Rain |
2021-09-17 22:21:16.524485349 | 27.02 | 780.01 | 49.74 | 1050781 | 31.8 | 1 | 2021 | 9 | 17 | 22 | 21 | 16 | 0.009248 | 0.777726 | 0.028160 | 0.039298 | 0.016082 | 20.946367 | 16.973072 | 0.001445 | 17.660060 | 17.839028 | 1015.617173 | 90.462733 | 1.593554 | 205.926054 | Rain |
2021-09-17 22:21:19.519613981 | 27.02 | 780.01 | 49.70 | 1044255 | 33.4 | 1 | 2021 | 9 | 17 | 22 | 21 | 19 | 0.009248 | 0.777719 | 0.028160 | 0.039298 | 0.016082 | 20.946043 | 16.972778 | 0.001445 | 17.660034 | 17.839034 | 1015.616861 | 90.464015 | 1.593441 | 205.922554 | Rain |
2021-09-17 22:21:22.514595508 | 27.02 | 779.99 | 49.69 | 1042636 | 35.0 | 1 | 2021 | 9 | 17 | 22 | 21 | 22 | 0.009248 | 0.777713 | 0.028160 | 0.039297 | 0.016082 | 20.945718 | 16.972484 | 0.001444 | 17.660009 | 17.839039 | 1015.616549 | 90.465297 | 1.593328 | 205.919055 | Rain |
2021-09-17 22:21:25.509442091 | 27.01 | 780.01 | 49.68 | 1047508 | 34.7 | 1 | 2021 | 9 | 17 | 22 | 21 | 25 | 0.009248 | 0.777707 | 0.028160 | 0.039297 | 0.016081 | 20.945394 | 16.972190 | 0.001444 | 17.659984 | 17.839045 | 1015.616238 | 90.466579 | 1.593215 | 205.915555 | Rain |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-09-18 01:20:38.889113188 | 25.84 | 782.96 | 56.64 | 928867 | 130.8 | 1 | 2021 | 9 | 18 | 1 | 20 | 38 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.298203 | 14.109265 | 1017.992588 | 89.009883 | 2.108172 | 349.626930 | Rain |
2021-09-18 01:20:41.882042885 | 25.83 | 782.94 | 56.66 | 923130 | 131.5 | 1 | 2021 | 9 | 18 | 1 | 20 | 41 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.296152 | 14.106949 | 1017.994441 | 89.007412 | 2.108629 | 349.720198 | Rain |
2021-09-18 01:20:44.877856970 | 25.83 | 782.94 | 56.63 | 925034 | 131.3 | 1 | 2021 | 9 | 18 | 1 | 20 | 44 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.294101 | 14.104632 | 1017.996294 | 89.004941 | 2.109086 | 349.813465 | Rain |
2021-09-18 01:20:47.872255564 | 25.83 | 782.94 | 56.62 | 923130 | 131.9 | 1 | 2021 | 9 | 18 | 1 | 20 | 47 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.292051 | 14.102316 | 1017.998147 | 89.002471 | 2.109543 | 349.906733 | Rain |
2021-09-18 01:20:50.866486311 | 25.83 | 782.96 | 56.63 | 925034 | 131.6 | 1 | 2021 | 9 | 18 | 1 | 20 | 50 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.290000 | 14.100000 | 1018.000000 | 89.000000 | 2.110000 | 350.000000 | Rain |
3600 rows × 27 columns
CPU times: user 19.5 s, sys: 5.57 s, total: 25.1 s Wall time: 25.1 s
%%time
# Average all readings within each 1-minute bin.
df_1min = (
    df2
    .resample('1min')
    .mean()
    .dropna()  # discard bins where any column is missing
    .drop(columns=["year", "month", "day", "hour",
                   "minute", "second"])  # redundant with the datetime index
)
df_1min.to_pickle('data/data_1min.pickle.gz')
df_1min
CPU times: user 6.53 s, sys: 244 ms, total: 6.77 s Wall time: 6.77 s
datetime | temperature | pressure | humidity | gasResistance | IAQ | iaqAccuracy | NO | CO | NO2 | NOx | O3 | PM10 | PM2.5 | SO2 | temperature_outdoor | feels_like | pressure_outdoor | humidity_outdoor | wind_speed | wind_deg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-02-12 06:04:00 | 21.530000 | 777.410000 | 43.974000 | 151849.400000 | 37.200000 | 1.0 | 0.205071 | 2.200547 | 0.031007 | 0.206996 | 0.002000 | 45.021898 | 22.005474 | 0.004002 | 14.928066 | 13.597920 | 1020.998175 | 43.001825 | 2.565310 | 109.799270 |
2021-02-12 06:05:00 | 21.526250 | 777.408750 | 43.840000 | 152790.000000 | 32.162500 | 1.0 | 0.205302 | 2.202327 | 0.031031 | 0.206984 | 0.002000 | 45.093066 | 22.023266 | 0.004008 | 14.921779 | 13.591159 | 1020.992245 | 43.007755 | 2.550068 | 109.146898 |
2021-02-12 06:06:00 | 21.693000 | 777.409000 | 43.426000 | 152220.550000 | 34.325000 | 1.0 | 0.205801 | 2.206159 | 0.031082 | 0.206959 | 0.002000 | 45.246350 | 22.061588 | 0.004021 | 14.908239 | 13.576597 | 1020.979471 | 43.020529 | 2.517240 | 107.741788 |
2021-02-12 06:07:00 | 21.759000 | 777.410500 | 43.245500 | 151978.450000 | 36.190000 | 1.0 | 0.206512 | 2.211633 | 0.031155 | 0.206922 | 0.002000 | 45.465328 | 22.116332 | 0.004039 | 14.888896 | 13.555794 | 1020.961223 | 43.038777 | 2.470342 | 105.734489 |
2021-02-12 06:08:00 | 21.750500 | 777.390500 | 43.056000 | 150300.400000 | 46.600000 | 1.0 | 0.207224 | 2.217108 | 0.031228 | 0.206886 | 0.002000 | 45.684307 | 22.171077 | 0.004057 | 14.869553 | 13.534991 | 1020.942974 | 43.057026 | 2.423444 | 103.727190 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-09-18 01:16:00 | 25.871000 | 782.805000 | 56.607000 | 921467.050000 | 133.940000 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.469432 | 14.302671 | 1017.837863 | 89.216183 | 2.070006 | 341.839098 |
2021-09-18 01:17:00 | 25.861000 | 782.832000 | 56.587500 | 921211.600000 | 134.815000 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.428419 | 14.256347 | 1017.874923 | 89.166770 | 2.079148 | 343.704447 |
2021-09-18 01:18:00 | 25.850000 | 782.866000 | 56.597500 | 922348.700000 | 133.850000 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.387406 | 14.210022 | 1017.911983 | 89.117356 | 2.088289 | 345.569796 |
2021-09-18 01:19:00 | 25.836190 | 782.900000 | 56.683810 | 921997.095238 | 134.190476 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.345368 | 14.162539 | 1017.949969 | 89.066708 | 2.097659 | 347.481779 |
2021-09-18 01:20:00 | 25.836471 | 782.932941 | 56.662941 | 922768.882353 | 133.876471 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.306405 | 14.118530 | 1017.985176 | 89.019765 | 2.106343 | 349.253860 |
313595 rows × 20 columns
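The row count can be smaller than the number of minutes between the first and last timestamps, because `resample` emits a row for every bin, including bins in which no reading fell, and the NaN filter then removes those rows. A minimal sketch on made-up data (the `toy` frame and its values are illustrative assumptions, not sensor readings):

```python
import pandas as pd

# Illustrative readings with a gap between 06:01 and 06:03.
idx = pd.to_datetime([
    "2021-02-12 06:00:05", "2021-02-12 06:00:25", "2021-02-12 06:00:45",
    "2021-02-12 06:03:10", "2021-02-12 06:03:30",
])
toy = pd.DataFrame({"temperature": [21.0, 21.2, 21.4, 22.0, 22.2]}, index=idx)

# One row per minute between the first and last bin; empty bins hold NaN.
binned = toy.resample("1min").mean()

# Same effect as binned[~binned.isna().any(axis=1)].
kept = binned.dropna()

print(len(binned), len(kept))
print(kept)
```

Comparing `len(binned)` with `len(kept)` on the real data would show how many intervals were lost to gaps in any of the three sources.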
%%time
# Average all readings within each 2-minute bin.
df_2min = (
    df2
    .resample('2min')
    .mean()
    .dropna()  # discard bins where any column is missing
    .drop(columns=["year", "month", "day", "hour",
                   "minute", "second"])  # redundant with the datetime index
)
df_2min.to_pickle('data/data_2min.pickle.gz')
df_2min
CPU times: user 2.91 s, sys: 152 ms, total: 3.06 s Wall time: 3.06 s
datetime | temperature | pressure | humidity | gasResistance | IAQ | iaqAccuracy | NO | CO | NO2 | NOx | O3 | PM10 | PM2.5 | SO2 | temperature_outdoor | feels_like | pressure_outdoor | humidity_outdoor | wind_speed | wind_deg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-02-12 06:04:00 | 21.527692 | 777.409231 | 43.891538 | 152428.230769 | 34.100000 | 1.0 | 0.205214 | 2.201642 | 0.031022 | 0.206989 | 0.002000 | 45.065693 | 22.016423 | 0.004005 | 14.924197 | 13.593759 | 1020.994526 | 43.005474 | 2.555931 | 109.397810 |
2021-02-12 06:06:00 | 21.726000 | 777.409750 | 43.335750 | 152099.500000 | 35.257500 | 1.0 | 0.206156 | 2.208896 | 0.031119 | 0.206941 | 0.002000 | 45.355839 | 22.088960 | 0.004030 | 14.898568 | 13.566195 | 1020.970347 | 43.029653 | 2.493791 | 106.738139 |
2021-02-12 06:08:00 | 21.686250 | 777.365250 | 43.291500 | 147429.200000 | 72.652500 | 1.0 | 0.207580 | 2.219845 | 0.031265 | 0.206868 | 0.002000 | 45.793796 | 22.198449 | 0.004066 | 14.859881 | 13.524589 | 1020.933850 | 43.066150 | 2.399995 | 102.723540 |
2021-02-12 06:10:00 | 21.499500 | 777.302000 | 43.106250 | 149288.475000 | 69.505000 | 1.0 | 0.209003 | 2.230794 | 0.031411 | 0.206795 | 0.002000 | 46.231752 | 22.307938 | 0.004103 | 14.821195 | 13.482984 | 1020.897354 | 43.102646 | 2.306200 | 98.708942 |
2021-02-12 06:12:00 | 21.628250 | 777.279500 | 42.830750 | 149325.975000 | 71.237500 | 1.0 | 0.210427 | 2.241743 | 0.031557 | 0.206722 | 0.002000 | 46.669708 | 22.417427 | 0.004139 | 14.782509 | 13.441378 | 1020.860858 | 43.139142 | 2.212404 | 94.694343 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-09-18 01:12:00 | 25.913500 | 782.776500 | 56.513250 | 920534.625000 | 134.397500 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.612977 | 14.464809 | 1017.708153 | 89.389129 | 2.038011 | 335.310377 |
2021-09-18 01:14:00 | 25.892750 | 782.793500 | 56.555500 | 921024.825000 | 134.380000 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.530951 | 14.372159 | 1017.782273 | 89.290303 | 2.056294 | 339.041075 |
2021-09-18 01:16:00 | 25.866000 | 782.818500 | 56.597250 | 921339.325000 | 134.377500 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.448925 | 14.279509 | 1017.856393 | 89.191476 | 2.074577 | 342.771773 |
2021-09-18 01:18:00 | 25.842927 | 782.883415 | 56.641707 | 922168.609756 | 134.024390 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.365874 | 14.185701 | 1017.931439 | 89.091414 | 2.093088 | 346.549104 |
2021-09-18 01:20:00 | 25.836471 | 782.932941 | 56.662941 | 922768.882353 | 133.876471 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.306405 | 14.118530 | 1017.985176 | 89.019765 | 2.106343 | 349.253860 |
156801 rows × 20 columns
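`df2` also carries the text column `weather_main`, which does not appear in the resampled output. Whether `mean()` silently skips such a column or raises an error depends on the pandas version, so one version-independent variant is to select the numeric columns first. A sketch on made-up data (the `toy` frame is an assumption for illustration only):

```python
import pandas as pd

idx = pd.date_range("2021-02-12 06:04:00", periods=6, freq="30s")
toy = pd.DataFrame(
    {
        "temperature": [21.5, 21.6, 21.7, 21.8, 21.9, 22.0],
        "weather_main": ["Clear"] * 6,  # text column, cannot be averaged
    },
    index=idx,
)

# Keep only numeric columns, then average into 2-minute bins.
numeric = toy.select_dtypes(include="number")
toy_2min = numeric.resample("2min").mean()

print(toy_2min)
```

If the weather label is needed at the coarser resolution, it would have to be aggregated separately (for example with the most frequent value per bin) and joined back.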
%%time
# Average all readings within each 5-minute bin.
df_5min = (
    df2
    .resample('5min')
    .mean()
    .dropna()  # discard bins where any column is missing
    .drop(columns=["year", "month", "day", "hour",
                   "minute", "second"])  # redundant with the datetime index
)
df_5min.to_pickle('data/data_5min.pickle.gz')
df_5min
CPU times: user 1.15 s, sys: 102 ms, total: 1.25 s Wall time: 1.25 s
datetime | temperature | pressure | humidity | gasResistance | IAQ | iaqAccuracy | NO | CO | NO2 | NOx | O3 | PM10 | PM2.5 | SO2 | temperature_outdoor | feels_like | pressure_outdoor | humidity_outdoor | wind_speed | wind_deg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-02-12 06:00:00 | 21.530000 | 777.410000 | 43.974000 | 151849.400000 | 37.200000 | 1.0 | 0.205071 | 2.200547 | 0.031007 | 0.206996 | 0.002000 | 45.021898 | 22.005474 | 0.004002 | 14.928066 | 13.597920 | 1020.998175 | 43.001825 | 2.565310 | 109.799270 |
2021-02-12 06:05:00 | 21.689773 | 777.389432 | 43.361477 | 150039.409091 | 51.973864 | 1.0 | 0.206726 | 2.213276 | 0.031177 | 0.206911 | 0.002000 | 45.531022 | 22.132755 | 0.004044 | 14.883093 | 13.549553 | 1020.955748 | 43.044252 | 2.456273 | 105.132299 |
2021-02-12 06:10:00 | 21.538300 | 777.285200 | 42.909800 | 149975.940000 | 67.172000 | 1.0 | 0.210071 | 2.239005 | 0.031520 | 0.206740 | 0.002000 | 46.560219 | 22.390055 | 0.004130 | 14.792181 | 13.451779 | 1020.869982 | 43.130018 | 2.235853 | 95.697993 |
2021-02-12 06:15:00 | 21.563900 | 777.269000 | 42.704100 | 150897.020000 | 65.798000 | 1.0 | 0.213629 | 2.266378 | 0.031885 | 0.206557 | 0.002000 | 47.655109 | 22.663777 | 0.004221 | 14.695465 | 13.347765 | 1020.778741 | 43.221259 | 2.001364 | 85.661496 |
2021-02-12 06:20:00 | 21.616931 | 777.223960 | 42.695545 | 149963.910891 | 71.275248 | 1.0 | 0.217205 | 2.293887 | 0.032252 | 0.206374 | 0.002000 | 48.755474 | 22.938869 | 0.004313 | 14.598266 | 13.243230 | 1020.687044 | 43.312956 | 1.765703 | 75.574818 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-09-18 01:00:00 | 25.987500 | 782.832000 | 56.333000 | 918660.290000 | 135.811000 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 15.043613 | 14.951220 | 1017.319024 | 89.907968 | 1.942026 | 315.724212 |
2021-09-18 01:05:00 | 25.966100 | 782.800200 | 56.379800 | 920302.740000 | 134.243000 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.838548 | 14.719595 | 1017.504324 | 89.660902 | 1.987733 | 325.050957 |
2021-09-18 01:10:00 | 25.915100 | 782.782200 | 56.511300 | 920836.000000 | 134.184000 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.633484 | 14.487971 | 1017.689623 | 89.413836 | 2.033440 | 334.377702 |
2021-09-18 01:15:00 | 25.860099 | 782.840594 | 56.613069 | 921519.306931 | 134.372277 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.427393 | 14.255188 | 1017.875849 | 89.165534 | 2.079376 | 343.751081 |
2021-09-18 01:20:00 | 25.836471 | 782.932941 | 56.662941 | 922768.882353 | 133.876471 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.306405 | 14.118530 | 1017.985176 | 89.019765 | 2.106343 | 349.253860 |
62724 rows × 20 columns
%%time
# Average all readings within each 10-minute bin.
df_10min = (
    df2
    .resample('10min')
    .mean()
    .dropna()  # discard bins where any column is missing
    .drop(columns=["year", "month", "day", "hour",
                   "minute", "second"])  # redundant with the datetime index
)
df_10min.to_pickle('data/data_10min.pickle.gz')
df_10min
CPU times: user 790 ms, sys: 109 ms, total: 900 ms Wall time: 898 ms
datetime | temperature | pressure | humidity | gasResistance | IAQ | iaqAccuracy | NO | CO | NO2 | NOx | O3 | PM10 | PM2.5 | SO2 | temperature_outdoor | feels_like | pressure_outdoor | humidity_outdoor | wind_speed | wind_deg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-02-12 06:00:00 | 21.681183 | 777.390538 | 43.394409 | 150136.720430 | 51.179570 | 1.0 | 0.206637 | 2.212591 | 0.031168 | 0.206916 | 0.002000 | 45.503650 | 22.125912 | 0.004042 | 14.885511 | 13.552153 | 1020.958029 | 43.041971 | 2.462135 | 105.383212 |
2021-02-12 06:10:00 | 21.551100 | 777.277100 | 42.806950 | 150436.480000 | 66.485000 | 1.0 | 0.211850 | 2.252692 | 0.031703 | 0.206649 | 0.002000 | 47.107664 | 22.526916 | 0.004176 | 14.743823 | 13.399772 | 1020.824361 | 43.175639 | 2.118609 | 90.679745 |
2021-02-12 06:20:00 | 21.607811 | 777.222786 | 42.713881 | 149817.253731 | 71.577612 | 1.0 | 0.218984 | 2.307573 | 0.032434 | 0.206283 | 0.002000 | 49.302920 | 23.075730 | 0.004359 | 14.549909 | 13.191223 | 1020.641423 | 43.358577 | 1.648458 | 70.556569 |
2021-02-12 06:30:00 | 21.538950 | 777.266600 | 47.500900 | 134899.510000 | 112.599500 | 1.0 | 0.226119 | 2.362454 | 0.033166 | 0.205917 | 0.002000 | 51.498175 | 23.624544 | 0.004542 | 14.355995 | 12.982673 | 1020.458485 | 43.541515 | 1.178307 | 50.433394 |
2021-02-12 06:40:00 | 21.536716 | 777.252736 | 44.678806 | 139608.238806 | 115.494527 | 1.0 | 0.233254 | 2.417336 | 0.033898 | 0.205551 | 0.002000 | 53.693431 | 24.173358 | 0.004724 | 14.162080 | 12.774124 | 1020.275547 | 43.724453 | 0.708157 | 30.310219 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-09-18 00:40:00 | 26.107350 | 783.207700 | 55.989000 | 919791.035000 | 134.245000 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 15.763391 | 15.764222 | 1016.668623 | 90.775170 | 1.781594 | 282.987338 |
2021-09-18 00:50:00 | 26.044627 | 782.988358 | 56.206169 | 917950.154229 | 135.905473 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 15.352236 | 15.299815 | 1017.040148 | 90.279802 | 1.873237 | 301.687461 |
2021-09-18 01:00:00 | 25.976800 | 782.816100 | 56.356400 | 919481.515000 | 135.027000 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.941081 | 14.835408 | 1017.411674 | 89.784435 | 1.964880 | 320.387585 |
2021-09-18 01:10:00 | 25.887463 | 782.811542 | 56.562438 | 921179.353234 | 134.278607 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.529926 | 14.371001 | 1017.783200 | 89.289067 | 2.056523 | 339.087708 |
2021-09-18 01:20:00 | 25.836471 | 782.932941 | 56.662941 | 922768.882353 | 133.876471 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.306405 | 14.118530 | 1017.985176 | 89.019765 | 2.106343 | 349.253860 |
31364 rows × 20 columns
%%time
# Average all readings within each 15-minute bin.
df_15min = (
    df2
    .resample('15min')
    .mean()
    .dropna()  # discard bins where any column is missing
    .drop(columns=["year", "month", "day", "hour",
                   "minute", "second"])  # redundant with the datetime index
)
df_15min.to_pickle('data/data_15min.pickle.gz')
df_15min
CPU times: user 660 ms, sys: 94.9 ms, total: 755 ms Wall time: 754 ms
datetime | temperature | pressure | humidity | gasResistance | IAQ | iaqAccuracy | NO | CO | NO2 | NOx | O3 | PM10 | PM2.5 | SO2 | temperature_outdoor | feels_like | pressure_outdoor | humidity_outdoor | wind_speed | wind_deg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-02-12 06:00:00 | 21.607150 | 777.335959 | 43.143316 | 150053.414508 | 59.465803 | 1.0 | 0.208416 | 2.226277 | 0.031350 | 0.206825 | 0.002000 | 46.051095 | 22.262774 | 0.004088 | 14.837153 | 13.500146 | 1020.912409 | 43.087591 | 2.344891 | 100.364964 |
2021-02-12 06:15:00 | 21.593223 | 777.238140 | 42.710631 | 150175.980066 | 69.657475 | 1.0 | 0.217205 | 2.293887 | 0.032252 | 0.206374 | 0.002000 | 48.755474 | 22.938869 | 0.004313 | 14.598266 | 13.243230 | 1020.687044 | 43.312956 | 1.765703 | 75.574818 |
2021-02-12 06:30:00 | 21.541993 | 777.262924 | 46.587375 | 136444.478405 | 112.892691 | 1.0 | 0.227916 | 2.376277 | 0.033350 | 0.205825 | 0.002000 | 52.051095 | 23.762774 | 0.004588 | 14.307153 | 12.930146 | 1020.412409 | 43.587591 | 1.059891 | 45.364964 |
2021-02-12 06:45:00 | 21.505050 | 777.272093 | 44.306312 | 141370.046512 | 109.778405 | 1.0 | 0.238627 | 2.458668 | 0.034449 | 0.205276 | 0.002000 | 55.346715 | 24.586679 | 0.004862 | 14.016040 | 12.617062 | 1020.137774 | 43.862226 | 0.354078 | 15.155109 |
2021-02-12 07:00:00 | 21.335133 | 777.231133 | 43.786767 | 146254.326667 | 88.569333 | 1.0 | 0.231821 | 2.388155 | 0.034379 | 0.190584 | 0.002249 | 58.242727 | 25.994181 | 0.004751 | 13.738271 | 12.324543 | 1020.000000 | 44.372818 | 0.000000 | 0.000000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-09-18 00:15:00 | 26.295914 | 782.908738 | 55.281462 | 927031.076412 | 128.179070 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 16.687208 | 16.807690 | 1015.833848 | 91.888203 | 1.575683 | 240.970352 |
2021-09-18 00:30:00 | 26.183000 | 783.176733 | 55.732867 | 921393.526667 | 132.783333 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 16.070988 | 16.111658 | 1016.390673 | 91.145769 | 1.713033 | 268.997221 |
2021-09-18 00:45:00 | 26.054252 | 783.056545 | 56.168405 | 918341.003322 | 135.632890 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 15.454768 | 15.415627 | 1016.947498 | 90.403335 | 1.850383 | 297.024089 |
2021-09-18 01:00:00 | 25.956233 | 782.804800 | 56.408033 | 919933.010000 | 134.746000 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.838548 | 14.719595 | 1017.504324 | 89.660902 | 1.987733 | 325.050957 |
2021-09-18 01:15:00 | 25.856695 | 782.853898 | 56.620254 | 921699.330508 | 134.300847 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.409963 | 14.235500 | 1017.891600 | 89.144534 | 2.083261 | 344.543854 |
20909 rows × 20 columns
%%time
# Average all readings within each 30-minute bin.
df_30min = (
    df2
    .resample('30min')
    .mean()
    .dropna()  # discard bins where any column is missing
    .drop(columns=["year", "month", "day", "hour",
                   "minute", "second"])  # redundant with the datetime index
)
df_30min.to_pickle('data/data_30min.pickle.gz')
df_30min
CPU times: user 559 ms, sys: 95 ms, total: 654 ms Wall time: 653 ms
datetime | temperature | pressure | humidity | gasResistance | IAQ | iaqAccuracy | NO | CO | NO2 | NOx | O3 | PM10 | PM2.5 | SO2 | temperature_outdoor | feels_like | pressure_outdoor | humidity_outdoor | wind_speed | wind_deg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-02-12 06:00:00 | 21.598664 | 777.276356 | 42.879676 | 1.501281e+05 | 65.675709 | 1.0 | 0.213771 | 2.267473 | 0.031900 | 0.206550 | 0.002000 | 47.698905 | 22.674726 | 0.004225 | 14.691597 | 13.343604 | 1020.775091 | 43.224909 | 1.991984 | 85.260036 |
2021-02-12 06:30:00 | 21.523522 | 777.267508 | 45.446844 | 1.389073e+05 | 111.335548 | 1.0 | 0.233271 | 2.417473 | 0.033900 | 0.205550 | 0.002000 | 53.698905 | 24.174726 | 0.004725 | 14.161597 | 12.773604 | 1020.275091 | 43.724909 | 0.706984 | 30.260036 |
2021-02-12 07:00:00 | 21.285807 | 777.166356 | 43.516739 | 1.476645e+05 | 84.998336 | 1.0 | 0.219561 | 2.275561 | 0.033753 | 0.176072 | 0.002499 | 59.493766 | 26.995012 | 0.004501 | 13.605661 | 12.188180 | 1020.000000 | 44.748130 | 0.000000 | 0.000000 |
2021-02-12 07:30:00 | 21.021811 | 776.989535 | 42.850565 | 1.518114e+05 | 78.670100 | 1.0 | 0.170561 | 1.825561 | 0.031253 | 0.118072 | 0.003499 | 64.493766 | 30.995012 | 0.003501 | 13.075661 | 11.643180 | 1020.000000 | 46.248130 | 0.000000 | 0.000000 |
2021-02-12 08:00:00 | 20.750449 | 776.791229 | 42.674053 | 1.536451e+05 | 81.092525 | 1.0 | 0.134260 | 1.575021 | 0.032248 | 0.084754 | 0.005998 | 62.753533 | 31.751039 | 0.002750 | 12.315411 | 10.865420 | 1019.750208 | 48.498753 | 0.384680 | 14.987531 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-09-17 23:00:00 | 26.241714 | 782.198103 | 53.698003 | 1.013392e+06 | 66.757238 | 1.0 | 0.009208 | 0.770925 | 0.027979 | 0.039159 | 0.015659 | 20.597295 | 16.656760 | 0.001368 | 17.632842 | 17.844994 | 1015.281611 | 91.842268 | 1.472006 | 202.160297 |
2021-09-17 23:30:00 | 26.321860 | 781.936711 | 54.365880 | 9.703531e+05 | 95.548007 | 1.0 | 0.009185 | 0.767123 | 0.027877 | 0.039082 | 0.015423 | 20.402159 | 16.479936 | 0.001326 | 17.617627 | 17.848328 | 1015.094026 | 92.613448 | 1.404058 | 200.055182 |
2021-09-18 00:00:00 | 26.327471 | 782.591181 | 55.042629 | 9.334957e+05 | 123.341764 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 16.994805 | 17.155127 | 1015.555899 | 92.258802 | 1.507122 | 226.980235 |
2021-09-18 00:30:00 | 26.118519 | 783.116539 | 55.950998 | 9.198647e+05 | 134.210483 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 15.762366 | 15.763064 | 1016.669549 | 90.773935 | 1.781822 | 283.033972 |
2021-09-18 01:00:00 | 25.928134 | 782.818660 | 56.467943 | 9.204316e+05 | 134.620335 | 1.0 | 0.009174 | 0.765217 | 0.027826 | 0.039043 | 0.015304 | 20.304348 | 16.391304 | 0.001304 | 14.717560 | 14.582937 | 1017.613650 | 89.515133 | 2.014700 | 330.553737 |
10455 rows × 20 columns
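The six resampling cells differ only in the interval string, so the same steps could be written once and looped. The following is a sketch rather than the notebook's code: `resample_and_save`, `INTERVALS`, `CALENDAR_COLS`, and the `out_dir` parameter are hypothetical names, and the `demo` frame stands in for `df2`.

```python
import os
import tempfile

import numpy as np
import pandas as pd

INTERVALS = ["1min", "2min", "5min", "10min", "15min", "30min"]
CALENDAR_COLS = ["year", "month", "day", "hour", "minute", "second"]


def resample_and_save(df, interval, out_dir):
    """Average df into `interval` bins, drop incomplete rows, pickle the result."""
    out = (
        df.select_dtypes(include="number")  # skip text columns
        .resample(interval)
        .mean()
        .dropna()
        .drop(columns=CALENDAR_COLS, errors="ignore")
    )
    path = os.path.join(out_dir, f"data_{interval}.pickle.gz")
    out.to_pickle(path)  # gzip compression is inferred from the .gz suffix
    return out, path


# Stand-in for df2 (assumption): one hour of readings every 3 seconds.
idx = pd.date_range("2021-02-12 06:00:00", periods=1200, freq="3s")
demo = pd.DataFrame(
    {"temperature": np.linspace(20.0, 22.0, 1200), "minute": idx.minute},
    index=idx,
)

with tempfile.TemporaryDirectory() as tmp:
    results = {iv: resample_and_save(demo, iv, tmp) for iv in INTERVALS}
    row_counts = {iv: len(out) for iv, (out, _) in results.items()}
    # Reading a file back yields the frame that was written.
    first_out, first_path = results["1min"]
    reloaded = pd.read_pickle(first_path)

print(row_counts)
```

In the notebook, `out_dir` would be `'data'` and `demo` would be `df2`; a later modelling notebook could then load any resolution with `pd.read_pickle`.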
https://scikit-learn.org/stable/modules/linear_model.html#generalized-linear-regression
https://pythonhealthcare.org/2018/05/03/81-distribution-fitting-to-data/
https://medium.com/@amirarsalan.rajabi/distribution-fitting-with-python-scipy-bb70a42c0aed
https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#resampling
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge_asof.html#pandas.merge_asof
https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#interpolation