
Welcome to our Bike Rental Demand Prediction Project! This project forecasts the number of bike rentals from environmental and temporal features using a data-driven approach. We use machine learning to produce predictions that can support efficient bike fleet management.
We use a rich dataset of hourly bike rental records, combined with weather and seasonal information. Our dataset captures the pulse of urban mobility: each record carries a timestamp (`DateHour`) and the number of bikes rented in that hour (`RENTALS`).
This comprehensive dataset allows us to analyze how various factors influence bike rental behavior.
Quality data leads to quality insights. We cleaned and prepared the dataset, transforming raw records into a format suitable for analysis. Here's how we parsed the timestamps, extracted the hour of day as a new feature, and filled in missing values:
```python
import pandas as pd

# Parse timestamps, extract the 'Hour' feature, and forward-fill missing values
train['DateHour'] = pd.to_datetime(train['DateHour'])
train['Hour'] = train['DateHour'].dt.hour
train = train.ffill()  # replaces fillna(method='ffill'), deprecated in recent pandas
```
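To make the cleaning step concrete, here is a small self-contained sketch on a hypothetical three-row DataFrame. Only the `DateHour` and `RENTALS` column names come from the project; the values and the helper name `clean_hourly` are made up for illustration. It uses `DataFrame.ffill()`, the current pandas spelling of `fillna(method='ffill')`, and shows that forward-filling copies the last observed value into a gap while `.dt.hour` yields the hour of day as an integer.

```python
import numpy as np
import pandas as pd


def clean_hourly(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: parse timestamps, add an 'Hour' column, forward-fill gaps."""
    out = df.copy()
    out['DateHour'] = pd.to_datetime(out['DateHour'])
    out['Hour'] = out['DateHour'].dt.hour
    # ffill() replaces each missing value with the last non-missing value above it
    return out.ffill()


# Hypothetical toy data: the 09:00 rental count is missing
toy = pd.DataFrame({
    'DateHour': ['2023-05-01 08:00', '2023-05-01 09:00', '2023-05-01 10:00'],
    'RENTALS': [120.0, np.nan, 95.0],
})
cleaned = clean_hourly(toy)
print(cleaned[['Hour', 'RENTALS']])
```

Forward-filling suits hourly series where adjacent hours tend to be similar, but it will carry a stale value across long gaps; the `limit` argument of `ffill` caps how many consecutive missing values get filled.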
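Our exploratory data analysis (EDA) revealed clear temporal trends in rental behavior.
Visualizations from our EDA helped us understand the cyclical nature of bike rentals, for example how demand varies over the hours of the day: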
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Plot bike rentals over the different hours of the day
sns.lineplot(x='Hour', y='RENTALS', data=train, marker='o')
plt.title('Impact of Hour on Bike Rentals')
plt.xlabel('Hour of the Day')
plt.ylabel('Number of Rentals')
plt.show()
```
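By default `sns.lineplot` draws the mean `RENTALS` value for each `Hour`, with a confidence band around it. The same hourly profile can be computed numerically with `groupby`, which makes peak hours easy to read off. The sketch below assumes a DataFrame with the project's `Hour` and `RENTALS` columns; the toy numbers and the helper name `hourly_profile` are invented for illustration.

```python
import pandas as pd


def hourly_profile(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: mean number of rentals for each hour of the day."""
    return df.groupby('Hour')['RENTALS'].mean()


# Hypothetical toy data: two days, three hours each
toy = pd.DataFrame({
    'Hour': [7, 8, 9, 7, 8, 9],
    'RENTALS': [40, 150, 90, 60, 170, 110],
})
profile = hourly_profile(toy)
peak_hour = profile.idxmax()  # hour with the highest average demand
print(profile)
print(f"Peak hour: {peak_hour}")
```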
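After experimenting with multiple models, we chose the Decision Tree Regressor, which gave the best results and captures non-linear relationships between the features and rental demand. We evaluated it with 5-fold cross-validation to check how well it generalizes beyond the data it was trained on: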
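```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Evaluate a Decision Tree with 5-fold cross-validation (fixed seed for reproducibility)
decision_tree = DecisionTreeRegressor(random_state=42)
cv_scores = cross_val_score(decision_tree, X_train, y_train, scoring='neg_mean_squared_log_error', cv=5)
mean_rmsle = np.sqrt(-cv_scores.mean())  # scikit-learn returns negated MSLE, so flip the sign
print(f"Cross-validated RMSLE: {mean_rmsle}")
```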
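How much a decision tree overfits depends heavily on how deep it is allowed to grow: an unconstrained `DecisionTreeRegressor` can fit the training data almost perfectly. A simple check is to compare training error with cross-validated error for a few `max_depth` values, where a large gap signals overfitting. The sketch below does this on synthetic data; the demand curve (peaks near 08:00 and 18:00), the temperature effect, and the noise level are invented assumptions, not the project's data.

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic data (assumption): two daily demand peaks, a temperature effect, and noise
hours = rng.integers(0, 24, size=500)
temp = rng.uniform(0, 30, size=500)
demand = (
    100
    + 80 * np.exp(-((hours - 8) ** 2) / 4)
    + 100 * np.exp(-((hours - 18) ** 2) / 4)
    + 3 * temp
    + rng.normal(0, 20, size=500)
)
y = np.clip(demand, 0, None)  # MSLE requires non-negative targets
X = np.column_stack([hours, temp])

results = {}
for depth in [2, 5, None]:  # None lets the tree grow until every leaf is pure
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    cv = cross_val_score(tree, X, y, scoring='neg_mean_squared_log_error', cv=5)
    tree.fit(X, y)
    train_rmsle = np.sqrt(mean_squared_log_error(y, tree.predict(X)))
    results[depth] = (train_rmsle, np.sqrt(-cv.mean()))
    print(f"max_depth={depth}: train RMSLE={train_rmsle:.3f}, CV RMSLE={results[depth][1]:.3f}")
```

If the unconstrained tree shows a near-zero training error but a clearly higher cross-validated error, tuning `max_depth` or `min_samples_leaf` (for example with `GridSearchCV`) is a natural next step.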
Finally, we fit the chosen model on the full training set and put it to the ultimate test on held-out data it had never seen:
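```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

# cross_val_score only fits clones, so fit the model itself before predicting
decision_tree.fit(X_train, y_train)
y_pred = decision_tree.predict(X_test)
final_rmsle = np.sqrt(mean_squared_log_error(y_test, y_pred))
print(f"Final Test RMSLE: {final_rmsle}")
```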
Because the test data played no part in model selection, this final RMSLE score estimates how well the model can be expected to predict demand in real-world scenarios.
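RMSLE compares predictions and actual counts on a log scale: RMSLE = sqrt(mean((log(1 + ŷ) − log(1 + y))²)). As a result it penalizes relative rather than absolute errors, and it penalizes an under-prediction slightly more than an over-prediction of the same size. The sketch below computes it directly from that definition and checks it against scikit-learn's `mean_squared_log_error`; the helper name `rmsle` and the example counts are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error


def rmsle(y_true, y_pred) -> float:
    """Hypothetical helper: root mean squared logarithmic error from its definition."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # log1p(x) = log(1 + x), which keeps zero counts finite
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


# Hypothetical counts: the same absolute error of 10 bikes in each case
actual = [10, 100, 1000]
predicted = [20, 110, 1010]
for a, p in zip(actual, predicted):
    print(a, p, rmsle([a], [p]))  # the error shrinks as the counts grow

manual = rmsle(actual, predicted)
reference = float(np.sqrt(mean_squared_log_error(actual, predicted)))
print(manual, reference)
```

The per-case output shows why RMSLE suits demand data spanning quiet night hours and busy rush hours: missing by 10 bikes matters far more when only 10 were rented.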
Our predictive model not only forecasts bike rental demand but also sheds light on the dynamics of urban transportation. The full code is available in the project's GitHub repository.