A room in the city
Airbnb is an online platform where hosts can list their properties and travelers can book them just like a hotel room. A traveler may opt to use Airbnb because, unlike hotel rooms, each listing is unique. Every host has a different home to share, and travelers are free to pick wherever they want to stay.
Each listing is unique, and so are the prices. You can book rooms for as little as $30 a night, and some cost as much as $1,000 a night. But how do Airbnb hosts determine their prices? And can I even predict them?
How do Airbnb Hosts Determine Their Prices?
I used data provided by https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data and did some feature engineering. For example, I grouped neighborhoods into communities and added income information.
Data Visualization
Before making visualizations, I needed to clean the data and explore the variables further. The data cleaning and exploration process can be seen in my GitHub. After cleaning the data, I decided to visualize Airbnb rooms across the 59 NYC communities.
From the Tableau map below, you can see that median income in a community and the average Airbnb room price in that community follow a similar pattern.
I also looked at the distribution of prices across boroughs and room types, along with the number of listings in each. Both are shown in the Tableau dashboard below. You can filter by room type to see the records in more detail.
From the dashboard, you can see that Manhattan had the most rooms and the highest average room price. As expected, entire places were more expensive than private rooms and shared rooms. One thing to note, however, is that the Bronx, Queens, and Brooklyn had more private rooms than entire places, while Manhattan and Staten Island had more entire places than private rooms. The Bronx and Staten Island had fewer listings than the other boroughs.
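The counts and averages behind a dashboard like this come from a simple group-by. Here is a minimal sketch using only Python's standard library. The rows and prices are made up for illustration and are not taken from the Kaggle data, and the names `rows` and `summary` are hypothetical.

```python
from collections import defaultdict

# Made-up example rows: (borough, room type, price per night in USD).
rows = [
    ("Manhattan", "Entire home/apt", 250.0),
    ("Manhattan", "Entire home/apt", 150.0),
    ("Manhattan", "Private room", 100.0),
    ("Bronx", "Private room", 50.0),
    ("Bronx", "Private room", 70.0),
    ("Bronx", "Entire home/apt", 120.0),
]

# Accumulate [listing count, price total] for each (borough, room type) pair.
totals = defaultdict(lambda: [0, 0.0])
for borough, room_type, price in rows:
    entry = totals[(borough, room_type)]
    entry[0] += 1
    entry[1] += price

# Turn the totals into (count, average price) per group.
summary = {key: (count, total / count) for key, (count, total) in totals.items()}

for (borough, room_type), (count, avg_price) in sorted(summary.items()):
    print(f"{borough:10} {room_type:16} listings={count} avg=${avg_price:.2f}")
```

On the real data, the same grouping would show which room type dominates each borough and how average prices differ between them.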
Predicting the Prices
I used multiple linear regression to predict the Airbnb prices. Because price is a quantitative variable, this is a regression problem. The model features and the fitted coefficients are shown below.
From the coefficients, we can see that being an entire home/apt had the greatest impact on the predicted price. We can also see that it is cheaper to find an Airbnb rental in the Bronx than in Manhattan.
Results for Linear Regression Model
Results for Decision Tree Regression
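To make the coefficient interpretation above concrete, here is a small sketch of how a linear regression on dummy-coded (one-hot) features produces coefficients like "entire home/apt" or "Manhattan". It is not the model from this post: the four rows are made up, only two features are used, and the fit is done by solving the normal equations by hand so that the example needs nothing beyond the standard library. In practice a library such as scikit-learn would do the fitting.

```python
# Made-up rows: (borough, room type, price per night in USD).
# Prices were built as 60 + 100 * entire_home + 50 * manhattan.
rows = [
    ("Manhattan", "Entire home/apt", 210.0),
    ("Manhattan", "Private room", 110.0),
    ("Bronx", "Entire home/apt", 160.0),
    ("Bronx", "Private room", 60.0),
]


def encode(row):
    """Return [intercept, is_entire_home, is_manhattan] for one listing.

    The baseline (all dummies zero) is a private room in the Bronx.
    """
    borough, room_type, _ = row
    return [1.0, float(room_type == "Entire home/apt"), float(borough == "Manhattan")]


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ols(features, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    k = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    xty = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(k)]
    return solve(xtx, xty)


X = [encode(r) for r in rows]
y = [r[2] for r in rows]
intercept, coef_entire_home, coef_manhattan = fit_ols(X, y)

print(f"baseline (Bronx, private room): {intercept:.1f}")
print(f"entire home/apt adds:           {coef_entire_home:.1f}")
print(f"Manhattan adds:                 {coef_manhattan:.1f}")
```

Each dummy coefficient is read relative to the baseline category, so a positive Manhattan coefficient means a listing there is predicted to cost more than an otherwise identical listing in the Bronx.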
Limitations & Biases
One limitation of the data is its outliers. The data contained many of them; some hosts even listed a place for $10,000. I assumed that many of these hosts were inactive and had either set their prices very high or raised their minimum nights to the maximum. It is likely that they wanted to keep their listings up and altered prices and minimum nights so that no one would reserve their places.
Some were priced at $10,000; some had a minimum of 10,000 nights to rent.
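One common way to screen out listings like these is the interquartile range (IQR) rule. This is only a sketch of that idea, not the cleaning step actually used for this post. The listings are made up, and the 1.5 multiplier is the conventional default, taken here as an assumption.

```python
import statistics

# Made-up listings: (price per night in USD, minimum nights).
listings = [
    (60, 1), (85, 2), (120, 3), (150, 2),
    (200, 5), (95, 1), (10000, 1), (110, 10000),
]


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences at k * IQR below Q1 and above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


price_low, price_high = iqr_bounds([price for price, _ in listings])
nights_low, nights_high = iqr_bounds([nights for _, nights in listings])

# Keep a listing only if both its price and its minimum nights fall inside the fences.
kept = [
    (price, nights)
    for price, nights in listings
    if price_low <= price <= price_high and nights_low <= nights <= nights_high
]

print(f"kept {len(kept)} of {len(listings)} listings")
```

A rule like this drops the $10,000 price and the 10,000-night minimum while leaving ordinary listings alone, but the cutoff is a judgment call and it can also remove genuine luxury listings.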
Now what?
Most hosts choose their prices depending on the location and type of room.
In the future, I can try to predict whether a listing is too expensive, normal, or cheap.
I also need to keep in mind that some listings are not real listings, and remove those outliers before modeling.
Source:
Data: https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data
https://en.wikipedia.org/wiki/Neighborhoods_in_New_York_City
https://www.icphusa.org/wp-content/uploads/2016/04/Appendix.pdf
Misc:
https://secondmeasure.com/datapoints/airbnb-sales-surpass-most-hotel-brands/