How do Airbnb Hosts Determine Their Prices?

A room in the city
Airbnb launched in 2008, and within 12 years it had already captured 19% of the nationwide market share among the big hotel brands and HomeAway (see Misc under Sources).

Airbnb is an online platform where hosts can list their property and travelers can book it just like a hotel room. A traveler may choose Airbnb because, unlike a hotel room, each listing is unique: every host has a different home to share, and travelers are free to pick wherever they want to stay.

Each listing is unique, and so are the prices. You can book rooms for as little as $30 or as much as $1,000 a night. But how do Airbnb hosts determine their prices? And can I predict them?


I used the New York City Airbnb Open Data set from Kaggle (https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data) and did some feature engineering. For example, I grouped neighborhoods into communities and added median income information for each community.
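A minimal sketch of that enrichment step is below. It assumes each listing is a dict with the Kaggle CSV's `neighbourhood` column; the two lookup tables are hypothetical placeholders standing in for the real mapping (built from the Wikipedia and ICPH sources) and are not the actual figures used in this analysis.

```python
# Hypothetical neighborhood -> community lookup (placeholder entries only).
NEIGHBORHOOD_TO_COMMUNITY = {
    "Greenpoint": "Greenpoint/Williamsburg",
    "Williamsburg": "Greenpoint/Williamsburg",
    "Harlem": "Central Harlem",
    "East Harlem": "East Harlem",
}

# Hypothetical median household income per community (placeholder numbers,
# not the ICPH figures).
COMMUNITY_MEDIAN_INCOME = {
    "Greenpoint/Williamsburg": 60_000,
    "Central Harlem": 40_000,
    "East Harlem": 30_000,
}


def add_community_features(listings):
    """Return copies of the listings with 'community' and 'median_income' added.

    Each listing is assumed to be a dict with a 'neighbourhood' key, as in the
    Kaggle CSV. Unmapped neighborhoods get None so they can be reviewed later.
    """
    enriched = []
    for row in listings:
        community = NEIGHBORHOOD_TO_COMMUNITY.get(row["neighbourhood"])
        income = COMMUNITY_MEDIAN_INCOME.get(community)  # None if unmapped
        # {**row, ...} builds a new dict, so the input listings are not mutated.
        enriched.append({**row, "community": community, "median_income": income})
    return enriched
```

With pandas the same idea is usually a `Series.map` for the community column followed by a `DataFrame.merge` against an income table.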

Data Visualization

Before making visualizations, I needed to clean the data and explore the variables further. The full data cleaning and exploration process is on my GitHub.

After cleaning the data, I visualized Airbnb listings by their respective NYC communities (59 in total).

The Tableau map below shows that median income and average Airbnb room price follow a similar pattern across communities: communities with higher median incomes tend to have higher average room prices.
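The map shows this relationship visually. One way to put a number on it, sketched below, is to average the price per community and compute a Pearson correlation against median income. This is an illustrative addition, not a step from the original analysis; it assumes listings carry `community` and `price` keys, and a correlation only measures how strongly the two move together, not why.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def average_price_by_community(listings):
    """Return {community: mean nightly price} for listings with a 'community' key."""
    prices = defaultdict(list)
    for row in listings:
        prices[row["community"]].append(row["price"])
    return {community: mean(values) for community, values in prices.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

To use it, pair each community's average price with its median income (same community order in both lists) and pass the two lists to `pearson_r`; values near 1 indicate a strong positive linear relationship.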



I also looked at how prices are distributed across boroughs and room types, along with the number of listings in each, as shown in the Tableau dashboard below. You can filter by room type to see the records in more detail.


The dashboard shows that Manhattan had the most listings and the highest average room price. As expected, entire places were more expensive than private rooms and shared rooms. One thing to note, however, is that the Bronx, Queens, and Brooklyn had more private rooms than entire places, while Manhattan and Staten Island had more entire places than private rooms. The Bronx and Staten Island also had fewer listings than the other boroughs.
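The counts and averages behind a dashboard like this come from a simple group-by. A minimal stdlib sketch, assuming the Kaggle column names `neighbourhood_group` (borough), `room_type`, and `price`:

```python
from collections import defaultdict
from statistics import mean


def summarize_by_borough_and_room(listings):
    """Return {(borough, room_type): (listing_count, mean_price)}, sorted by key.

    Assumes the Kaggle column names 'neighbourhood_group' (borough),
    'room_type' and 'price'.
    """
    groups = defaultdict(list)
    for row in listings:
        groups[(row["neighbourhood_group"], row["room_type"])].append(row["price"])
    return {key: (len(prices), mean(prices)) for key, prices in sorted(groups.items())}
```

In pandas this is roughly `df.groupby(["neighbourhood_group", "room_type"])["price"].agg(["count", "mean"])`.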

Predicting the Prices

I used multiple linear regression to predict Airbnb prices. Because price is a continuous quantitative variable, this is a regression problem. The features used in the model and their coefficients are shown below.


The coefficients show that being an entire home/apt had the greatest impact on the predicted price. They also show that, holding the other features constant, an Airbnb rental in the Bronx is predicted to be cheaper than one in Manhattan.
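The post does not say which library fit the model, so here is a from-scratch sketch of multiple linear regression with dummy (one-hot) encoded categories, solved through the normal equations. It is an illustration, not the code behind the coefficients above; in practice scikit-learn's `LinearRegression` with `pandas.get_dummies(..., drop_first=True)` does the same job. The key idea for reading coefficients: each dummy's coefficient is the price difference relative to the dropped baseline category, which is how a statement like "the Bronx is cheaper than Manhattan" falls out of the model.

```python
def one_hot(rows, column, categories):
    """Dummy-encode `column`, dropping categories[0] as the baseline.

    Dropping one level per categorical feature keeps the dummies from being
    perfectly collinear with the intercept.
    """
    return [[1.0 if row[column] == c else 0.0 for c in categories[1:]] for row in rows]


def solve(a, b):
    """Solve the square system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(features, targets):
    """Multiple linear regression by ordinary least squares.

    Solves the normal equations (X^T X) w = X^T y with a leading column of
    ones, so w[0] is the intercept and w[1:] are the feature coefficients.
    """
    X = [[1.0] + list(f) for f in features]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, targets)) for i in range(p)]
    return solve(xtx, xty)
```

For example, with "Private room" and "Bronx" as baselines, a positive coefficient on the "Entire home/apt" dummy is the estimated extra nightly price of an entire place over a private room, all else equal.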

Results for Linear Regression Model

Above are the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and R^2 score for the linear regression model. The R^2 shows that the model did not predict prices well: on the training set it explains only about 10% of the variance in price (R^2 measures the share of variance explained, not accuracy). The test RMSE of 221 is also very high, since it means predictions are typically off by roughly $221 a night. I then trained a Decision Tree regression model to compare its results with the linear model.
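For reference, here is how the four metrics are defined, written out as a small stdlib function (a generic sketch, not the code that produced the numbers above). MAE and RMSE are in the same units as price (dollars), and R^2 = 1 - SS_res / SS_tot, so an R^2 of 0.10 means the model explains about 10% of the variance in price.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R^2 for a set of predictions.

    R^2 = 1 - SS_res / SS_tot: the share of the variance in y_true that the
    predictions explain. It is not an "accuracy" percentage.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mse = ss_res / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {"MAE": mae, "MSE": mse, "RMSE": sqrt(mse), "R2": 1 - ss_res / ss_tot}
```

A model that always predicts the mean price scores R^2 = 0, which is a useful baseline when judging a low score.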

Results for Decision Tree Regression
Above are the results for the Decision Tree model. It performed very well on the training set but poorly on the test set, which means the model overfit the training data. On the test set, the linear model performed better than the decision tree. However, neither model performed well.
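The post does not say how the tree was configured, but the train/test gap is the classic symptom of a tree grown without limits. The sketch below is a minimal CART-style regression tree in pure Python (greedy splits that minimize squared error), not the model used here. With `max_depth=None` it keeps splitting until every leaf is pure, so it reproduces the training targets exactly while doing worse on new data; capping depth (in scikit-learn, `DecisionTreeRegressor`'s `max_depth` or `min_samples_leaf`) is the usual remedy.

```python
def _sse(values):
    """Sum of squared deviations from the mean (a node's squared error)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, max_depth=None, depth=0):
    """Grow a regression tree by greedy squared-error splits.

    A leaf is a float (the mean target of its rows); an internal node is a
    tuple (feature_index, threshold, left_subtree, right_subtree).
    With max_depth=None the tree splits until every leaf is pure or no split
    is possible, which lets it memorize the training set.
    """
    leaf_value = sum(y) / len(y)
    if len(set(y)) == 1 or (max_depth is not None and depth >= max_depth):
        return leaf_value
    best = None  # (cost, feature_index, threshold)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighboring distinct values
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            cost = _sse(left) + _sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    if best is None:  # every feature is constant in this node
        return leaf_value
    _, f, t = best
    left_rows = [(row, yi) for row, yi in zip(X, y) if row[f] <= t]
    right_rows = [(row, yi) for row, yi in zip(X, y) if row[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left_rows], [v for _, v in left_rows], max_depth, depth + 1),
        build_tree([r for r, _ in right_rows], [v for _, v in right_rows], max_depth, depth + 1),
    )


def predict(tree, row):
    """Follow the splits from the root down to a leaf and return its value."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree
```

Trained on noisy data with `max_depth=None`, the tree's training error drops to zero while its test error stays positive, which is the same pattern described above.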

Limitations & Biases


There were many limitations and biases that became evident when trying to predict Airbnb prices. One limitation was the set of features used to train the model. With more features, such as average review scores or cleaning fees, we could get a better sense of the prices. For example, a host with high average reviews may charge more, or a host with a high cleaning fee may set a lower base price.

One source of bias in the data is outliers. The data contained many outliers; one host even listed a place for $10,000 a night. I assumed that many of these hosts were inactive and therefore set very high prices or raised their minimum nights to the maximum. These hosts likely wanted to keep their listings up while altering prices and minimum nights so that no one would reserve their places.

Some listings were priced at $10,000, and some required a minimum stay of 10,000 nights.
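One common way to screen out listings like these is the interquartile-range (IQR) rule on price plus a sanity cap on minimum nights. The sketch below is an assumption, not a filter applied in this analysis: the multiplier `k=1.5` and the 365-night cap are illustrative defaults, and it assumes `price` and `minimum_nights` keys as in the Kaggle CSV.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(listings, max_minimum_nights=365):
    """Keep listings whose price is inside the IQR fences and whose minimum
    stay is at most `max_minimum_nights` (an assumed, illustrative cap)."""
    low, high = iqr_bounds([row["price"] for row in listings])
    return [
        row
        for row in listings
        if low <= row["price"] <= high and row["minimum_nights"] <= max_minimum_nights
    ]
```

Because quartiles barely move when a few extreme values are added, the fences stay close to typical prices even when a $10,000 listing is in the data.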

Now what?

Most hosts choose their prices based on location and room type.
In the future, I could try to predict whether a listing is too expensive, normally priced, or cheap.
I will also need to keep in mind that some listings are not genuine listings, and remove the outliers first.
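Turning that idea into a classification problem first requires a label. One possible definition, sketched below as an assumption rather than a settled plan, compares each listing with its peers (same borough and room type) and splits each peer group at its terciles. The `price_level` field name and the tercile cut points are hypothetical choices.

```python
from collections import defaultdict
from statistics import quantiles


def label_price_level(listings):
    """Tag each listing 'cheap', 'normal' or 'expensive' relative to its peers.

    Peers share the same borough ('neighbourhood_group') and 'room_type'; the
    cut points are that group's terciles, an assumed definition of "too
    expensive". Groups with fewer than two listings get None.
    """
    groups = defaultdict(list)
    for row in listings:
        groups[(row["neighbourhood_group"], row["room_type"])].append(row["price"])
    cuts = {key: quantiles(prices, n=3) for key, prices in groups.items() if len(prices) >= 2}
    labeled = []
    for row in listings:
        key = (row["neighbourhood_group"], row["room_type"])
        if key not in cuts:
            level = None  # too few peers to compare against
        else:
            low, high = cuts[key]
            if row["price"] < low:
                level = "cheap"
            elif row["price"] > high:
                level = "expensive"
            else:
                level = "normal"
        labeled.append({**row, "price_level": level})
    return labeled
```

Removing outliers before computing the terciles would keep a handful of placeholder listings from shifting the cut points.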

Source:

Data:
https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data
https://en.wikipedia.org/wiki/Neighborhoods_in_New_York_City
https://www.icphusa.org/wp-content/uploads/2016/04/Appendix.pdf

Misc:
https://secondmeasure.com/datapoints/airbnb-sales-surpass-most-hotel-brands/
