Clustering Neighborhoods: Best Place To Move


Introduction

Ever since immigrating from South Korea, I have lived in Queens, New York, for almost all my life. Some days I think about moving to a different state or borough, so I decided to research which area would be best for me. Right now, however, I am comfortable staying where I am, so instead I created a hypothetical scenario for a future client.

Scenario

Client A currently resides in Flushing, NY. Flushing is known for its many restaurants and bars, particularly Asian ones, as most of its residents are Asian. Client A likes his current neighborhood for these reasons. He has expressed some concerns about moving, but he wants to move to a different borough. He chose Brooklyn because his workplace is located there. Client A does not care whether his new address is close to his work, nor does he care about the demographics. He only cares about restaurants and bars, and he wants to know which areas in Brooklyn have the greatest number of them.

Description of Data

The data used to help Client A comes from FourSquare, which is used to locate the restaurants and bars in Brooklyn. A JSON file containing the geographic coordinates, neighborhood, and borough of locations in New York is also used.
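As a rough illustration of how such a JSON file could be turned into rows of borough, neighborhood, and coordinates, here is a minimal sketch. The layout of the file is not shown in this post, so the GeoJSON-like structure (`features`, `properties`, `geometry`), the key names, and the function name `neighborhoods_from_geojson` are all assumptions, and the sample record values are only illustrative.

```python
import json


def neighborhoods_from_geojson(data, borough):
    """Return one row per neighborhood in the given borough.

    Assumes a GeoJSON-like layout where each feature has
    properties.borough, properties.name and geometry.coordinates
    stored as [longitude, latitude]. This layout is an assumption.
    """
    rows = []
    for feature in data["features"]:
        props = feature["properties"]
        if props["borough"] != borough:
            continue
        lon, lat = feature["geometry"]["coordinates"]
        rows.append(
            {
                "borough": props["borough"],
                "neighborhood": props["name"],
                "latitude": lat,
                "longitude": lon,
            }
        )
    return rows


# Tiny made-up sample standing in for the real file (approximate coordinates).
sample = json.loads(
    """
    {"features": [
      {"properties": {"borough": "Queens", "name": "Flushing"},
       "geometry": {"coordinates": [-73.8317, 40.7643]}},
      {"properties": {"borough": "Brooklyn", "name": "Williamsburg"},
       "geometry": {"coordinates": [-73.9581, 40.7071]}}
    ]}
    """
)

brooklyn = neighborhoods_from_geojson(sample, "Brooklyn")
print(brooklyn)
```

With the real file, the same function could be called once for Brooklyn and once for Queens to get the neighborhoods to search around.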

Methodology

To find the venues near Flushing and in Brooklyn, both the JSON file and the FourSquare API were used. The JSON file was especially useful because it already contained all the necessary neighborhoods and their geographic coordinates.


The above is a DataFrame showing some of the store names and categories found in Flushing. Note that the restaurants are all Asian restaurants.

A KMeans algorithm was also used to cluster neighborhoods in Brooklyn based on the venue categories. The number of clusters was set to 4.
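The clustering step can be sketched with scikit-learn as follows. This is not the exact code used for the analysis: the feature matrix, the neighborhood names, and the helper `cluster_neighborhoods` are hypothetical, and each row is assumed to hold the share of a neighborhood's venues that fall into each category.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_neighborhoods(features, n_clusters=4, seed=0):
    """Assign each neighborhood (row) to one of n_clusters clusters."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(features)


# Hypothetical neighborhoods; columns are the share of venues that are
# restaurants, bars, and everything else.
names = ["N1", "N2", "N3", "N4", "N5", "N6", "N7", "N8"]
features = np.array(
    [
        [0.90, 0.05, 0.05],
        [0.85, 0.10, 0.05],
        [0.05, 0.90, 0.05],
        [0.10, 0.85, 0.05],
        [0.05, 0.05, 0.90],
        [0.10, 0.05, 0.85],
        [0.35, 0.35, 0.30],
        [0.33, 0.37, 0.30],
    ]
)

labels = cluster_neighborhoods(features, n_clusters=4)
for name, label in zip(names, labels):
    print(name, label)
```

The cluster numbers themselves are arbitrary; what matters is which neighborhoods end up sharing a label.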

Data

Because a free account was used, there was a limit on the number of venues that could be found per neighborhood. The limit was 100, so not every venue could be located.
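To show where the 100-venue limit enters, here is a sketch of how a request URL could be built for one neighborhood. It assumes the legacy FourSquare v2 `venues/explore` endpoint and its `ll`, `radius`, and `limit` parameters, which may differ from the current API. The helper name `build_explore_url`, the 500 m radius, the version date, and the placeholder credentials are all assumptions. No request is actually sent.

```python
from urllib.parse import urlencode

# Assumed legacy endpoint; check the current FourSquare documentation.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lon, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build a venue search URL around one neighborhood's coordinates."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date (assumed value)
        "ll": f"{lat},{lon}",    # latitude,longitude of the neighborhood
        "radius": radius,        # search radius in meters (assumed value)
        "limit": limit,          # at most 100 venues per request
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


url = build_explore_url(40.7071, -73.9581, "YOUR_ID", "YOUR_SECRET")
print(url)
```

Because every neighborhood is capped at the same number of results, dense neighborhoods are undercounted more than sparse ones.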


The DataFrame below shows the top 10 categories found in Flushing. This was done to identify the specific restaurant and bar categories there. However, this information could not be used because Brooklyn did not have any of these categories; instead, only the keywords 'Restaurant' and 'Bar' were used.

The bar graph below shows the number of bars and restaurants in the four clusters of Brooklyn, NY. Note that Cluster 4 has the greatest number of restaurants and bars, which is closest to that of Flushing.
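The per-cluster counts behind a graph like this could be computed roughly as follows. This is a sketch, not the original code: the venue list is made up, and the helper `count_food_and_drink` is hypothetical. It matches the keywords 'Restaurant' and 'Bar' as whole words in the category name, so a category such as 'Barbershop' is not counted as a bar.

```python
from collections import Counter


def count_food_and_drink(venues, keywords=("Restaurant", "Bar")):
    """Count venues per cluster whose category contains a keyword as a word."""
    counts = Counter()
    for cluster, category in venues:
        if any(word in keywords for word in category.split()):
            counts[cluster] += 1
    return counts


# Made-up (cluster label, venue category) pairs for illustration.
venues = [
    (1, "Pizza Place"),
    (4, "Italian Restaurant"),
    (4, "Cocktail Bar"),
    (4, "Barbershop"),
    (2, "Bar"),
    (2, "Park"),
]

counts = count_food_and_drink(venues)
for cluster in sorted(counts):
    print(f"Cluster {cluster}: {counts[cluster]}")
```

Matching whole words is a design choice. A plain substring check would be simpler but would also pick up unrelated categories that happen to contain "Bar".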


Concluding Messages

We can see that the neighborhoods in Brooklyn were segmented into different clusters in order to give Client A a variety of locations. Clustering the neighborhoods using KMeans may not have been the best way to solve the problem. However, clustering may still have been beneficial to Client A, since it lets him choose from more than one neighborhood. The data shows that the neighborhoods Client A will enjoy are mostly in the northern parts of Brooklyn, near the tip of Manhattan. The FourSquare location data did not return every single venue in Brooklyn, because there was a limit of just 100 venues per neighborhood. This limit played a critical part in determining the categories of stores. No Asian restaurants were found in Brooklyn, but this is due to the search limit of the free FourSquare account. Although areas with many bars and restaurants were found, they did not have the same types of restaurants and bars as Flushing. The data should be analyzed further to give Client A a more specific area to choose from. This can be done by limiting the location radius and searching for more specific types of venues.

The analysis of the neighborhoods in Brooklyn was conducted so that Client A can be closer to his work. The data from this analysis showed that Client A should move to the northern part of Brooklyn. The data also showed that Brooklyn is filled with many restaurants and bars, so Client A should have a fun time transitioning to his new neighborhood. Unfortunately, because of the limit on the number of search results, not every venue could be located. The categories of stores in Flushing and Brooklyn are also very different. Client A can refine his housing research further by looking into housing costs. In conclusion, if money is not an issue, the northern parts of Brooklyn can be a good place for Client A.






