Google Play Store Apps dataset - Data pre-processing, visualization and analysis

Learn how to pre-process, visualize and analyze the Google Play Store Apps dataset

The focus of this project is data pre-processing, visualization and drawing inferences. Here, we will be looking at a dataset that contains information about different Android applications (apps) available on the Google Play Store. These applications are divided into different categories such as Games, Lifestyle, Social, etc. In addition, we have information about the ratings given by customers, the pricing of each app, etc.
The main goals for this project are:

1. Learning how to pre-process the data into the form needed for data visualization and analysis.

2. Figuring out the most popular genres and categories for apps on Google Play Store (Art and design, auto and vehicles, social, dating, sports, weather, game, etc.)

3. How to price an app on Play Store?
a. Popularity of free vs paid apps and the confounding factors (if any)
b. Are free apps downloaded more than paid ones?
c. Do paid apps raise the expectations of a customer and hence receive lower ratings as compared to their free counterparts?

4. Whether app size is a factor when it comes to the number of downloads and the ratings.
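
To give a feel for the pre-processing in goal 1 and the free vs paid comparison in goal 3, here is a minimal sketch using only the Python standard library and regular expressions. The column names (Installs, Price, Size, Rating), the raw string formats ("10,000+", "$4.99", "19M", "Varies with device") and the sample rows are assumptions made up for illustration; they are not taken from the Colab document, so check them against the actual file before reusing any of this.

```python
import re
from statistics import median

# Hypothetical raw rows. Column names and string formats are assumptions,
# not values copied from the real dataset.
raw_rows = [
    {"App": "A", "Installs": "10,000+", "Price": "0", "Size": "19M", "Rating": "4.1"},
    {"App": "B", "Installs": "500+", "Price": "$4.99", "Size": "201k", "Rating": "3.9"},
    {"App": "C", "Installs": "1,000,000+", "Price": "0", "Size": "Varies with device", "Rating": "4.5"},
    {"App": "D", "Installs": "1,000+", "Price": "$0.99", "Size": "2.5M", "Rating": "NaN"},
]


def parse_installs(text):
    """Strip everything except digits, e.g. '10,000+' becomes 10000."""
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


def parse_price(text):
    """Pull the first decimal number out of a price string like '$4.99'."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def parse_size_mb(text):
    """Convert '19M' or '201k' to megabytes; anything else becomes None."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([kM])", text.strip())
    if not match:
        return None
    value, unit = float(match.group(1)), match.group(2)
    return value if unit == "M" else value / 1024


def parse_rating(text):
    """Convert a rating to float, treating unparsable or NaN values as missing."""
    try:
        value = float(text)
    except ValueError:
        return None
    return value if value == value else None  # NaN is the only float not equal to itself


clean_rows = [
    {
        "app": row["App"],
        "installs": parse_installs(row["Installs"]),
        "price": parse_price(row["Price"]),
        "size_mb": parse_size_mb(row["Size"]),
        "rating": parse_rating(row["Rating"]),
    }
    for row in raw_rows
]


def median_installs_by_type(rows):
    """Group rows into free and paid apps and return the median installs of each."""
    groups = {"free": [], "paid": []}
    for row in rows:
        if row["installs"] is None or row["price"] is None:
            continue  # skip rows that could not be cleaned
        key = "free" if row["price"] == 0 else "paid"
        groups[key].append(row["installs"])
    return {key: median(values) for key, values in groups.items() if values}


print(median_installs_by_type(clean_rows))
```

The median is used here instead of the mean because install counts tend to be heavily skewed by a few very popular apps. The numbers printed for these four toy rows say nothing about the real dataset; the point is only the shape of the cleaning and grouping steps.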

The Google Colab document with all the code, explanations and outputs can be found below. Experiment by changing the different arguments for the plot functions. Have fun!

Here are some important links for your reference:
Seaborn API
matplotlib.pyplot API
Regular Expressions

Archana Tikayat Ray

My research interests include applied data science, machine learning, and Natural Language Processing.