Project 2:Analyze Apps Using SQL

Project Goal:

Find insights for the app developer who needs to decide what type of app to develop.
分析 app 的樣貌協助 app 開發者決定開發類型與方向。

Dataset Overview:

Contain Information of apps on the Apple Store, including names, price, ratings, number of supporting languages, etc.

Data Source: Kaggle

Tools: MySQL

Analysis Process:

1. Import Data
2. Read and Explore Data
3. Data Analysis
4. Find Insights


🔗 Check out Full Code here.

About Dataset

01 id : app ID
02 track_name: app name
03 size_bytes: size in bytes
05 price: Price amount
06 rating_count_tot: user rating counts (for all version)
08 user_rating: average user rating value (for all version)
12 prime_genre: primary Genre
13 sup_devices.num: number of supporting devices
14 ipadSc_urls.num: number of screenshots showed for display
15 lang.num: number of supported languages

1. Import Data 🔗 Full Code

a. Create Tables and Import Data 🔗

-- Create Tables and Import Data
DROP TABLE IF EXISTS applestore;
CREATE TABLE applestore(
    id INT NOT NULL PRIMARY KEY,
    track_name VARCHAR(255),
    size_bytes BIGINT,
    currency VARCHAR(255),
    price DECIMAL,
    rating_count_tot INT,
    rating_count_ver INT,
    user_rating DECIMAL,
    user_rating_ver DECIMAL,
    ver VARCHAR(255),
    cont_rating VARCHAR(255),
    prime_genre VARCHAR(255),
    sup_devices_num INT,
    ipadSc_urls_num INT,
    lang_num INT,
    vpp_lic INT
);

LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/applestore.csv'
INTO TABLE applestore
CHARACTER SET latin7
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

2. Read and Explore Data 🔗 Full Code

a. Explore Data
There are 7197 rows and 16 columns.

-- Explore Data
SELECT *
FROM applestore;


b. Number of Apps by Genre

-- Number of Apps by Genre
SELECT prime_genre AS Genre,
       COUNT(*) AS Num
FROM applestore
GROUP BY Genre
ORDER BY Num DESC;


c. Average of App Ratings and Paid App Price

-- Average of App Ratings and Paid App Price
SELECT ROUND(AVG(user_rating), 1) AS AvgRating,
       ROUND((SELECT AVG(price) FROM applestore WHERE price > 0), 1) AS AvgPrice
FROM applestore;


3. Data Analysis 🔗 Full Code

a. Check whether paid apps have higher ratings than free apps

-- Check whether paid apps have higher ratings than free apps
SELECT 
       CASE
	   WHEN price > 0 THEN 'Paid'
	   ELSE 'Free'
	END AS AppType,
        ROUND(AVG(user_rating), 1) AS Rating
FROM applestore
GROUP BY AppType;


b. Check if apps with more supporting languages have higher rating

-- Check if apps with more supporting languages have higher rating
SELECT 
       CASE
	   WHEN lang_num < 10 THEN '<10 languages'
	   WHEN lang_num BETWEEN 10 AND 30 THEN '10-30 languages'
	   ELSE '>30 languages'
	END AS lang_type,
        ROUND(AVG(user_rating), 1) AS AvgRating
FROM applestore
GROUP BY lang_type
ORDER BY AvgRating DESC;


c. Check correlation between app screenshot and rating

-- Check if apps with more supporting languages have higher rating
SELECT 
       CASE
	   WHEN ipadSc_urls_num < 1 THEN 'No Screenshot'
	   WHEN ipadSc_urls_num BETWEEN 1 AND 3 THEN '1-3 Screenshot'
	   ELSE '4-5 Screenshot'
	END AS ScrnType,
        ROUND(AVG(user_rating), 1) AS AvgRating
FROM AvgRating
GROUP BY ScrnType;


d. Top Rating App in Each Genre

-- Top Rating App in Each Genre
WITH top_app AS(
SELECT 
       prime_genre,
       track_name,
       user_rating,
       rating_count_tot,
       RANK() OVER(PARTITION BY prime_genre ORDER BY user_rating DESC, rating_count_tot DESC) AS tot_r
FROM applestore)

SELECT 
       prime_genre AS Genre,
       track_name AS App,
       user_rating AS Rating
FROM top_app
WHERE tot_r = 1
ORDER BY rating_count_tot DESC;


4. Finding Insights

・The new app should set goal for an average rating above 3.8.
・Paid Apps have better ratings, and the average price is 4.0.
・Games and Entertainment genre have high competition.
・The average ratings in Catelog, Medical and Navigation genre are very low.
・Apps with 4-5 screenshots showed for display and 10-30 supporting languages have better ratings.