Gabe's Musings: Data Science Movies Recommendation System

Monday, November 2, 2020

Data Science Movies Recommendation System

Nearly everybody likes to spend their leisure time watching movies with family and friends. We have all had the same experience: we sit down on the couch to pick a film for the next two hours, and 20 minutes later we still have not found one. It is frustrating. We clearly need a computer agent that recommends movies when we have to pick one and saves us time.

Movie recommendation agents have clearly become an essential part of our lives. According to Data Science Central, although hard data is difficult to come by, many informed sources estimate that for major e-commerce platforms like Amazon and Netflix, recommenders may be responsible for as much as 10% to 25% of incremental revenue.

What is a Recommender System?

There are two types of recommendation systems. They are:

Content-Based Recommender System

A content-based recommender system works on data generated by a user. The data can be produced either explicitly (such as clicking likes) or implicitly (such as clicking links). This information is used to build a profile for the user that includes the metadata of the items the user interacted with. The more data the engine collects, the more accurate the content-based recommender system becomes.
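As a minimal sketch of this idea, the snippet below builds a user profile by counting the metadata tags of the items a user liked, then ranks unseen items by how well their tags match that profile. The catalogue, tag names, and function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical catalogue: item -> metadata tags (illustrative only)
CATALOGUE = {
    "Movie A": ["action", "space", "alien"],
    "Movie B": ["action", "space", "robot"],
    "Movie C": ["romance", "drama"],
    "Movie D": ["romance", "comedy"],
}

def build_profile(liked_items):
    """Count how often each tag appears among the items a user liked."""
    profile = Counter()
    for item in liked_items:
        profile.update(CATALOGUE[item])
    return profile

def recommend(liked_items, top_n=2):
    """Rank unseen items by how strongly their tags match the profile."""
    profile = build_profile(liked_items)
    scores = {
        item: sum(profile[tag] for tag in tags)
        for item, tags in CATALOGUE.items()
        if item not in liked_items
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]

print(recommend(["Movie A"]))
```

A user who liked only "Movie A" gets "Movie B" ranked first, because the two share the tags "action" and "space".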

Collaborative Recommender System

A collaborative recommender system makes suggestions based on how similar users liked an item. The system groups users with similar preferences. In addition to user similarities, recommender systems can also perform collaborative filtering using item similarities (such as 'Users who liked item X also liked Y'). Most systems are a combination of these two methods.
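The user-similarity variant can be sketched in a few lines of plain Python. The ratings table and the scoring rule below are assumptions made for illustration (cosine similarity over co-rated items, then unseen items weighted by the similarity of the users who rated them); real systems typically add normalisation and neighbourhood selection on top of this.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
RATINGS = {
    "alice": {"action1": 5, "action2": 4, "romantic1": 1},
    "bob":   {"action1": 4, "action2": 5, "action3": 5},
    "carol": {"romantic1": 5, "romantic2": 4, "action1": 1},
}

def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def recommend(user, top_n=3):
    """Suggest unseen items, weighted by the similarity of the users who rated them."""
    scores = {}
    for other, their_ratings in RATINGS.items():
        if other == user:
            continue
        sim = cosine(RATINGS[user], their_ratings)
        for item, rating in their_ratings.items():
            if item not in RATINGS[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("alice"))
```

Here "alice" rates the action films much as "bob" does, so his unseen pick "action3" ranks above "romantic2", which comes from the less similar "carol".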

Making suggestions is not a new idea. Even before e-commerce became prevalent, sales staff in retail stores recommended goods to customers for the purpose of upselling and cross-selling, ultimately maximising profit. The goal of recommendation systems is exactly the same.

The other goal of a recommendation system is to achieve customer satisfaction by delivering relevant content and maximising the time a person spends on your website or channel. It also tends to increase customer engagement. In addition, advertising budgets can be targeted at only those users who are likely to respond, by highlighting relevant products and services.

Why Recommendation Systems?

1. They help customers find items of interest.
2. They help providers deliver their products to the right customers:
(a) Identify the most relevant products for each customer
(b) Show each user personalised content
(c) Recommend top deals and discounts to the right customers
3. They improve user engagement on websites.
4. They raise company profits through increased consumption.

Everyday Examples of Recommender Systems:

1. GroupLens
a) Helped develop the first recommender systems by pioneering the collaborative filtering model.
b) Also provided many datasets for training models, including MovieLens and BookLens.

2. Amazon
a) Implemented commercial recommender systems
b) They also implemented a lot of computational improvements

3. Netflix
a) Pioneered latent factor / matrix factorization models

4. Google
a) Search suggestions in the search bar
b) Next-word suggestions while typing in Gmail

5. YouTube
a) Making a playlist
b) Suggesting videos of the same genre
c) Hybrid Recommendation Systems
d) Deep Learning based systems

Let's move on to the coding part. The dataset is available at: https://www.kaggle.com/rounakbanik/the-movies-dataset

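 import pandas as pd
 import numpy as np

 # Load the movie metadata (path as used in the Kaggle notebook)
 df1 = pd.read_csv('../input/movies-dataset/movie_dataset.csv')

 # Inspect the available columns and the first five rows
 df1.columns
 df1.head(5)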

 import matplotlib.pyplot as plt

 # Sort by budget so the 15 most expensive movies come first
 rich = df1.sort_values('budget', ascending=False)

 # Pass figsize when creating the figure; changing rcParams afterwards does not resize it
 fig, ax = plt.subplots(figsize=(50, 50))
 rects1 = ax.bar(rich['title'].head(15), rich['budget'].head(15),
                 color=["Red", "Orange", "Yellow", "Green", "Blue"])
 plt.xlabel("Movie Title")
 plt.ylabel("Movie Budget")
 plt.title("Budget Wise top movies")

 def autolabel(rects):
     # Write each bar's value above it, in units of 100,000
     for rect in rects:
         height = rect.get_height()
         ax.text(rect.get_x() + rect.get_width()/2., 1.05*height,
                 '%f' % float(height/100000),
                 ha='center', va='bottom')

 autolabel(rects1)
 plt.xticks(rotation=90)
 plt.show()

 # Sort by average rating so the 20 best-rated movies come first
 rich1 = df1.sort_values('vote_average', ascending=False)
 rich1.head()

 # Pass figsize when creating the figure; changing rcParams afterwards does not resize it
 fig, ax = plt.subplots(figsize=(30, 20))
 rects1 = ax.bar(rich1['title'].head(20), rich1['vote_average'].head(20),
                 color=["Red", "Orange", "Yellow", "Green", "Blue"])
 plt.xlabel("Movie Title")
 plt.ylabel("Average rating")
 plt.title("Rating Wise top movies")

 def autolabel(rects):
     # Write each bar's rating above it
     for rect in rects:
         height = rect.get_height()
         ax.text(rect.get_x() + rect.get_width()/2., 1.05*height,
                 '%f' % float(height),
                 ha='center', va='bottom')

 autolabel(rects1)
 plt.xticks(rotation=90)
 plt.show()

 # C: mean rating across all movies
 C = df1['vote_average'].mean()
 print(C)

 # m: minimum number of votes required (90th percentile of vote_count)
 m = df1['vote_count'].quantile(0.9)

 # Keep only the movies with at least m votes
 q_movies = df1.loc[df1['vote_count'] >= m].copy()
 q_movies.shape

 def weightedrating(x, m=m, C=C):
     v = x['vote_count']
     R = x['vote_average']
     # Calculation based on the IMDB formula
     return (v/(v+m) * R) + (m/(m+v) * C)

 # A new column for weighted rating named weight_score in the dataset
 q_movies['weight_score'] = q_movies.apply(weightedrating, axis=1)

 # Sort movies based on score calculated above
 q_movies = q_movies.sort_values('weight_score', ascending=False)

 # Print the top 20 movies
 q_movies[['title', 'vote_count', 'vote_average', 'weight_score']].head(20)
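To see why the weighted rating is used instead of the raw average, here is a small worked sketch. It applies the same formula as weightedrating to made-up numbers; the values of m and C and both example movies are assumptions for illustration, not figures from the dataset.

```python
def weighted_rating(v, R, m, C):
    """IMDB-style weighted rating.

    v: number of votes for the movie
    R: average rating of the movie
    m: minimum number of votes required to be listed
    C: mean rating across all movies
    """
    return (v / (v + m)) * R + (m / (v + m)) * C

# Hypothetical values, for illustration only
m, C = 1000, 6.0

few_votes = weighted_rating(v=50, R=9.0, m=m, C=C)
many_votes = weighted_rating(v=20000, R=8.5, m=m, C=C)

print(round(few_votes, 2))   # 6.14
print(round(many_votes, 2))  # 8.38
```

With these numbers, the film rated 9.0 by only 50 voters is pulled down to about 6.14, close to the overall mean, while the film rated 8.5 by 20,000 voters keeps a score of about 8.38.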

 # Sort by popularity so the 5 most popular movies come first
 pop = df1.sort_values('popularity', ascending=False)

 plt.figure(figsize=(12, 4))
 plt.barh(pop['title'].head(5), pop['popularity'].head(5), align='center',
          color=['red', 'pink', 'orange', 'yellow', 'green'])
 # Put the most popular movie at the top of the chart
 plt.gca().invert_yaxis()
 plt.xlabel("Popularity")
 plt.title("Popular Movies")

 df1['overview'].head(5)

 features = ['keywords', 'cast', 'genres', 'director']

 # Step 3: Create a column in DF which combines all selected features
 # Replace missing values with empty strings so the concatenation does not fail
 for feature in features:
     df1[feature] = df1[feature].fillna('')

 def combine_features(row):
     try:
         return row['keywords'] + " " + row['cast'] + " " + row["genres"] + " " + row["director"]
     except TypeError:
         # A non-string value slipped through: report the row and fall back to an empty string
         print("Error:", row)
         return ""

 df1["combined_features"] = df1.apply(combine_features, axis=1)

 from sklearn.feature_extraction.text import CountVectorizer
 from sklearn.metrics.pairwise import cosine_similarity

 # Turn each combined_features string into a vector of word counts
 cv = CountVectorizer()
 count_matrix = cv.fit_transform(df1["combined_features"])

 # Step 5: Compute the Cosine Similarity based on the count_matrix
 cosine_sim = cosine_similarity(count_matrix)

 # Label rows and columns with movie titles so similarities can be looked up by name
 sim_df = pd.DataFrame(cosine_sim, index=df1.title, columns=df1.title)
 sim_df.head()

 # The 20 highest similarity scores for a given movie (the list includes the movie itself)
 movie_user_likes = "Avatar"
 sim_df[movie_user_likes].sort_values(ascending=False)[:20]

 movie_user_likes = "Gravity"
 sim_df[movie_user_likes].sort_values(ascending=False)[:20]
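To make the similarity step less of a black box, the sketch below reproduces the idea in plain Python: each text becomes a bag of word counts, and two films are compared by the cosine of the angle between their count vectors. The film titles and feature strings are hypothetical, and the whitespace tokenizer is a simplification (CountVectorizer's default tokenizer also splits on punctuation and drops single-character tokens).

```python
import math
from collections import Counter

def count_vector(text):
    """Bag-of-words counts for a whitespace-separated, lowercased string."""
    return Counter(text.lower().split())

def cosine_sim(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical combined_features strings
films = {
    "Film X": "space alien future action",
    "Film Y": "space astronaut survival action",
    "Film Z": "wedding romance comedy",
}
vectors = {title: count_vector(text) for title, text in films.items()}

def most_similar(title, top_n=2):
    """Rank the other films by similarity, leaving out the query film itself."""
    scores = {
        other: cosine_sim(vectors[title], vec)
        for other, vec in vectors.items()
        if other != title
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

print(most_similar("Film X"))
```

"Film X" and "Film Y" share two of their four words, giving a similarity of 0.5, while "Film Z" shares none and scores 0. Unlike the sorted sim_df column, this helper skips the query film, which would otherwise appear in its own recommendations.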

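 from sklearn.metrics.pairwise import cosine_similarity

 # Toy ratings table with one column per movie; missing ratings are filled with 0
 ratings = pd.read_csv("../input/colab-fitting/toy_dataset.csv", index_col=0)
 ratings = ratings.fillna(0)
 ratings

 def standardize(row):
     # Mean-centre the values and scale them by their range
     new_row = (row - row.mean())/(row.max()-row.min())
     return new_row

 # apply() works column by column here, so each movie's ratings are standardized
 ratings_std = ratings.apply(standardize)

 # Transpose so that rows are movies, giving an item-to-item similarity matrix
 item_similarity = cosine_similarity(ratings_std.T)
 print(item_similarity)

 item_similarity_df = pd.DataFrame(item_similarity, index=ratings.columns, columns=ratings.columns)
 item_similarity_df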

 def get_similar_movies(movie_name, user_rating):
     # Ratings above 2.5 give positive scores to similar movies, ratings below it give negative scores
     similar_score = item_similarity_df[movie_name]*(user_rating-2.5)
     similar_score = similar_score.sort_values(ascending=False)
     return similar_score

 print(get_similar_movies("romantic3", 1))

 action_lover = [("action1", 5), ("romantic2", 1), ("romantic3", 1)]

 # One row of scores per rated movie (DataFrame.append was removed in pandas 2.0)
 similar_movies = pd.DataFrame([get_similar_movies(movie, rating) for movie, rating in action_lover])
 similar_movies.head()

 # Sum the scores across the rated movies and rank the candidates
 similar_movies.sum().sort_values(ascending=False)
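Because toy_dataset.csv is not reproduced in this post, the scoring logic can be tried out on a hand-written similarity table instead. The sketch below is a plain-Python rendering of the same idea, with hypothetical similarity values: each rated movie contributes its similarity row scaled by (rating - 2.5), and the contributions are summed.

```python
# Hypothetical item-item similarities (symmetric, values for illustration only)
SIMILARITY = {
    "action1":   {"action1": 1.0, "action2": 0.8, "romantic1": -0.6},
    "action2":   {"action1": 0.8, "action2": 1.0, "romantic1": -0.5},
    "romantic1": {"action1": -0.6, "action2": -0.5, "romantic1": 1.0},
}

def similar_scores(movie, rating, midpoint=2.5):
    """Scale one movie's similarity row by how far the rating is from the midpoint."""
    return {other: sim * (rating - midpoint) for other, sim in SIMILARITY[movie].items()}

def recommend(user_ratings):
    """Sum the scaled scores over all rated movies and rank the result."""
    totals = {}
    for movie, rating in user_ratings:
        for other, score in similar_scores(movie, rating).items():
            totals[other] = totals.get(other, 0.0) + score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

print(recommend([("action1", 5), ("romantic1", 1)]))
```

A high rating for an action film and a low rating for a romantic one both push the action titles up, so they rank first, while "romantic1" ends up with a negative total. As in the code above, movies the user has already rated are not filtered out of the ranking.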

When the user or the movie is very new, we do not have many records from which to predict results. In such cases, the last value in the prediction will appear in the recommendations. We evaluate the performance of the recommendation system by comparing the predicted values with the original rating values, using the root mean squared error (RMSE). In this case, the RMSE value is 0.9313; whether that is good or bad depends on the size of the dataset.
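For reference, RMSE is the square root of the average squared difference between predicted and actual ratings. A minimal sketch follows; the two rating lists are hypothetical and are not the data behind the 0.9313 figure.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one rating is required")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ratings, for illustration only
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]

print(round(rmse(actual, predicted), 4))  # 0.75
```

An RMSE of 0.75 here means the predictions are off by roughly three quarters of a rating point, with larger misses penalised more heavily than small ones.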

Disadvantages of a Movie Recommendation System

1. It does not work for a new user who has not rated any items yet, because a content-based recommender needs enough ratings to evaluate the user's preferences and provide accurate recommendations.
2. It does not recommend serendipitous items.
3. Limited content analysis: the recommender does not work if the system cannot distinguish the items that a user likes from the items that the user does not like.

Conclusion

In this article, we discussed what a recommender system is, the types of recommendation systems, everyday examples, and the disadvantages of a data science movie recommendation system.

Author Bio

Rohit Sharma is the Program Director for the upGrad-IIIT Bangalore PG Diploma Data Analytics Program, one of the leading data science courses. He is motivated to leverage technology to solve problems, and works on problems of scale and long-term technology strategy.

from Featured Blog Posts - Data Science Central https://ift.tt/3oUoNWK
via Gabe's Musings
