Cara Marta Messina
Northeastern University
messina [dot] c [at] husky [dot] neu [dot] edu
This notebook takes data collected from Archive of Our Own, a popular fanfiction repository, and sets it up to be analyzed. The data was collected using this AO3 python scraper. The corpus consists of The Legend of Korra and Game of Thrones fanfics, from the first one published on AO3 through 2019. Specifically, I am preparing the data to be analyzed using computational temporal analysis methods, which focus on trends over time. Read more about this method in my article "Tracing Fan Uptakes: Tagging, Language, and Ideological Practices in The Legend of Korra Fanfiction," to be published in The Journal of Writing Analytics. The code for this article is published on my GitHub.
This notebook is part of the Critical Fan Toolkit, Cara Marta Messina's public + digital dissertation.
#pandas for working with dataframes
import pandas as pd
#regular expression library
import re
#numpy specifically works with numbers
import numpy as np
#matplot library creates visualizations
import matplotlib.pyplot as plt
%matplotlib inline
#nltk libraries
from nltk import word_tokenize
from nltk.corpus import stopwords
from nltk.stem.porter import *
import string
#for making a string of elements separated by commas into a list
from nltk.tokenize.punkt import PunktSentenceTokenizer, PunktLanguageVars
#has the nice counter feature for counting tags
import collections
from collections import Counter
import warnings
warnings.filterwarnings("ignore")
I have to read in multiple CSVs of the same dataset (the Game of Thrones fanfics published on AO3) because the original CSV was almost 2GB and my Python kernels kept crashing.
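As a side note, pandas can also stream a large CSV in pieces with `read_csv`'s `chunksize` parameter, which is another way to avoid loading ~2GB at once. This is a minimal sketch with a small in-memory stand-in for the large file, not the loading code used below.

```python
import io
import pandas as pd

# a small in-memory "file" standing in for the large GoT CSV
csv_text = "work_id,body\n" + "\n".join(f"{i},text{i}" for i in range(10))

# read the file four rows at a time instead of all at once
chunks = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunks.append(chunk)

merged = pd.concat(chunks, ignore_index=True)
```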
First, I load in my CSVs. Then a function replaces any empty body values with empty strings and appends a space to the end of each body string, so that if I merge all the bodies together in a groupby, they stay separated.
Second, another function uses a regular expression to take the published date information from one column and put part of it into a new column. The published date is structured as 0000-00-00 (YEAR-MONTH-DAY); I only want to keep the year and month, so the new column holds 0000-00 (YEAR-MONTH).
Finally, I'm going to save my new CSVs!
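The year-month extraction described above can be demonstrated on a single toy date string: the pattern captures the YEAR, hyphen, and MONTH groups and drops the day.

```python
import re

# toy example: keep only the year-month part of a YEAR-MONTH-DAY date
published = '2014-12-05'
month = re.sub(r'(\d{4})(\-)(\d{2})(\-)(\d{2})', r'\g<1>\g<2>\g<3>', published)
# month is now '2014-12'
```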
korra = pd.read_csv('./data/allkorra.csv')
got = pd.read_csv('./data/got_data_original/got0.csv')
got1 = pd.read_csv('./data/got_data_original/got1.csv')
got2 = pd.read_csv('./data/got_data_original/got2.csv')
got3 = pd.read_csv('./data/got_data_original/got3.csv')
merged_got = pd.concat([got, got1, got2, got3])
merged_got.count()
I only need to do this with the GoT fanfics because I've already done it with TLoK fanfics.
Replacing NaN values with empty strings prevents future errors from potentially empty cells.
def prepare_columns(df):
    '''
    Description: Takes a dataframe collected from AO3 (so the column names are the same).
    Prepares it to be grouped by:
    -lowercasing all the letters
    -adding separators after particular columns
    -replacing NaN values with empty strings
    Input: A dataframe from AO3 with the same column names
    Output: A similar dataframe, except the words are lower-cased and there are commas or spaces after particular columns
    '''
    #copy the dataframe so accidentally running this cell multiple times doesn't add multiple commas to the original
    newDF = df[['work_id','title','published','rating','character','relationship','additional tags','category','body']].copy()
    #make all the text lowercased. The "applymap" function applies a function to each element in a df.
    newDF = newDF.applymap(lambda s: s.lower() if type(s) == str else s)
    #adding a space after each "body" value so merged bodies stay separated, then replacing NaN with empty strings
    newDF['body'] = newDF['body'] + ' '
    newDF['body'] = newDF['body'].replace(np.nan, '', regex=True)
    #adding commas after the tag-style columns to make sure their values are separated properly, then replacing NaN with empty strings
    for col in ['title', 'rating', 'character', 'relationship', 'additional tags', 'category']:
        newDF[col] = newDF[col] + ', '
        newDF[col] = newDF[col].replace(np.nan, '', regex=True)
    return newDF
tlok_prepped = prepare_columns(korra)
tlok_prepped[:1]
got_prepped = prepare_columns(got)
got1_prepped = prepare_columns(got1)
got2_prepped = prepare_columns(got2)
got3_prepped = prepare_columns(got3)
got_prepped[:2]
Again, I've already done this with the TLoK fanfics, so I only need to do it for the GoT ones.
def add_month_column(df, newcolumn, originalcolumn):
    '''
    Description: Takes a column that uses the style 2000-11-22 (YEAR-MONTH-DAY) and adds a new column with 0000-00 (year + month)
    Input: dataframe with a column structured like 0000-11-22
    Output: same dataframe with a new column containing 0000-11 (year + month in this case)
    '''
    #using a regular expression to keep only the year and month groups in the new "month" column
    df[newcolumn] = df[originalcolumn].replace(r'(\d{4})(\-)(\d{2})(\-)(\d{2})', r'\g<1>\g<2>\g<3>', regex=True)
    return df
tlok_new = add_month_column(tlok_prepped,'month','published')
tlok_new.head(2)
got_new = add_month_column(got_prepped, 'month','published')
got1_new = add_month_column(got1_prepped, 'month','published')
got2_new = add_month_column(got2_prepped, 'month','published')
got3_new = add_month_column(got3_prepped, 'month','published')
#checking that the month column has been added
got_new[:2]
Let's save the new dataframes as CSVs so I can use them! I am saving them with the same names as the original CSVs. This is not a great practice, because you want to keep all stages of your data in case something happens. However, I already have the original data saved on an external drive, so I would prefer not to save the same data over and over on my hard drive.
got_new.to_csv(r'./data/got_data_clean/got0.csv')
got1_new.to_csv(r'./data/got_data_clean/got1.csv')
got2_new.to_csv(r'./data/got_data_clean/got2.csv')
got3_new.to_csv(r'./data/got_data_clean/got3.csv')
While it may seem counterintuitive to read in a bunch of split dataframes and then merge them again, the one large CSV kept crashing my Python kernel. This means I will probably need to keep all the CSVs separate when I load them in, and then merge them each time in my notebook. So far, this approach has worked.
#loading these in so I no longer have to run all the code above to access these
got0New = pd.read_csv('./data/got_data_clean/got0.csv')
got1New = pd.read_csv('./data/got_data_clean/got1.csv')
got2New = pd.read_csv('./data/got_data_clean/got2.csv')
got3New = pd.read_csv('./data/got_data_clean/got3.csv')
merged_got1 = pd.concat([got0New, got1New, got2New, got3New])
got_metadata = merged_got1.drop(['body'], axis=1)
got_metadata.head(1)
got_textual = merged_got1[['work_id','body']]
got_textual.head(4)
#saving the metadata files and body of text files
got_textual.to_csv(r'./data/got_data_clean/got_body.csv')
got_metadata.to_csv(r'./data/got_data_clean/got_metadata.csv')
I wanted to check which months the most fanfics were published in, so I made a quick bar chart. This graphing function can be reused for many datasets and is fairly easy.
def viz_months(df, column, name_of_show):
    '''
    Description: This function takes a dataframe, counts the 10 most frequent values in a column, and then visualizes them. I am specifically using this function for visualizing published dates of fanfictions, but the labels below can be changed.
    Input: the dataframe, the column whose highest values you want to count, and the name of the show
    Output: the top 10 highest months published in that set & a cute graph
    '''
    monthcount = df[column].value_counts().head(10)
    print(monthcount)
    monthcount.plot.bar()
    plt.title('Highest Months of Published Fanfics of ' + name_of_show)
    plt.xlabel('Month and Year')
    plt.ylabel('Number of Fanfics')
viz_months(merged_got1, 'month', 'Game of Thrones')
viz_months(tlok_new, 'month', 'The Legend of Korra')
Next, I will demonstrate how I prepared the data for computational temporal analysis, or tracing trends over time. I use two corpora: The Legend of Korra and Game of Thrones fanfiction published on Archive of Our Own. This notebook works with four columns of textual data: the "relationship" tags column, the "additional tags" column, the "category" column, which provides the gender pairing of particular relationships (such as M/M, F/F, M/F, etc.), and the "body" column, which contains the entire text of each fanfic.
Since I do not need to load in the data again, I will just show the beginning of the data files. Next, I will need to define my functions.
def prepare_columns_for_groupby(df):
    '''
    Takes a dataframe collected from AO3 (so the column names are the same) and prepares it to be grouped by lowercasing all the letters and replacing NaN values in particular columns.
    Input: A dataframe from AO3 with the same column names
    Output: A similar dataframe, except the words are lower-cased and NaN values are replaced with empty strings
    '''
    #copy the dataframe so accidentally running this cell multiple times doesn't alter the original
    newDF = df.copy()
    #make all the text lowercased. The "applymap" function applies a function to each element in a df.
    newDF = newDF.applymap(lambda s: s.lower() if type(s) == str else s)
    newDF = newDF[['published','rating','relationship','additional tags','character','category','month','body']]
    #replace NaN values with empty strings so the string columns can be summed in a groupby
    for col in ['body', 'rating', 'relationship', 'character', 'additional tags', 'category']:
        newDF[col] = newDF[col].replace(np.nan, '', regex=True)
    #make published dates into readable dates
    newDF['published'] = pd.to_datetime(newDF['published'])
    return newDF
def group_by(df):
    '''
    This function will group a dataframe by the 'month' column. This can also be used in a later function to group by particular months.
    Input: a pandas dataframe with a 'month' column
    Output: a new dataframe with the month as the index
    '''
    #first, group the dataframe by month and count. This creates a count of how many rows fall in each month.
    month_count = df.groupby('month').count()
    #in the new dataframe, use a column that has been counted and rename it 'count'
    month_count['count'] = month_count['relationship']
    month_count_new = month_count['count']
    #create another new dataframe that aggregates all the proper columns
    new_group = df.groupby('month').agg({'rating':'sum','additional tags':'sum','category':'sum','character':'sum','relationship':'sum','body':'sum'})
    #join both dataframes to include the count & sort ascending so the earliest fanfics are on top
    join_df = pd.concat([new_group, month_count_new], axis=1)
    join_df = join_df.sort_index(ascending=True)
    return join_df
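The grouping step above relies on a pandas behavior worth making explicit: `agg` with `'sum'` on a string column concatenates the values within each group. A toy illustration with made-up tag values:

```python
import pandas as pd

# made-up rows standing in for fanfic metadata
toy = pd.DataFrame({
    'month': ['2014-06', '2014-06', '2014-07'],
    'relationship': ['korra/asami, ', 'mako/korra, ', 'korra/asami, '],
})

# 'sum' on string columns concatenates, so each month's tags merge into one string
grouped = toy.groupby('month').agg({'relationship': 'sum'})
```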
Before I group the dataframe, I first prepare my data with prepare_columns_for_groupby, then group it by month with group_by.
allkorra_prepped = prepare_columns_for_groupby(tlok_new)
allkorra_prepped.head(2)
allkorra_month = group_by(allkorra_prepped)
allkorra_month.head(5)
#save the new dataframe to be used later (commenting out so I don't resave)
allkorra_month.to_csv(r'./data/group_month/allkorra_months.csv')
#preKorrasami
tlok01 = allkorra_month.loc['2011-02':'2014-05']
#subtextual Korrasami
tlok02 = allkorra_month.loc['2014-06':'2014-11']
#postKorrasami
tlok03 = allkorra_month.loc['2014-12':'2015-07']
Before I group the GoT dataframe, I again prepare the data with prepare_columns_for_groupby and then group it by month with group_by.
I am currently saving the dataframe as one .csv, but then I will use a CSV splitter created by Jordi Rivero. This way, I can upload the split CSVs without killing my kernels.
allgot_prepped = prepare_columns_for_groupby(got_new)
allgot_prepped.head(2)
allgot_months = group_by(allgot_prepped)
#making individual dataframes for each season
#Season 1: Beginning of data to March 2012 (Season 2 airs April 1st, 2012)
gots1 = allgot_months.loc['2007-02':'2012-03']
#Season 2: April 2012-March 2013 (season 3 airs March 31, 2013)
gots2 = allgot_months.loc['2012-04':'2013-03']
#Season 3: April 2013-March 2014 (season 4 airs April 6, 2014)
gots3 = allgot_months.loc['2013-04':'2014-03']
#Season 4: April 2014 to March 2015 (season 5 airs April 12, 2015)
gots4 = allgot_months.loc['2014-04':'2015-03']
#Season 5: April 2015-March 2016 (season 6 airs April 24, 2016)
gots5 = allgot_months.loc['2015-04':'2016-03']
#Season 6: April 2016-June 2017 (season 7 airs July 16, 2017)
gots6 = allgot_months.loc['2016-04':'2017-06']
#Season 7: July 2017-March 2019 (season 8 airs April 14, 2019)
gots7 = allgot_months.loc['2017-07':'2019-03']
#Season 8: April 2019-end of data
gots8 = allgot_months.loc['2019-04':'2019-09']
# allgot_prepped.to_csv(r'./data/group_month/allgot_months.csv')
gots8
# save data
gots1.to_csv(r'./data/group_month/got_s1.csv')
gots2.to_csv(r'./data/group_month/got_s2.csv')
gots3.to_csv(r'./data/group_month/got_s3.csv')
gots4.to_csv(r'./data/group_month/got_s4.csv')
gots5.to_csv(r'./data/group_month/got_s5.csv')
gots6.to_csv(r'./data/group_month/got_s6.csv')
gots7.to_csv(r'./data/group_month/got_s7.csv')
gots8.to_csv(r'./data/group_month/got_s8.csv')
def column_to_list(df, columnName):
    '''
    This function takes all the values from a specific column and joins them into one string.
    input: the name of the dataframe and the column name
    output: a single string containing every value in that column, separated by spaces
    '''
    df[columnName] = df[columnName].replace(np.nan, '', regex=True)
    #renamed from "string" so it doesn't shadow the imported string module
    joined = ' '.join(df[columnName].tolist())
    return joined
def MetadataToDF(df, columnName, NewSeasonColumnName):
    '''
    Input: the dataframe you will work with, the column to count, and the name of the new count column
    Output: a new dataframe with the metadata values and their counts in the newly named column
    '''
    #replace empty values & join all the values in the column into one string
    joined = column_to_list(df, columnName)
    #subclass PunktLanguageVars so the tokenizer treats the comma as the "sentence" boundary
    class CommaPoint(PunktLanguageVars):
        sent_end_chars = (',',)
    tokenizer = PunktSentenceTokenizer(lang_vars = CommaPoint())
    #tokenizing the string on the COMMA, not the white space (as set up in CommaPoint above)
    ListOfTags = tokenizer.tokenize(joined)
    #the "Counter" class is from the collections library
    allCounter = collections.Counter(ListOfTags)
    most = allCounter.most_common()
    newDF = pd.DataFrame(most, columns = [columnName, NewSeasonColumnName])
    return newDF
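The comma-splitting-and-counting at the heart of this function can be sketched more simply with `str.split` and `Counter` (the tag string below is made up):

```python
from collections import Counter

# toy comma-separated tag string, like the joined "additional tags" column
tag_string = 'fluff, angst, fluff, slow burn, fluff, angst'

# split on commas, strip whitespace, and count each tag's occurrences
tags = [t.strip() for t in tag_string.split(',')]
counts = Counter(tags).most_common()
# counts == [('fluff', 3), ('angst', 2), ('slow burn', 1)]
```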
tlokALLcharacter = MetadataToDF(allkorra_month, 'character','allCount')
tlok01character = MetadataToDF(tlok01, 'character','01Count')
tlok02character = MetadataToDF(tlok02, 'character','02Count')
tlok03character = MetadataToDF(tlok03, 'character','03Count')
tlokALLcharacter[:5]
tlokCharMerge01 = pd.merge(tlok01character, tlok02character, on='character', how='outer')
tlokCharMerge02 = pd.merge(tlokCharMerge01, tlok03character, on='character', how='outer')
tlokCharMerge = pd.merge(tlokCharMerge02, tlokALLcharacter, on='character', how='outer')
tlokCharMerge[:5]
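One thing to watch with these outer merges: a tag that appears in one era but not another comes through as NaN in that era's count column. A hedged sketch with made-up counts, showing how `fillna(0)` makes the columns directly comparable (this step is optional and not applied above):

```python
import pandas as pd

# made-up per-era character counts
era1 = pd.DataFrame({'character': ['korra', 'asami'], '01Count': [5, 3]})
era2 = pd.DataFrame({'character': ['korra', 'mako'], '02Count': [4, 2]})

# an outer merge keeps every character; fillna(0) turns the gaps into zeros
merged = pd.merge(era1, era2, on='character', how='outer').fillna(0)
```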
tlokALLrel = MetadataToDF(allkorra_month, 'relationship','allCount')
tlok01rel = MetadataToDF(tlok01, 'relationship','01Count')
tlok02rel = MetadataToDF(tlok02, 'relationship','02Count')
tlok03rel = MetadataToDF(tlok03, 'relationship','03Count')
tlokALLrel[:5]
tlokRelMerge01 = pd.merge(tlok01rel, tlok02rel, on='relationship', how='outer')
tlokRelMerge02 = pd.merge(tlokRelMerge01, tlok03rel, on='relationship', how='outer')
tlokRelMerge = pd.merge(tlokRelMerge02, tlokALLrel, on='relationship', how='outer')
tlokRelMerge[:5]
tlokALLcat = MetadataToDF(allkorra_month, 'category','allCount')
tlok01cat = MetadataToDF(tlok01, 'category','01Count')
tlok02cat = MetadataToDF(tlok02, 'category','02Count')
tlok03cat = MetadataToDF(tlok03, 'category','03Count')
tlokALLcat
tlokCatMerge01 = pd.merge(tlok01cat, tlok02cat, on='category', how='outer')
tlokCatMerge02 = pd.merge(tlokCatMerge01, tlok03cat, on='category', how='outer')
tlokCatMerge = pd.merge(tlokCatMerge02, tlokALLcat, on='category', how='outer')
tlokCatMerge[:5]
tlokALLtags = MetadataToDF(allkorra_month, 'additional tags','allCount')
tlok01tags = MetadataToDF(tlok01, 'additional tags','01Count')
tlok02tags = MetadataToDF(tlok02, 'additional tags','02Count')
tlok03tags = MetadataToDF(tlok03, 'additional tags','03Count')
tlokALLtags[:5]
tlokTagsMerge01 = pd.merge(tlok01tags, tlok02tags, on='additional tags', how='outer')
tlokTagsMerge02 = pd.merge(tlokTagsMerge01, tlok03tags, on='additional tags', how='outer')
tlokTagsMerge = pd.merge(tlokTagsMerge02, tlokALLtags, on='additional tags', how='outer')
tlokTagsMerge[:5]
tlokALLrating = MetadataToDF(allkorra_month, 'rating','allCount')
tlok01rating = MetadataToDF(tlok01, 'rating','01Count')
tlok02rating = MetadataToDF(tlok02, 'rating','02Count')
tlok03rating = MetadataToDF(tlok03, 'rating','03Count')
tlokALLrating
tlokRatingMerge01 = pd.merge(tlok01rating, tlok02rating, on='rating', how='outer')
tlokRatingMerge02 = pd.merge(tlokRatingMerge01, tlok03rating, on='rating', how='outer')
tlokRatingMerge = pd.merge(tlokRatingMerge02, tlokALLrating, on='rating', how='outer')
tlokRatingMerge[:5]
tlokCharMerge.to_csv(r'./data/metadata/TLoK/tlok_metadata_character.csv')
tlokRelMerge.to_csv(r'./data/metadata/TLoK/tlok_metadata_relationship.csv')
tlokCatMerge.to_csv(r'./data/metadata/TLoK/tlok_metadata_categories.csv')
tlokTagsMerge.to_csv(r'./data/metadata/TLoK/tlok_metadata_tags.csv')
tlokRatingMerge.to_csv(r'./data/metadata/TLoK/tlok_metadata_rating.csv')
gots0character = MetadataToDF(allgot_months, 'character','AllCount')
gots1character = MetadataToDF(gots1, 'character','S1Count')
gots2character = MetadataToDF(gots2, 'character','S2Count')
gots3character = MetadataToDF(gots3, 'character','S3Count')
gots4character = MetadataToDF(gots4, 'character','S4Count')
gots5character = MetadataToDF(gots5, 'character','S5Count')
gots6character = MetadataToDF(gots6, 'character','S6Count')
gots7character = MetadataToDF(gots7, 'character','S7Count')
gots8character = MetadataToDF(gots8, 'character','S8Count')
gots0character[:5]
gotMerge1Chara = pd.merge(gots1character, gots2character, on='character', how='outer')
gotMerge2Chara = pd.merge(gotMerge1Chara, gots3character, on='character', how='outer')
gotMerge3Chara = pd.merge(gotMerge2Chara, gots4character, on='character', how='outer')
gotMerge4Chara = pd.merge(gotMerge3Chara, gots5character, on='character', how='outer')
gotMerge5Chara = pd.merge(gotMerge4Chara, gots6character, on='character', how='outer')
gotMerge6Chara = pd.merge(gotMerge5Chara, gots7character, on='character', how='outer')
gotMerge7Chara = pd.merge(gotMerge6Chara, gots8character, on='character', how='outer')
gotMergeChara = pd.merge(gotMerge7Chara, gots0character, on='character', how='outer')
gotMergeChara[:5]
gots0rel = MetadataToDF(allgot_months, 'relationship','AllCount')
gots1rel = MetadataToDF(gots1, 'relationship','S1Count')
gots2rel = MetadataToDF(gots2, 'relationship','S2Count')
gots3rel = MetadataToDF(gots3, 'relationship','S3Count')
gots4rel = MetadataToDF(gots4, 'relationship','S4Count')
gots5rel = MetadataToDF(gots5, 'relationship','S5Count')
gots6rel = MetadataToDF(gots6, 'relationship','S6Count')
gots7rel = MetadataToDF(gots7, 'relationship','S7Count')
gots8rel = MetadataToDF(gots8, 'relationship','S8Count')
gots0rel[:5]
gotMerge1rel = pd.merge(gots1rel, gots2rel, on='relationship', how='outer')
gotMerge2rel = pd.merge(gotMerge1rel, gots3rel, on='relationship', how='outer')
gotMerge3rel = pd.merge(gotMerge2rel, gots4rel, on='relationship', how='outer')
gotMerge4rel = pd.merge(gotMerge3rel, gots5rel, on='relationship', how='outer')
gotMerge5rel = pd.merge(gotMerge4rel, gots6rel, on='relationship', how='outer')
gotMerge6rel = pd.merge(gotMerge5rel, gots7rel, on='relationship', how='outer')
gotMerge7Rel = pd.merge(gotMerge6rel, gots8rel, on='relationship', how='outer')
gotMergeRel = pd.merge(gotMerge7Rel, gots0rel, on='relationship', how='outer')
gotMergeRel[:5]
gots0cat = MetadataToDF(allgot_months, 'category','AllCount')
gots1cat = MetadataToDF(gots1, 'category','S1Count')
gots2cat = MetadataToDF(gots2, 'category','S2Count')
gots3cat = MetadataToDF(gots3, 'category','S3Count')
gots4cat = MetadataToDF(gots4, 'category','S4Count')
gots5cat = MetadataToDF(gots5, 'category','S5Count')
gots6cat = MetadataToDF(gots6, 'category','S6Count')
gots7cat = MetadataToDF(gots7, 'category','S7Count')
gots8cat = MetadataToDF(gots8, 'category','S8Count')
gots0cat
gotMerge1cat = pd.merge(gots1cat, gots2cat, on='category', how='outer')
gotMerge2cat = pd.merge(gotMerge1cat, gots3cat, on='category', how='outer')
gotMerge3cat = pd.merge(gotMerge2cat, gots4cat, on='category', how='outer')
gotMerge4cat = pd.merge(gotMerge3cat, gots5cat, on='category', how='outer')
gotMerge5cat = pd.merge(gotMerge4cat, gots6cat, on='category', how='outer')
gotMerge6cat = pd.merge(gotMerge5cat, gots7cat, on='category', how='outer')
gotMerge7Cat = pd.merge(gotMerge6cat, gots8cat, on='category', how='outer')
gotMergeCat = pd.merge(gotMerge7Cat, gots0cat, on='category', how='outer')
gotMergeCat[:5]
gots0tags = MetadataToDF(allgot_months, 'additional tags','AllCount')
gots1tags = MetadataToDF(gots1, 'additional tags','S1Count')
gots2tags = MetadataToDF(gots2, 'additional tags','S2Count')
gots3tags = MetadataToDF(gots3, 'additional tags','S3Count')
gots4tags = MetadataToDF(gots4, 'additional tags','S4Count')
gots5tags = MetadataToDF(gots5, 'additional tags','S5Count')
gots6tags = MetadataToDF(gots6, 'additional tags','S6Count')
gots7tags = MetadataToDF(gots7, 'additional tags','S7Count')
gots8tags = MetadataToDF(gots8, 'additional tags','S8Count')
gots0tags[:5]
gotMerge1tags = pd.merge(gots1tags, gots2tags, on='additional tags', how='outer')
gotMerge2tags = pd.merge(gotMerge1tags, gots3tags, on='additional tags', how='outer')
gotMerge3tags = pd.merge(gotMerge2tags, gots4tags, on='additional tags', how='outer')
gotMerge4tags = pd.merge(gotMerge3tags, gots5tags, on='additional tags', how='outer')
gotMerge5tags = pd.merge(gotMerge4tags, gots6tags, on='additional tags', how='outer')
gotMerge6tags = pd.merge(gotMerge5tags, gots7tags, on='additional tags', how='outer')
gotMerge7Tags = pd.merge(gotMerge6tags, gots8tags, on='additional tags', how='outer')
gotMergeTags = pd.merge(gotMerge7Tags, gots0tags, on='additional tags', how='outer')
gotMergeTags[:5]
gots0rating = MetadataToDF(allgot_months, 'rating','AllCount')
gots1rating = MetadataToDF(gots1, 'rating','S1Count')
gots2rating = MetadataToDF(gots2, 'rating','S2Count')
gots3rating = MetadataToDF(gots3, 'rating','S3Count')
gots4rating = MetadataToDF(gots4, 'rating','S4Count')
gots5rating = MetadataToDF(gots5, 'rating','S5Count')
gots6rating = MetadataToDF(gots6, 'rating','S6Count')
gots7rating = MetadataToDF(gots7, 'rating','S7Count')
gots8rating = MetadataToDF(gots8, 'rating','S8Count')
gots0rating
gotMerge1rat = pd.merge(gots1rating, gots2rating, on='rating', how='outer')
gotMerge2rat = pd.merge(gotMerge1rat, gots3rating, on='rating', how='outer')
gotMerge3rat = pd.merge(gotMerge2rat, gots4rating, on='rating', how='outer')
gotMerge4rat = pd.merge(gotMerge3rat, gots5rating, on='rating', how='outer')
gotMerge5rat = pd.merge(gotMerge4rat, gots6rating, on='rating', how='outer')
gotMerge6rat = pd.merge(gotMerge5rat, gots7rating, on='rating', how='outer')
gotMerge7Rat = pd.merge(gotMerge6rat, gots8rating, on='rating', how='outer')
gotMergeRat = pd.merge(gotMerge7Rat, gots0rating, on='rating', how='outer')
gotMergeRat[:5]
gotMergeChara.to_csv(r'./data/metadata/GoT/got_metadata_character.csv')
gotMergeRel.to_csv(r'./data/metadata/GoT/got_metadata_relationship.csv')
gotMergeCat.to_csv(r'./data/metadata/GoT/got_metadata_categories.csv')
gotMergeTags.to_csv(r'./data/metadata/GoT/got_metadata_tags.csv')
gotMergeRat.to_csv(r'./data/metadata/GoT/got_metadata_rating.csv')