Parsing XML

This computational notebook can be used to parse XML files, particularly to pull out attribute values and see how often particular attribute values are used together. I used this XML parser for the Critical Fan Toolkit interviews, which I qualitatively coded using an XML schema I created in RelaxNG.

Information about the interviews and about the qualitative coding process is available in the interview portion of the Critical Fan Toolkit website.

This computational notebook was created by Cara Marta Messina and William Reed Quinn. Cara wrote the documentation, while Bill helped with transforming the attribute values dictionary into a dataframe. This notebook is part of Cara's documentation process for her dissertation.

In [1]:
import xml.etree.ElementTree as ET
import os

import pandas as pd
import numpy as np

import re

import plotly.graph_objs as go

import plotly.express as px
import plotly

Reading in the XML files

First, I created two functions. The first is a function by ponayz that I found on Stack Overflow; it lists the XML files in a folder. The second is a basic function that reads an XML file and grabs all the coded content, beginning with the root element. For the interviews I coded, the root element is "interview." Then I read in the XML data for all six interviews; there is probably a more automatic way to do this, but because I only had six, I simply called get_root_data on each file path.

In [2]:
path = '../critical-fan-toolkit-XML/interviews-encoded/v2/'


def get_xml_files(path):
    '''
    Describe: This function retrieves a list of all the xml files in a particular folder.
    Input: A path to a folder with XML files
    Output: A list of all the XML files in that folder. 
    '''
    xml_list = []
    for filename in os.listdir(path):
        if filename.endswith(".xml"):
            xml_list.append(os.path.join(path, filename))
    return xml_list

def get_root_data(path):
    '''
    Describe: This function retrieves the root element and all subsequent children nodes within an XML file. There is probably an easier way to do this with a for loop, but I was failing miserably. 
    Input: An XML file path.
    Output: The entire XML document from the root element that can now be parsed
    '''
    tree = ET.parse(path)
    root = tree.getroot()
    return root
In [3]:
xml_list = get_xml_files(path)
xml_list
Out[3]:
['../critical-fan-toolkit-XML/interviews-encoded/v2/aria-interview-transcription.xml',
 '../critical-fan-toolkit-XML/interviews-encoded/v2/dialux-interview-transcription.xml',
 '../critical-fan-toolkit-XML/interviews-encoded/v2/gillywulf-interview-transcription.xml',
 '../critical-fan-toolkit-XML/interviews-encoded/v2/kittya-interview-transcription.xml',
 '../critical-fan-toolkit-XML/interviews-encoded/v2/valk-interview-transcription.xml',
 '../critical-fan-toolkit-XML/interviews-encoded/v2/writegirl-interview-transcription.xml']
In [4]:
aria = get_root_data('../critical-fan-toolkit-XML/interviews-encoded/v2/aria-interview-transcription.xml')
dia = get_root_data('../critical-fan-toolkit-XML/interviews-encoded/v2/dialux-interview-transcription.xml')
gill = get_root_data('../critical-fan-toolkit-XML/interviews-encoded/v2/gillywulf-interview-transcription.xml')
kitt = get_root_data('../critical-fan-toolkit-XML/interviews-encoded/v2/kittya-interview-transcription.xml')
valk = get_root_data('../critical-fan-toolkit-XML/interviews-encoded/v2/valk-interview-transcription.xml')
wg = get_root_data('../critical-fan-toolkit-XML/interviews-encoded/v2/writegirl-interview-transcription.xml')
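
The six calls above could also be written as a loop. The following is a minimal sketch rather than the code used for the dissertation: load_roots is a hypothetical helper name, and it assumes every ".xml" file in the folder is an interview that should be loaded.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def load_roots(folder):
    """Parse every .xml file in `folder` and return {file name without .xml: root element}."""
    roots = {}
    # sorted() keeps the order reproducible, since directory listings are not ordered
    for xml_path in sorted(Path(folder).glob("*.xml")):
        roots[xml_path.stem] = ET.parse(str(xml_path)).getroot()
    return roots
```

With this helper, load_roots(path) would return one dictionary, and an individual interview could be looked up by its file name, for example roots['aria-interview-transcription'].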

Creating Attribute Values Dictionary

In this next section, I created a list of dictionaries with all the attribute values used in each "code" element. The way my XML is set up, the only time I mark up text explicitly is with the "code" element or the "power-identity" element. However, for this portion, I only focus on the "code" element.

Within the "code" element, there are several attributes, including "fan-community," "writing-agency," and "rgs-genre." Each of these attributes takes a set of possible values. To read my codebook, visit the "Qualitative Coding" webpage on my dissertation.

For each "code" across all six interviews, I created a dictionary with the attribute as the key and the attribute value as the value.

In [5]:
attribute_values = []

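def append_to_dict(dictionary, rootXML):
    '''
    Describe: This function collects the attributes of every "code" element (and of its children) in one interview.
    Input: The list to add to and the root element of an interview.
    Output: The same list, with one dictionary added per "code" element.
    '''
    for item in rootXML.findall('./transcription/dialogue/code'):    # find every "code" element
        # copy the attributes of the "code" element itself
        code_list = dict(item.attrib)
        # iterate over the element directly: getchildren() is deprecated and was removed in Python 3.9
        for child in item:
            # fold each child's attributes into the same dictionary
            code_list.update(child.attrib)
        dictionary.append(code_list)
    return dictionary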
In [6]:
#adding every interview to the dictionary to make a list of dictionaries of ALL the attribute values
append_to_dict(attribute_values, aria)
append_to_dict(attribute_values, dia)
append_to_dict(attribute_values, gill)
append_to_dict(attribute_values, kitt)
append_to_dict(attribute_values, valk)
append_to_dict(attribute_values, wg)

for key in attribute_values:
    del key['id']

#a list of dictionaries
attribute_values[:20]
C:\Users\caram\Anaconda3\lib\site-packages\ipykernel_launcher.py:7: DeprecationWarning: This method will be removed in future versions.  Use 'list(elem)' or iteration over elem instead.
  import sys
Out[6]:
[{'writing-agency': 'motivation',
  'canon': 'canon-critique',
  'describe': 'LGBTQplus heteronormativity'},
 {'fan-community': 'fan-politics', 'rgs-genre': 'identity-bending'},
 {'fan-community': 'fan-politics'},
 {'important': 'important-quote', 'fan-community': 'fan-politics'},
 {'fan-community': 'fan-politics', 'rgs-uptake': 'fan-practices-uptake'},
 {'rgs-uptake': 'critical-uptake',
  'writing-agency': 'motivation',
  'fan-community': 'fan-politics',
  'important': 'important-quote'},
 {'writing-agency': 'reflection', 'describe': 'disability'},
 {'fan-community': 'fan-politics',
  'rgs-uptake': 'critical-uptake',
  'describe': 'other'},
 {'important': 'important-quote',
  'rgs-uptake': 'critical-uptake canon-resistant',
  'writing-agency': 'motivation',
  'canon': 'canon-critique',
  'describe': 'disability abelism'},
 {'writing-agency': 'motivation', 'rgs-uptake': 'critical-uptake'},
 {'writing-agency': 'reflection', 'rgs-genre': 'genre-other'},
 {'writing-agency': 'reflection', 'describe': 'other disability'},
 {'writing-agency': 'motivation',
  'canon': 'canon-critique canon-relation',
  'describe': 'disability'},
 {'writing-agency': 'reflection',
  'rgs-genre': 'genre-other',
  'describe': 'disability LGBTQplus'},
 {'writing-agency': 'research revising', 'describe': 'disability other'},
 {'writing-agency': 'reception'},
 {'rgs-genre': 'fluff angst'},
 {'rgs-genre': 'authors-note',
  'writing-agency': 'audience',
  'describe': 'homophobia-transphobia'},
 {'writing-agency': 'motivation', 'canon': 'canon-compliment'},
 {'writing-agency': 'motivation',
  'canon': 'canon-relation',
  'rgs-uptake': 'canon-compliant',
  'describe': 'LGBTQplus'}]
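
To make the step above easier to follow, here is a small self-contained sketch that runs the same idea on an invented two-code sample. The sample XML, the placement of "describe" on a child element, and the function name collect_code_attributes are all assumptions made for illustration; they are not taken from the interview files.

```python
import xml.etree.ElementTree as ET

# Invented sample that only imitates the interview/transcription/dialogue/code nesting
SAMPLE = """
<interview>
  <transcription>
    <dialogue>
      <code id="c1" writing-agency="motivation" canon="canon-critique">
        <power-identity describe="LGBTQplus heteronormativity">text</power-identity>
      </code>
      <code id="c2" fan-community="fan-politics">text</code>
    </dialogue>
  </transcription>
</interview>
"""


def collect_code_attributes(root, drop=("id",)):
    """Return one dictionary of attributes per "code" element, without the names in `drop`."""
    collected = []
    for code in root.findall("./transcription/dialogue/code"):
        merged = dict(code.attrib)        # copy, so the parsed tree is left untouched
        for child in code:                # direct children of the "code" element
            merged.update(child.attrib)
        for name in drop:
            merged.pop(name, None)        # no error if a code has no such attribute
        collected.append(merged)
    return collected


sample_values = collect_code_attributes(ET.fromstring(SAMPLE.strip()))
```

Dropping "id" inside the function with pop(name, None) avoids a KeyError if some "code" element happens to have no id attribute.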

Splitting by Whitespace

In my XML code, a few attributes have more than one attribute value. For example, a single "rgs-genre" attribute can hold both "angst" and "fluff," written as "angst fluff" with the values separated by whitespace.

This next code cell simply goes through each of the values in the dictionaries and turns it into a list. This way, an attribute with two attribute values becomes a list of two items (['angst', 'fluff']) instead of one string ('angst fluff') that Python would treat as a single value.

In [7]:
copy_att = attribute_values.copy()

new_list = []

for i in copy_att:
    new = {key: value.split(" ") for key, value in i.items()}
    new_list.append(new)

#a list of dictionaries, and each dictionary's value is a list. PHEW
new_list[:2]
Out[7]:
[{'writing-agency': ['motivation'],
  'canon': ['canon-critique'],
  'describe': ['LGBTQplus', 'heteronormativity']},
 {'fan-community': ['fan-politics'], 'rgs-genre': ['identity-bending']}]
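
One detail worth knowing about this step: value.split(" ") splits on every single space, so a doubled or leading space produces empty strings, while value.split() with no argument splits on any run of whitespace. The sketch below shows the difference on invented strings; split_values is a hypothetical helper, not part of the notebook.

```python
def split_values(attributes):
    """Turn each whitespace-separated attribute string into a list of values."""
    # split() with no argument ignores leading, trailing, and repeated whitespace
    return {name: value.split() for name, value in attributes.items()}


# Invented example with a doubled space and a leading space
messy = {"rgs-uptake": "critical-uptake  canon-resistant", "rgs-genre": " fluff angst"}

with_single_space = {name: value.split(" ") for name, value in messy.items()}
with_any_whitespace = split_values(messy)
```

Here with_single_space contains empty strings that would later show up as an extra, unnamed column, while with_any_whitespace contains only the real values.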

Creating Dataframes

This next portion creates a dataframe and then uses a for loop to go through the newly created list from above. The loop goes through each item in the list (that is, each dictionary of attributes and attribute values), sets aside the keys, and takes only the values.

So {'rgs-genre': ['fluff', 'angst']} becomes just ['fluff', 'angst'] and gets added to an empty list. That list and an id for each "code" are added to the dataframe. This way, the dataframe can maintain the integrity of each "code."

In [8]:
#collect one row per "code"; the rows become a dataframe after the loop
rows = []
#create an id for each "code"
n = 0

for i in new_list:
    #create empty list
    empty_list = []

    #add new id for each new code (the +1 increases the number)
    n = n+1

    #for loop for dictionary items: k is key and v is values
    for k,v in i.items():

        #append onto the new list
        empty_list = empty_list + v

    #store each new list and the id for that list as one row
    rows.append({'id':n, "values":empty_list})

#build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
df = pd.DataFrame(rows, columns=['id','values'])

Cleaning Up and Transforming

The "explode" function takes a dataframe column whose cells hold lists and gives each list item its own row, which makes the dataframe longer. Each new row keeps the index and the id of the row it came from, so the attribute values ["fluff","angst"] now each have their own row along with their original code's unique identifier.

Then I will add a "count" column and "pivot" the dataframe to turn it into a matrix.

The last cell in this section shows the final dataframe, which is shaped like a matrix. There is a unique ID (the index) for each of the original codes, and each column records how many times an attribute value appears in that code. As the dataframe shows, most codes only have one or two attribute values, so there are mostly zeros.

In [9]:
#explode gives each item of a list in the "values" column its own row; the id is repeated on each new row
df = df.explode('values')
df
Out[9]:
id values
0 1 motivation
0 1 canon-critique
0 1 LGBTQplus
0 1 heteronormativity
1 2 fan-politics
... ... ...
422 423 fan-politics
422 423 modern-setting
422 423 fan-practices-uptake
423 424 fan-politics
423 424 other

911 rows × 2 columns

In [10]:
#adding a "count" column
df["count"] = df.groupby(["id","values"])["values"].transform("count")
In [11]:
df
Out[11]:
id values count
0 1 motivation 1
0 1 canon-critique 1
0 1 LGBTQplus 1
0 1 heteronormativity 1
1 2 fan-politics 1
... ... ... ...
422 423 fan-politics 1
422 423 modern-setting 1
422 423 fan-practices-uptake 1
423 424 fan-politics 1
423 424 other 1

911 rows × 3 columns

In [12]:
df = df.pivot(index="id", columns="values", values="count").fillna(0)
In [13]:
#dropping the "important-quote" column so that it is left out of the matrices below
df = df.drop(columns=["important-quote"])
In [14]:
df
Out[14]:
values LGBTQplus abelism angst antiracism audience authors-note canon-compliant canon-compliment canon-critique canon-relation ... modern-setting motivation other racism reception reflection research revising sexism vignette
id
1 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 ... 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
420 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 ... 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
421 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
422 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
423 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
424 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

424 rows × 36 columns
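
The explode, count, and pivot steps can be tried on a toy example. The sketch below uses two invented codes and, as a swapped-in shortcut, pd.crosstab in place of the separate count and pivot cells; it is an illustration under those assumptions, not the notebook's own pipeline.

```python
import pandas as pd

# Two invented codes in the same shape as new_list (attribute -> list of values)
toy_codes = [
    {"rgs-genre": ["fluff", "angst"]},
    {"writing-agency": ["motivation"], "rgs-genre": ["fluff"]},
]

# One row per code: an id plus the flattened list of all its values
rows = [
    {"id": n, "values": [v for values in code.values() for v in values]}
    for n, code in enumerate(toy_codes, start=1)
]

# explode: one row per (id, value) pair
long_df = pd.DataFrame(rows).explode("values")

# crosstab counts how often each value occurs for each id and fills gaps with 0
matrix = pd.crosstab(long_df["id"], long_df["values"])
```

The result has one row per code and one column per attribute value, with integer counts instead of the floats that fillna(0) produces after a pivot.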

Co-Occurrence and Correlation Matrixes

Now that we have this large dataframe that has the 424 unique code elements and the number of times each attribute value appeared in that element, we can create matrixes and better capture how often particular attribute values appeared together!

Specifically, the adjacency matrix (df.T.dot(df)) shows a raw count of how often particular attribute values were used together. This will be especially useful when talking about the relationship between writing, motivation, uptake, genre, and how fans interpret the canonical text.
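
As a minimal sketch of what df.T.dot(df) produces, the example below builds the adjacency (co-occurrence) matrix for an invented three-code dataframe. The toy data and the variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Invented code-by-value matrix: rows are codes, columns are attribute values
toy = pd.DataFrame(
    {"fluff": [1, 1, 0], "angst": [1, 0, 0], "motivation": [0, 1, 1]},
    index=pd.Index([1, 2, 3], name="id"),
)

# Entry (a, b) is the number of codes in which values a and b both appear
cooc = toy.T.dot(toy)

# The diagonal holds each value's total count, so zero it to keep only true pairs
is_diagonal = np.eye(len(cooc), dtype=bool)
pairs_only = cooc.where(~is_diagonal, 0)
```

In this toy case "fluff" and "angst" share one code, "angst" and "motivation" share none, and the diagonal of cooc shows that "fluff" appears in two codes overall.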

In [15]:
#correlation matrix (pandas computes Pearson correlation by default)
corr = df.corr()
corr
Out[15]:
values LGBTQplus abelism angst antiracism audience authors-note canon-compliant canon-compliment canon-critique canon-relation ... modern-setting motivation other racism reception reflection research revising sexism vignette
values
LGBTQplus 1.000000 -0.032359 -0.042961 -0.045982 -0.068834 -0.042961 0.028181 0.035259 -0.039765 0.125968 ... 0.038625 0.089026 -0.094738 -0.056589 -0.079412 0.078040 -0.017090 -0.036222 -0.048830 -0.051534
abelism -0.032359 1.000000 -0.012644 -0.013533 -0.028815 -0.012644 -0.027409 -0.026443 0.150853 -0.020548 ... -0.016655 0.029512 -0.027883 -0.016655 -0.023372 0.026193 0.118505 -0.010661 -0.014372 -0.015167
angst -0.042961 -0.012644 1.000000 -0.017967 -0.038255 -0.016787 -0.036389 -0.035106 -0.038255 -0.027281 ... -0.022112 -0.053112 0.103122 -0.022112 -0.031029 -0.055130 -0.023942 -0.014153 -0.019080 -0.020136
antiracism -0.045982 -0.013533 -0.017967 1.000000 0.022881 -0.017967 -0.038948 0.031097 0.022881 -0.029199 ... -0.023667 0.041937 -0.039621 -0.023667 -0.033212 0.037220 0.071385 -0.015149 -0.020422 -0.021553
audience -0.068834 -0.028815 -0.038255 0.022881 1.000000 0.029897 -0.049567 -0.080003 -0.087179 -0.062170 ... -0.050391 -0.096293 -0.084360 -0.050391 -0.032372 -0.077429 -0.054561 -0.032254 -0.043481 0.068563
authors-note -0.042961 -0.012644 -0.016787 -0.017967 0.029897 1.000000 -0.036389 -0.035106 -0.038255 -0.027281 ... -0.022112 -0.000373 -0.037018 -0.022112 -0.031029 0.047618 -0.023942 -0.014153 -0.019080 -0.020136
canon-compliant 0.028181 -0.027409 -0.036389 -0.038948 -0.049567 -0.036389 1.000000 0.031574 -0.082926 0.075670 ... 0.006700 0.013942 -0.080245 0.006700 -0.027264 -0.069213 -0.001196 -0.030681 -0.041360 0.016052
canon-compliment 0.035259 -0.026443 -0.035106 0.031097 -0.080003 -0.035106 0.031574 1.000000 -0.080003 -0.057052 ... -0.046243 -0.031206 -0.042045 -0.046243 -0.064892 -0.115295 -0.050069 -0.029599 0.089742 -0.042111
canon-critique -0.039765 0.150853 -0.038255 0.022881 -0.087179 -0.038255 -0.082926 -0.080003 1.000000 -0.019098 ... -0.050391 0.126406 -0.018609 0.106709 -0.070713 -0.125637 -0.054561 -0.032254 0.137265 -0.045889
canon-relation 0.125968 -0.020548 -0.027281 -0.029199 -0.062170 -0.027281 0.075670 -0.057052 -0.019098 1.000000 ... -0.035935 0.113671 -0.060160 0.105143 -0.050427 -0.089595 -0.038909 -0.023001 -0.031008 -0.032725
canon-resistant 0.080819 0.096788 -0.028063 -0.030036 0.019997 -0.028063 -0.060832 -0.058688 0.061972 -0.045606 ... 0.031777 0.073621 -0.018728 -0.036965 -0.051873 0.002761 -0.040024 -0.023661 -0.031897 0.041458
class -0.027991 -0.008238 -0.010937 -0.011706 -0.024925 -0.010937 0.084382 -0.022873 -0.024925 -0.017774 ... -0.014407 0.045572 0.082405 0.155271 -0.020217 0.042182 -0.015599 -0.009221 -0.012431 -0.013120
critical-uptake 0.006519 0.054793 -0.040652 0.259919 -0.062239 -0.040652 -0.088122 -0.085016 0.150578 -0.025112 ... -0.053548 0.130174 0.191677 -0.003758 -0.075143 -0.018918 -0.057979 -0.034275 -0.046206 -0.048764
cultural-difference 0.017549 -0.024428 -0.032431 -0.034712 -0.037041 0.046145 0.045084 0.011507 -0.037041 0.245257 ... -0.042719 -0.074082 -0.071518 0.078034 -0.059948 0.115818 -0.046255 -0.027344 -0.036862 -0.038903
disability -0.030661 0.221466 -0.027281 -0.029199 -0.062170 -0.027281 0.030734 -0.010711 0.023974 0.129721 ... -0.035935 0.113671 0.072693 -0.035935 -0.050427 0.105216 0.157487 0.085360 -0.031008 -0.032725
drafting -0.051534 -0.015167 -0.020136 -0.021553 -0.045889 0.101832 -0.043650 -0.042111 -0.045889 -0.032725 ... -0.026524 -0.063711 -0.044405 -0.026524 -0.037221 -0.066132 -0.028719 -0.016978 -0.022887 0.078261
fan-politics 0.085828 -0.058825 -0.036235 -0.005178 0.116645 -0.078098 -0.066838 -0.142195 -0.099412 -0.074006 ... -0.006373 -0.171101 0.029720 0.090128 -0.050157 -0.226877 -0.081533 -0.065847 -0.014751 -0.093683
fan-practices-uptake -0.046579 -0.030175 -0.040061 0.018542 0.124099 -0.040061 -0.054740 -0.050675 -0.091296 -0.023657 ... -0.002377 0.016118 -0.088344 -0.002377 0.036634 -0.131569 -0.010368 -0.033777 -0.045535 0.007014
feminism -0.041975 -0.022830 -0.030310 0.045713 -0.069073 -0.030310 -0.065703 0.315742 -0.029919 0.003483 ... -0.039925 0.055595 0.013672 -0.039925 -0.056026 0.018516 -0.043229 -0.025555 0.039322 -0.036358
fix-it -0.032359 -0.009524 -0.012644 -0.013533 -0.028815 -0.012644 -0.027409 -0.026443 0.061019 -0.020548 ... -0.016655 0.099029 -0.027883 -0.016655 0.084343 -0.041525 -0.018033 -0.010661 -0.014372 -0.015167
fluff 0.115488 -0.014372 0.237743 -0.020422 -0.043481 -0.019080 -0.041360 -0.039902 -0.043481 -0.031008 ... -0.025133 0.032878 -0.042075 -0.025133 -0.035269 -0.017246 -0.027213 -0.016087 -0.021687 -0.022887
genre-other -0.006380 -0.022830 0.053140 -0.032441 0.009234 -0.030310 0.015992 -0.063387 -0.069073 -0.049257 ... -0.039925 -0.065599 -0.026584 -0.039925 -0.009079 0.018516 -0.043229 -0.025555 -0.034450 -0.036358
heteronormativity -0.012892 -0.023905 -0.031736 -0.033968 -0.034744 -0.031736 -0.029589 -0.025938 0.002836 -0.051576 ... 0.019741 -0.071331 -0.069985 -0.041804 -0.058663 -0.047571 -0.045263 -0.026758 0.105545 -0.038069
homophobia-transphobia -0.004451 -0.015927 -0.021145 -0.022632 0.061071 0.095289 -0.045836 -0.044220 -0.048187 0.039223 ... -0.027852 -0.024627 -0.046629 -0.027852 -0.039085 0.012917 0.052873 -0.017828 -0.024034 -0.025364
identity-bending 0.260614 -0.015167 -0.020136 -0.021553 -0.045889 -0.020136 -0.043650 -0.042111 -0.045889 -0.032725 ... 0.160913 0.024858 -0.044405 0.067195 -0.037221 0.063283 -0.028719 -0.016978 -0.022887 -0.024155
implicit-explicit 0.017549 -0.024428 -0.032431 -0.034712 -0.073908 -0.032431 -0.031840 0.011507 -0.073908 -0.052706 ... 0.138411 0.040035 0.042196 -0.042719 -0.059948 -0.050928 -0.046255 -0.027344 0.032603 -0.038903
modern-setting 0.038625 -0.016655 -0.022112 -0.023667 -0.050391 -0.022112 0.006700 -0.046243 -0.050391 -0.035935 ... 1.000000 -0.069961 -0.048761 -0.029126 -0.040873 -0.033144 -0.031537 -0.018643 0.073536 -0.026524
motivation 0.089026 0.029512 -0.053112 0.041937 -0.096293 -0.000373 0.013942 -0.031206 0.126406 0.113671 ... -0.069961 1.000000 0.035521 -0.069961 -0.098176 -0.174430 -0.075750 -0.044781 -0.060368 -0.019426
other -0.094738 -0.027883 0.103122 -0.039621 -0.084360 -0.037018 -0.080245 -0.042045 -0.018609 -0.060160 ... -0.048761 0.035521 1.000000 0.005079 -0.068426 0.027120 0.147076 0.051498 0.081813 -0.044405
racism -0.056589 -0.016655 -0.022112 -0.023667 -0.050391 -0.022112 0.006700 -0.046243 0.106709 0.105143 ... -0.029126 -0.069961 0.005079 1.000000 0.021917 -0.033144 -0.031537 -0.018643 -0.025133 -0.026524
reception -0.079412 -0.023372 -0.031029 -0.033212 -0.032372 -0.031029 -0.027264 -0.064892 -0.070713 -0.050427 ... -0.040873 -0.098176 -0.068426 0.021917 1.000000 -0.101906 -0.044255 -0.026162 -0.035269 -0.037221
reflection 0.078040 0.026193 -0.055130 0.037220 -0.077429 0.047618 -0.069213 -0.115295 -0.125637 -0.089595 ... -0.033144 -0.174430 0.027120 -0.033144 -0.101906 1.000000 -0.078629 -0.046482 -0.017246 -0.022993
research -0.017090 0.118505 -0.023942 0.071385 -0.054561 -0.023942 -0.001196 -0.050069 -0.054561 -0.038909 ... -0.031537 -0.075750 0.147076 -0.031537 -0.044255 -0.078629 1.000000 0.102083 -0.027213 0.058259
revising -0.036222 -0.010661 -0.014153 -0.015149 -0.032254 -0.014153 -0.030681 -0.029599 -0.032254 -0.023001 ... -0.018643 -0.044781 0.051498 -0.018643 -0.026162 -0.046482 0.102083 1.000000 -0.016087 -0.016978
sexism -0.048830 -0.014372 -0.019080 -0.020422 -0.043481 -0.019080 -0.041360 0.089742 0.137265 -0.031008 ... 0.073536 -0.060368 0.081813 -0.025133 -0.035269 -0.017246 -0.027213 -0.016087 1.000000 -0.022887
vignette -0.051534 -0.015167 -0.020136 -0.021553 0.068563 -0.020136 0.016052 -0.042111 -0.045889 -0.032725 ... -0.026524 -0.019426 -0.044405 -0.026524 -0.037221 -0.022993 0.058259 -0.016978 -0.022887 1.000000

36 rows × 36 columns
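
A 36 by 36 matrix is hard to scan by eye. One possible way to read it, sketched here under assumptions, is to list each pair of values once and keep only the pairs whose correlation is larger in absolute value than some threshold. The function name list_strong_pairs, the 0.10 default, and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd


def list_strong_pairs(corr, threshold=0.10):
    """Return (value_a, value_b, r) for each pair with |r| > threshold, strongest first."""
    labels = list(corr.columns)
    pairs = []
    # k=1 walks the upper triangle only: each pair once, diagonal excluded
    for i, j in zip(*np.triu_indices(len(labels), k=1)):
        r = corr.iloc[i, j]
        if abs(r) > threshold:            # NaN never passes this comparison
            pairs.append((labels[i], labels[j], float(r)))
    return sorted(pairs, key=lambda pair: abs(pair[2]), reverse=True)


# Invented code-by-value matrix; "research" is the exact opposite of "fluff"
toy = pd.DataFrame(
    {
        "fluff": [1, 1, 0, 0, 1, 0],
        "angst": [1, 1, 0, 0, 0, 0],
        "research": [0, 0, 1, 1, 0, 1],
    }
)
toy_corr = toy.corr()
pairs = list_strong_pairs(toy_corr)
```

Sorting by absolute value puts strong negative correlations next to strong positive ones, which matters here because several of the largest values in the matrix above are negative.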

In [16]:
#keep correlations whose absolute value is above 0.10; every other cell becomes NaN
strong_pairs = corr[abs(corr) > 0.10]
print(strong_pairs)
values                  LGBTQplus   abelism     angst  antiracism  audience  \
values
LGBTQplus                1.000000       NaN       NaN         NaN       NaN
abelism                       NaN  1.000000       NaN         NaN       NaN
angst                         NaN       NaN  1.000000         NaN       NaN
antiracism                    NaN       NaN       NaN    1.000000       NaN
audience                      NaN       NaN       NaN         NaN  1.000000
authors-note                  NaN       NaN       NaN         NaN       NaN
canon-compliant               NaN       NaN       NaN         NaN       NaN
canon-compliment              NaN       NaN       NaN         NaN       NaN
canon-critique                NaN  0.150853       NaN         NaN       NaN
canon-relation           0.125968       NaN       NaN         NaN       NaN
canon-resistant               NaN       NaN       NaN         NaN       NaN
class                         NaN       NaN       NaN         NaN       NaN
critical-uptake               NaN       NaN       NaN    0.259919       NaN
cultural-difference           NaN       NaN       NaN         NaN       NaN
disability                    NaN  0.221466       NaN         NaN       NaN
drafting                      NaN       NaN       NaN         NaN       NaN
fan-politics                  NaN       NaN       NaN         NaN  0.116645
fan-practices-uptake          NaN       NaN       NaN         NaN  0.124099
feminism                      NaN       NaN       NaN         NaN       NaN
fix-it                        NaN       NaN       NaN         NaN       NaN
fluff                    0.115488       NaN  0.237743         NaN       NaN
genre-other                   NaN       NaN       NaN         NaN       NaN
heteronormativity             NaN       NaN       NaN         NaN       NaN
homophobia-transphobia        NaN       NaN       NaN         NaN       NaN
identity-bending         0.260614       NaN       NaN         NaN       NaN
implicit-explicit             NaN       NaN       NaN         NaN       NaN
modern-setting                NaN       NaN       NaN         NaN       NaN
motivation                    NaN       NaN       NaN         NaN       NaN
other                         NaN       NaN  0.103122         NaN       NaN
racism                        NaN       NaN       NaN         NaN       NaN
reception                     NaN       NaN       NaN         NaN       NaN
reflection                    NaN       NaN       NaN         NaN       NaN
research                      NaN  0.118505       NaN         NaN       NaN
revising                      NaN       NaN       NaN         NaN       NaN
sexism                        NaN       NaN       NaN         NaN       NaN
vignette                      NaN       NaN       NaN         NaN       NaN

values                  authors-note  canon-compliant  canon-compliment  \
values
LGBTQplus                        NaN              NaN               NaN
abelism                          NaN              NaN               NaN
angst                            NaN              NaN               NaN
antiracism                       NaN              NaN               NaN
audience                         NaN              NaN               NaN
authors-note                1.000000              NaN               NaN
canon-compliant                  NaN              1.0               NaN
canon-compliment                 NaN              NaN          1.000000
canon-critique                   NaN              NaN               NaN
canon-relation                   NaN              NaN               NaN
canon-resistant                  NaN              NaN               NaN
class                            NaN              NaN               NaN
critical-uptake                  NaN              NaN               NaN
cultural-difference              NaN              NaN               NaN
disability                       NaN              NaN               NaN
drafting                    0.101832              NaN               NaN
fan-politics                     NaN              NaN         -0.142195
fan-practices-uptake             NaN              NaN               NaN
feminism                         NaN              NaN          0.315742
fix-it                           NaN              NaN               NaN
fluff                            NaN              NaN               NaN
genre-other                      NaN              NaN               NaN
heteronormativity                NaN              NaN               NaN
homophobia-transphobia           NaN              NaN               NaN
identity-bending                 NaN              NaN               NaN
implicit-explicit                NaN              NaN               NaN
modern-setting                   NaN              NaN               NaN
motivation                       NaN              NaN               NaN
other                            NaN              NaN               NaN
racism                           NaN              NaN               NaN
reception                        NaN              NaN               NaN
reflection                       NaN              NaN         -0.115295
research                         NaN              NaN               NaN
revising                         NaN              NaN               NaN
sexism                           NaN              NaN               NaN
vignette                         NaN              NaN               NaN

values                  canon-critique  canon-relation  ...  modern-setting  \
values                                                  ...
LGBTQplus                          NaN        0.125968  ...             NaN
abelism                       0.150853             NaN  ...             NaN
angst                              NaN             NaN  ...             NaN
antiracism                         NaN             NaN  ...             NaN
audience                           NaN             NaN  ...             NaN
authors-note                       NaN             NaN  ...             NaN
canon-compliant                    NaN             NaN  ...             NaN
canon-compliment                   NaN             NaN  ...             NaN
canon-critique                1.000000             NaN  ...             NaN
canon-relation                     NaN        1.000000  ...             NaN
canon-resistant                    NaN             NaN  ...             NaN
class                              NaN             NaN  ...             NaN
critical-uptake               0.150578             NaN  ...             NaN
cultural-difference                NaN        0.245257  ...             NaN
disability                         NaN        0.129721  ...             NaN
drafting                           NaN             NaN  ...             NaN
fan-politics                       NaN             NaN  ...             NaN
fan-practices-uptake               NaN             NaN  ...             NaN
feminism                           NaN             NaN  ...             NaN
fix-it                             NaN             NaN  ...             NaN
fluff                              NaN             NaN  ...             NaN
genre-other                        NaN             NaN  ...             NaN
heteronormativity                  NaN             NaN  ...             NaN
homophobia-transphobia             NaN             NaN  ...             NaN
identity-bending                   NaN             NaN  ...        0.160913
implicit-explicit                  NaN             NaN  ...        0.138411
modern-setting                     NaN             NaN  ...        1.000000
motivation                    0.126406        0.113671  ...             NaN
other                              NaN             NaN  ...             NaN
racism                        0.106709        0.105143  ...             NaN
reception                          NaN             NaN  ...             NaN
reflection                   -0.125637             NaN  ...             NaN
research                           NaN             NaN  ...             NaN
revising                           NaN             NaN  ...             NaN
sexism                        0.137265             NaN  ...             NaN
vignette                           NaN             NaN  ...             NaN

values                  motivation     other    racism  reception  reflection  \
values
LGBTQplus                      NaN       NaN       NaN        NaN         NaN
abelism                        NaN       NaN       NaN        NaN         NaN
angst                          NaN  0.103122       NaN        NaN         NaN
antiracism                     NaN       NaN       NaN        NaN         NaN
audience                       NaN       NaN       NaN        NaN         NaN
authors-note                   NaN       NaN       NaN        NaN         NaN
canon-compliant                NaN       NaN       NaN        NaN         NaN
canon-compliment               NaN       NaN       NaN        NaN   -0.115295
canon-critique            0.126406       NaN  0.106709        NaN   -0.125637
canon-relation            0.113671       NaN  0.105143        NaN         NaN
canon-resistant                NaN       NaN       NaN        NaN         NaN
class                          NaN       NaN  0.155271        NaN         NaN
critical-uptake           0.130174  0.191677       NaN        NaN         NaN
cultural-difference            NaN       NaN       NaN        NaN    0.115818
disability                0.113671       NaN       NaN        NaN    0.105216
drafting                       NaN       NaN       NaN        NaN         NaN
fan-politics             -0.171101       NaN       NaN        NaN   -0.226877
fan-practices-uptake           NaN       NaN       NaN        NaN   -0.131569
feminism                       NaN       NaN       NaN        NaN         NaN
fix-it                         NaN       NaN       NaN        NaN         NaN
fluff                          NaN       NaN       NaN        NaN         NaN
genre-other                    NaN       NaN       NaN        NaN         NaN
heteronormativity              NaN       NaN       NaN        NaN         NaN
homophobia-transphobia         NaN       NaN       NaN        NaN         NaN
identity-bending               NaN       NaN       NaN        NaN         NaN
implicit-explicit              NaN       NaN       NaN        NaN         NaN
modern-setting                 NaN       NaN       NaN        NaN         NaN
motivation                1.000000       NaN       NaN        NaN   -0.174430
other                          NaN  1.000000       NaN        NaN         NaN
racism                         NaN       NaN  1.000000        NaN         NaN
reception                      NaN       NaN       NaN   1.000000   -0.101906
reflection               -0.174430       NaN       NaN  -0.101906    1.000000
research                       NaN  0.147076       NaN        NaN         NaN
revising                       NaN       NaN       NaN        NaN         NaN
sexism                         NaN       NaN       NaN        NaN         NaN
vignette                       NaN       NaN       NaN        NaN         NaN

values                  research  revising    sexism  vignette
values
LGBTQplus                    NaN       NaN       NaN       NaN
abelism                 0.118505       NaN       NaN       NaN
angst                        NaN       NaN       NaN       NaN
antiracism                   NaN       NaN       NaN       NaN
audience                     NaN       NaN       NaN       NaN
authors-note                 NaN       NaN       NaN       NaN
canon-compliant              NaN       NaN       NaN       NaN
canon-compliment             NaN       NaN       NaN       NaN
canon-critique               NaN       NaN  0.137265       NaN
canon-relation               NaN       NaN       NaN       NaN
canon-resistant              NaN       NaN       NaN       NaN
class                        NaN       NaN       NaN       NaN
critical-uptake              NaN       NaN       NaN       NaN
cultural-difference          NaN       NaN       NaN       NaN
disability              0.157487       NaN       NaN       NaN
drafting                     NaN       NaN       NaN       NaN
fan-politics                 NaN       NaN       NaN       NaN
fan-practices-uptake         NaN       NaN       NaN       NaN
feminism                     NaN       NaN       NaN       NaN
fix-it                       NaN       NaN       NaN       NaN
fluff                        NaN       NaN       NaN       NaN
genre-other                  NaN       NaN       NaN       NaN
heteronormativity            NaN       NaN  0.105545       NaN
homophobia-transphobia       NaN       NaN       NaN       NaN
identity-bending             NaN       NaN       NaN       NaN
implicit-explicit            NaN       NaN       NaN       NaN
modern-setting               NaN       NaN       NaN       NaN
motivation                   NaN       NaN       NaN       NaN
other                   0.147076       NaN       NaN       NaN
racism                       NaN       NaN       NaN       NaN
reception                    NaN       NaN       NaN       NaN
reflection                   NaN       NaN       NaN       NaN
research                1.000000  0.102083       NaN       NaN
revising                0.102083  1.000000       NaN       NaN
sexism                       NaN       NaN  1.000000       NaN
vignette                     NaN       NaN       NaN       1.0

[36 rows x 36 columns]
In [17]:
# Adjacency / co-occurrence matrix

coocc = df.T.dot(df)
coocc
Out[17]:
values LGBTQplus abelism angst antiracism audience authors-note canon-compliant canon-compliment canon-critique canon-relation ... modern-setting motivation other racism reception reflection research revising sexism vignette
values
LGBTQplus 42.0 0.0 0.0 0.0 1.0 0.0 4.0 4.0 2.0 5.0 ... 2.0 10.0 0.0 0.0 0.0 10.0 1.0 0.0 0.0 0.0
abelism 0.0 4.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 ... 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0
angst 0.0 0.0 7.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
antiracism 0.0 0.0 0.0 8.0 1.0 0.0 0.0 1.0 1.0 0.0 ... 0.0 2.0 0.0 0.0 0.0 2.0 1.0 0.0 0.0 0.0
audience 1.0 0.0 0.0 1.0 34.0 1.0 1.0 0.0 0.0 0.0 ... 0.0 1.0 0.0 0.0 1.0 2.0 0.0 0.0 0.0 2.0
authors-note 0.0 0.0 0.0 0.0 1.0 7.0 0.0 0.0 0.0 0.0 ... 0.0 1.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0
canon-compliant 4.0 0.0 0.0 0.0 1.0 0.0 31.0 3.0 0.0 3.0 ... 1.0 5.0 0.0 1.0 1.0 2.0 1.0 0.0 0.0 1.0
canon-compliment 4.0 0.0 0.0 1.0 0.0 0.0 3.0 29.0 0.0 0.0 ... 0.0 3.0 1.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0
canon-critique 2.0 2.0 0.0 1.0 0.0 0.0 0.0 0.0 34.0 1.0 ... 0.0 10.0 2.0 3.0 0.0 0.0 0.0 0.0 3.0 0.0
canon-relation 5.0 0.0 0.0 0.0 0.0 0.0 3.0 0.0 1.0 18.0 ... 0.0 6.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0
canon-resistant 4.0 1.0 0.0 0.0 2.0 0.0 0.0 0.0 3.0 0.0 ... 1.0 5.0 1.0 0.0 0.0 3.0 0.0 0.0 0.0 1.0
class 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0
critical-uptake 4.0 1.0 0.0 5.0 1.0 0.0 0.0 0.0 8.0 1.0 ... 0.0 11.0 9.0 1.0 0.0 5.0 0.0 0.0 0.0 0.0
cultural-difference 3.0 0.0 0.0 0.0 1.0 1.0 3.0 2.0 1.0 6.0 ... 0.0 1.0 0.0 2.0 0.0 8.0 0.0 0.0 0.0 0.0
disability 1.0 2.0 0.0 0.0 0.0 0.0 2.0 1.0 2.0 3.0 ... 0.0 6.0 3.0 0.0 0.0 6.0 3.0 1.0 0.0 0.0
drafting 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
fan-politics 16.0 0.0 1.0 2.0 15.0 0.0 5.0 1.0 4.0 2.0 ... 3.0 5.0 10.0 6.0 4.0 2.0 1.0 0.0 2.0 0.0
fan-practices-uptake 2.0 0.0 0.0 1.0 7.0 0.0 1.0 1.0 0.0 1.0 ... 1.0 6.0 0.0 1.0 3.0 0.0 1.0 0.0 0.0 1.0
feminism 1.0 0.0 0.0 1.0 0.0 0.0 0.0 9.0 1.0 1.0 ... 0.0 5.0 2.0 0.0 0.0 4.0 0.0 0.0 1.0 0.0
fix-it 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 ... 0.0 2.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
fluff 3.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 2.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
genre-other 2.0 0.0 1.0 0.0 2.0 0.0 2.0 0.0 0.0 0.0 ... 0.0 1.0 1.0 0.0 1.0 4.0 0.0 0.0 0.0 0.0
heteronormativity 2.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 2.0 0.0 ... 1.0 1.0 0.0 0.0 0.0 2.0 0.0 0.0 2.0 0.0
homophobia-transphobia 1.0 0.0 0.0 0.0 2.0 1.0 0.0 0.0 0.0 1.0 ... 0.0 1.0 0.0 0.0 0.0 2.0 1.0 0.0 0.0 0.0
identity-bending 6.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 2.0 2.0 0.0 1.0 0.0 3.0 0.0 0.0 0.0 0.0
implicit-explicit 3.0 0.0 0.0 0.0 0.0 0.0 1.0 2.0 0.0 0.0 ... 3.0 5.0 3.0 0.0 0.0 2.0 0.0 0.0 1.0 0.0
modern-setting 2.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 12.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0
motivation 10.0 1.0 0.0 2.0 1.0 1.0 5.0 3.0 10.0 6.0 ... 0.0 61.0 6.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
other 0.0 0.0 2.0 0.0 0.0 0.0 0.0 1.0 2.0 0.0 ... 0.0 6.0 32.0 1.0 0.0 6.0 4.0 1.0 2.0 0.0
racism 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 3.0 2.0 ... 0.0 0.0 1.0 12.0 1.0 1.0 0.0 0.0 0.0 0.0
reception 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 1.0 23.0 0.0 0.0 0.0 0.0 0.0
reflection 10.0 1.0 0.0 2.0 2.0 2.0 2.0 0.0 0.0 0.0 ... 1.0 0.0 6.0 1.0 0.0 65.0 0.0 0.0 1.0 1.0
research 1.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 0.0 4.0 0.0 0.0 0.0 14.0 1.0 0.0 1.0
revising 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 0.0 0.0 0.0 1.0 5.0 0.0 0.0
sexism 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 3.0 0.0 ... 1.0 0.0 2.0 0.0 0.0 1.0 0.0 0.0 9.0 0.0
vignette 0.0 0.0 0.0 0.0 2.0 0.0 1.0 0.0 0.0 0.0 ... 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 10.0

36 rows × 36 columns
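
To see why `df.T.dot(df)` produces co-occurrence counts, here is a minimal sketch on a toy table. It assumes that `df` holds one row per coded passage and one 0/1 column per attribute value; the `toy` data below is made up for illustration and is not taken from the interviews.

```python
import pandas as pd

# Hypothetical toy data: each row is one coded passage, each column is a code,
# and a 1 means that code was applied to that passage.
toy = pd.DataFrame(
    [
        [1, 1, 0],
        [1, 0, 0],
        [1, 1, 1],
        [0, 0, 1],
    ],
    columns=["motivation", "LGBTQplus", "reflection"],
)

# (codes x passages) dot (passages x codes) -> (codes x codes)
toy_coocc = toy.T.dot(toy)
print(toy_coocc)
```

Under that assumption, each diagonal cell counts how many passages carry a code (3 for `motivation` here), and each off-diagonal cell counts the passages that carry both codes (2 for `motivation` with `LGBTQplus`). The matrix is symmetric, so each pair appears twice.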

In [18]:
# Save the co-occurrence matrix
coocc.to_csv('./data/cooccurance_matrix_interviews.csv')
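
If the saved matrix needs to be loaded again later, the row labels have to be restored from the first CSV column. The sketch below shows the round trip on a small, made-up matrix (`small` is hypothetical) and writes to an in-memory buffer rather than to the `./data/` folder.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real matrix, small enough to check by eye.
small = pd.DataFrame(
    [[3, 2], [2, 2]],
    index=["motivation", "LGBTQplus"],
    columns=["motivation", "LGBTQplus"],
)

# Write to an in-memory buffer so the sketch leaves no files behind.
buffer = io.StringIO()
small.to_csv(buffer)
buffer.seek(0)

# index_col=0 tells pandas that the first CSV column holds the row labels.
reloaded = pd.read_csv(buffer, index_col=0)
print(reloaded)
```

Without `index_col=0`, the code names would come back as an ordinary data column and the matrix would no longer be square.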

The Fun Part -- Visualizations!

This next section experiments with a few visualizations of the co-occurrence and correlation matrices. The last cell builds the interactive chart, which can be found in the "Visualization" section of the Critical Fan Toolkit. I mainly relied on the plotly and plotly express documentation to create these visualizations.

In [19]:
import matplotlib.pyplot as plt

plt.imshow(coocc, cmap='hot', interpolation='nearest')
plt.show()
In [20]:
fig = px.imshow(coocc, color_continuous_scale='purd')

fig.show()
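
The diagonal of the co-occurrence matrix holds each code's total count (65 for `reflection` in the output above), which can dominate the color scale and wash out the pairwise counts. One option, offered here as a suggestion rather than something this notebook does, is to blank out the diagonal before plotting. The helper name `mask_diagonal` and the `demo` data are hypothetical.

```python
import numpy as np
import pandas as pd


def mask_diagonal(matrix: pd.DataFrame) -> pd.DataFrame:
    """Return a float copy of a square matrix with its diagonal set to NaN."""
    values = matrix.to_numpy(dtype=float, copy=True)
    np.fill_diagonal(values, np.nan)
    return pd.DataFrame(values, index=matrix.index, columns=matrix.columns)


# Hypothetical 2 x 2 matrix standing in for coocc.
demo = pd.DataFrame(
    [[3, 2], [2, 2]],
    index=["motivation", "LGBTQplus"],
    columns=["motivation", "LGBTQplus"],
)
masked = mask_diagonal(demo)
print(masked)
```

The masked frame could then be passed to the same call used above, for example `px.imshow(mask_diagonal(coocc), color_continuous_scale='purd')`; the NaN cells should render as gaps, so the color range reflects only the off-diagonal counts.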
In [21]:
fig = px.imshow(corr, color_continuous_scale='purd')

fig.show()

plotly.offline.plot(fig, filename='./images/correlation.html')
Out[21]:
'./images/correlation.html'
In [22]:
fig = go.Figure()

# Add the correlation heatmap (trace 0)
correlation = fig.add_trace(go.Heatmap(z=corr, x=corr.index, y=corr.index, colorscale="purd"))

# Add the co-occurrence heatmap (trace 1)
cooccurrence = fig.add_trace(go.Heatmap(z=coocc, x=coocc.index, y=coocc.index, colorscale="purd"))

#making it a square
fig.update_layout(
    width=800,
    height=800,
    template="plotly_white",
)

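# Note: update_scenes only applies to 3D scene subplots, so this call
# has no visible effect on the 2D heatmaps above.
fig.update_scenes(
    aspectratio=dict(x=1, y=1, z=0.7),
    aspectmode="manual"
)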

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(
                    args = [{"visible": [False, True]}],
                    label="Adjacency Matrix",
                    method="restyle"
                ),
                dict(
                    args = [{"visible": [True, False]}],
                    label="Correlation Matrix",
                    method="restyle"
                )
            ]),
            direction="down",
            pad={"r":10, "t": 5},
            showactive=True,
            x=0.1,
            xanchor="right",
            y=1.1,
            yanchor="top"
        ),
    ]
)


plotly.offline.plot(fig, filename='./images/correlation_and_cooccurrence.html')
Out[22]:
'./images/correlation_and_cooccurrence.html'
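
The dropdown in the cell above works by giving each button a `visible` list with one True/False entry per trace, in the order the traces were added to the figure. As a rough sketch of that pattern, the hypothetical helper below (`make_toggle_buttons` is not part of plotly) builds the same kind of button dictionaries for any number of traces.

```python
def make_toggle_buttons(labels):
    """Build one dropdown button per trace; each button shows only its own trace.

    `labels` must be in the same order as the traces were added to the figure.
    """
    buttons = []
    for i, label in enumerate(labels):
        # True only at position i, so every other trace is hidden.
        visible = [j == i for j in range(len(labels))]
        buttons.append(dict(label=label, method="restyle", args=[{"visible": visible}]))
    return buttons


# The correlation trace was added first, the co-occurrence trace second.
buttons = make_toggle_buttons(["Correlation Matrix", "Adjacency Matrix"])
print(buttons)
```

The resulting list could be passed as the `buttons` value inside `updatemenus`. Note that the buttons come out in trace order, so "Correlation Matrix" would be listed first, unlike in the cell above.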