Qualitative Coding: Description and Process

Qualitative coding is a method used in the humanities and social sciences to analyze textual data, such as interviews. Researchers create a set of codes, which are themes and patterns that appear across their textual data; these codes usually reflect the particular heuristics or theories with which coders are working. Using these codes, collectively called a schema, researchers mark up the textual data, count how often particular themes appear, compare passages and documents, and analyze the results. Each researcher who chooses to incorporate qualitative coding may design their schema creation, coding process, and validation methods differently depending on their discipline, the theoretical framework they use, and their overall goals. Because I aim to be transparent about my research methods and to provide a guide for researchers hoping to develop their own research toolboxes, I will describe my qualitative coding process and why I chose to use XML.

What is Qualitative Coding?

In the above paragraph, I describe the overall process of qualitative coding, from data collection to analyzing the researchers’ codes. However, envisioning the qualitative coding process through description alone can be difficult, so I will provide an example using a portion of my interview with one of the fanfiction writers, Kittya Cullen.

Qualitative coding requires textual data on which to build and apply a schema. Kittya Cullen’s original interview transcription is the textual data that I will use for this example. To provide some context, during the interview, I asked her to describe a choice she made in her fanfic in which she linked Asami’s trauma to her relationship with Korra. Kittya says:

I think I was both going for an understanding of how Asami herself is effected by this really terrible thing we see happen to Korra. Because at that point, everyone has seen someone who they think is all-powerful and infallible, invulnerable to an extent, be ... I don't want to say broken, but be injured in a really drastic way and them not being able to do anything much to help with her recovery in concrete ways.

In just a few short sentences, Kittya Cullen unravels a complex choice, and I wanted to capture the importance of this moment. However, this short paragraph is just a small percentage of the entire interview, and merely underlining or highlighting the quote would erase the complexities in Kittya’s words.

In its most basic sense, qualitative coding is a form of highlighting or underlining. However, instead of merely underlining text, I “annotated” text that captures a particular pattern or theme. The schema I created, which I describe below, incorporates language and theories from critical fan practices and rhetorical genre studies (RGS); visit the framework section of the CFT to read more. Using this schema, I began to mark up specific moments in Kittya Cullen’s interview that captured particular critical fan practices or RGS theories. If you are familiar with HTML, you may recognize the pointy-bracket structure. My codes are the bolded words between the pointy brackets, and the text being encoded sits between the opening tag <code> and the closing tag </code>.

<code writing-agency="reflection">I think I was both going for an understanding of how Asami herself is effected by this really terrible thing we see happen to Korra.</code> Because at that point, <code canon="canon-relation"> everyone has seen someone who they think is all-powerful and infallible, invulnerable to an extent, be ... I don't want to say broken, but be injured in a really drastic way and them not being able to do anything much to help with her recovery in concrete ways.</code>

With this document encoded, the sentence “I think I was both going for an understanding of how Asami herself is effected by this really terrible thing we see happen to Korra” is now labeled as “writing agency: reflection,” which is one of the codes used to mark up these interviews. I use the “writing agency: reflection” code to indicate when writers reflect on a specific choice they made while writing. I then use “canon: relation” to encode the last section of this excerpt. “Canon: relation” highlights when interviewees discuss how they identify with or relate to the canonical text. I continue this process throughout all the interviews based on the schema I created.

I then conducted further analysis across the different interviews, comparing and contrasting different interviewees’ perspectives. For example, I analyzed all the times interviewees reflected on specific writing choices. What were the choices they made? Why might they have made these choices? What do these choices suggest about writing fanfiction and critical fan genre practices? I frame my arguments and answers to these questions by pulling information from the structured, qualitatively coded interview documents. These results can be found in the “Interviews” portion of the CFT.
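As a rough illustration of what pulling information from a coded document can look like, the sketch below counts codes and collects every passage carrying a given code. It uses Python's standard xml.etree.ElementTree module rather than the XSLT used for this project, and the <interview> root element and the function names count_codes and passages_with are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Excerpt from the interview above, wrapped in a hypothetical <interview>
# root element so that it parses as a single XML document.
SAMPLE = """<interview>
<code writing-agency="reflection">I think I was both going for an understanding of how Asami herself is effected by this really terrible thing we see happen to Korra.</code>
Because at that point, <code canon="canon-relation">everyone has seen someone who they think is all-powerful and infallible, invulnerable to an extent, be ... I don't want to say broken, but be injured in a really drastic way and them not being able to do anything much to help with her recovery in concrete ways.</code>
</interview>"""


def count_codes(xml_text):
    """Count how often each (attribute, value) pair appears on <code> elements."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for element in root.iter("code"):
        for attribute, value in element.attrib.items():
            counts[(attribute, value)] += 1
    return counts


def passages_with(xml_text, attribute, value):
    """Return the text of every <code> element that carries the given code."""
    root = ET.fromstring(xml_text)
    return [
        "".join(element.itertext()).strip()
        for element in root.iter("code")
        if element.get(attribute) == value
    ]


if __name__ == "__main__":
    for (attribute, value), total in count_codes(SAMPLE).items():
        print(f"{attribute}: {value} -> {total}")
    print(passages_with(SAMPLE, "writing-agency", "reflection"))
```

Run across every interview file, a function like passages_with would gather all the moments where writers reflect on a choice, which is the kind of comparison described above.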

Why Qualitative Coding?

Qualitative coding, according to Johnny Saldaña (2015), is a heuristic, that is, a way in which knowledge is created, modeled, shaped, and circulated. Understanding qualitative coding as a heuristic emphasizes that the creation of schemas, the choices made when marking up the textual data, and the methods for analyzing the marked-up texts are all framed through the particular theoretical lens and discipline in which the researcher works. As I created and molded the schema with which I marked up the interviews, I made deliberate decisions about the thematic patterns I wanted to highlight, based on both critical fan studies and rhetorical genre studies, which I will describe later in the “Schema Creation and Process” section. The process of qualitative coding (from data collection, to schema creation, to coding the document, to analyzing the codes) models and makes transparent researchers’ understandings of data, and therefore unravels and reveals their knowledge.

Qualitative coding models both researchers’ knowledge and the modeling process itself. In her description of “humanities computing,” Julia Flanders (2009) argues, “it is rather about modeling that knowledge and even in some cases about modeling the modeling process. It is an inquiry into how we know things and how we present them to ourselves for study, realized through a variety of tools which make the consequences of that inquiry palpable.” Flanders is focusing on data modeling as found in the digital humanities, not on qualitative coding. Data modeling in the digital humanities may include marking up archival documents with forms of XML to make them digitally readable: not just readable on computer screens, but readable by the computer.

Abbie Levesque DeCamp (2020) explicitly connects the process Flanders describes to qualitative coding, arguing it mirrors the types of data modeling described in the digital humanities: “The process of encoding forces an incredibly close reading - one must read and process all parts of a document, thinking deeply about each portion, sometimes down to the word, to accurately tag a document. That is, building in itself is a knowledge-making process.” Building schemas, using and creating qualitative coding tools, and coding textual documents are knowledge-making processes, processes that simultaneously construct and expound knowledge. As Flanders argues, this process “require[s] that one distance oneself from one’s own representational strategies and turn them about in one’s hands like a complex and alien bauble.”

The process of qualitative coding, then, makes transparent “representational strategies,” forcing researchers to explicitly define their theoretical frameworks and how these frameworks are reflected within and by the data. In order to build my schema, I had to explicitly choose both the frameworks and disciplinary discourse I worked within and the way I would qualitatively code the interviews.

XML: From Marking-Up to Publishing

One of the most popular software packages for qualitative coding is NVivo. NVivo allows researchers to create codes, mark up multimodal files with these codes, take notes, and visualize results. However, I wanted to take a step further with my interviews: I wanted to publish the transcripts along with the qualitative codes. NVivo, unfortunately, does not have an easy way to export encoded documents for publication. XML, however, can help researchers both analyze and publish their data.

XML is a grammar for encoding and publishing large-scale digital data that can then be transformed and/or analyzed. Unlike NVivo, XML is not a piece of software; it is a set of grammatical rules within which users create their own language and vocabulary. Levesque DeCamp (2020) has an excellent explanation of XML and the steps of working with it, which I will paraphrase here. XML is a tree-structured data format, unlike comma-separated values (CSV) data, which is read into spreadsheets. Above, I showed an excerpt from my qualitative coding and mentioned the “pointy-bracket structure” being similar to HTML. In case you are not as familiar with HTML, each HTML document has an opening and a closing tag, with layers of nested tags, or “elements,” in between. Here is a very simplified sample of HTML:

            <p>Text here</p>

HTML and XML have similar overall formats, except that HTML already has an established vocabulary that browsers know how to display. If I use the <p> element, the browser knows that everything between the opening and closing tags is a paragraph. When I open an HTML file in my browser, “Text here” is displayed as a paragraph. XML, however, does not transform documents; it merely marks them up. XML has the same structure, with elements and textual data in between tags. It also has attributes and attribute values, which help to provide further information and differentiate data. In the example above, I use this tag: <code writing-agency="reflection">. The element is “code,” while “writing-agency” is the attribute and “reflection” is the attribute value. Because XML is just a structure without a set vocabulary, researchers have to create their own XML schemas, choosing their elements, attributes, and attribute values, in order to mark up documents and then transform them.
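To make the three parts concrete, here is a minimal sketch that parses the coded sentence from the earlier excerpt and separates the element name, the attribute with its value, and the encoded text. It relies on Python's standard xml.etree.ElementTree module; the helper name describe is hypothetical.

```python
import xml.etree.ElementTree as ET

# One coded passage from the excerpt shown earlier.
FRAGMENT = (
    '<code writing-agency="reflection">I think I was both going for an '
    "understanding of how Asami herself is effected by this really terrible "
    "thing we see happen to Korra.</code>"
)


def describe(xml_fragment):
    """Split one marked-up passage into element name, attributes, and text."""
    element = ET.fromstring(xml_fragment)
    return element.tag, dict(element.attrib), element.text


name, attributes, text = describe(FRAGMENT)
print(name)        # code
print(attributes)  # {'writing-agency': 'reflection'}
print(text)
```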

I used RelaxNG to create my own XML schema and then validate the transcribed interviews that I marked up. If I did not properly follow XML’s grammar or used a word that was not in my schema, Oxygen (an XML editor) let me know immediately. Validation is XML’s version of spell check. After I finished transcribing, though, I needed to be able to transform the XML documents and do something with them. In fact, I wanted to transform my XML documents into HTML so that I could publish my transcriptions and my codes. I used XSL Transformations (XSLT), which can transform an XML document into another XML document, an HTML document, or another output. With XSLT, I could both analyze the qualitatively coded interviews and publish the interviews.
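The “spell check” idea can be sketched in a few lines. The example below is not RelaxNG and not what Oxygen does internally; it is a simplified stand-in written with Python's standard library, and it assumes a small, abridged vocabulary (the ALLOWED table and the <interview> root element are hypothetical, built only from codes quoted on this page). It checks well-formedness and vocabulary, but not nesting rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abridged vocabulary: element -> attribute -> allowed values.
# The actual schema for this project is written in RelaxNG and is larger.
ALLOWED = {
    "interview": {},  # assumed root element with no attributes
    "code": {
        "writing-agency": {"reflection"},
        "canon": {"canon-relation"},
        "rgs-uptake": {"critical-uptake"},
    },
    "power-identity": {"describe": {"cultural-difference", "class"}},
}


def check(xml_text):
    """Return a list of problems; an empty list means the document passes."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        # The document breaks XML's grammar (for example, an unclosed tag).
        return [f"not well-formed: {error}"]

    problems = []
    for element in root.iter():
        if element.tag not in ALLOWED:
            problems.append(f"unknown element <{element.tag}>")
            continue
        for attribute, value in element.attrib.items():
            allowed_values = ALLOWED[element.tag].get(attribute)
            if allowed_values is None:
                problems.append(f"unknown attribute '{attribute}' on <{element.tag}>")
            elif value not in allowed_values:
                problems.append(f"unknown value '{value}' for '{attribute}'")
    return problems


if __name__ == "__main__":
    # A misspelled value is reported, much like a spell checker would.
    print(check('<interview><code writing-agency="reflecton">text</code></interview>'))
```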

While NVivo is useful software for qualitative coding and analysis, especially for counting how many times codes appeared and alongside which other codes, XML offers a sustainability that NVivo does not. XML files are plain text and can be stored and opened on pretty much every computer. Plus, because the XML documents can be transformed into HTML documents, I can actually publish the transcriptions along with the codes, although the codes are now changed into HTML classes.

XML will not work for every researcher interested in qualitative coding, and many uses of XML lie outside of qualitative research. For example, the Text Encoding Initiative (TEI) maintains an XML vocabulary designed for encoding and publishing digital editions of texts. However, because I needed a tool that both analyzes and publishes, XML was an easy choice.

Schema Creation and Process

Before I began the qualitative coding process, I interviewed the fanfiction writers. I went into each interview with a set of questions and general expectations for how interviewees might answer them, so I was already thinking through the theoretical framework in which I was working. I wanted interviewees to share their writing practices and genre choices; the interview questions also encouraged them to think through the complicated relationship among their identities, their positionalities, the cultural texts they loved, and the fandom communities in which they participate. I also chose the interview subjects based on the fanfics they wrote and whether their fanfics and tagging choices reflected critical fan practices. Because I use rhetorical genre studies and critical fan studies as theoretical frameworks, I was interested in themes around genre, uptake, ideology, power, and identity representation. A scholar working from a literacy or media studies perspective might use a different approach. My theoretical frameworks determine how I structure my interview questions, my coding schema, and my analysis.

Rhetorical genre studies and fan studies shape the vocabulary of my XML schema so that I can mark up patterns that tie back to my overall research. I listened to the interviews multiple times, created a codebook, and reassessed the schema until I settled on a final version. A codebook is the documentation for a schema, describing when particular codes will be used and why; I provide the codebook for my interviews below, and I also include it in the actual RelaxNG schema as a reminder while I code. Even as I continued qualitative coding, I revised the schema when it did not fit. For example, I originally had a “canon analysis” attribute value, which I wound up using for almost every interview participant’s answers; the overwhelming number of instances made the output unhelpful for my analysis and did not align with my research goals. Another example came when I finished coding and realized I needed to give each “code” element an individual identification number to make the HTML transformation easier.
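Numbering every “code” element after the fact is easy to automate. The sketch below shows one way it could be done with Python's standard library; it is an illustration rather than the procedure used for these interviews, and the attribute name id, the prefix c, and the <interview> root element are all assumptions.

```python
import xml.etree.ElementTree as ET


def number_codes(xml_text, prefix="c"):
    """Give every <code> element a sequential identifier, in document order."""
    root = ET.fromstring(xml_text)
    for index, element in enumerate(root.iter("code"), start=1):
        # "id" is a hypothetical attribute name chosen for this example.
        element.set("id", f"{prefix}{index}")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<interview>"
        '<code writing-agency="reflection">First passage.</code> '
        '<code canon="canon-relation">Second passage.</code>'
        "</interview>"
    )
    print(number_codes(sample))
```

With identifiers like these in place, a transformation can link each published passage back to its code.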

Many of the attributes I chose reflect specific vocabulary terms found in RGS or fan studies, as well as terms that resonate with some of my findings. For example, one attribute is “uptakes” while the values are “critical uptakes,” “implicit-explicit-uptakes,” and more. Uptake is an RGS term, while the values are based on terms I created to define different fan uptakes. When I was coding, I realized the “uptake” attribute lacked a value that captured other types of fan uptakes, so I added one, which I named “fan-practices-uptake.” The creation and revision of the schema demonstrate that schemas are not stable and, as the “eXtensible” in XML suggests, are easy to revise. All qualitative coding research requires researchers to continue reimagining their structure as they become more intimate with their data and discover new patterns.

Besides the meta-elements marking up information about the interview, like who is speaking, the only elements used to mark up the interviewees’ words are “code” and “power-identity.” I kept the “code” element general and created multiple attributes so that a single passage can carry several attributes. Each attribute, too, can have multiple values. Here is an example from Aria’s transcribed interview:

<code fan-community="fan-politics" rgs-genre="identity-bending">I don't want to be like, "Ah, but I'm writing the version of this character that's a woman." I hate that. I know that that's permissible with the meta text but I don't want to be part of that.</code>

Within the “code” element are two attributes, fan-community and rgs-genre, and each of the attributes has a value.

The “power-identity” element is nested within the “code” element, which means it can only be used within the “code” element. I wanted to emphasize in my schema how systems of power and positionality are entangled in writing practices. Here is another example from Aria in which the “power-identity” element is used multiple times within the parent “code” element. I underlined the “code” element, which has multiple attributes and attribute values, and bolded the “power-identity” elements, which capture themes around cultural difference, class, and LGBTQ+ identities; the excerpt shown here includes the first two.

<code important="important-quote" writing-agency="reflection" rgs-uptake="critical-uptake"> <power-identity describe="cultural-difference"> Whereas Korra grows up in this institution, Asami grows up with a relatively politically progressive father.</power-identity> He's an abuser, but he's also a very, very politically progressive man. <power-identity describe="class">He's a radical, he tries to overthrow his government, make the world a more equitable and fair place. It's self-interested, he's a bourgeois radical, but I wouldn't have said that at the time.</power-identity></code>

When I use XSLT to transform this into an HTML document for purposes of both publishing and preserving Aria’s interview and the codes I marked up, the transformation looks like this:

Whereas Korra grows up in this institution, Asami grows up with a relatively politically progressive father. [Systems of Power: cultural-difference] He's an abuser, but he's also a very, very politically progressive man. He's a radical, he tries to overthrow his government, make the world a more equitable and fair place. It's self-interested, he's a bourgeois radical, but I wouldn't have said that at the time. [Systems of Power: class] [Code: important-quote, critical-uptake, reflection]

The entire bolded section is what appears between the <code> and </code> tags, while the underlined portions are the lines that appear between the <power-identity> and </power-identity> tags. These are now all clickable, making all the interviews interactive. Readers can decide whether they want to simply read the transcriptions or see how I qualitatively coded them.
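The general shape of such a transformation, in which each coded element becomes an HTML element whose classes are the codes, can be sketched as follows. This is a simplified Python stand-in and not the XSLT used for the CFT; the choice of <span> elements, the class naming, and the function name to_html are assumptions, and the sample is an abridged version of Aria’s excerpt.

```python
import xml.etree.ElementTree as ET
from html import escape

# Abridged version of Aria's coded excerpt from above.
SAMPLE = """<code important="important-quote" writing-agency="reflection" rgs-uptake="critical-uptake"><power-identity describe="cultural-difference">Whereas Korra grows up in this institution, Asami grows up with a relatively politically progressive father.</power-identity> He's an abuser, but he's also a very, very politically progressive man.</code>"""


def to_html(element):
    """Render an element as a <span> whose classes are its name and attribute values."""
    classes = " ".join([element.tag, *element.attrib.values()])
    parts = [f'<span class="{escape(classes)}">']
    parts.append(escape(element.text or "", quote=False))
    for child in element:
        # Nested elements (such as power-identity) become nested spans.
        parts.append(to_html(child))
        # Text that follows a child element belongs to the parent span.
        parts.append(escape(child.tail or "", quote=False))
    parts.append("</span>")
    return "".join(parts)


if __name__ == "__main__":
    print(to_html(ET.fromstring(SAMPLE)))
```

Once the codes are classes, a stylesheet or script on the published page can highlight or reveal them when a reader clicks on a passage.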


The final section of this page is the qualitative codebook that I use to mark up and analyze the interviews. The RelaxNG schema and the XSLT transformations are available on my GitHub page. Each code is an attribute, while the child codes are attribute values. I have also provided a definition for each code and how it is meant to be used. I use this codebook directly in the “Explore the Interviews” definitions, so the codebook can be accessed either here or there.