Discussion Guide and Transcript
Episode Five
Research Ethics Reimagined Episode 5 “Research Ethics Across Domains With Nicholas Proferes, PhD, Sarah Gilbert, PhD, and Kyle Pittman, MPA”
September 18, 2024
In this episode of PRIM&R's podcast, "Research Ethics Reimagined," we explore the intersection of research ethics, online communities, and emerging technologies, with a focus on Reddit. Our guests are Nicholas Proferes, PhD, who is an associate professor at Arizona State University; Sarah Gilbert, PhD, who is a research associate at Cornell University; and Kyle Pittman, MPA, who is moderator of the subreddit "Indian Country" and faculty member at Evergreen State College.
Discussion Questions
1.) Community-Led Ethics Frameworks
- Pittman describes the research approval process implemented in the "Indian Country" subreddit. How does this grassroots approach to research ethics compare to traditional IRB processes? What are its strengths and potential limitations?
- Gilbert mentions some Reddit communities have implemented similar approval processes, particularly those representing marginalized identities. How might these community-led approaches to research ethics influence the broader landscape of online research?
2.) Balancing Research Needs and Community Protection
- The guests discuss the challenges of conducting research on sensitive topics or in communities that may be vulnerable to exploitation. How can researchers balance the need for important scientific knowledge with the protection of community members' privacy and well-being?
- Proferes mentions that only 25% of the Reddit studies they reviewed discussed ethics in any capacity. What steps can be taken to encourage more widespread consideration and discussion of ethics in online research?
3.) The Future of Online Research Ethics
- Reddit is developing a platform called "Reddit for Researchers." How might this type of structured research access impact the ethical considerations of studying online communities?
- The guests express hope for more holistic ethics education across disciplines and increased transparency with research subjects. What specific changes in research practices or education would you like to see implemented to address these goals?
Key Terms and Acronyms
Subreddit: A specific community or forum within the Reddit website, usually dedicated to a particular topic or interest.
API (Application Programming Interface): A set of protocols and tools for building software applications, often used by researchers to access and collect data from online platforms.
IRB (Institutional Review Board): A committee that reviews and monitors research involving human participants to ensure ethical conduct.
Belmont Report: A foundational document in research ethics that outlines three core principles: respect for persons, beneficence, and justice.
Additional Resources
- Studying Reddit: A Systematic Overview of Disciplines, Approaches, Methods, and Ethics - This article offers a systematic analysis of 727 manuscripts that used Reddit as a data source, published between 2010 and 2020.
- Reddit for Researchers - Information about Reddit's developing platform for academic researchers.
- Association of Internet Researchers Ethics Guidelines - Guidelines for ethical decision-making in internet research.
- PRIM&R's Research Ethics Timeline - A resource for exploring the milestones of research ethics, including developments in digital and online research.
- Indigenous Data Sovereignty Network - Information on principles for ethical use of indigenous peoples' data.
Transcript
Transcript, Ep.5, "Research Ethics Across Domains"
Host: Ivy R. Tillman, EdD, CCRC, CIP, Executive Director of PRIM&R
Guests: Nicholas Proferes, PhD, Sarah Gilbert, PhD, and Kyle Pittman, MPA
A transcript generator was used to help create written show transcript. Written transcript of podcast is approximate and not meant for attribution.
Tillman: Today, I'm very pleased to have three guests with us to explore the intersection of research ethics oversight and emerging technologies, paying particular attention to Reddit. Our guests today are Dr. Nicholas Proferes, who is an associate professor at Arizona State University School of Social and Behavioral Sciences, where he is the interim director of the Social Data Science Program.
His research interests include users' understandings of sociotechnical systems, such as social media, societal discourse about technology, and issues of power and ethics in the digital space. We also have Dr. Sarah Gilbert with us. Sarah is a research associate at Cornell University and a research director of the Citizens and Technology Lab, where her work focuses on supporting healthy online communities.
She explores how volunteer moderators' labor impacts community governance. She explores factors and interventions that encourage participation and reduce harmful behavior. And she studies how online data can be reused ethically. We are also pleased to have with us Kyle Pittman, who is the moderator of the subreddit “Indian Country,” which is one of the largest and most active communities for indigenous peoples on Reddit.
Kyle, who is a faculty member at Evergreen State College in Washington, has given thought to research ethics concerning indigenous peoples and communities both in Reddit and offline. Thank you all again for being with us here today. I wanted to give a primer to our listeners about Reddit. Reddit, which has billed itself as “the front page of the internet,” hosts thousands of online communities, which serve as an online discussion board on a wide range of topics. As of March, Reddit reported that they had 306 million weekly active users, 82 million daily active users, and more than 100,000 communities.
Before we jump in to learn more about research on Reddit, Nick, can you share a little bit of an explainer of how Reddit works and then touch upon how researchers have used Reddit over the years?
Proferes: Sure. So Reddit's really a social media platform that's made up of a lot of different communities called “subreddits.”
Subreddits can be created actually by any Reddit user, and typically they're organized around a particular topic, location, identity, or concept. So, for example, there's subreddits dedicated to everything from cute photos of cats, to video games, to politics, to specific diseases or ailments that people want to find support for, to local spaces. So, for example, I’m a member of the state of Arizona subreddit, as well as the city of Phoenix, Arizona State University, and Arizona Diamondbacks subreddits. So, the list really kind of goes on and on. If you're interested in it, you can probably find it on Reddit.
Subreddits are typically open to all Reddit users and users can be anonymous. And, some subreddits actually have like millions of people in them at this point. So these spaces can be really, really small or they can be really, really big. And, in these spaces, people make posts, sharing, for example, cute pictures of cats, and people can leave comments on those posts.
What's actually kind of unique about Reddit is that there's also a voting system. So people vote up or vote down content, which can increase or decrease the visibility of that content. And, you know, like you said, because there's over 100,000 active subreddits, people can actually have very, very different experiences of the platform depending on what communities they're part of.
Now, Reddit historically has had actually a very open application programming interface, which allowed researchers to collect data, oftentimes in bulk, in a very straightforward way. And researchers really, really benefited from this. And there's a pretty large swath of papers that have been published using Reddit data.
So along with Sarah, and also our colleagues Casey Fiesler at UC Boulder, Michael Zimmer at Marquette, and Nate Jones at the UK Office of Statistics, we did a meta study where we were really trying to understand how scientists are making use of data from Reddit. We collected every peer-reviewed Reddit study that we could find from 2010 to 2020.
We found a total of 727 papers, and we read them, and it took about a year and a half, and we classified them to understand, okay, who's studying Reddit? What are they studying? What are the methods they're using as part of their process? And, you know, essentially, what are the sort of ethical issues that they're running into in this space?
And there's a lot that's going on in terms of how scientists are making use of Reddit. This has been historically a very important space, and it's one that's actually been increasing dramatically over time. We've seen just an absolute massive growth, and our study stopped at 2020.
So we can only, you know, imagine how much it's continued to grow since then.
Tillman: Wow. Thank you. And so I have a lot of questions just about the study itself that we can hopefully get into later in our conversation. But speaking of subreddits, and I mentioned earlier, Kyle is the moderator of the subreddit Indian Country, which is one of the largest and most active communities for indigenous peoples on Reddit.
Nearly 70,000 people are on the Indian Country subreddit, which is considered among the top 2 percent by Reddit. The description of Indian Country describes the page as Native American and indigenous news, happenings, culture, politics, arts, community, and thought. This Reddit community has a list of rules, which states that posts which acquire participants or conduct formal research are among those that must gain permission from the moderators.
In fact, the fourth rule of the moderator policy states, “No unauthorized research requests.” It says, “In the past, many attempts by researchers and experts to study indigenous peoples have proved to be harmful. Unethical research practices that do not account for indigenous ways of understanding have resulted in intellectual abuse, cultural appropriation, and human rights disenfranchisement.
“While much progress has been made, not all organizations and scholars have reconciled the damaged relationship between their fields of study and indigenous peoples.” The rule for the subreddit goes on to state that research requests or attempts to study our community must have that explicit permission.
These rules are in effect kind of like a grassroots form of an IRB. And so I come from the IRB world, and to hear that the community itself began to put rules around the use of the data is both fascinating to me and aligns with the work that we do here at PRIM&R. So, Nick, is that kind of a fair description of this subreddit?
Proferes: I think that it certainly is, but I'm sure that Kyle can sort of fill in more about the actual process there.
Tillman: Absolutely. So, Kyle, we'll start with kind of you sharing a little bit about your role on the subreddit. Share how you got involved and some of the rules around particularly use of data for research purposes.
Pittman: Yeah, sure thing. So the Indian Country subreddit was created actually nearly 10 years ago this year, and it was made as an alternative to other indigenous-based subreddit communities, attempting to represent a pan-Indian or a pan-indigenous perspective on the content we wanted to see.
I was initially brought on by the founder a few days after the creation of the sub due to my involvement on the other indigenous-based subreddits and my calls for reform for these communities. So currently we have four moderators. Others have come and gone over the years, but we have four really consistent people.
And for myself, I'm the most active moderator on our team, performing approximately 70 to 75 percent of all moderator actions. Regarding our rules, these have certainly evolved over time. In the beginning we had far fewer and more ambiguous guidelines due to our inexperience as moderators and the small size of our community.
We knew what we wanted because our genesis sprang from the desire to see more active and firmer moderation in this kind of space, compared to what we were experiencing from other communities. So at that time, nearly all of our rules primarily dealt with various forms of bigotry, nothing really related to research requests, or really like the unique rules that we have nowadays.
But after this period of time where we started gaining traction and being poised to be the most prominent indigenous subreddit, we realized that we were attracting much more than just indigenous users to our community. Our notoriety meant that we really needed to expand our rules and draft some policies to handle matters that we really didn't encounter in the earlier years.
So among these things, we drafted an FAQ in 2017 and created new AutoModerator functions in 2019 to kind of automate some of our processes. We implemented our aforementioned research request process in 2020. And then we even completely revised our rules and policies in 2021 to what is now the basis for our current iteration.
Thus, these rules now prohibit various things, including numerous forms of bigotry that indigenous peoples experience. They require posts to be relevant and legitimate. They bar posts asking for, like, creative writing advice, or from people who want to ask about spiritual or taboo subjects. I just made a post the other day reminding people to not post about owls, because owls have a very negative connotation for many tribes.
Very unique thing, right? To have a rule about that, but we do. And also other things, like requiring research requests to be vetted by the moderators.
Tillman: So, I have a question. Prior to this work, had you ever heard of the IRB system before or any type of oversight rules associated with research?
Pittman: So, by the time that we implemented this process, I had. I was in the beginning of my academic career. I was in the middle of my graduate studies, I was working as an adjunct faculty, and through my work with Ask Historians and in my particular area of study, research methods and ethics had already been kind of in my wheelhouse of things that I was studying and preparing to teach others about. In particular, a lot of my studies focused around tribal sovereignty, and a key aspect of that is self-determination and agency for indigenous peoples to make decisions that suit their particular needs and their cultural context.
And so as the community started to grow, I realized, you know, even though the subreddit is not a tribal nation, it is a forum that is meant to be a safe harbor for indigenous knowledge, indigenous ways of knowing, and indigenous peoples to share their experience. And so in a similar way, then, we should have that agency to determine who is going to interact with our community and in what ways. And so even though at first I wouldn't say it started as an IRB-like process, it has certainly gotten to that point now, especially as we started to attract more professional forms of research, or academic, institution-related forms of research, and researchers coming by our community.
And so, yeah, in a similar way to how tribal nations can implement their own IRB process, and of course academic institutions have that process, now we as a community are kind of acting as another layer to reinforce these kinds of ethical guidelines.
Tillman: So can you walk us through the approval process and, particularly who makes the decision around whether research is allowed?
Pittman: Yeah, great questions. I'll preface this by saying, you know, the process began because of that aforementioned notoriety, that we were attracting the attention of not just native users or even, like, amateur or hobby-interest researchers, but researchers from formal institutions or even businesses who see Reddit as a prime place to recruit participants and solicit feedback.
And so, as was noted by you and in our policy, indigenous peoples have a very fraught relationship with Western norms of conducting research. And many of our users feel exploited by unsolicited requests being posted to our space. And this was kind of the impetus for this policy.
So overall, the process is really straightforward. Researchers of any type, whether they're professionals, graduate students, independent persons, they should read our rules first and foremost, as everybody should. And they'll see in those rules that there are instructions to complete a form that I had created.
The form asks researchers for basic information about who they are, their institutions, if they have any funding sources, who their advisors or supervisors are, and what their overall research tools are. And we also ask for any IRB- or HSR-related documentation they may have, and these are reviewed thoroughly and contact is made with the approving authority to confirm the legitimacy.
And then all that data from the form is posted to a running log that we maintain on the subreddit's wiki pages for public transparency. Essentially, we're ensuring that we're following principles involved in free, prior and informed consent. But arguably the more important aspects of this process are not just these technical sides to the research project; there is a cultural piece to this as well. The form asks researchers to introduce themselves because among indigenous communities, the act of introducing yourself is actually a key cultural norm that serves to inform us of who you are, where you're coming from, and what your intentions are so that we as indigenous persons can begin to formulate a relationship with that researcher. We also ask what forms of compensation will be provided to any participants gained from our community and what the researcher's experience is involving indigenous studies.
Part of the exploitation process that we're wary of is the act of strangers coming into our communities who have no intent of reciprocity, another major value among many tribes or indigenous communities. And so many requests have been denied on these grounds simply because we don't believe the researcher is able to meet these cultural expectations.
As for the deciding authority, it's usually just myself. With such a small moderation team, and as an academic who has performed research required to be vetted by an IRB and who teaches about indigenous research methodologies, I was the most equipped on our team to vet these types of requests.
But this process overall, and this is very important, was created with community input. It was proposed to the community. We received comments and feedback on it and made sure that, you know, as a collective, we were adopting this. And any decision that's made by me, or by another moderator who might happen to make a review, can be appealed by any user of the community or by another moderator.
So there's community checks and balances involved in the process as well.
Tillman: That is an amazing process, and so I immediately wondered, are there other communities who've used this process as a framework for use of research or use of their data for research? Are you aware of any? Have they contacted you?
Pittman: Yeah, I think Sarah might also have some thoughts on this. But in my experience, I have met a few other communities that have implemented similar ones. Usually, they're communities who represent marginalized identities or minoritized populations, and for very similar reasons: they feel that their communities have been exploited in some form or fashion, or are just, you know, bombarded with requests to study their community in ways that don't have much oversight.
And so they've taken similar routes and implemented these things. I actually got the Google form idea from another moderator at a convention I had attended in Austin, Texas, and we had swapped some ideas about this. And so that was kind of where I got the idea of implementing the form for Indian Country.
But it does seem to be common, again, across communities, particularly those that feel targeted by invasive research practices.
Tillman: So, Sarah, Kyle mentioned that you may have some more information about these other communities.
Gilbert: So, one of the things that Nick and I found, and our colleagues, as we were doing other research for this paper, is that the communities that are studied on Reddit are really unbalanced.
There are a number of communities that are really, really highly studied on Reddit, over and over and over again by particular researchers, and then some that, you know, aren't necessarily studied at all. And some of this correlates to the size of the community, so if you are just kind of scraping all of the data, you're probably going to get the subreddits that are the largest.
A number of years ago, Reddit had a system where they had a series of what were called default subreddits. So these were subreddits that you would automatically be subscribed to when you signed into Reddit. Now that's not the case anymore. But this series of subreddits, you know, they tend to be about very broad topics that are generally appealing to people and tend to have a lot of subscribers because, you know, a lot of people were automatically subscribed and they just never unsubscribed. So those show up a lot. And then there are communities that offer pretty unique insight into particular research topics that people are interested in, around mostly, like, mental health and drug use.
And so we found that a lot of health researchers were studying these communities, and even though they're a lot smaller, with way fewer subscribers, they were really highly represented in our study. And so those are the ones, those smaller communities that tend to be for, you know, mental health or drug use, sort of stigmatized use, where the discussions that are happening in them are really sensitive and people are really vulnerable.
And so in order to protect their users from some of these studies, they've created policies around this. Now part of the issue is that, you know, you can put this out there, but there's no enforcement mechanism. So you can say, you know, please fill out my form, please get permission, this is what we want. Because up until recently, you know, Reddit had this freely available API, it was very easy for people to just kind of go in and scrape the data anyway.
And so some of these communications, as you're reading them, you know, they almost feel a little bit hopeless or a little bit powerless, that like, you know, we want you to do this, please do this, please make us aware, but we also kind of recognize that, you know, even if you want to, there's not anything we can really do for it, because a lot of the times, a lot of this public data scraping, it's not even reviewed by many IRBs.
Right. So they don't even have that as a recourse if the sort of extractive or invasive or potentially violating or uncomfortable use of their data is happening.
Tillman: Wow. So does that get into like the whole power issue as well? Particularly with marginalized communities and populations who are often overstudied.
So I have a question. I want to back up a bit and ask you, Nick and Sarah, like, how did you get interested in focusing on online communities in your work?
Proferes: Sure. So, along with Michael Zimmer, I actually had, back in 2013, 2014, done kind of a similar project where we were looking at the growth of the use of Twitter data as part of academic research.
So I've been studying online communities and information flow in relationship to online communities since about, you know, 2010. And Reddit has sort of evolved into this really interesting space. And particularly, Reddit, I think, has gained a lot of prominence because of changes that have also happened simultaneously in the larger information ecosystem.
So, for example, Twitter's APIs were closed down. Actually, there were several platforms who have closed down access to data for a variety of reasons. And one of the things that we saw sort of initially was, well, maybe Reddit's going to be an alternative space. And so there's been these migration patterns between platforms.
And you know, scientists always want to go where the data is to a large extent. And I think it's really interesting, though, to think about the ways that our development of scientific knowledge is actually very dependent on data infrastructure that is provided by these systems.
So, you know, essentially, the reason that we have so much knowledge right now about Reddit is because historically they've had these very open APIs, which made it very easy to collect vast amounts of data. The same thing was true prior to that with Twitter. And I've been studying questions around ethics in relationship to this for a while too.
So along with Casey Fiesler, we actually did a study in 2018 where we looked at Twitter users' feelings about actually being used as part of these, like, massive studies. And we found that, you know, users have these really contextual beliefs about when it's appropriate to use, you know, their data.
For example, you know, they're very comfortable, or at least more comfortable, with the idea of, you know, being part of a data set of a billion other tweets, much less comfortable if you're looking at like 10 or a hundred, right? Like, “Oh, why are you looking at me?” Right. And thinking about these contextual factors, I actually think, is really, really important for thinking about research ethics.
How does it relate to the individual and the individual's relationship to a particular community? And we can see that, you know, sort of manifesting on Reddit right now. And certainly, that's kind of my interest in the space.
Gilbert: Just to follow up. So I've been really interested, going back to when I was doing my doctoral research, in, you know, why people participate in different online communities and how these motivations differ across various platforms. And so I was taking kind of a case study approach, and the first one that I looked at was this Twitter community called HCSMCA, Healthcare Social Media Canada.
And so back in 2015 or so, they were using this really kind of cool way of communicating on Twitter called a tweet chat, where they would use this hashtag to meet up synchronously and have a conversation. And so it was this really cool space where they would have the synchronous conversation, usually over lunch once a week, and it was an opportunity for people across the healthcare system to have conversations.
In this community, you know, doctors were learning from patient advocates, for example. There was a sort of inverse of power. And so I was kind of new to this space, and one of the things that I learned just from observing this community and interviewing the members, particularly the patient advocates, was these issues that people have with medical research: patients getting, like, not data extracted from them but, you know, actual parts of their body, by participating in these research studies, and then never having any idea what came from it.
So, you know, they're taking these medications, they never find out if it's a placebo or the actual drug, or if it was effective, or if it was just, you know, psychosomatic or whatever. And so they were advocating, you know, “nothing about us without us.” You know, you can't take these parts of our body without, you know, coming back to us, without including us at all parts of the research project.
And so that was really inspiring to me. I was like, I can do that. We should be doing that for online digital internet research as well. And so when I was going into my next case study, which was actually on Reddit, I was an avid Reddit user at the time and decided I wanted to study my favorite community, which is Ask Historians, which Kyle has mentioned. And that's actually how Kyle and I know each other: I'm actually a moderator of Ask Historians as well, because I ended up researching it and then becoming a moderator, eventually, myself.
And so after, you know, doing this research with the Ask Historians community, I really wanted to make sure that I was giving back and that they knew what I had found. And so I'd written up this kind of series of posts letting people know sort of what I found. It went really, really well. You know, it seemed to be really well received by the wider community and the moderators.
And it sort of felt like, okay, we can do this research in these spaces. And maybe if we think sort of innovatively about how we can engage with them ethically, you know, we can do this and give back to people in this kind of way. And so, you know, moving forward, I became a moderator in order to do more research with the Ask Historians community.
And in framing that project, I was actually really inspired by Kyle's writing. He had written a number of posts for Ask Historians about the extractive nature of researchers, particularly ethnographers, which was the type of study I wanted to do, engaging with Indigenous communities, and some of the things that you should not be doing, and some of the ways that were more productive: you know, don't be extractive, give back, that reciprocity that he had mentioned earlier.
And so that's what inspired me to actually not just, you know, kind of come into the community of Ask Historians and just watch these people moderate, but actually kind of become a moderator myself and figure out ways to kind of give back. And so because I, you know, I've researched Reddit in these contexts and sort of taken inspiration from people like Kyle or the community members that I studied on Twitter, you know, I was really interested in some of these ethical questions as well.
I also worked with some folks that Nick has worked with, as part of a postdoc, also interested in looking at how users think and how users feel, how comfortable they are with research uses of their data across these different kinds of contexts. And, you know, again, the finding for me, across the studies that I did, was that informed consent was huge, which is obviously really challenging.
How do you get informed consent when you're working with a data set of, like, sometimes tens of millions of people? And so this is something that, you know, Nick and Casey and Michael and, you know, I know Kyle, a whole lot of people have been thinking about: how can we work with communities? You know, instead of necessarily getting all of that individual consent from people, can we work with communities and get consent that way?
And how can we encourage researchers to make their work more visible to people, so that people know at various stages what kind of research is happening, you know, how their data might be used? Because that was one of the things that we'd also found in the surveys that we had done: in lieu of informed consent, just any kind of level of awareness increased people's comfort level with research being conducted.
You know, so what are some creative ways that we can think of to do that? And so we had recommended some of these things, like working with communities and finding ways to share back.
Because that was also one of the things that we had looked at in that paper. Most people are not sharing their own work back with the Reddit community. It's never getting back there. All of these super valuable insights about Reddit are happening through this research, and Redditors don't even know.
So that's been some of my inspirations.
Tillman: Wow. And, you know, it aligns with the ethical framework that IRBs and, you know, human subjects research work under, right? The Belmont Report, which I'm sure you're all familiar with: respect for persons, beneficence, and justice. So, you kind of talked about respect for persons, and I have two questions related to this.
Where have you seen these ethical frameworks used in creative ways? Because you're right. You have to be creative about respect for persons. It's going to look different than the individual consent process that IRBs prescribe, right, for research. Where have you seen respect for persons, beneficence, and justice kind of show up in these frameworks that you've built, Kyle, or you've seen built, Nick and Sarah?
Pittman: Yeah, you know, it really will look different, I think, in each context, or case by case. One of the ways that I see it and teach about it, as a research method or framework or paradigm, really, to approach this, whether we're talking about online communities or real-life communities, is there needs to be the establishment of a relationship, right?
This concept we teach about in indigenous studies, that you have a relationship to anything and everything in this world, is really key to many indigenous paradigms around knowledge and research in general. And, you know, the days of the anthropologist, you know, peering behind the bush and looking at the people from afar, studying them in their natural habitat, thankfully, are on their way out.
But there are still some who have very much inherited that kind of mentality or perspective. And that includes for online communities as well, where, whether it's with scraping of data, or trying to collect responses without being transparent or publicly noticeable. Some still try to attempt those things.
There is a big emphasis, when you do have a researcher in front of you, to encourage them to not just limit their interactions with the community to those that are based around their research. We've had a number of people who have come by our community to do research and who have actually stayed, and who became part of the community.
They're recognized regular contributors to our community. They comment on more than just the things they're interested in researching. And that does function as that form of reciprocity, but it touches more on the point of that, which is having that established relationship where somebody knows who you are, and you've built credibility with the community.
And that is something that happens not just on an individual-to-individual basis. I might know that person because I read their formal research request, but that request will be denied, even if it's totally ethically approved by their IRB. It'll be denied if my community says that that's not a trustworthy person.
And so in a similar way, then, you know, where we see these kinds of frameworks coming in, I think the best approach, even if it's not the easiest, is that that person needs to become part of the community. Especially with tribal people. If you're not part of the community, they're not going to reveal anything.
You know, the antithesis to what I think a lot of researchers feel, which is, oh, if I'm involved in the community, I'm going to create a bias that will be represented in my research. Right?
You know, my response to that is that is a particular cultural framework that you're working from. And it is not 100 percent correct or true or objective. By depriving yourself of that kind of relationship or community connection, you're not going to get the full story anyways.
And so in that way, it's very much encouraging a paradigm shift and really a cultural shift in the researcher.
Proferes: Yeah. So if I could sort of tag on to that, so seconding everything that Kyle said, you know, one of the things that we found in looking at the 727 papers on Reddit is that actually only 25 percent of them talked about ethics in any capacity. Even if it was, "we didn't seek IRB approval," we counted that as talking about ethics.
Tillman: Right.
Proferes: So one of the big things that I think that people can do is just talk about what their ethics are in the research process. Literally just saying, you know, "I'm following a process where I'm seeking, you know, approval from my IRB.
This is what they've made a determination about. These are the additional steps that I am choosing to take," if they are choosing to take additional steps. We've seen a lot of creative practices around how people sort of manifest the Belmont Report principles in their work. Sometimes it's not just thinking about the individual, but also thinking about the impact on the community as a whole.
That's really important, particularly for very, very small subreddits that are not going to have sort of the attention that might be drawn by, you know, a subreddit that has a million subscribers, right? If you're studying, you know, 20 people, the microscope seems, you know, a little more scary.
You know, and so thinking about, you know, the community, the norms of the community, certainly, but also the expectations of people in the community and of the community as a whole, thinking about ways to do things like obfuscate quotes, to not use usernames, particularly around controversial topics or sensitive information.
So, as Sarah mentioned, we found a lot of researchers have found that they are getting access to data that they can't even get in focus groups, about things like drug abuse or recovery, about things like mental health support, you know, really, really critically important topics that scientists need to understand better so they can provide better support. But there are also situations where, you know, maybe it's not good to directly quote people in a way that makes them rediscoverable through a Google search. There are steps that we can take to try to ensure respect for persons, beneficence, and, you know, certainly thinking about ways of doing that.
And as Sarah said, also, you know, sharing back with the community. One of the things that we found, we actually did a search to see whether or not these studies were ever shared on Reddit. And we found that in some cases they were, but they were often not shared back by the person who actually authored the paper. It had been some other user, like, "Hey, we found that someone was studying us," and they were sharing it with the community, which can be both like, "Oh, that's kind of cool that we're being studied," or kind of terrifying, depending on, you know, the community.
Tillman: Yes. At PRIM&R, we talk a lot about lending to public trust, right?
And so to discover that you were studied does not necessarily bode well for trust. So, yeah.
Proferes: Yeah, absolutely.
Go ahead, Sarah.
Gilbert: Oh, just to sort of, it gets so complicated though, because one of the things about Reddit is that historically it has been a site that hosted communities that were very well known for instigating harassment and abuse, particularly against marginalized and vulnerable populations: a lot of hate, like literal slurs as the subreddit. And this was allowed on Reddit for years and years and years. And a lot of really, like, violent movements as well, that are really important to study, particularly if you're studying online harassment or disinformation or radicalization, you know, really important things that we as a society need to know about and understand. And these have been open communities where these discussions are happening, and really valuable sources of information for, like, you know, how do people adopt conspiracy theories, for example. You know, there've been studies on that kind of thing.
And so it becomes really tricky where you have almost this kind of like adversarial research relationship with the community itself. So the thing that you're studying is incredibly important for society, but the actual community itself might not be comfortable with you studying that. And so it's like, well, do you, should you be getting community consent at that point?
Should you be making people aware of the research that you're doing, particularly if you are a member of a vulnerable or marginalized population yourself? And even more so if you're a member of a vulnerable or marginalized population that is being targeted by this community, you yourself can, like, become and find yourself on the receiving end of this kind of harassment and abuse, which is a huge problem for researchers right now. And so, you know, there are certainly cases where that, you know, that power dynamic I think is inverted. Where it's actually the community that has that sort of, that power, that gaze over the, over the researcher. And then, and those are cases where like, you know, we would normally, we totally, you know, recommend, you know, making yourself more visible, being transparent, being accountable to the actual community itself. But there are certainly cases where, it's very complicated and you might not necessarily be able to do that without putting yourself or your students or your colleagues or your institution at risk.
Tillman: Wow. Interesting and complex, right?
Very, very complex. So we've heard that Reddit is in the process of creating a platform for researchers called Reddit for Researchers. How would that work, and what do you think that's going to accomplish?
Proferes: Yeah, so, I can jump in a little bit on that. So, Sanjay Kairam is the new head of research science at Reddit, and I believe that he and his team have launched a beta program for accessing research data, and it allows sort of folks to run queries and export data, and he stated that they are partnering with a group called OpenMined, who are thinking through how to put appropriate safeguards in place to sort of enforce Reddit standards around user privacy.
And I know they have a plan to sort of build out their initial sort of community governance model, which will actually ideally enable members of the research community to also provide feedback regarding research data requests. And sort of doing this also based on existing sort of ethical guides and frameworks.
So right now, I think they're still figuring out how to sort of balance all of this. This is still sort of a beta test, as I understand it. They just had the close of their first call for data, essentially the application process. I'm really hopeful that they're going to figure out some of the balance between the academic side, the user side, and the community side of this, because I do think it's really important that researchers are researching in the space for all of the myriad reasons that, you know, Sarah has already listed.
And you know, this is a really tricky thing to navigate. I think that we're entering a space in time in which many of the data infrastructures that researchers have historically relied on are being closed off. So Facebook recently closed down CrowdTangle. Twitter has sort of narrowed in what you can get from the API, at least without having to pay.
And the costs are also quite high for academics if you do want to pay. And you know, having access to data is really, really important for having that sort of independent research in place. So I'm very hopeful for this. It's also a very big challenge for them to sort of harness and figure out.
Tillman: Right. Okay. I bet. So my last question, I'm kind of leading in with that being hopeful about this platform and others. What's coming next? What do you see coming next? What can we help to learn from researchers who are working in spaces like this? And how can moderators who are acting with these ethical frameworks in mind continue their effort to protect, but also encourage, these conversations and communities that have been created?
Pittman: Yeah, I'll jump in there. So, you know, on Indian Country, we get a lot of requests that center around mental and physical health, sociological studies, and even issues around domestic partner disputes or, you know, very serious things. And I feel like a lot of these are related to a number of factors.
But prime among them is that indigenous peoples like American Indians and Alaska Natives represent a very small percentage of the population in the United States, as one place, for example, where we're approximately 2 percent of the population, and among that, some of the most disadvantaged people in many statistics, like health, violence, poverty, police brutality.
And so because of that, informed researchers who decide to include us at all really see us as needing to be represented in their data pools, right?
And there is a lot of value and merit in that, right? Ensuring that indigenous peoples are being represented, and that way these disparities that we're facing can be accurately reported on and so forth.
So there's no dispute that there is merit to these efforts. But the problem is, right, that researchers who are not informed enough about ways to approach these communities to get access to their experiences and this data, much like Sarah was saying, who do these in erroneous ways, right, they end up propagating these misunderstandings, stereotypes, or even harming communities.
And so what I see for the future here, as far as online research goes and particularly within Reddit, I very much see just an encouragement of the things that we've laid out here, and seeing community agency coming to the fore, where now, with restrictions on the API and, you know, more innovative or novel ways needing to be invented to conduct this kind of large-scale research, researchers are being more mindful about how they can obtain that.
But again, I think that means then it's incumbent upon these communities to also reevaluate their relationship to research and researchers, to implement these, and for researchers to be in support of that.
I don't like to think that any researcher going out there is intending to do harm. But the unfortunate reality is that many of them do inadvertently. And so if we are saying that we want to be ethical people, that we want to hold ourselves to these principles and these guidelines that we've spoken about here, then that means researchers need to also be joining that call.
And if a community lacks these kinds of mechanisms or safeguards, that the researcher is ensuring they've built them into their own projects, own initiatives, just as Nick and Sarah had elaborated on earlier. So that's really where I see a lot of this future going: hopefully these shifts in access to data are going to create new paradigms where researchers are more responsible in how they go about obtaining that data.
Tillman: Good point. Sarah and Nick, anything you're hopeful about, or what we should expect in the future?
Proferes: Sure. So I can go and I'll give Sarah maybe the closing word. I would say that the big things I'm hopeful for are, that this can be an impetus for thinking about ethics education across a variety of domains.
And what I mean by this is we saw so many different disciplines using data from Reddit in our analysis of the 727 papers. It's not just, you know, medicine and, you know, social scientists, it's people in computer science, it's people in philosophy, it's people in law. And I think all of us collectively need to think about what and how we're training our students in terms of ethical education. One of the difficulties is that many domains have had very different approaches to training students into ethical practices. Historically, you know, certainly, thinking about things like the Belmont Report had really been the domain of biosciences, medical students, right?
And so now suddenly other people are having to think about how this applies. So I'm really hopeful for ways of thinking holistically about ethics education across all of these different disciplines of science. I'm also really excited about some of the norming effects that can go on by talking about our ethics and our ethical approaches in our papers.
One of the big challenges here is that this is a really gray space, right? IRBs have been using federal definitions of what constitutes human subjects research for a very long time. And typically that does not always include, you know, publicly available information on the internet.
And, you know, suddenly, we're having the conversation. Well, there might still be ethical issues that are going on here and that are at stake. And having that conversation is really critically important, and again, across domains. And then, you know, finally, I think the big thing that I think this gives us an opportunity to sort of think about is again, that question of how do we make sure that we are being transparent to the people that we're studying or the communities that we're studying?
You know, for me, one of the things I fear is that, you know, scientists are just looking at this as a data source rather than people. And I think that we need to go back to, you know, the Belmont principles, certainly, but also that idea that actually comes from Reddit in many ways, of "remember the human," right, that's on the other side of the screen. And so that's something else that sort of gives me hope.
Gilbert: Yeah, Nick, just definitely seconding that. Remembering that the data are actually people, I think, is incredibly important. So I just want to, you know, do a plus one to that there. Also, building on some of the education, I think this is a really cool, unique opportunity to educate both people and communities and moderators whose data is being used for research on what ethical research looks like, and helping them think creatively about what ethical research looks like for their communities, because, as Kyle has mentioned, you know, it is really highly contextual, both, you know, within individual communities themselves and also across Reddit. You know, so maybe it's fine to use threads or data collected on certain topics within a community, but maybe not others.
And so understanding more about what users are comfortable with, what they find acceptable, and then helping them articulate that, helping them develop their own policies through this education, helping them evaluate it. I think it was really striking when Kyle said, I'm the only one on the team who has the education or the experience to be able to do this.
You know, Kyle's team is really lucky to be able to have that expertise, and not all moderation teams are going to be able to have that, and not all communities are necessarily going to be aware of it. And so I think that there's a really cool opportunity here to provide more information about, you know, what research is happening, what are the risks, how do we mitigate them, what are people comfortable with, so that the things that we are doing are in better alignment with what people expect, so that we're not coming up against these clashes. I think that there are really important educational components and a really sort of exciting, creative place where we can think about how to develop these very user-generated, community-inspired protocols for how we work within these spaces.
And then again, it gets really complex when you're combining data from multiple communities. And how do you do that? There's so much more to learn here about how to do research ethically on Reddit, because it is such a complex space. I think that there's a lot of room to grow, and I'm really hopeful about the conversations that, you know, we are having, and that, with Reddit as, you know, a very important data source, one that's been in the news, and one of the few companies that are at least working towards establishing a program for researchers, there's a lot of opportunity here to build something that could be like a model for other platforms.
Tillman: Absolutely. And even a model for offline research as well. There are many different concepts and principles that you talked about that can absolutely be applied in offline situations. Well, our time is up for now. I believe that in the next year or so, if we get back together, our conversation may look a little bit different, with more that you've learned, and having gained more experience.
And so I look forward to speaking with you again, and, and continuing to learn about research in this space. So thank you, Sarah, Kyle, and Nick for your time and also your expertise.
##
His research interests include users' understandings of sociotechnical systems, such as social media, societal discourse about technology, and issues of power and ethics in the digital space. We also have Dr. Sarah Gilbert with us. Sarah is a research associate at Cornell University and a research director of the Citizens and Technology Lab, where her work focuses on supporting healthy online communities.
She explores how volunteer moderators' labor impacts community governance. She explores factors and interventions that encourage participation and reduce harmful behavior. And she studies how online data can be reused ethically. We are also pleased to have with us Kyle Pittman, who is the moderator of the subreddit “Indian Country,” which is one of the largest and most active communities for indigenous peoples on Reddit.
Kyle, who is a faculty member at Evergreen State College in Washington, has given thought to research ethics concerning indigenous peoples and communities both in Reddit and offline. Thank you all again for being with us here today. I wanted to give a primer to our listeners about Reddit. Reddit, which has billed itself as “the front page of the internet,” hosts thousands of online communities, which serve as online discussion boards on a wide range of topics. As of March, Reddit reported that they had 306 million weekly active users, 82 million daily active users, and more than 100,000 communities.
Before we jump in to learn more about research on Reddit, Nick, can you share a little bit of an explainer of how Reddit works and then touch upon how researchers have used Reddit over the years?
Proferes: Sure. So Reddit's really a social media platform that's made up of a lot of different communities called “subreddits.”
Subreddits can be created actually by any Reddit user, and typically they're organized around a particular topic, location, identity, or concept. So, for example, there are subreddits dedicated to everything from cute photos of cats, to video games, to politics, to specific diseases or ailments that people want to find support for, to local spaces. So, for example, I’m a member of the state of Arizona subreddit, as well as the city of Phoenix, Arizona State University, and Arizona Diamondbacks subreddits. So, the list really kind of goes on and on. If you're interested in it, you can probably find it on Reddit.
Subreddits are typically open to all Reddit users and users can be anonymous. And, some subreddits actually have like millions of people in them at this point. So these spaces can be really, really small or they can be really, really big. And, in these spaces, people make posts, sharing, for example, cute pictures of cats, and people can leave comments on those posts.
What's actually kind of unique about Reddit is that there's also a voting system. So people vote up or vote down content, which can increase or decrease the visibility of that content. And, you know, like you said, because there are over 100,000 active subreddits, people can actually have very, very different experiences of the platform depending on what communities they're part of.
Now, Reddit historically has had actually a very open application programming interface, which allowed researchers to collect data, oftentimes in bulk, in a very straightforward way. And researchers really, really benefited from this. And there's a pretty large swath of papers that have been published using Reddit data.
So along with Sarah, and also our colleagues, Casey Fiesler at CU Boulder, Michael Zimmer at Marquette, and Nate Jones at the UK Office of Statistics, we did a meta study where we were really trying to understand how scientists are making use of data from Reddit. We collected every peer-reviewed Reddit study that we could find from 2010 to 2020.
We found a total of 727 papers, and we read them, and it took about a year and a half, and we classified them to understand, okay, who's, you know, studying Reddit? What are they studying? What are the methods they're using as part of their process? And, you know, essentially, what are the sort of ethical issues that they're running into in this space?
And there's a lot that's going on in terms of how scientists are making use of Reddit. This has been historically a very important space, and one that's actually been increasing dramatically over time. We've seen just an absolutely massive growth, and our study stopped at 2020.
So we can only, you know, imagine how much it's continued to grow since then.
Tillman: Wow. Thank you. And so I have a lot of questions just about the study itself that we can hopefully get into later in our conversation. But speaking of subreddits, and as I mentioned earlier, Kyle is the moderator of the subreddit Indian Country, which is one of the largest and most active communities for indigenous peoples on Reddit.
Nearly 70,000 people are on the Indian Country subreddit, which is considered among the top 2 percent by Reddit. The description of Indian Country describes the page as Native American and indigenous news, happenings, culture, politics, arts, community, and thought. This Reddit community has a list of rules, which states that posts which require participants or conduct formal research are among those that must gain permission from the moderators.
In fact, the fourth rule of the moderator policy states no unauthorized research requests. It says, In the past, many attempts by researchers and experts to study indigenous peoples have proved to be harmful. Unethical research practices that do not account for indigenous ways of understanding have resulted in intellectual abuse, cultural appropriation, and human rights disenfranchisement.
While much progress has been made, not all organizations and scholars have reconciled the damaged relationship between their fields of study and indigenous peoples. The rule for the subreddit goes on to state that research requests or attempts to study our community must have that explicit permission.
These rules are in effect kind of like a grassroots form of an IRB. And so I come from the IRB world, and to hear that the community itself began to put rules around the use of the data is both fascinating to me and aligns with the work that we do here at PRIM&R. So, Nick, is that kind of a fair description of this subreddit?
Proferes: I think that it certainly is, but, I'm sure that Kyle can sort of fill in more about the actual process there.
Tillman: Absolutely. So, Kyle, we'll start with kind of you sharing a little bit about your role on the subreddit. Share how you got involved and some of the rules around particularly use of data for research purposes.
Pittman: Yeah, sure thing. So the Indian Country subreddit was created actually nearly 10 years ago this year, and it was made as an alternative to other indigenous-based subreddit communities, attempting to represent a pan-Indian or a pan-indigenous perspective on the content we wanted to see.
I was initially brought on by the founder a few days after the creation of the sub due to my involvement on the other indigenous based subreddits and my calls for reform for these communities. So currently we have four moderators. Others have come and gone over the years, but we have four really consistent people.
And for myself, I'm the most active moderator on our team, performing approximately 70, 75 percent of all moderator actions. Regarding our rules, these have certainly evolved over time. In the beginning we had far fewer and more ambiguous guidelines due to our inexperience as moderators and the small size of our community.
We knew what we wanted because our genesis sprang from the desire to see more active and firmer moderation in this kind of space, compared to what we were experiencing from other communities. So at that time, nearly all of our rules primarily dealt with various forms of bigotry, nothing really related to research requests, or really like the unique rules that we have nowadays.
But after this period of time where we started gaining traction and being poised to be the most prominent indigenous subreddit, we realized that we were attracting much more than just indigenous users to our community. Yeah. Our notoriety meant that we really needed to expand our rules and draft some policies to handle matters that we really didn't encounter in the earlier years.
So among these things, we drafted an FAQ in 2017, and created new AutoModerator functions in 2019 to kind of automate some of our processes. We implemented our aforementioned research request process in 2020. And then we even completely revised our rules and policies in 2021 to what the basis is now for our current iteration.
Thus, these rules now prohibit various things, numerous forms of bigotry that indigenous peoples experience. They require posts to be relevant and legitimate. They bar posts asking for, like, creative writing advice, or people who want to ask about spiritual or taboo subjects. I just made a post the other day reminding people to not post about owls, because owls have a very negative connotation for many tribes.
Very unique thing, right? To have a rule about that, but we do. And also other rules that require things like research requests to be vetted by the moderators.
Tillman: So, I have a question. Prior to this work, had you ever heard of the IRB system before or any type of oversight rules associated with research?
Pittman: So, by the time that we implemented this process, I had. I was in the beginning of my academic career. I was in the middle of my graduate studies, I was working as an adjunct faculty, and through my work with Ask Historians, and in my particular area of study, research methods and ethics had already been kind of in my wheelhouse of things that I was studying and preparing to teach others about. And in particular, a lot of my studies focused around tribal sovereignty, and a key aspect of that is self-determination and agency for indigenous peoples to make decisions that suit their particular needs and their cultural context.
And so as the community started to grow, I realized, you know, even though the subreddit is not a tribal nation, it is a forum that is meant to be a safe harbor for indigenous knowledge, indigenous ways of knowing, and indigenous peoples to share their experience. And so in a similar way, then, we should have that agency to determine who is going to interact with our community and in what ways. And so even though at first I wouldn't say it started as an IRB-like process, it has certainly gotten to that point now, especially as we started to attract more professional forms of research, or academic, institutional-related forms of research, and researchers coming by our community.
And so, yeah, in a similar way to how tribal nations can implement their own IRB process, and of course academic institutions have that process, now we as a community are kind of acting as another layer to reinforce these kinds of ethical guidelines.
Tillman: So can you walk us through the approval process and, particularly who makes the decision around whether research is allowed?
Pittman: Yeah, great questions. I'll preface this by saying, you know, the process began because of that aforementioned notoriety, that we were attracting the attention of not just native users or even, like, amateur or hobby interest researchers, but researchers from formal institutions or even businesses who see Reddit as a prime place to recruit participants and solicit feedback.
And so, as was noted by you and in our policy, indigenous peoples have a very fraught relationship with Western norms of conducting research. And many of our users feel exploited by unsolicited requests being posted to our space. And this was kind of the impetus for this policy.
So overall, the process is really straightforward. Researchers of any type, whether they're professionals, graduate students, independent persons, they should read our rules first and foremost, as everybody should. And they'll see in those rules that there are instructions to complete a form that I had created.
The form asks researchers for basic information about who they are, their institutions, if they have any funding sources, who their advisors or supervisors are, and what their overall research tools are. And we also ask for any IRB or HSR related documentation they may have, and these are reviewed thoroughly and contact is made with the approving authority to confirm the legitimacy.
And then all that data from the form is posted to a running log that we maintain on the subreddit's wiki pages for public transparency. Essentially, we're ensuring that we're following principles involved in free, prior and informed consent. But arguably the more important aspects of this process are not just these technical sides to the research project; there is a cultural piece to this as well. The form asks researchers to introduce themselves because among indigenous communities, the act of introducing yourself is actually a key cultural norm that serves to inform us of who you are, where you're coming from, and what your intentions are so that we as indigenous persons can begin to formulate a relationship with that researcher. We also ask what forms of compensation will be provided to any participants gained from our community and what the researcher's experience is involving indigenous studies.
Part of the exploitation process that we're wary of is the act of strangers coming into our communities who have no intent of reciprocity, another major value among many tribes or indigenous communities. And so many requests have been denied on these grounds, simply because we don't believe the researcher is able to meet these cultural expectations.
As for the deciding authority, it's usually just myself. With such a small moderation team, and as an academic who has performed research required to be vetted by an IRB and who teaches about indigenous research methodologies, I was the most equipped on our team to vet these types of requests.
But this process overall, and this is very important, was created with community input. It was proposed to the community. We received comments and feedback on it and made sure that, you know, as a collective, we were adopting this. And any decision that's made by me, or by another moderator who might happen to make a review, can be appealed by any user of the community or by another moderator.
So there's community checks and balances involved in the process as well.
Tillman: That is an amazing process, and so, I immediately wondered, are there other communities who've used this process as a framework for use of research or use of their data for research? Are you aware of any? Have they contacted you?
Pittman: Yeah, I think Sarah might also have some thoughts on this. But in my experience, I have met a few other communities that have implemented similar ones. Usually, they're communities who represent marginalized identities or minoritized populations, and for very similar reasons: they feel that their communities have been exploited in some form or fashion, or are just, you know, bombarded with requests to study their community in ways that don't have much oversight.
And so they've taken similar routes and implemented these things. I actually got the Google form idea from another moderator at a convention I had attended in Austin, Texas, and we had swapped some ideas about this. And so that was kind of where I got the idea of implementing the form for Indian Country.
But it does seem to be common, again, across communities, particularly those that feel targeted by invasive research practices.
Tillman: So, Sarah, Kyle mentioned that you may have some more information about these other communities.
Gilbert: So, one of the things that Nick and I and our colleagues found as we were doing other research for this paper is that the communities that are studied on Reddit are really unbalanced.
There are a number of communities that are really, really highly studied on Reddit, over and over and over again by particular researchers, and then some that, you know, aren't necessarily studied at all. And some of this correlates to the size of the community, so if you are just kind of scraping all of the data, you're probably going to get the subreddits that are the largest.
A number of years ago, Reddit had a system where they had a series of what were called default subreddits. So these were subreddits that you would automatically be subscribed to when you signed into Reddit. Now that's not the case anymore. But this series of subreddits, you know, they tend to be about very broad topics that are generally appealing to people and tend to have a lot of subscribers because, you know, a lot of people were automatically subscribed and they just never unsubscribed. So those show up a lot. And then there are communities that offer pretty unique insight into particular research topics that people are interested in, around mostly, like, mental health and drug use.
And so we found that a lot of health researchers were studying these communities, and even though they're a lot smaller, with way fewer subscribers, they were really sort of highly represented in our study. And so those are the ones, those smaller communities that tend to be for, you know, mental health or drug use, sort of stigmatized use, where the discussions that are happening in them are really sensitive and people are really vulnerable.
And so in order to protect their users from some of these studies, they've created policies around this. Now part of the issue is that, you know, you can put this out there, but there's no enforcement mechanism. So you can say, you know, please fill out my form, please get permission, this is what we want. Because up until recently, you know, Reddit had this freely available API, it was very easy for people to just kind of go in and scrape the data anyway.
And so some of these communications, as you're reading them, you know, they almost feel a little bit hopeless or a little bit powerless, like, you know, we want you to do this, please do this, please make us aware, but we also kind of recognize that, even if you want to, there's not anything we can really do about it, because a lot of the time, a lot of this public data scraping is not even reviewed by many IRBs.
Right? So they don't even have that as a recourse if there is a sort of extractive or invasive or potentially violating or uncomfortable use of their data.
Tillman: Wow. So does that get into, like, the whole power issue as well? Particularly with marginalized communities and populations who are often overstudied.
So I have a question. I want to back up a bit and ask you, Nick and Sarah, like, how did you get interested in focusing on online communities in your work?
Proferes: Sure. So, along with Michael Zimmer, I actually had, back in 2013, 2014, done kind of a similar project where we were looking at the growth of the use of Twitter data as part of academic research.
So I've been studying online communities and information flow in relationship to online communities since about, you know, 2010. And, Reddit has sort of evolved into this really interesting space. And particularly, Reddit, I think, has gained a lot of prominence because of changes that have also happened simultaneously, in the larger information ecosystem.
So, for example, Twitter's APIs were closed down. Actually, there were several platforms that have closed down access to data for a variety of reasons. And one of the things that we saw sort of initially was, well, maybe Reddit's going to be an alternative space. And so there's been these migration patterns between platforms.
And you know, scientists always want to go where the data is, to a large extent. And I think it's really interesting, though, to think about the ways that our development of scientific knowledge is actually very dependent on data infrastructure that is provided by these systems. So, you know, essentially, the reason that we have so much knowledge right now about Reddit is because historically they've had these very open APIs, which made it very easy to collect vast amounts of data. The same thing was true prior to that with Twitter. And I've been studying questions around ethics in relationship to this for a while too.
So along with Casey Fiesler, we actually did a study in 2018 where we looked at Twitter users' feelings about actually being used as part of these, like, massive studies. And we found that, you know, users have these really contextual beliefs about when it's appropriate to use, you know, their data.
For example, you know, they're, they're very comfortable, or at least more comfortable with the idea of, you know, being part of a, a data set of a billion other tweets, much less comfortable if you're looking at like 10 or a hundred, right? Like, Oh, why are you looking at me? Right. And thinking about these contextual factors, I actually think it's really, really important for thinking about research ethics.
How does it relate to the individual and the individual's relationship to a particular community? And we can see that, you know, sort of manifesting on Reddit right now. And certainly, that's kind of my interest in the space.
Gilbert: Just to follow up. So I've been really interested, since back when I was doing my doctoral research, in, you know, why people participate in different online communities, and how these motivations differ across various platforms. And so I was taking kind of a case study approach, and the first one that I looked at was this Twitter community called HCSMCA, Healthcare Social Media Canada.
And so back in 2015 or so, they were using this really kind of cool way of communicating on Twitter called a tweet chat, where they would use this hashtag to meet up synchronously and have a conversation. And so it was this really cool space where they would have the synchronous conversation, usually over lunch once a week, and it was an opportunity for people across the healthcare system to have conversations.
In this community, you know, doctors were learning from patient advocates, for example. There was a sort of inverse of power. And so, I kind of was new to this space, and one of the things that I learned just from observing this community and interviewing the members, particularly the patient advocates, was these issues that people have with medical research: patients getting, like, not data extracted from them but, you know, literally parts of their body, by participating in these research studies, and then never having any idea what came from it.
So, you know, they're taking these medications, they never find out if it's a placebo or the actual drug, or if it was effective, or if it was just, you know, psychosomatic or whatever. And so they were advocating, you know, "nothing about us without us": you can't take these parts of our body without, you know, coming back to us, without including us at all parts of the research project.
And so that was really inspiring to me. I was like, I can do that. We should be doing that for online digital internet research as well. And so when I was going into my next case study, which was actually on Reddit, I was an avid Reddit user at the time and decided I wanted to study my favorite community, which is Ask Historians, which Kyle has mentioned. And that's actually how Kyle and I know each other: I'm a moderator of Ask Historians as well, because I ended up researching it and then eventually becoming a moderator myself.
And so after, you know, doing this research with the Ask Historians community, I really wanted to make sure that I was giving back and that they knew what I had found. And so I'd written up this kind of series of posts, letting people know sort of what I found. It went really, really well. You know, it seemed to be really well received by the wider community and the moderators.
And it sort of felt like, okay, we can do this research in these spaces. And maybe if we think sort of innovatively about how we can engage with them ethically, you know, we can do this and give back to people in this kind of way. And so, moving forward, I became a moderator in order to do more research with the Ask Historians community.
And in framing that project, I was actually really inspired by Kyle's writing. He had written a number of posts for Ask Historians about the extractive nature of researchers, particularly ethnographers, which was the type of study I wanted to do, engaging with Indigenous communities, and some of the things that you should not be doing, and some of the ways that were more productive: you know, don't be extractive, give back, that reciprocity that he had mentioned earlier.
And so that's what inspired me to actually not just, you know, kind of come into the community of Ask Historians and just watch these people moderate, but actually kind of become a moderator myself, and figure out ways to kind of give back. And so because, you know, I've researched Reddit in these contexts and sort of taken inspiration from people like Kyle or the community members that I studied on Twitter, you know, I was really interested in some of these ethical questions as well.
I also worked with some folks that Nick has worked with, as part of a postdoc, also interested in looking at how users think and feel, how comfortable they are with research uses of their data across these different kinds of contexts. And, you know, again, the finding for me, across the studies that I did, was that informed consent was huge, which is obviously really challenging.
How do you get informed consent when you're working with a data set of, like, sometimes tens of millions of people? And so, this is something that, you know, Nick and Casey and Michael and, you know, I know Kyle, a whole lot of people have been thinking about: how can we work with communities, you know, instead of necessarily getting all of that individual consent from, like, you know, individual people?
You know, can we work with communities and get consent that way? And how can we encourage researchers to make their work more visible to people so that people know at various stages what kind of research is happening, you know, how their data might be used? Because that was one of the things that we'd also found in the surveys that we had done: in lieu of informed consent, you know, just any kind of level of awareness increased people's comfort level with research being conducted.
You know, so what are some creative ways that we can think of to do that? And so we had recommended some of these things, like working with communities, finding ways to share back.
Cause that was also one of the things that we had looked at in that paper. Most people are not sharing their own work back with the Reddit community. It's never getting, it's never getting back there. All of these super valuable insights about Reddit. It's happening through this research and Redditors don't even know.
So that's been, that's been some of my inspirations.
Tillman: Wow. And, you know, it aligns with the ethical framework that IRBs and, you know, human subjects research works under, right? The Belmont Report, which you're, I'm sure, all familiar with: respect for persons, beneficence, and justice. So, you kind of talked about respect for persons, but I have two questions related to this.
Where have you seen these ethical frameworks used in creative ways? Because you're right. You have to be creative about respect for persons. It's going to look different than the individual consent process that IRBs prescribe for research, right? Where have you seen respect for persons, beneficence, and justice kind of show up in these frameworks that you've built, Kyle, or you've seen built, Nick and Sarah?
Pittman: Yeah, you know, it really will look different, I think, in each context or case by case. One of the ways that I see it and teach about it, as a research method or framework or paradigm, really, to approach this, whether we're talking about online communities or real life communities, is there needs to be the establishment of a relationship, right?
This concept we teach about in indigenous studies, that you have a relationship to anything and everything in this world, is really key to many indigenous paradigms around knowledge and research in general. And, you know, the days of the anthropologist, you know, peering behind the bush and looking at the people from afar, studying them in their natural habitat, thankfully, are on their way out.
But there are still some who have very much inherited that kind of mentality or perspective. And that includes for online communities as well, where, whether it's with scraping of data, or trying to collect responses without being transparent or publicly noticeable. Some still try to attempt those things.
There is a big emphasis, when you do have a researcher in front of you to encourage them to not just limit their interactions to the community that are based around their research. We've had a number of people who have come by our community to do research and who have actually stayed, and who became part of the community.
They're recognized regular contributors to our community. They comment about more than just the things they're interested in researching. And that does function as that form of reciprocity, but it touches more to the point of that, which is having that established relationship where somebody knows who you are, and you've built credibility with the community.
And that is something that happens not just on an individual to individual basis. I might know that person because I read their formal research request, but that request will be denied, even if it's totally ethically approved by their IRB. It'll be denied if my community says that that's not a trustworthy person.
And so in a similar way, then, you know, where we see these kinds of frameworks coming in, I think the best approach, even if it's not the easiest, is that the person needs to become part of the community. Especially with tribal people. If you're not part of the community, they're not going to reveal anything.
You know, the antithesis to what I think a lot of researchers feel, which is, oh, if I'm involved in the community, I'm going to create a bias that will be represented in my research. Right?
Pittman: You know, my response to that is that that is a particular cultural framework that you're working from. And it is not 100 percent correct or true or objective. By depriving yourself of that kind of relationship or community connection, you're not going to get the full story anyways. And so in that way, it's very much encouraging a paradigm shift and really a cultural shift in the researcher.
Proferes: Yeah. So if I could sort of tag on to that, seconding everything that Kyle said, you know, one of the things that we found in looking at the 727 papers on Reddit is that actually only 25 percent of them talked about ethics in any capacity. Even if it was "we didn't seek IRB approval," we counted that as talking about ethics.
Tillman: Right.
Proferes: So one of the big things that I think that people can do is just talk about what their ethics are and the research process. Literally just saying, you know, I'm following a process where I'm seeking, you know, approval from my IRB.
This is what they've made a determination about. These are the additional steps that I am choosing to take, if they are choosing to take additional steps. We've seen a lot of creative practices around how people sort of manifest the Belmont Report principles in their work. Sometimes it's not just thinking about the individual, but also thinking about the impact on the community as a whole.
That's really important, particularly for very, very small subreddits that are not going to have sort of the attention that might be drawn by, you know, a, a subreddit that has a million subscribers, right? If you're studying, you know, 20 people, the, the, the microscope seems, you know, a little more scary.
Proferes: You know, and so thinking about, you know, how to think about the community, the norms of the community, certainly, but also the expectations of people in the community and in the community as a whole, thinking about ways to do things like obfuscate quotes, to not use usernames, particularly around controversial topics or sensitive information.
So, as Sarah mentioned, we found a lot of researchers have found that they are getting access to data that they can't even get in focus groups, about things like drug abuse or recovery, about things like mental health support, you know, really, really critically important topics that scientists need to understand better so they can provide better support, but also situations where, you know, maybe it's not good to directly quote people in a way that makes them rediscoverable through a Google search. There are steps that we can take to try to ensure respect for persons, beneficence, and, you know, certainly thinking about ways of doing that.
And as Sarah said, also, you know, sharing back with the community. One of the things that we found, we actually did a search to see whether or not these studies were ever shared on Reddit. And we found that in some cases they were, but they were often not shared back by the person who actually authored the paper. It had been some other user, like, "Hey, we found that someone was studying us," and they were sharing it with the community, which can be both, like, "Oh, that's kind of cool that we're being studied," or kind of terrifying, depending on, you know, the community.
Tillman: Yes. At PRIM&R, we talk a lot about lending to public trust, right?
And so to discover that you were studied does not necessarily bode well for trust. So, yeah.
Proferes: Yeah, absolutely.
Go ahead, Sarah.
Gilbert: Oh, just to sort of, it gets so complicated though, because one of the things about Reddit is that historically, it has been a site that hosted communities that were very well known for instigating harassment and abuse, particularly against marginalized and vulnerable populations: a lot of hate, like literal slurs as the subreddit. And this was allowed on Reddit for years and years and years, and a lot of really, like, violent movements as well, that are really important to study, particularly if you're studying online harassment or disinformation or radicalization, you know, really important things that, you know, we as a society need to know about and understand. And these have been open communities where these discussions are happening and really valuable sources of information for, like, you know, how do people adopt conspiracy theories, for example. You know, there've been studies on that kind of thing.
And so it becomes really tricky where you have almost this kind of like adversarial research relationship with the community itself. So the thing that you're studying is incredibly important for society, but the actual community itself might not be comfortable with you studying that. And so it's like, well, do you, should you be getting community consent at that point?
Should you be making people aware of the research that you're doing, particularly if you are a member of a vulnerable or marginalized population yourself? And even more so if you're a member of a vulnerable or marginalized population that is being targeted by this community, you yourself can, like, become and find yourself on the receiving end of this kind of harassment and abuse, which is a huge problem for researchers right now. And so, you know, there are certainly cases where that, you know, that power dynamic I think is inverted. Where it's actually the community that has that sort of, that power, that gaze over the, over the researcher. And then, and those are cases where like, you know, we would normally, we totally, you know, recommend, you know, making yourself more visible, being transparent, being accountable to the actual community itself. But there are certainly cases where, it's very complicated and you might not necessarily be able to do that without putting yourself or your students or your colleagues or your institution at risk.
Tillman: Wow. Interesting and complex, right?
Very, very complex. So we've heard that Reddit is in the process of creating a platform for researchers called Reddit for Researchers. How would that work, and what do you think that's going to accomplish?
Proferes: Yeah, so, I can jump in a little bit on that. So, Sanjay Kairam is the new head of research science at Reddit, and I believe that he and his team have launched a beta program for accessing research data, and it allows folks to run queries and export data. And he stated that they are partnering with a group called OpenMined, who are thinking through how to put appropriate safeguards in place to sort of enforce Reddit standards around user privacy.
And I know they have a plan to sort of build out their initial sort of community governance model, which will ideally enable members of the research community to also provide feedback regarding research data requests, and sort of doing this also based on existing ethical guides and frameworks.
So right now, I think they're still figuring out how to sort of balance all of this. This is still sort of a beta test, as I understand it. They just had the close of their first call for data, essentially the application process. I'm really hopeful that they're going to figure out some of the balance between the academic side, the user side, and the community side of this, because I do think it's really important that researchers are researching in the space, for all of the myriad reasons that, you know, Sarah has already listed.
And you know, this is a really tricky thing to navigate. I think that we're entering a space in time in which many of the data infrastructures that researchers have historically relied on are being closed off. So Facebook recently closed down CrowdTangle. Twitter has sort of narrowed in what you can get from the API, at least without having to pay.
And the costs are also quite high for academics if you do want to pay. And you know, having access to data is really, really important for having that sort of independent research in place. So I'm very hopeful for this. It's also a very big challenge for them to sort of harness and figure out.
Tillman: Right. Okay. I bet. So my last question, I'm kind of leading in with that being hopeful about this platform and others. What's coming next? What do you see coming next? What can we help to learn from researchers who are working in spaces like this? And how can moderators who are acting with these ethical frameworks in mind continue their effort to protect, but also encourage, these conversations and communities that have been created?
Pittman: Yeah, I'll jump in there. So, you know, on Indian Country, we get a lot of requests that center around mental and physical health, sociological studies, and even issues around domestic partner disputes or, you know, very serious things. And I feel like a lot of these are related to a number of factors.
But prime among them is that indigenous peoples, like American Indians and Alaska Natives, represent a very small percentage of the population. In the United States, as one place, for example, we're approximately 2 percent of the population, and among that, some of the most disadvantaged people in many statistics, like health, violence, poverty, police brutality.
And so because of that, informed researchers who decide to include us at all really see us as needing to be represented in their data pools, right?
Pittman: And there is a lot of value, merit in that, right? Ensuring that indigenous peoples are being represented, and that way these disparities that we're facing can be accurately reported on and so forth.
So there's no dispute that there is merit to these efforts. But the problem is, right, that researchers who are not informed enough about ways to approach these communities to get access to their experiences and this data, much like Sarah was saying, who do these in erroneous ways, they end up propagating these misunderstandings, stereotypes, or even harming communities.
And so what I see for the future here, as far as online research goes and particularly within Reddit, I very much see just an encouragement of the things that we've laid out here, and seeing community agency coming to the fore, where now, with restrictions on the API and, you know, more innovative or novel ways needing to be invented to conduct this kind of large-scale research, researchers are being more mindful about how they can obtain that.
But again, I think that means then it's incumbent upon these communities to also re-evaluate their relationship to research and researchers, to implement these, and for researchers to be in support of that.
I don't like to think that any researcher going out there is intending to do harm. But the, the unfortunate reality is that many of them do inadvertently. And so if we are, saying that we want to be ethical people, that we want to hold ourselves to these principles and these guidelines that we've spoken about here, then that means researchers need to also be joining that call.
And if a community lacks these kinds of mechanisms or safeguards, that the researcher is ensuring they've built them into their own projects, own initiatives, just as Nick and Sarah had elaborated on earlier. So that's really where I see a lot of this future going: hopefully these shifts in access to data are going to create new paradigms where researchers are more responsible in how they go about obtaining that data.
Tillman: Good point. Sarah and Nick, anything you're hopeful about, or what we should expect in the future?
Proferes: Sure. So I can go and I'll give Sarah maybe the closing word. I would say that the big things I'm hopeful for are, that this can be an impetus for thinking about ethics education across a variety of domains.
And what I mean by this is we saw so many different disciplines using data from Reddit, and in our analysis of the 727 papers, it's not just, you know, medicine and social scientists, it's people in computer science, it's people in philosophy, it's people in law. And I think all of us collectively need to think about what and how we're training our students in terms of ethical education. One of the difficulties is that many domains have had very different approaches to training students into ethical practices. Historically, you know, certainly, thinking about things like the Belmont Report had really been the domain of biosciences, medical students, right?
And so now suddenly other people are having to think about how this applies. So I'm really hopeful for ways of thinking holistically about ethics education across all of these different disciplines of science. I'm also really excited about some of the norming effects that can go on by talking about our ethics and our ethical approaches in our papers.
One of the big challenges here is that this is a really gray space, right? The IRBs have been using federal definitions of what constitutes human subjects research for a very long time. And typically that does not always include, you know, publicly available information on the internet.
And, you know, suddenly we're having the conversation: well, there might still be ethical issues that are going on here and that are at stake. And having that conversation is really critically important, and again, across domains. And then, you know, finally, I think the big thing that this gives us an opportunity to think about is, again, that question of how do we make sure that we are being transparent to the people that we're studying or the communities that we're studying?
I think that, you know, for me, one of the things I fear is that scientists are just looking at this as a data source rather than people. And I think that we need to go back to, you know, the Belmont principles, certainly, but also that idea that actually comes from Reddit in many ways of "remember the human," right, that's on the other side of the screen. And so that's something else that sort of gives me hope.
Gilbert: Yeah, Nick, just definitely seconding that. Remembering that the data are actually people, I think, is incredibly important. So I just want to, you know, do a plus one to that there. Also, building on some of the education: I think this is a really cool, unique opportunity to educate both people and communities and moderators whose data is being used for research on what ethical research looks like, and helping them think creatively about what ethical research looks like for their communities. Because, as Kyle has mentioned, you know, it is really highly contextual, both within individual communities themselves and also across Reddit. So maybe it's fine to use threads or data collected on certain topics within a community, but maybe not others.
And so understanding more about what users are comfortable with, what they find acceptable, and then helping them articulate that, helping them develop their own policies through this education, helping them evaluate it. I think it was really striking when Kyle said, I'm the only one on the team who has the education or the experience to be able to do this.
You know, Kyle's team is really lucky to be able to have that expertise, and not all moderation teams are going to be able to have that, and not all communities are necessarily going to be aware of it. And so I think that there's a really cool opportunity here to provide more information about, you know, what research is happening, what are the risks, how do we mitigate them, what are people comfortable with, so that the things that we are doing are in better alignment with what people expect, so that we're not coming up against these clashes. I think that there's really important educational components and a really sort of exciting, creative place where we can think about how to develop these very user-generated, community-inspired protocols for how we work within these spaces.
And then again, it gets really complex when you're combining data from multiple communities. And how do you do that? There's just so much more to learn here about how to do research ethically on Reddit, because it is such a complex space. I think that there's a lot of room to grow, and I'm really hopeful about the conversations that, you know, we are having, and that, with Reddit as this, you know, very important spot and a very important data source,
one that's been in the news, and one of the few companies that are at least working towards establishing a program for researchers, there's a lot of opportunity here to build something. Something that could be like a model for other platforms.
Tillman: Absolutely. And even a model for offline research as well. There are many different concepts and principles that you talked about that can absolutely be applied in offline situations. Well, our time is up for now. I believe that in the next year or so, if we get back together, our conversation may look a little bit different, with more that you've learned, and having gained more experience.
And so I look forward to speaking with you again, and continuing to learn about research in this space. So thank you, Sarah, Kyle, and Nick for your time and also your expertise.
##