Newsletter
24.09.2024
# 1
#DSA40 Data Access Newsletter
Risk in focus: Image-based Sexual Abuse
Hello everyone,
this is the first #DSA40 Collaboratory newsletter. Every month, it is meant to give you a short overview of (1) new developments at the Collaboratory, (2) what has happened in the realm of platform data access, and (3) news items relating to various (potentially systemic) risks, with a monthly focus on a specific risk category.
Let’s get straight to it.
Regarding the Collaboratory
At the start of July, the Advisory Council to Germany’s Digital Services Coordinator (DSC), the “Bundesnetzagentur” (Federal Network Agency), was appointed. One of our principal investigators, Ulrike Klinger, has been appointed as a member and took part in the council’s first meeting on September 18. The establishment of an advisory council is not mandated by the DSA itself but was included in Germany’s transposition of the DSA into national law. The council is composed of 14 independent representatives from academia, industry, and civil society. It is meant to advise the DSC on the application and enforcement of the DSA as well as on scientific questions (especially concerning data processing). Thus far, there have been no comparable initiatives in other EU member states.
We are also happy to welcome Sophia Graf to the core team of the Collaboratory. Since August, she has been supporting our work as a student assistant.
Data Access Update
The past months have seen the release of multiple reports on data access: in late July, the Coalition for Independent Technology Research (CITR) published Blocking our Right to Know: Surveying the Impact of Meta’s CrowdTangle Shutdown. It provides the results of a survey among researchers using CrowdTangle to describe the impact of its shutdown and examines the lack of functionality of Meta’s current access programme. It ends with a plea for Meta to keep CrowdTangle online.
Eventually, CrowdTangle was shut down – about a month ago, on 14 August 2024, ahead of the US elections. The other reports were thus written in anticipation of or after the end of CrowdTangle.
In early August, the Mozilla Foundation released Public Data Access Programs: A First Look, and at the beginning of September, the European Digital Media Observatory (EDMO) published a public summary report on a Workshop on Platform Data Access for Researchers held in May. It was followed only a few days later by Democracy Reporting International’s Access Granted: Why the European Commission Should Issue Guidance on Access to Publicly Available Data Now. Even though the addressees of these three reports vary, all are concerned with existing issues around data access based on DSA Art. 40(12), which mandates access to “data [that] is publicly accessible” for “researchers, including those affiliated to not for profit bodies”. We have prepared a short overview in the table below but encourage you to take a close look at the documents yourself.
| report title | Public Data Access Programs: A First Look | Workshop on Platform Data Access for Researchers | Access Granted: Why the European Commission Should Issue Guidance on Access to Publicly Available Data Now |
| --- | --- | --- | --- |
| published by | Mozilla Foundation | European Digital Media Observatory | Democracy Reporting International |
| method | scoring based on platforms’ documentation and written responses | summary of workshop panel and presentations | survey of existing publications |
| identified issues | | | |
| recommendations for platforms | | | |
| recommendations for regulators | | | develop (through public consultation process) and issue separate guidelines for the implementation of DSA 40(12), which should address |
| recommendations for research community | | | |
| recommendations for all stakeholders | | | |
The recently published paper “Fulfilling data access obligations: How could (and should) platforms facilitate data donation studies?” by Hase et al. also provides a list of recommendations in response to 14 challenges to access, identified through a structured review of data donation projects. It mainly focuses on data access based on Art. 15 GDPR but also holds relevance for data access more generally: according to the paper, platforms “do not provide accessible, transparent, and complete data to platform users”. The authors propose that researchers and policymakers should lobby for their inclusion in legal frameworks, develop guidelines on data access rights, monitor and, where applicable, sanction non-compliance with data access rights, and collaborate to extend existing infrastructures and propose new ones for improving data access.
While this is not the place to dig deeper, we agree with the sentiment shared across all publications: the current state of data access is riddled with issues, and improving it requires all stakeholders – regulators, platforms, and researchers – to collaborate. The capacity to adequately understand and react to systemic risks rests on changes to the current modes of access. As Steve Rathje, postdoctoral fellow in Psychology in the Social Identity and Morality Lab at New York University, succinctly summarised the purpose of data access in Nature’s correspondence section at the start of the month: “Data access for independent researchers will help us to assess the risks of online platforms and inform evidence-based policymaking.”
The minimal examples in DSA Art. 34(1) give an indication of how many (potential) negative consequences need to be considered in systemic risk assessments, including negative effects on fundamental rights, on civic discourse and electoral processes, and on the protection of minors. This month, we focus on a risk that lies at the intersection of two kinds of systemic risk explicitly named in paragraph 1(d): gender-based violence and negative effects in relation to the protection of minors.
Risk in focus:
Image-based sexual abuse
“We’re exploring whether we can responsibly provide the ability to generate NSFW content in age-appropriate contexts”
– Sam Altman, CEO OpenAI, May 2024
What is image-based sexual abuse?
Image-based sexual abuse (IBSA) is a form of technology-facilitated sexual violence and describes the non-consensual creation and distribution of, or the threat to distribute, nude or sexual images. Depending on the age of the person depicted, IBSA content is differentiated into non-consensual intimate images (NCII) or child sexual abuse material (CSAM). IBSA disproportionately affects women, young people, and members of marginalised communities.
How are online platforms and services related to this risk?
Regarding the distribution of nude or sexual images, online platforms are especially prone to facilitating IBSA, as they connect large numbers of people. Additionally, with the rapid development of generative image models, so-called “AI” tools for image generation have been used to create NCII and CSAM.
It has been shown that an image generator based on Stable Diffusion was used to generate CSAM, and a survey released in late August found that “1 in 10 US Minors say their friends use AI to generate nudes of other kids”.
In January, explicit AI-generated images of Taylor Swift, created using image generation tools provided by Microsoft, went viral on X. At the start of this month, journalist Ko Narin revealed that in South Korea, members of dozens of chat groups on Telegram had created deepfake pornography in a “systematic and organised” process, based on images of students and teachers from universities, but also from high schools and middle schools. While Telegram apologised this time around, it had already enabled similar crimes involving deepfake pornography, blackmail, and rape only five years ago.
What are possible risk mitigation measures?
Two weeks ago, the White House secured voluntary commitments from the private sector to combat IBSA. These give an idea of the measures companies can take to prevent and reduce the facilitation of this form of technology-facilitated sexual violence:
To reduce the spread of known material, some platforms and search engines have partnered with organisations providing hash-based prevention tools to detect and remove content containing abusive imagery before it is uploaded or displayed. Functional reporting and content moderation mechanisms are other key mitigation measures for dealing with new or altered material. The fact that, over the last six months, the DSA Transparency Database lists 6,986 decisions specifying the reason for moderation as “non-consensual image sharing” (6,968) or “non-consensual items containing deepfake or similar technology using a third party’s features” (18) underlines the importance of this kind of oversight (and this figure still excludes the majority of insufficiently labelled moderation decisions).
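For readers who want a concrete picture of how hash-based prevention works in principle, here is a minimal sketch in Python. It assumes the open-source Pillow and imagehash libraries and an entirely hypothetical blocklist; real prevention tools such as PhotoDNA or StopNCII rely on their own robust, proprietary hashing and shared hash lists rather than the simple perceptual hash used here.

```python
# Minimal illustration of hash-based matching of uploads against known material.
# Assumptions: the open-source Pillow and imagehash libraries are installed;
# the blocklist entries below are placeholders, not hashes of real imagery.
from PIL import Image
import imagehash

# Hypothetical blocklist of perceptual hashes, e.g. received from a hash-sharing partner.
KNOWN_HASHES = [
    imagehash.hex_to_hash("d1c4f0a2b3e49876"),
    imagehash.hex_to_hash("8f3a6c1d2e4b5a90"),
]

# Hamming-distance threshold for treating an upload as a near-duplicate of known material.
MAX_DISTANCE = 5


def should_block_upload(path: str) -> bool:
    """Return True if the image at `path` is a near-duplicate of a blocklisted image."""
    candidate = imagehash.phash(Image.open(path))
    return any(candidate - known <= MAX_DISTANCE for known in KNOWN_HASHES)


if should_block_upload("incoming_upload.jpg"):
    print("Upload rejected and queued for human review.")
```

The distance threshold is the key design choice here: perceptual hashes tolerate small alterations such as resizing or recompression, so near-duplicates are matched rather than only exact copies.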
Addressing the generation of new material, providers of image generation models have vowed to source their datasets responsibly, to safeguard them against the inclusion of image-based sexual abuse material, or to adapt their model development processes.
Measures committed to by other companies include curbing payment services for companies producing, soliciting, or publishing IBSA.
However, it is important to stress that these measures are self-policed and thus exclude mitigation approaches that lie outside the platforms’ and services’ action space, such as interventions in institutions where people have engaged in IBSA, or plain law enforcement.
How does it relate to systemic risk and data access?
Combatting IBSA links the fight against gender-based violence with the protection of minors. This shows that the different categories of systemic risk listed in DSA Art. 34(1) should not be understood as mutually exclusive, as the lines between them are blurry. But IBSA also blurs the lines between platforms: considering AI-generated IBSA, it becomes apparent that these kinds of risks might originate on one platform or service but can then spread on a completely different platform. IBSA is thus not just a cross-cutting risk in the sense of the OECD’s revised typology of risks for children in the digital environment, cutting across different categories of risk, but also cuts across platforms. Consequently, to fully understand the creation and potential mitigation of such risks, data from multiple platforms needs to be combined. While some advances can surely be made by examining public data or information in the DSA Transparency Database, the fact that a lot of IBSA is (thankfully) subject to moderation means that access to non-public data is required to dig as deep as necessary into how IBSA is generated and how it ends up and potentially spreads on platforms. Not all of these platforms will be VLOPs or providers of general-purpose AI models with systemic risk, but the analysis should start with those that are and might inform other kinds of investigations in the future.
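To illustrate what working with such public information can look like in practice, here is a minimal sketch of how one might count IBSA-related moderation decisions in a daily dump of the DSA Transparency Database. It assumes a locally downloaded CSV dump; the file name, column names, and keyword strings below are illustrative placeholders rather than the database’s verified schema.

```python
# Minimal sketch: counting IBSA-related statements of reasons in a daily dump
# of the DSA Transparency Database. The file name, column names, and keyword
# strings below are assumptions for illustration, not the verified schema.
import re

import pandas as pd

# Hypothetical file name of one downloaded daily full dump.
df = pd.read_csv("sor-global-2024-09-01-full.csv")

# The two moderation reasons cited above, used here as hypothetical keyword labels.
ibsa_keywords = [
    "non-consensual image sharing",
    "non-consensual items containing deepfake or similar technology",
]
pattern = "|".join(re.escape(keyword) for keyword in ibsa_keywords)

# Hypothetical column holding the keyword specification of each decision.
mask = df["category_specification"].fillna("").str.contains(pattern, case=False, regex=True)

print(f"IBSA-related decisions in this dump: {mask.sum()}")
print(df.loc[mask, "platform_name"].value_counts())  # hypothetical platform column
```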
Other Risks | Other News
We hope you enjoyed this first edition of the #DSA40 Data Access Collaboratory Newsletter. If you have feedback or want to follow up, please don’t hesitate to reach out! Perhaps you’re even considering forwarding it to people who you think could benefit from it. We’re not going to stop you.
If you’re in Washington, DC, we are also not going to stop you from attending the CrowdTangle funeral on September 30. If you go, please send our condolences.
We will be back next month with a new newsletter. Until then, keep asking for data access.