Newsletter
24.09.2024
# 1
#DSA40 Data Access Newsletter
Risk in focus: Image-based Sexual Abuse
Hello everyone,
this is the first #DSA40 Collaboratory newsletter. Every month, it is meant to give you a short overview of (1) new developments at the Collaboratory, (2) what has happened in the realm of platform data access, and (3) news items relating to various (potentially systemic) risks, with a monthly focus on a specific risk category.
Let’s get straight to it.
Regarding the Collaboratory
At the start of July, the Advisory Council to Germany’s Digital Services Coordinator (DSC), the “Bundesnetzagentur” (Federal Network Agency), was appointed. One of our principal investigators, Ulrike Klinger, has been appointed as a member and took part in the council’s first meeting on September 18. The establishment of an advisory council is not mandated by the DSA itself but was included in Germany’s transposition of the DSA into national law. The council is composed of 14 independent representatives from academia, industry, and civil society. It is meant to advise the DSC on the application and enforcement of the DSA as well as on scientific questions (especially concerning data processing). Thus far, there have been no comparable initiatives in other EU member states.
We are also happy to welcome Sophia Graf to the core team of the Collaboratory. Since August, she has been supporting our work as a student assistant.
Data Access Update
The past months have seen the release of multiple reports on data access: in late July, the Coalition for Independent Technology Research (CITR) published Blocking our Right to Know: Surveying the Impact of Meta’s CrowdTangle Shutdown. It provides the results of a survey among researchers using CrowdTangle to describe the impact of its shutdown and examines the lack of functionality of Meta’s current access programme. It ends with a plea for Meta to keep CrowdTangle online.
Eventually, CrowdTangle was shut down – about a month ago, on 14 August 2024, ahead of the US elections. The other reports were thus written in anticipation of or after the end of CrowdTangle.
In early August, the Mozilla Foundation released Public Data Access Programs: A First Look, and at the beginning of September, the European Digital Media Observatory (EDMO) published a public summary report on a Workshop on Platform Data Access for Researchers held in May. It was followed only a few days later by Democracy Reporting International’s Access Granted: Why the European Commission Should Issue Guidance on Access to Publicly Available Data Now. Even though the addressees of these three reports vary, all are concerned with existing issues around data access based on DSA Art. 40(12), which mandates access to “data [that] is publicly accessible” for “researchers, including those affiliated to not for profit bodies”. We have prepared a short overview in the table below but encourage you to take a close look at the documents yourself.
| report title | Public Data Access Programs: A First Look | Workshop on Platform Data Access for Researchers | Access Granted: Why the European Commission Should Issue Guidance on Access to Publicly Available Data Now |
| --- | --- | --- | --- |
| published by | Mozilla Foundation | European Digital Media Observatory | Democracy Reporting International |
| method | scoring based on platforms’ documentation and written responses | summary of workshop panel and presentations | survey of existing publications |
| identified issues | | | |
| recommendations for platforms | | | |
| recommendations for regulators | | | develop (through public consultation process) and issue separate guidelines for the implementation of DSA 40(12), which should address |
| recommendations for research community | | | |
| recommendations for all stakeholders | | | |
The recently published paper “Fulfilling data access obligations: How could (and should) platforms facilitate data donation studies?” by Hase et al. also provides a list of recommendations in response to 14 challenges to access, identified through a structured review of data donation projects. It mainly focuses on data access based on Art. 15 GDPR but also holds relevance for data access more generally: according to the paper, platforms “do not provide accessible, transparent, and complete data to platform users”. The authors propose that researchers and policymakers should lobby for their inclusion in legal frameworks, develop guidelines on data access rights, monitor and, where applicable, sanction non-compliance with data access rights, and collaborate to extend existing infrastructures and propose new ones for improving data access.
While this is not the place to dig deeper, we agree with the sentiment shared across all publications: the current state of data access is riddled with issues, and improving it requires all stakeholders – regulators, platforms, and researchers – to collaborate. The capacity to adequately understand and react to systemic risks rests on changes to the current modes of access. As Steve Rathje, postdoctoral fellow in Psychology in the Social Identity and Morality Lab at New York University, succinctly summarised the purpose of data access in Nature’s correspondence section at the start of the month: “Data access for independent researchers will help us to assess the risks of online platforms and inform evidence-based policymaking.”
The minimal examples in DSA Art. 34(1) give an indication of how many (potential) negative consequences need to be considered in systemic risk assessments, including negative effects on fundamental rights, on civic discourse and electoral processes, and on the protection of minors. This month, we focus on a risk that lies at the intersection of two kinds of systemic risk explicitly named in paragraph 1(d): gender-based violence and negative effects in relation to the protection of minors.
Risk in focus:
Image-based sexual abuse
“We’re exploring whether we can responsibly provide the ability to generate NSFW content in age-appropriate contexts”
– Sam Altman, CEO OpenAI, May 2024
What is image-based sexual abuse?
Image-based sexual abuse (IBSA) is a form of technology-facilitated sexual violence and describes the non-consensual creation and distribution of, or the threat to distribute, nude or sexual images. Depending on the age of the person depicted, IBSA content is differentiated into non-consensual intimate images (NCII) or child sexual abuse material (CSAM). IBSA disproportionately affects women, young people, and members of marginalised communities.
How are online platforms and services related to this risk?
Regarding the distribution of nude or sexual images, online platforms are especially prone to facilitating IBSA, as they connect large numbers of people. Additionally, with the rapid development of generative image models, so-called “AI” tools for image generation have been used to create NCII and CSAM.
It has been shown that an image generator based on Stable Diffusion was used to generate CSAM, and a survey released in late August found that “1 in 10 US Minors say their friends use AI to generate nudes of other kids”.
In January, explicit AI-generated images of Taylor Swift, created using image generation tools provided by Microsoft, went viral on X. At the start of this month, journalist Ko Narin revealed that in South Korea, members of dozens of chat groups on Telegram had created deepfake pornography in a “systematic and organised” process, based on images of students and teachers from universities, but also from high schools and middle schools. While Telegram apologised this time around, it had already enabled similar crimes involving deepfake pornography, blackmail, and rape only five years ago.
What are possible risk mitigation measures?
Two weeks ago, the White House secured voluntary commitments from the private sector to combat IBSA. These give an idea of the measures companies can take to prevent and reduce the facilitation of this form of technology-facilitated sexual violence:
To reduce the spread of known material, some platforms and search engines have partnered with organisations providing hash-based prevention tools to detect and remove content containing abusive imagery before it is uploaded or displayed. Functional reporting and content moderation mechanisms are other key mitigation measures for dealing with new or altered material. The fact that, over the last six months, the DSA Transparency Database lists 6,986 decisions specifying the reason for moderation as “non-consensual image sharing” (6,968) or “non-consensual items containing deepfake or similar technology using a third party’s features” (18) underlines the importance of this kind of oversight (and this figure still excludes the majority of insufficiently labelled moderation decisions).
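For readers who want a concrete picture of how hash-based prevention works in principle, here is a minimal sketch in Python. It assumes the open-source Pillow and imagehash libraries and an entirely hypothetical blocklist; real prevention tools such as PhotoDNA or StopNCII rely on their own robust, proprietary hashing and shared hash lists rather than the simple perceptual hash used here.

```python
# Minimal illustration of hash-based matching of uploads against known material.
# Assumptions: the open-source Pillow and imagehash libraries are installed;
# the blocklist entries below are placeholders, not hashes of real imagery.
from PIL import Image
import imagehash

# Hypothetical blocklist of perceptual hashes, e.g. received from a hash-sharing partner.
KNOWN_HASHES = [
    imagehash.hex_to_hash("d1c4f0a2b3e49876"),
    imagehash.hex_to_hash("8f3a6c1d2e4b5a90"),
]

# Hamming-distance threshold for treating an upload as a near-duplicate of known material.
MAX_DISTANCE = 5


def should_block_upload(path: str) -> bool:
    """Return True if the image at `path` is a near-duplicate of a blocklisted image."""
    candidate = imagehash.phash(Image.open(path))
    return any(candidate - known <= MAX_DISTANCE for known in KNOWN_HASHES)


if should_block_upload("incoming_upload.jpg"):
    print("Upload rejected and queued for human review.")
```

The distance threshold is the key design choice here: perceptual hashes tolerate small alterations such as resizing or recompression, so near-duplicates are matched rather than only exact copies.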
Addressing the generation of new material, providers of image generation models have vowed to source their datasets responsibly, to safeguard them against the inclusion of image-based sexual abuse material, or to adapt their model development processes.
Measures committed to by other companies include curbing payment services for companies producing, soliciting, or publishing IBSA.
However, it is important to stress that these measures are self-policed and thus exclude mitigation approaches that lie outside the platforms’ and services’ action space, such as interventions in institutions where people have engaged in IBSA, or plain law enforcement.
How does it relate to systemic risk and data access?
Combatting IBSA links the fight against gender-based violence with the protection of minors. This shows that the different categories of systemic risk listed in DSA Art. 34(1) should not be understood as mutually exclusive, as the lines between them are blurry. But IBSA also blurs the lines between platforms: considering AI-generated IBSA, it becomes apparent that these kinds of risks might originate on one platform or service but can then spread on a completely different platform. IBSA is thus not just a cross-cutting risk in the sense of the OECD’s revised typology of risks for children in the digital environment, cutting across different categories of risk, but also cuts across platforms. Consequently, to fully understand the creation and potential mitigation of such risks, data from multiple platforms needs to be combined. While some advances can surely be made by examining public data or information in the DSA Transparency Database, the fact that a lot of IBSA is (thankfully) subject to moderation means that access to non-public data is required to dig as deep as necessary into how IBSA is generated and how it ends up and potentially spreads on platforms. Not all of these platforms will be VLOPs or providers of general-purpose AI models with systemic risk, but the analysis should start with those that are and might inform other kinds of investigations in the future.
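To illustrate what working with such public information can look like in practice, here is a minimal sketch of how one might count IBSA-related moderation decisions in a daily dump of the DSA Transparency Database. It assumes a locally downloaded CSV dump; the file name, column names, and keyword strings below are illustrative placeholders rather than the database’s verified schema.

```python
# Minimal sketch: counting IBSA-related statements of reasons in a daily dump
# of the DSA Transparency Database. The file name, column names, and keyword
# strings below are assumptions for illustration, not the verified schema.
import re

import pandas as pd

# Hypothetical file name of one downloaded daily full dump.
df = pd.read_csv("sor-global-2024-09-01-full.csv")

# The two moderation reasons cited above, used here as hypothetical keyword labels.
ibsa_keywords = [
    "non-consensual image sharing",
    "non-consensual items containing deepfake or similar technology",
]
pattern = "|".join(re.escape(keyword) for keyword in ibsa_keywords)

# Hypothetical column holding the keyword specification of each decision.
mask = df["category_specification"].fillna("").str.contains(pattern, case=False, regex=True)

print(f"IBSA-related decisions in this dump: {mask.sum()}")
print(df.loc[mask, "platform_name"].value_counts())  # hypothetical platform column
```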
Other Risks | Other News
We hope you enjoyed this first edition of the #DSA40 Data Access Collaboratory Newsletter. If you have feedback or want to follow up, please don’t hesitate to reach out! Perhaps you’re even considering forwarding it to people who you think could benefit from it. We’re not going to stop you.
If you’re in Washington, DC, we are also not going to stop you from attending the CrowdTangle funeral on September 30. If you go, please send our condolences.
We will be back next month with a new newsletter. Until then, keep asking for data access.