Tuesday, November 5, 2024

ChatGPT Under Fire for Inaccuracy by EU Data Protection Board


OpenAI is not doing enough to combat inaccurate and false output generated by ChatGPT, the European Data Protection Board (EDPB) task force said in its preliminary report on the chatbot’s compliance with the European Union’s privacy law, the General Data Protection Regulation (GDPR).

“Although the measures taken to comply with the transparency principle are beneficial to avoid misinterpretation of the output of ChatGPT, they are not sufficient to comply with the data accuracy principle,” the task force noted.

The EDPB is composed of EU national data protection authorities and the European Data Protection Supervisor and assists the European Union in ensuring compliance with data protection rules across EU countries. In 2023, after Italy imposed a temporary ban on ChatGPT, the EDPB launched a task force to cooperate and “exchange information on possible enforcement actions conducted by data protection authorities” against AI companies like OpenAI.

The EDPB task force has now published a preliminary report on certain aspects of the ongoing investigation.

Highlights from the report:

1. In assessing the lawfulness of processing personal data, the task force divided processing into distinct stages: collection of training data; pre-processing, including data filtering; training; prompts and ChatGPT output; and training the chatbot with prompts.

2. The report found that the first three stages, which chiefly involve collecting training data via methods like “web scraping,” pose risks to the fundamental rights and freedoms of individuals. Such methods extract information from publicly available sources across the Internet, which can include users’ sensitive data. According to the report, OpenAI has invoked Article 6(1)(f) of the GDPR in the context of web scraping; that provision permits processing based on the legitimate interests of the controller (here, OpenAI), but those interests must be weighed against the data subjects’ fundamental rights.

3. The task force’s investigation into the lawfulness of data processing is still pending, but it emphasized safeguards to limit the impact on data subjects. These include technical measures, precise criteria for data collection, methods for filtering data, and the deletion or anonymisation of personal data collected via web scraping. Further, the burden of proving the effectiveness of these measures lies with OpenAI.

4. On ChatGPT inputs (prompts) and outputs, together considered the “content,” the task force noted that OpenAI gives users the option to opt out of having such content used for training. However, the task force stated that OpenAI should, in any case, “clearly and demonstrably” inform users that the content may be used for training purposes, which will also be an essential factor in determining lawfulness.

5. On compliance with the principle of fairness under Article 5(1)(a) of the GDPR, the report noted:

“With regards to ChatGPT, this means that the responsibility for ensuring compliance with GDPR should not be transferred to data subjects, for example by placing a clause in the Terms and Conditions that data subjects are responsible for their chat inputs.

Rather, if ChatGPT is made available to the public, it should be assumed that individuals will sooner or later input personal data. If those inputs then become part of the data model and, for example, are shared with anyone asking a specific question, OpenAI remains responsible for complying with the GDPR and should not argue that the input of certain personal data was prohibited in the first place.”

It stated that while OpenAI has already presented the measures it put in place to address these issues, the task force has yet to examine their effectiveness.

6. Taking into consideration the probabilistic nature of the system and the model’s tendency to produce biased or made-up outputs, the task force highlighted that it is, in any case, necessary to comply with the principle of data accuracy under Article 5(1)(d) GDPR.

“In line with the principle of transparency pursuant to Article 5(1)(a) GDPR, it is of importance that proper information on the probabilistic output creation mechanisms and their limited level of reliability is provided by the controller, including explicit reference to the fact that the generated text, although syntactically correct, may be biased or made up,” the report stated.
