Data: Regulation and Governance, Present and Future.

by Natasha Kong


Data is the fuel of the Fourth Industrial Revolution. The IDC estimates that the global volume of data will grow from 33 zettabytes in 2018 to 175 zettabytes in 2025.[1] This growth is inextricably linked to the advancement of artificial intelligence (AI), which is set to develop exponentially beyond the 2030s. AI is the basis of many “smart” products and the driver behind productivity across multiple economic sectors, and it is harnessed by governments to enable better policy making.[2] Reference to AI usually means “machine learning”, where algorithms recognise patterns in large data sets (big data) consisting of personal data, learn from them and subsequently draw conclusions in an autonomous manner. Big data is therefore the fuel for these AI systems.

But with big data comes big responsibility. Data protection has been at the forefront of law’s engagement with AI, as many AI applications require the mass processing of personal data. In the landscape of data protection laws, the EU’s General Data Protection Regulation (GDPR) stands as a monolith. But it is not without its flaws. A review of scholarly discussion suggests that some of the fundamental principles underpinning the GDPR limit its effectiveness at regulating AI, and there is reasonable doubt whether the GDPR can balance its twin objectives of protecting the rights of data subjects and supporting commercial interests. Data trusts are a way of experimenting with new data infrastructures and have been discussed as a way to address the GDPR’s drawbacks. But they too are not without their flaws, as a discussion of a recent attempt will show.

At its core, the difficulty of legislating for data is that data has undergone a paradigm shift into an entire field of politics in and of itself. Thus, one overarching piece of legislation such as the GDPR will not suffice to address the issues that arise from data, now and in the future. Instead, a governing framework consisting of different building blocks, such as data infrastructure, will have to be built. Navigating those building blocks while updating data protection for an AI-supplemented world will be the major regulatory challenge well into the 2030s.

The advent of AI in the 2020s and beyond: the double-edged sword

AI is on the rise. McKinsey’s simulation predicts that around 70% of businesses will adopt at least one type of AI technology by 2030.[3] AI makes decisions faster, more accurately and more efficiently at large scale. For example, research suggests that AI can detect skin cancer better than human dermatologists.[4] AI therefore promises to solve some of society’s most pressing issues.

But AI is a double-edged sword. The same properties of scale and autonomous decision-making that make AI such a powerful tool create the potential for errors to occur before an organisation can become aware of or prevent them. Growing evidence shows that AI systems can perpetuate historic biases and discriminate against classes of individuals. One example is COMPAS, the sentencing advice software used in various U.S. jurisdictions, which was found to discriminate against black defendants.[5] Another is the Gender Shades audit, which found that commercial facial recognition systems sold to the police had higher error rates for people of colour, placing them at higher risk.[6] This is rarely intentional on the data scientist’s part. Rather, such bias is commonly attributable either to historically grown inequality reflected in the data, which is given an appearance of objectivity when the algorithm is treated as impartial, or to decision-making systems that assume correlation necessarily implies causation.[7]

As tech firms are often unaware that their algorithms could produce such errors, the algorithms are sold to third parties and deployed in increasingly impactful decision areas. This is cause for concern, as these decisions can affect human rights. Michigan’s automated welfare fraud detection system was found to have made a wrong decision in 93% of its fraud determinations,[8] raising concerns that the world is stumbling into a “digital welfare dystopia”.[9] Nor can companies be left to self-govern. Despite studies showing that its own facial recognition software disproportionately mismatched people of colour with mugshots of people who had been arrested, Amazon continued selling its product to US law enforcement. As AI-driven technology is predicted to grow exponentially throughout the 2020s and beyond the 2030s,[10] these concerns become even more pressing and require regulatory oversight.

Law’s engagement with AI: data protection

Data protection has been at the forefront of law’s engagement with AI as many AI applications require the mass processing of personal data (big data). The GDPR is an example of data protection regulation. Touted as one of the strictest and most far-reaching data protection regulations, the “GDPR is interesting because it is the first time that the EU is exporting regulation.”[11] This is because the GDPR applies to any business globally that collects or processes EU resident data, by virtue of Article 3(2).[12] Therefore, any company, even if not situated in the EU, has to comply with the GDPR if it wants access to the EU market. Additionally, the GDPR has been a point of reference for other nations when enacting or updating their own national data privacy laws. Examples include California’s CCPA[13] and Canada’s PIPEDA.[14] In the landscape of data protection laws, the GDPR stands as a monolith and will play an integral role for the foreseeable future.

The GDPR holds several tools to protect data subjects, but arguably has some fundamental problems that prevent it from being entirely effective. Provisions in the GDPR such as Article 22[15] (the right not to be subject to a decision based solely on automated means), Article 35[16] (data protection impact assessment) and those establishing the Data Protection Authorities[17] provide the data subject with some protection from algorithmic decision-making. While European Court of Justice (ECJ) jurisprudence has yet to establish the exact operationalisation of many of the GDPR’s provisions, various analyses by legal scholars[18] have already pointed out that the GDPR’s narrow focus on personal data, and on consent as a lawful basis of data processing, makes it highly unlikely that the GDPR confers sufficient protection on data subjects against wrongful algorithmic decision-making.[19]

This is no wonder, as the GDPR has been as much a product of market regulation as of human rights law. The EU’s Strategy for Data states that its central focus will always be on fundamental human rights.[20] But the EU faces pressures, both economic and geopolitical. It has made clear that it wants global leadership in AI.[21] However, it has been lagging behind[22] its competitors (the USA and China) in the “AI arms race”. Blame has been attributed to the EU’s highly regulated market environment.[23] As the “AI arms race” has taken on a winner-takes-all dynamic, the computational efficiency of AI systems may be prioritised over non-functional considerations (such as privacy, transparency and fairness). Therefore, the concepts of “personal data” and “consent” exist not only for the protection of data subjects, but also as mechanisms to enable data to be used for commercial activity. They are necessary concessions that the EU has made to remain competitive in the digital economy.

Personal data

The restriction of the GDPR’s remit to “personal data” is currently problematic. “Personal data” is defined narrowly in the GDPR as “data relating to natural persons who can be identified or are identifiable directly or indirectly from the data in question.”[24] But “in a big data world, what calls for scrutiny is often not the accuracy of the raw data but rather the accuracy of the inferences drawn from the data.”[25] The actual risks posed by algorithmic decision-making therefore lie in the underpinning inferences that determine how data subjects are viewed and evaluated by third parties. However, as Oostveen[26] points out, inferences do not necessitate identifiable data, as big data analysis is only interested in determining patterns; inferences are therefore not considered “personal data”. Thus, what constitutes the most harmful aspect of big data analysis is excluded from the remit of the GDPR.

But the GDPR’s limitation to “personal data” is not the crux of the issue. Wachter and Mittelstadt[27] demonstrate that the ECJ considers that assessing the accuracy of inferences drawn about a data subject is not within the remit of data protection regulation. Data subjects’ rights are significantly curtailed when it comes to inferences, often requiring a greater balance with the controller’s interests. Thus, the GDPR gives “data subjects control over how their personal data is collected and processed, but very little control over how it is evaluated.”[28]

Consent


When the GDPR took effect in May 2018, email inboxes were inundated with emails from companies requesting consumers to “reconsent” to the collection and usage of their personal data. Consent is one of the bases of lawful personal data processing[29] contained in the GDPR, where it must be “freely given, specific, informed and unambiguous”[30] and must be given by “a clear affirmative action”.[31]

But is consent adequate? Larsson[32] challenges the feasibility of asking consumers to protect themselves through consent mechanisms, given the widening information asymmetry between consumers and service providers. There is much concern that the terms and conditions on online platforms are overlong and full of technical terms, making them difficult for a lay person to comprehend. Genuine informed consent is therefore unlikely. Furthermore, the threshold for what it means to give informed consent increases by the day as new technology emerges. Do data subjects understand the technologies underlying the developments brought by the Fourth Industrial Revolution, such as ubiquitous, ambient data collection via the Internet of Things and distributed ledger technologies such as Blockchain? This gap is growing, making it harder for data subjects to understand the ramifications of their choices to share data. Yet the onus is frequently placed on the data subject to be “informed”. Ruppert et al.[33] refer to this as “atomisation”, where data subjects are individually responsible for their own protection, despite the power and knowledge asymmetries that exist between the individual and large corporations. As Scassa[34] points out, “clicking ‘I agree’ is an act of surrender, not of consent.”

Going forward: data’s new instruments for data’s governance

As previously discussed, a concern of the current data regime is the power and knowledge disparities between data subjects and large corporations. Laudable as it is, the GDPR cannot by itself reverse these power imbalances between data controllers and data subjects. New forms of organising and governing data are currently being developed to address this issue.

Enter the data trust.

Data trusts are one of the data infrastructures currently being researched that aim to empower data subjects to exert their rights.[35] Data trusts are based on the fiduciary trust, which gives trustees authority to decide on the utilisation of data on behalf of a group of people. This model allows different structures of data governance to provide solutions for a range of problems and needs. An example would be Delacroix and Lawrence’s[36] bottom-up data trusts, where citizens are able to pool their rights within a trust and wield collective power to influence how their data is used. Whilst a data trust ecosystem does not currently exist, it is an area of interest for the EU[37] and will likely be a possibility in the near future. When a data trust marketplace emerges, data subjects would be able to choose a trust that suits their needs and aspirations. This would seem to address concerns about data subjects’ atomisation and satisfy the current needs for data and economic freedom.

Data trusts’ flexibility also means that they are not a guarantee of anything. Sidewalk’s Urban Data Trust[38] is a case in point. The trust was introduced in response to public concerns about the collection of personal data in Sidewalk’s proposed smart city development in Toronto. It allegedly gave the public a share in profits and an oversight role in the governance of personal data, but instead functioned to store and protect the data collected in the smart city and to generate profits from intellectual property and licensing fees. Whilst data subjects would theoretically have collective bargaining power, Sidewalk (a sister company of Google) had tremendous lobbying power, funding and influence over local governing institutions. Thus, there is a danger that companies would engage in “trusts-washing” to skirt data protection laws in the pursuit of profit. Whilst data trusts can be a piece of the puzzle, they will not inherently create good data governance or resolve the questions surrounding data, now and in the future.


In summary, the GDPR is underpinned by some fundamental principles that will not enable it to meet all the issues regarding data protection. Nor will the implementation of data trusts or other data infrastructures be the solution to everything. It is clear that no single legal instrument is a panacea for the host of issues that data brings. Further work will be necessary. And this is no wonder, as authors from the social sciences have long pointed out that data is not merely a field of commerce, but an entire field of politics in and of itself. As Ruppert et al. point out, politics that wants to address data has to be “concerned with not only political struggles around data collection and its deployments, but how data is generative of new forms of power relations and politics at different and inter-connected scales”.[39] The regulation of data is only in its infancy, and Rome was not built in a day. This process will be an incremental one that may last beyond the Internet of the 2030s, as more and more building blocks are added to the edifice so that the issues regarding data protection, now and in the future, can be resolved down to their minutiae.

Natasha Kong is a student at Cardiff University and an Editor of the Law Review.

[1] IDC, ‘The Growth in Connected IoT Devices Is Expected to Generate 79.4ZB of Data in 2025, According to a New IDC Forecast’ <> accessed 19 July 2020.
[2] Ibid.
[3] McKinsey, ‘Notes from the AI Frontier: Modeling the Impact of AI on the World Economy’ <> accessed 19 July 2020.
[4] Holger A Haenssle and others, ‘Man against Machine: Diagnostic Performance of a Deep Learning Convolutional Neural Network for Dermoscopic Melanoma Recognition in Comparison to 58 Dermatologists’ (2018) 29 Annals of Oncology 1836.
[5] Julia Angwin and others, ‘Machine Bias’ (ProPublica) <> accessed 15 July 2020.
[6] ‘The Two-Year Fight to Stop Amazon from Selling Face Recognition to the Police’ (MIT Technology Review) <> accessed 16 July 2020.
[7] Ansgar Koene and others, ‘A Governance Framework for Algorithmic Accountability and Transparency’.
[8] Michele Gilman, ‘AI Algorithms Intended to Root out Welfare Fraud Often End up Punishing the Poor Instead’ (The Conversation) <> accessed 14 July 2020.
[9] ‘OHCHR | World Stumbling Zombie-like into a Digital Welfare Dystopia, Warns UN Human Rights Expert’ <> accessed 16 July 2020.
[10] ‘Sizing the Prize: What’s the Real Value of AI for Your Business and How Can You Capitalise’ (pwc) <> accessed 19 July 2020.
[11] ‘GDPR Has Established Europe as Leaders in Data Protection’ (Raconteur, 30 May 2018) <> accessed 19 July 2020.
[12] GDPR, Article 3(2).
[13] Adam Dietrich, ‘California’s Mini GDPR and It What It Means For Market Research’ <> accessed 20 July 2020.
[14] ‘Benchmarking Data on the First Anniversary of the GDPR’ <> accessed 18 July 2020.
[15] GDPR, Article 22.
[16] GDPR, Article 35.
[17] GDPR, Article 51.
[18] Sandra Wachter and Brent Mittelstadt, ‘A Right to Reasonable Inferences: Re-Thinking Data Protection Law in the Age of Big Data and AI’ [2019] Colum. Bus. L. Rev. 494; Manon Oostveen, ‘Identifiability and the Applicability of Data Protection to Big Data’ (2016) 6 International Data Privacy Law 299.
[19] Koene and others (n 7).
[20] Commission, ‘Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions, a European Strategy of Data’ COM (2020) 66 final, 4.
[21] European Commission, ‘Shaping Europe’s Digital Future’ <> accessed 11 July 2020.
[22] ‘Europe Must Innovate to Have a Chance of Winning the AI “Arms Race”’ <> accessed 12 July 2020.
[23] Ibid.
[24] GDPR, Article 4(1).
[25] Omer Tene and Jules Polonetsky, ‘Big Data for All: Privacy and User Control in the Age of Analytics’ (2012) 11 Nw. J. Tech. & Intell. Prop. xxvii.
[26] Oostveen (n 18).
[27] Wachter and Mittelstadt (n 18).
[28] Tene and Polonetsky (n 25).
[29] GDPR, Article 6(1)(a).
[30] GDPR, Recital 32.
[31] Ibid.
[32] Stefan Larsson, ‘Algorithmic Governance and the Need for Consumer Empowerment in Data-Driven Markets’ (2018) 7 Internet Policy Review 1.
[33] Evelyn Ruppert, Engin Isin and Didier Bigo, ‘Data Politics’ (2017) 4 Big Data & Society 2053951717717749.
[34] Bianca Wylie, ‘What Is a Data Trust?’ (Centre for International Governance Innovation) <> accessed 23 July 2020.
[35] Commission (n 20) 10.
[36] Sylvie Delacroix and Neil D Lawrence, ‘Bottom-up Data Trusts: Disturbing the ‘One Size Fits All’ Approach to Data Governance’ (2019) 9 International Data Privacy Law 236.
[37] Commission (n 20) 10.
[38] Constance Carr and Markus Hesse, ‘When Alphabet Inc. Plans Toronto’s Waterfront: New Post-Political Modes of Urban Governance’ (2020) 5 Urban Planning 69.
[39] Ruppert, Isin and Bigo (n 33).

© 2019 Cardiff University Law Review, presented by Cardiff University Law Society.