Notes from the IAPP Europe Data Protection Congress: oh, the futility of regulation

 

  • the creators of all this wonderful technical infrastructure we live in are under social and legal pressure to comply with expectations that can be difficult to translate into computational and business logics
  • it requires a healthy interdisciplinary team of legal scholars, business mavens, computer scientists, cognitive scientists, and tech geeks to understand data privacy – and that is lacking at most legal technology conferences
  • what GDPR misses is not the issue of data minimization but that these platforms exercise quasi-sovereign powers to actually institute a quasi-totalitarian rule across the many contexts we navigate
  • just two discussion points from this event in the following post:
    • data control ain’t the issue
    • explanation is distinct from transparency
       

6 December 2018 (Washington, DC) – The ubiquity of systems using artificial intelligence (AI) has brought increasing attention to how … and if … those systems should be regulated. AI systems will require care. They have the potential to synthesize large amounts of data, allowing for greater levels of personalization and precision than ever before. That said, our AIs continue to lag in common sense reasoning – something John McCarthy, the American computer scientist and cognitive scientist considered a founder of the discipline of artificial intelligence, noted way back in 1960. He said that our biggest issue would be AI scientists trying to program computers to act like humans “without first understanding what intelligence is and what it means to understand”. He said we would need an “interdisciplinary approach” to deal with this, with computer scientists, neuroscientists (the word came into fashion in the early 1960s), lawyers and philosophers working together.

So when it comes to events and workshops about the EU’s new General Data Protection Regulation (GDPR) … really any data privacy event … I tend to steer clear of legal-oriented tech events and try to attend only conferences that have a healthy interdisciplinary team of legal scholars, business mavens, computer scientists, cognitive scientists, and tech geeks. Because the creators of all this wonderful technical infrastructure we live in are under social and legal pressure to comply with expectations that can be difficult to translate into computational and business logics. This stuff is about privacy engineering and information security and data economics. Dramatically amplifying the privacy impacts of these technologies are transformations in the software engineering industry – the shift from shrink-wrap software to services – spawning an agile and ever more powerful information industry. The resulting technologies – social media, user-generated content sites – and the rise of data brokers who bridge this new-fangled sector with traditional industries all contribute to a data landscape filled with privacy perils. You need folks who can fully explain the technical realities.

An event that did not meet my criteria was the IAPP Europe Data Protection Congress 2018 held last week in Brussels, Belgium. But I went anyway because I live in Brussels much of the time and one of my clients gave me a free ticket. And he was a session panelist. My big take-away from this event: we will not see peak enforcement of GDPR in 2018 due to:

  • the poor staffing and understanding of the law at the EU Data Protection Authority offices, and
  • the continuing legal war zone of obfuscation/corporate trickery regarding GDPR interpretation

“GDPR will save us!!”

Everybody seemed to have the same mantra: the EU was “finally getting serious” with its GDPR data privacy enforcement, the whole world would follow, and Europe’s Directorate General for Competition (DG COMP) was going after the misdeeds of the Silicon Valley Goliaths. They pointed to the DG COMP decision against Google. Yes, that fat fine was the clearest statement yet that Google’s practices break the law. Further, the restrictions DG COMP imposed on Google’s business model would “crimp Google’s behavior in key ways”. DG COMP does deserve thanks. Given the political power of Google, their actions took courage.

But what has really happened? In an industry that changes by the day, the Google case took eight years to complete. Further, it deals with just one part of a problem that is now very large and sprawling – that these platforms exercise quasi-sovereign powers to actually institute a quasi-totalitarian rule across the many contexts we navigate. And even after that €4.3 billion fine levied by DG COMP, did Google really care? Why should it? It was still left holding more than $100bn in cash. Said good friend Barry Lynn (he directs the Open Markets Institute, which uses journalism to promote greater awareness of the political and economic dangers of monopolization and has done a deep dive into GDPR):

“Vestager’s fighters put out the fire on the first floor, but only after the blaze had spread to the rest of the building”.

And if you have been reading current developments, Google is still violating the DG COMP order and now the original complainants are screaming at DG COMP “Do something!!” But, alas, this Commission has begun to wind down as everybody preps for the election of a new Commission head next year. So it will be on the agenda for the next Commission.

Or take the Uber case last year, which came before the Court of Justice of the European Union (CJEU). The CJEU had an excellent chance to opine on platforms and competition law (several brilliant briefs had been presented to the Court) but instead opted for an easy route: the Court ruled against Uber based purely on EU legislation concerning the provision of transport services.

Many legal vendors selling GDPR “solutions” indicated they were pissed off since they were expecting a “money tree” from the new GDPR. But the €€€ ain’t flowing. So they have cranked up webinars and workshops ad nauseam to get your attention. Everybody is waiting for some big fine, some big case to open the floodgates. Many were hoping the post-GDPR British Airways data breach case would do it but that case now seems to be falling apart.

So hope springs eternal on the post-GDPR Facebook data breach case. Politically, Facebook is the perfect target — an increasingly unpopular American tech company with significant opponents on both the left and right. With the law still working itself out, the details of the case are less important than the overwhelming political logic. But the basic facts of the case have yet to be nailed down, and it’s tricky. As I noted in a post earlier, the new breach is a real contrast with previous GDPR fights, which have largely had to do with policy decisions and terms of service. This is not a “Cambridge Analytica” type case. The recent breach should be far simpler. It wasn’t a data-sharing project or an API gone wrong — so it’s hard to read the fallout as anything other than a breakdown in Facebook security. But the forensics on this stuff isn’t easy. The case is particularly complicated because the hack extended beyond Facebook itself, with third party accounts involved.

And nobody has ever litigated these issues before, so we only have a hazy sense of what a strong or weak GDPR case looks like. The company could be in for years of legal warfare and a billion-dollar payout — or it could walk away scot free. We’re just months into the GDPR regime, and there’s simply no roadmap for how it can be used.

Control is not the issue

Everyone at this event emphasized “control” of personal data as core to privacy. The need for data minimization. It was certainly the dominant privacy goal of the “Privacy by Design” sessions that filled the two days I was in attendance.

But control is the wrong goal for privacy by design, and perhaps the wrong goal for data protection in general. Too much zeal for control dilutes efforts to design information tech correctly. This idealized idea of control is impossible. Control is illusory. It’s a shell game. It’s mediated and engineered to produce a particular outcome. If you are going to focus on anything, design is everything. The essence of design is to nudge us into choices. Asking companies to engineer user control incentivizes self-dealing at the margins. Even when well-intentioned, companies ultimately act in their own self-interest. Even if there were some kind of perfected control interface, there is still a mind-boggling number of companies with which users have to interact. We have an information economy that relies on the flow of information across multiple contexts. How could you meaningfully control all those relationships? When control is the “north star”, lawmakers aren’t left with much to work with. It’s not clear that more control and more choices are actually going to help us. What is the end game we’re actually hoping for with control? If data processing is so dangerous that we need these complicated controls, maybe we should just not allow the processing at all? How about that idea? Anybody? Bueller? Bueller?

Regulators: do you really want to help? Then stop forcing complex control burdens on citizens, and make real rules that mandate deletion or forbid collection in the first place for high risk activities. But you won’t. You had your chance but you got played. Which will be my (sort of) “Part Deux” to this piece next week. Because despite all the sound and fury at this event, the implication of fully functioning privacy in a digital democracy is that individuals would control and manage their own data and organizations would have to request access to that data. Not the other way around. But the tech companies know it is too late to impose that structure so they will make sure any new laws that seek to redress that issue work in their favor.

The GDPR: the role of explanation

While there are many tools for increasing accountability in AI systems, I want to focus on the one that made it into the GDPR: explanation. The point is simple. By exposing the logic behind a decision, explanation can be used to prevent errors and increase trust. Explanations can also be used to ascertain whether certain criteria were used appropriately or inappropriately in the case of a dispute.

What intrigues me is that despite the enormous number of Commission and Parliamentary hearings (plus the sea of white papers) on the question of when and what kind of explanation might be required of AI systems in the GDPR, the ultimate version of the GDPR only requires explanation in very limited contexts.

And if you work your way through the “right to explanation” provisions in the GDPR … notably the right to notification (Articles 13 and 14), the right to access (Article 15) and the right not to be subject to automated decision-making (Article 22) … the combination of these articles does not provide a full-fledged right to explanation. What you really get is a more limited “right to be informed”. Nice job, lobbyists.

When we talk about an explanation for a decision, we generally mean the reasons or justifications for that particular outcome, rather than a description of the decision-making process in general. To take the definition from a white paper submitted to the Commission, when we use the term explanation we mean “a human-interpretable description of the process by which a decision-maker took a particular set of inputs and reached a particular conclusion”.

And of course we do not mean an explanation for EVERY decision, because explanations are not free. Generating them takes time and effort, thus reducing the time and effort available to spend on other, potentially more beneficial conduct. Therefore, the utility of explanations must be balanced against the cost of generating them. One of the white papers I noted above had a medical profession analogy. A doctor who explained every diagnosis and treatment plan to another doctor or patient might make fewer mistakes, but would also see fewer patients. So there has to be a cost-benefit analysis of generating explanations.

So what the GDPR tries to do is allow explanations to be demanded only when some element of the decision-making process … the inputs, the output, or the context of the process … conflicts with our expectation of how the decision will or should be made, and therefore the focus is on:

  • unreliable or inadequate inputs
  • inexplicable outcomes
  • distrust in the integrity of the system

But you see the problem. The question of when it is reasonable to demand an explanation is more complex than identifying the presence or absence of these three factors. Each of these three factors may be present in varying degrees, and no single factor is dispositive. When a decision has resulted in a serious and not easily redressable injury, we might require less evidence of improper decision-making. Conversely, if there is a strong reason to suspect that a decision was improper, we might demand an explanation for even a relatively minor harm. Moreover, even where these three factors are absent, a decision-maker may want to voluntarily offer an explanation as a means of increasing trust in the decision-making process.

So that is why the drafters of the GDPR actually looked to U.S. law for some guidance. As one participant in the drafting process told me, the United States legal system “maps well” to the three conditions I noted above, which is why the astute U.S. reader will see the embodiment of the doctrines of standing, injury, causation, and redressability … as well as the general rule that the complaining party must allege some kind of mistake or wrongdoing before the other party is obligated to offer an explanation.

Again, we are on new ground. As I indicated above, nobody has litigated these issues yet or fully lodged a request of this nature, so we only have a hazy sense of what a strong or weak request/GDPR case looks like.

My big problem at this event was that presenters were conflating explanation and transparency. Explanation is distinct from transparency. Explanation does not require knowing the flow of bits through an AI system, any more than explanation from humans requires knowing the flow of signals through neurons (neither of which would be interpretable to a human anyway). Instead, explanation, as required under the law, is about answering how certain factors were used to come to the outcome in a specific situation. The regulation around explanation from AI systems should consider the explanation system as distinct from the AI system.

And it should be obvious. There will be challenges in mapping inputs and intermediate representations in AI systems to human-interpretable concepts. While the notion of how explanations are used under the law can be formalized computationally, there remains a key technical challenge of converting the inputs to an AI system … presumably some large collection of variables, such as pixel values … into human-interpretable terms such as age or gender.
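
To make that separation concrete, here is a minimal sketch, entirely my own illustration on synthetic data (the factor names, the surrogate approach, and every identifier below are assumptions, not anything prescribed by the GDPR or the white papers). The black-box system decides; a separate explanation system maps the raw inputs onto a few named, human-interpretable factors and reports how those factors related to one specific decision, without exposing the flow of bits through the model itself.

```python
# A minimal sketch: keep the "explanation system" separate from the black-box
# AI system by (1) mapping raw inputs to a handful of human-interpretable
# factors, and (2) fitting a simple surrogate on those factors against the
# black box's own outputs, so a specific decision can be explained in those terms.

import numpy as np
from sklearn.ensemble import RandomForestClassifier   # stand-in "black box"
from sklearn.linear_model import LogisticRegression   # interpretable surrogate

rng = np.random.default_rng(0)

# Hypothetical raw inputs: 50 opaque variables per case (e.g. pixel or log data)
X_raw = rng.normal(size=(1000, 50))
y = (X_raw[:, :3].sum(axis=1) > 0).astype(int)        # synthetic outcome

black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_raw, y)

def interpretable_factors(X):
    """Map raw inputs to named, human-interpretable terms (assumed mapping)."""
    return np.column_stack([
        X[:, 0],              # e.g. "income estimate"
        X[:, 1],              # e.g. "years at current address"
        X[:, 2],              # e.g. "payment history score"
    ])

FACTOR_NAMES = ["income estimate", "years at address", "payment history"]

# The explanation system is trained on the black box's decisions, not the truth,
# so it describes how the factors relate to what the system actually decided.
surrogate = LogisticRegression().fit(interpretable_factors(X_raw),
                                     black_box.predict(X_raw))

def explain(x_raw):
    """Answer, for one decision, how each named factor pushed the outcome."""
    f = interpretable_factors(x_raw.reshape(1, -1))[0]
    contributions = surrogate.coef_[0] * f
    decision = black_box.predict(x_raw.reshape(1, -1))[0]
    return decision, dict(zip(FACTOR_NAMES, contributions.round(2)))

print(explain(X_raw[0]))
```

The design point is simply that the surrogate, not the black box, is what answers the “how were these factors used” question; the black box itself is never opened.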

And another big issue … which points to the need for interdisciplinary teams to explain this stuff … is a subtlety lost on several presenters: to create the required terms, the AI system will need access to potentially sensitive information. And if denied that information, AI systems can still create it. Currently, we often assume that if the human did not have access to a particular term, such as race, then it could not have been used in the decision. However, it is very easy for AI systems to reconstruct sensitive terms from high-dimensional inputs. The example I use in my AI presentations is that data about shopping patterns can be used to identify terms such as age, gender, and socio-economic status, as can data about healthcare utilization. Especially with AI systems, excluding a protected category does not mean that a proxy for that category is not being created.
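
A short sketch of that proxy problem, again purely illustrative and on synthetic data I made up for the purpose (the feature names and correlations are assumptions): the protected attribute is never given to the model, yet it can be recovered from ordinary shopping behaviour well above chance.

```python
# A minimal sketch: even when the protected attribute is excluded from the
# inputs, it can often be reconstructed from ordinary behavioural features,
# i.e. the data contains a proxy for it.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 5000

gender = rng.integers(0, 2, size=n)                 # protected attribute, never fed to the model below
# Hypothetical shopping features that happen to correlate with gender
basket_size  = rng.normal(20, 5, size=n) + 3 * gender
visits_week  = rng.poisson(3 + gender, size=n)
category_mix = rng.normal(0, 1, size=n) + 0.8 * gender

X = np.column_stack([basket_size, visits_week, category_mix])

X_tr, X_te, g_tr, g_te = train_test_split(X, gender, random_state=0)

proxy = LogisticRegression().fit(X_tr, g_tr)        # trained only on shopping behaviour
print("proxy accuracy for the excluded attribute:",
      round(proxy.score(X_te, g_te), 2))            # should land well above the 0.5 chance level
```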

From what I can see, having spent a long time going through the GDPR … and this will be the subject of “Part Deux” … the GDPR cannot prevent the generation of these “proxies”. But that also gets into design issues, far beyond the space I have allowed for this already long post. And even if we could design against that happening, even if we could verify that an AI system is not discriminating against a protected term … who can guarantee that a human decision-maker is not accessing and combining the forbidden information with the AI system’s recommendation to make a final choice? Tricky stuff.

And the challenges increase if the relevant terms cannot be determined in advance – in litigation scenarios, for example. From Mason Kortz of the Harvard Law School Cyberlaw Clinic:

The list of relevant terms is generally only determined ex post. In such cases, AI systems may struggle; unlike humans, they cannot be asked to refine their explanations after the fact without additional training data. For example, we cannot identify what proxies there are for age in a data set if age itself has never been measured. For such situations, we first note that there is precedent for what to do in litigation scenarios when some information is not available, ranging from drawing inferences against the party that could have provided the information to imposing civil liability for unreasonable record-keeping practices. Second, while not always possible, in many cases it may be possible to quickly train a proxy, especially if AI designers have designed the system to be updated, or to have the parties mutually agree (perhaps via a third party) on what the acceptable proxies are. The parties may also agree to assessment via non-explanation-based tools.
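
Kortz’s “quickly train a proxy” idea could look something like the following sketch (all data and names are invented, and this is my own gloss on the idea, not his): the parties agree to label a modest sample ex post, a proxy for the never-measured term is fitted to it, and that proxy is then applied to the archived decision records so the term can be probed after the fact.

```python
# A small sketch of ex post proxy training: the disputed term (say, an age
# bracket) was never stored, so the parties agree on a small labelled sample
# after the fact and fit a proxy to it, then apply it to the archived records.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Archived decision records: features only; the disputed term was never measured
archived = rng.normal(size=(10_000, 8))

# A small sample the parties agree to label after the fact (e.g. 200 cases)
sample_idx = rng.choice(len(archived), size=200, replace=False)
sample_labels = (archived[sample_idx, 0] + rng.normal(0, 0.5, 200) > 0).astype(int)  # hypothetical labels

proxy = LogisticRegression().fit(archived[sample_idx], sample_labels)

# The proxy estimates the never-measured term across all archived decisions,
# so the parties can probe whether outcomes varied with it.
estimated_term = proxy.predict(archived)
print("estimated prevalence of the disputed term:", round(estimated_term.mean(), 2))
```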

And one other important point which my e-discovery readers will identify with (assuming they have read this far) is that one key difference between AIs and humans is the need to pre-plan explanations. We assume that humans will, in the course of making a decision, generate and store the information needed to explain that decision later if doing so becomes useful. Or mandatory.

In contrast, AI systems do not automatically store information about their decisions. Often, this feature is considered an advantage: unlike human decision-makers, AI systems can delete information to optimize their data storage and protect privacy. As explained by Ryan Budish of the School of Engineering at Harvard University:

An AI system designed this way would not be able to generate ex post explanations the way a human can. Instead, whether resources should be allocated to explanation generation becomes a question of system design. This is analogous to the question of whether a human decision-maker should be required to keep a record. The difference is that with an AI system this design question must always be addressed explicitly. That said, AI systems can be designed to store their inputs, intermediate steps, and outputs exactly (although transparency may be required to verify this). Therefore, they do not suffer from the cognitive biases that make human explanations unreliable. Additionally, unlike humans, AI systems are not vulnerable to the social pressures that could alter their decision-making processes. Accordingly, there is no need to shield AI systems from generating explanations, for example, the way the law shields juries.
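
What that design question might look like in practice, as a minimal sketch of my own (the wrapper, the toy credit rule, and all names are hypothetical, not anyone’s production design): the system records each decision’s inputs, interpretable factors and outcome at decision time, so an explanation can be reconstructed ex post on demand.

```python
# A minimal sketch of the design choice the quote describes: a wrapper that
# records each decision's inputs, intermediate factors and output when the
# decision is made, so it can be explained later.

import json
import time
from typing import Any, Callable

class LoggedDecisionSystem:
    """Wraps any decision function and keeps an exact per-decision record."""

    def __init__(self, decide: Callable[[dict], Any],
                 factors: Callable[[dict], dict]):
        self._decide = decide      # the underlying (possibly black-box) system
        self._factors = factors    # mapping to human-interpretable terms
        self.log: list[dict] = []  # in practice: an append-only audit store

    def decide(self, case_id: str, inputs: dict) -> Any:
        outcome = self._decide(inputs)
        self.log.append({
            "case_id": case_id,
            "timestamp": time.time(),
            "inputs": inputs,                    # stored exactly
            "factors": self._factors(inputs),    # intermediate, interpretable step
            "outcome": outcome,
        })
        return outcome

    def explain(self, case_id: str) -> str:
        """Reconstruct a specific decision from the stored record."""
        record = next(r for r in self.log if r["case_id"] == case_id)
        return json.dumps({k: record[k] for k in ("factors", "outcome")}, indent=2)

# Hypothetical usage with a toy credit rule:
system = LoggedDecisionSystem(
    decide=lambda x: "approve" if x["income"] > 3 * x["repayment"] else "refer",
    factors=lambda x: {"income_to_repayment": round(x["income"] / x["repayment"], 1)},
)
system.decide("case-001", {"income": 52000, "repayment": 9000})
print(system.explain("case-001"))
```

None of this is free, which is exactly the point of the quote: the decision to spend storage and engineering effort on explanation has to be made explicitly, up front.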

CONCLUSION

In summary, if we really want to build AI systems that can provide explanations in human-interpretable terms, we must both list those terms and allow the AI system access to examples from which to learn them. System designers should design systems to learn these human-interpretable terms, and also store data from each decision so that it is possible to reconstruct and probe a decision post-hoc if needed. Policy makers should develop guidelines to ensure that the explanation system is being faithful to the original AI.
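
One way to give that last guideline some teeth, sketched here as a continuation of the hypothetical surrogate example earlier in this post (the names `rng`, `black_box`, `surrogate` and `interpretable_factors` are assumptions carried over from that sketch, not anything the GDPR specifies): check how often the explanation system reproduces the original system’s decisions on cases it has not seen.

```python
# Continuing the hypothetical surrogate sketch above: one simple faithfulness
# check is to measure how often the explanation system reproduces the original
# AI system's decisions on fresh cases.

from sklearn.metrics import accuracy_score

X_fresh = rng.normal(size=(500, 50))                      # new hypothetical cases
fidelity = accuracy_score(
    black_box.predict(X_fresh),                           # what the AI actually decided
    surrogate.predict(interpretable_factors(X_fresh)),    # what the explanation implies
)
print(f"explanation fidelity to the original AI: {fidelity:.0%}")
```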

But we are really talking about an overhaul of the existing data structure. Why? Simple. And it goes to my comments about platforms. As these platforms have moved into hardware, they have reconfigured our built environment into a cyberphysical infrastructure that determines the “choice architecture” we face in everyday life. They have erased the difference between online and offline, and they further complicate our access to reality with a layer of backend systems that determine the hidden affordances of the interfaces.

Data-driven platforms are shaping the material world we navigate. In 1994, in the early days of the World Wide Web, Philip Agre wrote a pioneering article titled Surveillance and Capture about the way that computing systems “capture” data. I love the word “capture” – it implies a measure of violence, as these systems force their environment (us) into the modus of data engines. He predicted that we’d see a permanent translation of the flux of life into machine-readable bits and bytes. And without using the word “platform”, he said that, due to their reliance on data-driven decision systems, these “platforms” would need ever more behavioral data, and that as they moved into cyberphysical systems they would completely overhaul our existing information infrastructure.

And so all eyes are on enforcement: by the national EU data protection authorities, by the EU Commission, and by the Court of Justice of the European Union. I do not think any of them will succeed in enforcing demonstrable data protection by design, or proper procedure and documentation of all relevant forms of processing. Yes, they do now have far reaching powers to investigate compliance. But they don’t “get it”.

As I will point out in “Part Deux”, I believe the “American way of life” will prevail as far as data protection is concerned, and assuming that data minimization and purpose limitation are bad for innovation in machine learning, the EU approach will not redefine the market for data ecosystems in the way that was intended.

“For the rational study of the law the blackletter man may be the man of the present, but the man of the future is the man of statistics and the master of economics. It is revolting to have no better reason for a rule of law than that so it was laid down in the time of Henry IV. It is still more revolting if the grounds upon which it was laid down have vanished long since, and the rule simply persists from blind imitation of the past.”

– Oliver Wendell Holmes, Jr., “The Path of the Law”, 1897 
