On October 5th, Frances Haugen, a former Facebook Product Manager, testified to Congress on Facebook and Instagram’s complicity in each one of our interactions with those services. This testimony (which is further contextualized by MIT Technology Review) illustrated how Facebook’s algorithms sow division and endanger our (and our teenagers’) mental health. A little-noted part of her testimony is that the decisions on what data to use to train the News Feed algorithm are based on clicks and shares, especially if those clicks or shares lead to more clicks on ads (either directly or indirectly). To paraphrase Kara Swisher, “If the product is free, you are the product.”
Ad revenue goes up with more engagement. Engagement comes, at least in part, from sharing and commenting. Since the material most likely to elicit that response is something that we violently agree or disagree with, that is the material that shapes the algorithm.
Tech reporters and the Twitterverse have seemingly converged on regulation as the answer to this business practice. Some popular answers seem to be:
- Revising Section 230 to put some pressure on these companies to do their own policing of content
- “Breaking Up the Monopolies” – i.e., roll back some of Facebook’s acquisitions
- “Focus on Privacy” – implement something like GDPR or the California Consumer Privacy Act nationwide
I think all of these answers are wrong. The issue is that we are the product. Social Media companies have two ways to make money: turn to a subscription model, or keep feeding our input into their algorithms for higher ad revenue. It doesn’t have much to do with privacy; how the sausage is made is the problem. And what’s going into the grinder is your outrage.
Regardless of what legislation Congress decides to pass, we are beholden to the platforms to serve our content. Their algorithms will always try to keep engagement, and therefore revenue, as high as possible.
So Why Instagram for Kids?
This excellent article by Scott Galloway argues that Facebook’s ad targeting is remarkably inaccurate:
Plaintiffs in a class-action suit against Facebook have alleged its targeting algorithm’s “accuracy” was between 9% and 41%, and quoted internal Facebook emails describing the company’s targeting as “crap” and “abysmal.” – The Imminent Collapse of Digital Advertising
So companies are cutting their digital spending:
These are most likely warning bells, and probably why Facebook started working on an Instagram for Kids. They need to grow their waning audience in a valuable demographic (teenagers), and that growth protects their flank on the ad revenue side. Legislating these algorithms would have helped five years ago, but at the speed of Congress lately, especially in the face of Social Media lobbyists, it won’t matter by the time they get around to it.
We don’t need more privacy or data reporting. We need our data and a personalized algorithm – one where we control the training data. In John Scalzi’s book, The Android’s Dream, there is a great scene where the protagonist downloads an AI from Quaker Oats. It’s called an Agent and is a modest AI. He progresses by breaking it down and using its algorithms for his own needs, feeding it data that he has accumulated for over twenty years. Not everyone is a Data Scientist, but we can all drop a CSV into a bucket for processing.
If Facebook wanted to sell a subscription for an algorithm to its 2.89 billion users for $5 a year, that would be roughly $14.5 billion a year. To break even on their ad revenue from last year ($84.17 billion), they would have to charge $29.12 a year. Would you pay $30 a year to control your algorithm? What about protecting your kids?
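The back-of-the-envelope math above is easy to check. Here is a minimal sketch in Python; the function names are mine, and it assumes (as the paragraph does) that every single user pays, which is obviously generous. At $5 a year the total comes out to $14.45 billion.

```python
# Figures quoted in the paragraph above.
USERS = 2.89e9        # Facebook users
AD_REVENUE = 84.17e9  # last year's ad revenue, in dollars


def subscription_revenue(price_per_year: float, users: float = USERS) -> float:
    """Annual revenue if every user paid the given yearly price (an assumption)."""
    return price_per_year * users


def break_even_price(revenue: float = AD_REVENUE, users: float = USERS) -> float:
    """Yearly price per user needed to match the given revenue."""
    return revenue / users


if __name__ == "__main__":
    print(f"$5/year brings in ${subscription_revenue(5) / 1e9:.2f} billion")
    print(f"Break-even price: ${break_even_price():.2f} per user per year")
```

If only a fraction of users subscribed, the break-even price would rise in proportion, so treat $29.12 as a floor rather than a forecast.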
It’s the Data, Stupid
For anyone who has ever tried to create a Machine Learning algorithm, the first question is: how much data do you have? Google and Netflix have a lot of data, and that data feeds powerful recommendation engines. These companies still have ulterior motives, i.e., giving you a better experience on the web (Google) or pushing you toward content that keeps you engaged (Netflix). GDPR and CCPA both require that they provide the data to you, and they do. But the friction of getting that data to train your own AI/ML algorithms is so high that the data is essentially useless to you. I created a walk-through on how to do it with Hulu, but it is exhausting.
You probably don’t only consume content from Netflix or only use Google to discover stuff online. No one uses only one online service, and while GDPR and CCPA allow you to “control” your data, it’s hard to aggregate it.
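To make the aggregation problem concrete, here is a rough sketch of what stitching two exports together looks like. Everything service-specific here is hypothetical: the service names, the column names, and the date format are stand-ins, since real exports differ per company and change without notice. That per-service mapping table is exactly the friction I am talking about.

```python
import csv
import io

# Hypothetical column layouts; real exports differ per service.
COLUMN_MAPS = {
    "service_a": {"title": "Title", "watched_at": "Date"},
    "service_b": {"title": "episode_name", "watched_at": "last_played"},
}


def normalize(service: str, csv_text: str) -> list[dict]:
    """Map one service's CSV export onto a shared {service, title, watched_at} shape."""
    columns = COLUMN_MAPS[service]
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "service": service,
            "title": row[columns["title"]].strip(),
            "watched_at": row[columns["watched_at"]].strip(),
        }
        for row in rows
    ]


def aggregate(exports: dict[str, str]) -> list[dict]:
    """Combine exports from several services into one list, oldest first."""
    combined = []
    for service, csv_text in exports.items():
        combined.extend(normalize(service, csv_text))
    # Assumes ISO 8601 dates (YYYY-MM-DD), which sort correctly as plain strings.
    return sorted(combined, key=lambda record: record["watched_at"])
```

Every new service means another entry in `COLUMN_MAPS`, written and maintained by hand, which is why a shared export format would matter so much.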
I wrote this article about how Media Companies can better use AI/ML. Still, I think that future legislation in this space shouldn’t focus on privacy as much as data portability.
Whether it is creating a standard JSON format for data, requiring companies above a specific size to provide a public API, or offering a way to schedule regular exports of our data, a consumer who owns the process of importing their own data into a recommendation engine can lessen the societal costs of major corporations deciding what is surfaced algorithmically. We should incentivize the creation and distribution of AI/ML models and provide the data to train our personal recommendation engines. Users should have the freedom to train their own AI “home page,” and power it with the data they want to include, whether it’s search, content, or health data, regardless of where it was generated. Personalized and independently trained AI models can still provide revenue for those who create the models, but they also limit the ability of any one Silicon Valley company’s revenue ambitions to drive us all crazy.
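As a toy illustration of the idea, assume a portable JSON export existed. It does not today; the record shape below (`source`, `item`, `tags`) is invented for this example, and the "model" is just a tag counter, not a real recommendation engine. The point is who picks the training data: the user decides which sources count.

```python
import json
from collections import Counter

# Hypothetical portable export; no such standard format exists today.
EXPORT_JSON = """
[
  {"source": "video", "item": "Space Documentary", "tags": ["science", "space"]},
  {"source": "search", "item": "telescope reviews", "tags": ["space", "shopping"]},
  {"source": "social", "item": "Outrage thread", "tags": ["politics"]}
]
"""


def build_profile(records: list[dict], allowed_sources: set[str]) -> Counter:
    """Count tags, using only the sources the user chose to include."""
    profile = Counter()
    for record in records:
        if record["source"] in allowed_sources:
            profile.update(record["tags"])
    return profile


def rank(candidates: list[dict], profile: Counter) -> list[dict]:
    """Order candidates by how many profile tag hits they have, best first."""
    def score(candidate: dict) -> int:
        return sum(profile[tag] for tag in candidate["tags"])
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    records = json.loads(EXPORT_JSON)
    # The user leaves social data out of their own training set.
    profile = build_profile(records, allowed_sources={"video", "search"})
    candidates = [
        {"item": "Political shouting match", "tags": ["politics"]},
        {"item": "Mars rover update", "tags": ["space", "science"]},
        {"item": "Telescope sale", "tags": ["shopping"]},
    ]
    for candidate in rank(candidates, profile):
        print(candidate["item"])
```

Because the social source is excluded, the outrage content scores zero and sinks to the bottom. Swap the tag counter for any model you like; the control over inputs is what changes the outcome.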