The uses and disadvantages of website archiving for life
Is digital preservation an oxymoron?
For a company whose motto was 'move fast and break things', Facebook is pretty good at storing your data indefinitely.
If someone posted a drunken photo of you 14 years ago the chances are that that photo still exists in the archive. Look at this photo of me from many years ago — found within two minutes.
The idea that the internet never forgets is a truism, but, as Jeremy Keith says, it is not true at all. The internet forgets everything all the time. And what it doesn't forget, it bans, deletes, or edits.
If you've ever bookmarked an article, saved a video to a playlist, or liked a tweet for later, you'll know how fragile the web can be. Link rot, the process by which web content disappears over time so that hyperlinks stop working, is very real.
Here are some of the many ways it can happen.
1. The person who created the content deletes it
This is the one I am most guilty of. I will see an old photo or article and decide that it needs to be stricken from the public record. Or, I will wonder about the ethics of the company it is hosted on and decide to move it elsewhere.
2. The site it is hosted on has a redesign
MySpace, with its weird CSS limitations, was the home of some uniquely ugly web design. But it was unique and should have been preserved somehow; or, if not your pink flashing background, then at least your content. But, no. They sold to Rupert Murdoch, redesigned the site and destroyed millions of hours of creativity.
3. The company goes out of business or shuts down a division
Look at the graveyard of sites that Google has ‘sunsetted’. Google owes you nothing, though you can thankfully export your data. All those Google Plus profiles that people still link to, all useless now. All your updates gone.
4. You forget to renew your hosting (or die)
Buying your own domain name and hosting means that you are not subject to the whims of corporate accounting. However, your domain and hosting are always rented. You can't register a domain for more than 10 years at a time and, even if you could, the company selling it to you could go out of business. You could stipulate in your will that enough money be placed in an investment trust to pay someone to maintain and host the site indefinitely. But do you trust all those people to maintain all those systems?
5. The content is banned from the site
Whatever you think of Donald Trump, you can't deny that he is a world-historical figure whose actions have had consequences. Banning him from Twitter for breaking the rules is fine, but deleting all his tweets seems like an act of cultural vandalism. And, of course, because of how such sites are made (full of infinite scrolling and popups), the archive.org backup isn't that useful.
6. The content goes behind a paywall
This isn't exactly link rot, but who wants to pay £30 a month to read one article?
7. History is rewritten
People rewrite webpages all the time, and what you link to often isn't the same thing a few years later. Even the supposedly immutable record of a blockchain can be changed if enough people agree that something should be removed.
Last week I wrote about a man who died eight years ago. Thankfully he published his blog on WordPress.com, a free, hosted blogging platform run by decent stewards of the web who have also managed to create a profitable business. Imagine if he'd been running it on his own URL, hosting his own CMS. Chances are that it would no longer exist except on archive.org, a miraculous service without which 99% of everything on the web would have been lost.
Digital data is simultaneously persistent and fragile. It can be copied and multiplied, without change, from server to server, yet it requires a massive infrastructure to remain accessible.
Digital data is also abundant, and such abundance creates its own problems. According to some recent (uncited) statistics:
300 hours of video are uploaded to YouTube every minute
300 million photos are uploaded to Facebook every day
1 billion tweets are added to Twitter every week
3 billion snaps are created on Snapchat every day
As Coleridge says in The Rime of the Ancient Mariner, 'Water, water, every where, / Nor any drop to drink'.
We have an inhuman amount of data being created every second: more than all the books published during Shakespeare's lifetime. There is some debate over who was the last person to have read everything, but it was a long time ago — and they didn't have Substack to contend with.
It is liberating to write and to publish, but who judges what is worth preserving? The big data approach has been to just store everything and create algorithms that can pick up on signals of noteworthiness. Unfortunately, this has created a hyperpartisan society where engagement is measured by the number of people a piece of content inflames.
Nietzsche's early essay, The Use and Abuse of History for Life, makes the point that too much history leads to problems:
History, so far as it serves life, serves an unhistorical power, and thus will never become a pure science like mathematics. The question how far life needs such a service is one of the most serious questions affecting the well-being of a man, a people and a culture. For by excess of history life becomes maimed and degenerate, and is followed by the degeneration of history as well.
It is notable that the internet era has coincided with a sense that culture is stuck replaying the same old patterns. Every film is a sequel or a remake. Every album a pastiche. It seems that having access to everything ever made has a smothering effect which leaves little room for the new. We are paralysed by a surfeit of information, picking at the carcass of past cultural achievement.
In an era of such abundance, with so many opinions, where we fear any kind of top-down messaging as totalitarian, it is difficult to know what we value. The bemusement that greeted the news that Chinese censors have edited Fight Club so that the authorities win shows that we have no sense of cultural value beyond a vague sense of freedom of expression.
Sometimes it seems like you only know what you value when that thing is threatened with deletion. On the web, the programmer known as why the lucky stiff and the writer Mark Pilgrim both disappeared and deleted their archives, leading to some frantic attempts to piece their work back together. Scott Alexander, when he heard that the New York Times was going to try to cancel him, pre-emptively retaliated by deleting his blog. Worried that it might never return, someone linked to The Library of Scott Alexandria, a samizdat backup of most (but not all) of the important posts.
Doing anything for posterity is a crazy idea on the basis that posterity doesn't care how much you care about posterity. It is notable that no one ever thought to keep Shakespeare's plays in a pristine condition at the time and that Socrates and Jesus never wrote anything. In each case, they existed in the realm of action, doing what they thought was right at the time. There is a lesson there, I think.