Library of Congress won't launch its Twitter archive anytime soon

In 2010, the Library of Congress announced plans to collect every public Twitter post in a single searchable archive, as part of a bold attempt to create a new repository of digital information. Two years later, however, the project has yet to get off the ground, primarily because the Library hasn't come up with an efficient way to harness such a massive amount of data.

On Friday, the LOC published a white paper explaining the delay, which it attributes to a lack of available software and constrained budgets. The organization has already created a private archive, but it remains virtually unsearchable. According to the Library, a single query on its current system "could take 24 hours" to yield results. Fixing this problem, it says, "would require an extensive infrastructure of hundreds if not thousands of servers," which would be well beyond the Library's current budget.

Deputy Librarian of Congress Robert Dizard Jr. tells the Washington Post that the LOC has thus far invested "tens of thousands" of dollars in the project, but recent budget cuts have tightened its purse strings, making it difficult to spend money on the kind of massive computing overhaul the project would demand. Colorado-based data company Gnip is in charge of creating the archive, and has so far collected more than 133 terabytes of Twitter data. The fundamental problem is that the Library hasn't found a way to make any sense of this information.

"You often hear a reference to Twitter as a fire hose, that constant stream of tweets going around the world," Dizard said. "What we have here is a large and growing lake. What we need is the technology that allows us to both understand and make useful that lake of information."

Complicating matters even further, Twitter's terms of agreement may make it difficult for the Library to make its archive fully accessible. The agreement, which hadn't been made public until today, bars the Library from making available "a substantial portion of the collection on its web site in a form that can be easily downloaded." This would suggest that the social network may have been wary of fully committing to the project from the very beginning, perhaps because it already had plans to launch a similar service of its own.

The Verge