Wednesday, November 16, 2011

Former Reddit co-owner arrested for excessive JSTOR downloads

Aaron Swartz, the 24-year-old wunderkind who co-authored the RSS specification at age 14 and sold his stake in Reddit to Condé Nast (which also owns Ars Technica) before his 20th birthday, was arrested Tuesday on charges of wire fraud, computer fraud, "unlawfully obtaining information from," and "recklessly damaging" a "protected computer." He is accused of downloading 4.8 million documents from the academic archive JSTOR, in violation of its terms of use, and of evading MIT's efforts to stop him from doing so.
Swartz is a founder of the advocacy organization Demand Progress. In a statement, Demand Progress executive director David Segal blasted the arrest. "It's like trying to put someone in jail for allegedly checking too many books out of the library," he said. Demand Progress also quoted James Jacobs, the Government Documents Librarian at Stanford University, who said that the arrest "undermines academic inquiry and democratic principles." 
According to the complaint, Swartz purchased a laptop in September 2010 and registered it on the MIT network under the name "Gary Host" (username: "ghost"). He then ran a Python script that rapidly downloaded articles from JSTOR. JSTOR detected the script and blocked his IP address. The complaint alleges that a game of cat and mouse followed, in which Swartz repeatedly changed his IP and MAC addresses to evade JSTOR's and MIT's efforts to block his access. Swartz also bought a second laptop to speed up the downloading. Finally, on October 9, JSTOR gave up and blocked the entire MIT campus from using JSTOR.
When JSTOR lifted the block a few weeks later, Swartz started using his downloading script once again. (Update: To be clear, Swartz resumed his downloading "a few weeks later," but the complaint doesn't say JSTOR access was blocked that whole time.) This time, he entered an MIT network closet, "hard-wired into the network and assigned himself two IP addresses. He hid the Acer laptop and a succession of external storage drives under a box in the closet, so that they would not be obvious to anyone who might enter the closet."
Swartz entered the networking closet for the last time in January. The complaint describes the scene: "As Swartz entered the wiring closet, he held his bicycle helmet like a mask to shield his face, looking through ventilation holes in the helmet. Swartz then removed his computer equipment from the closet, put it in his backpack, and left, again masking his face with the bicycle helmet before peering through a crack in the double doors and cautiously stepping out."
The complaint alleges that "Swartz intended to distribute a significant portion of JSTOR's archive of digitized journal articles through one or more file-sharing sites." But it offers no evidence for this claim. In fact, in a statement following the arrest, JSTOR acknowledged that "we secured from Mr. Swartz the content that was taken, and received confirmation that the content was not and would not be used, copied, transferred, or distributed."
Indeed, Wired reports that JSTOR, the alleged victim, has denied seeking Swartz's prosecution.
Open access to information has long been a passion for Swartz, and he has a history of using unorthodox and controversial means to pursue it. In 2008, he used an automated script to download more than 2 million documents from PACER, the website the federal judiciary uses to distribute court documents. PACER is ordinarily paywalled, but the judicial branch was experimenting with offering paywall-free access to selected libraries. Swartz used the program to circumvent the paywall. The effort led to an FBI investigation, but no charges were ever filed.
There's an important difference between PACER and JSTOR. As works of the federal government, PACER documents are in the public domain, and the ones Swartz downloaded are now freely available for download. Many JSTOR documents, by contrast, are protected by copyright. Distributing them would be a clear case of copyright infringement.
Contacted by e-mail, Swartz declined to comment on what he was planning to do with the documents. But he pointed to his bio in the Demand Progress statement, which notes that "in conjunction with Shireen Barday, he downloaded and analyzed 441,170 law review articles to determine the source of their funding; the results were published in the Stanford Law Review."
It's not clear, then, whether this was an attempt to liberate the documents from behind the JSTOR paywall or whether he was intending to use the documents for a personal research project.
According to the Boston Globe, Swartz has been released on $100,000 bail.
