Web scraping (also referred to as data scraping, web extraction, or simply “scraping”) uses automated tools to capture data from a website or set of websites. By leveraging content accessible online, software tools can analyze, aggregate, and otherwise use online data to build better applications, integrate systems, and offer other benefits. Combining, pooling, and analyzing existing data adds value and avoids “reinventing the wheel,” which gives companies strong incentives to scrape. But before engaging in web scraping, it is critical to evaluate the restrictions that apply to the targeted websites and to assess the possible legal exposure of the scraping activities. While web scraping is commonplace, it can lead to litigation if approached haphazardly.
For example, if a new company wanted to use software analytics to provide employment insights, it might want to leverage data from LinkedIn to build its database of employment information instead of starting from scratch. However, LinkedIn has little incentive to share its collected user data with a startup in the same industry and is likely to respond strongly to protect its turf. In fact, this is roughly the fact pattern of a leading, long-running web-scraping battle between hiQ Labs and LinkedIn (hiQ Labs, Inc. v. LinkedIn Corp., 938 F.3d 985), a case that continues to shape the rules of web scraping as it progresses.
The LinkedIn litigation is clarifying how the Computer Fraud and Abuse Act (CFAA) applies to web scraping in the wake of the Supreme Court’s Van Buren decision. The Ninth Circuit recently addressed whether the CFAA allows a scraping target to block access to data on a server when no authorization is generally required to view that data, as with a public website. The court’s answer was likely no: it again affirmed a preliminary injunction preventing LinkedIn from denying hiQ access to publicly available member profiles on the LinkedIn website, reasoning that the CFAA’s concept of access “without authorization” likely does not apply to publicly available web pages.
In addition to LinkedIn, other recent high-profile web-scraping disputes involve Meta and Southwest Airlines.
Web-scraping disputes are enormously fact-intensive and can turn on details such as how the website was accessed and what technological restrictions were in place. Various claims may be available in a dispute; the CFAA claim was the most common until recent decisions narrowed its scope. Many states also have related computer crime statutes that may apply in web-scraping situations. Other web-scraping disputes may involve claims for:
- Breach of contract. A target website’s terms of service (ToS) are critical to understanding which web-scraping activities are allowed or restricted. Many scraping disputes include a claim that the scraping violates the ToS. This makes the enforceability of the ToS a central issue, so a website operator seeking to restrict scraping should carefully review and update its terms
- Trespass to chattels. This type of claim has been less commonly pursued in the last few years because it generally requires a showing of actual harm to a server—a requirement that may prove difficult in typical scraping scenarios
- Copyright infringement. If the data being scraped (or a portion of it) contains expression protected by copyright law, the copyright owner may pursue an infringement claim, generally only after obtaining a copyright registration from the U.S. Copyright Office
- Digital Millennium Copyright Act (DMCA). The DMCA’s anti-circumvention provisions may apply in “hacking” situations (i.e., where the scraping involves circumventing a technological measure or safeguard that controls access to protected content)
Is web-scraping activity part of your business plan? If you’re a new company preparing to launch and web scraping is a core piece of that plan, it is best to assess your risk before getting started. If your company is already engaged in web scraping or is trying to prevent others from scraping its site, I encourage you to evaluate the risks and explore actions or adjustments that may help mitigate them.