Value of Training Datasets in AI Technology

By Deanna R. Kunze

While many focus on AI algorithms and their outputs, the underlying datasets used to train AI systems, which then form the backbone of many of those systems, may be as valuable as, if not more valuable than, the AI technology itself. Plaintiff Planner 5D asserts that its datasets for “scene-recognition technology” will be worth approximately $50–60 billion over the next five years, and that is just one reason why its case against Facebook and Princeton University over the use and ownership of these datasets will provide important guidance on how and when to protect them.[1]

In general, scene-recognition technology is the ability of machines to recognize three-dimensional scenes, but the challenge to its development is supplying the data.[2] Planner 5D alleges “that creating lifelike digital scenes is ‘extremely time- and labor-intensive,’ and that to create truly ‘realistic’ scenes, ‘human designers must arrange the objects in real-life configurations.’”[3]

It’s because of the rarity of this accumulated data, says Planner 5D, that engineers at Princeton University bypassed Planner 5D’s terms and conditions to copy its entire three-dimensional object database—and then publish it as the SUNCG database on a publicly accessible website. After that, according to Planner 5D, Facebook made its own copy of the SUNCG database, then posted another link to that database through a public Stanford University link. Facebook then allegedly published the Stanford link in connection with Facebook’s Scene Understanding and Modelling Challenge (“SUMO Challenge”), seeking submissions of scene-recognition papers and algorithms. Submitters might win cash prizes and a chance to speak at a “SUMO Challenge” conference—in exchange for Facebook’s right to commercialize any submission’s technology and ideas on scene-recognition.

Planner 5D initially did not register the copyright(s) in its own programs and database. But its core business objective evolved beyond providing home design tools: realizing the value of its compendium of images, Planner 5D sought to become “the leader and innovator in computer scene recognition.”[4]

Not until after it filed suit against the defendants did Planner 5D apply to register the works it had created, both the individual works and the compilation of them. The registration process has not been simple, however: the Copyright Office refused the initial applications, and Planner 5D then sought reconsideration.

Now, in ruling on the third motion to dismiss in this case, Judge Orrick holds that Planner 5D’s copyright infringement claim can proceed even while the U.S. Copyright Office reconsiders its prior rejection of Planner 5D’s application for registration, because all that is required before litigation is “action” on the application, not approval.[5]

This ruling is particularly significant because it is one of the first decisions interpreting Fourth Estate Pub. Benefit Corp. v. Wall-Street.com, LLC, 139 S. Ct. 881 (2019), which resolved a circuit split by holding that a copyright must be registered, not just applied for, before a suit for copyright infringement is filed. According to Judge Orrick, however, Fourth Estate does not apply in this action. Instead, the plain text of Section 411(a) permits an infringement action either when “registration of the copyright claim has been made” or when that registration “has been refused.”[6] The defendants argued that while a reconsideration is pending, neither of those two actions has occurred.

Judge Orrick disagreed. Noting that the pleading stage had continued for nearly two years, he concluded that the case should proceed because the statute and prior case law are clear that, even if the request for reconsideration is refused, plaintiffs are entitled to seek the court’s ruling on their entitlement to a copyright registration.

Without even reaching the merits, this case demonstrates the value of filing copyright registration applications early in the development process to protect some of the most valuable assets associated with AI technology: the underlying datasets.

  1. Id. The trade secret misappropriation claim was upheld after the First Amended Complaint.
  2. Id.
  3. Id.
  4. Id.
  5. UAB “Planner 5D” v. Facebook, Inc., Case No. 19-cv-03132-WHO (N.D. Cal. Apr. 14, 2021).
  6. Id., citing 17 U.S.C. § 411(a).