
Publicly Available Information Explained

Every day, billions of people and countless devices create an ocean of data. The scale is unprecedented—IDC predicts 463 exabytes* will be generated every day by 2025—and it will continue to grow as more people connect to the internet, more devices come online, and more of our lives move to the digital sphere.

Much of this data is publicly available information (PAI). As PAI proliferates, the digital landscape changes, and the challenge of connecting the dots grows. But when we connect those dots, PAI plays a critical role in safeguarding and advancing government and commercial interests. 

What is publicly available information (PAI)?

Publicly available information is an umbrella term covering a range of data found in various public sources. PAI sources include:

  • Internet: open online publications, blogs, discussion groups, and user-generated content (e.g., cell phone videos), including YouTube and other public-facing social media websites such as Reddit, Weibo, and Instagram.
  • Media: digital newspapers, magazines, streaming radio, and television from around the globe.
  • Public government data: government reports, budgets, hearings, telephone directories, press conferences, websites, and speeches.
  • Professional and academic publications: information from online journals, conferences, symposia, academic papers, dissertations, and theses that are open to the public.
  • Commercial data: commercial imagery, financial and industrial assessments, and public databases.
  • Gray literature: technical reports, preprints, patents, working papers, business documents, and newsletters.
  • Technical data: IP addresses, public domain information, and devices openly exposed on the internet, including Internet of Things (IoT) devices.

PAI can also include the deep and dark web. These sources are harder to reach without extra effort and knowledge of the right access points, but they can provide invaluable insight into illicit activities and risks to government, military, and commercial operations. Some conversations start on the “surface web” and then move to more secure channels, including dark web forums; others, such as ransomware markets, exist only in hard-to-access locations. If these sources are relevant to your risk strategy, they need to be part of your analysis. The right tools can provide access without directly exposing you and your organization to threats.

*One exabyte equals 1,000 to the sixth power bytes, that is, a “1” followed by 18 zeros. It is estimated that 5 exabytes would be sufficient to store all the words ever spoken by human beings (see HighScalability).
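The footnote's arithmetic can be checked in a few lines. The sketch below is illustrative only: it assumes decimal (SI) prefixes, where 1 EB is 10^18 bytes (not the binary 2^60 bytes of an exbibyte), and simply reuses the 463 EB and 5 EB figures quoted above.

```python
# Illustrative unit check for the exabyte footnote.
# Assumption: decimal (SI) prefixes, so 1 exabyte (EB) = 1,000^6 bytes = 10^18 bytes.
BYTES_PER_EXABYTE = 1000 ** 6

# Projected daily data volume by 2025, as cited in the article.
daily_exabytes = 463
daily_bytes = daily_exabytes * BYTES_PER_EXABYTE

# Estimated storage needed for every word ever spoken, as cited in the footnote.
all_words_exabytes = 5

# How many "all words ever spoken" equivalents would be generated each day.
ratio = daily_exabytes / all_words_exabytes

print(f"1 EB = {BYTES_PER_EXABYTE:.0e} bytes")
print(f"{daily_exabytes} EB/day = {daily_bytes:.3e} bytes/day")
print(f"Daily volume / all words ever spoken = {ratio}")
```

Run as-is, it shows that the projected daily volume is roughly 4.6 × 10^20 bytes, or about 93 times the estimated storage needed for every word ever spoken.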

Why is leveraging publicly available information important for both commercial organizations and national security?

Nearly every aspect of life today touches the internet, leaving a digital footprint. As a result, threat actors ranging from fraudsters and criminal gangs to terrorists and nation-states use cyberspace to communicate, plan, and carry out actions that threaten commercial businesses and national security.

Leveraging PAI in a digital world is critical because much of the intelligence that private companies and government organizations require is hiding in plain sight. In fact, PAI supplies the ingredients of open-source intelligence (OSINT), which can form the foundation for many intelligence products across the Department of Defense (DoD) and the Intelligence Community (IC).

In a sense, PAI is the fuel for OSINT: it is data, not knowledge. It becomes useful only when it is relevant to the question you are trying to answer. Social media, for example, can be valuable depending on what you are trying to learn or whom you want to understand. But social posts alone are rarely the key to generating results; you need to view them alongside all your other data sources, through the lens of your objective.

That’s why determining relevance—how much weight to give any particular piece of data—has to start with the question you’re trying to answer. Often, the relevance of any one piece of PAI to your issue isn’t clear until it’s viewed alongside other data in the context of solving a specific problem. 

Why are AI-based tools necessary to analyze PAI?

The challenge of leveraging PAI is akin to boiling the ocean. The velocity and volume of PAI have surpassed the human ability to locate and analyze relevant data. Simply put, intelligence analysts face information overload. To connect the dots, they must use artificial intelligence (AI) and machine learning (ML) to analyze PAI at scale. This is Babel Street’s approach: our AI-driven platform is highly scalable, with filtering capabilities that can separate relevant information from the noise in seconds, no matter the volume of data.

That process begins with the discovery phase, which uses AI-enabled, cross-lingual, conceptual, and persistent search across billions of platforms and websites worldwide. From there, our platform serves as a single gateway to an end-to-end experience that helps users derive actionable insights. Finally, our AI-based tools equip security teams, businesses, and federal agencies with critical and timely insights on a single pane of glass for immediate analysis, action, and mission success.

In real-world applications, the impact of PAI is significant:

  • For law enforcement, PAI can provide a starting point for investigations. PAI alone isn’t targetable information, but it can give investigators the first steps toward an investigative direction.
  • For the military operating in data-poor environments, language-agnostic PAI can support better-informed decisions about violence, brewing conflicts, or terrorist activity. Assessing PAI surfaced by native-language searches can also provide hyper-local, near real-time information to inform disaster recovery and humanitarian efforts.
  • As large-scale event venues reopen, PAI can surface relevant chatter online, not just on mainstream social channels but also on fringe sites where conversations can reveal potential threats. This helps event planners and security teams understand and address risks early.
  • Supply chain disruptions are in the news because the availability of goods of every description keeps fluctuating. PAI can be essential to understanding and anticipating the causes, whether they stem from material or workforce shortages; cyberattacks or technology issues; weather, disease, or other natural impacts; localized unrest; or any combination of those factors. Assessing PAI can also tell you about the companies and people in your supply chain and provide insights into your customers that help refine your market strategies.

How do good corporate citizens become thoughtful stewards of publicly available information?

Good corporate citizenship goes hand in hand with a commitment to the rule of law. As one of the first in our industry to hire a Chief Privacy Officer, Babel Street seeks to exceed customer needs in data protection, privacy, and compliance. We also employ a Privacy by Design approach in the development of new features and products. And to meet specific customer requirements, we offer customizable options within our software. Custom solutions range from user-specific data sources to top-level auditing and reporting capabilities. We take pride in being thoughtful stewards of PAI because we know that good corporate citizens abide by the law to defend the rule of law.

David Dillow
Director of Publicly Available Information (PAI) – Babel Street

McDaniel Wicker
Vice President, Business Development – Babel Street