To be successful, Censys needs to expose data back to the community, which ranges from researchers who need to quickly perform a simple query to those who want to perform in-depth analysis on raw data. In order to meet these disparate needs, we are exposing the data to researchers through several interfaces, which offer varying degrees of flexibility: (1) a web-based query and reporting interface, (2) a programmatic REST API, (3) public Google BigQuery tables, and (4) raw downloadable scan results. We further plan to publish pre-defined dashboards that are accessible to users outside of the research community. In this section, we describe each of these interfaces in depth.
- Search Interface
The search interface supports basic predicate logic (e.g., (location.country_code: US OR location.country_code: CA) AND 80.http.server: Apache), ranges (e.g., 80.http.http_status.code > 200), and wildcards (e.g., 443.https.certificate.certificate.issuer.*: GoDaddy).
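Queries in this syntax are plain strings, so scripts often compose them programmatically. The sketch below is a hypothetical set of helpers (`term`, `any_of`, `all_of`, `greater_than` are illustrative names, not part of any Censys library) that rebuilds the example queries above. The parentheses added by `any_of` keep the OR group from binding to the AND clause.

```python
# Hypothetical helpers for composing Censys-style search queries as plain strings.
# They only mirror the syntax shown in the text; they are not a Censys API.

def term(field: str, value: str) -> str:
    """A single field predicate, e.g. 'location.country_code: US'."""
    return f"{field}: {value}"


def greater_than(field: str, value: int) -> str:
    """A range predicate, e.g. '80.http.http_status.code > 200'."""
    return f"{field} > {value}"


def any_of(*clauses: str) -> str:
    """Join clauses with OR and parenthesize them so they group as one unit."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND."""
    return " AND ".join(clauses)


query = all_of(
    any_of(term("location.country_code", "US"), term("location.country_code", "CA")),
    term("80.http.server", "Apache"),
)
print(query)
# (location.country_code: US OR location.country_code: CA) AND 80.http.server: Apache
```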
- Viewing Individual Records
Users can view the details of any host, certificate, or domain returned by a query. This includes a user-friendly view of how each service is configured, the most recent raw data describing the host, user-provided metadata and tags, and historical scan data. We similarly display geographic location, routing, and WHOIS information.
- Dynamic Reports
Once a query completes, users can generate reports on the breakdown of any field present in the resulting dataset. For example, users can view the breakdown of server-chosen cipher suites for IPv4 HTTPS hosts with browser-trusted certificates by performing the query 443.https.tls.validation.browser_trusted: True and generating a report on 443.https.tls.cipher_suite.name.
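Conceptually, such a report is a frequency count of one field over the matching records. The sketch below reproduces that idea locally on a few made-up records; it assumes each record is a nested dictionary addressed by dotted field names and that the reported field holds a hashable scalar. It is an illustration of the idea, not how Censys computes reports server-side.

```python
from collections import Counter
from typing import Any, Iterable, Optional


def get_field(record: dict, dotted: str) -> Optional[Any]:
    """Follow a dotted path such as '443.https.tls.cipher_suite.name' through nested dicts."""
    node: Any = record
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # field absent on this record
        node = node[key]
    return node


def breakdown(records: Iterable[dict], dotted: str) -> Counter:
    """Count how often each value of `dotted` appears; records lacking the field are skipped."""
    counts: Counter = Counter()
    for record in records:
        value = get_field(record, dotted)
        if value is not None:
            counts[value] += 1
    return counts


def suite(name: str) -> dict:
    """Build a toy record whose only field is the negotiated cipher suite."""
    return {"443": {"https": {"tls": {"cipher_suite": {"name": name}}}}}


# Made-up records for illustration only.
records = [
    suite("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"),
    suite("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"),
    suite("TLS_RSA_WITH_AES_128_CBC_SHA"),
    {"80": {"http": {}}},  # no HTTPS data, so it is not counted
]
report = breakdown(records, "443.https.tls.cipher_suite.name")
for name, count in report.most_common():
    print(f"{count:>3}  {name}")
```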
- Programmatic Access
The Censys REST API provides programmatic access to the same data accessible through the web interface. API access is governed by our Terms of Service, and all scripted access should use this API.
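A scripted search is an authenticated HTTP request carrying the same query string used in the web interface. The sketch below only builds the request with the standard library and does not send it. The base URL, the `/search/{index}` path, the JSON body with `query` and `page`, and HTTP Basic authentication with an API ID and secret are assumptions modeled on Censys's historical v1 API; check the current API documentation before relying on any of them.

```python
import base64
import json
import urllib.request

# Assumption: endpoint layout and body shape follow the historical Censys v1 API.
API_BASE = "https://censys.io/api/v1"


def build_search_request(api_id: str, secret: str, index: str, query: str,
                         page: int = 1) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request for one page of search results."""
    body = json.dumps({"query": query, "page": page}).encode("utf-8")
    # HTTP Basic auth: base64("<api_id>:<secret>")
    token = base64.b64encode(f"{api_id}:{secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{API_BASE}/search/{index}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder credentials; real ones come from your Censys account.
req = build_search_request("MY_API_ID", "MY_SECRET", "ipv4",
                           "443.https.tls.validation.browser_trusted: True")
# To actually send it: urllib.request.urlopen(req) and json.load() the response.
```

Keeping request construction separate from sending makes it easy to inspect or log what a script will submit before it touches the network.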
- SQL Interface
The search interface only exposes current data and the query syntax is limited. To support more complex analysis and historical queries, Censys exposes daily snapshots of each dataset through Google BigQuery tables. These can be queried through the web interface and API, or imported into existing BigQuery projects.
For example, the following query would show the breakdown of cipher suites that IPv4 hosts with browser-trusted certificates chose on September 2, 2015 (the date encoded in the table name ipv4.20150902):
SELECT p443.https.tls.cipher_suite.name, count(ip) FROM ipv4.20150902 WHERE p443.https.tls.validation.browser_trusted=true GROUP BY p443.https.tls.cipher_suite.name;
Or you could download all data about hosts in the 220.127.116.11/16 network by exporting the results from the following query:
SELECT * FROM ipv4.20150902 WHERE ipint > PARSE_IP("18.104.22.168") AND ipint < PARSE_IP("22.214.171.124");
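The range filter works because each address is also stored as an integer (`ipint`), and BigQuery's legacy `PARSE_IP` converts a dotted-quad string into the same unsigned 32-bit form. The sketch below computes those integer bounds locally with Python's `ipaddress` module, assuming `ipint` uses that standard big-endian encoding. The network 10.1.0.0/16 is a hypothetical example, and the generated clause uses inclusive bounds so that both the first and last address of the block are covered.

```python
import ipaddress
from typing import Tuple


def ipint_bounds(cidr: str) -> Tuple[int, int]:
    """Return the first and last address of an IPv4 network as 32-bit integers."""
    net = ipaddress.IPv4Network(cidr, strict=True)  # rejects host bits, e.g. '10.1.2.3/16'
    return int(net.network_address), int(net.broadcast_address)


def range_where_clause(cidr: str) -> str:
    """Build an inclusive WHERE clause on an integer address column named `ipint`."""
    low, high = ipint_bounds(cidr)
    return f"WHERE ipint >= {low} AND ipint <= {high}"


low, high = ipint_bounds("10.1.0.0/16")  # hypothetical network
print(low, high, high - low + 1)          # a /16 spans 65,536 addresses
print(range_where_clause("10.1.0.0/16"))
```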
Warning: By default, SQL access is restricted to verified researchers and academic accounts.
- Raw Data
Lastly, we are publishing all of the raw data from our scans, along with our curated ZDb snapshots of the IPv4 address space, Alexa Top 1 Million websites, and known certificates. We will be posting these as structured JSON documents, along with data definitions and schemas for common databases, at censys.io/data. We previously posted scan data on https://scans.io, a generic scan data repository that our team hosts. We will continue to maintain the scans.io interface, provide continued access to our historical datasets, and allow researchers to upload other data. However, we will no longer post our regular scans to https://scans.io, but rather encourage users to download these directly from Censys’s web interface.
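Raw dumps of this kind are large, so it is usually better to stream records than to load a whole file at once. The sketch below assumes a newline-delimited JSON layout (one document per line), optionally gzip-compressed; that layout, the helper names, and the sample records are assumptions for illustration, so check the data definitions published alongside each dataset before relying on them.

```python
import gzip
import io
import json
from typing import IO, Iterator


def open_dump(path: str) -> IO[str]:
    """Open a plain or gzip-compressed text dump for line-by-line reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def iter_records(stream: IO[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line (newline-delimited JSON is assumed)."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON") from exc


# Made-up sample in the assumed layout; with a real file use open_dump(path).
sample = io.StringIO('{"ip": "192.0.2.1"}\n\n{"ip": "192.0.2.2"}\n')
records = list(iter_records(sample))
print(len(records), records[-1]["ip"])
```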
I think these interfaces are enough to find websites, IP addresses, or certificates on the Internet and to explore the data Censys exposes.
For more information, see the censys.io tutorial.