Generate Random US Street Addresses with the United States Place Sampler (USPS) Tool

Do you need random US street addresses for research, analysis, or journalistic investigations? Finding reliable, unbiased address data can be a significant hurdle. To address it, Big Local News and The Markup have collaborated on a new tool: the United States Place Sampler (USPS). The tool simplifies generating random US street addresses and is meant for anyone who needs street-level data.

This article introduces the beta version of the United States Place Sampler (USPS), a tool designed to provide easy access to random street addresses across the United States. Inspired by the difficulties encountered during The Markup’s investigation into internet disparities across 38 major US cities, USPS aims to streamline data acquisition for similar projects and beyond. You can access the tool here: https://usps.biglocalnews.org/.

USPS leverages the U.S. Census Bureau’s definition of “place,” encompassing various geographic entities such as cities, towns, municipalities, neighborhoods, and Census Designated Places. This broad scope allows users to sample random US street addresses from diverse locations, including both legally incorporated areas and unincorporated communities.

The tool is versatile, capable of generating a specific number or percentage of street addresses within chosen geographic boundaries, ranging from Census block groups to entire counties. This precision makes USPS exceptionally useful for analyzing disparities in various address-level outcomes in conjunction with socioeconomic data from sources like the U.S. Census Bureau and historical redlining maps from the University of Richmond’s Mapping Inequality project. To facilitate this, each address record provided by USPS includes the surrounding area’s Federal Information Processing Series (FIPS) codes, down to the Census block group level.

Beyond replicating The Markup’s original investigation, USPS is intended to be a valuable asset for journalists, researchers, and anyone seeking to leverage street-level data for accountability and deeper insights.

Potential applications of USPS for generating random US street addresses include:

  • Utility Outage Analysis: Test for disparities in utility outages by looking up sampled addresses in service outage portals, such as electricity, gas, and cable outage maps.
  • Accessibility Audits: Assess neighborhood and city-level access to essential services like grocery stores, hospitals, trauma centers, and polling places.
  • Location-Based Service Availability: Investigate disparities in the availability and pricing of location-based services such as ride-sharing (Uber, Lyft, Revel), food delivery (GrubHub, DoorDash, Uber Eats), and e-commerce across different areas.

A step-by-step guide detailing how to replicate The Markup’s internet disparity investigation using USPS will be released soon. To stay updated on this and other developments, sign up for their newsletter here.

The Genesis of USPS: Addressing the Need for Random US Street Addresses

The creation of USPS was directly inspired by the challenges encountered during The Markup’s investigation into internet service provider practices. This investigation analyzed over one million internet plans across 38 major cities, revealing that lower-income, less-White, and historically redlined areas were often offered slower internet speeds at the same price as faster speeds in more affluent areas of the same city.

A key obstacle in this research was the seemingly straightforward task of obtaining a representative sample of random addresses within a city. The team quickly discovered the scarcity and inadequacy of publicly available street address databases. While readily accessible options like “clicking around Google Maps” might seem convenient, statistical experts rightly cautioned against their use due to inherent biases.

The decision was made to utilize OpenAddresses, an open-source repository compiling addresses and geographic coordinates from public data sources at state and local levels. Although OpenAddresses offered the best available option, it lacked complete nationwide coverage and often presented incomplete address data, missing crucial information like cities, zip codes, and other geographic markers essential for analysis.

To overcome these data gaps and standardize addresses, the U.S. Census Bureau’s geocoder API was employed. This tool allowed for the mapping of Census block groups and designated places (cities) to the geographic coordinates provided in the OpenAddresses dataset. Further details on this methodology can be found in the companion article to The Markup’s investigation: https://themarkup.org/show-your-work/2022/10/19/how-we-uncovered-disparities-in-internet-deals.

Following the publication of The Markup’s findings, journalists in numerous cities utilized the data to localize the investigation’s findings. However, interest in replicating this methodology extended beyond these initial 38 cities. Journalists, researchers, city officials, educators, and concerned citizens from other locations expressed a desire to map internet speeds in their own communities.

Initially, the team began developing a guide for manually collecting internet offer information from ISP websites, aiming to empower individuals and groups to conduct similar investigations. However, the challenge of reliably sourcing random street addresses remained a significant barrier. Recognizing the limitations and potential biases of readily available but flawed methods, the partnership with Big Local News was formed. This collaboration aimed to download, clean, and index a comprehensive dataset of US addresses, ultimately simplifying the process of sampling random US street addresses and making it accessible to a wider audience.

How to Use USPS to Generate Random US Street Addresses

Using USPS to generate a sample of random addresses is a straightforward process, accessible through the website: usps.biglocalnews.org.

  1. Specify Location: Enter the name of the geographic location from which you want to sample addresses. This could be a city, county, or town.
  2. Define Sample Size: Indicate the desired sample size, either as a specific number of addresses or as a percentage of the total addresses in the chosen location.
  3. Initiate Search: Click the “Search” button to initiate the address sampling process.
  4. Download Data: Once the search is complete, a map displaying the sampled addresses will appear. Click the “+” sign and then select “download CSV” to download the address data in CSV format.
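Once the CSV is downloaded, it can be loaded for analysis. The following is a minimal Python sketch, not part of the tool itself. It assumes the file uses the column names listed in the output field table of this article, and the sample row is made up for illustration. It reads every field as text so that any leading zeros in zip and FIPS codes are not lost, then converts only the coordinates to numbers.

```python
import csv
import io

# Made-up sample standing in for a downloaded file. The column names follow
# the field list in this article; the values are illustrative only.
SAMPLE_CSV = """address_full,number,street,city,state,zip,longitude,latitude,statefp,countyfp,tractce,blkgrpce
"123 EXAMPLE ST, SPRINGFIELD, NY, 00000",123,EXAMPLE ST,SPRINGFIELD,NY,00000,-78.8,42.9,36,029,000110,1
"""


def load_addresses(handle):
    """Read address rows as dicts, keeping codes as text and coordinates as floats."""
    rows = list(csv.DictReader(handle))
    for row in rows:
        # Only the coordinates are numeric; zip and FIPS codes stay strings
        # so that values such as "029" keep their leading zero.
        row["longitude"] = float(row["longitude"])
        row["latitude"] = float(row["latitude"])
    return rows


rows = load_addresses(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[0]["countyfp"])
```

For a real download, the same function can be called on an open file, for example `with open("sample.csv", newline="") as f: rows = load_addresses(f)`, where `sample.csv` is a hypothetical file name.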

Because USPS queries a dataset of more than 200 million addresses, a search may take a short time to complete.

Advanced Search Options for Precise Random Address Generation

For users requiring more granular control, USPS offers advanced search capabilities using custom query strings to specify searches by Census block group, county, or city.

To search by city or town name, use the place query syntax:

place: buffalo city

Similar query syntax can be used for state, county, county-subdivision (cousub), Census tract, and Census block group (bg).
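When preparing many searches, it can help to generate the query strings consistently. The sketch below is a hypothetical helper, not part of USPS. It only reproduces the `keyword: value` shape of the `place` example above and checks the keyword against the geography names mentioned in this article. The value format expected for keywords other than `place` is not documented here, so the helper does not validate it.

```python
# Geography keywords named in this article. Spelling the long names this way
# ("state", "county", "tract") is an assumption; only "place", "cousub", and
# "bg" appear verbatim in the text.
KNOWN_KEYWORDS = {"place", "state", "county", "cousub", "tract", "bg"}


def build_query(keyword, value):
    """Return a query string shaped like 'place: buffalo city'."""
    keyword = keyword.strip().lower()
    if keyword not in KNOWN_KEYWORDS:
        raise ValueError(f"unknown geography keyword: {keyword!r}")
    # Collapse repeated whitespace so the string is predictable.
    value = " ".join(value.split())
    if not value:
        raise ValueError("value must not be empty")
    return f"{keyword}: {value}"


print(build_query("place", "buffalo city"))
```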

Future updates will include documentation of the underlying API, enabling programmatic access to address samples. Keep an eye on Big Local News’s GitHub page and usps.biglocalnews.org for updates and further information.

The Technical Architecture of USPS: Powering Random US Street Address Generation

USPS is built on public data from the OpenAddresses project and geographic metadata from the U.S. Census Bureau’s 2022 TIGER/Line Shapefiles.

The initial dataset for USPS incorporated address data from OpenAddresses collected between February 19th and 23rd, encompassing over 200 million US addresses.

PostGIS, a spatial extension for the PostgreSQL database, is the engine powering USPS. PostGIS enables the storage and efficient querying of geographic objects and spatial data.

To optimize performance and minimize strain on the U.S. Census Bureau’s servers, USPS preprocesses addresses with the TIGER geocoder that ships with PostGIS, rather than querying the Census geocoder API for each address. This approach mirrors the original methodology but streamlines the data pipeline for large-scale address processing.

PostGIS’s “spatial indexing” capabilities are crucial for USPS’s efficiency. Spatial indexing uses geographic bounding boxes as reference points, enabling rapid and efficient geographic-based queries.

USPS uses spatial indices built on the Census TIGER shapefiles for each geographic designation (Census block group, place, county, etc.). These indices serve as reference points for quickly filtering addresses by their geographic coordinates. Before any filtering happens, each address’s coordinates are transformed into the same coordinate system as the TIGER data (EPSG:4269) and stored in a separate spatial index.
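The idea behind bounding-box filtering can be illustrated in a few lines of plain Python. This is a simplified sketch of the general filter-then-refine technique, not the PostGIS implementation: a real spatial index organizes many bounding boxes in a tree, while this example checks a single polygon. The triangle and the points are made up, and the function names are hypothetical.

```python
def bounding_box(polygon):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_box(point, box):
    """Cheap test: is the point inside the rectangle?"""
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def in_polygon(point, polygon):
    """Exact test by ray casting: count edges crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_points(points, polygon):
    """Run the cheap box test first and the exact test only on survivors."""
    box = bounding_box(polygon)
    return [p for p in points if in_box(p, box) and in_polygon(p, polygon)]


# Made-up shape and points: one inside, one inside the box but outside the
# triangle, and one outside the box entirely.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
points = [(1.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
print(filter_points(points, triangle))
```

The point (5.0, 5.0) is rejected by the rectangle comparison alone, so the more expensive polygon test runs only for the two points that fall inside the box.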

Understanding the Output Data: Random US Street Address Fields

When you download data from USPS in CSV format, you receive a structured dataset containing the following fields for each random US street address:

Header       | Description                                             | Source
-------------|---------------------------------------------------------|------------------
address_full | Full street address (number, street, city, state, zip). | OpenAddresses
number       | Street number.                                          | OpenAddresses
street       | Street name.                                            | OpenAddresses
city         | City name.                                              | OpenAddresses
state        | 2-letter state abbreviation.                            | OpenAddresses
zip          | Postal zip code.                                        | OpenAddresses
longitude    | Address longitude.                                      | OpenAddresses
latitude     | Address latitude.                                       | OpenAddresses
statefp      | State FIPS code.                                        | Census TIGER files
countyfp     | County FIPS code.                                       | Census TIGER files
tractce      | Census tract FIPS code.                                 | Census TIGER files
blkgrpce     | Census block group FIPS code.                           | Census TIGER files
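
To join these records with Census tables, the four FIPS fields are usually combined into a single identifier. The sketch below builds the standard 12-digit block group GEOID (2-digit state, 3-digit county, 6-digit tract, 1-digit block group). The function name and the sample row are illustrative assumptions. The zero-padding step is there in case a spreadsheet or data library has stripped leading zeros from the downloaded values.

```python
def block_group_geoid(row):
    """Combine the FIPS fields into a 12-digit Census block group GEOID."""
    parts = (
        str(row["statefp"]).zfill(2),   # state: 2 digits
        str(row["countyfp"]).zfill(3),  # county: 3 digits
        str(row["tractce"]).zfill(6),   # tract: 6 digits
        str(row["blkgrpce"]).zfill(1),  # block group: 1 digit
    )
    return "".join(parts)


# Illustrative values with the leading zeros deliberately stripped.
row = {"statefp": "36", "countyfp": "29", "tractce": "110", "blkgrpce": "1"}
geoid = block_group_geoid(row)
print(geoid)       # 360290001101
print(geoid[:11])  # the first 11 digits identify the Census tract
```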

This article was originally published on The Markup and is republished under the Creative Commons Attribution-NonCommercial-NoDerivatives license.
