Enricher API

Overview

After the individual contribution data has been imported from the Federal Election Commission (FEC) API into the FEC Data Platform, we further enrich the data with additional attributes from official sources. The enriched data is part of the data model.

Our data enrichment process is intended to be 100% transparent. We publish the Enricher API to automate the enrichment workflow wherever no public API (such as the Census Bureau's TIGERweb API) is available, or wherever the enrichment process is not trivial (for example, calculating the median household income).

The Enricher API provides enrichment endpoints for:

  • Age and gender
  • Ethnicity

The API supports both:

  • GET requests for single queries
  • POST requests for bulk processing of multiple queries

We enrich the individual contribution data with the following attributes:

Attribute | Description
--- | ---
AGE | Estimated from a donor's first name and the date of donation.
GENDER | Estimated from a donor's first name.
ETHNICITY | Estimated from a donor's last name and the year of the transaction date (to choose the Census).
PCT_<ETHNICITY> | The probability (in percent) of each ethnic group, as given by census.gov.
FIPS, COUNTY, TRACT_CD, BLOCK_CD | The geocoded location and census details fetched from census.gov.
HH_INCOME_MEDIAN | The median household income of the donor's county.

Transparency

Transparency of all our import and enrichment processes is extremely important to us. All the data can be checked and reproduced, and no black-box workflows are involved.

We guarantee transparency by:

  • Using only official (.gov) data sources
  • Describing all our enrichment steps in detail
  • Giving you access to the Enricher API, a helpful tool for the enrichment processes

Methods and Sources for Data Enrichment

Overview

In this section, we describe how we derive the following donor attributes:

  • Age and gender
  • Ethnicity
  • Geocoding and census data
  • Household income

Age and Gender

Calculating Gender of the Donor

The Social Security Administration (SSA) publishes the number of applicants for Social Security cards for each first name, by gender and year, starting in 1880. Names with fewer than 5 occurrences are excluded. From this data, we derive the most probable gender for each first name.
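The gender derivation can be sketched as follows. This is a minimal illustration, assuming the SSA counts are loaded as (name, gender, year, count) rows; the sample rows and the helper name most_probable_gender are hypothetical, and the counts are made up rather than taken from SSA data. The function sums a name's counts over all years and returns the gender with the larger total.

```python
from collections import defaultdict

# Hypothetical sample rows in the shape (name, gender, birth_year, count).
# These numbers are illustrative only, not real SSA figures.
SAMPLE_SSA_ROWS = [
    ("Joe", "M", 1950, 9000),
    ("Joe", "F", 1950, 40),
    ("Joe", "M", 1960, 7000),
    ("Alexandria", "F", 1995, 1500),
    ("Alexandria", "M", 1995, 6),
]


def most_probable_gender(name, rows):
    """Return 'm' or 'f' for the gender with more records for this name, else None."""
    totals = defaultdict(int)
    for row_name, gender, _year, count in rows:
        if row_name.lower() == name.lower():
            totals[gender] += count
    if not totals:
        return None  # name not in the SSA data: no gender is assigned
    return max(totals, key=totals.get).lower()


print(most_probable_gender("Joe", SAMPLE_SSA_ROWS))         # m
print(most_probable_gender("Alexandria", SAMPLE_SSA_ROWS))  # f
print(most_probable_gender("Zyx", SAMPLE_SSA_ROWS))         # None
```

Returning None for unknown names mirrors the rule that donors whose first name is not in the SSA data get no gender.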

Calculating Age of the Donor

The age of a donor is calculated relative to a reference date, which is the day of the donation. For example, if a donor donated on Jan 1, 2020, we calculate the donor's age on Jan 1, 2020.

Estimating the age for a given first name requires two additional sources of data, namely the:

  • CDC's actuarial life tables
  • SSA's historic life tables

Here are the steps we take to estimate the age of a donor with a given first name:

  1. Get the number of people born with a specific name
    • With the SSA data from above, we get the number of babies born with a specific name in each year.
  2. Calculate the age distribution
    • To estimate the age for a name, we need to know the age distribution of the people with that name. This requires taking life expectancy into account using the CDC's actuarial life tables and the SSA's historic life tables. The life tables give the number of deaths in each year for people born in a given year. To get a smooth distribution curve, we estimate a daily death rate. The result tells us which share of the people born with a name are still living and which share have died. We keep only the living, since only living people can donate.
  3. Use the median age
    • We then use the median age of this distribution as the most likely age of a donor with that first name on the donation date. This age is used to enrich the contribution data.
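The three steps above can be sketched as a weighted median. This is a simplified illustration, not the production calculation: it works at yearly rather than daily granularity, and the BIRTHS and SURVIVAL tables are hypothetical stand-ins for the SSA name counts and for survival probabilities that would be derived from the life tables.

```python
from datetime import date

# Hypothetical inputs: births of one first name per birth year (SSA-style counts)
# and the probability that someone born in that year is alive on the reference
# date (in practice derived from the CDC/SSA life tables).
BIRTHS = {1950: 1000, 1960: 3000, 1970: 2000, 1980: 1000}
SURVIVAL = {1950: 0.40, 1960: 0.80, 1970: 0.90, 1980: 0.95}


def median_age(births, survival, reference_date):
    """Median age of the living people with a name, at yearly granularity."""
    # Step 1 + 2: expected number of living people per age on the reference date.
    living_by_age = {
        reference_date.year - year: count * survival[year]
        for year, count in births.items()
    }
    total = sum(living_by_age.values())
    cumulative = 0.0
    # Step 3: walk from youngest to oldest until half of the living are covered.
    for age in sorted(living_by_age):
        cumulative += living_by_age[age]
        if cumulative >= total / 2:
            return age
    return None


print(median_age(BIRTHS, SURVIVAL, date(2020, 8, 1)))  # 60
```

Dropping the dead before taking the median shifts the estimate toward younger ages for names that were popular long ago.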

Note: Enrichment is only possible if a donor's first name is part of the published SSA name data. The remaining donors are not assigned an age or gender.

Ethnicity

We use the US Census Bureau data for the 2000 census and 2010 census to enrich the data with the donor's ethnicity. The Census data counts the recorded ethnicities for each last name occurring at least 100 times in the census.

For a given donor's last name, we enrich the contribution data with the:

  • Most frequent ethnicity in the census (using the field ETHNICITY)
  • Probability for each ethnicity (as categorized by the Census) derived from the data (using the field PCT_<ETHNICITY>)

We use both data points above to estimate the ethnicity of an individual donor and the expected ethnicity distribution in a group of donors. We also use the Census year closest to the donation date: for donations made in 2005 or earlier, we use the Census 2000 data; for donations made in 2006 or later, we use the Census 2010 data.
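A minimal sketch of these rules, assuming the per-surname percentages are available as dictionaries like the API responses further down this page. The helper names are hypothetical, and averaging the per-donor percentages to get a group distribution is our illustrative assumption; the page does not specify how the group estimate is computed.

```python
# Percentages copied from the example API responses further down this page.
CORTEZ = {"pct2prace": 0.44, "pctaian": 0.29, "pctapi": 2.92,
          "pctblack": 0.7, "pcthispanic": 89.65, "pctwhite": 6}
AHMED = {"pct2prace": 3.96, "pctaian": 0.36, "pctapi": 56.54,
         "pctblack": 22.02, "pcthispanic": 1.44, "pctwhite": 15.69}


def census_year(donation_year):
    """Census vintage for a donation year: 2000 up to 2005, 2010 from 2006 on."""
    return 2000 if donation_year <= 2005 else 2010


def most_frequent_ethnicity(pcts):
    """Group with the highest percentage, e.g. 'pcthispanic' -> 'hispanic'."""
    best = max(pcts, key=pcts.get)
    return best[len("pct"):]


def group_distribution(donors):
    """Expected percentage of each group in a set of donors (mean of per-donor values)."""
    return {key: sum(d[key] for d in donors) / len(donors) for key in donors[0]}


print(census_year(2005), census_year(2006))  # 2000 2010
print(most_frequent_ethnicity(CORTEZ))       # hispanic
print(group_distribution([CORTEZ, AHMED])["pcthispanic"])
```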

Geocoding and Census Data

For geocoding the donors' addresses, we use the US Census Bureau's TIGERweb services.

We use TIGERweb's RESTful API to access the US Census Bureau's TIGER database. This enables us to geocode an address and enrich the FEC Data with the:

  • Name and FIPS code of the county (useful for county-level mapping)
  • Census tract and block codes to further narrow down the donor's location

The TIGERweb services give us the:

  • Normalization of the donor's addresses.
    • For example, the reported addresses "123 Main ST" and "123 Main Street, APT 4" will both be normalized to "123 Main Street". This enables us to assign different donations by the same donor with different spellings of the same address to the same location.
  • Ability to assign different donors at the same location to a unique address.
    • The unique address will be referenced by the field ADDRESS_ID in the data (this is a planned feature).
  • Latitude and longitude pair of a donor's address
    • However, we do not include this in the FEC Data database for privacy reasons.
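To illustrate why normalization lets different spellings map to one location, here is a toy normalizer. It is not how TIGERweb normalizes addresses: the suffix table, the unit-designator pattern, and the helper names are hypothetical, and the numeric IDs only mimic the idea of the planned ADDRESS_ID field.

```python
import re

# Toy rules (assumptions): expand a few street-suffix abbreviations and
# drop everything from a unit designator such as "APT" onward.
SUFFIXES = {"ST": "Street", "AVE": "Avenue", "RD": "Road", "BLVD": "Boulevard"}
UNIT_PATTERN = re.compile(r",?\s*\b(?:APT|UNIT|STE|SUITE)\b.*$", re.IGNORECASE)


def normalize_address(raw):
    """Drop unit designators and expand street-suffix abbreviations."""
    without_unit = UNIT_PATTERN.sub("", raw.strip())
    words = [SUFFIXES.get(w.upper().rstrip("."), w) for w in without_unit.split()]
    return " ".join(words)


def assign_address_ids(addresses):
    """Give each distinct normalized address a hypothetical sequential ID."""
    ids = {}
    return [ids.setdefault(normalize_address(a), len(ids) + 1) for a in addresses]


print(normalize_address("123 Main ST"))             # 123 Main Street
print(normalize_address("123 Main Street, APT 4"))  # 123 Main Street
print(assign_address_ids(["123 Main ST", "123 Main Street, APT 4", "9 Oak AVE"]))  # [1, 1, 2]
```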

If the TIGERweb services return no geocoding result, the contribution is not enriched and the corresponding fields contain NULL values in our database. This can happen for various reasons: for example, the address is a post office box, the address is not (yet) in the TIGER database, the address is misspelled, or the address is outside the U.S.

Household Income

Our source is the US Census Bureau's median household income data: the 2018 Poverty and Median Household Income Estimates for Counties, States, and National, published by the U.S. Census Bureau's Small Area Income and Poverty Estimates (SAIPE) Program in December 2020.

We include the median household income per county and enrich each contribution record with the median household income of the county determined by geocoding the donor's address with the TIGERweb services.
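The county-level join can be sketched as a lookup keyed by the 5-digit county FIPS code (2-digit state code followed by the 3-digit county code). This assumes the FIPS field holds that 5-digit county code; the income values and helper names are placeholders, not actual SAIPE estimates.

```python
# Hypothetical lookup keyed by 5-digit county FIPS code; incomes are placeholders.
MEDIAN_HH_INCOME = {"06037": 50000, "36061": 60000}


def county_fips(state_code, county_code):
    """Build the 5-digit county FIPS code, e.g. ('6', '37') -> '06037'."""
    return f"{int(state_code):02d}{int(county_code):03d}"


def enrich_income(contribution):
    """Attach HH_INCOME_MEDIAN; keep it None when the address was not geocoded."""
    fips = contribution.get("FIPS")
    enriched = dict(contribution)
    enriched["HH_INCOME_MEDIAN"] = MEDIAN_HH_INCOME.get(fips) if fips else None
    return enriched


print(county_fips("6", "37"))                              # 06037
print(enrich_income({"FIPS": "06037"})["HH_INCOME_MEDIAN"])  # 50000
print(enrich_income({"FIPS": None})["HH_INCOME_MEDIAN"])     # None
```

A contribution that failed geocoding has no FIPS code, so it also gets no household income, consistent with the NULL handling described above.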

Prerequisites

The API specification with a description of the parameters is available at our interactive Enricher API Swagger page.

To test the API on that page, make sure you authenticate your API key by completing steps 1 to 5.

Age and Gender

The age and gender endpoints return age and gender estimates for a given first name and a date in the format yyyy-mm-dd. Based on the number of newborns with each first name per year and the SSA's actuarial tables, we estimate the age distribution at the given date and the most likely gender.

For more details on the calculation, refer to the age and gender section.

Age and gender estimates for given name(s) and reference date(s)

Description

Returns age and gender estimates for given name(s) and reference date(s).

Endpoint

POST /age_gender
curl -X POST "https://data.eventures.vc/enrich/v1/age_gender?apiKey=<API_KEY>" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{"queries":[{"date":"2020-08-01","name":"Joe"}]}'

Parameters

No parameters.

Request body

A JSON object containing names and reference dates.

{
  "queries": [
    {
      "date": "2020-08-01",
      "name": "Joe"
    }
  ]
}

You can also enter multiple dates and names.
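For example, a minimal Python sketch that builds (but does not send) a multi-query request with the standard library, assuming the request format shown above. The helper name age_gender_request and the second query are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://data.eventures.vc/enrich/v1"


def age_gender_request(queries, api_key):
    """Build (but do not send) a POST /age_gender request for (name, date) pairs."""
    body = {"queries": [{"date": day, "name": name} for name, day in queries]}
    query_string = urllib.parse.urlencode({"apiKey": api_key})
    return urllib.request.Request(
        f"{BASE_URL}/age_gender?{query_string}",
        data=json.dumps(body).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


req = age_gender_request([("Joe", "2020-08-01"), ("Alexandria", "2016-11-08")], "<API_KEY>")
# Sending it would be: json.load(urllib.request.urlopen(req))
```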

Response

Status: 200 OK

{
  "results": [
    {
      "age": 60,
      "gender": "m"
    }
  ]
}

Gender and age distribution for a given name and a reference date

Description

Returns gender and age distribution for a given name and a reference date.

Endpoint

GET /age_gender/name/{name}/date/{date}

Note: Replace {name} and {date} with the name and date that you'd like to query for. The format for the date is yyyy-mm-dd.

curl -X GET "https://data.eventures.vc/enrich/v1/age_gender/name/Alexandria/date/2020-08-01?apiKey=<API_KEY>" -H "accept: text/html"

Parameters

Name | Type | Description
--- | --- | ---
name | string | The first name to return the age and gender estimates for.
date | string | The reference date for the estimation, in the format yyyy-mm-dd.
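A small Python sketch for building the GET URL, assuming the path layout shown above. The helper name age_gender_url is hypothetical. Passing a datetime.date guarantees the yyyy-mm-dd format, and quote() escapes characters such as spaces in the name; whether the API accepts multi-word names is not stated here.

```python
from datetime import date
from urllib.parse import quote, urlencode

BASE_URL = "https://data.eventures.vc/enrich/v1"


def age_gender_url(name, day, api_key):
    """Build the GET /age_gender/name/{name}/date/{date} URL for one query."""
    # quote() escapes the path segment; date.isoformat() yields yyyy-mm-dd.
    path = f"/age_gender/name/{quote(name, safe='')}/date/{day.isoformat()}"
    return f"{BASE_URL}{path}?{urlencode({'apiKey': api_key})}"


print(age_gender_url("Alexandria", date(2020, 8, 1), "<API_KEY>"))
# https://data.eventures.vc/enrich/v1/age_gender/name/Alexandria/date/2020-08-01?apiKey=%3CAPI_KEY%3E
```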

Response

Status: 200 OK
Response body: An HTML page displaying the result.

{
  "html": {
    "summary": "Example result",
    "value": "<html><body><ul><li>item 1</li><li>item 2</li></ul></body></html>"
  }
}

Ethnicity

The ethnicity endpoints give the most probable ethnicity and the probability for each ethnic group. The underlying data comes from the 2000 and 2010 Census.

The ethnic groups defined in the Census data are:

  • pct2prace: Percent Non-Hispanic Two or More Races
  • pctaian: Percent Non-Hispanic American Indian and Alaska Native Alone
  • pctapi: Percent Non-Hispanic Asian and Native Hawaiian and Other Pacific Islander Alone
  • pctblack: Percent Non-Hispanic Black or African American Alone
  • pcthispanic: Percent Hispanic or Latino origin
  • pctwhite: Percent Non-Hispanic White Alone

Ethnicities for surnames and years given in a JSON request body

Description

Returns the ethnicities for the surnames and years given in a JSON request body.

Endpoint

POST /ethnicity
curl -X POST "https://data.eventures.vc/enrich/v1/ethnicity?apiKey=<API_KEY>" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{"queries":[{"surname":"cortez","year":2020}]}'

Parameters

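No parameters.

Request body

A JSON object containing surnames and reference years.

{
  "queries": [
    {
      "surname": "cortez",
      "year": 2020
    }
  ]
}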
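For bulk processing, a large list of surnames can be split into several request bodies. A minimal sketch: the batch size of 100 is an assumption (this page does not state a maximum number of queries per request), and the helper name ethnicity_bodies is hypothetical.

```python
import json

# Assumed batch size: the page does not state a per-request query limit.
BATCH_SIZE = 100


def ethnicity_bodies(pairs, batch_size=BATCH_SIZE):
    """Split (surname, year) pairs into JSON bodies for POST /ethnicity."""
    for start in range(0, len(pairs), batch_size):
        chunk = pairs[start:start + batch_size]
        yield json.dumps({"queries": [{"surname": s, "year": y} for s, y in chunk]})


bodies = list(ethnicity_bodies([("cortez", 2020), ("ahmed", 2020), ("smith", 2004)], batch_size=2))
for body in bodies:
    print(body)
```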

Response

Status: 200 OK
Response body: A JSON object with the estimates.

{
  "results": [
    {
      "pct2prace": 0.44,
      "pctaian": 0.29,
      "pctapi": 2.92,
      "pctblack": 0.7,
      "pcthispanic": 89.65,
      "pctwhite": 6,
      "race": "hispanic",
      "surname": "cortez"
    }
  ]
}

Note:

  • "pct2prace": 0.44 means that 0.44 percent of people with the surname cortez are Non-Hispanic of Two or More Races. For more details, please refer to the ethnic group fields.
  • "race": "hispanic" means that Hispanic or Latino is the most frequent group among people with the surname cortez.
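Since the pct* fields are percentages (0 to 100), they can be divided by 100 to use them as probabilities. A minimal sketch that parses the example response above; the helper name to_fractions is hypothetical.

```python
import json

# The example POST /ethnicity response shown above.
RESPONSE = """{"results": [{"pct2prace": 0.44, "pctaian": 0.29, "pctapi": 2.92,
  "pctblack": 0.7, "pcthispanic": 89.65, "pctwhite": 6,
  "race": "hispanic", "surname": "cortez"}]}"""


def to_fractions(record):
    """Convert the pct* percentage fields (0-100) to fractions (0-1)."""
    return {k[len("pct"):]: v / 100 for k, v in record.items() if k.startswith("pct")}


for record in json.loads(RESPONSE)["results"]:
    fractions = to_fractions(record)
    print(record["surname"], record["race"], round(fractions["hispanic"], 4))  # cortez hispanic 0.8965
```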

Ethnicities for a surname and a year

Description

Returns the ethnicities for a surname and a year.

Endpoint

GET /ethnicity/surname/{surname}/year/{year}

Note: Replace {surname} and {year} with the surname and year that you'd like to query for. The format for the year is yyyy.

curl -X GET "https://data.eventures.vc/enrich/v1/ethnicity/surname/ahmed/year/2020?apiKey=<API_KEY>" -H "accept: application/json"

Parameters

Name | Type | Description
--- | --- | ---
surname | string | The surname to return ethnicity estimates for.
year | integer | The reference year, in the format yyyy.

Response

Status: 200 OK
Response body: A JSON object with the estimates.

{
  "pct2prace": 3.96,
  "pctaian": 0.36,
  "pctapi": 56.54,
  "pctblack": 22.02,
  "pcthispanic": 1.44,
  "pctwhite": 15.69,
  "race": "api",
  "surname": "ahmed"
}

Note:

  • "pctapi": 56.54 means that 56.54 percent of people with the surname ahmed are Non-Hispanic Asian and Native Hawaiian and Other Pacific Islander Alone. For more details, please refer to the ethnic group fields.
  • "race": "api" means that Non-Hispanic Asian and Native Hawaiian and Other Pacific Islander is the most frequent group among people with the surname ahmed.
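Note that the two ethnicity endpoints return different shapes: the POST example wraps records in a "results" array, while the GET example returns a single flat object. A small hypothetical helper can normalize both into a list of records, assuming these shapes hold for all responses.

```python
# The example GET /ethnicity response shown above.
AHMED_GET = {"pct2prace": 3.96, "pctaian": 0.36, "pctapi": 56.54,
             "pctblack": 22.02, "pcthispanic": 1.44, "pctwhite": 15.69,
             "race": "api", "surname": "ahmed"}


def as_records(payload):
    """Return a list of ethnicity records from either response shape.

    POST /ethnicity wraps records in {"results": [...]};
    GET /ethnicity/surname/{surname}/year/{year} returns one flat object.
    """
    if "results" in payload:
        return list(payload["results"])
    return [payload]


for record in as_records(AHMED_GET) + as_records({"results": [AHMED_GET]}):
    print(record["surname"], record["race"])
```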