About Sleuth’s Data

Sleuth’s data comes from two sources:

  1. Contributions through the Sleuth website or app by parents who opt in to sharing
  2. Surveys of parents in the general U.S. population

All parents who contribute to Sleuth through any avenue must consent to our use of their answers (and may withdraw that consent at any time). We provide small payments to parents who provide survey responses on important topics. Even among paid respondents, we often get more generous and thorough responses than we request.

Who are Sleuth’s parent contributors?

Sleuth’s contributors broadly reflect the geographic, ethnic/racial, and income mix of the U.S. As of this writing (February 28, 2023):

| | Sleuth % | U.S. Population |
|---|---|---|
| Household Income (source: Wikipedia) | | |
| < $17,500 | 21% | 11% |
| $17,500 - $37,500 | 21% | 16% |
| $37,500 - $62,500 | 18% | 18% |
| $62,500 - $112,500 | 24% | 24% |
| $112,500+ | 16% | 24% |
| Parent Race and Ethnicity | | |
| White (including those identifying as Hispanic) | 75% | 75% |
| Hispanic or Latino | 16% | 19% |
| Black or African American | 13% | 12% |
| Asian | 3% | 7% |
| American Indian or Alaska Native | 1% | 2% |
| Pacific Islander | 0.4% | 0.5% |
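
One simple way to read the household income rows is as percentage-point gaps between Sleuth’s share and the U.S. share. The sketch below only restates the figures from the table; the function name `point_gaps` and the dictionary layout are illustrative assumptions, not part of Sleuth’s tooling.

```python
# Shares copied from the household income rows of the table, in percent.
# Each bracket maps to (Sleuth share, U.S. population share).
income_shares = {
    "< $17,500": (21, 11),
    "$17,500 - $37,500": (21, 16),
    "$37,500 - $62,500": (18, 18),
    "$62,500 - $112,500": (24, 24),
    "$112,500+": (16, 24),
}


def point_gaps(shares):
    """Return Sleuth share minus U.S. share, in percentage points, per group."""
    return {group: sleuth - us for group, (sleuth, us) in shares.items()}


gaps = point_gaps(income_shares)
for group, gap in gaps.items():
    # A positive gap means the group is over-represented on Sleuth.
    print(f"{group:<20} {gap:+d} points")
```

A positive value means a bracket makes up a larger share of Sleuth’s contributors than of the U.S. population, and a negative value means a smaller share.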

There is one notable exception to Sleuth’s balanced representation. Because moms are more likely than dads to contribute to Sleuth, we do not have equal representation of men and women among contributors. (Please, men: You are welcome to change this!) This does have some effects. Research shows that moms and dads describe their kids a bit differently[1]. But these differences shouldn’t affect how you interpret 99% of the content on our platform.

Male children are slightly over-represented in our data:

| | Sleuth % | U.S. Population |
|---|---|---|
| Child Gender | | |
| Male | 54% | 51% |
| Female | 45% | 49% |
| Non-binary | 0.2% | NA |
| Prefer not to say | 0.4% | |

And the children represented on Sleuth tend toward older ages:

[Figure: Age of Children Represented on Sleuth]

How do we prepare data?

Sleuth’s data gets put through the wringer. We start with the raw text that parents enter into surveys, which is noisy content! For example, many parents are asked to pick and describe a condition their child has experienced, describe how they responded (such as consulting an expert), and describe the long-term outcomes.

Because of our open-ended methods, we cut more data than most academic research processes. At the same time, we are able to handle more varied and complex content.

Our aim is to focus on informativeness and fidelity in the topics we cover, and to systematically exclude material we don’t yet have the sophistication to handle. Our approach is the unique product of our team’s 20+ years of survey research experience and 40+ years of work with Artificial Intelligence.

Here are steps Sleuth takes:

  1. Sleuth’s surveys include questions to check that participants are paying close attention. These checks are stringent, and cut about 10% of all participants.
  2. We drop data that triggers certain content flags or contains ambiguous wording. “Unsafe” data is surprisingly rare (about 0.1% of the content) and often relates to controversial topics that we hope to address thoughtfully in the future.
  3. We use natural language processing and Artificial Intelligence to classify parents’ responses into common themes. This process starts with GPT-3’s text embeddings from OpenAI and substantially improves on them.
  4. All classifications of new forms of parents’ text are checked for accuracy by (beleaguered) humans with training in Sleuth’s 500+ categories of information.
  5. For our quizzes and some library content, we run follow-up surveys, cluster analysis, and analytical tests to confirm parents’ observations and provide accurate benchmarks for them.
  6. Our methods and summary statistics of Sleuth’s content are reviewed by medical, data, and domain experts, and checked for conflicts with existing knowledge.
  7. We work with UX and UI experts to experiment with ways to represent Sleuth’s information safely and clearly. There are dozens of additional steps we will take to share Sleuth’s data in 2023.

This list does not include Sleuth’s controls for data safety and data privacy (as per Sleuth’s Privacy Policy).
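
As a rough illustration of how steps 1 through 3 fit together, here is a minimal sketch, not Sleuth’s actual pipeline. It assumes each response already carries the outcome of its attention checks and content flags, and it stands in for theme classification with a simple nearest-centroid match on cosine similarity. The `Response` fields, the theme names, and the two-dimensional toy vectors are all hypothetical; real text embeddings have many more dimensions.

```python
import math
from dataclasses import dataclass


@dataclass
class Response:
    text: str
    passed_attention_checks: bool  # hypothetical field: outcome of step 1
    flagged: bool                  # hypothetical field: outcome of step 2
    embedding: list                # hypothetical: a text embedding vector


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(embedding, theme_centroids):
    """Return the theme whose centroid is most similar to the embedding."""
    return max(theme_centroids,
               key=lambda name: cosine(embedding, theme_centroids[name]))


def prepare(responses, theme_centroids):
    """Drop failed or flagged responses, then label the rest with a theme."""
    kept = [r for r in responses
            if r.passed_attention_checks and not r.flagged]
    return [(r.text, classify(r.embedding, theme_centroids)) for r in kept]


# Toy data: two made-up themes and four made-up responses.
theme_centroids = {"sleep": [1.0, 0.0], "feeding": [0.0, 1.0]}
responses = [
    Response("Wakes up several times a night", True, False, [0.9, 0.1]),
    Response("asdf", False, False, [0.5, 0.5]),              # fails step 1
    Response("Refuses solid food", True, False, [0.2, 0.8]),
    Response("(flagged content)", True, True, [0.1, 0.9]),   # fails step 2
]

labeled = prepare(responses, theme_centroids)
print(labeled)
```

In this sketch the filters run before classification, so dropped responses are never labeled. The human review in step 4 would then apply to the labels that `prepare` returns.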

Sleuth’s methods are constantly subject to review and improvement. Sleuth is a long-term program to build the most reliable resource for children’s health and development data on the Internet. Our goal is to be rigorous and thorough, but please bear with us through any hiccups in our early days.

References

  1. Ryan, Gery, and Thomas Weisner. "Analyzing words in brief descriptions: Fathers and mothers describe their children." CAM Journal 8.3 (1996): 13-16.