How our evidence ratings work (updated 2023)
Evidence Ratings in the Prosocial Design Network's library aim to give technologists and researchers an indication of how confident they can be that a design pattern is effective, based on public research.
By effective, we mean that a pattern produces the prosocial outcome it is designed for.
So for a design pattern meant to, say, reduce the spread of misinformation, there would be evidence that it leads users on a social media platform to, indeed, share less misinformation.
It's worth noting that evidence ratings mostly make sense for design patterns intended to change how users engage. Design patterns that implement prosocial policies, such as data privacy protections, may also be more or less effective (in the case of data privacy, for example, more or less susceptible to breaches), but we don't rate those patterns here (at least for now).
Our ratings also apply only to design patterns whose effectiveness has been tested in public research. For the most part, that research comes from scholars who aim to publish in academic journals; occasionally it includes research that platforms share themselves, or that is leaked.
The Ratings
The Process
Evidence ratings are given by PDN's Library Team, whose members meet every two weeks to review individual studies and the design patterns they test. Through discussion we reach consensus on the strength of evidence in each study and on the overall strength of evidence for a design pattern across studies.
Library Team members are social scientists and data scientists trained in causal inference, i.e. in assessing how clearly data indicate that a cause (e.g. a design pattern) leads to an effect (e.g. a prosocial outcome).
The Criteria
For each study, the Library Team considers the following:
The type of research design
In general, the highest ratings go to field experiments, sometimes called randomized controlled trials (RCTs) or A/B tests, which use random assignment to test a design pattern on an operating platform (see the sketch at the end of this subsection).
High ratings also go to "natural experiments" that use platform data, and to experiments conducted in simulated environments in which participants have reason to believe they are interacting on a real platform.
We also review experiments conducted in the lab or as online surveys, as well as observational studies, i.e. studies in which no random assignment occurs.
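To make "random assignment" concrete, here is a minimal sketch in Python of the logic behind an A/B test. Everything in it is hypothetical: the platform, the outcome measure, and the function names are ours for illustration, not taken from any study we review.

```python
import random
import statistics

def run_ab_test(user_ids, measure_outcome, seed=0):
    """Minimal A/B test sketch: randomly assign users to a control arm
    (no design pattern) or a treatment arm (pattern on), then compare
    the average outcome between the two arms.

    `measure_outcome(uid, treated)` is a hypothetical function that
    returns the outcome of interest for one user, e.g. how many
    misinformation posts they shared during the test window.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    control, treatment = [], []
    for uid in user_ids:
        (treatment if rng.random() < 0.5 else control).append(uid)

    control_mean = statistics.mean(measure_outcome(uid, False) for uid in control)
    treatment_mean = statistics.mean(measure_outcome(uid, True) for uid in treatment)

    # Because assignment is random, the two arms are comparable in
    # expectation, so this difference estimates the pattern's effect.
    return treatment_mean - control_mean

# Hypothetical usage: simulated users whose outcome is lower, on
# average, when the design pattern is on.
effect = run_ab_test(
    range(1000),
    lambda uid, treated: random.gauss(1.5 if treated else 2.0, 1.0),
)
print(f"estimated effect of the pattern: {effect:.2f}")
```

Because only chance decides who sees the pattern, a difference in average outcomes can be attributed to the pattern itself rather than to pre-existing differences between groups; this is why field experiments earn the highest ratings.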
The strength of research design
We consider how well the study is designed to pinpoint the effectiveness of a design pattern and to rule out alternative explanations for its findings.
The strength of statistical findings
Closely related to research design, we consider the statistical analysis used in the study and the overall strength of the findings.
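As a rough illustration of what "strength of findings" means in practice, the sketch below estimates an effect size with a confidence interval: a precisely estimated effect (a narrow interval well away from zero) counts as stronger evidence than a noisy one. The data and function are hypothetical, and real reviews weigh the full analysis a study reports, not a single statistic.

```python
import math
import statistics

def effect_with_ci(control, treatment, z=1.96):
    """Estimate the difference in mean outcomes between the two arms
    of an experiment, with an approximate 95% confidence interval
    (normal approximation for the difference of two means)."""
    diff = statistics.mean(treatment) - statistics.mean(control)
    # Standard error of the difference between two independent means.
    se = math.sqrt(
        statistics.variance(control) / len(control)
        + statistics.variance(treatment) / len(treatment)
    )
    return diff, (diff - z * se, diff + z * se)

# Hypothetical outcome data: misinformation shares per user.
control = [3, 5, 4, 6, 2, 5, 4, 3, 5, 4]
treatment = [2, 4, 3, 3, 1, 4, 2, 3, 3, 2]
diff, (lo, hi) = effect_with_ci(control, treatment)
print(f"estimated effect: {diff:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```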
Finally, for each design pattern, the Library Team reaches consensus on the appropriate evidence rating by considering both the overall strength of evidence across studies and whether that evidence comes from studies conducted by multiple research teams.