Data distributions in the human services, as they relate to regulatory compliance, are generally very skewed (negatively skewed, with a long tail toward lower compliance). This means that the majority of facilities being assessed or inspected will usually fall very close to the 100% compliance level. There will also be an equally large number of facilities in substantial regulatory compliance (98%–99% compliance levels). And then there are far fewer facilities at either a mid or low level of regulatory compliance (97% or lower). One might say that a score of 97% on anything doesn't sound mediocre or low, but keep in mind that we are addressing basic health and safety rules, not quality standards. Having several health and safety rules out of compliance is a big deal when it comes to risk assessment. It could be argued that a state licensing agency that allows programs to operate with that level of regulatory non-compliance is not upholding its gatekeeper function.
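To make "very skewed" concrete, here is a small sketch that computes the mean, median, and Fisher-Pearson skewness coefficient for a hypothetical set of 100 facilities shaped like the distribution just described. The counts are invented for illustration and are not real inspection data. A negative skewness, with the mean falling below the median, is the signature of a long tail toward lower compliance.

```python
import statistics

# Hypothetical compliance scores (percent) for 100 facilities: illustration only.
counts = {100: 40, 99: 20, 98: 20, 97: 8, 95: 6, 92: 4, 88: 2}
scores = [score for score, k in counts.items() for _ in range(k)]

def skewness(xs):
    """Fisher-Pearson coefficient of skewness, g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n   # third central moment
    return m3 / m2 ** 1.5

print("mean:", statistics.mean(scores))
print("median:", statistics.median(scores))
print("skewness:", round(skewness(scores), 2))
```

With these invented counts the mean (98.3) sits below the median (99), and the skewness is negative.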
Why is the regulatory compliance data distribution important from a statistical point of view? Generally, when we are dealing with social science data, the data are normally distributed or pretty close to it. This is, for example, a trademark of a well-designed assessment tool. So when such data are compared with other normally distributed data, there is a good chance that some form of linear relationship will be found: in many cases it does not reach statistical significance, but it is linear regardless.
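The point that a relationship can be linear yet fail to reach statistical significance can be illustrated with the standard t test for a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sketch below uses a hypothetical r of 0.20 at two invented sample sizes; the function name `t_for_r` is made up for this example, and the cutoff of 2.0 is only an approximation of the two-tailed .05 critical value for samples of this size.

```python
import math

def t_for_r(r, n):
    """t statistic for testing whether a Pearson r differs from zero (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical: the same modest linear relationship (r = 0.20) at two sample sizes.
for n in (30, 300):
    t = t_for_r(0.20, n)
    # |t| > 2.0 roughly approximates p < .05 (two-tailed) for these sample sizes.
    print(n, round(t, 2), "significant" if abs(t) > 2.0 else "not significant")
# 30 1.08 not significant
# 300 3.52 significant
```

The slope of the relationship is identical in both cases; only the sample size decides whether it clears the significance threshold.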
When one of the variables has a very skewed data distribution, as is the case with regulatory compliance data, and it is compared with a normally distributed data set such as a program quality tool (ERS or CLASS), the result is generally a non-linear relationship with a marked ceiling or plateau effect. In other words, the relationship is more curvilinear than linear. From a practical standpoint, this creates a selection problem: full regulatory compliance cannot be used to identify the best programs. This can create a public policy nightmare, in that programs in substantial but not full regulatory compliance are as good as, or in some cases of higher quality than, programs in full regulatory compliance. The interesting question is: does combining a normally distributed variable with a variable that has a skewed distribution always produce this non-linear result?
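A minimal simulation can illustrate the ceiling effect. It assumes, hypothetically and for illustration only, that quality is normally distributed and that compliance is a noisy, increasing function of quality that cannot exceed 100%; the function names and every numeric parameter are invented for this sketch. It shows only the plateau mechanism: mean compliance climbs steeply across the lower quality quintiles and flattens near 100% across the upper ones, so compliance alone cannot separate the best programs. It does not model the observation that programs in substantial compliance can outscore fully compliant ones.

```python
import random
import statistics

def simulate_programs(n=5000, seed=1):
    """Return hypothetical (quality, compliance) pairs for n programs.

    Assumptions (illustration only): quality is normally distributed on an
    arbitrary scale, and compliance rises with quality plus noise but is
    capped at 100%.
    """
    rng = random.Random(seed)
    programs = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        quality = 4.0 + z                        # normally distributed quality score
        latent = 99.0 + 1.5 * z + rng.gauss(0.0, 1.0)
        compliance = min(100.0, latent)          # the ceiling: no score above 100%
        programs.append((quality, compliance))
    return programs

def mean_compliance_by_quality_quintile(programs):
    """Average compliance within each fifth of programs ranked by quality."""
    ranked = sorted(programs)                    # tuples sort by quality first
    size = len(ranked) // 5
    return [statistics.mean(c for _, c in ranked[i * size:(i + 1) * size])
            for i in range(5)]

programs = simulate_programs()
means = mean_compliance_by_quality_quintile(programs)
share_full = sum(c == 100.0 for _, c in programs) / len(programs)

print("share at 100%:", round(share_full, 2))
print("mean compliance by quality quintile:", [round(m, 2) for m in means])
```

Under these assumptions, the gap in mean compliance between the two lowest quality quintiles should be several times the gap between the two highest; the exact figures depend entirely on the invented parameters.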
And lastly, will two variables that both have skewed data distributions produce a more random result than either of the two conditions above (both variables normally distributed, or one normal and one skewed)?