
Scientists rise up against statistical significance



When did you last hear a speaker announce that there was no difference between two groups because the difference was "statistically non-significant"?

If your experience matches ours, there is a good chance this happened at the last talk you attended. We hope that at least someone in the audience was puzzled if, as is often the case, a plot or table showed that there actually was a difference. For generations, researchers have been warned that a statistically non-significant result does not "prove" the null hypothesis (the hypothesis that there is no difference between groups, or no effect of a treatment on some measured outcome) [1].

Nor do statistically significant results "prove" some other hypothesis. Such misconceptions have famously distorted the literature with overstated claims and, less famously, have led to claims of conflicts between studies where none exists.

We have some suggestions to keep scientists from falling prey to these mistakes.

Let's be clear about what must stop: we should never conclude that there is "no difference" or "no association" just because a P value exceeds a threshold such as 0.05 or, equivalently, because a confidence interval includes zero. Neither should we conclude that two studies conflict because one had a statistically significant result and the other did not. These errors waste research effort and misinform policy decisions.

For example, consider a series of analyses of unintended effects of anti-inflammatory drugs [2]. Because their results were statistically non-significant, one set of researchers concluded that exposure to the drugs was "not associated" with new-onset atrial fibrillation (the most common disturbance of heart rhythm) and that the results stood in contrast to those of an earlier study with a statistically significant outcome.

Now consider the actual data. The researchers describing their statistically non-significant results found a risk ratio of 1.2 (that is, a 20% greater risk in exposed patients relative to unexposed ones). They also reported a 95% confidence interval that spanned everything from a 3% decrease in risk to a substantial 48% increase in risk (P = 0.091; our calculation). The researchers from the earlier, statistically significant study found exactly the same risk ratio of 1.2. That study was simply more precise, with an interval spanning a 9% to 33% increase in risk (P = 0.0003; our calculation). It is absurd to conclude that the statistically non-significant results showed "no association" when the interval estimate included serious increases in risk; it is equally absurd to claim that these results were in contrast with the earlier results showing an identical observed effect. Yet these common practices show how reliance on thresholds of statistical significance can mislead us (see "Beware false conclusions").
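To see how these numbers hang together, here is a minimal sketch (ours, not part of the original analyses) that back-calculates the two-sided P values from the reported risk ratios and 95% confidence intervals, assuming a normal approximation on the log risk-ratio scale; the interval limits 0.97–1.48 and 1.09–1.33 correspond to the "3% decrease to 48% increase" and "9% to 33% increase" ranges quoted above.

```python
# Minimal sketch (our illustration): recover an approximate two-sided P value
# for the null risk ratio of 1 from a reported risk ratio and its 95%
# confidence interval, assuming a normal approximation on the log scale.
from math import log

from scipy.stats import norm

def p_from_ratio_and_ci(rr, lo, hi):
    se = (log(hi) - log(lo)) / (2 * 1.96)  # SE of log(RR), backed out of the 95% CI
    z = log(rr) / se                       # distance of the estimate from the null, in SEs
    return 2 * norm.sf(abs(z))             # two-sided P value

# "Non-significant" study: RR 1.2, 95% CI 0.97-1.48 (3% decrease to 48% increase)
print(round(p_from_ratio_and_ci(1.2, 0.97, 1.48), 3))   # -> 0.091
# Earlier "significant" study: RR 1.2, 95% CI 1.09-1.33 (9% to 33% increase)
print(round(p_from_ratio_and_ci(1.2, 1.09, 1.33), 4))   # -> 0.0003
```

Both studies point to the same 20% increase in risk; only the width of the interval, and hence the P value, differs.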


Source: V. Amrhein et al.

Surveys of hundreds of articles have found that statistically non-significant results are interpreted as indicating "no difference" or "no effect" in about half of them (see "Wrong interpretations" and the Supplementary Information).

In 2016, the American Statistical Association released a statement in The American Statistician warning against the misuse of statistical significance and P values. The issue also included many commentaries on the subject. This month, a special issue in the same journal attempts to push these reforms further. It presents more than 40 papers on "Statistical inference in the 21st century: a world beyond P < 0.05". The editors introduce the collection with the caution "don't say 'statistically significant'" [3]. Another article [4], with dozens of signatories, also calls on authors and journal editors to disavow these terms.

We agree, and we call for the entire concept of statistical significance to be abandoned.


Source: V Amrhein et al.

We are far from alone. When we invited others to read a draft of this comment and sign their names if they agreed with our message, 250 did so within the first 24 hours. A week later, we had more than 800 signatories – all checked for an academic affiliation or other indication of present or past work in a field that depends on statistical modelling (see the list of signatories and their tally in the Supplementary Information). They include statisticians, clinical and medical researchers, biologists and psychologists from more than 50 countries and from every continent except Antarctica. One supporter called it "a surgical strike against thoughtless testing of statistical significance" and "an opportunity to register a voice in favour of better scientific practice". We are not calling for a ban on P values. Nor are we saying that they cannot be used as a decision criterion in certain specialized applications (such as determining whether a manufacturing process meets some quality-control standard). And we are also not advocating an anything-goes situation, in which weak evidence suddenly becomes credible. Rather, and in line with many others over the decades, we are calling for a stop to the use of P values in a simple, dichotomous way – to decide whether a result refutes or supports a scientific hypothesis [5].

Quit categorizing

The trouble is human and cognitive more than it is statistical: bucketing results into "statistically significant" and "statistically non-significant" leads people to assume that the items assigned to each category are categorically different [6-8]. The same problems are likely to arise under any proposed statistical alternative that involves dichotomization, whether frequentist, Bayesian or otherwise.

Unfortunately, the false belief that crossing the threshold of statistical significance is enough to show that a result is "real" has led scientists and journal editors to privilege such results, thereby distorting the literature. Statistically significant estimates are biased upwards in magnitude, potentially to a large degree, whereas statistically non-significant estimates are biased downwards. Consequently, any discussion that focuses on estimates chosen for their significance will be biased. On top of this, the rigid focus on statistical significance encourages researchers to choose data and methods that yield statistical significance for some desired (or simply publishable) result, or that yield statistical non-significance for an undesired result, such as potential side effects of drugs.

Pre-registration of studies and a commitment to publishing all results of all analyses can do much to mitigate these issues. However, even results from pre-registered studies can be biased by decisions that invariably remain open in the analysis plan [9]. This happens even with the best of intentions.

Again, we are not advocating a ban on P values, confidence intervals or other statistical measures – only that we should not treat them categorically. That includes dichotomization as statistically significant or not, as well as categorization based on other statistical measures such as Bayes factors.

One reason to avoid such "dichotomania" is that all statistics, including P values and confidence intervals, naturally vary from study to study, and often do so to a surprising degree. In fact, random variation alone can easily lead to large disparities in P values, far beyond falling just on either side of the 0.05 threshold. For example, even if researchers could conduct two perfect replication studies of some genuine effect, each with 80% power (chance) of achieving P < 0.05, it would not be very surprising for one to obtain P < 0.01 and the other P > 0.30. Whether a P value is small or large, caution is warranted.
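A small simulation makes this concrete. The sketch below is our own illustration, not part of the original comment: it assumes the simplest setting, a z-test in which the true effect is about 2.8 standard errors (giving roughly 80% power at the two-sided 0.05 level), and counts how often a pair of perfect replications ends up with one P value below 0.01 and the other above 0.30.

```python
# Simulation sketch (our illustration): how P values scatter across perfect
# replications of a true effect studied with roughly 80% power.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
true_effect_in_se = norm.ppf(0.975) + norm.ppf(0.80)  # ~2.80 -> ~80% power at two-sided 0.05
n_pairs = 100_000

# Each replication observes the true effect plus standard normal noise (in SE units).
z1 = rng.normal(true_effect_in_se, 1.0, n_pairs)
z2 = rng.normal(true_effect_in_se, 1.0, n_pairs)
p1 = 2 * norm.sf(np.abs(z1))   # two-sided P values, first replication
p2 = 2 * norm.sf(np.abs(z2))   # two-sided P values, second replication

power = np.mean(p1 < 0.05)
split = np.mean(((p1 < 0.01) & (p2 > 0.30)) | ((p1 > 0.30) & (p2 < 0.01)))
print(f"empirical power:                   {power:.2f}")   # ~0.80
print(f"one P < 0.01 and the other > 0.30: {split:.3f}")   # a few per cent of pairs
```

Under these assumptions, such a split shows up in a few per cent of replication pairs, so strikingly different P values from two faithful replications of the same true effect are entirely unremarkable.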

We must learn to embrace uncertainty. One practical way to do so is to rename confidence intervals as "compatibility intervals" and to interpret them in a way that avoids overconfidence. Specifically, we recommend that authors describe the practical implications of all values inside the interval, especially the observed effect (or point estimate) and the limits. In doing so, they should remember that all the values between the interval's limits are reasonably compatible with the data, given the statistical assumptions used to compute the interval [7]. Therefore, singling out one particular value (such as the null value) in the interval as "shown" makes no sense.

Frankly, we are tired of seeing such nonsensical "proofs of the null" and claims of non-association in presentations, research articles, reviews and instructional materials. An interval that contains the null value will often also contain non-null values of high practical importance. That said, if you deem all of the values inside the interval to be practically unimportant, you might then be able to say something like "our results are most compatible with no important effect".

When talking about compatibility intervals, bear in mind four things. First, just because the interval gives the values most compatible with the data, given the assumptions, it does not mean that values outside it are incompatible; they are just less compatible. In fact, values just outside the interval do not differ substantively from those just inside it. It is therefore wrong to claim that an interval shows all possible values.

Second, not all values inside the interval are equally compatible with the data, given the assumptions. The point estimate is the most compatible, and values near it are more compatible than those near the limits. This is why we urge authors to discuss the point estimate, even when they have a large P value or a wide interval, as well as discussing the limits of that interval. For example, the authors above could have written: "Like a previous study, our results suggest a 20% increase in the risk of new-onset atrial fibrillation in patients given anti-inflammatory drugs. Nonetheless, a risk difference ranging from a 3% decrease, a small negative association, to a 48% increase, a substantial positive association, is also reasonably compatible with our data, given our assumptions."

Third, like the 0.05 threshold from which it came, the default of 95% used to compute intervals is itself an arbitrary convention. It rests on the false idea that there is a 95% chance that the computed interval itself contains the true value, coupled with the vague feeling that this is a basis for a confident decision. A different level can be justified, depending on the application. And, as in the anti-inflammatory-drugs example, interval estimates can perpetuate the problems of statistical significance when the dichotomization they impose is treated as a scientific standard.
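To illustrate how much of the interval is convention, the sketch below (ours; it reuses the risk-ratio example from earlier under the same normal-approximation assumption) computes intervals for the same estimate at the 90%, 95% and 99% levels: the data and the uncertainty do not change from line to line, only the chosen level does.

```python
# Sketch (our illustration): the same point estimate and standard error give
# different interval limits purely as a function of the chosen level.
from math import exp, log

from scipy.stats import norm

log_rr = log(1.2)   # point estimate on the log risk-ratio scale
se = 0.108          # SE backed out of the reported 95% CI (0.97 to 1.48)

for level in (0.90, 0.95, 0.99):
    z = norm.ppf(0.5 + level / 2)                        # critical value for this level
    lo, hi = exp(log_rr - z * se), exp(log_rr + z * se)
    print(f"{int(level * 100)}% interval for the risk ratio: {lo:.2f} to {hi:.2f}")
```

Nothing about the evidence changes between the three lines; treating any one level as a bright line for "significance" is a choice, not a finding.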

Last, and most important of all, be humble: compatibility assessments hinge on the correctness of the statistical assumptions used to compute the interval. In practice, these assumptions are at best subject to considerable uncertainty [7, 8, 10]. Make these assumptions as clear as possible and test the ones you can, for example by plotting your data and by fitting alternative models, and then report all of the results.

Whatever the statistics show, it is fine to suggest reasons for your results, but discuss a range of potential explanations, not just favoured ones. Inferences should be scientific, and that goes far beyond the merely statistical. Factors such as background evidence, study design, data quality and understanding of underlying mechanisms are often more important than statistical measures such as P values or intervals.

A common objection to retiring statistical significance is that it is needed to make yes-or-no decisions. But for the choices often required in regulatory, policy and business environments, decisions based on the costs, benefits and likelihoods of all potential consequences always beat those made solely on the basis of statistical significance. Moreover, for decisions about whether to pursue a research idea further, there is no simple connection between a P value and the probable results of subsequent studies.

What will retiring statistical significance look like? We hope that methods sections and data tabulations will become more detailed and nuanced. Authors will emphasize their estimates and the uncertainty in them – for example, by explicitly discussing the lower and upper limits of their intervals. They will not rely on significance tests. When P values are reported, they will be given with sensible precision (for example, P = 0.021 or P = 0.13) rather than as binary inequalities (P < 0.05 or P > 0.05), and without adornments meant to denote statistical significance. Decisions to interpret or to publish results will not be based on statistical thresholds. People will spend less time with statistical software and more time thinking.

Our call to retire statistical significance and to use confidence intervals as compatibility intervals is not a panacea. Although it will eliminate many bad practices, it could well introduce new ones. Thus, monitoring the literature for statistical abuses should be an ongoing priority for the scientific community. But eradicating categorization will help to halt overconfident claims, unwarranted declarations of "no difference" and absurd statements about "replication failure" when the results of the original and replication studies are highly compatible. The misuse of statistical significance has done much harm to the scientific community and to those who rely on scientific advice. P values, intervals and other statistical measures all have their place, but it is time for statistical significance to go.

