The Growth in GWAS

Melinda C. Mills & Charles Rahal (2019): A scientometric review of genome-wide association studies

This scientometric review of genome-wide association studies (GWAS) from 2005 to 2018 (3639 studies; 3508 traits) reveals extraordinary increases in sample sizes, rates of discovery and traits studied. A longitudinal examination shows fluctuating ancestral diversity, still predominantly European Ancestry (88% in 2017) with 72% of discoveries from participants recruited from three countries (US, UK, Iceland). US agencies, primarily NIH, fund 85% and women are less often senior authors. We generate a unique GWAS H-Index and reveal a tight social network of prominent authors and frequently used data sets. We conclude with 10 evidence-based policy recommendations for scientists, research bodies, funders, and editors.

Many more GWAS have been carried out over the years, with much larger sample sizes.

Japan’s prominence is perhaps surprising.

Comments

  1. Spisarevski says

    Japan’s prominence is perhaps surprising.

    Not surprising at all, Japan is destined to bring genetically engineered catgirls to the world.

  2. Peter Johnson says

    The wonderful thing about GWAS from an alt-right perspective is that, although it is never a funded objective of the big research projects, the researchers cannot help finding overwhelming evidence for human biodiversity (HBD). So HBD gets free-ride confirmation on the back of the broader GWAS research effort. As the researchers push to expand GWAS into African DNA samples they are bound to stumble upon endless evidence for HBD between the African and European/Asian genomes. So the indirect evidence for HBD will continue to accumulate.

  3. prime noticer says

    cue post from utu claiming it is all a wash and practically useless.

  4. Philip Owen says

    Evil Brits.

  5. I believe that will be handled predominantly through cybernetic wearables.

  6. What I find amusing is how commercial DNA testing companies are automatically pushing HBD, just by virtue of existing, for instance, 23andme.

    They wanted to be a health database company, but in order to get people into their database, they had to go into a verboten topic – race and ancestry. And their original ancestry breakdown was too focused on Europeans, so that upset the SWJs, so they expanded it into more African groups, for instance. As well as changing the chip to have more SNPs that are uncommon in whites.

    Competitive forces mean they try to scrounge the most information possible out of their database. What are they doing now? They are encouraging their users to do reaction speed tests!

  7. Do not take utu’s name in vain. Unlike other supreme entities utu responds to his name invocation.

    Human activities that call themselves studies usually are not science. See for example

    Gender studies
    https://en.wikipedia.org/wiki/Gender_studies

    Social studies
    https://en.wikipedia.org/wiki/Social_studies

    Why did they call it genome-wide association studies? Whoever came up with the name was honest and insightful and understood the epistemic limitations of this activity. Unfortunately the same can't be said of all practitioners of GWAS, so there is plenty of room for pseudoscience, pathological science, junk science, or even fraudulent science. Fraudulent science is when you cheat. Does it happen in GWAS? Junk science is when the results are meaningless. Pathological science is when you fool yourself. It is close to junk science because you go through all the right motions and produce meaningless results that on the surface seem methodologically valid, which leads to fooling yourself, fooling reviewers, and fooling dilettante fanboys including A. Karlin. And pseudoscience is when you spend more effort on confirming hypotheses that you like than on trying to disprove them. Then you inevitably end up with more hits confirming your hypotheses than otherwise. This bias is systemic.

    Why do GWAStudies fall into the category of pseudo/pathological/junk science? Because they deal with a severely underdetermined problem that has a virtually infinite number of solutions. The number of potential causes in terms of gene combinations is many orders of magnitude larger than the amount of data you may have. For a given sample size it is almost impossible to come up with a sequence of numbers that could not be associated with some gene combination. The result is that any trait you can come up with will end up being positively associated, which confirms the hypothesis. Keep in mind that the hypotheses are posed by those who believe in genetic determinism, and only they engage in this activity of confirming them. This is the systemic bias. So you have a combination of bias and mathematical inevitability. It is a deadly combination for fooling yourself.

    For instance you can find genes associated with BDP (Booger Disposal Preference). You divide your sample into the flickers (they flick their boogers), the eaters (they eat their boogers) and the stickers (they stick their boogers under the desk or chair). And you will find a gene combination that is associated with BDP. But the underdeterminacy of the system means that it is much worse: you can divide any sample randomly into three subsets and you will find gene combinations that associate with any particular division into subsets.

    Why is it so? Because there are some 10 million SNPs, meaning there are 2^10,000,000 possible combinations. Actually there are many more if we include alleles with frequencies lower than 1%.

    So now ask yourself: should we be surprised that “extraordinary increases in discovery and traits studied” are occurring? How many traits like Booger Disposal Preference are among them? The field is very fruitful. Just put money into it and it will bear fruit.
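The underdetermination argument in the comment above can be illustrated with a toy simulation (the sample size and SNP count here are invented for illustration, not taken from any study): regress a purely random trait on purely random genotypes and count how many markers clear a nominal significance threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_snps = 500, 5_000

# Purely random "genotypes" (0/1/2 minor-allele counts) and a purely random
# trait: by construction there is no real association anywhere.
G = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)
y = rng.normal(size=n_samples)

# Per-SNP correlation with the trait, computed in one matrix product
Gs = (G - G.mean(axis=0)) / G.std(axis=0)
ys = (y - y.mean()) / y.std()
r = Gs.T @ ys / n_samples

# Under the null, z = r * sqrt(n) is approximately standard normal;
# |z| > 1.96 corresponds to two-sided p < 0.05.
hits = int((np.abs(r) * np.sqrt(n_samples) > 1.96).sum())
print(f"{hits} of {n_snps} pure-noise SNPs pass p < 0.05")
```

With five thousand null markers, roughly five percent pass a nominal p < 0.05 by chance alone, which is why genome-wide analyses cannot use ordinary single-test significance thresholds.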

  8. Dacian Julien Soros says

    To restate what utu said, the issue is a variant of the multiple comparison problem. It's on Wikipedia; do read about it. It boils down to the fact that every time you do a comparison using a statistic such as the t test, you are accepting a small chance of a type I error (that is, of claiming your measurements found a true difference when in fact it was only an accident of randomness). When you take that small risk millions of times, an error is essentially guaranteed. All the GWAS studies find some crap correlation, just like the fMRI studies did.

    Also, to restate what I have said: GWAS did not find anything useful, ever, because any condition or disease that depends on one or two places in the genome was already detected by linkage analysis and old-school genetic markers. This is the same silly discussion I imagine having about CRISPR, in that there's nothing CRISPR could achieve that hasn't already been done through zinc fingers, TALENs, and Cre/lox. Just because Elizabeth Holmes-like fraudsters talk crap about CRISPR on Bloomberg, there is no reason to imagine any of it is true.

    I am open to examples of discoveries that have been made possible by CRISPR and GWAS. But, as I said, I am imagining a conversation that could be had, while fanbois simply ignore whatever does not fit their narrative.

    I am more hopeful for analysis of RNA and/or protein, but we have been able to analyze DNA for half a century now, so there's little left to find, namely strong causes of very rare conditions, or very small causal contributions to the more common conditions. In other words, essentially nothing.
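The multiple-comparison point made above comes down to one line of arithmetic: the probability of at least one false positive grows rapidly with the number of independent tests. A minimal sketch in standard Python, with illustrative test counts:

```python
# P(at least one type I error) among m independent tests, each at level alpha
def family_wise_error(alpha: float, m: int) -> float:
    return 1 - (1 - alpha) ** m

for m in (1, 10, 1_000, 1_000_000):
    print(f"{m:>9} tests at alpha=0.05 -> FWER = {family_wise_error(0.05, m):.4f}")
```

At a million tests the family-wise error rate is indistinguishable from 1: some "significant" findings are guaranteed even when nothing is there, which is exactly the worry about naive thresholds.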

  9. Peter Johnson says

    In the most gentlemanly way possible I must inform you that you are seriously deluded about this. GWAS researchers treat the data-mining problem with extreme care, using critical values based on confidence levels of 1 – 10^6 to find significant coefficients. Then to publish they have to get it past two external reviewers and an editor, who all worry endlessly about data mining and require extreme care regarding it. My suggestion is that you take a very long walk (approx. 1-2 hours) and think hard about the possibility that you are deluding yourself on this. Here is a question you should pose for yourself as you reconsider your theory: if a given published GWAS study with 11% explained variance were repeated (using the same pre-estimated coefficients) on a completely fresh sample, what explained variance would emerge? The correct answer is approximately 11%.

  10. Here is a question you should pose for yourself as you reconsider your theory: if a given published GWAS study with 11% explained variance were repeated (using the same pre-estimated coefficients) on a completely fresh sample, what explained variance would emerge? The correct answer is approximately 11%.

    You need assurance that you really have “a completely fresh sample.” How do you know that the completely fresh sample was not part of the original exploratory sample? The methodology of exploratory-validation samples is not foolproof and it can be sabotaged. Imagine that you have all 8 billion people genotyped and you find a polygenic score that correlates with some trait and explains, say, 11% of variance. To validate this result you would need to wait until the validation sample is born. But until then you will not be able to prove whether the correlation you found is spurious or not within the exploratory-validation scheme.

    “confidence levels of 1 – 10^6 to find significant coefficients” – This is meaningless. Confidence levels do not guard you against overfitting and spurious correlations.
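The overfitting worry in this exchange is easy to demonstrate with a toy exploratory/validation split (a sketch with invented numbers, not a claim about any published study): select the most trait-correlated markers in one half of a null dataset, build a naive score from them, and see how its explained variance holds up in the other half.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 400, 5_000
G = rng.integers(0, 3, size=(n, m)).astype(float)  # pure-noise genotypes
y = rng.normal(size=n)                             # trait with zero genetic signal

Gs = (G - G.mean(axis=0)) / G.std(axis=0)
train, test = slice(0, n // 2), slice(n // 2, n)

# "Discovery": correlations in the training half only; keep the top 50 markers
r_train = Gs[train].T @ (y[train] - y[train].mean()) / (n // 2)
top = np.argsort(np.abs(r_train))[-50:]

# Naive polygenic score: correlation-weighted sum of the selected markers
score = Gs[:, top] @ r_train[top]

def r2(a, b):
    return float(np.corrcoef(a, b)[0, 1] ** 2)

in_sample = r2(score[train], y[train])    # inflated by the selection step
out_sample = r2(score[test], y[test])     # collapses on fresh data
print(f"in-sample R^2 = {in_sample:.3f}, out-of-sample R^2 = {out_sample:.3f}")
```

The selection step alone manufactures a large in-sample R² out of pure noise, while the hold-out R² sits at chance level; replication on a genuinely fresh sample is what distinguishes the two cases, which is precisely what the commenters are arguing about.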

  11. What I find amusing is how commercial DNA testing companies are automatically pushing HBD, just by virtue of existing, for instance, 23andme.

    The mechanism is simple. People are naturally susceptible to divination schemes like astrology. There is a good reason why the Bible warned against it:

    Learn not the way of the nations, nor be dismayed at the signs of the heavens because the nations are dismayed at them. – Jeremiah 10:1-2

    You shall be blameless before the Lord your God, for these nations, which you are about to dispossess, listen to fortune-tellers and to diviners. – Deuteronomy 18:10-14

    So do not listen to your prophets, your diviners, your dreamers, your fortune-tellers, or your sorcerers, who are saying to you, ‘You shall not serve the king of Babylon.’ For it is a lie that they are prophesying to you, with the result that you will be removed far from your land, and I will drive you out, and you will perish. – Jeremiah 27:9-10

    For the household gods utter nonsense, and the diviners see lies; they tell false dreams and give empty consolation. Therefore the people wander like sheep; they are afflicted for lack of a shepherd. – Zechariah 10:2

    Or Catholic Church:

    The Catechism of the Catholic Church states, “All forms of divination are to be rejected: recourse to Satan or demons, conjuring up the dead or other practices falsely supposed to ‘unveil’ the future. Consulting horoscopes, astrology, palm reading, interpretation of omens and lots, the phenomena of clairvoyance, and recourse to mediums all conceal a desire for power over time, history, and, in the last analysis, other human beings, as well as a wish to conciliate hidden powers. They contradict the honor, respect, and loving fear that we owe to God alone” (CCC 2116).

    The analogy between astrology and DNA divination is particularly apt. Both also depend heavily on mathematical methods that are formally correct.

    What purpose did “Finding Your Roots” on PBS, hosted by Henry Louis Gates, Jr., serve? Did it target mostly Blacks? For sure it was free advertising (was it really free?) for the DNA ancestry sites. What kind of beliefs did it construct and reinforce?

    One should be careful about ancestry sites like 23andme. To a first approximation their methodology is simple: based on a reference sample they define races using principal component analysis, and then, in the principal-component space, races are defined as clusters. How good this is depends on the reference sample, but the reference samples are often small, and then some companies are really ‘confabulating’ their results. While the methodology is correct, it must be tweaked when the races it is dealing with are more complicated, like the Ashkenazi Jews, who have several lineages (multiple disjoint clusters) that often have less than 50% Middle Eastern admixture, while at the same time the result must reinforce the Zionist dogma of true Jews returning to their homeland and not being impostors. A good illustration is Bennett Greenspan, who founded Family Tree DNA and whose chief concern, apart from making money, seems to be proving that Jews are Jews:

    http://www.avotaynuonline.com/2015/06/genetic-census-of-the-jewish-people/

    The urgency of our work is magnified by the fact that the legitimacy of the Jewish people and its claim to our ancestral home is currently under constant pseudo-historical attack. The media, particularly on the web, carries regular features from enemies of Israel describing theories to the effect that Ashkenazi Jews have no connection to the land of Israel and are, in fact, European and Central Asian interlopers.

    The Y-chromosome studies demonstrably prove otherwise — a majority of Ashkenazi male lineages are from the Middle East. As the various publicly known DNA test providers have assembled Jewish DNA databases — not just FamilyTreeDNA but my colleagues at 23andMe and Ancestry as well — we have found unmistakable evidence that Ashkenazi Jews are closely related to one another, meaning that from a genetic standpoint, all Jews are indeed part of one genetically united people with ample Middle Eastern and Mediterranean forebears.
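The PCA-and-clusters methodology described in this comment can be sketched in a few lines; the two "populations" and their allele frequencies below are entirely invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_per_pop, n_snps = 100, 1_000

# Two toy populations whose allele frequencies differ modestly per SNP
f_a = rng.uniform(0.1, 0.9, n_snps)
f_b = np.clip(f_a + rng.normal(0.0, 0.1, n_snps), 0.01, 0.99)

# Genotypes: 0/1/2 copies of the allele, drawn binomially per individual
G = np.vstack([
    rng.binomial(2, f_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, f_b, size=(n_per_pop, n_snps)),
]).astype(float)

# PCA via SVD of the centered genotype matrix; PC1 is the top axis of variation
U, S, _ = np.linalg.svd(G - G.mean(axis=0), full_matrices=False)
pc1 = U[:, 0] * S[0]

mean_a, mean_b = pc1[:n_per_pop].mean(), pc1[n_per_pop:].mean()
print(f"population A mean PC1: {mean_a:+.2f}")
print(f"population B mean PC1: {mean_b:+.2f}")
```

Even modest per-SNP frequency differences put the two groups in separate clusters along PC1; how well this works on real data depends, as the comment says, on the size and composition of the reference sample.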

  12. Peter Johnson says

    For exact clarity note there is a small typo in my last comment it should be 1 – 10^(-6) = 99.9999% rather than 1 – 10^6. Hopefully no one was confused by that.

  13. Dacian Julien Soros says

    10^-6 is too lenient. In any other case, a Bonferroni correction would be the only acceptable threshold, and that would take your needed p value to 0.05/10^6, for the simple case where one million SNPs are assumed to be independent. But they are not independent, so thresholds should be more like 10^-20.

  14. Peter Johnson says

    Using 1 – 10^(-6) is a Bonferroni correction to preserve 95% confidence in the presence of multiple tests; in the case of 1 million independent statistics there is a 5% probability of finding at least one statistic exceeding this boundary. Lack of independence always decreases, rather than increases, the critical value. So the Bonferroni-corrected critical value will be smaller, not larger, if one allows for dependence. The biggest correction to the critical value occurs when the statistics are independent.
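The arithmetic disputed in this exchange can be checked directly. A minimal sketch, assuming one million independent tests, comparing the family-wise error rate at a per-test level of 10^-6 with the Bonferroni level 0.05/10^6:

```python
m = 1_000_000  # independent tests, as assumed in the thread

def fwer(alpha_per_test: float, m: int) -> float:
    """Probability of at least one false positive among m independent tests."""
    return 1 - (1 - alpha_per_test) ** m

per_test = fwer(1e-6, m)        # per-test confidence 1 - 10^-6
bonferroni = fwer(0.05 / m, m)  # Bonferroni level targeting 5% family-wise error

print(f"per-test alpha 1e-6 : FWER = {per_test:.3f}")
print(f"Bonferroni 5e-8     : FWER = {bonferroni:.3f}")
```

Under independence, a per-test level of 10^-6 leaves roughly a 63% chance of at least one false hit across a million tests, while 5x10^-8 (the conventional genome-wide significance threshold) brings that down to about 5%; correlation between SNPs makes these figures conservative.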

  15. I agree that their ancestry models must be taken with a grain of salt. I think they should be more open about how they are calculated.

    Ashkenazi ancestry is probably overstated, along with certain other factors. Possibly even sub-Saharan. If they released more info, it could be scrutinized mathematically. I think they also tend to turn people into hodgepodges, somewhat unrealistically. Someone in Japan should probably be considered Japanese rather than a hodgepodge of Chinese and Korean, or however they denote it.

  16. silviosilver says

    https://eginotes.wordpress.com/2019/01/26/test-follies-2019/

    This guy is a far-right geneticist whose plan for determining inclusion/exclusion in a white ethnostate is simply to test everyone, and who claims in the linked post to have lost all faith in the ancestry info provided by testing companies. His take might be of value to people interested in this sort of thing. (He is one of the most disagreeable SOBs I've ever run into on the internet, and he truly has it in for me, but I prefer to cautiously support anyone whose heart is in the right place and ignore or laugh at the dailystormer-like rant-n-raving his blog is replete with.)

  17. Interesting link. I wish more comparisons were done and publicized.

    I’ve finally lost all confidence in the commercially available “state of the art.”

    The guy still believes, however, that a “state of the art” does exist somewhere.