Covid-19 comorbidity – Home for a flying bioinformatician

A recent[1] study investigated susceptibility to Covid-19 using a GWAS (genome-wide association study). It included 835 patients and 1255 population-derived controls from Italy, as well as 775 patients and 950 controls from Spain. In short, two genomic variants were identified as associated with Covid-19 susceptibility: rs11385942 and rs657152.

This article is mainly about checking for those variants in my own whole-genome sequencing (WGS) data from a sequencing provider, here Dantelabs. Four BAM files and five VCF files from WGS are available: every BAM file has a corresponding VCF file, and one VCF file has no corresponding BAM file. There is also one 23andme report, matching one WGS dataset and one VCF file.

To check for the variants rs11385942 and rs657152 I used a database I set up earlier, but a quick grep command on the VCF files from Dantelabs, once annotated, would have done the same job. Well, the annotation itself is not so quick: unfortunately Dantelabs does not provide annotated VCF files, otherwise there wouldn’t be many other services left to sell, I guess. BTW, I recently received an email offering a check for Covid-19 comorbidity, and of course that’s not free. It wouldn’t be so easy to sell that service if they gave out annotated VCF files.
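
For illustration, the grep-style check could look roughly like the following in Python. This is only a sketch, not what I actually ran: it assumes an uncompressed, tab-separated VCF where the rsID sits either in the ID column or as a value in the INFO column (for example avsnp147=rs657152 after annotation). The function name find_rsids and the two example lines, including their positions, are made up.

```python
def find_rsids(vcf_lines, rsids):
    """Yield (rsid, fields) for VCF data lines that mention one of the rsIDs."""
    wanted = set(rsids)
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF data line
        # rsIDs may be listed in the ID column (column 3) ...
        ids = set(fields[2].split(";"))
        # ... or appear as a value in an annotated INFO column (column 8)
        for entry in fields[7].split(";"):
            _key, _, value = entry.partition("=")
            ids.update(value.split(","))
        for rsid in sorted(wanted & ids):
            yield rsid, fields


# Made-up example lines (placeholder positions, not real coordinates)
example_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n",
    "chr9\t100\t.\tC\tA\t50\tPASS\tGene.refGene=ABO;avsnp147=rs657152\tGT:AD:DP\t0/1:22,17:39\n",
    "chr1\t200\trs123\tG\tT\t50\tPASS\t.\tGT\t0/1\n",
]

for rsid, fields in find_rsids(example_lines, ["rs11385942", "rs657152"]):
    print(rsid, fields[0], fields[1], fields[3], fields[4])
```

For a real file the lines would come from open(...), or gzip.open(..., "rt") for a compressed VCF, instead of the example list.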

Dantelabs commercial offer for checking Covid-19 associated SNPs

Back to the test: querying for rs11385942 and rs657152 gives the following:

personal_genomes=> SELECT "dataset_id", "Gene.refGene", "Ref", "Alt", "avsnp147", "GT", "AD_1", "AD_2", "DP" FROM snp WHERE avsnp147 = 'rs11385942';
 dataset_id | Gene.refGene | Ref | Alt | avsnp147 | GT | AD_1 | AD_2 | DP 
------------+--------------+-----+-----+----------+----+------+------+---
(0 rows)

That query returned no rows at all. Ok. What about rs657152?

personal_genomes=> SELECT "dataset_id", "Gene.refGene", "Ref", "Alt", "avsnp147", "GT", "AD_1", "AD_2", "DP" FROM snp WHERE avsnp147 = 'rs657152';
  
 dataset_id | Gene.refGene | Ref | Alt | avsnp147 | GT  | AD_1 | AD_2 | DP 
------------+--------------+-----+-----+----------+-----+------+------+----
          0 | ABO          | C   | A   | rs657152 | 0/1 |   22 |   17 | 39
          5 | ABO          | C   | A   | rs657152 | 1/1 |    0 |   18 | 18
          1 | ABO          | C   | A   | rs657152 | 0/1 |   11 |   21 | 32
(3 rows)

I should briefly explain what’s happening here. I annotated the VCF files from Dantelabs and imported them into a database. The query asks for the following attributes:

dataset_id (internal)
Gene.refGene (the gene name)
Ref (base as in the reference genome)
Alt (alternate base, as defined by the SNP identifier, rsid)
avsnp147 (the SNP identifier)
GT (genotype)
AD_1 (number of reads supporting reference base)
AD_2 (number of reads supporting the SNP)
DP (total coverage at that position)
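
To make the AD_1, AD_2 and DP columns a bit more tangible, here is a small sketch that computes the fraction of reads supporting Alt for the three rows above and checks that the two allele depths add up to DP. The function name alt_allele_fraction is made up for this example, and the numbers are simply copied from the query result.

```python
def alt_allele_fraction(ad_ref, ad_alt):
    """Fraction of reads supporting Alt, or None if there are no reads."""
    total = ad_ref + ad_alt
    if total == 0:
        return None
    return ad_alt / total


# (dataset_id, GT, AD_1, AD_2, DP) copied from the rs657152 query result
rows = [
    (0, "0/1", 22, 17, 39),
    (5, "1/1", 0, 18, 18),
    (1, "0/1", 11, 21, 32),
]

for dataset_id, gt, ad_ref, ad_alt, dp in rows:
    fraction = alt_allele_fraction(ad_ref, ad_alt)
    depth_ok = (ad_ref + ad_alt == dp)
    print(f"dataset {dataset_id}: GT={gt} alt fraction={fraction:.2f} AD sums to DP: {depth_ok}")
```

Roughly speaking, a heterozygous call should show a mix of Ref and Alt reads, while a homozygous Alt call should show almost only Alt reads.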

Ok, so we found something here. For datasets 0 and 1, rs657152 is heterozygous, while for dataset 5 it is homozygous. As 0 is the offspring of 5, one copy of the rs657152 Alt allele was passed on to 0.
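
The parent and offspring relation can be written down as a tiny consistency check: a child has to share at least one allele with each parent, so with a 1/1 parent the child must carry at least one Alt allele. A minimal sketch, with made-up function names, assuming diploid GT strings such as 0/1 or 1|1:

```python
def alleles(gt):
    """Return the set of allele codes in a GT string such as '0/1' or '1|1'."""
    return set(gt.replace("|", "/").split("/"))


def shares_allele_with_parent(child_gt, parent_gt):
    """True if the child genotype contains at least one allele of the parent."""
    return bool(alleles(child_gt) & alleles(parent_gt))


# Dataset 0 (child, 0/1) against dataset 5 (parent, 1/1) from the table above
print(shares_allele_with_parent("0/1", "1/1"))
```

This only looks at one parent and one site, so it is a sanity check rather than a full trio analysis.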

To check which alleles are present at the rs657152 position, we can look directly in the BAM files and count the reads supporting Ref and Alt, as reported above. Glad to have finally received BAM and FASTQ files from Dantelabs!

I guess one can do this in various ways. The quick and easy way is to check it graphically in IGV, which is fine if there are only a few samples.
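
For more samples, a scripted count is one of those other ways. The sketch below is an assumption on my side, not the method used here: it takes the read-bases column of a single samtools mpileup line (produced with a reference FASTA, so that matches show up as . and ,) and counts how many reads support each base. The function name count_pileup_bases and the example string are made up.

```python
import re
from collections import Counter


def count_pileup_bases(ref_base, bases):
    """Count reads per base in the read-bases column of one mpileup line."""
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2  # read start marker plus its mapping-quality character
            continue
        if c == "$":
            i += 1  # read end marker
            continue
        if c in "+-":
            # indel written as +<length><sequence> or -<length><sequence>
            length_digits = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(length_digits) + int(length_digits)
            continue
        if c in ".,":
            counts[ref_base.upper()] += 1  # match on forward/reverse strand
        elif c.upper() in "ACGTN":
            counts[c.upper()] += 1  # mismatch on forward/reverse strand
        # anything else (e.g. '*' for a deleted base) is not counted
        i += 1
    return counts


# Made-up read-bases string for a position with Ref = C
example = "..,,A^].a$+2AG.,"
print(count_pileup_bases("C", example))
```

The counts for Ref and Alt can then be compared with AD_1 and AD_2 from the VCF, keeping in mind that the variant caller may have filtered reads differently.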