bwa mem requires sorted fastq files

While running WGS sequencing alignments bwa mem threw some new error:

[M::mem_pestat] low and high boundaries for proper pairs: (33, 376)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[mem_sam_pe] paired reads have different names: "CL100098912L1C001R001_20", "CL100098912L1C001R001_9"

This seemed strange as other samples ran through without any such complaint. In the end it’s because bwa requires the fastq files to be sorted for paired end sequencing which is usually the case. Well, usually…

If not one has to sort first, I found most suited for this task fastq-pair which from my understanding of quickly checking the code, loads the 1st Readfile into memory (probably just the IDs and pointers) and iterates through the 2nd Readfile and thus finding the proper pair. The memory requirements are still high and adjusting the -t, for the number of entries to keep in memory, is essential for performance. Setting -t to the number of entries in R1 makes sense, which can be determined by wc -l R1.fastq and dividing by 4. A disadvantage is that fastq-pair requires the fastq files to be uncompressed as it makes use of random file access which gets more tricky if the file is compressed.

Another program I saw but didn’t test was bbtools which seemed to be java based. Here links to both programs:

https://github.com/linsalrob/fastq-pair

Covid-19 comorbidity

A recent[1] study investigated the susceptibility of Covid-19 patients based on a GWAS (Genome wide association study). From Italy 835 patients and 1255 population derived control were included as well as 775 patients and 950 control from Spain. In short two genomic single nucleotide polymorphisms have been identified to be associated to Covid-19 susceptibility: rs11385942 and rs657152.

The article mainly discuses checking for those SNPs in own genomic sequencing data (WGS) from a sequencing provider, here Dantelabs. There are four BAM files and five VCF files from WGS available, all BAM files have a corresponding VCF file, one VCF is without a corresponding BAM file. There is one 23andme report available matching to 1 WGS and 1 VCF file.

To check for the variants rs11385942 and rs657152 I used a database I setup earlier, but a quick grep command on the annotated VCF files provided by Dantelabs would have done the same job. Well the annotation is not so quick, unfortunately Dantelabs does not provide the VCF files annotated, otherwise there wouldn’t be much of other services to sell, I guess. BTW, recently I received an email about an offer to check for Covid-19 comorbidity and of course that’s not for free. Wouldn’t be so easy to sell that service if they would give out annotated VCF files.

Dantelab commercial offer for checking Covid-19 associated SNPs

Back to testing for rs11385942 and rs657152 results in the following:

personal_genomes=> SELECT "dataset_id", "Gene.refGene", "Ref", "Alt", "avsnp147", "GT", "AD_1", "AD_2", "DP" FROM snp WHERE avsnp147 = 'rs11385942';
 dataset_id | Gene.refGene | Ref | Alt | avsnp147 | GT | AD_1 | AD_2 | DP 
------------+--------------+-----+-----+----------+----+------+------+---
(0 rows)

That query resulted exactly in no result. Ok. What about rs657152?

personal_genomes=> SELECT "dataset_id", "Gene.refGene", "Ref", "Alt", "avsnp147", "GT", "AD_1", "AD_2", "DP" FROM snp WHERE avsnp147 = 'rs657152';
  
 dataset_id | Gene.refGene | Ref | Alt | avsnp147 | GT  | AD_1 | AD_2 | DP 
------------+--------------+-----+-----+----------+-----+------+------+----
          0 | ABO          | C   | A   | rs657152 | 0/1 |   22 |   17 | 39
          5 | ABO          | C   | A   | rs657152 | 1/1 |    0 |   18 | 18
          1 | ABO          | C   | A   | rs657152 | 0/1 |   11 |   21 | 32
(3 rows)

I should shortly explain what’s happening here. I annotated the VCF files from Dantelabs and imported them into a database. In the query I ask for some attributes which are:

dataset_id (internal)
Gene.refGene (the gene name)
Ref (base as in the reference genome)
Alt (base as defined by the SNP or SNP identifier, rsid)
avsnp147 (the SNP identifier)
GT (genotype)
AD_1 (number of reads supporting reference base)
AD_2 (number of reads supporting the SNP)
DP (total coverage at that position)

Ok, that would mean we found something here. For dataset 0 and 1 rs657152 is heterozygous while for dataset 5 rs657152 is homoyzgous. As 0 is offspring of 5 one copy of rs657152 was passed on to 0.

To check the different alleles present at the position for rs657152 we can check directly in the bam files and count for number of reads supporting Ref and supporting Alt as reported above. Glad to finally received bam and fastq files from Dante!

I guess one can do this in various ways, the easy and quick way is IGV and check that graphically which is ok if there are only a few samples.

SNP rs657152 in four different WGS samples sequenced at Dantelabs.

rs657152 is the variant centered in the view. So looks pretty much true to me!

[1] https://www.nejm.org/doi/full/10.1056/NEJMoa2020283

Conway’s Game of Life will live on

Conways’ Game of life is one of the best examples for a cellular automaton and artificial life. Simple rules creating complex behaviours one does not anticipate. That never gets boring!

The Game of Life by Conway from 1970 is one my personal favorites in this category. Unfortunately John Horton Conway died at age 82 on 11th of April 2020, reportedly from COVID-19 complications.

Seeing the news of his death probably triggered something almost nostalgic. Google led me to just another awesome example what can be implemented with these simple rules: a ‘digital’ (haha) clock.

It’s mainly based on the simple ‘glider’ construct. But this cellular automation is far from simple. Surely these gliders are representing the actual digit display, but the hard part comes with the actual representation of the actual states. Well I have to say, I did not go too much into detail but it is just amazing to see: little gliders are exchanged top down in this clockwork.

See for yourself if you like. There is a website of Life for javascript. The implementation of the clock can be found here and has to be imported in the website above. More information how the clock came into life can be found here.

Corona Sharona

Konnte es mir nicht verkneifen doch selber mal in die Daten von Johns Hopkins ( basierend auf https://github.com/CSSEGISandData/COVID-19 ) reinzuschauen. Rein deskriptiv, aber vielleicht dennoch interessant. Hier ein paar ausgewählte Länder. Auf der linken Seite die Anzahl der bestätigten Neuinfektionen, auf der rechten die mit COVID-19 in Zusammenhang gebrachten Fälle. Die Plots sollten täglichen ab 10 Uhr auf den aktuellen Stand sein, soll heißen, normalerweise mit einem Tag Verzug.

Deutschland
USA
England
Italien
Thailand
Pakistan

Andere Ländern können nachgeschaut werden unter der URL:

http://analytics.imbusch.net/corona/png/

Bleibt nur noch abzuwarten wie der weitere Verlauf ausschaut.

Eine interessante Quelle bzgl. Anzahl von durchgeführten Tests ist:

https://ourworldindata.org/covid-testing

Unitymedia ping stats

This started by asking myself how consistent is Unitymedia’s internet connection in terms of latency. So if I ping servers on the internet is the response time more or less the same over time? Surely not only Unitymedia is a factor but could be that also the server or host has been relocated for example.

I started by collecting ping data measured every five minutes or so, with three measurement and taking the mean ping response time. The timescale is from about march until now (October). I choose different hosts which are located in Germany (super close) or further away.

One can not conclude too much directly about Unitymedia and it’s network quality from this data since there are many other factors influencing as well. Anyhow I believe the spikes in early April (long response times) are mainly due to Unitymedia service as was seen in all hosts tested.

Anyway this was also mainly an exercise for collecting data, processing and presenting.

Response time in ms from a host in Germany (using the Unitymedia provider) to several other arbitrary hosts during March to October 2019.

DBeaver: great tool for database design and maintenance

Just recently discovered this great tool which is also available under Mac. It provides database connection to many database architectures, especially I was looking for Postgresql support. It also allows batch import of csv data, a short instruction for data import I found here.

For more efficient data import into Postgresql the ‘copy’ command is anyhow much more efficient and is documented here.

Schlüsseldienst Betrüger ohne Halt

Anfang 2018 habe ich einen Schlüsseldienst benötigt: schnell ist das Internet mit einer Auskunft behilflich. Scheinbar lokales und faires Angebot und eine nette Dame am Telefon vermittelte recht schnell, und in der Tat war der Schlüsseldienst zügig zur Stelle. Ich fragte nach dem Preis und war schon erstaunt, dass es mindestens mehr als 150€ sein sollten für eine zugefallene Tür plus der Mehraufwand und Materialkosten, was das auch immer sein sollte. Eine Unterschrift wollte er noch haben bevor er loslegen würde. Der Preis war klar über dem was man verlangen würde für solch eine Dienstleistung und ich ärgerte mich, dass ich mein eigenes Lock Picking Werkzeug in der Wohnung hatte. Ich entschied mich nicht zu unterschreiben, worauf der Mann vom Schlüsseldienst mir entgegensetzte, dass dann keiner mehr kommen würde, weil ich dann im System auf eine schwarze Liste komme. Blabla, aber in der Not, kann so eine Drohung eventuell schon ziehen bei verzweifelten Kunden. Schließlich, was ist denn die Alternative, wenn man jetzt zurück in seine eigene Wohnung will oder muß? So schusselig wie ich bin, hatte ich schonmal einen Schlüsseldienst benötigt bei dem ich gute Erfahrungen gemacht hatte. Ich mußte nur etwas länger suchen auf dem Smartphone: dieser Schlüsseldienst kam ohne Verzögerung, öffnete die Tür problemlos, mit Gesamtkosten von ca. 80€ mit Rechnung.

Dass es Betrüger gibt, hatte mich nicht verwundert, dass ich ihnen selber so begegnen würde schon etwas mehr, aber die eigentlich Verwunderung trat ein als mich ein Brief Ende diesen Jahres erreichte. In diesem Schreiben von der oben beschriebenen Firma, dessen Dienste ich nicht in Anspruch genommen habe, verlangte 49,90€ für einen Schlüssel Notdienst bzw. die Stornierung des Notdienstes. Wie war das nochmal? Der Schlüsseldienst warb auch damit, dass keine Anfahrtskosten anfallen, also was sind die Kosten hier? Und warum heiße ich Karl auf der Rechnung? 😉

Der Firmenname NVD GmbH sagte mir schonmal gar nichts. Die Webseite http://nvd24.de gibt etwas Auskunft:

Wir suchen Partner deutschlandweit!
18. Januar 2017
Wir sind ein Unternehmen das im Bereich der Vermittlung von Schlüsselnotdienst und Sicherheitstechnik Aufträgen tätig ist und suchen zur Verstärkung selbstständige Monteure oder Vermittlungen mit Berufserfahrung. Wir vermitteln Aufträge deutschlandweit.

Ihre Tätigkeitsfelder:

• Türöffnungen rund um die Uhr
• Montage, Wartung, Service und Instandsetzung von mechanischer Sicherheitstechnik
• Einbruchschadenbehebung
• u. v. m.

Anforderungen:

• Wichtig: !Erfahrungen im Schlüsselnotdienst-Bereich!
• handwerkliches Geschick
• selbständige, umsichtige und eigenverantwortliche Arbeitsweise
• Abrufbereitschaft rund um die Uhr, auch an Sonn- und Feiertagen
• Flexibilität
• eigener PKW und Führerschein
• Aufträge im Umkreis von 80km bedienen können

Wir bieten regelmäßige Aufträge in ganz Deutschland.

Bei Interesse senden Sie bitte Ihre Kontaktdaten per E-Mail an Herrn Andrej Danilow (a.danilow@nvd24.de)!

Passt ja thematisch wunderbar zu meiner Rechnung nur das hier Personal gesucht wird, Deutschlandweit! Das Impressum gibt noch folgende Auskunft:

NVD GmbH
Bahnstraße 69
D-06682 Teuchern

Vertretungsberechtiger Geschäftsführer: Susanne Hoppe
Inhaltlich Verantwortlich: Susanne Hoppe

Nach etwas Internetrecherche werde ich überhäuft mit passenden Treffern und finde sogar einen Fernsehbeitrag bei der ARD von Plusminus. Da bin ich ja schon erstaunt, es geht um überteuerte Preise, über ein Netzwerk von Schlüsseldiensten, bzw. wird dies sogar auf andere Handwerke ausgedehnt, alle mit dem gleichen Prinzip mit dem Kunden in der Not. Auch die vermeintliche Geschäftsführerin “Susanne Hoppe” ist schon bekannt. Überraschend finde ich, dass die Verantwortlichen dieses Netzwerkes bekannt sind und sogar schon vor Gericht stehen. Gut will man meinen, aber auch im Plusminus Beitrag wird laut, dass trotzdem das Netzwerk weiter Geld scheffelt. Das deckt sich ja auch gut mit meiner Rechnung über die Stornierung des Notdienstes. Bin ich froh, dass ich nicht unterschrieben habe, bzw. nicht wirklich den Dienst in Anspruch genommen habe.

Mittlerweile scheint die Webseite der “Deutschen Schlüsseldienst Zentrale” eingestellt zu sein. Aber die Webseite nvd24.de hat noch Bestand. Wer weiß schon, wie das nächste Netzwerk heißen wird, hoffentlich ohne “Susanne Hoppe & Co.”, aber Vorsicht auch in der Not ist immer angebracht.

raspberry pi as a gateway for sharing wireless connection by ethernet

For people who would like to share the wireless connection from someone e.g. neighbour (of course by their consent) with multiple devices might face the issue of having to connect a device, such as the Fritz Box for making VoIP calls, via ethernet and connect it by some mean to the internet.

The simple setup is like this:

From what I saw and tried for hours is that the AVM FRITZ!Box 7270 Wlan V1 is not capable of accessing the internet via an existing wireless connection. The newer models are capable of doing so such as the FRITZ!Box 7390, but yeah they also cost more 🙂

The setup deals with the following issues in a quick and dirty way:

  1. setup raspi wifi
  2. setup raspi as a gateway which performs NAT, works as a DHCP and DNS server
  3. establish a mechanism to ssh to the raspi remotely behind the wifi from the neighbour (includes dealing with IPv6 issues from Unitymedia)
setup raspi wifi

I essentially rely on ssh and command line for the raspi, using https://www.howtogeek.com/167425/how-to-setup-wi-fi-on-your-raspberry-pi-via-the-command-line/ since I have no screen or keyboard/mouse connected. I setup wlan0 as my wireless interface.

setting up raspi for NAT, DHCP and DNS

Based on https://raspberrypi.stackexchange.com/questions/48307/sharing-the-pis-wifi-connection-through-the-ethernet-port I used for NAT iptables for DHCP isc-dhcp-server and for DNS dnsmasq. At this point I could already connect my Fritz Box successfully and have VoIP functionality.

establish a mechanism to ssh on raspi remotely

Of course from time to time I need / would like to be able to access the raspi for maintenance and by remotely I really mean via internet, so I’m not closely located.  The raspi is behind another NAT and I am not able to change the routers configuration, no port-forwarding and alike. After reading some time I found autossh as a handy tool to establish and maintain a ssh connection. Well, that does not really seem to help but the actual workhorse here is ssh and constructing a reverse tunnel.

For this purpose I used https://raymii.org/s/tutorials/Autossh_persistent_tunnels.html to create a reverse tunnel to connect to my computer at home.

Hmm, yeah in principle but since I’m a customer of Unitymedia, which provides good internet speed but unfortunately I wasn’t so lucky to get a IPv4 address, only IPv6.

Is that a problem? Not really, but one just has to move some services to others which do support IPv6.

In terms of ssh and autossh I do not always want an active ssh portforwarding so that I can login but rather want something on demand. But one thing after the other:

  • I adapted the router settings on my side (Unitymedia connect box) to allow ssh on my local computer, I really like http://www.ipv6scanner.com/ now so it helps to check your router configuration
  • I setup a hostname using https://dynv6.com which provides IPv4 and IPv6 dynamic DNS services for free
  • to deal with the issue of requesting a ssh connection on demand I wrote a small script which fetches a small textfile from my webspace and checks it; depending on the content it starts autossh or not; the scripts is run hourly by crontab

My rasbora_start_autossh.sh script:


#!/bin/bash
scp username@webhostname:rasbora_start_autossh.txt .
if [ "FALSE" == "`cat rasbora_start_autossh.txt`" ]
then
echo "Not starting autossh"
else
echo "Starting autossh"
# try to get out!
# for ssh
autossh -N -f -M 45678 -o "PubkeyAuthentication=yes" -o "PasswordAuthentication=no" -i /home/username/.ssh/id_rsa -R 6666:localhost:22 username@hostname.dynv6.net
fi

By changing the content of rasbora_start_autossh.txt on the webspace the raspi connects to my machine on demand.

how to migrate imap account(s)

I recently had to change my website/mail provider, of course I also wanted to move my mails from one mailaccount to the other. With the Mac’s native email client I wasn’t very pleased dealing with this task, so my final solution for this is imapsync.

I just had very few accounts to migrate (two) so the commands for this were quite straightforward:

imapsync --host1 old_providers_host --user1 oldaccount --password1 oldpassword --port1 993 --ssl1 \
--host2 new_providers_host --user2 newaccount --password2 newpassword --ssl2 \--justfolders --dry "$@" --timeout 20 --delete2duplicates

The port setting were actually different and I had to explicitly activate SSL, but than it worked nicely. With the --dry option it just connects and does not really copy anything. I also liked to set the timeout lower, so that it fails quicker. Also to ensure I have no duplicates in my new home I added the last option.

After successful connection and testing I removed the options (--justfolders --dry "$@" --timeout 20) and let it copy the mailboxes.