Some tools, such as Jetpack (https://jetpack.com/support/lazy-images/),
set a temporary srcset attribute to a data image when lazy
loading; it is later removed by JS. We check whether the image has
a valid src attribute and whether its srcset attribute is a data
embed, and if so remove the srcset.
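
The check described above could look roughly like the sketch below. It is written in Python for illustration only and is not the project's actual code. The function names are hypothetical, and "valid src" is assumed here to mean a non-empty value that is not itself a data URI.

```python
def is_data_uri(value: str) -> bool:
    """Return True if the attribute value is a data: embed."""
    return value.strip().lower().startswith("data:")


def strip_placeholder_srcset(attrs: dict) -> dict:
    """Return a copy of an <img> attribute map without a placeholder srcset.

    Assumption: a src is "valid" when it is non-empty and not a data URI.
    The srcset is only dropped when such a src exists, so the image
    never ends up without any usable source.
    """
    src = attrs.get("src", "").strip()
    srcset = attrs.get("srcset", "")
    has_valid_src = bool(src) and not is_data_uri(src)
    if has_valid_src and is_data_uri(srcset):
        return {k: v for k, v in attrs.items() if k != "srcset"}
    return dict(attrs)
```

An image whose srcset lists real candidates, or one that has no usable src, is returned unchanged.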
* remove arrow image when parsing
* ignore elements with the m_article class name, which indicates a mobile version of the website
* generate test page for City Journal
This uses a negative lookahead to reject nodes that have outstream
ads embedded. Previously they were accepted because they
contained `$article` in the class name.
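
As an illustration of the technique, here is a minimal Python sketch of a class-name test that uses a negative lookahead. The pattern and the sample class names are assumptions made for the example; the real pattern in the code may differ.

```python
import re

# Hypothetical pattern: match class names containing "article", unless
# "outstream" appears anywhere in the same string. The lookahead is
# anchored at the start, so it scans the whole class attribute.
ARTICLE_CLASS = re.compile(r"^(?!.*outstream).*article", re.IGNORECASE)


def looks_like_article(class_name: str) -> bool:
    """Return True if the class name suggests article content, not an ad."""
    return ARTICLE_CLASS.search(class_name) is not None
```

Because the lookahead runs before any other part of the pattern is tried, a node is rejected regardless of whether "outstream" comes before or after "article" in the class attribute.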