Is it possible to get the value of multiline attributes with jsdom (I use it with Node.js and jQuery)?
The site to scrape includes this HTML:
<li><a data-title="<strong>hello world
this is a test</strong>" href="example.org</strong>">A link</a></li>
Unfortunately, this gets parsed to
<li><a data-title="data-title"><strong>hello world
this is a test</strong>' href="example.org">A link</a></li>
and so I cannot extract the title and href attributes, e.g. via jQuery: $("a").attr("data-title").
Any ideas?
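Yes, that is a bug in the jsdom parser. It happens because jsdom does not use a fully HTML5-compliant parser. In HTML5, a double-quoted attribute value may contain line breaks and angle brackets, so the markup itself is valid. You can see that such bugs are still unresolved: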
You can try cheerio for scraping.