I am trying to extract the text between paragraph tags using a RegExp in JavaScript, but it doesn't work.
My pattern:
<p>(.*?)</p>
Subject:
<p> My content. </p> <img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>
Result:
My content
What I want:
My content. Second sentence.
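Older JavaScript has no "capture all group matches" function (analogous to PHP's preg_match_all); ES2020 added String.prototype.matchAll for this purpose. Where that is unavailable, you can cheat by using .replace with a callback: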
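// html is assumed to hold the subject string from the question
var matches = [];
html.replace(/<p>(.*?)<\/p>/g, function (match, content) {
  // match is the entire match; content is the first capture group
  matches.push(content);
});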
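On engines that support ES2020, the same collection can be sketched with String.prototype.matchAll. This is only an illustration: the sample string is the question's subject with the image URL shortened to a hypothetical x.png, and the variable names are made up for the example.

```javascript
// Sample input based on the question's subject (image URL shortened, hypothetical).
const sampleHtml = '<p> My content. </p> <img src="x.png"> <p> Second sentence. </p>';

// matchAll requires the g flag and yields one match array per occurrence;
// index 1 of each match array is the first capture group.
const allMatches = Array.from(sampleHtml.matchAll(/<p>(.*?)<\/p>/g), (m) => m[1]);

console.log(allMatches);
```

Each entry still carries the surrounding spaces from the markup, so call trim() on it if you need clean text.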
To get more than one match of a pattern, add the global flag g.
When matching globally, the match method returns only the full matches and drops the capture groups (), but the exec method keeps them. See the MDN documentation for exec.
var m,
rex = /<p>(.*?)<\/p>/g,
str = '<p> My content. </p> <img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>';
while ( ( m = rex.exec( str ) ) !== null ) {
  console.log( m[1] );
}
// My content.
// Second sentence.
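To get the single string the question asks for, the captured groups can be trimmed and joined. A minimal sketch, assuming a space is the desired separator; the image URL is shortened to a hypothetical x.png and the variable names are illustrative.

```javascript
// Subject string from the question, with the image URL shortened (hypothetical).
const subject = '<p> My content. </p> <img src="x.png"> <p> Second sentence. </p>';
const paragraphRe = /<p>(.*?)<\/p>/g;

const parts = [];
let match;
// With the g flag, each exec call resumes from the end of the previous match.
while ((match = paragraphRe.exec(subject)) !== null) {
  parts.push(match[1].trim()); // drop the spaces around the paragraph text
}

const joined = parts.join(' ');
console.log(joined); // My content. Second sentence.
```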
If a paragraph's content may contain newlines, use [\s\S] (any whitespace or non-whitespace character, that is, any character at all) instead of ., because . does not match line terminators.
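The difference can be seen on a small input whose first paragraph spans two lines. This is a sketch with a made-up sample string and a hypothetical helper, collectGroups, not part of any library.

```javascript
// Made-up input: the first paragraph contains a newline.
const multiline = '<p> First\nline. </p>\n<p> Second. </p>';

// Hypothetical helper: collect the first capture group of every match.
function collectGroups(re, text) {
  const found = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    found.push(m[1]);
  }
  return found;
}

// . does not match \n, so the first paragraph is skipped.
const withDot = collectGroups(/<p>(.*?)<\/p>/g, multiline);
// [\s\S] matches any character, including \n, so both paragraphs are found.
const withAnyChar = collectGroups(/<p>([\s\S]*?)<\/p>/g, multiline);

console.log(withDot.length, withAnyChar.length); // 1 2
```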
Note that this kind of regex will fail on nested paragraphs, because the lazy match stops at the first closing tag.
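A short sketch of that failure, using a made-up nested input (nested paragraphs are not valid HTML, but they illustrate the point):

```javascript
// Made-up input with one paragraph nested inside another.
const nested = '<p>outer <p>inner</p> tail</p>';

const nestedRe = /<p>(.*?)<\/p>/g;
const nestedMatches = [];
let nm;
while ((nm = nestedRe.exec(nested)) !== null) {
  nestedMatches.push(nm[1]);
}

// The match starts at the outer <p> and ends at the inner </p>,
// so the capture includes the inner opening tag and loses " tail".
console.log(nestedMatches); // [ 'outer <p>inner' ]
```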