c# - Regexp that matches all the text content of an HTML input
I have articles on my website that I want to extract and translate automatically. For that, I need the content without the surrounding HTML tags.
My idea is a regex that captures all the content between the tags (and, if possible, the text found in attributes such as <img alt='Little House'>). The problem is that I really do not know how to write such a regex. Any ideas?
Instead of relying on a regex, I would recommend using a proper HTML parser.
Parsing HTML with a regex is usually not a good idea, and it is almost impossible to get right for all cases. Many questions here have reached the same conclusion.
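To make the "almost impossible to get right" point concrete, here is a minimal sketch of the usual naive pattern, <[^>]*>, which deletes anything that looks like a tag. The class name NaiveStrip is hypothetical. It works on simple markup, but it breaks as soon as a quoted attribute value contains a >, which is legal HTML, and it throws away the alt text the question wanted to keep.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NaiveStrip
{
    // Naive approach: remove anything that looks like "<...>".
    // [^>]* stops at the FIRST '>', even one inside a quoted attribute value.
    public static string StripTags(string html) =>
        Regex.Replace(html, "<[^>]*>", "");

    public static void Main()
    {
        // Simple markup: prints "Hello world"
        Console.WriteLine(StripTags("<p>Hello <b>world</b></p>"));

        // '>' inside alt: the tag is cut early, so attribute residue leaks out.
        // Prints " 2' src='x.png'>Text" (with a leading space).
        Console.WriteLine(StripTags("<img alt='3 > 2' src='x.png'>Text"));
    }
}
```

Patching the pattern for quoted attributes just moves the problem to comments, CDATA sections, scripts and unclosed tags, which is why the answer points to a parser instead.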
Edit: It seems we both had the same idea... Besides that, the linked question discusses parsers in more depth.
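A minimal sketch of the parser approach, assuming the article markup is well-formed XHTML so that the built-in System.Xml.Linq parser can read it. Real-world HTML (unclosed tags, HTML-only entities such as &nbsp;) makes XDocument.Parse throw, so a dedicated HTML parser is the safer choice for arbitrary pages. The names ArticleText and ExtractTranslatableText are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class ArticleText
{
    // Hypothetical helper: collects every non-empty text node plus every
    // <img alt="..."> value, in document order.
    // Assumption: the input is well-formed XHTML with a single root element;
    // XDocument.Parse throws an XmlException on tag soup or &nbsp;.
    public static List<string> ExtractTranslatableText(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);
        var pieces = new List<string>();

        foreach (XNode node in doc.DescendantNodes())
        {
            if (node is XText text)
            {
                // Skip script and style bodies: they are not translatable prose.
                string parent = text.Parent?.Name.LocalName;
                if (parent == "script" || parent == "style")
                    continue;

                string value = text.Value.Trim();
                if (value.Length > 0)
                    pieces.Add(value);
            }
            else if (node is XElement element && element.Name.LocalName == "img")
            {
                // The explicit (string) cast yields null when alt is missing.
                string alt = (string)element.Attribute("alt");
                if (!string.IsNullOrWhiteSpace(alt))
                    pieces.Add(alt.Trim());
            }
        }
        return pieces;
    }

    public static void Main()
    {
        string html = "<div><p>Welcome to <b>my site</b>.</p>" +
                      "<img alt='Little House' src='house.png'/></div>";
        // Prints "Welcome to", "my site", "." and "Little House" on separate lines.
        foreach (string piece in ExtractTranslatableText(html))
            Console.WriteLine(piece);
    }
}
```

Note that inline markup such as <b> splits one sentence into several pieces; whether to translate those pieces separately or rejoin them first is a design choice this sketch leaves open.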