How do I remove HTML tags from a string with a regex?

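I am fetching data from MySQL, but the problem is that HTML tags, i.e.

<p>LARGE</p><p>Lamb;
;li;ul; 

are fetched along with my data. I only need "LARGE" and "Lamb" from the above. How can I separate/remove the HTML tags from the string?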

Try this:

    using System.Text.RegularExpressions;

    // Erase HTML tags from a string.
    public static string StripHtml(string target)
    {
        // Regular expression for HTML tags: "<", a non-whitespace character,
        // then anything up to the next ">".
        Regex StripHTMLExpression = new Regex(
            "<\\S[^><]*>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Multiline |
            RegexOptions.CultureInvariant | RegexOptions.Compiled);

        return StripHTMLExpression.Replace(target, string.Empty);
    }

Call it like this:

    string htmlString = "<p>hello world!</p>";
    string strippedString = StripHtml(htmlString);

I am going to assume the HTML is intact, perhaps something like the following:

    <ul><li><p>LARGE</p><p>Lamb</p></li></ul>&nbsp;

In that case, I would use HtmlAgilityPack to get the content without having to resort to a regex.

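    using System.Linq;
    using HtmlAgilityPack;

    var html = "<ul><li><p>LARGE</p><p>Lamb</p></li></ul>&nbsp;";
    var hap = new HtmlDocument();
    hap.LoadHtml(html);

    // Concatenated text of the whole document, with entities decoded.
    string text = HtmlEntity.DeEntitize(hap.DocumentNode.InnerText);
    // text is now "LARGELamb "

    // One entry per text node, with entities decoded.
    string[] lines = hap.DocumentNode.SelectNodes("//text()")
        .Select(h => HtmlEntity.DeEntitize(h.InnerText)).ToArray();
    // lines is { "LARGE", "Lamb", " " }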

If we assume that you are going to fix your HTML elements:

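    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Net;
    using HtmlAgilityPack;

    static void Main(string[] args)
    {
        // Decode the entity-encoded markup into real tags.
        string html = WebUtility.HtmlDecode("&lt;p&gt;LARGE&lt;/p&gt;&lt;p&gt;Lamb&lt;/p&gt;");
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Collect every <p> element and print its content.
        List<HtmlNode> spanNodes = doc.DocumentNode.Descendants().Where(x => x.Name == "p").ToList();
        foreach (HtmlNode node in spanNodes)
        {
            Console.WriteLine(node.InnerHtml);
        }
    }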

You need to use the HTML Agility Pack. You can add a reference to it like this:

    Install-Package HtmlAgilityPack

Assuming that:

  • the original string is always in that specific format, and
  • you cannot add HtmlAgilityPack,

here is a quick and dirty way to get what you want:

    static void Main(string[] args)
    {
        // Split original string on the 'separator' string.
        string originalString = "<p>LARGE</p><p>Lamb;\n;li;ul; ";
        string[] sSeparator = new string[] { "</p><p>" };
        string[] splitString = originalString.Split(sSeparator, StringSplitOptions.None);

        // Prepare to filter the 'prefix' and 'postscript' strings
        string prefix = "<p>";
        string postfix = ";\n;li;ul; ";
        int prefixLength = prefix.Length;
        int postfixLength = postfix.Length;

        // Iterate over the split string and clean up
        string s = string.Empty;
        for (int i = 0; i < splitString.Length; i++)
        {
            s = splitString[i];
            if (s.Contains(prefix))
            {
                s = s.Remove(s.IndexOf(prefix), prefixLength);
            }
            if (s.Contains(postfix))
            {
                s = s.Remove(s.IndexOf(postfix), postfixLength);
            }
            splitString[i] = s;
            Console.WriteLine(splitString[i]);
        }
        Console.ReadLine();
    }

    using System;
    using System.Text.RegularExpressions;
    using System.Web;

    // Convert &lt; &gt; etc. to HTML
    String sResult = HttpUtility.HtmlDecode(sData);

    // Remove HTML tags delimited by <>
    String result = Regex.Replace(sResult, @"<[^>]*>", String.Empty);
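
For completeness, here is a minimal, self-contained sketch of the same decode-then-strip idea that returns "LARGE" and "Lamb" as separate strings instead of one concatenated result. It rests on a few assumptions that are not in the thread: the stored data is entity-encoded and well-formed (the sample input is made up), the class name `TagStripper` and helper `ExtractText` are hypothetical, and `WebUtility.HtmlDecode` from `System.Net` is swapped in for `HttpUtility.HtmlDecode` so that no reference to `System.Web` is needed.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Hypothetical helper: decode entities, then split on tags so that each
    // run of text between tags comes back as its own array element.
    public static string[] ExtractText(string encoded)
    {
        // "&lt;p&gt;" becomes "<p>", "&amp;" becomes "&", and so on.
        string decoded = WebUtility.HtmlDecode(encoded);

        // Cut the string wherever something of the form <...> occurs.
        string[] parts = Regex.Split(decoded, "<[^>]*>");

        // Drop the empty or whitespace-only pieces left between adjacent tags.
        return Array.FindAll(parts, p => !string.IsNullOrWhiteSpace(p));
    }

    public static void Main()
    {
        // Assumed input: entity-encoded markup as it might be stored in the database.
        string sData = "&lt;p&gt;LARGE&lt;/p&gt;&lt;p&gt;Lamb&lt;/p&gt;";

        foreach (string part in ExtractText(sData))
        {
            Console.WriteLine(part);
        }
    }
}
```

Like any regex-based approach, this only looks for `<...>` pairs and does not understand comments, scripts, or malformed markup, so HtmlAgilityPack is probably still the safer choice when it is available.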