Highlight a list of words using a regular expression in C#

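I have some site content that contains abbreviations. I have a list of recognised abbreviations for the site, along with their explanations. I want to create a regular expression that will let me replace every recognised abbreviation found in the content with some markup.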

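For example:

Content:

 This is just a little test of the memb to see if it gets picked up.
 Deb of course should also be caught here.

Abbreviations:

 memb = Member; deb = Debut;

Results:

 This is just a little test of the [a title="Member"]memb[/a] to see if it gets picked up.
 [a title="Debut"]Deb[/a] of course should also be caught here.

(This is just simple example markup.)

Thanks.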

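EDIT:

CraigD's answer is nearly there, but there are issues. I only want to match whole words. I also want to keep the original capitalisation of each word that is replaced, so that deb stays deb and Deb stays Deb, as in the original text. For example, this input:

 This is just a little test of the memb.
 And another memb, but not amemba.
 Deb of course should also be caught here.deb!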

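First, you need to Regex.Escape() all of your input strings, so that any regex metacharacters in an abbreviation are matched literally.

Then you can look for them in the string and iteratively replace them with the markup you have in mind: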

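    string abbr = "memb";
    string word = "Member";
    // Verbatim string, so that \b reaches the regex engine as a word boundary
    // instead of being compiled as a backspace character.
    string pattern = String.Format(@"\b{0}\b", Regex.Escape(abbr));
    string substitute = String.Format("[a title=\"{0}\"]{1}[/a]", word, abbr);
    string output = Regex.Replace(input, pattern, substitute);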
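To see why the escaping step matters, here is a minimal sketch. The abbreviation "e.g", the sample sentence, and the EscapeDemo helper are made up for illustration; they are not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Builds a whole-word pattern for a single abbreviation,
    // optionally escaping regex metacharacters first.
    public static string WholeWordPattern(string abbr, bool escape)
    {
        string body = escape ? Regex.Escape(abbr) : abbr;
        return @"\b" + body + @"\b";
    }

    // Counts how many times the abbreviation pattern matches the input.
    public static int CountMatches(string input, string abbr, bool escape)
    {
        return Regex.Matches(input, WholeWordPattern(abbr, escape)).Count;
    }
}
```

Without escaping, the dot in "e.g" matches any character, so the pattern also matches the word "egg". With Regex.Escape() only the literal "e.g" is matched. Note that \b only works as intended when the abbreviation starts and ends with a word character.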

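EDIT: I was going to ask whether a simple String.Replace() wouldn't be enough, but I can see why a regex is desirable: by building a pattern that uses word boundary anchors, you can enforce "whole word" replacements only.

You can go as far as building a single pattern from all your escaped input strings, like this: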

 \b(?:{abbr_1}|{abbr_2}|{abbr_3}|{abbr_n})\b

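Then use a match evaluator to find the right replacement for each match. This way you avoid iterating over the input string more than once.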
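The single-pattern approach could be sketched roughly as follows. The class and method names (AbbreviationHighlighter, Highlight) are hypothetical, and the [a title=...] markup is just the example markup from the question. It assumes the abbreviation keys differ by more than letter case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AbbreviationHighlighter
{
    public static string Highlight(string input, IDictionary<string, string> abbreviations)
    {
        if (abbreviations.Count == 0) return input;

        // Case-insensitive copy, so that a match such as "Deb" finds the "deb" entry.
        var lookup = new Dictionary<string, string>(abbreviations, StringComparer.OrdinalIgnoreCase);

        // \b(?:abbr_1|abbr_2|...)\b with every key escaped.
        string pattern = @"\b(?:" + string.Join("|", lookup.Keys.Select(k => Regex.Escape(k))) + @")\b";

        // The evaluator runs once per match; m.Value keeps the casing found in the input.
        return Regex.Replace(input, pattern, m =>
        {
            string explanation;
            return lookup.TryGetValue(m.Value, out explanation)
                ? string.Format("[a title=\"{0}\"]{1}[/a]", explanation, m.Value)
                : m.Value;
        }, RegexOptions.IgnoreCase);
    }
}
```

Because the replacement is built from m.Value rather than from the dictionary key, "Deb" and "deb" each keep their original capitalisation, and the word boundaries leave "amemba" untouched.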

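I'm not sure how well this will scale to a big word list, but I think it should give you the output you want (although in your question the 'result' seems to be the same as the 'content')?

Anyway, let me know if this is what you're after.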

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                var input = @"This is just a little test of the memb to see if it gets picked up. Deb of course should also be caught here.";
                var dictionary = new Dictionary<string, string>
                {
                    {"memb", "Member"},
                    {"deb", "Debut"}
                };
                // Builds "(memb)|(deb)" from the dictionary keys.
                var regex = "(" + String.Join(")|(", dictionary.Keys.ToArray()) + ")";
                foreach (Match metamatch in Regex.Matches(input, regex,
                    RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
                {
                    // Replaces each matched abbreviation with its explanation.
                    input = input.Replace(metamatch.Value, dictionary[metamatch.Value.ToLower()]);
                }
                Console.Write(input);
                Console.ReadLine();
            }
        }
    }

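I doubt it will perform better than a plain string.Replace, so if performance is critical, measure it (and refactor a bit to use a compiled regex). You can write the regex version as: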

    var abbrsWithPipes = "(abbr1|abbr2)";
    var regex = new Regex(abbrsWithPipes);
    return regex.Replace(html, m => GetReplaceForAbbr(m.Value));

You need to implement GetReplaceForAbbr, which receives the specific abbr that was matched.
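One possible shape for GetReplaceForAbbr, combined with a regex that is compiled once and reused, might look like this. The AbbrReplacer class, its hard-coded dictionary, and the [a title=...] markup are assumptions for illustration; the answer does not say where the abbreviations come from or which markup to emit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AbbrReplacer
{
    // Hypothetical data; a real site would load this from its own store.
    private static readonly Dictionary<string, string> Explanations =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "memb", "Member" },
            { "deb", "Debut" }
        };

    // Built once and reused. RegexOptions.Compiled trades a higher
    // construction cost for faster matching on repeated use.
    private static readonly Regex AbbrRegex = new Regex(
        @"\b(?:" + string.Join("|", Explanations.Keys.Select(k => Regex.Escape(k))) + @")\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the markup for a matched abbreviation, keeping the casing it was
    // given. Unknown input is returned unchanged.
    public static string GetReplaceForAbbr(string abbr)
    {
        string explanation;
        return Explanations.TryGetValue(abbr, out explanation)
            ? string.Format("[a title=\"{0}\"]{1}[/a]", explanation, abbr)
            : abbr;
    }

    public static string Replace(string html)
    {
        return AbbrRegex.Replace(html, m => GetReplaceForAbbr(m.Value));
    }
}
```

Whether the compiled option actually helps depends on how often the pattern is used and how large the word list is, so it is worth measuring against your own content.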

I'm doing pretty much exactly what you're looking for in my application, and this works for me. The parameter str is your content:

    public static string GetGlossaryString(string str)
    {
        // This collection would contain your abbreviations; you could make it a
        // Dictionary so the abbreviation/full-term pairs are available in the loop below.
        List<string> glossaryWords = GetGlossaryItems();
        // Quick and dirty way to also search the first and last word in the content.
        str = string.Format(" {0} ", str);
        // $2 is the matched word; $1 and $3 are the non-word characters around it.
        foreach (string word in glossaryWords)
            str = Regex.Replace(str, "([\\W])(" + word + ")([\\W])", "$1$2$3", RegexOptions.IgnoreCase);
        return str.Trim();
    }

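For anyone interested, here is my final solution. It is for a .NET user control. It uses a single pattern with a match evaluator, as suggested by Tomalak, so there is no foreach loop. It's an elegant solution, and it gives me the correct output for the sample input while preserving the original casing of the matched strings.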

    public partial class Abbreviations : System.Web.UI.UserControl
    {
        private Dictionary<string, string> dictionary = DataHelper.GetAbbreviations();

        protected void Page_Load(object sender, EventArgs e)
        {
            string input = "This is just a little test of the memb. And another memb, but not amemba to see if it gets picked up. Deb of course should also be caught here.deb!";
            var regex = "\\b(?:" + String.Join("|", dictionary.Keys.ToArray()) + ")\\b";
            MatchEvaluator myEvaluator = new MatchEvaluator(GetExplanationMarkup);
            input = Regex.Replace(input, regex, myEvaluator, RegexOptions.IgnoreCase);
            litContent.Text = input;
        }

        private string GetExplanationMarkup(Match m)
        {
            // {0} is the explanation, {1} is the matched text in its original casing.
            return string.Format("{1}", dictionary[m.Value.ToLower()], m.Value);
        }
    }

The output looks like this (below). Note that it only matches whole words, and that the casing of the original string is preserved:

 This is just a little test of the memb. And another memb, but not amemba to see if it gets picked up. Deb of course should also be caught here.deb!
