C# Regex performance very slow

I am new to regular expressions. I want to parse a log file with the following regex:

(?(.*?))[|](?(.*?))[|](?(.*?))[|](?[1-3])[|](?(.*?))[|][|][|](?(.*?))[|][|](?(.*?))[|](?(.*))

A log line looks like this:

 2001.07.13 09:40:20|1|SomeSection|3|====== Some log message::Type: test=sdfsdf|||.\SomeFile.cpp||60|-1

Parsing a log file with approx. 3000 lines takes a very long time. Do you have any tips to speed it up? Thanks…

Update: I use a regex because I work with different log files that do not all have the same structure. I use it like this:

    string[] fileContent = File.ReadAllLines(filePath);
    Regex pattern = new Regex(LogFormat.GetLineRegex(logFileFormat));
    foreach (var line in fileContent)
    {
        // Split log line
        Match match = pattern.Match(line);
        string logDate = match.Groups["time"].Value.Trim();
        string logLevel = match.Groups["level"].Value.Trim();
        // And so on...
    }

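Solution:
Thank you for your help. I tested it with the following results:

1.) Only added RegexOptions.Compiled:
From 00:01:10.9611143 to 00:00:38.8928387

2.) Used Thomas Ayoub's regex:
From 00:00:38.8928387 to 00:00:06.3839097

3.) Used Wiktor Stribiżew's regex:
From 00:00:06.3839097 to 00:00:03.2150095

Thank you very much for your help!!!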

Let me "convert" my comment into an answer, since I now see what you can do about the regex performance.

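As I mentioned above, replace all .*? with [^|]*, and all repeated [|][|][|] with [|]{3} (or similar, depending on the number of [|]). Also, do not use nested capturing groups; they hurt performance too!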

 var logFileFormat = @"(?[^|]*)[|](?[^|]*)[|](?[^|]*)[|](?[1-3])[|](?[^|]*)[|]{3}(?[^|]*)[|]{2}(?[^|]*)[|](?.*)";

Only the last .* can stay "wild", since it grabs the rest of the line.
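The two suggestions above (negated character classes instead of .*?, and [|]{3} / [|]{2} instead of repeated [|]) can be sketched as a small C# parser for the sample log line from the question. This is a sketch under assumptions: the group names time and level are the ones the question's code reads, but which field each name belongs to is inferred, and the remaining names (field2, section, message, file, line, tail) as well as LogLineParser and Parse are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LogLineParser
{
    // Negated character classes ([^|]*) stop at the next pipe on their own,
    // so the engine does not have to grow each field one character at a time
    // the way a lazy .*? does. No group is nested inside another.
    // "time" and "level" come from the question; the other names are hypothetical.
    public const string Pattern =
        @"(?<time>[^|]*)[|](?<field2>[^|]*)[|](?<section>[^|]*)[|](?<level>[1-3])[|]" +
        @"(?<message>[^|]*)[|]{3}(?<file>[^|]*)[|]{2}(?<line>[^|]*)[|](?<tail>.*)";

    // Built once and reused for every line.
    public static readonly Regex LineRegex = new Regex(Pattern, RegexOptions.Compiled);

    // Returns the trimmed value of every named group, or null when the line
    // does not have the expected layout.
    public static IDictionary<string, string> Parse(string logLine)
    {
        Match match = LineRegex.Match(logLine);
        if (!match.Success)
        {
            return null;
        }

        var fields = new Dictionary<string, string>();
        foreach (string name in LineRegex.GetGroupNames())
        {
            if (name != "0") // skip the implicit whole-match group
            {
                fields[name] = match.Groups[name].Value.Trim();
            }
        }
        return fields;
    }
}
```

For the sample line, Parse returns "2001.07.13 09:40:20" for time, "3" for level and "-1" for tail; a line without the expected pipes yields null instead of a Match with empty groups.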

Here is a comparison of your regex pattern and mine in RegexHero.


Then, use RegexOptions.Compiled:

 Regex pattern = new Regex(LogFormat.GetLineRegex(logFileFormat), RegexOptions.Compiled);
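Applied to the loop from the question, this means building the Regex once, with RegexOptions.Compiled, before the loop, and checking Match.Success for each line so that lines in an unexpected format are skipped rather than read as empty groups. A minimal sketch, assuming the pattern string (standing in for LogFormat.GetLineRegex(logFileFormat), which is not shown) defines groups named time and level; LogFileReader and Parse are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LogFileReader
{
    // 'linePattern' stands in for the string returned by the question's
    // LogFormat.GetLineRegex(...); it must define groups "time" and "level".
    public static List<(string Time, string Level)> Parse(IEnumerable<string> lines, string linePattern)
    {
        // Constructed and compiled once, outside the loop.
        var pattern = new Regex(linePattern, RegexOptions.Compiled);
        var entries = new List<(string Time, string Level)>();

        foreach (string line in lines)
        {
            Match match = pattern.Match(line);
            if (!match.Success)
            {
                continue; // skip lines that do not fit the format
            }
            entries.Add((match.Groups["time"].Value.Trim(), match.Groups["level"].Value.Trim()));
        }
        return entries;
    }
}
```

In the question's code, lines would be File.ReadAllLines(filePath); File.ReadLines(filePath) also returns an IEnumerable<string> and reads the file lazily instead of loading it all at once.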

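If you use the same regex many times, create it once, compile it, and reuse that instance so that you are not re-creating the regex each time. This can improve performance by orders of magnitude.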

 var regex = new Regex(".*", RegexOptions.Compiled);
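Because the question works with several log formats, one way to follow this advice is to keep one compiled Regex per pattern string and hand out the same instance on every call. RegexCache below is a hypothetical helper, not part of .NET; ConcurrentDictionary.GetOrAdd is the standard library call it relies on.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

public static class RegexCache
{
    // One compiled Regex per distinct pattern string. It is created on first
    // use and reused afterwards, so the parse/compile cost is paid only once.
    private static readonly ConcurrentDictionary<string, Regex> Cache =
        new ConcurrentDictionary<string, Regex>();

    // If two threads race on the same new key, the factory may run twice,
    // but only one instance is stored and returned to both callers.
    public static Regex Get(string pattern) =>
        Cache.GetOrAdd(pattern, p => new Regex(p, RegexOptions.Compiled));
}
```

With this, a call like RegexCache.Get(LogFormat.GetLineRegex(logFileFormat)) could replace the new Regex(...) call. The static Regex methods (such as Regex.Match(input, pattern)) keep a small internal cache of their own, sized by Regex.CacheSize, but instances created with new Regex(...) do not use it.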

The following LINQPad code shows 3 ways of using a regex, from fastest to slowest.

The regexFast method takes about 5 seconds, regexSlow about 6 seconds, and regexSlowest about 50 seconds.

    void Main()
    {
        var sw = new Stopwatch();
        var regex = @"(?T[he]{2})\s*\w{5}.*";

        // This is the fastest method.
        sw.Restart();
        var regexFast = new Regex(regex, RegexOptions.Compiled);
        for (int i = 0; i < 9999999; i++)
        {
            regexFast.Match("The quick brown fox");
        }
        sw.Stop();
        sw.ElapsedMilliseconds.Dump();

        // This is a little slower - we didn't compile the regex so it has
        // to do some extra work on each iteration.
        sw.Restart();
        var regexSlow = new Regex(regex);
        for (int i = 0; i < 9999999; i++)
        {
            regexSlow.Match("The quick brown fox");
        }
        sw.Stop();
        sw.ElapsedMilliseconds.Dump();

        // This method is super slow - we create a new Regex each time, so
        // we have to do *lots* of extra work.
        sw.Restart();
        for (int i = 0; i < 9999999; i++)
        {
            var regexSlowest = new Regex(regex);
            regexSlowest.Match("The quick brown fox");
        }
        sw.Stop();
        sw.ElapsedMilliseconds.Dump();
    }

Your regex can be optimized to:

 (?([^|]*))[|](?([^|]*))[|](?([^|]*))[|](?[1-3])[|](?([^|]*))[|]{3}(?([^|]*))[|][|](?([^|]*))[|](?([^|]*))

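Use negated character classes instead of lazy quantifiers; this reduces backtracking. With this change, Regex101 went from 316 steps to 47. Combine it with RB's answer and you should be fine.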
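Swapping .*? for [^|]* is only safe when no field can itself contain a pipe. Under that assumption the lazy and negated forms capture exactly the same fields, which the sketch below lets you check on the sample line from the question. It uses plain numbered, non-nested groups so it stands on its own; PatternComparison, Lazy, Negated and Captures are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // Lazy form: each .*? grows one character at a time, and the engine
    // re-tests the following [|] after every step.
    public const string Lazy =
        @"(.*?)[|](.*?)[|](.*?)[|]([1-3])[|](.*?)[|][|][|](.*?)[|][|](.*?)[|](.*)";

    // Negated-class form: each [^|]* consumes a whole field in one greedy run
    // and can only end right before a pipe, so far fewer positions are tried.
    public const string Negated =
        @"([^|]*)[|]([^|]*)[|]([^|]*)[|]([1-3])[|]([^|]*)[|]{3}([^|]*)[|]{2}([^|]*)[|](.*)";

    // Returns the values of groups 1..n, or an empty array when there is no match.
    public static string[] Captures(string pattern, string input)
    {
        Match match = Regex.Match(input, pattern);
        if (!match.Success)
        {
            return Array.Empty<string>();
        }
        return match.Groups.Cast<Group>().Skip(1).Select(g => g.Value).ToArray();
    }
}
```

If a field could legitimately contain a pipe, the two patterns are no longer equivalent and the log format itself would need an escaping rule.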
