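Extract title between similar prefix and suffix: bug

I found a bug in the answer to my previous question. I'd like to see if anyone can help fix it.

Question
Exclude similarities in a list of strings to extract the difference

Answer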
https://stackoverflow.com/a/49714706/6806643

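I have a list of sentences that are identical except for the book title.

The code loops through them, finds the common prefixes and suffixes, and extracts the title in between.

The book named Lord of the Flies is a classic.
The book named To Kill a Mockingbird is a classic.
The book named The Catcher in the Rye is a classic.

The prefix "The book named" and the suffix "is a classic." will be removed.

Demo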
http://rextester.com/FXXSVN30342

Bug

http://rextester.com/DUJZN22339

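If the strings have no common suffix and the book titles all end with the same character, that character ends up being removed. For example, these sentences:

The book named Lord of the Flies
The book named Wuthering Heights
The book named Great Expectations

Filtered

Lord of the Flie
Wuthering Height
Great Expectation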
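
To see where the trailing character goes, the following sketch computes the two cut-off indices for the failing list. `CommonPrefixLength` and `ReverseString` are hypothetical helpers written for this illustration; they are a simplified stand-in for the index computation, not the code from the linked answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SuffixBugTrace
{
    // Hypothetical helper: number of leading characters shared by every string.
    public static int CommonPrefixLength(IList<string> strings)
    {
        int limit = strings.Min(s => s.Length);
        for (int i = 0; i < limit; i++)
        {
            char c = strings[0][i];
            if (strings.Any(s => s[i] != c)) return i;
        }
        return limit;
    }

    // Hypothetical helper: the string with its characters in reverse order.
    public static string ReverseString(string s)
    {
        char[] chars = s.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    public static void Main()
    {
        var sentences = new List<string>
        {
            "The book named Lord of the Flies",
            "The book named Wuthering Heights",
            "The book named Great Expectations"
        };

        int prefix = CommonPrefixLength(sentences);
        int suffix = CommonPrefixLength(sentences.Select(s => ReverseString(s)).ToList());

        // prefix is 15 ("The book named "); suffix is 1, the shared trailing 's'.
        Console.WriteLine("prefix = " + prefix + ", suffix = " + suffix);

        foreach (var s in sentences)
        {
            // The shared 's' is treated as a suffix and cut off.
            Console.WriteLine(s.Substring(prefix, s.Length - suffix - prefix));
        }
    }
}
```

The reversed strings all start with `s`, so the suffix length comes out as 1 even though no real suffix exists, and that one character is trimmed from every title.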

C#

Note: this is a sample list; I use this on strings other than book titles.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class Program
    {
        public static void Main(string[] args)
        {
            var sentences = new List<string>()
            {
                "The book named Lord of the Flies",
                "The book named Wuthering Heights",
                "The book named Great Expectations"
            };

            var titles = ExtractDifferences(sentences);
            Console.WriteLine(string.Join("\n", titles));
        }

        static List<string> ExtractDifferences(List<string> sentences)
        {
            var firstDiffIndex = GetFirstDifferenceIndex(sentences);
            // Running the same search on the reversed strings gives the suffix length.
            var lastDiffIndex = GetFirstDifferenceIndex(
                sentences.Select(s => new string(s.Reverse().ToArray())).ToList());

            return sentences
                .Select(s => s.Substring(firstDiffIndex, s.Length - lastDiffIndex - firstDiffIndex))
                .ToList();
        }

        static int GetFirstDifferenceIndex(IList<string> strings)
        {
            int firstDifferenceIndex = int.MaxValue;

            for (int i = 0; i < strings.Count; i++)
            {
                var current = strings[i];
                // The first string is compared against the last one.
                var prev = strings[i == 0 ? strings.Count - 1 : i - 1];

                var firstDiffIndex = current
                    .Select((c, j) => new { CurrentChar = c, Index = j })
                    .FirstOrDefault(ci => ci.CurrentChar != prev[ci.Index])
                    .Index;

                if (firstDiffIndex < firstDifferenceIndex)
                {
                    firstDifferenceIndex = firstDiffIndex;
                }
            }

            return firstDifferenceIndex;
        }
    }

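You can handle the problem of cutting off parts of words by backtracking to the nearest word boundary. Here I simply assume that boundary is a space, but you may want to extend that if needed.

To handle books with words in common, my first thought is to assume that title words are capitalized. So you can also stop at the first character that is a capital letter, other than the first letter of the sentence.

Additionally, you can improve the current algorithm by not comparing the first string to the last. Comparing the 1st and 2nd, then the 2nd and 3rd, and so on up to the second-to-last and the last is sufficient. And if it determines that the difference starts at zero, you can return immediately.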

    static int GetFirstDifferenceIndex(IList<string> strings)
    {
        int firstDifferenceIndex = int.MaxValue;

        for (int i = 1; i < strings.Count; i++)
        {
            var current = strings[i];
            var prev = strings[i - 1];

            // Index of first character that is different or that is a capital letter
            // other than the first character of the sentence.
            var firstDiffIndex = current
                .Select((c, j) => new { CurrentChar = c, Index = j })
                .FirstOrDefault(ci => ci.CurrentChar != prev[ci.Index]
                    || (ci.Index != 0 && char.IsUpper(ci.CurrentChar)))
                .Index;

            // back track to the beginning or until the previous char is a space
            while (firstDiffIndex > 0 && current[firstDiffIndex - 1] != ' ')
            {
                firstDiffIndex--;
            }

            if (firstDiffIndex == 0) return 0;

            if (firstDiffIndex < firstDifferenceIndex)
            {
                firstDifferenceIndex = firstDiffIndex;
            }
        }

        return firstDifferenceIndex;
    }

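This will take the sentences

The book named Lord of the Rings

The book named Lord of the Flies

and output

Lord of the Rings

Lord of the Flies

Because of the backtracking, when you reverse the sentences it also works for book titles with common endings:

The book named The Old Man and The Sea is a classic

The book named Alone on a Wide, Wide Sea is a classic

would result in

The Old Man and The Sea

Alone on a Wide, Wide Sea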

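But of course this relies on the first and last words of the book title starting with a capital letter, and on only the first character of the prefix being a capital letter (with no word in the suffix starting with a capital). To handle the cases where that could fail, you would have to start analyzing parts of speech, which would result in a very complex algorithm.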
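
Apart from the capitalization assumptions, the LINQ version above also assumes that each pair of neighbouring strings differs (or hits a capital letter) within the length of the shorter one. If one string is a prefix of the next, `prev[ci.Index]` can run out of range, and if nothing matches, `FirstOrDefault` returns null. A minimal sketch of a bounds-checked variant follows; the name `GetFirstDifferenceIndexSafe` and the choice to return 0 when fewer than two strings are given are assumptions of this sketch, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;

public static class GuardedDiff
{
    // Hypothetical bounds-checked variant of GetFirstDifferenceIndex.
    public static int GetFirstDifferenceIndexSafe(IList<string> strings)
    {
        int result = int.MaxValue;

        for (int i = 1; i < strings.Count; i++)
        {
            string current = strings[i];
            string prev = strings[i - 1];
            int limit = Math.Min(current.Length, prev.Length);

            // Advance while the characters match and we have not reached a
            // capital letter other than the first character of the sentence.
            int idx = 0;
            while (idx < limit
                   && current[idx] == prev[idx]
                   && !(idx != 0 && char.IsUpper(current[idx])))
            {
                idx++;
            }

            // Backtrack to the beginning or until the previous char is a space.
            while (idx > 0 && current[idx - 1] != ' ')
            {
                idx--;
            }

            if (idx == 0) return 0;
            result = Math.Min(result, idx);
        }

        // Assumption: with fewer than two strings nothing was compared, so report 0.
        return result == int.MaxValue ? 0 : result;
    }

    public static void Main()
    {
        var sentences = new List<string>
        {
            "The book named Lord of the Rings",
            "The book named Lord of the Flies"
        };
        Console.WriteLine(GetFirstDifferenceIndexSafe(sentences)); // prints 15
    }
}
```

The stopping and backtracking rules are the same as in the LINQ version; only the index handling differs, so it can be swapped in wherever `GetFirstDifferenceIndex` is called.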

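Assumption: the book title becomes part of the output only when both the prefix (The book named) and the suffix (is a classic.) are present. For example:

The book named Lord of the Flies is a classic. -> passes; book title: Lord of the Flies
The book Lord of the Flies is a classic. -> does not pass
The book named Lord of the Flies is classic. -> does not pass

If the above is correct, why not use Regex, which is built for pattern matching: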

 ^(?The book named) (?.+) (?is a classic.)$

This regex will find the book title for you (remember to use the ignore-case regex option).
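
As a rough sketch of how such a pattern could be applied in C#, the example below uses named groups and `RegexOptions.IgnoreCase`. The group names `prefix`, `title`, and `suffix`, the class name, and the `ExtractTitle` helper are assumptions made for this illustration. The final period is escaped here so that it only matches a literal `.`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleRegexDemo
{
    // Group names are assumptions for this sketch.
    private static readonly Regex TitlePattern = new Regex(
        @"^(?<prefix>The book named) (?<title>.+) (?<suffix>is a classic\.)$",
        RegexOptions.IgnoreCase);

    // Returns the title, or null when the sentence lacks the prefix or suffix.
    public static string ExtractTitle(string sentence)
    {
        Match match = TitlePattern.Match(sentence);
        return match.Success ? match.Groups["title"].Value : null;
    }

    public static void Main()
    {
        string[] sentences =
        {
            "The book named Lord of the Flies is a classic.",
            "The book Lord of the Flies is a classic.",
            "The book named Lord of the Flies is classic."
        };

        foreach (string sentence in sentences)
        {
            string title = ExtractTitle(sentence);
            Console.WriteLine(title ?? "(no match)");
        }
    }
}
```

Only the first sentence matches. Note that this approach needs the prefix and suffix to be known in advance, unlike the index-based approach, which discovers them from the list.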
