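Matching Unicode characters in a string with a regex

I'm doing some OCR work in C# and have extracted the text I need to work with. Now I need to parse a line of it with regular expressions.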

    string checkNum;
    string routingNum;
    string accountNum;

    Regex regEx = new Regex(@"\u9288\d+\u9288");
    Match match = regEx.Match(numbers);
    if (match.Success)
        checkNum = match.Value.Remove(0, 1).Remove(match.Value.Length - 1, 1);

    regEx = new Regex(@"\u9286\d{9}\u9286");
    match = regEx.Match(numbers);
    if (match.Success)
        routingNum = match.Value.Remove(0, 1).Remove(match.Value.Length - 1, 1);

    regEx = new Regex(@"\d{10}\u9288");
    match = regEx.Match(numbers);
    if (match.Success)
        accountNum = match.Value.Remove(match.Value.Length - 1, 1);

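The problem is that when I call .ToCharArray() and inspect the contents, the string does contain the Unicode characters I expect, but when I run the regexes over it they never seem to recognize those characters. I thought strings in C# were Unicode by default.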
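One way to spot this kind of mismatch is to print every char with both its decimal value and its hex code point: an (int) cast or a debugger shows decimal, while a \u escape expects hex. This is only a sketch; CharDump and Describe are hypothetical names, and the sample string is made up rather than real OCR output.

```csharp
using System;
using System.Text;

static class CharDump
{
    // Formats each UTF-16 code unit of s as "decimal (U+hex)", one per line.
    public static string Describe(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            int code = c;
            // X4 = four uppercase hex digits, the form a \uXXXX escape uses.
            sb.AppendLine($"{code} (U+{code:X4})");
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Hypothetical OCR text: U+2448 (decimal 9288) around two digits.
        string ocrText = "\u2448" + "12" + "\u2448";
        Console.Write(Describe(ocrText));
    }
}
```

For the sample, the first line printed is `9288 (U+2448)`, which makes the decimal/hex mix-up obvious.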

I figured it out. I was using the decimal values instead of the hex codes. In other words, instead of \u9288 and \u9286 I should have been using \u2448 and \u2446: http://www.ssec.wisc.edu/~tomw/java/unicode.html#x2440
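To see why the decimal values failed: in a .NET regex pattern, \uXXXX is always exactly four hex digits, so \u9288 names U+9288 (a CJK ideograph), while the character the OCR step produced has decimal value 9288, i.e. hex 2448. A minimal sketch; EscapeDemo and MatchesOcrDash are illustrative names only.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeDemo
{
    // The character from the OCR output: decimal 9288, which is U+2448.
    static readonly string OcrChar = char.ConvertFromUtf32(9288);

    // True if the pattern finds a match in the one-character string.
    public static bool MatchesOcrDash(string pattern) => Regex.IsMatch(OcrChar, pattern);

    static void Main()
    {
        // \u9288 is read as HEX 9288, a different character entirely.
        Console.WriteLine(MatchesOcrDash(@"\u9288"));  // False
        Console.WriteLine(MatchesOcrDash(@"\u2448"));  // True
        Console.WriteLine($"{9288:X}");                // 2448
    }
}
```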

Thanks, everyone, for pointing me in the right direction.

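This line:

    match.Value.Remove(0, 1).Remove(match.Value.Length - 1, 1);

throws an ArgumentOutOfRangeException, because the string returned by the first Remove is one character shorter than the original match.Value, so match.Value.Length - 1 already points past its end.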
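A small reproduction of the failure, plus one possible fix that strips the first and last character with a single Substring call. The names are hypothetical, and Fixed assumes the matched value is at least two characters long, which holds for matches that start and end with a delimiter.

```csharp
using System;

static class TrimDelimiters
{
    // The original chain: after Remove(0, 1) the string is one char shorter,
    // so Remove(value.Length - 1, 1) starts one position past its end.
    public static string Broken(string value) =>
        value.Remove(0, 1).Remove(value.Length - 1, 1);

    // Drops the first and last character in one step.
    public static string Fixed(string value) =>
        value.Substring(1, value.Length - 2);

    static void Main()
    {
        string matched = "\u2448" + "1234" + "\u2448";
        Console.WriteLine(Fixed(matched));   // 1234
        try
        {
            Broken(matched);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("Broken() threw ArgumentOutOfRangeException");
        }
    }
}
```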

I suggest using groups to extract the values. For example:

    Regex regEx = new Regex(@"\u9288(\d+)\u9288");
    Match match = regEx.Match(numbers);
    if (match.Success)
        checkNum = match.Groups[1].Value;

With this, I can extract the value correctly.
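Combining capture groups with the hex escapes from the self-answer, all three fields can be extracted without any Remove calls. A minimal sketch: the sample line and the MicrParser/Parse names are made up for illustration, and the field layout simply mirrors the three patterns in the question rather than any real check format.

```csharp
using System;
using System.Text.RegularExpressions;

static class MicrParser
{
    // Hex escapes: \u2448 is decimal 9288, \u2446 is decimal 9286.
    static readonly Regex CheckRx   = new Regex(@"\u2448(\d+)\u2448");
    static readonly Regex RoutingRx = new Regex(@"\u2446(\d{9})\u2446");
    static readonly Regex AccountRx = new Regex(@"(\d{10})\u2448");

    // Returns the digits captured by group 1, or null when there is no match.
    static string Capture(Regex rx, string input)
    {
        Match m = rx.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static (string Check, string Routing, string Account) Parse(string line) =>
        (Capture(CheckRx, line), Capture(RoutingRx, line), Capture(AccountRx, line));

    static void Main()
    {
        // Hypothetical input line; real layouts and spacing may differ.
        string line = "\u2448001234\u2448 \u2446123456789\u2446 1234567890\u2448";
        var (check, routing, account) = Parse(line);
        Console.WriteLine($"check={check} routing={routing} account={account}");
    }
}
```

Keeping the delimiters outside the group means Groups[1] never contains them, so there is nothing to trim afterwards.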

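Strings in .NET are UTF-16 encoded.

Also, the Regex engine does not match abstract Unicode characters; it matches code units, so each \uXXXX escape stands for exactly one UTF-16 char value. See this post.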
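To make the UTF-16 point concrete: U+2448 fits in a single char, so one \u2448 escape matches it, but a character outside the Basic Multilingual Plane is stored as a surrogate pair, and the regex sees two separate code units. A sketch; the emoji U+1F600 is just an arbitrary non-BMP example and Utf16Demo is an illustrative name.

```csharp
using System;
using System.Text.RegularExpressions;

static class Utf16Demo
{
    // True if a single regex "." (one code unit) covers the whole string.
    public static bool IsSingleRegexChar(string s) => Regex.IsMatch(s, @"^.$");

    static void Main()
    {
        // U+2448 is in the BMP: one UTF-16 code unit.
        string ocrSymbol = "\u2448";
        Console.WriteLine(ocrSymbol.Length);                 // 1
        Console.WriteLine(IsSingleRegexChar(ocrSymbol));     // True

        // U+1F600 lies outside the BMP and is stored as a surrogate pair.
        string emoji = char.ConvertFromUtf32(0x1F600);
        Console.WriteLine(emoji.Length);                     // 2
        Console.WriteLine(IsSingleRegexChar(emoji));         // False

        // Matching it takes both surrogates as two \uXXXX escapes.
        Console.WriteLine(Regex.IsMatch(emoji, @"^\uD83D\uDE00$"));  // True
    }
}
```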
