UTF-16 safe substring in C# .NET

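I would like to get a substring of a given length, say 150. However, I want to make sure I don't cut the string in the middle of a Unicode character.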

For example, consider the following code:

    var str = "Hello😀 world!";
    var substr = str.Substring(0, 6);

Here substr is an invalid string, because the smiley character has been cut in half.
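A quick way to see the problem: 😀 (U+1F600) lies outside the Basic Multilingual Plane, so .NET stores it as two UTF-16 code units (a surrogate pair) at indices 5 and 6, and Substring(0, 6) keeps only the first of them (the high surrogate). The check below is a minimal sketch; SurrogateSplitDemo and EndsWithDanglingHighSurrogate are hypothetical names made up for illustration.

```csharp
using System;

// Hypothetical helper, only for illustrating the split surrogate pair.
public static class SurrogateSplitDemo
{
    // True when s ends with a high surrogate, i.e. the low surrogate that
    // should follow it was cut off (as Substring(0, 6) does above).
    public static bool EndsWithDanglingHighSurrogate(string s) =>
        s.Length > 0 && char.IsHighSurrogate(s[s.Length - 1]);
}
```

For example, EndsWithDanglingHighSurrogate("Hello😀 world!".Substring(0, 6)) is true, while for "Hello😀" it is false, because that string ends with the low surrogate.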

Instead, I would like a function like this:

    var str = "Hello😀 world!";
    var substr = str.UnicodeSafeSubstring(0, 6);

where substr contains "Hello😀".

For reference, here is how I would do it in Objective-C using rangeOfComposedCharacterSequencesForRange:

    NSString* str = @"Hello😀 world!";
    NSRange range = [str rangeOfComposedCharacterSequencesForRange:NSMakeRange(0, 6)];
    NSString* substr = [str substringWithRange:range];
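For comparison, a rough C# counterpart of rangeOfComposedCharacterSequencesForRange can be built on StringInfo.ParseCombiningCharacters, which returns the starting UTF-16 index of each text element. This is a sketch, not a built-in API: ComposedRange, RangeOfComposedCharacterSequences and ComposedSubstring are hypothetical names, and what counts as a text element depends on the runtime (.NET 5 and later follow Unicode extended grapheme clusters; .NET Framework uses simpler rules that still keep surrogate pairs and base-plus-combining-mark sequences together).

```csharp
using System;
using System.Globalization;

// Hypothetical helpers, not part of .NET.
public static class ComposedRange
{
    // Widens the UTF-16 range [start, start + length) outward so that it begins
    // and ends on text-element boundaries, similar in spirit to NSString's
    // rangeOfComposedCharacterSequencesForRange:.
    public static (int Start, int Length) RangeOfComposedCharacterSequences(
        string str, int start, int length)
    {
        if (str == null) throw new ArgumentNullException(nameof(str));
        if (start < 0 || start > str.Length) throw new ArgumentOutOfRangeException(nameof(start));
        if (length < 0 || length > str.Length - start) throw new ArgumentOutOfRangeException(nameof(length));

        // Starting UTF-16 index of every text element, in ascending order.
        int[] starts = StringInfo.ParseCombiningCharacters(str);
        int end = start + length;
        int newStart = 0;
        int newEnd = str.Length;

        for (int i = 0; i <= starts.Length; i++)
        {
            // Treat str.Length as one more boundary after the last text element.
            int b = i < starts.Length ? starts[i] : str.Length;
            if (b <= start) newStart = b;            // last boundary at or before start
            if (b >= end) { newEnd = b; break; }     // first boundary at or after end
        }
        return (newStart, newEnd - newStart);
    }

    public static string ComposedSubstring(string str, int start, int length)
    {
        var (s, len) = RangeOfComposedCharacterSequences(str, start, length);
        return str.Substring(s, len);
    }
}
```

With this sketch, ComposedSubstring("Hello😀 world!", 0, 6) widens the range from (0, 6) to (0, 7) and returns "Hello😀". Like the Objective-C method, it grows the range rather than shrinking it, so the result can be longer than the requested length.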

What is the equivalent code in C#?

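This should return the longest substring that starts at index startIndex, is at most length UTF-16 code units long, and contains only "complete" graphemes. Initial or final "split" surrogate pairs are removed, initial combining marks are removed, and a final character that would be missing its combining marks is removed.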

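Note that this is probably not exactly what you asked for. You seem to want to use graphemes as the unit of measure (or perhaps you want to include the last character even if it goes past the length parameter).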

    using System;
    using System.Globalization;
    using System.Text;

    public static class StringEx
    {
        // Longest substring of str[startIndex, startIndex + length) that contains only
        // complete graphemes; anything cut at either edge is dropped.
        public static string UnicodeSafeSubstring(this string str, int startIndex, int length)
        {
            if (str == null) { throw new ArgumentNullException(nameof(str)); }
            if (startIndex < 0 || startIndex > str.Length) { throw new ArgumentOutOfRangeException(nameof(startIndex)); }
            if (length < 0 || length > str.Length - startIndex) { throw new ArgumentOutOfRangeException(nameof(length)); }
            if (length == 0) { return string.Empty; }

            var sb = new StringBuilder(length);
            int end = startIndex + length;   // exclusive end, in UTF-16 code units
            int pos = startIndex;            // end of the grapheme just read
            var enumerator = StringInfo.GetTextElementEnumerator(str, startIndex);
            while (enumerator.MoveNext())
            {
                string grapheme = enumerator.GetTextElement();
                pos += grapheme.Length;
                if (pos > end) { break; }    // grapheme crosses the end: drop it

                // Skip initial low surrogates / combining marks
                if (sb.Length == 0)
                {
                    if (char.IsLowSurrogate(grapheme[0])) { continue; }
                    UnicodeCategory cat = char.GetUnicodeCategory(grapheme, 0);
                    if (cat == UnicodeCategory.NonSpacingMark ||
                        cat == UnicodeCategory.SpacingCombiningMark ||
                        cat == UnicodeCategory.EnclosingMark) { continue; }
                }

                sb.Append(grapheme);
                if (pos == end) { break; }
            }
            return sb.ToString();
        }
    }

A variant that simply includes the "extra" characters at the end of the substring when they are needed to complete the last grapheme:

    using System;
    using System.Globalization;
    using System.Text;

    public static class StringEx
    {
        // Like the truncating version, but a grapheme that crosses the end of the
        // range is kept whole, so the result can be longer than length.
        public static string UnicodeSafeSubstring(this string str, int startIndex, int length)
        {
            if (str == null) { throw new ArgumentNullException(nameof(str)); }
            if (startIndex < 0 || startIndex > str.Length) { throw new ArgumentOutOfRangeException(nameof(startIndex)); }
            if (length < 0 || length > str.Length - startIndex) { throw new ArgumentOutOfRangeException(nameof(length)); }
            if (length == 0) { return string.Empty; }

            var sb = new StringBuilder(length);
            int end = startIndex + length;   // exclusive end, in UTF-16 code units
            int pos = startIndex;            // end of the grapheme just read
            var enumerator = StringInfo.GetTextElementEnumerator(str, startIndex);
            while (pos < end && enumerator.MoveNext())
            {
                string grapheme = enumerator.GetTextElement();
                pos += grapheme.Length;

                // Skip initial low surrogates / combining marks
                if (sb.Length == 0)
                {
                    if (char.IsLowSurrogate(grapheme[0])) { continue; }
                    UnicodeCategory cat = char.GetUnicodeCategory(grapheme, 0);
                    if (cat == UnicodeCategory.NonSpacingMark ||
                        cat == UnicodeCategory.SpacingCombiningMark ||
                        cat == UnicodeCategory.EnclosingMark) { continue; }
                }

                sb.Append(grapheme);
            }
            return sb.ToString();
        }
    }

This returns what you asked for: "Hello😀 world!".UnicodeSafeSubstring(0, 6) == "Hello😀".

It looks like you want to split the string on graphemes, that is, on single displayed characters.

In that case, there is a handy method: StringInfo.SubstringByTextElements:

    var str = "Hello😀 world!";
    var substr = new StringInfo(str).SubstringByTextElements(0, 6);
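Keep in mind that the arguments of SubstringByTextElements count text elements, not UTF-16 code units: "Hello😀 world!" has a Length of 14 but only 13 text elements, so the 6 above covers 7 code units. The sketch below (TextElementUnits and FirstTextElements are hypothetical names) takes the first count text elements and clamps the count, because SubstringByTextElements throws ArgumentOutOfRangeException when asked for elements past the end.

```csharp
using System;
using System.Globalization;

// Hypothetical helper built on StringInfo; not part of .NET.
public static class TextElementUnits
{
    // Returns at most `count` leading text elements (graphemes) of s.
    public static string FirstTextElements(string s, int count)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));

        var info = new StringInfo(s);
        // Clamp to the number of text elements actually present.
        int n = Math.Min(count, info.LengthInTextElements);
        return n == 0 ? string.Empty : info.SubstringByTextElements(0, n);
    }
}
```

For example, FirstTextElements("Cafe\u0301!", 4) returns "Cafe\u0301", which is 5 UTF-16 code units because the accent is a separate combining character that belongs to the same grapheme as the e.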
