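How to parse a text file with C#

By text formatting I mean something more complicated.

At first I started manually adding the 5000 lines from the text file I am asking about to my project.

The text file has 5000 lines of varying length. For example: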

1 1 ITEM_ETC_GOLD_01 골드(소) xxx xxx xxx_TT_DESC 0 0 3 3 5 0 180000 3 0 1 0 0 255 1 1 0 0 0 0 0 0 0 0 0 0 -1 0 -1 0 -1 0 -1 0 -1 0 0 0 0 0 0 0 100 0 0 0 xxx item\etc\drop_ch_money_small.bsr xxx xxx xxx 0 2 0 0 1 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1 표현할 골드의 양(param1이상) -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx 0 0
1 4 ITEM_ETC_HP_POTION_01 HP 회복 약초 xxx SN_ITEM_ETC_HP_POTION_01 SN_ITEM_ETC_HP_POTION_01_TT_DESC 0 0 3 3 1 1 180000 3 0 1 1 1 255 3 1 0 0 1 0 60 0 0 0 1 21 -1 0 -1 0 -1 0 -1 0 -1 0 0 0 0 0 0 0 100 0 0 0 xxx item\etc\drop_ch_bag.bsr item\etc\hp_potion_01.ddj xxx xxx 50 2 0 0 1 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 120 HP회복양 0 HP회복양(%) 0 MP회복양 0 MP회복양(%) -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx 0 0
1 5 ITEM_ETC_HP_POTION_02 HP 회복약 (소) xxx SN_ITEM_ETC_HP_POTION_02 SN_ITEM_ETC_HP_POTION_02_TT_DESC 0 0 3 3 1 1 180000 3 0 1 1 1 255 3 1 0 0 1 0 110 0 0 0 2 39 -1 0 -1 0 -1 0 -1 0 -1 0 0 0 0 0 0 0 100 0 0 0 xxx item\etc\drop_ch_bag.bsr item\etc\hp_potion_02.ddj xxx xxx 50 2 0 0 2 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 220 HP회복양 0 HP회복양(%) 0 MP회복양 0 MP회복양(%) -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx 0 0

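The whitespace between the first character (1) and the second character (1/4/5) is not a space, it is a tab. There are no spaces in that text file.

What I want:

I want to get the second integer (in the three lines I posted above, the second integers are 1, 4 and 5) and the string in the middle of each line that denotes a path (it starts with "item" and ends with the file extension ".ddj").

My problem:

When I google "text formatting C#", all I get is how to open a text file and how to write a text file in C#. I don't know how to search for text inside a text file. I also can't simply search for the integer, because if it is a small integer like the ones in the three lines above, I won't find the right position: "1", for example, can occur in many different places.

My question:

Would it be best to write a program that removes everything except what I need?

The other way that comes to mind is to search the file directly, but as I mentioned above, if the number is too small I might get the wrong position for the second integer.

Please advise; I can't format all of this by hand.

OK, here's what we do: open the file, read it line by line, and split each line on tabs. Then we grab the second integer and loop through the remaining fields to find the path.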

    using System.IO;
    ...
    // The using block closes the file even if parsing throws.
    using (StreamReader reader = File.OpenText("filename.txt"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] items = line.Split('\t');
            int myInteger = int.Parse(items[1]); // Here's your integer.

            // Now let's find the path.
            string path = null;
            foreach (string item in items)
            {
                if (item.StartsWith("item\\") && item.EndsWith(".ddj"))
                    path = item;
            }

            // At this point, `myInteger` and `path` contain the values we want
            // for the current line (`path` stays null if the line has no .ddj
            // field). We can then store those values or print them,
            // or anything else we like.
        }
    }

Another solution, this time using regular expressions:

    using System.IO;
    using System.Text.RegularExpressions;
    ...
    Regex parts = new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)");
    using (StreamReader reader = File.OpenText("filename.txt"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Match match = parts.Match(line);
            if (match.Success)
            {
                // Captured groups are read through the Groups indexer.
                int number = int.Parse(match.Groups[1].Value);
                string path = match.Groups[2].Value;

                // At this point, `number` and `path` contain the values we want
                // for the current line. We can then store those values or print them,
                // or anything else we like.
            }
        }
    }

That expression is a bit complex, so here it is broken down:

    ^                     Start of string.
    \d+                   "\d" means "digit" (0-9). The "+" means "one or more."
                          So this means "one or more digits."
    \t                    This matches a tab.
    (\d+)                 This also matches one or more digits. This time, though,
                          we capture it using brackets. This means we can access it
                          through the Groups property of the match.
    \t                    Another tab.
    .+?                   "." means "anything." So "one or more of anything".
                          In addition, it's lazy. This is to stop it grabbing
                          everything in sight - it'll only grab as much as it
                          needs to for the regex to work.
    \t                    Another tab.
    (item\\[^\t]+\.ddj)   Here's the meat. This matches "item\", then one or more
                          characters that are not a tab, then ".ddj".
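To try the pattern without touching a file, here is a minimal sketch that wraps it in a helper and can be fed a single line. The `RegexDemo` class and the `TryExtract` method are hypothetical names introduced for illustration; only the pattern itself comes from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexDemo
{
    // Same pattern as above: skip the first number, capture the second,
    // then lazily skip ahead to the first tab-delimited "item\....ddj" field.
    private static readonly Regex Parts =
        new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)");

    // Hypothetical helper: returns false (and a null path) when the line
    // contains no field of the form item\....ddj.
    public static bool TryExtract(string line, out int number, out string path)
    {
        Match match = Parts.Match(line);
        number = match.Success ? int.Parse(match.Groups[1].Value) : 0;
        path = match.Success ? match.Groups[2].Value : null;
        return match.Success;
    }
}
```

Note that a line without a `.ddj` field, such as the first sample line in the question, does not match at all, so the second integer is not reported for it either. If the integer is needed for every line, splitting on tabs is probably the safer route.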

You could do something like this:

    using (TextReader rdr = OpenYourFile())
    {
        string line;
        while ((line = rdr.ReadLine()) != null)
        {
            string[] fields = line.Split('\t'); // THIS LINE DOES THE MAGIC
            int theInt = Convert.ToInt32(fields[1]);
        }
    }

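The reason you found no relevant results when searching for "formatting" is that what you are doing is called "parsing".

As has already been mentioned, I strongly recommend using regular expressions (in System.Text.RegularExpressions) for this kind of work.

Combined with a utility such as RegexBuddy, you should be able to handle any complex text-record parsing situation and get results quickly. The tool makes it very easy.

Hope that helps.

One approach I have found very useful in this situation is to use the Jet OLEDB provider together with a schema.ini file to read large tab-delimited files through ADO.NET. Obviously, this approach is only really useful if you know the format of the file you are importing.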

    public void ImportCsvFile(string filename)
    {
        FileInfo file = new FileInfo(filename);

        using (OleDbConnection con = new OleDbConnection(
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" + file.DirectoryName +
            "\"; Extended Properties='text;HDR=Yes;FMT=TabDelimited';"))
        {
            using (OleDbCommand cmd = new OleDbCommand(
                string.Format("SELECT * FROM [{0}]", file.Name), con))
            {
                con.Open();

                // Using a DataReader to process the data
                using (OleDbDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // Process the current reader entry...
                    }
                }

                // Using a DataTable to process the data
                using (OleDbDataAdapter adp = new OleDbDataAdapter(cmd))
                {
                    DataTable tbl = new DataTable("MyTable");
                    adp.Fill(tbl);

                    foreach (DataRow row in tbl.Rows)
                    {
                        // Process the current row...
                    }
                }
            }
        }
    }

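Once you have the data in a nicely structured form such as a DataTable, filtering out the data you need becomes very simple.

Try regular expressions. You can find a pattern in the text and replace it with whatever you want. I can't give you the exact code right now, but you can use this to test your expressions.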

http://www.radsoftware.com.au/regexdesigner/

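You can open the file and read it line by line with StreamReader.ReadLine. You can then use String.Split to break each line into parts (using the \t separator) and extract the second number.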

Since the number of items varies from line to line, you need to search the string for the pattern 'item\*.ddj'.
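Because the position of the path is not fixed, one way to do that search is to test every tab-separated field rather than index into a known column. The following is a minimal sketch; `FieldSearch` and `FindDdjPath` are hypothetical names, and it assumes the path always occupies a whole field of its own.

```csharp
using System;
using System.Linq;

public static class FieldSearch
{
    // Hypothetical helper: returns the first tab-separated field that starts
    // with "item\" and ends with ".ddj", or null when the line has none.
    public static string FindDdjPath(string line)
    {
        return line.Split('\t').FirstOrDefault(field =>
            field.StartsWith(@"item\", StringComparison.Ordinal) &&
            field.EndsWith(".ddj", StringComparison.Ordinal));
    }
}
```

Ordinal comparison is used so that the match does not depend on the current culture, which seems appropriate for file paths.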

To remove items, you could (for example) keep the contents of the whole file in memory and write out a new file when the user clicks "Save".
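A rough sketch of that idea follows: it reduces every line to the second integer and the path, holds the result in memory, and writes it out on demand. The `ItemFileTrimmer` class, its method names, the tab-separated output layout and the decision to skip lines without a `.ddj` field are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ItemFileTrimmer
{
    // Reduces each input line to "<second integer>\t<path>".
    // Lines without a parsable second field or without a .ddj field
    // are skipped (an assumption; adjust if they must be kept).
    public static List<string> Trim(IEnumerable<string> lines)
    {
        var result = new List<string>();
        foreach (string line in lines)
        {
            string[] fields = line.Split('\t');
            int id;
            if (fields.Length < 2 || !int.TryParse(fields[1], out id))
                continue;

            string path = fields.FirstOrDefault(f =>
                f.StartsWith(@"item\", StringComparison.Ordinal) &&
                f.EndsWith(".ddj", StringComparison.Ordinal));
            if (path != null)
                result.Add(id + "\t" + path);
        }
        return result;
    }

    // Could be called from a Save handler; both paths are supplied by the caller.
    public static void Save(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, Trim(File.ReadAllLines(inputPath)));
    }
}
```

Keeping `Trim` separate from the file I/O means the filtering can be checked on a few in-memory lines before it is run against the full 5000-line file.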
