Fast and efficient way to read a space-separated file of numbers into an array?

I need a fast and efficient way to read a space-separated file of numbers into an array. The files are formatted like this:

    4 6
    1 2 3 4 5 6
    2 5 4 3 21111 101
    3 5 6234 1 2 3
    4 2 33434 4 5 6

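The first line holds the dimensions of the array [rows columns]. The following lines contain the array data.

The data may also be formatted without any newlines, like this:

    4 6 1 2 3 4 5 6 2 5 4 3 21111 101 3 5 6234 1 2 3 4 2 33434 4 5 6

I can read the first line and initialize an array with the row and column values. Then I need to fill the array with the data values. My first idea was to read the file line by line and use the Split function. But the second format listed gives me pause, because the entire array data would be loaded into memory all at once. Some of these files are in the hundreds of MB. The second approach would be to read the file in chunks and parse them piece by piece. Maybe somebody else has a better way of doing this?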

How about:

    using System;
    using System.IO;

    static class Program
    {
        static void Main()
        {
            // sample data
            File.WriteAllText("my.data", @"4 6 1 2 3 4 5 6 2 5 4 3 21111 101 3 5 6234 1 2 3 4 2 33434 4 5 6");

            using (Stream s = new BufferedStream(File.OpenRead("my.data")))
            {
                int rows = ReadInt32(s), cols = ReadInt32(s);
                int[,] arr = new int[rows, cols];
                for (int y = 0; y < rows; y++)
                    for (int x = 0; x < cols; x++)
                    {
                        arr[y, x] = ReadInt32(s);
                    }
            }
        }

        private static int ReadInt32(Stream s)
        {
            // edited to improve handling of multiple spaces etc
            int b;
            // skip any preceding non-digit bytes
            while ((b = s.ReadByte()) >= 0 && (b < '0' || b > '9')) { }
            if (b < 0) throw new EndOfStreamException();

            // accumulate digits until the first non-digit byte (or end of stream)
            int result = b - '0';
            while ((b = s.ReadByte()) >= '0' && b <= '9')
            {
                result = result * 10 + (b - '0');
            }
            return result;
        }
    }

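Actually, this isn't very specific about the delimiters: it pretty much assumes that anything that isn't a digit is a delimiter, and it only supports ASCII (use a reader if you need other encodings).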
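If other encodings matter, one possible variant reads characters from a TextReader instead of bytes from a Stream. The following is only a sketch under stated assumptions, not part of the answer's code: the names `TextNumberReader` and `TryReadInt32` are made up for illustration, and the handling of a leading minus sign and of overflow are additions the original does not have.

```csharp
using System;
using System.IO;

static class TextNumberReader
{
    // Reads the next integer from a TextReader, treating anything that is
    // not an ASCII digit as a delimiter. Returns false at end of input
    // instead of throwing. (Hypothetical helper, for illustration only.)
    public static bool TryReadInt32(TextReader reader, out int value)
    {
        value = 0;
        int c;
        bool negative = false;

        // Skip delimiters; a '-' counts as a sign only if it is the
        // character immediately before the first digit.
        while ((c = reader.Read()) >= 0 && (c < '0' || c > '9'))
        {
            negative = (c == '-');
        }
        if (c < 0) return false;

        int result = c - '0';
        // Like the byte-based version, this consumes the one delimiter
        // character that follows the number.
        while ((c = reader.Read()) >= '0' && c <= '9')
        {
            // checked: overflow throws instead of silently wrapping
            result = checked(result * 10 + (c - '0'));
        }
        value = negative ? -result : result;
        return true;
    }
}
```

Passing `new StreamReader(path, encoding)` as the reader lets the framework deal with decoding, at the cost of some speed compared with reading raw bytes.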

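What is your usage pattern once the data is loaded? Do you generally need to touch every array element, or will you only make sparse/random accesses?

If you need to touch most of the array elements, loading it into memory is probably the best way to go.

If you only need to access certain elements, you may want to lazily load the elements you need into memory. One strategy would be to determine which of the two layouts the file uses (with/without newlines) and create an algorithm that loads a specific element directly from disk as needed (seek to the given file offset, read, and parse). To efficiently re-access the same element, it could make sense to keep each element, once read, in a dictionary indexed by its offset. Check the dictionary first before going to the file for a particular value.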
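The seek, read, parse, and cache steps could look roughly like the sketch below. It assumes the byte offset of a value's first digit is already known (for example from an index built in one earlier pass, or from a fixed-width layout); how to obtain that offset is not covered here. `CachedValueReader` and `GetValueAt` are hypothetical names, and only non-negative ASCII integers are handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: parses single values on demand and remembers them.
sealed class CachedValueReader : IDisposable
{
    private readonly Stream _stream;
    private readonly Dictionary<long, int> _cache = new Dictionary<long, int>();

    // The stream must support seeking (a FileStream does).
    public CachedValueReader(Stream seekableStream)
    {
        _stream = seekableStream;
    }

    // Returns the non-negative integer whose first digit is at byteOffset.
    public int GetValueAt(long byteOffset)
    {
        int value;
        // Check the dictionary first before going to the file.
        if (_cache.TryGetValue(byteOffset, out value)) return value;

        _stream.Seek(byteOffset, SeekOrigin.Begin);
        int b = _stream.ReadByte();
        if (b < '0' || b > '9')
            throw new FormatException("No digit at offset " + byteOffset);

        value = b - '0';
        while ((b = _stream.ReadByte()) >= '0' && b <= '9')
        {
            value = value * 10 + (b - '0');
        }
        _cache[byteOffset] = value;
        return value;
    }

    public void Dispose()
    {
        _stream.Dispose();
    }
}
```

Note that an offset index for variable-width numbers costs at least as much memory as the values themselves, so this only pays off when few elements are ever touched.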

Overall, I'd take the simple route unless your testing proves that you need the more complicated one (avoid premature optimization).

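Read the file one character at a time. If it's whitespace, start a new number. If it's a digit, use it.

For numbers with multiple digits, keep a counter variable:

    // needs: using System; using System.Collections.Generic; using System.IO;
    static List<int> ReadNumbers(TextReader reader)
    {
        var numbers = new List<int>();
        int counter = 0;
        bool haveDigits = false; // avoids appending 0 for repeated whitespace
        int v;
        while ((v = reader.Read()) != -1) {
            char ch = (char)v;
            if (ch >= '0' && ch <= '9') {
                counter *= 10;
                counter += ch - '0'; // ASCII digit to its decimal value
                haveDigits = true;
            } else if (char.IsWhiteSpace(ch)) {
                if (haveDigits) numbers.Add(counter);
                counter = 0;
                haveDigits = false;
            } else {
                // Error? Here anything that is neither digit nor whitespace is rejected.
                throw new FormatException("Unexpected character: " + ch);
            }
        }
        if (haveDigits) numbers.Add(counter); // last number, no trailing whitespace
        return numbers;
    }

(Edited for clarification.)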

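Unless the machine you're parsing these text files on is limited, files of a few hundred MB should still fit in memory. I'd suggest going with your first approach of reading line by line and using Split.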

If memory becomes an issue, your second approach of reading in chunks should work fine.
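For reference, the chunked approach might be sketched as follows. This is one possible shape, not a tested production parser: `ChunkedParser`, `ReadInts`, and the 64 KB default block size are assumptions chosen for illustration, and only non-negative ASCII integers are handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ChunkedParser
{
    // Lazily yields non-negative integers from a stream, reading it in
    // fixed-size blocks. A number split across two blocks is handled
    // because 'current' and 'inNumber' survive from one block to the next.
    public static IEnumerable<int> ReadInts(Stream stream, int chunkSize = 64 * 1024)
    {
        byte[] buffer = new byte[chunkSize];
        int current = 0;
        bool inNumber = false;
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                byte b = buffer[i];
                if (b >= '0' && b <= '9')
                {
                    current = current * 10 + (b - '0');
                    inNumber = true;
                }
                else if (inNumber)
                {
                    // Any non-digit byte ends the current number.
                    yield return current;
                    current = 0;
                    inNumber = false;
                }
            }
        }
        // Last number when the file has no trailing delimiter.
        if (inNumber) yield return current;
    }
}
```

The first two values produced are the row and column counts; the remaining values can be written straight into the `int[rows, cols]` array, so only one block of the file is in memory at any time.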

Basically, all I'm saying is: implement it and measure whether performance is actually a problem.
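A rough way to do that measurement is sketched below. `LoadTimer` and `Measure` are hypothetical names; the timing uses `Stopwatch`, and the memory figure from `GC.GetTotalMemory` is only an approximation of managed heap growth, so treat the numbers as indicative rather than exact.

```csharp
using System;
using System.Diagnostics;

static class LoadTimer
{
    // Runs 'load' once and reports the elapsed time plus the approximate
    // change in managed heap size. (Hypothetical helper for illustration.)
    public static TimeSpan Measure(string label, Action load)
    {
        long before = GC.GetTotalMemory(true); // force a collection for a cleaner baseline
        Stopwatch sw = Stopwatch.StartNew();
        load();
        sw.Stop();
        long after = GC.GetTotalMemory(false);
        Console.WriteLine("{0}: {1} ms, managed heap grew by about {2} KB",
            label, sw.ElapsedMilliseconds, (after - before) / 1024);
        return sw.Elapsed;
    }
}
```

For example, `LoadTimer.Measure("ReadAllText + Split", () => System.IO.File.ReadAllText("my.data").Split(' '));` times the simple approach; running each candidate against a realistically sized file gives a fairer comparison than a single small sample.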

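Let's assume we've read the entire file into a string.
You said the first two values are the rows and columns, so what we definitely need is to parse the numbers.
After that, we can take the first two, create our data structure, and fill it accordingly.

    // needs: using System; using System.IO; using System.Linq;
    // Split on any whitespace so both layouts (with and without newlines) parse.
    var fileData = File.ReadAllText(...).Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    // Materialize once; a lazy Select would re-parse the input on every enumeration.
    var convertedToNumbers = fileData.Select(entry => int.Parse(entry)).ToList();
    int rows = convertedToNumbers.First();
    int columns = convertedToNumbers.Skip(1).First();
    // Now we have the number of rows, number of columns, and the data.
    int[,] resultData = new int[rows, columns];
    // Skipping over rows and columns values.
    var indexableData = convertedToNumbers.Skip(2).ToList();
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < columns; j++)
            resultData[i, j] = indexableData[i * columns + j];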


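An alternative would be to read the first two values from a stream, initialize the array, and then read n values at a time, which would be more complicated. Also, it's best to keep files open for the shortest time possible.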



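You want to stream the file into memory and parse as you go.

    // needs: using System; using System.Collections.Generic; using System.IO; using System.Text;
    static class StreamReaderExtensions
    {
        public static IEnumerable<string> StreamAsSpaceDelimited(this StreamReader reader)
        {
            StringBuilder builder = new StringBuilder();
            int v;
            while ((v = reader.Read()) != -1)
            {
                char c = (char)v;
                if (Char.IsWhiteSpace(c))
                {
                    if (builder.Length > 0)
                    {
                        yield return builder.ToString();
                        builder.Clear();
                    }
                }
                else
                {
                    builder.Append(c);
                }
            }
            // Emit the final token when the file does not end in whitespace.
            if (builder.Length > 0)
                yield return builder.ToString();
        }
    }

This parses the file into a lazily evaluated sequence of whitespace-delimited strings, which you can then read as integers like so:

    // needs: using System.IO; using System.Linq;
    using (StreamReader sr = new StreamReader("filename"))
    {
        var nums = sr.StreamAsSpaceDelimited().Select(s => int.Parse(s));
        var enumerator = nums.GetEnumerator();
        enumerator.MoveNext();
        int numRows = enumerator.Current;
        enumerator.MoveNext();
        int numColumns = enumerator.Current;
        int r = 0, c = 0;
        // A jagged array needs each row allocated separately.
        int[][] destArray = new int[numRows][];
        for (int i = 0; i < numRows; i++)
            destArray[i] = new int[numColumns];
        while (enumerator.MoveNext())
        {
            destArray[r][c] = enumerator.Current;
            c++;
            if (c == numColumns)
            {
                c = 0;
                r++;
                if (r == numRows)
                    break; // we are done
            }
        }
    }

Because we use iterators, only a few characters are ever held at a time. This is a common approach for parsing large files (for example, this is how LINQ2CSV works).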



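Here are two methods:

    // needs: using System; using System.Collections.Generic; using System.IO; using System.Linq;
    IEnumerable<int[]> GetArrays(string filename, bool skipFirstLine)
    {
        using (StreamReader reader = new StreamReader(filename))
        {
            if (skipFirstLine && !reader.EndOfStream)
                reader.ReadLine();
            while (!reader.EndOfStream)
            {
                string temp = reader.ReadLine();
                // Split on any whitespace and drop empty entries so repeated spaces do not break int.Parse.
                int[] array = temp.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                                  .Select(s => int.Parse(s)).ToArray();
                yield return array;
            }
        }
    }

    int[][] GetAllArrays(string filename, bool skipFirstLine)
    {
        int skipNumber = 0;
        if (skipFirstLine)
            skipNumber = 1;
        int[][] array = File.ReadAllLines(filename)
            .Skip(skipNumber)
            .Select(line => line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                                .Select(s => int.Parse(s)).ToArray())
            .ToArray();
        return array;
    }

If you're dealing with large files, the first is probably preferable because it yields one row at a time. If the files are small, the second loads the whole file into a jagged array in one go.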


