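Using FileStream.Seek

I am trying to use FileStream.Seek to jump quickly to a line and read it.

However, I am not getting the right results. I have looked at this for a while and cannot see what I am doing wrong.

Environment:
OS: Windows 7
Framework: .NET 4.0
IDE: Visual C# Express 2010

Sample data in file location: C:\Temp\Temp.txt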

    0001|100!2500
    0002|100!2500
    0003|100!2500
    0004|100!2500
    0005|100!2500
    0006|100!2500
    0007|100!2500
    0008|100!2500
    0009|100!2500
    0010|100!2500

The code:

    class PaddedFileSearch
    {
        private int LineLength { get; set; }
        private string FileName { get; set; }

        public PaddedFileSearch()
        {
            FileName = @"C:\Temp\Temp.txt";    // This is a padded file. All lines are of the same length.

            FindLineLength();
            Debug.Print("File Line length: {0}", LineLength);

            // TODO: This is purely for testing. Move this code out.
            SeekMethod(new int[] { 5, 3, 4 });
            /*  Expected Results:
             *  Line No     Position        Line
             *  -------     --------        -----------------
             *  3           30              0003|100!2500
             *  4           15              0004|100!2500
             *  5           15              0005|100!2500 -- This was updated after the initial request.
             */

            /* THIS DOES NOT GIVE THE EXPECTED RESULTS */
            SeekMethod(new int[] { 5, 3 });
            /*  Expected Results:
             *  Line No     Position        Line
             *  -------     --------        -----------------
             *  3           30              0003|100!2500
             *  5           30              0005|100!2500
             */
        }

        private void FindLineLength()
        {
            string line;

            // Add check for FileExists
            using (StreamReader reader = new StreamReader(FileName))
            {
                if ((line = reader.ReadLine()) != null)
                {
                    LineLength = line.Length + 2;    // The 2 is for NewLine (\r\n)
                }
            }
        }

        public void SeekMethod(int[] lineNos)
        {
            long position = 0;
            string line = null;

            Array.Sort(lineNos);

            Debug.Print("");
            Debug.Print("Line No\t\tPosition\t\tLine");
            Debug.Print("-------\t\t--------\t\t-----------------");

            using (FileStream fs = new FileStream(FileName, FileMode.Open, FileAccess.Read, FileShare.None))
            {
                using (StreamReader reader = new StreamReader(fs))
                {
                    foreach (int lineNo in lineNos)
                    {
                        position = (lineNo - 1) * LineLength - position;
                        fs.Seek(position, SeekOrigin.Current);

                        if ((line = reader.ReadLine()) != null)
                        {
                            Debug.Print("{0}\t\t\t{1}\t\t\t\t{2}", lineNo, position, line);
                        }
                    }
                }
            }
        }
    }

The output I get:

    File Line length: 15

    Line No     Position        Line
    -------     --------        -----------------
    3           30              0003|100!2500
    4           15              0004|100!2500
    5           45              0005|100!2500

    Line No     Position        Line
    -------     --------        -----------------
    3           30              0003|100!2500
    5           30              0004|100!2500

My problem is with the following output:

    Line No     Position        Line
    -------     --------        -----------------
    5           30              0004|100!2500

The output for Line should be: 0005|100!2500

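I don't understand why this is happening.

Am I doing something wrong? Is there a workaround? Also, are there faster ways of doing this using something like Seek?
(I am looking for code-based options, not Oracle or SQL Server. For the sake of argument, let's also say that the file size is 1 GB.)

Any help is greatly appreciated.

Thanks.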

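UPDATE:
I found 4 great answers here. Thanks a lot.

Sample timings:
Based on a few runs, the methods below are ordered from best to good. Even the slowest is close to the best.
The file contains 10K lines and is 2.28 MB. I searched for the same 5000 random lines using all the options.

  1. Seek4: Time elapsed: 00:00:00.0398530 ms - Ritch Melton
  2. Seek3: Time elapsed: 00:00:00.0446072 ms - Valentin Kuzub
  3. Seek1: Time elapsed: 00:00:00.0538210 ms - Jake
  4. Seek2: Time elapsed: 00:00:00.0889589 ms - bitxwise

(The values are TimeSpan strings in hh:mm:ss.fffffff form, so Seek4 took about 39.9 ms; the trailing "ms" is just a label from the format string.)

The code is shown below. After you save it, you can run it by calling TestPaddedFileSeek.CallPaddedFileSeek();. You will also have to specify the namespace and the using directives (System, System.Collections.Generic, System.Diagnostics, System.IO and System.Text).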

    /// <summary>
    /// This class offers multiple options for reading a line by line number in a padded file
    /// (all lines are the same length). The idea is to jump quickly to the line in the file.
    /// Details about the discussion are available at:
    /// http://stackoverflow.com/questions/5201414/having-a-problem-while-using-filestream-seek-in-c-solved
    /// </summary>
    class PaddedFileSeek
    {
        public FileInfo File { get; private set; }
        public int LineLength { get; private set; }

        #region Private methods
        private static int FindLineLength(FileInfo fileInfo)
        {
            using (StreamReader reader = new StreamReader(fileInfo.FullName))
            {
                string line;
                if ((line = reader.ReadLine()) != null)
                {
                    int length = line.Length + 2;   // The 2 is for NewLine (\r\n)
                    return length;
                }
            }
            return 0;
        }

        private static void PrintHeader()
        {
            /*
            Debug.Print("");
            Debug.Print("Line No\t\tLine");
            Debug.Print("-------\t\t--------------------------");
            */
        }

        private static void PrintLine(int lineNo, string line)
        {
            //Debug.Print("{0}\t\t\t{1}", lineNo, line);
        }

        private static void PrintElapsedTime(TimeSpan elapsed)
        {
            // elapsed is a TimeSpan, so this prints hh:mm:ss.fffffff; the "ms" suffix is only a label.
            Debug.WriteLine("Time elapsed: {0} ms", elapsed);
        }
        #endregion

        public PaddedFileSeek(FileInfo fileInfo)
        {
            // Possibly might have to check for FileExists
            int length = FindLineLength(fileInfo);
            //if (length == 0) throw new PaddedProgramException();
            LineLength = length;
            File = fileInfo;
        }

        public void CallAll(int[] lineNoArray, List<int> lineNoList)
        {
            Stopwatch sw = new Stopwatch();

            #region Seek1
            sw.Start();                     // Start timing
            Debug.Write("Seek1: ");
            PrintHeader();                  // Print header
            Seek1(lineNoArray);
            sw.Stop();                      // Stop timing
            PrintElapsedTime(sw.Elapsed);   // Print elapsed time
            sw.Reset();
            #endregion

            #region Seek2
            sw.Start();
            Debug.Write("Seek2: ");
            PrintHeader();
            Seek2(lineNoArray);
            sw.Stop();
            PrintElapsedTime(sw.Elapsed);
            sw.Reset();
            #endregion

            #region Seek3
            sw.Start();
            Debug.Write("Seek3: ");
            PrintHeader();
            Seek3(lineNoArray);
            sw.Stop();
            PrintElapsedTime(sw.Elapsed);
            sw.Reset();
            #endregion

            #region Seek4
            sw.Start();
            Debug.Write("Seek4: ");
            PrintHeader();
            Seek4(lineNoList);
            sw.Stop();
            PrintElapsedTime(sw.Elapsed);
            sw.Reset();
            #endregion
        }

        /// <summary>
        /// Option by Jake
        /// </summary>
        public void Seek1(int[] lineNoArray)
        {
            long position = 0;
            string line = null;

            Array.Sort(lineNoArray);

            using (FileStream fs = new FileStream(File.FullName, FileMode.Open, FileAccess.Read, FileShare.None))
            {
                using (StreamReader reader = new StreamReader(fs))
                {
                    foreach (int lineNo in lineNoArray)
                    {
                        position = (lineNo - 1) * LineLength;
                        fs.Seek(position, SeekOrigin.Begin);

                        if ((line = reader.ReadLine()) != null)
                        {
                            PrintLine(lineNo, line);
                        }

                        reader.DiscardBufferedData();
                    }
                }
            }
        }

        /// <summary>
        /// Option by bitxwise
        /// </summary>
        public void Seek2(int[] lineNoArray)
        {
            string line = null;
            long step = 0;

            Array.Sort(lineNoArray);

            using (FileStream fs = new FileStream(File.FullName, FileMode.Open, FileAccess.Read, FileShare.None))
            {
                // using (StreamReader reader = new StreamReader(fs))
                // If you put "using" here you will get WRONG results.
                // I would like to understand why this is.
                {
                    foreach (int lineNo in lineNoArray)
                    {
                        StreamReader reader = new StreamReader(fs);
                        step = (lineNo - 1) * LineLength - fs.Position;
                        fs.Position += step;

                        if ((line = reader.ReadLine()) != null)
                        {
                            PrintLine(lineNo, line);
                        }
                    }
                }
            }
        }

        /// <summary>
        /// Option by Valentin Kuzub
        /// </summary>
        #region Seek3
        public void Seek3(int[] lineNoArray)
        {
            long position = 0;  // totalPosition = 0;
            string line = null;
            int oldLineNo = 0;

            Array.Sort(lineNoArray);

            using (FileStream fs = new FileStream(File.FullName, FileMode.Open, FileAccess.Read, FileShare.None))
            {
                using (StreamReader reader = new StreamReader(fs))
                {
                    foreach (int lineNo in lineNoArray)
                    {
                        position = (lineNo - oldLineNo - 1) * LineLength;
                        fs.Seek(position, SeekOrigin.Current);
                        line = ReadLine(fs, LineLength);
                        PrintLine(lineNo, line);
                        oldLineNo = lineNo;
                    }
                }
            }
        }

        #region Required Private methods
        /// <summary>
        /// Currently only used by Seek3
        /// </summary>
        private static string ReadLine(FileStream stream, int length)
        {
            byte[] bytes = new byte[length];
            stream.Read(bytes, 0, length);
            return new string(Encoding.UTF8.GetChars(bytes));
        }
        #endregion
        #endregion

        /// <summary>
        /// Option by Ritch Melton
        /// </summary>
        #region Seek4
        public void Seek4(List<int> lineNoList)
        {
            lineNoList.Sort();

            using (var fs = new FileStream(File.FullName, FileMode.Open))
            {
                lineNoList.ForEach(ln => OutputData(fs, ln));
            }
        }

        #region Required Private methods
        private void OutputData(FileStream fs, int lineNumber)
        {
            var offset = (lineNumber - 1) * LineLength;

            fs.Seek(offset, SeekOrigin.Begin);

            var data = new byte[LineLength];
            fs.Read(data, 0, LineLength);

            var text = DecodeData(data);
            PrintLine(lineNumber, text);
        }

        private static string DecodeData(byte[] data)
        {
            var encoding = new UTF8Encoding();
            return encoding.GetString(data);
        }
        #endregion
        #endregion
    }

    static class TestPaddedFileSeek
    {
        public static void CallPaddedFileSeek()
        {
            const int arrayLenght = 5000;
            int[] lineNoArray = new int[arrayLenght];
            List<int> lineNoList = new List<int>();
            Random random = new Random();
            int lineNo;
            string fileName;

            fileName = @"C:\Temp\Temp.txt";

            PaddedFileSeek seeker = new PaddedFileSeek(new FileInfo(fileName));

            for (int n = 0; n < 25; n++)
            {
                Debug.Print("Loop no: {0}", n + 1);

                for (int i = 0; i < arrayLenght; i++)
                {
                    lineNo = random.Next(1, arrayLenght);

                    lineNoArray[i] = lineNo;
                    lineNoList.Add(lineNo);
                }

                seeker.CallAll(lineNoArray, lineNoList);

                lineNoList.Clear();

                Debug.Print("");
            }
        }
    }

I'm confused by your expected positions: line 5 at position 30 and at 45, with line 4 at 15 and line 3 at 30?

Here's the core of the read logic:

    var offset = (lineNumber - 1) * LineLength;

    fs.Seek(offset, SeekOrigin.Begin);

    var data = new byte[LineLength];
    fs.Read(data, 0, LineLength);

    var text = DecodeData(data);
    Debug.Print("{0,-12}{1,-16}{2}", lineNumber, offset, text);

The full sample is here:

    class PaddedFileSearch
    {
        public int LineLength { get; private set; }
        public FileInfo File { get; private set; }

        public PaddedFileSearch(FileInfo fileInfo)
        {
            var length = FindLineLength(fileInfo);
            //if (length == 0) throw new PaddedProgramException();
            LineLength = length;
            File = fileInfo;
        }

        private static int FindLineLength(FileInfo fileInfo)
        {
            using (var reader = new StreamReader(fileInfo.FullName))
            {
                string line;
                if ((line = reader.ReadLine()) != null)
                {
                    var length = line.Length + 2;   // + 2 for the \r\n terminator
                    return length;
                }
            }

            return 0;
        }

        public void SeekMethod(List<int> lineNumbers)
        {
            Debug.Print("");
            Debug.Print("Line No\t\tPosition\t\tLine");
            Debug.Print("-------\t\t--------\t\t-----------------");

            lineNumbers.Sort();

            using (var fs = new FileStream(File.FullName, FileMode.Open))
            {
                lineNumbers.ForEach(ln => OutputData(fs, ln));
            }
        }

        private void OutputData(FileStream fs, int lineNumber)
        {
            var offset = (lineNumber - 1) * LineLength;

            fs.Seek(offset, SeekOrigin.Begin);

            var data = new byte[LineLength];
            fs.Read(data, 0, LineLength);

            var text = DecodeData(data);
            Debug.Print("{0,-12}{1,-16}{2}", lineNumber, offset, text);
        }

        private static string DecodeData(byte[] data)
        {
            var encoding = new UTF8Encoding();
            return encoding.GetString(data);
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var seeker = new PaddedFileSearch(new FileInfo(@"D:\Desktop\Test.txt"));

            Debug.Print("File Line length: {0}", seeker.LineLength);

            seeker.SeekMethod(new List<int> { 5, 3, 4 });
            seeker.SeekMethod(new List<int> { 5, 3 });
        }
    }

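Put this in the inner loop of SeekMethod(int[] lineNos):

    position = (lineNo - 1) * LineLength;
    fs.Seek(position, SeekOrigin.Begin);
    reader.DiscardBufferedData();

The problem is that your position variable changes based on its previous value, and StreamReader maintains its own buffer, so you need to clear the buffered data whenever you change the stream position.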
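A minimal, self-contained sketch of that fix: seek to an absolute offset, then discard the reader's buffer before each read. The class name `BufferedSeekDemo`, the helper `WriteSampleFile` and the temp-file sample data are assumptions made for illustration and do not come from the question. The sketch also assumes single-byte (ASCII) text and `\r\n` line endings.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class BufferedSeekDemo
{
    // Hypothetical helper: writes lineCount fixed-width lines such as
    // "0001|100!2500\r\n" (15 bytes each) to a temp file and returns its path.
    public static string WriteSampleFile(int lineCount)
    {
        string path = Path.GetTempFileName();
        var sb = new StringBuilder();
        for (int i = 1; i <= lineCount; i++)
        {
            sb.Append(i.ToString("D4")).Append("|100!2500\r\n");
        }
        File.WriteAllText(path, sb.ToString(), Encoding.ASCII);
        return path;
    }

    // Reads the given 1-based line numbers in the order supplied.
    // lineLength is the byte length of one line including its "\r\n".
    public static List<string> ReadLines(string path, int[] lineNos, int lineLength)
    {
        var lines = new List<string>();
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(fs, Encoding.ASCII, false))
        {
            foreach (int lineNo in lineNos)
            {
                // Absolute offset, so no running position has to be tracked.
                long position = (long)(lineNo - 1) * lineLength;
                fs.Seek(position, SeekOrigin.Begin);

                // Drop whatever the reader buffered from the previous position.
                reader.DiscardBufferedData();

                lines.Add(reader.ReadLine());
            }
        }
        return lines;
    }
}
```

Because every seek is absolute, the line numbers do not need to be sorted: `BufferedSeekDemo.ReadLines(path, new[] { 5, 3 }, 15)` on a ten-line sample file should return line 0005 followed by line 0003.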

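You have a very confusing mix of an absolute position for the first line number and relative positions for the following ones.

Look carefully here, and at the actual results you are getting:

    position = (lineNo - 1) * LineLength - position;
    fs.Seek(position, SeekOrigin.Current);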

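For the values 3, 4, 5 you get the positions 30, 15, 45, whereas with relative positions it should obviously be 30, 15, 15, since the line length is 15; or 30, 0, 0 if your read method performs a seek as a side effect, as FileStream.Read does. Your test output is correct only by accident (and only for the string values, not the positions). You should have tested with a non-consecutive sequence and looked at the position values more carefully; you would have seen that there is no connection between the string displayed and the position value.

In fact, your StreamReader ignores the later fs.Seek calls and simply reads line by line =)

Here are the results for the input 3, 5, 9 :)

    Line No     Position        Line
    -------     --------        -----------------
    3           30              0003|100!2500
    5           30              0004|100!2500
    9           90              0005|100!2500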
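The behaviour above can be reproduced in isolation. The sketch below seeks the FileStream before every read but never clears the reader's buffer; `StaleBufferDemo` and `WriteSampleFile` are hypothetical names, and the sample file is assumed to be small enough that the reader's first fill pulls in everything from the first seek position to the end of the file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class StaleBufferDemo
{
    // Hypothetical helper: writes lineCount 15-byte lines ("0001|100!2500\r\n", ...)
    // to a temp file and returns its path.
    public static string WriteSampleFile(int lineCount)
    {
        string path = Path.GetTempFileName();
        var sb = new StringBuilder();
        for (int i = 1; i <= lineCount; i++)
        {
            sb.Append(i.ToString("D4")).Append("|100!2500\r\n");
        }
        File.WriteAllText(path, sb.ToString(), Encoding.ASCII);
        return path;
    }

    // Seeks to the right absolute offset each time, but keeps using one
    // StreamReader without calling DiscardBufferedData.
    public static List<string> ReadWithoutDiscard(string path, int[] lineNos, int lineLength)
    {
        var lines = new List<string>();
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(fs, Encoding.ASCII, false))
        {
            foreach (int lineNo in lineNos)
            {
                fs.Seek((long)(lineNo - 1) * lineLength, SeekOrigin.Begin);

                // After the first call this is served from the reader's buffer,
                // so the seek above has no visible effect.
                lines.Add(reader.ReadLine());
            }
        }
        return lines;
    }
}
```

With a ten-line sample file and the input 3, 5, 9, this returns lines 0003, 0004 and 0005: only the first seek matters, and the rest are consecutive lines from the buffer.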

I believe the following is closest to what you are trying to achieve. A new function:

    private static string ReadLine(FileStream stream, int length)
    {
        byte[] bytes = new byte[length];
        stream.Read(bytes, 0, length);
        return new string(Encoding.UTF8.GetChars(bytes));
    }

and the new loop code:

    int oldLine = 0;
    using (FileStream fs = new FileStream(FileName, FileMode.Open, FileAccess.Read, FileShare.None))
    {
        foreach (int lineNo in lineNos)
        {
            // Relative offset: lines to skip since the end of the previously read line.
            position = (lineNo - oldLine - 1) * LineLength;
            fs.Seek(position, SeekOrigin.Current);
            line = ReadLine(fs, LineLength);
            Console.WriteLine("{0}\t\t\t{1}\t\t\t\t{2}", lineNo, position, line);
            oldLine = lineNo;
        }
    }

Note that the stream.Read call now acts as an additional stream.Seek(length): it advances the stream position by the number of bytes it reads.
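One caveat on relying on Read to advance by exactly one line: `Stream.Read` is allowed to return fewer bytes than requested, so a defensive variant loops until the buffer is full or the stream ends. The sketch below is one way to write that; `FixedWidthReader` and `ReadFully` are hypothetical names, and it assumes ASCII text and strips the trailing `\r\n`, which the original helper leaves in the returned string.

```csharp
using System;
using System.IO;
using System.Text;

static class FixedWidthReader
{
    // Keeps calling Read until buffer is full or the stream ends.
    // Returns the number of bytes actually read.
    public static int ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0)
            {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    // Reads one fixed-width line from the current stream position and removes
    // the line terminator. The stream position moves forward by the bytes read.
    public static string ReadLine(Stream stream, int lineLength)
    {
        var bytes = new byte[lineLength];
        int read = ReadFully(stream, bytes);
        return Encoding.ASCII.GetString(bytes, 0, read).TrimEnd('\r', '\n');
    }
}
```

For example, on a stream holding three 15-byte lines, seeking to offset 15 and calling `FixedWidthReader.ReadLine(stream, 15)` returns the second line and leaves `stream.Position` at 30, which is why the next relative seek only has to skip the lines in between.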

The new, correct output, with position changes that make sense:

    Line No     Position        Line
    -------     --------        -----------------
    3           30              0003|100!2500
    4           0               0004|100!2500
    5           0               0005|100!2500

    Line No     Position        Line
    -------     --------        -----------------
    3           30              0003|100!2500
    5           15              0005|100!2500

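PS: It is strange that you treat the 0001 line as the 1st line and not the 0th. If you counted the way programmers do, from zero, the whole -1 could be removed =)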

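I wouldn't say the problem is your effort to manage the position value manually, but rather that StreamReader.ReadLine changes the stream's position value. If you step through your code and watch the local values, you will see the stream's position change after each ReadLine call (to 148 after the first call).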
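This is easy to observe without a debugger. The sketch below returns the FileStream position right after a single ReadLine call; `ReaderPositionDemo` is a hypothetical name. For a file smaller than the reader's buffer, the position is expected to land at the end of the file rather than at the end of the first line. The 148 reported above would be consistent with a ten-line file of 15-byte lines whose last line has no trailing newline, though that is an inference and not something the answer states.

```csharp
using System;
using System.IO;
using System.Text;

static class ReaderPositionDemo
{
    // Opens the file, reads exactly one line through a StreamReader and
    // reports where the underlying FileStream ended up.
    public static long PositionAfterFirstReadLine(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(fs, Encoding.ASCII, false))
        {
            reader.ReadLine();

            // The reader has pulled a whole buffer from the stream,
            // not just the bytes of the line it returned.
            return fs.Position;
        }
    }
}
```

Comparing the returned value with the length of one line shows how far ahead of the logical read position the stream already is, which is what breaks any arithmetic based on SeekOrigin.Current.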

EDIT

It would be better to change the stream's position directly rather than use Seek:

    public void SeekMethod(int[] lineNos)
    {
        string line = null;
        long step;

        Array.Sort(lineNos);

        Debug.Print("");
        Debug.Print("Line No\t\tPosition\t\tLine");
        Debug.Print("-------\t\t--------\t\t-----------------");

        using (FileStream fs = new FileStream(FileName, FileMode.Open, FileAccess.Read, FileShare.None))
        {
            foreach (int lineNo in lineNos)
            {
                // A fresh reader per line starts with an empty buffer.
                StreamReader reader = new StreamReader(fs);
                step = (lineNo - 1) * LineLength - fs.Position;
                fs.Position += step;

                if ((line = reader.ReadLine()) != null)
                {
                    Debug.Print("{0}\t\t\t{1}\t\t\t\t{2}", lineNo, step, line);
                }
            }
        }
    }

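The problem is that you are tracking the position manually, but not accounting for the fact that the actual file position has moved further on once you have read a line. So you need to subtract that extra read, but only if it actually happened.

If you really want to do it that way, then don't keep position yourself; get the actual file position instead. Alternatively, calculate the absolute file position from the given line number and seek there directly rather than from the current file offset.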
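The second option comes down to one multiplication. The sketch below computes the absolute offset in 64-bit arithmetic and checks it against the file length; `LineOffsets` and `TryGetLineOffset` are hypothetical names. The 64-bit cast matters for large files: the 1 GB file mentioned in the question still fits in an int, but offsets beyond int.MaxValue (about 2 GB) would overflow if the product were computed as int.

```csharp
using System;

static class LineOffsets
{
    // Computes the absolute byte offset of a 1-based line number in a file whose
    // lines are all lineLength bytes long (line terminator included).
    // Returns false for invalid arguments or if the line would start at or
    // beyond fileLength.
    public static bool TryGetLineOffset(int lineNo, int lineLength, long fileLength, out long offset)
    {
        offset = 0;
        if (lineNo < 1 || lineLength < 1)
        {
            return false;
        }

        // Cast before multiplying so the product is computed as a long.
        offset = (long)(lineNo - 1) * lineLength;
        return offset < fileLength;
    }
}
```

The resulting offset can be passed to `fs.Seek(offset, SeekOrigin.Begin)`, with `fs.Length` supplied as `fileLength`, so a request for a line past the end of the file is rejected before any read happens.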