C# Performance - Chunking writes to a file with AppendAllText

Is there a more elegant or faster way to write the code below? At the moment it takes about 45 seconds.

query.sql is 200,000 lines long, and the SQL on every line has exactly this form:

SELECT N'+dave' AS [AccountName], N'20005' AS [EmployeeID], N'-6' AS [PlatformID] UNION ALL 

I found that writing in chunks of 1,000 lines is much faster than waiting until the end and using WriteAllText (which takes about 20 minutes to run).

    static void Main(string[] args)
    {
        var s = new Stopwatch();
        s.Start();
        string textToWrite = "";
        string[] lines = File.ReadAllLines(@"e:\temp\query.sql");
        int i = 0;
        foreach (var line in lines)
        {
            var bits = line.Split('\'');
            var value1 = bits[1];
            var value2 = bits[3];
            var value3 = bits[5];
            var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" + "'" + value1 + "', " + value2 + ", " + value3 + ")";
            textToWrite += message + Environment.NewLine;
            if (i % 1000 == 0)
            {
                Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
                File.AppendAllText(@"e:\temp\query2.sql", textToWrite);
                textToWrite = "";
            }
            i++;
        }
        //File.WriteAllText(@"e:\temp\query2.sql", textToWrite);
        File.AppendAllText(@"e:\temp\query2.sql", textToWrite);
        s.Stop();
        TimeSpan ts = s.Elapsed;
        Console.WriteLine("Timespan: {0}m", ts.TotalMinutes);
        Console.WriteLine("Total records: " + i);
        Console.ReadLine();
    }

Edit: StringBuilder solution (1,000 ms):

    static void Main2(string[] args)
    {
        var s = new Stopwatch();
        s.Start();
        var textToWrite = new StringBuilder();
        string[] lines = File.ReadAllLines(@"e:\temp\query.sql");
        int i = 0;
        foreach (var line in lines)
        {
            var bits = line.Split('\'');
            var value1 = bits[1];
            var value2 = bits[3];
            var value3 = bits[5];
            var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" + "'" + value1 + "', " + value2 + ", " + value3 + ")" + Environment.NewLine;
            textToWrite.Append(message);
            // Buffering
            if (i % 1000 == 0)
            {
                Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
                File.AppendAllText(@"e:\temp\query2.sql", textToWrite.ToString());
                textToWrite = new StringBuilder();
            }
            i++;
        }
        File.AppendAllText(@"e:\temp\query2.sql", textToWrite.ToString());
        s.Stop();
        TimeSpan ts = s.Elapsed;
        Console.WriteLine("Timespan: {0}ms", ts.TotalMilliseconds);
        Console.WriteLine("Total records: " + i);
        Console.ReadLine();
    }

Edit: StreamWriter solution (450 ms):

    static void Main(string[] args)
    {
        var s = new Stopwatch();
        s.Start();
        string[] lines = File.ReadAllLines(@"e:\temp\query.sql");
        int i = 0;
        using (StreamWriter writer = File.AppendText(@"e:\temp\query2.sql"))
        {
            foreach (var line in lines)
            {
                var bits = line.Split('\'');
                var value1 = bits[1];
                var value2 = bits[3];
                var value3 = bits[5];
                writer.WriteLine("INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N'{0}', {1}, {2})", value1, value2, value3);
                i++;
            }
        }
        s.Stop();
        TimeSpan ts = s.Elapsed;
        Console.WriteLine("Timespan: {0}ms", ts.TotalMilliseconds);
        Console.WriteLine("Total records: " + i);
        Console.ReadLine();
    }

As others have pointed out, use StringBuilder. So in your case, declare:

 StringBuilder textToWrite = new StringBuilder(); 

Then:

    textToWrite.AppendLine(message);
    if (i % 1000 == 0)
    {
        Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
        File.AppendAllText(@"e:\temp\query2.sql", textToWrite.ToString());
        textToWrite = new StringBuilder();
    }

Although you would probably be better off dropping the manual buffering altogether:

    using (StreamWriter writer = File.AppendText(filename))
    {
        // initialization stuff here
        foreach (var line in lines)
        {
            var bits = line.Split('\'');
            var value1 = bits[1];
            var value2 = bits[3];
            var value3 = bits[5];
            var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" + "'" + value1 + "', " + value2 + ", " + value3 + ")";
            writer.WriteLine(message); // write the line
        }
    }

A good start would be to use the StringBuilder class that is built into .NET. It avoids a lot of string allocation and copying.

See the MSDN documentation for how it works: http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.aspx

See also this Stack Overflow post for more information: What is the most efficient way to concatenate strings?

Example:

    StringBuilder a = new StringBuilder();
    a.Append("some text");
    a.Append("more text");
    string result = a.ToString();

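Which version of SQL Server is this? The best way to do this is not to use one giant SQL script at all, but to use table-valued parameters or SQL Server's bulk copy support.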
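As a rough sketch of the bulk copy route, the values could be parsed straight from query.sql into a DataTable, which is one of the inputs SqlBulkCopy.WriteToServer accepts. The class and method names here are made up for illustration, and the column types (string, int, int) are an assumption, since the real table definition is not shown.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Globalization;

public static class MappingTableBuilder // hypothetical helper name
{
    // Builds an in-memory table from lines shaped like the ones in query.sql.
    // Assumption: EmployeeID and PlatformID are integer columns.
    public static DataTable Build(IEnumerable<string> lines)
    {
        var table = new DataTable("Import_AccountEmployeeMapping");
        table.Columns.Add("AccountName", typeof(string));
        table.Columns.Add("EmployeeID", typeof(int));
        table.Columns.Add("PlatformID", typeof(int));

        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines

            // Same split-on-quote trick as the question: values sit at indexes 1, 3, 5.
            var bits = line.Split('\'');
            table.Rows.Add(
                bits[1],
                int.Parse(bits[3], CultureInfo.InvariantCulture),
                int.Parse(bits[5], CultureInfo.InvariantCulture));
        }
        return table;
    }
}
```

The resulting table could then be handed to a SqlBulkCopy on an open connection (set DestinationTableName to [PreStaging].[Import_AccountEmployeeMapping] and call WriteToServer(table)); the connection details depend on your setup.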

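The best approach is most likely to open both files at the same time, read and write one line at a time as you go, and then close the files.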
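A minimal sketch of that read-one-line, write-one-line approach might look like the following. The class and method names are invented for the example, and skipping lines that do not have the expected quoted values is an added assumption, not something the question's code does.

```csharp
using System.IO;

public static class StreamingConverter // hypothetical helper name
{
    // Reads the input one line at a time and writes the matching INSERT
    // immediately, so neither file is ever held in memory as a whole.
    public static int ConvertFile(string inputPath, string outputPath)
    {
        int count = 0;
        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath)) // overwrites an existing file
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var bits = line.Split('\'');
                if (bits.Length < 6) continue; // assumption: ignore lines without three quoted values

                writer.WriteLine(
                    "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N'{0}', {1}, {2})",
                    bits[1], bits[3], bits[5]);
                count++;
            }
        }
        return count;
    }
}
```

Calling StreamingConverter.ConvertFile(@"e:\temp\query.sql", @"e:\temp\query2.sql") would return the number of INSERT lines written. StreamWriter keeps its own internal buffer, so no manual chunking is needed.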

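However, the biggest problem you are most likely running into is string concatenation. Strings in .NET are immutable, so every concatenation allocates a new copy, which costs both time and memory (although the GC will eventually reclaim the latter).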
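To get a feel for why this matters, and why resetting the string every 1,000 lines helped so much, here is a back-of-the-envelope estimate of how many characters get copied. It is a simplified model (each += copies the whole new string once), and the helper name is made up for illustration.

```csharp
public static class ConcatCost // hypothetical helper name
{
    // Estimates how many characters are copied when a string is grown one
    // line at a time with += and reset to empty every chunkSize lines.
    public static long EstimateCopiedChars(int totalLines, int charsPerLine, int chunkSize)
    {
        long copied = 0;
        long currentLength = 0;
        for (int i = 1; i <= totalLines; i++)
        {
            currentLength += charsPerLine;             // new string = old text plus one line
            copied += currentLength;                   // every character of it is copied
            if (i % chunkSize == 0) currentLength = 0; // chunk flushed, start over
        }
        return copied;
    }
}
```

For 200,000 lines of about 125 characters each (an assumed average), comparing EstimateCopiedChars(200000, 125, 200000) with EstimateCopiedChars(200000, 125, 1000) shows the unchunked version copying roughly 200 times as many characters. That matches the direction of the timings in the question, though not necessarily their exact ratio.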

If you replace textToWrite with a StringBuilder and call ToString() only once at the end, you will see much better performance.

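Or, honestly, you could just do a regular expression replace over the whole thing and be done with it, although I believe you would first have to read the entire file into memory, as you are already doing.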
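One way such a regex replace could look, as a sketch: the pattern below is written against the single sample line shown in the question and is an assumption about the rest of the file. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class RegexConverter // hypothetical helper name
{
    // Captures the three quoted values; the trailing " UNION ALL" is optional
    // so that a final line without it would still match.
    private static readonly Regex SelectLine = new Regex(
        @"SELECT N'([^']*)' AS \[AccountName\], N'([^']*)' AS \[EmployeeID\], N'([^']*)' AS \[PlatformID\](?: UNION ALL)?[ \t]*");

    // Rewrites every matching SELECT in the text as an INSERT, leaving
    // line breaks and any non-matching text untouched.
    public static string Convert(string sql)
    {
        return SelectLine.Replace(
            sql,
            "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N'$1', $2, $3)");
    }
}
```

It could be used as File.WriteAllText(@"e:\temp\query2.sql", RegexConverter.Convert(File.ReadAllText(@"e:\temp\query.sql"))), which reads and writes the whole file in one go.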

MemoryMappedFiles are very efficient, so they are worth looking into.

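    string[] lines = File.ReadAllLines(@"e:\temp\query.sql");
    // The capacity must cover everything that gets written. Each INSERT line is longer
    // than the SELECT line it comes from, so the input length alone is too small.
    // Doubling it is a rough upper bound for the line format in the question; the
    // unused tail of the output file stays zero-filled.
    long capacity = new FileInfo(@"e:\temp\query.sql").Length * 2;
    using (var mmf = MemoryMappedFile.CreateFromFile(@"e:\temp\query2.sql", FileMode.Create, "txt", capacity))
    using (MemoryMappedViewStream mmvs = mmf.CreateViewStream())
    using (StreamWriter writer = new StreamWriter(mmvs)) // disposing the writer flushes it
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines.Length; i++)
        {
            var bits = lines[i].Split('\'');
            var value1 = bits[1];
            var value2 = bits[3];
            var value3 = bits[5];
            sb.AppendFormat("INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N'{0}', {1}, {2})", value1, value2, value3);
            writer.WriteLine(sb.ToString());
            sb.Clear(); // reuse the builder for the next line
        }
    }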

You may find it works better to build the whole text first and then write it to the MemoryMappedFile in one go, since that needs fewer calls to ToString.
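A sketch of that build-first variant, under the assumption that the full script has already been assembled into a single string (for example with a StringBuilder). Knowing the final text up front also means the mapping can be sized exactly. The helper name and the choice of UTF-8 without a byte order mark are assumptions for the example.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

public static class MappedWriter // hypothetical helper name
{
    // Writes the already-built text to outputPath through a memory-mapped
    // file whose capacity matches the encoded size exactly.
    public static void WriteAll(string outputPath, string text)
    {
        byte[] bytes = new UTF8Encoding(false).GetBytes(text); // UTF-8, no BOM
        if (bytes.Length == 0)
        {
            // A mapping cannot have zero capacity, so fall back to a plain write.
            File.WriteAllBytes(outputPath, bytes);
            return;
        }

        using (var mmf = MemoryMappedFile.CreateFromFile(outputPath, FileMode.Create, null, bytes.Length))
        using (var view = mmf.CreateViewStream(0, bytes.Length))
        {
            view.Write(bytes, 0, bytes.Length);
        }
    }
}
```

With a StringBuilder named sb holding all the INSERT lines, the call would be MappedWriter.WriteAll(@"e:\temp\query2.sql", sb.ToString()), so ToString runs only once. Whether this beats a plain StreamWriter for a file of this size is something to measure rather than assume.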