Parse a text file and remove commas inside double quotes

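I have a text file that needs to be converted to a CSV file. My plan is to:

  • parse the file line by line
  • search for commas inside double quotes and replace them with a space
  • then delete all double quotes
  • append the line to a new CSV file

Question: I need a function that recognizes a comma inside double quotes and replaces it.

Here is a sample line: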

"MRS Brown","4611 BEAUMONT ST","","WARRIOR RUN,PA"

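Your file already appears to be in a CSV-compliant format. Any good CSV reader will read it correctly.

If your problem is only about reading the field values correctly, then you need to read the file the right way.

Here is one way to do it: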

    using Microsoft.VisualBasic.FileIO;

    private void button1_Click(object sender, EventArgs e)
    {
        TextFieldParser tfp = new TextFieldParser("C:\\Temp\\Test.csv");
        tfp.Delimiters = new string[] { "," };
        tfp.HasFieldsEnclosedInQuotes = true;
        while (!tfp.EndOfData)
        {
            string[] fields = tfp.ReadFields();

            // do whatever you want to do with the fields now...
            // e.g. remove the commas and double quotes from the fields.
            for (int i = 0; i < fields.Length; i++)
            {
                fields[i] = fields[i].Replace(",", " ").Replace("\"", "");
            }

            // this is to show what we got as the output
            textBox1.AppendText(String.Join("\t", fields) + "\n");
        }
        tfp.Close();
    }

Edit:

I just noticed that the question is tagged C# and VB.NET-2010. Here is the VB.NET version, in case you are coding in VB.

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim tfp As New FileIO.TextFieldParser("C:\Temp\Test.csv")
        tfp.Delimiters = New String() {","}
        tfp.HasFieldsEnclosedInQuotes = True
        While Not tfp.EndOfData
            Dim fields() As String = tfp.ReadFields

            ' do whatever you want to do with the fields now...
            ' e.g. remove the commas and double quotes from the fields.
            For i As Integer = 0 To fields.Length - 1
                fields(i) = fields(i).Replace(",", " ").Replace("""", "")
            Next

            ' this is to show what we got as the output
            TextBox1.AppendText(Join(fields, vbTab) & vbCrLf)
        End While
        tfp.Close()
    End Sub
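The examples above only display the fields in a text box. To finish the plan from the question (append each cleaned line to a new CSV file), the same parser loop can feed a StreamWriter. This is a minimal C# sketch, not a definitive implementation: the class name `CsvCleaner`, the method name `CleanFile`, and both path parameters are made up for illustration.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvCleaner
{
    // Hypothetical helper: reads a quoted CSV file and writes a copy in which
    // commas inside fields are replaced by spaces and no double quotes remain.
    public static void CleanFile(string inputPath, string outputPath)
    {
        using (var parser = new TextFieldParser(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                for (int i = 0; i < fields.Length; i++)
                {
                    // ReadFields already strips the enclosing quotes;
                    // the second Replace only catches embedded ones.
                    fields[i] = fields[i].Replace(",", " ").Replace("\"", "");
                }
                // No field contains a comma any more, so a plain join is safe.
                writer.WriteLine(string.Join(",", fields));
            }
        }
    }
}
```

For the sample line in the question, this should write `MRS Brown,4611 BEAUMONT ST,,WARRIOR RUN PA` to the output file.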

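Here is a simple function that removes commas embedded between two double quotes in a string. You can pass in a long string that contains multiple occurrences of things like "abc,123",10/13/12,"some description" and so on. It also removes the double quotes.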

    Private Function ParseCommasInQuotes(ByVal arg As String) As String
        Dim foundEndQuote As Boolean = False
        Dim foundStartQuote As Boolean = False
        Dim output As New StringBuilder()

        '44 = comma
        '34 = double quote
        For Each element As Char In arg
            If foundEndQuote Then
                foundStartQuote = False
                foundEndQuote = False
            End If

            If element.Equals(Chr(34)) AndAlso (Not foundEndQuote) AndAlso foundStartQuote Then
                foundEndQuote = True
                Continue For
            End If

            If element.Equals(Chr(34)) AndAlso Not foundStartQuote Then
                foundStartQuote = True
                Continue For
            End If

            If element.Equals(Chr(44)) AndAlso foundStartQuote Then
                'skip the comma...it's between double quotes
            Else
                output.Append(element)
            End If
        Next

        Return output.ToString()
    End Function

Thanks to Baz and Glockster for the VB answer. I just converted it to C# and it works well. With this code you don't need any third-party parser.

    // Usage
    string line = reader.ReadLine();
    line = ParseCommasInQuotes(line);

    private string ParseCommasInQuotes(string arg)
    {
        bool foundEndQuote = false;
        bool foundStartQuote = false;
        StringBuilder output = new StringBuilder();

        //44 = comma
        //34 = double quote
        foreach (char element in arg)
        {
            if (foundEndQuote)
            {
                foundStartQuote = false;
                foundEndQuote = false;
            }

            if (element.Equals((Char)34) && !foundEndQuote && foundStartQuote)
            {
                foundEndQuote = true;
                continue;
            }

            if (element.Equals((Char)34) && !foundStartQuote)
            {
                foundStartQuote = true;
                continue;
            }

            if (element.Equals((Char)44) && foundStartQuote)
            {
                //skip the comma...it's between double quotes
            }
            else
            {
                output.Append(element);
            }
        }

        return output.ToString();
    }
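Note that ParseCommasInQuotes drops the embedded commas, while the question asked for them to be replaced with a space. The same idea can be written with a single flag that toggles on every double quote. The following is only a sketch under that assumption; the names `QuoteAwareCleaner` and `ReplaceCommasInQuotes` are hypothetical.

```csharp
using System.Text;

public static class QuoteAwareCleaner
{
    // Replaces commas inside double-quoted sections with a space
    // and drops every double quote from the result.
    public static string ReplaceCommasInQuotes(string line)
    {
        var output = new StringBuilder(line.Length);
        bool inQuotes = false;

        foreach (char c in line)
        {
            if (c == '"')
            {
                // Toggle on every quote and emit nothing for it.
                inQuotes = !inQuotes;
            }
            else if (c == ',' && inQuotes)
            {
                // A comma inside quotes is data, not a delimiter.
                output.Append(' ');
            }
            else
            {
                output.Append(c);
            }
        }

        return output.ToString();
    }
}
```

Because a doubled quote inside a field toggles the flag twice, the flag still reads "inside quotes" afterwards, and the quote characters themselves are dropped like all others.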

I didn't understand your question before. Now I'm fairly sure I've got it right:

    TextFieldParser parser = new TextFieldParser(@"c:\file.csv");
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    while (!parser.EndOfData)
    {
        // Processing row
        string[] fields = parser.ReadFields();
        foreach (string field in fields)
        {
            // TODO: Do whatever you need
        }
    }
    parser.Close();

    // Match each complete quoted field and replace the commas inside it with a space.
    // Matching the quotes themselves keeps commas in unquoted fields untouched.
    var result = Regex.Replace(input, "\"[^\"]*\"", m => m.Value.Replace(",", " "));

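It sounds as though what you are describing will end up as a CSV file anyway, but to answer your question, this is how I would do it.

First, you need to read the text file into something usable that you can loop over, like this: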

    public static List<String> GetTextListFromDiskFile(String fileName)
    {
        List<String> list = new List<String>();
        try
        {
            // load the file into the StreamReader
            System.IO.StreamReader sr = new System.IO.StreamReader(fileName);

            // loop through each line of the file
            while (sr.Peek() >= 0)
            {
                list.Add(sr.ReadLine());
            }
            sr.Close();
        }
        catch (Exception ex)
        {
            list.Add("Error: Could not read file from disk. Original error: " + ex.Message);
        }
        return list;
    }

Then loop over the list with a simple foreach and run a replace on each item, like this:

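    StringBuilder sb = new StringBuilder();
    foreach (String item in list)
    {
        // Replace commas inside each quoted field with a space
        // (replacing "," itself would hit the field separators instead).
        String x = Regex.Replace(item, "\"[^\"]*\"", m => m.Value.Replace(",", " "));
        // Then remove all double quotes and keep the cleaned line.
        x = x.Replace("\"", "");
        sb.AppendLine(x);
    }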

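After doing this, you need to build the CSV file line by line. Again, I would use a StringBuilder and simply call sb.AppendLine(x) to build the String that will become the text file, then write it to disk with something like this.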

    public static void SaveFileToDisk(String filePathName, String fileText)
    {
        using (StreamWriter outfile = new StreamWriter(filePathName))
        {
            outfile.Write(fileText);
        }
    }
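For large files, the read, replace, and write steps above can also run as one streaming pass instead of holding the whole file in a list and a String. This is a rough sketch rather than a tested drop-in: `CsvRewriter`, `CleanLine`, and `Rewrite` are hypothetical names, and it assumes the input and output paths are different files.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class CsvRewriter
{
    // Matches one complete double-quoted section, quotes included.
    private static readonly Regex QuotedField = new Regex("\"[^\"]*\"");

    // Replaces commas inside quoted sections with a space, then drops all quotes.
    public static string CleanLine(string line)
    {
        string noInnerCommas = QuotedField.Replace(line, m => m.Value.Replace(",", " "));
        return noInnerCommas.Replace("\"", "");
    }

    // File.ReadLines is lazy, so lines are cleaned and written one at a time.
    // The output path must differ from the input path for that reason.
    public static void Rewrite(string inputPath, string outputPath)
    {
        IEnumerable<string> cleaned = File.ReadLines(inputPath).Select(line => CleanLine(line));
        File.WriteAllLines(outputPath, cleaned);
    }
}
```

A call such as `CsvRewriter.Rewrite(inputPath, outputPath)` then replaces the three separate steps.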

This worked for me. Hope it helps someone.

    Private Sub Command1_Click()
        Open "c:\dir\file.csv" For Input As #1
        Open "c:\dir\file2.csv" For Output As #2
        Do Until EOF(1)
            Line Input #1, test$
    99      c = InStr(test$, """""")
            If c > 0 Then
                ' Cut out the pair of consecutive double quotes found at position c
                test$ = Left$(test$, c - 1) + Right$(test$, Len(test$) - (c + 1))
                GoTo 99
            End If
            Print #2, test$
        Loop
        ' Close both files so the output is flushed to disk
        Close #1
        Close #2
    End Sub