Count the number of unique words and the occurrences of each word in a txt file

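I'm trying to build an application that does some text processing on a text file. I use a dictionary to build a word index: the program reads the file and, for each word, checks whether that word has already been seen and what its unique word ID is. It should then print the index number and the total number of occurrences of each word it encounters, continuing through the whole file, and produce something like this: http://pastebin.com/CjtcYchF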

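Here is a sample of the text file I'm using as input: http://pastebin.com/ZRVbhWhV. A quick Ctrl-F shows that "not" appears 2 times and "that" appears 4 times. What I need is to index each word and output it like this: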

    sample input : "that I have not that place sunrise beach like not good dirty beach trash beach"

    dictionary :        output.txt / output.dat:
    index  word         4:2 1:1 2:1 3:2 5:1 6:1 7:3 8:1 9:1 10:1 11:1
    1      I
    2      have
    3      not
    4      that
    5      place
    6      sunrise
    7      beach
    8      like
    9      good
    10     dirty
    11     trash

I've tried to implement some code to build the dictionary. This is what I have so far:

    private void bagofword_Click(object sender, EventArgs e)
    {
        //creating dictionary in background
        //Dictionary dict = new Dictionary();
        string rawinputbow = File.ReadAllText(textBox31.Text);
        //string[] inputbow = rawinputbow.Split(' ');
        var inputbow = rawinputbow.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
                                  .ToList();
        var dict = new OrderedDictionary();
        var output = new List<int>();
        foreach (var element in inputbow.Select((word, index) => new { word, index }))
        {
            if (dict.Contains(element.word))
            {
                var count = (int)dict[element.word];
                dict[element.word] = ++count;
                output.Add(GetIndex(dict, element.word));
                //textBoxfile.Text = output.ToString();
                // textBoxfile.Text = inputbow.ToString();
                string result = string.Join(",", output);
                textBoxfile.Text = result.ToString();
            }
            else
            {
                dict[element.word] = 1;
                output.Add(GetIndex(dict, element.word));
                //textBoxfile.Text = dict.ToString();
                string result = string.Join(",", output);
                textBoxfile.Text = result.ToString();
            }
        }
    }

    public int GetIndex(OrderedDictionary dictionary, string key)
    {
        for (int index = 0; index < dictionary.Count; index++)
        {
            if (dictionary[index] == dictionary[key])
                return index; // We found the item
            //textBoxfile.Text = index.ToString();
        }
        return -1;
    }

Does anyone know how to complete this code? Any help is much appreciated!

Use this code:

    string input = "that I have not that place sunrise beach like not good dirty beach trash beach";
    // A null separator makes Split break on whitespace.
    var wordList = input.Split(null);
    // Group identical words, count each group, and sort by count (ascending).
    var output = wordList.GroupBy(x => x)
                         .Select(x => new Word { charchter = x.Key, repeat = x.Count() })
                         .OrderBy(x => x.repeat);
    foreach (var item in output)
    {
        textBoxfile.Text += item.charchter + " : " + item.repeat + Environment.NewLine;
    }

The class used to hold the data:

    public class Word
    {
        public string charchter { get; set; }  // the word itself
        public int repeat { get; set; }        // number of occurrences
    }
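The grouping above gives the counts but not the per-word index the question asks for. As a rough sketch of one way to get both, the following assigns each distinct word a 1-based index in order of first appearance and emits the `index:count` line. The names `WordIndexSketch`, `CountWords` and `FormatIndexCounts` are made up for illustration, and the first-appearance numbering is an assumption: the sample dictionary in the question numbers the words differently (for example "that" is 4 there).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original question or answers.
public static class WordIndexSketch
{
    // Returns (word, count) pairs in order of first appearance.
    public static List<KeyValuePair<string, int>> CountWords(string text)
    {
        var counts = new Dictionary<string, int>();
        var order = new List<string>();
        // Split on any whitespace and drop empty tokens.
        var words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            int seen;
            if (counts.TryGetValue(word, out seen))
            {
                counts[word] = seen + 1;
            }
            else
            {
                counts[word] = 1;
                order.Add(word); // remember when the word was first seen
            }
        }
        return order.Select(w => new KeyValuePair<string, int>(w, counts[w])).ToList();
    }

    // Builds the "index:count" line; indices are 1-based (an assumption).
    public static string FormatIndexCounts(List<KeyValuePair<string, int>> entries)
    {
        return string.Join(" ", entries.Select((e, i) => (i + 1) + ":" + e.Value));
    }
}
```

For the sample input in the question, `FormatIndexCounts(CountWords(input))` should give `1:2 2:1 3:1 4:2 5:1 6:1 7:3 8:1 9:1 10:1 11:1`; the result could then be written out with `File.WriteAllText`.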

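Splitting on whitespace is not enough. Your file contains tokens like "temple,", "photos." or "cafes/restaraunts". A better approach is to use a regular expression such as \w+. The words should also be compared case-insensitively.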

My approach would be:

    var words = Regex.Matches(File.ReadAllText(filename), @"\w+").Cast<Match>()
        .Select((m, pos) => new { Word = m.Value, Pos = pos })
        .GroupBy(s => s.Word, StringComparer.CurrentCultureIgnoreCase)
        .Select(g => new { Word = g.Key, PosInText = g.Select(z => z.Pos).ToList() })
        .ToList();

    // Each word with the positions at which it occurs in the text
    foreach (var item in words)
    {
        Console.WriteLine("{0,-15} POS:{1}", item.Word, string.Join(",", item.PosInText));
    }

    // index:count for each unique word
    for (int i = 0; i < words.Count; i++)
    {
        Console.Write("{0}:{1} ", i, words[i].PosInText.Count);
    }
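To illustrate why the `\w+` tokenization and the case-insensitive comparison matter, here is a small sketch that counts words from a string rather than a file. `RegexWordCount` and `Count` are hypothetical names, and `StringComparer.OrdinalIgnoreCase` is used here instead of the culture-aware comparer above purely to keep the example culture-independent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, shown only to illustrate the tokenization idea.
public static class RegexWordCount
{
    // Counts words matched by \w+, treating "Temple" and "temple" as the same word.
    public static Dictionary<string, int> Count(string text)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (var word in Regex.Matches(text, @"\w+").Cast<Match>().Select(m => m.Value))
        {
            int seen;
            counts.TryGetValue(word, out seen); // seen stays 0 for a new word
            counts[word] = seen + 1;
        }
        return counts;
    }
}
```

With the input `"temple, photos. cafes/restaraunts Temple"`, a plain whitespace split would keep `temple,` and `Temple` as different tokens, while this sketch should report four words with `temple` counted twice. Note that `\w` also matches digits and underscores, so numbers in the text are counted as words too.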

### Sample code for you to tweak for your needs:

    touch test.txt
    echo "ravi chandran marappan 30" > test.txt
    echo "ramesh kumar marappan 24" >> test.txt
    echo "ram lakshman marappan 22" >> test.txt
    sed -e 's/ /\n/g' test.txt | sort | uniq | awk '{print "echo """,$1, """`grep -wc ",$1," test.txt`"}' | sh

Results:

    22 -1
    24 -1
    30 -1
    chandran -1
    kumar -1
    lakshman -1
    marappan -3
    ram -1
    ramesh -1
    ravi -1
