1071 Speech Patterns (25 points)

1. Problem

People often have a preference among synonyms of the same word. For example, some may prefer “the police”, while others may prefer “the cops”. Analyzing such patterns can help to narrow down a speaker’s identity, which is useful when validating, for example, whether it’s still the same person behind an online avatar.

Now given a paragraph of text sampled from someone’s speech, can you find the person’s most commonly used word?

Each input file contains one test case. For each case, there is one line of text no more than 1048576 characters in length, terminated by a carriage return \n. The input contains at least one alphanumerical character, i.e., one character from the set [0-9 A-Z a-z].

For each test case, print in one line the most commonly occurring word in the input text, followed by a space and the number of times it has occurred in the input. If there is more than one such word, print the lexicographically smallest one. The word should be printed in all lower case. Here a “word” is defined as a continuous sequence of alphanumerical characters separated by non-alphanumerical characters or the line beginning/end.

Note that words are case insensitive.

Sample Input:
Can1: "Can a can can a can?  It can!"

Sample Output:
can 5

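2. Problem summary

Given a line of text, find the word that occurs most often and print it in lowercase together with its number of occurrences. Note: a word consists of characters from [0-9 A-Z a-z], words are separated by non-alphanumeric characters, and words are case-insensitive. If several words share the highest count, print the lexicographically smallest one.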
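
To make the word definition concrete, here is a minimal sketch of the splitting rule on its own. The helper name `splitWords` is hypothetical and does not appear in the original post; it assumes ASCII input, as the problem statement implies.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original solution): split a line into
// lowercase words, where a word is a maximal run of characters in [0-9A-Za-z].
std::vector<std::string> splitWords(const std::string &text)
{
    std::vector<std::string> words;
    std::string cur;
    for (unsigned char ch : text)
    {
        if (std::isalnum(ch))
        {
            // Still inside a word: append the lowercase form.
            cur += static_cast<char>(std::tolower(ch));
        }
        else if (!cur.empty())
        {
            // A separator ends the current word; runs of separators
            // never produce empty words because cur is empty by then.
            words.push_back(cur);
            cur.clear();
        }
    }
    // The line may end in the middle of a word.
    if (!cur.empty()) words.push_back(cur);
    return words;
}
```

For the sample input, this sketch should yield the nine words can1, can, a, can, can, a, can, it, can, so "can" appears 5 times.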

3. Approach: string + map

4. Code
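
The map half of the approach can be sketched as follows, assuming the words have already been split and lowercased. The helper name `mostCommon` is hypothetical. The sketch relies on `std::map` iterating its keys in ascending order, so a strict `>` comparison keeps the lexicographically smallest word among those tied for the highest count.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the original solution): given lowercase
// words, return the most frequent one and its count.
std::pair<std::string, int> mostCommon(const std::vector<std::string> &words)
{
    // Count occurrences; operator[] value-initializes missing keys to 0.
    std::map<std::string, int> counts;
    for (const std::string &w : words) ++counts[w];

    // Keys are visited in ascending order, so with a strict '>' the first
    // word that reaches the maximum count wins any tie.
    std::pair<std::string, int> best("", 0);
    for (const auto &entry : counts)
    {
        if (entry.second > best.second)
            best = std::make_pair(entry.first, entry.second);
    }
    return best;
}
```

Scanning the map once after counting keeps the tie-breaking rule in a single place, at the cost of one extra pass over the distinct words.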

#include <iostream>
#include <string>
#include <map>
#include <cctype>

using namespace std;

int main()
{
    string str;
    getline(cin, str);

    // Convert every uppercase letter in the input to lowercase
    for (size_t i = 0; i < str.length(); ++i)
        str[i] = tolower(static_cast<unsigned char>(str[i]));

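    string temp = "";
    int maxCnt = 0;
    string maxStr = "";
    map<string, int> res; // word -> number of occurrences
    for (size_t i = 0; i < str.length(); ++i)
    {
        if (!isalnum(static_cast<unsigned char>(str[i])))
        {
            // Never count an empty word. temp is empty when the line starts
            // with a separator or when several separators appear in a row.
            if (temp.empty())
                continue;
            // A separator ends the current word: count it, then clear temp
            // to start collecting the next word.
            res[temp] += 1;
            // On a tie, keep the lexicographically smaller word.
            if (res[temp] > maxCnt || (res[temp] == maxCnt && temp < maxStr))
            {
                maxCnt = res[temp];
                maxStr = temp;
            }
            temp = "";
        } else
        {
            temp += str[i];
        }
    }
    // If the line ends with a letter or digit, the loop finishes before the
    // last word has been counted, so count it here.
    if (!temp.empty())
    {
        res[temp] += 1;
        if (res[temp] > maxCnt || (res[temp] == maxCnt && temp < maxStr))
        {
            maxCnt = res[temp];
            maxStr = temp;
        }
    }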
    cout << maxStr << " " << maxCnt << endl;
    return 0;
}

Original: https://www.cnblogs.com/vanishzeng/p/15479945.html
Author: vanish丶
Title: 1071 Speech Patterns (25 points)
