Assign an ID based on keywords present in Tweets
I have extracted tweets by searching for 44 different keywords, and the output is a file of 400k tweets in total, each containing at least one of those keywords. How can I create a separate ID column that holds the keyword present in each tweet?
E.g., the tweet is:
Andhra Pradesh is the highest state with crimes against women
The keyword here is "crimes against women".
I would like to create a column that assigns the keyword "crimes against women" to the tweet, a sort of ID column to be precise.
# input: column 1
Tweet <- "Andhra Pradesh is the highest state with crimes against women"
# expected output: column 2, beside the Tweet column
Keyword <- "crimes against women"
Edit: I do not want to extract any part of the tweet; I just want to assign to each tweet, in a new column, the keyword it contains, so that I can segregate the tweets by keyword.
r nlp uniqueidentifier
edited Jan 18 at 15:39 by James Z
asked Jan 18 at 12:10 by Skurup
Do you have a list of the keywords that you want to extract from the tweets?
– A. Stam
Jan 18 at 12:16
Yes, I have the list of keywords, 44 to be exact. I used them to extract the tweets in the first place.
– Skurup
Jan 18 at 12:25
Oh, sorry. I thought that is what you were looking for. I misread. Let me re-open your question
– Sotos
Jan 18 at 12:30
2 Answers
You can do this with the stringr package; however, I don't think you need sapply, because str_extract is already vectorized over its input.
Consider the following keyword list and table with tweets:
library(stringr)

keyword_list <- c("crimes against women", "downloading tweets", "r analysis")
tweets <- data.frame(
  tweet = c("Andhra Pradesh is the highest state with crimes against women",
            "I am downloading tweets",
            "I love r analysis",
            "downloading tweets helps with my r analysis"),
  stringsAsFactors = FALSE  # keep tweets as character, not factor (the default before R 4.0)
)
First, you want to combine your keywords into one regular expression that searches for any of the strings.
keyword_pattern <- paste0(
  "(",
  paste0(keyword_list, collapse = "|"),
  ")"
)
keyword_pattern
#> [1] "(crimes against women|downloading tweets|r analysis)"
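One caveat with a plain alternation like this: it matches anywhere in the text, so "r analysis" would also be found inside "clear analysis", and it is case-sensitive. Below is a minimal sketch of a stricter pattern, assuming the keywords contain no regex special characters (those would need escaping first). The names bounded_pattern, examples, unbounded and bounded are only illustrative.

```r
library(stringr)

keyword_list <- c("crimes against women", "downloading tweets", "r analysis")

# Wrap the alternation in \\b so a keyword only matches at word boundaries,
# and ignore case so "R analysis" counts as "r analysis".
bounded_pattern <- regex(
  paste0("\\b(", paste(keyword_list, collapse = "|"), ")\\b"),
  ignore_case = TRUE
)

examples <- c("A clear analysis of the data", "I love R analysis")

# Plain pattern: finds "r analysis" inside "clear analysis",
# and misses "R analysis" because of the capital R.
unbounded <- str_extract(examples, paste(keyword_list, collapse = "|"))

# Bounded, case-insensitive pattern: only the genuine keyword matches.
bounded <- str_extract(examples, bounded_pattern)
```

Note that the match comes back as it is written in the tweet ("R analysis"), so you may want to lower-case it before using it as an ID.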
Finally, we can add a column to the data frame that extracts the keyword from the tweet.
tweets$keyword <- str_extract(tweets$tweet, keyword_pattern)
tweets
#>                                                           tweet              keyword
#> 1 Andhra Pradesh is the highest state with crimes against women crimes against women
#> 2                                       I am downloading tweets   downloading tweets
#> 3                                             I love r analysis           r analysis
#> 4                   downloading tweets helps with my r analysis   downloading tweets
As the final example illustrates, you need to decide what should happen when a tweet contains multiple keywords. Here, the keyword returned is simply the first one found in the tweet. However, you can also use str_extract_all to return all keywords found in the tweet.
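A rough sketch of the str_extract_all variant, in case it helps. It assumes you want every matched keyword stored in a single text column; the column name keywords and the "; " separator are my own choices, not anything required by stringr.

```r
library(stringr)

keyword_list <- c("crimes against women", "downloading tweets", "r analysis")
tweets <- data.frame(
  tweet = c("Andhra Pradesh is the highest state with crimes against women",
            "I am downloading tweets",
            "I love r analysis",
            "downloading tweets helps with my r analysis"),
  stringsAsFactors = FALSE
)
keyword_pattern <- paste(keyword_list, collapse = "|")

# str_extract_all returns a list with one character vector per tweet.
all_matches <- str_extract_all(tweets$tweet, keyword_pattern)

# Collapse each vector into one string so it fits in an ordinary column.
# A tweet with no keyword ends up with an empty string.
tweets$keywords <- vapply(all_matches, paste, character(1), collapse = "; ")
```

With this, the fourth tweet gets "downloading tweets; r analysis" instead of only the first match.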
answered Jan 18 at 12:50 by A. Stam
We can use stringr, which is very handy for string operations, and simply call str_extract, i.e.
library(stringr)
str_extract(Tweet, Keyword)
#[1] "crimes against women"
For multiple keywords and multiple strings, collapse the keywords into a single pattern and apply it to each tweet, i.e.
Keyword <- c("crimes against women", "something")
Tweet <- c("Andhra Pradesh is the highest state with crimes against women",
"another string with something else")
sapply(Tweet, function(i)str_extract(i, paste(Keyword, collapse = '|')))
# Andhra Pradesh is the highest state with crimes against women another string with something else
# "crimes against women" "something"
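Since the goal is to segregate the tweets by keyword, here is a hedged base R sketch of the same idea without stringr, followed by a split on the new column. It swaps str_extract for regexpr/regmatches, and the names pattern, found, tweets and by_keyword are just illustrative.

```r
Keyword <- c("crimes against women", "something")
Tweet <- c("Andhra Pradesh is the highest state with crimes against women",
           "another string with something else")

# One pattern that matches any of the keywords.
pattern <- paste(Keyword, collapse = "|")

# regexpr gives the position of the first match per tweet (-1 if none).
m <- regexpr(pattern, Tweet)

# regmatches returns only the successful matches, so fill them into
# a vector that defaults to NA for tweets without any keyword.
found <- rep(NA_character_, length(Tweet))
found[m != -1] <- regmatches(Tweet, m)

tweets <- data.frame(Tweet = Tweet, Keyword = found, stringsAsFactors = FALSE)

# One data frame per keyword; tweets with NA keyword are dropped by split().
by_keyword <- split(tweets, tweets$Keyword)
```

by_keyword[["something"]] then holds only the tweets tagged with that keyword.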
answered Jan 18 at 12:36 by Sotos