For R: How to exclude some data files based on file language
I'm rather new to R (and in law school so this is all very new to me), so apologies if this is poorly worded.
I have a series of about 1500 documents that I am importing into R to categorize and analyze later. The first thing that I need to do is exclude all documents that are written in French, which are labelled with an "FR" in the title/doc.info. I was curious what kind of code I could use to exclude those before importing the files, so that I have a clean data set before analyzing anything (since French text will obviously make a mess of processes like sentiment analysis).
Any help is appreciated (even if that help is explaining how to better talk about coding).
Kind regards!
edit 1
The code that I am using is readtext(folder), which you can see below:
library(readtext)

folder <- "C:/[pathway]"
submissions <- readtext(folder)
submissions_text <- submissions$text

# Empty containers, filled one file name at a time in the loop below
submission_number <- numeric()
submission_person <- factor()
submission_code <- factor()
submission_language <- factor()
submission_location <- factor()

for (submission_name in submissions$doc_id) {
  # Drop the extension; the dot is escaped so only a literal ".txt" suffix matches
  submission_name <- gsub("\\.txt$", "", submission_name)
  # The splits below imply names of the form number_person_code_language_location
  number <- as.numeric(strsplit(submission_name, "_|-")[[1]][1])
  submission_number <- c(submission_number, number)
  person <- strsplit(submission_name, "_")[[1]][2]
  submission_person <- c(submission_person, person)
  code <- strsplit(submission_name, "_")[[1]][3]
  submission_code <- c(submission_code, code)
  lang <- strsplit(submission_name, "_")[[1]][4]
  submission_language <- c(submission_language, lang)
  location <- strsplit(submission_name, "_")[[1]][5]
  submission_location <- c(submission_location, location)
}

# Attach the parsed fields, then sort by submission number
submissions <- cbind(submissions, submission_number)
submissions <- cbind(submissions, submission_person)
submissions <- cbind(submissions, submission_code)
submissions <- cbind(submissions, submission_language)
submissions <- cbind(submissions, submission_location)
submissions <- submissions[order(submissions$submission_number, decreasing = FALSE), ]
This is just the organizational aspect of my code. I am looking to hopefully exclude all of the French data before this point (but if it comes afterward, I would also be more than happy with that).
Tags: r
edited Jan 19 at 0:51 by televised-god
asked Jan 18 at 23:50 by televised-god
This is a really broad question - probably too broad to be easily answerable here. Have you attempted to write any code, for example just to import all your documents? SO is not a code-writing service, so it's better to show some attempt (even if you're not confident in it) and then people can point you in better directions.
– phalteman
Jan 19 at 0:30
Sure. I can show you what I currently have. folder<-"C:/[path]" submissions<-readtext(folder) submissions<-mutate(submissions, order = 1:182) submissions<-submissions %>% select(order, doc_id:text) #This will be for the first number, the ordering submission_number<- numeric() submission_person<- factor() submission_code<- factor() submission_language<-factor() submission_location<-factor() for (submission_name in submissions$doc_id) { submission_name<-gsub(".txt","",submission_name) [...] it goes on but my character count is too long.
– televised-god
Jan 19 at 0:34
You can put this as an edit into your question, so that others can give your code a try (though it looks like you've got a good answer already!). Keep in mind that you'll get better help if you include a minimal example.
– phalteman
Jan 19 at 0:42
My bad. Again, all new to me. Thanks for the patience!
– televised-god
Jan 19 at 0:46
2 Answers
The functionality you are after can be found in the list.files() function. See its documentation (?list.files) for details.
In short, your code will likely end up looking something like this:
setwd("c:/path/to/your/data/here")
files <- list.files()
# Keep only the file names that do not contain "FR"
non_french_files <- files[!grepl("FR", files)]
lapply(non_french_files, function(x) {
  f <- read.csv(x)
  # do stuff with f
})
Note - you could directly leverage the pattern parameter found in list.files(), but I chose to do that in two steps in case you wanted to do something else with the French files. This also simplifies what each line of code is doing...
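Since the question imports with readtext() rather than read.csv(), the same filter can be applied to the vector of file paths before anything is read. The sketch below is only an illustration: the file names are made up to follow the number_person_code_language_location.txt layout implied by the question's loop, and it reads with base readLines() so that it runs without extra packages.

```r
# Build a throwaway folder with hypothetical file names (assumed layout:
# number_person_code_language_location.txt).
folder <- file.path(tempdir(), "submissions_demo")
dir.create(folder, showWarnings = FALSE)
demo_names <- c("1_Smith_A1_EN_Ottawa.txt",
                "2_Tremblay_B2_FR_Montreal.txt",
                "3_Franklin_C3_EN_Toronto.txt")
for (n in demo_names) writeLines(paste("text of", n), file.path(folder, n))

# List full paths so the working directory does not need to change.
files <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)

# Match "FR" only as a whole underscore-delimited field, so a name such as
# "Franklin" (or "FRANK") is not mistaken for a language code.
is_french <- grepl("_FR_", basename(files), fixed = TRUE)
non_french_files <- files[!is_french]

# Read only the files that survived the filter.
texts <- vapply(non_french_files,
                function(p) paste(readLines(p), collapse = "\n"),
                character(1))
names(texts) <- basename(non_french_files)
```

readtext() also accepts a character vector of file paths, so with that package loaded, submissions <- readtext(non_french_files) should slot into the existing code in place of readtext(folder).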
...good luck and welcome to R!
answered Jan 19 at 0:21 by Chase
Thanks! I am going to give this a try. I have been using readtext so far (probably should have mentioned that, sorry), but I will try this list.files. Thanks again!
– televised-god
Jan 19 at 0:37
You can also specify a regex pattern in list.files() and potentially skip having to assign non_french_files.
– dylanjm
Jan 19 at 2:11
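As a rough sketch of that suggestion: the pattern argument of list.files() keeps the files that match a regular expression, so it selects rather than excludes. One way to use it here, assuming (hypothetically) that the non-French files carry an "_EN_" language field, is to list those directly. When several other language codes exist, grep() with invert = TRUE drops the French names in one call instead. The folder and file names below are invented for the example.

```r
# Throwaway folder with hypothetical, empty demo files.
folder <- file.path(tempdir(), "pattern_demo")
dir.create(folder, showWarnings = FALSE)
demo_names <- c("1_Smith_A1_EN_Ottawa.txt",
                "2_Tremblay_B2_FR_Montreal.txt",
                "3_Jones_C3_EN_Toronto.txt")
invisible(file.create(file.path(folder, demo_names)))

# Option 1: select the wanted language directly via `pattern`.
english_files <- list.files(folder, pattern = "_EN_.*\\.txt$")

# Option 2: list everything, then drop the French names in one call.
all_files <- list.files(folder, pattern = "\\.txt$")
not_french <- grep("_FR_", all_files, value = TRUE, invert = TRUE, fixed = TRUE)
```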
Here's an alternative similar to @Chase's:
# set wd
files <- list.files()[!grepl("FR", list.files())]
lapply(files, function(x) read.csv(x)) # reads all at once, might want to save each
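To act on the "might want to save each" remark, one possibility is to keep the result of lapply() as a list named by file, so each imported table can be traced back to its source. This is a sketch with made-up CSV files in a temporary folder; the file names and the id column are assumptions for illustration only.

```r
# Throwaway folder with two hypothetical CSV files.
folder <- file.path(tempdir(), "lapply_demo")
dir.create(folder, showWarnings = FALSE)
write.csv(data.frame(id = 1:2),
          file.path(folder, "1_Smith_A1_EN_Ottawa.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4),
          file.path(folder, "2_Tremblay_B2_FR_Montreal.csv"), row.names = FALSE)

# List the files and drop those whose language field is FR.
files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
files <- files[!grepl("_FR_", basename(files), fixed = TRUE)]

# Name the vector first; lapply() carries the names over to its result.
names(files) <- basename(files)
tables <- lapply(files, read.csv)

# Each table is now reachable by the name of the file it came from.
tables[["1_Smith_A1_EN_Ottawa.csv"]]
```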
answered Jan 19 at 11:01 by NelsonGon
Thanks for the suggestion, you guys have been great!
– televised-god
Jan 19 at 22:20
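For the fallback the question mentions (excluding the French documents after import), a minimal sketch is to split every doc_id once, pull out the language field, and subset the data frame on it. The data frame below is a hand-built stand-in for what readtext() returns (columns doc_id and text), and the names again assume the number_person_code_language_location.txt layout; both are assumptions, not the asker's real data.

```r
# Stand-in for the imported data: hypothetical doc_id values and texts.
submissions <- data.frame(
  doc_id = c("2_Smith_A1_EN_Ottawa.txt",
             "1_Tremblay_B2_FR_Montreal.txt",
             "3_Jones_C3_EN_Toronto.txt"),
  text = c("english text", "texte francais", "more english"),
  stringsAsFactors = FALSE
)

# Strip the extension, then split every name once on "_".
parts <- strsplit(sub("\\.txt$", "", submissions$doc_id), "_", fixed = TRUE)
field <- function(i) vapply(parts, function(p) p[i], character(1))

# The question splits the first field on "_" or "-", so drop anything after "-".
submissions$submission_number   <- as.numeric(sub("-.*$", "", field(1)))
submissions$submission_person   <- field(2)
submissions$submission_code     <- field(3)
submissions$submission_language <- field(4)
submissions$submission_location <- field(5)

# Drop the French rows, then sort by submission number.
english <- submissions[submissions$submission_language != "FR", ]
english <- english[order(english$submission_number), ]
```

Working on whole columns this way avoids growing five vectors inside a loop, and the language column stays available if the French documents are needed later.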