For R: How to exclude some data files based on file language

I'm rather new to R (and in law school so this is all very new to me), so apologies if this is poorly worded.
I have a series of about 1500 documents that I am importing into R to categorize and analyze later. The first thing that I need to do is exclude all documents that are written in French, which are labelled with an "FR" in the title/doc.info. I was curious what kind of code I could use to exclude them before importing the files, so that I have a clean data set before analyzing anything (since French text will obviously make a mess of processes like sentiment analysis).
Any help is appreciated (even if that help is explaining how to better talk about coding).
Kind regards!

edit 1
The code that I am using is readtext(folder), which you can see below:
library(readtext)

folder <- "C:/[pathway]"
submissions <- readtext(folder)
submissions_text <- submissions$text

# Empty containers, one per field encoded in the file name
submission_number <- numeric()
submission_person <- factor()
submission_code <- factor()
submission_language <- factor()
submission_location <- factor()

for (submission_name in submissions$doc_id) {
  # Strip the extension; the escaped dot and $ match a literal ".txt" at the end only
  submission_name <- gsub("\\.txt$", "", submission_name)
  number <- as.numeric(strsplit(submission_name, "_|-")[[1]][1])
  submission_number <- c(submission_number, number)
  person <- strsplit(submission_name, "_")[[1]][2]
  submission_person <- c(submission_person, person)
  code <- strsplit(submission_name, "_")[[1]][3]
  submission_code <- c(submission_code, code)
  lang <- strsplit(submission_name, "_")[[1]][4]
  submission_language <- c(submission_language, lang)
  location <- strsplit(submission_name, "_")[[1]][5]
  submission_location <- c(submission_location, location)
}

# Attach the parsed fields as new columns
submissions <- cbind(submissions, submission_number)
submissions <- cbind(submissions, submission_person)
submissions <- cbind(submissions, submission_code)
submissions <- cbind(submissions, submission_language)
submissions <- cbind(submissions, submission_location)

# Sort by submission number
submissions <- submissions[order(submissions$submission_number, decreasing = FALSE), ]

This is just the organizational aspect of my code. I am looking to hopefully exclude all of the French data before this point (but if it comes afterward, I would also be more than happy with that).

edited Jan 19 at 0:51

asked Jan 18 at 23:50

televised-god


This is a really broad question - probably too broad to be easily answerable here. Have you taken attempts to write any code, for example just to import all your documents? SO is not a code-writing service, so it's better to show some attempt (even if you're not confident in it) and then people can point you in better directions.

– phalteman
Jan 19 at 0:30

Sure. I can show you what I currently have. folder<-"C:/[path]" submissions<-readtext(folder) submissions<-mutate(submissions, order = 1:182) submissions<-submissions %>% select(order, doc_id:text) #This will be for the first number, the ordering submission_number<- numeric() submission_person<- factor() submission_code<- factor() submission_language<-factor() submission_location<-factor() for (submission_name in submissions$doc_id) { submission_name<-gsub(".txt","",submission_name) [...] it goes on but my character count is too long.

– televised-god
Jan 19 at 0:34

You can put this as an edit into your question, so that others can give your code a try (though it looks like you've got a good answer already!). Keep in mind that you'll get better help if you include a minimal example.

– phalteman
Jan 19 at 0:42

My bad. Again, all new to me. Thanks for the patience!

– televised-god
Jan 19 at 0:46


2 Answers

The functionality you are after can be found in the list.files() function. Its documentation is available from within R via ?list.files.

In short, your code will likely end up looking something like this:

setwd("c:/path/to/your/data/here")
files <- list.files()
# Keep only the file names that do not contain "FR"
non_french_files <- files[!grepl("FR", files)]
lapply(non_french_files, function(x) {
  f <- read.csv(x)
  # do stuff with f
})

Note - you could directly leverage the pattern parameter found in list.files(), but I chose to do that in two steps in case you wanted to do something else with the French files. This also makes it clearer what each line of code is doing.
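One caveat with the grepl("FR", files) filter above: it matches the letters "FR" anywhere in the file name, so a file with, say, a person field of "FRANK" would be dropped too. Below is a minimal sketch of a stricter check. It assumes the file names follow the number_person_code_language_location.txt layout that the parsing code in the question implies; the example file names and the helper is_french() are made up for illustration.

```r
# Hypothetical file names following number_person_code_language_location.txt
files <- c("1_smith_A_EN_ottawa.txt",
           "2_FRANK_B_EN_toronto.txt",
           "3_roy_C_FR_montreal.txt")

# Hypothetical helper: TRUE when the 4th underscore-separated field is "FR"
is_french <- function(file_names) {
  stems <- sub("\\.txt$", "", basename(file_names))
  fields <- strsplit(stems, "_", fixed = TRUE)
  vapply(fields, function(p) length(p) >= 4 && p[4] == "FR", logical(1))
}

# Keep everything that is not flagged as French
non_french_files <- files[!is_french(files)]
print(non_french_files)
```

With a real folder, files would come from something like list.files(folder, pattern = "\\.txt$", full.names = TRUE), and the filtered vector could then be handed to whatever function reads the documents. If the layout of your names differs, the field index (4 here) is the part to adjust.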

...good luck and welcome to R!

answered Jan 19 at 0:21

Chase


Thanks! I am going to give this a try. I have been using readtext so far (probably should have mentioned that, sorry), but I will try this list.files. Thanks again!

– televised-god
Jan 19 at 0:37

You can also specify a regex pattern in list.files() and potentially skip having to assign non_french_files.

– dylanjm
Jan 19 at 2:11


Here's an alternative similar to @Chase's:

# set wd
files <- list.files()[!grepl("FR", list.files())]
lapply(files, function(x) read.csv(x)) # reads all at once, might want to save each
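As the comment in the snippet hints, the result of lapply() is lost unless it is assigned. A minimal sketch of keeping each document in a named list is shown here. It is an illustration under assumptions rather than a drop-in solution: it creates three throwaway files with made-up names in a temporary folder, matches "_FR_" instead of the bare "FR", and swaps read.csv() for readLines() on the assumption that the documents are plain .txt files as in the question.

```r
# Create a throwaway folder with three small text files (hypothetical names)
folder <- file.path(tempdir(), "submissions_demo")
dir.create(folder, showWarnings = FALSE)
writeLines("Hello", file.path(folder, "1_smith_A_EN_ottawa.txt"))
writeLines("Bonjour", file.path(folder, "2_roy_B_FR_montreal.txt"))
writeLines("Good day", file.path(folder, "3_lee_C_EN_toronto.txt"))

# List the .txt files, then drop those with "_FR_" in the file name
all_files <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)
keep <- all_files[!grepl("_FR_", basename(all_files), fixed = TRUE)]

# Read each remaining file and store the text in a list named by file
texts <- lapply(keep, function(path) paste(readLines(path), collapse = "\n"))
names(texts) <- basename(keep)

print(names(texts))
```

Naming the list by file means each text can later be looked up with texts[["1_smith_A_EN_ottawa.txt"]], and the names can still be split into number, person, code, language and location afterwards.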

answered Jan 19 at 11:01

NelsonGon


Thanks for the suggestion, you guys have been great!

– televised-god
Jan 19 at 22:20
