Leaking goroutines, typically have three times as many running as I want
I'm trying to make a web scraper which can run a decent number (many thousands) of HTTP queries per minute. The actual querying is fine, but to speed up the process I'm trying to make it concurrent. Initially I spawned a goroutine for each request, but I ran out of file descriptors, so after some googling I decided to use a semaphore to limit the number of concurrent goroutines.
Only I can't get this to work.
I've tried moving bits of code around but I always have the same issue: I have roughly three times as many goroutines running as I want.
This is the only method I have that spawns goroutines. I limited the goroutines to 80. In my benchmarks I run this against a slice of 10000 URLs and it tends to hover at about 242 concurrent goroutines in flight, but then it suddenly goes up to almost double this and then back down to 242.
I get the same behaviour if I change the concurrent value from 80 - it usually hovers at just over three times the number of goroutines and sometimes spikes to around double that and I have no idea why.
func (B BrandScraper) ScrapeUrls(URLs ...string) scrapeResponse {
	concurrent := 80
	semaphoreChan := make(chan struct{}, concurrent)
	scrapeResults := make(scrapeResponse, len(URLs))
	for _, URL := range URLs {
		semaphoreChan <- struct{}{}
		go func(URL string) {
			defer func() {
				<-semaphoreChan
			}()
			scrapeResults = append(scrapeResults,
				B.getIndividualScrape(URL))
			fmt.Printf("#goroutines: %d\n", runtime.NumGoroutine())
		}(URL)
	}
	return scrapeResults
}
I'm expecting it to be constantly at 80 goroutines - or at least constant.
This happens whether I run it from a benchmarking test or from the main function.
Thanks very much for any tips!
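For reference, here is a minimal sketch of the semaphore pattern described above, written so that the function waits for all workers and each goroutine writes to its own result slot instead of appending to a shared slice. The names scrapeAll and fetch are hypothetical; fetch stands in for B.getIndividualScrape, and the simulated work in main is an assumption for illustration only.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// scrapeAll calls fetch for every URL with at most limit calls in flight.
// fetch is a hypothetical stand-in for B.getIndividualScrape.
func scrapeAll(urls []string, limit int, fetch func(string) string) []string {
	results := make([]string, len(urls)) // one slot per URL, so no append is needed
	sem := make(chan struct{}, limit)    // buffered channel used as a counting semaphore
	var wg sync.WaitGroup

	for i, u := range urls {
		sem <- struct{}{} // blocks while limit workers are already running
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this worker is done
			results[i] = fetch(u)    // each goroutine writes only its own index
		}(i, u)
	}
	wg.Wait() // do not return until every worker has finished
	return results
}

func main() {
	var inFlight, peak int64
	// Simulated work that records the highest number of concurrent calls.
	fetch := func(u string) string {
		n := atomic.AddInt64(&inFlight, 1)
		for {
			p := atomic.LoadInt64(&peak)
			if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
				break
			}
		}
		time.Sleep(time.Millisecond)
		atomic.AddInt64(&inFlight, -1)
		return "scraped " + u
	}

	urls := make([]string, 200)
	for i := range urls {
		urls[i] = fmt.Sprintf("https://example.com/%d", i)
	}
	res := scrapeAll(urls, 8, fetch)
	fmt.Println(len(res), "results, peak concurrency", atomic.LoadInt64(&peak))
}
```

This only bounds the goroutines started by the loop itself. Any goroutines started inside the fetch function (for example by the HTTP client) are not counted by the semaphore.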
EDIT: getIndividualScrape calls another function:
func (B BrandScraper) doGetRequest(URL string) io.Reader {
	resp, err := http.Get(URL)
	if err != nil {
		log.Fatal(err)
	}
	body, _ := ioutil.ReadAll(resp.Body)
	resp.Body.Close()
	return bytes.NewReader(body)
}
which obviously does an HTTP request. Could this be leaking goroutines? I thought since I'd closed the resp.Body I'd have covered that, but maybe not?
go concurrency goroutine
There are other background goroutines spawned by the runtime and by functionality you're using from the standard library. Is there actually a problem?
– Adrian
Jan 18 at 21:51
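One way to see which background goroutines are included in the number reported by runtime.NumGoroutine is to dump the goroutine profile, which groups live goroutines by stack trace. A minimal sketch, where goroutineDump is a hypothetical helper name:

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// goroutineDump returns a text summary of all live goroutines,
// aggregated by identical stack traces.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	// debug=1 selects the human-readable text format.
	err := pprof.Lookup("goroutine").WriteTo(&buf, 1)
	return buf.String(), err
}

func main() {
	dump, err := goroutineDump()
	if err != nil {
		fmt.Println("dump failed:", err)
		return
	}
	fmt.Printf("%d goroutines\n%s", runtime.NumGoroutine(), dump)
}
```

Calling this from inside the scraping loop during a spike should show whether the extra goroutines belong to the scraper's own closures or to library code such as net/http.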
@Adrian The problem is really the random spikes, if it was consistently three times as high as I 'want' then that would be fine (ish)
– dircrys
Jan 18 at 21:53
But why do you want that? Unless you're hitting a memory or CPU constraint, it seems like you don't actually have a problem.
– Adrian
Jan 18 at 21:57
I assume getIndividualScrape is making some sort of http request? Which of course can start multiple goroutines of its own. The problem is not in this part of the code, but in your handling of the http client.
– JimB
Jan 18 at 21:59
The initial issue was that I was hitting a file descriptor limit because of the HTTP requests the goroutines trigger. But now I'd also like to know why the number is so variable.
– dircrys
Jan 18 at 21:59
asked Jan 18 at 21:49 by dircrys
edited Jan 22 at 2:56 by reticentroot