Leaking goroutines, typically have three times as many running as I want



























I'm trying to make a web scraper that can run a decent number (many thousands) of HTTP queries per minute. The actual querying is fine, but to speed up the process I'm trying to make it concurrent. Initially I spawned a goroutine for each request, but I ran out of file descriptors, so after some googling I decided to use a semaphore to limit the number of concurrent goroutines.



Only I can't get this to work.



I've tried moving bits of code around, but I always have the same issue: I have roughly three times as many goroutines running as I want.



This is the only method I have that spawns goroutines. I limited the goroutines to 80. In my benchmarks I run this against a slice of 10000 URLs and it tends to hover at about 242 concurrent goroutines in flight, but then it suddenly goes up to almost double this and then back down to 242.



I get the same behaviour if I change the concurrent value from 80: the count usually hovers at just over three times that value and sometimes spikes to around double that, and I have no idea why.



    func (B BrandScraper) ScrapeUrls(URLs ...string) scrapeResponse {
        concurrent := 80
        semaphoreChan := make(chan struct{}, concurrent)
        scrapeResults := make(scrapeResponse, len(URLs))
        for _, URL := range URLs {
            semaphoreChan <- struct{}{}
            go func(URL string) {
                defer func() {
                    <-semaphoreChan
                }()
                scrapeResults = append(scrapeResults,
                    B.getIndividualScrape(URL))
                fmt.Printf("#goroutines: %d\n", runtime.NumGoroutine())
            }(URL)
        }
        return scrapeResults
    }
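For comparison, here is a minimal sketch of the same semaphore pattern with two changes: a `sync.WaitGroup` so the function does not return while workers are still running, and writes by index instead of `append` from several goroutines at once. The names `scrapeAll` and `fetch` are hypothetical stand-ins (with `fetch` playing the role of `B.getIndividualScrape`), and results are plain strings for illustration only.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// scrapeAll calls fetch for every URL with at most limit calls in flight.
// fetch is a hypothetical stand-in for B.getIndividualScrape.
func scrapeAll(urls []string, limit int, fetch func(string) string) []string {
	results := make([]string, len(urls)) // one slot per URL, so no append is needed
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup

	for i, u := range urls {
		sem <- struct{}{} // blocks while limit workers are already running
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this worker ends
			results[i] = fetch(u)    // each goroutine writes only its own index
		}(i, u)
	}

	wg.Wait() // do not return until every worker has finished
	return results
}

func main() {
	var inFlight, peak int64
	// A fake fetch that records the highest number of simultaneous calls.
	fetch := func(u string) string {
		n := atomic.AddInt64(&inFlight, 1)
		for {
			p := atomic.LoadInt64(&peak)
			if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
				break
			}
		}
		time.Sleep(time.Millisecond)
		atomic.AddInt64(&inFlight, -1)
		return "body of " + u
	}

	urls := make([]string, 50)
	for i := range urls {
		urls[i] = fmt.Sprintf("http://example.invalid/%d", i)
	}
	res := scrapeAll(urls, 8, fetch)
	fmt.Println("results:", len(res), "peak within limit:", atomic.LoadInt64(&peak) <= 8)
}
```

The semaphore only bounds the goroutines started by this loop. `runtime.NumGoroutine()` counts every goroutine in the process, so it can legitimately report more than the limit.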


I'm expecting it to be constantly at 80 goroutines - or at least constant.



This happens when I run it from a benchmarking test or when I run it from the main function.



Thanks very much for any tips!



EDIT: getIndividualScrape calls another function:



    func (B BrandScraper) doGetRequest(URL string) io.Reader {
        resp, err := http.Get(URL)
        if err != nil {
            log.Fatal(err)
        }
        body, _ := ioutil.ReadAll(resp.Body)
        resp.Body.Close()
        return bytes.NewReader(body)
    }


which obviously does an HTTP request. Could this be leaking goroutines? I thought since I'd closed the resp.Body I'd have covered that but maybe not?
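As a point of comparison, here is a hedged sketch of a GET helper that uses one shared `http.Client`, returns errors to the caller instead of calling `log.Fatal`, and always closes the body. The names `sharedClient`, `fetchBody` and `demo` are hypothetical, and the timeout and connection-pool values are illustrative assumptions rather than recommendations. It uses `io.ReadAll`, the newer equivalent of `ioutil.ReadAll`.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// sharedClient is a hypothetical package-level client reused by all requests.
// The numbers below are illustrative assumptions only.
var sharedClient = &http.Client{
	Timeout: 10 * time.Second, // bound the whole request, including reading the body
	Transport: &http.Transport{
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 10,
		IdleConnTimeout:     30 * time.Second,
	},
}

// fetchBody returns the response body, or an error instead of exiting the process.
func fetchBody(url string) ([]byte, error) {
	resp, err := sharedClient.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close() // closed on every path once the request succeeded

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: unexpected status %s", url, resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// demo exercises fetchBody against a throwaway local server.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello")
	}))
	defer srv.Close()

	body, err := fetchBody(srv.URL)
	return string(body), err
}

func main() {
	body, err := demo()
	fmt.Println(body, err)
}
```

Returning the error lets the caller decide whether one failed URL should stop the whole scrape; with `log.Fatal`, a single failed request terminates the program.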



































  • There are other background goroutines spawned by the runtime and by functionality you're using from the standard library. Is there actually a problem?

    – Adrian
    Jan 18 at 21:51











  • @Adrian The problem is really the random spikes. If it was consistently three times as high as I 'want', then that would be fine (ish).

    – dircrys
    Jan 18 at 21:53













  • But why do you want that? Unless you're hitting a memory or CPU constraint, it seems like you don't actually have a problem.

    – Adrian
    Jan 18 at 21:57











  • I assume getIndividualScrape is making some sort of http request? Which of course can start multiple goroutines of its own. The problem is not in this part of the code, but in your handling of the http client.

    – JimB
    Jan 18 at 21:59











  • The initial issue was that I was hitting a file descriptor limit because of the HTTP requests the goroutines trigger. But now I'd also like to know why the number is so variable.

    – dircrys
    Jan 18 at 21:59
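The comment about the HTTP client starting goroutines of its own can be checked with a small experiment. In Go's `net/http`, each persistent HTTP/1 connection held by the transport is serviced by background goroutines (a read loop and a write loop), and these stay alive while the connection sits idle in the pool. The sketch below, with the hypothetical name `extraGoroutinesAfterRequest`, measures the goroutine count before and after one finished request. Because the test server runs in the same process, it adds goroutines of its own, so the exact number is not meaningful; only the fact that the count stays above the baseline is.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"runtime"
)

// extraGoroutinesAfterRequest reports how many more goroutines exist after one
// completed keep-alive request than existed before it.
func extraGoroutinesAfterRequest() (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	client := &http.Client{Transport: &http.Transport{}}
	defer client.CloseIdleConnections() // drop the pooled connection on the way out

	before := runtime.NumGoroutine()

	resp, err := client.Get(srv.URL)
	if err != nil {
		return 0, err
	}
	// Read the body to the end so the connection can go back to the idle pool.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		resp.Body.Close()
		return 0, err
	}
	resp.Body.Close()

	// The request is finished, but the connection is kept alive for reuse,
	// so the goroutines that service it are still running.
	return runtime.NumGoroutine() - before, nil
}

func main() {
	extra, err := extraGoroutinesAfterRequest()
	fmt.Println("extra goroutines after one request:", extra, "error:", err)
}
```

This suggests why a count taken with `runtime.NumGoroutine()` can sit well above the semaphore limit and fluctuate as connections are opened, reused and closed, even when the worker goroutines themselves are correctly bounded.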
















go concurrency goroutine














asked Jan 18 at 21:49 by dircrys

edited Jan 22 at 2:56 by reticentroot












