How can you find all unbalanced parens in a string in linear time with constant memory?












9












$begingroup$


I was given the following problem during an interview:




Gives a string which contains some mixture of parens (not brackets or braces-- only parens) with other alphanumeric characters, identify all parens that have no matching paren.



For example, in the string ")(ab))", indices 0 and 5 contain parens that have no matching paren.




I put forward a working O(n) solution using O(n) memory, using a stack and going through the string once adding parens to the stack and removing them from the stack whenever I encountered a closing paren and the top of the stack contained an opening paren.



Afterwards, the interviewer noted that the problem could be solved in linear time with constant memory (as in, no additional memory usage besides what's taken up by the input.)



I asked how and she said something about going through the string once from the left identifying all open parens, and then a second time from the right identifying all close parens....or maybe it was the other way around. I didn't really understand and didn't want to ask her to hand-hold me through it.



Can anyone clarify the solution she suggested?










share|cite|improve this question









$endgroup$








  • 1




    $begingroup$
    We may need some clarification from you first. Is the first parens or the second parens in "(()" considered unbalanced? Is the last parens or the second to last parens in "())" considered unbalanced? Or is it enough to identify any set of parens with least cardinality such that removing them will leave the remaining parens balanced? Or something else? Or is this part of the interview so that an answer can just put forward any justifiable specification?
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    I would say it doesn't matter, up to you. Remove any set that leaves the remainder balanced.
    $endgroup$
    – temporary_user_name
    yesterday






  • 5




    $begingroup$
    Then remove them all ;P
    $endgroup$
    – Veedrac
    yesterday










  • $begingroup$
    @Veedrac, of course (as you know) the poster forgot the word 'minimal' in "Remove any minimal set …."
    $endgroup$
    – LSpice
    yesterday










  • $begingroup$
    I didn't "forget it," per se, but rather left it out because it didn't seem like an important specification to me since there's only one set that can be removed to make it balanced, besides "all of them" which is of course defeating the purpose of the exercise.
    $endgroup$
    – temporary_user_name
    yesterday


















9












$begingroup$


I was given the following problem during an interview:




Gives a string which contains some mixture of parens (not brackets or braces-- only parens) with other alphanumeric characters, identify all parens that have no matching paren.



For example, in the string ")(ab))", indices 0 and 5 contain parens that have no matching paren.




I put forward a working O(n) solution using O(n) memory, using a stack and going through the string once adding parens to the stack and removing them from the stack whenever I encountered a closing paren and the top of the stack contained an opening paren.



Afterwards, the interviewer noted that the problem could be solved in linear time with constant memory (as in, no additional memory usage besides what's taken up by the input.)



I asked how and she said something about going through the string once from the left identifying all open parens, and then a second time from the right identifying all close parens....or maybe it was the other way around. I didn't really understand and didn't want to ask her to hand-hold me through it.



Can anyone clarify the solution she suggested?










share|cite|improve this question









$endgroup$








  • 1




    $begingroup$
    We may need some clarification from you first. Is the first parens or the second parens in "(()" considered unbalanced? Is the last parens or the second to last parens in "())" considered unbalanced? Or is it enough to identify any set of parens with least cardinality such that removing them will leave the remaining parens balanced? Or something else? Or is this part of the interview so that an answer can just put forward any justifiable specification?
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    I would say it doesn't matter, up to you. Remove any set that leaves the remainder balanced.
    $endgroup$
    – temporary_user_name
    yesterday






  • 5




    $begingroup$
    Then remove them all ;P
    $endgroup$
    – Veedrac
    yesterday










  • $begingroup$
    @Veedrac, of course (as you know) the poster forgot the word 'minimal' in "Remove any minimal set …."
    $endgroup$
    – LSpice
    yesterday










  • $begingroup$
    I didn't "forget it," per se, but rather left it out because it didn't seem like an important specification to me since there's only one set that can be removed to make it balanced, besides "all of them" which is of course defeating the purpose of the exercise.
    $endgroup$
    – temporary_user_name
    yesterday
















9












9








9


1



$begingroup$


I was given the following problem during an interview:




Gives a string which contains some mixture of parens (not brackets or braces-- only parens) with other alphanumeric characters, identify all parens that have no matching paren.



For example, in the string ")(ab))", indices 0 and 5 contain parens that have no matching paren.




I put forward a working O(n) solution using O(n) memory, using a stack and going through the string once adding parens to the stack and removing them from the stack whenever I encountered a closing paren and the top of the stack contained an opening paren.



Afterwards, the interviewer noted that the problem could be solved in linear time with constant memory (as in, no additional memory usage besides what's taken up by the input.)



I asked how and she said something about going through the string once from the left identifying all open parens, and then a second time from the right identifying all close parens....or maybe it was the other way around. I didn't really understand and didn't want to ask her to hand-hold me through it.



Can anyone clarify the solution she suggested?










share|cite|improve this question









$endgroup$




I was given the following problem during an interview:




Gives a string which contains some mixture of parens (not brackets or braces-- only parens) with other alphanumeric characters, identify all parens that have no matching paren.



For example, in the string ")(ab))", indices 0 and 5 contain parens that have no matching paren.




I put forward a working O(n) solution using O(n) memory, using a stack and going through the string once adding parens to the stack and removing them from the stack whenever I encountered a closing paren and the top of the stack contained an opening paren.



Afterwards, the interviewer noted that the problem could be solved in linear time with constant memory (as in, no additional memory usage besides what's taken up by the input.)



I asked how and she said something about going through the string once from the left identifying all open parens, and then a second time from the right identifying all close parens....or maybe it was the other way around. I didn't really understand and didn't want to ask her to hand-hold me through it.



Can anyone clarify the solution she suggested?







algorithms






share|cite|improve this question













share|cite|improve this question











share|cite|improve this question




share|cite|improve this question










asked yesterday









temporary_user_nametemporary_user_name

1898




1898








  • 1




    $begingroup$
    We may need some clarification from you first. Is the first parens or the second parens in "(()" considered unbalanced? Is the last parens or the second to last parens in "())" considered unbalanced? Or is it enough to identify any set of parens with least cardinality such that removing them will leave the remaining parens balanced? Or something else? Or is this part of the interview so that an answer can just put forward any justifiable specification?
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    I would say it doesn't matter, up to you. Remove any set that leaves the remainder balanced.
    $endgroup$
    – temporary_user_name
    yesterday






  • 5




    $begingroup$
    Then remove them all ;P
    $endgroup$
    – Veedrac
    yesterday










  • $begingroup$
    @Veedrac, of course (as you know) the poster forgot the word 'minimal' in "Remove any minimal set …."
    $endgroup$
    – LSpice
    yesterday










  • $begingroup$
    I didn't "forget it," per se, but rather left it out because it didn't seem like an important specification to me since there's only one set that can be removed to make it balanced, besides "all of them" which is of course defeating the purpose of the exercise.
    $endgroup$
    – temporary_user_name
    yesterday
















  • 1




    $begingroup$
    We may need some clarification from you first. Is the first parens or the second parens in "(()" considered unbalanced? Is the last parens or the second to last parens in "())" considered unbalanced? Or is it enough to identify any set of parens with least cardinality such that removing them will leave the remaining parens balanced? Or something else? Or is this part of the interview so that an answer can just put forward any justifiable specification?
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    I would say it doesn't matter, up to you. Remove any set that leaves the remainder balanced.
    $endgroup$
    – temporary_user_name
    yesterday






  • 5




    $begingroup$
    Then remove them all ;P
    $endgroup$
    – Veedrac
    yesterday










  • $begingroup$
    @Veedrac, of course (as you know) the poster forgot the word 'minimal' in "Remove any minimal set …."
    $endgroup$
    – LSpice
    yesterday










  • $begingroup$
    I didn't "forget it," per se, but rather left it out because it didn't seem like an important specification to me since there's only one set that can be removed to make it balanced, besides "all of them" which is of course defeating the purpose of the exercise.
    $endgroup$
    – temporary_user_name
    yesterday










1




1




$begingroup$
We may need some clarification from you first. Is the first parens or the second parens in "(()" considered unbalanced? Is the last parens or the second to last parens in "())" considered unbalanced? Or is it enough to identify any set of parens with least cardinality such that removing them will leave the remaining parens balanced? Or something else? Or is this part of the interview so that an answer can just put forward any justifiable specification?
$endgroup$
– Apass.Jack
yesterday






$begingroup$
We may need some clarification from you first. Is the first parens or the second parens in "(()" considered unbalanced? Is the last parens or the second to last parens in "())" considered unbalanced? Or is it enough to identify any set of parens with least cardinality such that removing them will leave the remaining parens balanced? Or something else? Or is this part of the interview so that an answer can just put forward any justifiable specification?
$endgroup$
– Apass.Jack
yesterday














$begingroup$
I would say it doesn't matter, up to you. Remove any set that leaves the remainder balanced.
$endgroup$
– temporary_user_name
yesterday




$begingroup$
I would say it doesn't matter, up to you. Remove any set that leaves the remainder balanced.
$endgroup$
– temporary_user_name
yesterday




5




5




$begingroup$
Then remove them all ;P
$endgroup$
– Veedrac
yesterday




$begingroup$
Then remove them all ;P
$endgroup$
– Veedrac
yesterday












$begingroup$
@Veedrac, of course (as you know) the poster forgot the word 'minimal' in "Remove any minimal set …."
$endgroup$
– LSpice
yesterday




$begingroup$
@Veedrac, of course (as you know) the poster forgot the word 'minimal' in "Remove any minimal set …."
$endgroup$
– LSpice
yesterday












$begingroup$
I didn't "forget it," per se, but rather left it out because it didn't seem like an important specification to me since there's only one set that can be removed to make it balanced, besides "all of them" which is of course defeating the purpose of the exercise.
$endgroup$
– temporary_user_name
yesterday






$begingroup$
I didn't "forget it," per se, but rather left it out because it didn't seem like an important specification to me since there's only one set that can be removed to make it balanced, besides "all of them" which is of course defeating the purpose of the exercise.
$endgroup$
– temporary_user_name
yesterday












2 Answers
2






active

oldest

votes


















14












$begingroup$

Since this comes from a programming background and not a theoretical computer science exercise, I assume that it takes $O(1)$ memory to store an index into the string. In theoretical computer science, this would mean using the RAM model; with Turing machines you couldn't do this and you'd need $Theta(log(n))$ memory to store an index into a string of length $n$.



You can keep the basic principle of the algorithm that you used. You missed an opportunity for a memory optimization.




using a stack and going through the string once adding parens to the stack and removing them from the stack whenever I encountered a closing paren and the top of the stack contained an opening paren




So what does this stack contain? It's never going to contain () (an opening parenthesis followed by a closing parenthesis), since whenever the ) appear you pop the ( instead of pushing the ). So the stack is always of the form )…)(…( — a bunch of closing parentheses followed by a bunch of opening parentheses.



You don't need a stack to represent this. Just remember the number of closing parentheses and the number of opening parentheses.



If you process the string from left to right, using these two counters, what you have at the end is the number of mismatched closing parentheses and the number of mismatched opening parentheses.



If you want to report the positions of the mismatched parentheses at the end, you'll need to remember the position of each parenthesis. That would require $Theta(n)$ memory in the worst case. But you don't need to wait until the end to produce output. As soon as you find a mismatched closing parenthesis, you know that it's mismatched, so output it now. And then you aren't going to use the number of mismatched closing parentheses for anything, so just keep a counter of unmatched opening parentheses.



In summary: process the string from left to right. Maintain a counter of unmatched opening parentheses. If you see an opening parenthesis, increment the counter. If you see a closing parenthesis and the counter is nonzero, decrement the counter. If you see a closing parenthesis and the counter is zero, output the current index as a mismatched closing parenthesis.



The final value of the counter is the number of mismatched opening parentheses, but this doesn't give you their position. Notice that the problem is symmetric. To list the positions of mismatched opening parentheses, just run the algorithm in the opposite direction.



Exercise 1: write this down in a formal notation (math, pseudocode or your favorite programming language).



Exercise 2: convince yourself that this is the same algorithm as Apass.Jack, just explained differently.






share|cite|improve this answer









$endgroup$













  • $begingroup$
    Oh very good Gilles, very well explained. I understand perfectly now. It has been quite a few years since I got an answer from you on one of my questions.
    $endgroup$
    – temporary_user_name
    yesterday










  • $begingroup$
    "If you want to report the positions of the mismatched parentheses at the end, you'll need to remember the position of each parenthesis." Not quite. Linear time doesn't mean single pass. You can do a second pass to find any brackets in the mismatched side and mark them.
    $endgroup$
    – Mooing Duck
    yesterday












  • $begingroup$
    For the last step, you don't have to run it in reverse, you can simply mark the last N "(" as mismatches.
    $endgroup$
    – Mooing Duck
    yesterday






  • 1




    $begingroup$
    @MooingDuck That doesn't work. E.g. (().
    $endgroup$
    – orlp
    yesterday



















7












$begingroup$

Since we can just ignore all alphanumeric characters, we will assume the string contains only parentheses from now on. As in the question, there is only one kind of parenthesis, "()".



If we keep removing balanced parentheses until no more balanced parentheses can be removed, all remaining parentheses must look like like "))…)((…(", which are all unbalanced parentheses. This observation suggests that we should find first that turning point, before which we have unbalanced closing parentheses only and after which we have unbalanced opening parentheses only.



Here is the algorithm. In a nutshell, it computes the turning point first. Then it outputs extra closing parenthesis, scanning the string from the start to the right until the turning point. Symmetrically, it outputs extra opening parenthesis, scanning from the end to the left until the turning point.





Let str be the string as an array of characters, whose size is $n$.



Initialize turning_point=0, maximum_count=0, count=0. For each i from 0 to n-1 do the following.




  1. If str[i] = ')', add 1 to count; otherwise, subtract 1.

  2. If count > maximum_count, set turning_point=i and maximum_count=count.


Now turning_point is the index of the turning point.



Reset maximum_count=0, count=0. For each i from 0 to turning_point do the following.




  1. If str[i] = ')', add 1 to count; otherwise, subtract 1.

  2. If count > maximum_count, set maximum_count = count. Output i as the index of an unbalanced closing parenthesis.


Reset maximum_count=0, count=0. For each i from n-1 to turning_point+1 downwards do the following.




  1. If str[j] = '(', add 1 to count; otherwise, subtract 1.

  2. If count > maximum_count, set maximum_count = count. Output i as the index of an unbalanced opening parenthesis.




It is clear that the algorithm runs in $O(n)$ time and $O(1)$ auxiliary memory and $O(u)$ output memory, where $u$ is the number of unbalanced parentheses.





If we analyzed the algorithm above, we will see that, in fact, we do not need to find and use the turning point at all. The nice observation that all unbalanced closing parentheses happens before all unbalanced opening parentheses can be ignored although interesting.



Here is code in Python.



Just hit "run" to see several test results.





Exercise 1. Show that the above algorithm will output a set of parentheses with the least cardinality such that the remaining parentheses are balanced.



Problem 1. Can we generalize the algorithm to the case when the string contains two kinds of parentheses such as "()"? We have to determine how to recognize and treat the new situation, the interleaving case, "([)]".






share|cite|improve this answer











$endgroup$













  • $begingroup$
    Lol, exercise 1 and problem 1, cute. The logic of the algorithm you've described is surprisingly hard to visualize. I would have to code this out tomorrow to get it.
    $endgroup$
    – temporary_user_name
    yesterday










  • $begingroup$
    It looks like I missed the rather obvious but most important explanation. The logic is, in fact, very simple. First, we output each extra opening parenthesis. Once we have passed the turning point, we output each extra closing parenthesis. Done.
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    Finding unbalanced opening parentheses is incorrect. I.e. if your arr is "())", p is 2 and p+1 falls outside of the arr boundary. Just an idea - to find unbalanced opening parentheses you could reverse arr and use part of algorithm to find unbalanced closing parentheses (of course, with reversely adapted indexes).
    $endgroup$
    – OzrenTkalcecKrznaric
    yesterday










  • $begingroup$
    @OzrenTkalcecKrznaric Exactly because $p+1$ falls outside of the boundary, there is no unbalanced opening parentheses in "())".
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    Took me a bit to understand this, but I like it, it's pretty clever.. and works at least for every case I have thought
    $endgroup$
    – dquijada
    yesterday











Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "419"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcs.stackexchange.com%2fquestions%2f103018%2fhow-can-you-find-all-unbalanced-parens-in-a-string-in-linear-time-with-constant%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









14












$begingroup$

Since this comes from a programming background and not a theoretical computer science exercise, I assume that it takes $O(1)$ memory to store an index into the string. In theoretical computer science, this would mean using the RAM model; with Turing machines you couldn't do this and you'd need $Theta(log(n))$ memory to store an index into a string of length $n$.



You can keep the basic principle of the algorithm that you used. You missed an opportunity for a memory optimization.




using a stack and going through the string once adding parens to the stack and removing them from the stack whenever I encountered a closing paren and the top of the stack contained an opening paren




So what does this stack contain? It's never going to contain () (an opening parenthesis followed by a closing parenthesis), since whenever the ) appear you pop the ( instead of pushing the ). So the stack is always of the form )…)(…( — a bunch of closing parentheses followed by a bunch of opening parentheses.



You don't need a stack to represent this. Just remember the number of closing parentheses and the number of opening parentheses.



If you process the string from left to right, using these two counters, what you have at the end is the number of mismatched closing parentheses and the number of mismatched opening parentheses.



If you want to report the positions of the mismatched parentheses at the end, you'll need to remember the position of each parenthesis. That would require $Theta(n)$ memory in the worst case. But you don't need to wait until the end to produce output. As soon as you find a mismatched closing parenthesis, you know that it's mismatched, so output it now. And then you aren't going to use the number of mismatched closing parentheses for anything, so just keep a counter of unmatched opening parentheses.



In summary: process the string from left to right. Maintain a counter of unmatched opening parentheses. If you see an opening parenthesis, increment the counter. If you see a closing parenthesis and the counter is nonzero, decrement the counter. If you see a closing parenthesis and the counter is zero, output the current index as a mismatched closing parenthesis.



The final value of the counter is the number of mismatched opening parentheses, but this doesn't give you their position. Notice that the problem is symmetric. To list the positions of mismatched opening parentheses, just run the algorithm in the opposite direction.



Exercise 1: write this down in a formal notation (math, pseudocode or your favorite programming language).



Exercise 2: convince yourself that this is the same algorithm as Apass.Jack, just explained differently.






share|cite|improve this answer









$endgroup$













  • $begingroup$
    Oh very good Gilles, very well explained. I understand perfectly now. It has been quite a few years since I got an answer from you on one of my questions.
    $endgroup$
    – temporary_user_name
    yesterday










  • $begingroup$
    "If you want to report the positions of the mismatched parentheses at the end, you'll need to remember the position of each parenthesis." Not quite. Linear time doesn't mean single pass. You can do a second pass to find any brackets in the mismatched side and mark them.
    $endgroup$
    – Mooing Duck
    yesterday












  • $begingroup$
    For the last step, you don't have to run it in reverse, you can simply mark the last N "(" as mismatches.
    $endgroup$
    – Mooing Duck
    yesterday






  • 1




    $begingroup$
    @MooingDuck That doesn't work. E.g. (().
    $endgroup$
    – orlp
    yesterday
















14












$begingroup$

Since this comes from a programming background and not a theoretical computer science exercise, I assume that it takes $O(1)$ memory to store an index into the string. In theoretical computer science, this would mean using the RAM model; with Turing machines you couldn't do this and you'd need $Theta(log(n))$ memory to store an index into a string of length $n$.



You can keep the basic principle of the algorithm that you used. You missed an opportunity for a memory optimization.




using a stack and going through the string once adding parens to the stack and removing them from the stack whenever I encountered a closing paren and the top of the stack contained an opening paren




So what does this stack contain? It's never going to contain () (an opening parenthesis followed by a closing parenthesis), since whenever the ) appear you pop the ( instead of pushing the ). So the stack is always of the form )…)(…( — a bunch of closing parentheses followed by a bunch of opening parentheses.



You don't need a stack to represent this. Just remember the number of closing parentheses and the number of opening parentheses.



If you process the string from left to right, using these two counters, what you have at the end is the number of mismatched closing parentheses and the number of mismatched opening parentheses.



If you want to report the positions of the mismatched parentheses at the end, you'll need to remember the position of each parenthesis. That would require $Theta(n)$ memory in the worst case. But you don't need to wait until the end to produce output. As soon as you find a mismatched closing parenthesis, you know that it's mismatched, so output it now. And then you aren't going to use the number of mismatched closing parentheses for anything, so just keep a counter of unmatched opening parentheses.



In summary: process the string from left to right. Maintain a counter of unmatched opening parentheses. If you see an opening parenthesis, increment the counter. If you see a closing parenthesis and the counter is nonzero, decrement the counter. If you see a closing parenthesis and the counter is zero, output the current index as a mismatched closing parenthesis.



The final value of the counter is the number of mismatched opening parentheses, but this doesn't give you their position. Notice that the problem is symmetric. To list the positions of mismatched opening parentheses, just run the algorithm in the opposite direction.



Exercise 1: write this down in a formal notation (math, pseudocode or your favorite programming language).



Exercise 2: convince yourself that this is the same algorithm as Apass.Jack, just explained differently.






share|cite|improve this answer









$endgroup$













  • $begingroup$
    Oh very good Gilles, very well explained. I understand perfectly now. It has been quite a few years since I got an answer from you on one of my questions.
    $endgroup$
    – temporary_user_name
    yesterday










  • $begingroup$
    "If you want to report the positions of the mismatched parentheses at the end, you'll need to remember the position of each parenthesis." Not quite. Linear time doesn't mean single pass. You can do a second pass to find any brackets in the mismatched side and mark them.
    $endgroup$
    – Mooing Duck
    yesterday












  • $begingroup$
    For the last step, you don't have to run it in reverse, you can simply mark the last N "(" as mismatches.
    $endgroup$
    – Mooing Duck
    yesterday






  • 1




    $begingroup$
    @MooingDuck That doesn't work. E.g. (().
    $endgroup$
    – orlp
    yesterday














14












14








14





$begingroup$

Since this comes from a programming background and not a theoretical computer science exercise, I assume that it takes $O(1)$ memory to store an index into the string. In theoretical computer science, this would mean using the RAM model; with Turing machines you couldn't do this and you'd need $Theta(log(n))$ memory to store an index into a string of length $n$.



You can keep the basic principle of the algorithm that you used. You missed an opportunity for a memory optimization.




using a stack and going through the string once adding parens to the stack and removing them from the stack whenever I encountered a closing paren and the top of the stack contained an opening paren




So what does this stack contain? It's never going to contain () (an opening parenthesis followed by a closing parenthesis), since whenever the ) appear you pop the ( instead of pushing the ). So the stack is always of the form )…)(…( — a bunch of closing parentheses followed by a bunch of opening parentheses.



You don't need a stack to represent this. Just remember the number of closing parentheses and the number of opening parentheses.



If you process the string from left to right, using these two counters, what you have at the end is the number of mismatched closing parentheses and the number of mismatched opening parentheses.



If you want to report the positions of the mismatched parentheses at the end, you'll need to remember the position of each parenthesis. That would require $Theta(n)$ memory in the worst case. But you don't need to wait until the end to produce output. As soon as you find a mismatched closing parenthesis, you know that it's mismatched, so output it now. And then you aren't going to use the number of mismatched closing parentheses for anything, so just keep a counter of unmatched opening parentheses.



In summary: process the string from left to right. Maintain a counter of unmatched opening parentheses. If you see an opening parenthesis, increment the counter. If you see a closing parenthesis and the counter is nonzero, decrement the counter. If you see a closing parenthesis and the counter is zero, output the current index as a mismatched closing parenthesis.



The final value of the counter is the number of mismatched opening parentheses, but this doesn't give you their position. Notice that the problem is symmetric. To list the positions of mismatched opening parentheses, just run the algorithm in the opposite direction.



Exercise 1: write this down in a formal notation (math, pseudocode or your favorite programming language).



Exercise 2: convince yourself that this is the same algorithm as Apass.Jack, just explained differently.






share|cite|improve this answer









$endgroup$



Since this comes from a programming background and not a theoretical computer science exercise, I assume that it takes $O(1)$ memory to store an index into the string. In theoretical computer science, this would mean using the RAM model; with Turing machines you couldn't do this and you'd need $Theta(log(n))$ memory to store an index into a string of length $n$.



You can keep the basic principle of the algorithm that you used. You missed an opportunity for a memory optimization.




using a stack and going through the string once adding parens to the stack and removing them from the stack whenever I encountered a closing paren and the top of the stack contained an opening paren




So what does this stack contain? It's never going to contain () (an opening parenthesis followed by a closing parenthesis), since whenever the ) appear you pop the ( instead of pushing the ). So the stack is always of the form )…)(…( — a bunch of closing parentheses followed by a bunch of opening parentheses.



You don't need a stack to represent this. Just remember the number of closing parentheses and the number of opening parentheses.



If you process the string from left to right, using these two counters, what you have at the end is the number of mismatched closing parentheses and the number of mismatched opening parentheses.



If you want to report the positions of the mismatched parentheses at the end, you'll need to remember the position of each parenthesis. That would require $Theta(n)$ memory in the worst case. But you don't need to wait until the end to produce output. As soon as you find a mismatched closing parenthesis, you know that it's mismatched, so output it now. And then you aren't going to use the number of mismatched closing parentheses for anything, so just keep a counter of unmatched opening parentheses.



In summary: process the string from left to right. Maintain a counter of unmatched opening parentheses. If you see an opening parenthesis, increment the counter. If you see a closing parenthesis and the counter is nonzero, decrement the counter. If you see a closing parenthesis and the counter is zero, output the current index as a mismatched closing parenthesis.



The final value of the counter is the number of mismatched opening parentheses, but this doesn't give you their position. Notice that the problem is symmetric. To list the positions of mismatched opening parentheses, just run the algorithm in the opposite direction.



Exercise 1: write this down in a formal notation (math, pseudocode or your favorite programming language).



Exercise 2: convince yourself that this is the same algorithm as Apass.Jack, just explained differently.







share|cite|improve this answer












share|cite|improve this answer



share|cite|improve this answer










answered yesterday









GillesGilles

33k792163




33k792163












  • $begingroup$
    Oh very good Gilles, very well explained. I understand perfectly now. It has been quite a few years since I got an answer from you on one of my questions.
    $endgroup$
    – temporary_user_name
    yesterday










  • $begingroup$
    "If you want to report the positions of the mismatched parentheses at the end, you'll need to remember the position of each parenthesis." Not quite. Linear time doesn't mean single pass. You can do a second pass to find any brackets in the mismatched side and mark them.
    $endgroup$
    – Mooing Duck
    yesterday












  • $begingroup$
    For the last step, you don't have to run it in reverse, you can simply mark the last N "(" as mismatches.
    $endgroup$
    – Mooing Duck
    yesterday






  • 1




    $begingroup$
    @MooingDuck That doesn't work. E.g. (().
    $endgroup$
    – orlp
    yesterday


















  • $begingroup$
    Oh very good Gilles, very well explained. I understand perfectly now. It has been quite a few years since I got an answer from you on one of my questions.
    $endgroup$
    – temporary_user_name
    yesterday










  • $begingroup$
    "If you want to report the positions of the mismatched parentheses at the end, you'll need to remember the position of each parenthesis." Not quite. Linear time doesn't mean single pass. You can do a second pass to find any brackets in the mismatched side and mark them.
    $endgroup$
    – Mooing Duck
    yesterday












  • $begingroup$
    For the last step, you don't have to run it in reverse, you can simply mark the last N "(" as mismatches.
    $endgroup$
    – Mooing Duck
    yesterday






  • 1




    $begingroup$
    @MooingDuck That doesn't work. E.g. (().
    $endgroup$
    – orlp
    yesterday
















$begingroup$
Oh very good Gilles, very well explained. I understand perfectly now. It has been quite a few years since I got an answer from you on one of my questions.
$endgroup$
– temporary_user_name
yesterday




$begingroup$
Oh very good Gilles, very well explained. I understand perfectly now. It has been quite a few years since I got an answer from you on one of my questions.
$endgroup$
– temporary_user_name
yesterday












$begingroup$
"If you want to report the positions of the mismatched parentheses at the end, you'll need to remember the position of each parenthesis." Not quite. Linear time doesn't mean single pass. You can do a second pass to find any brackets in the mismatched side and mark them.
$endgroup$
– Mooing Duck
yesterday






$begingroup$
"If you want to report the positions of the mismatched parentheses at the end, you'll need to remember the position of each parenthesis." Not quite. Linear time doesn't mean single pass. You can do a second pass to find any brackets in the mismatched side and mark them.
$endgroup$
– Mooing Duck
yesterday














$begingroup$
For the last step, you don't have to run it in reverse, you can simply mark the last N "(" as mismatches.
$endgroup$
– Mooing Duck
yesterday




$begingroup$
For the last step, you don't have to run it in reverse, you can simply mark the last N "(" as mismatches.
$endgroup$
– Mooing Duck
yesterday




1




1




$begingroup$
@MooingDuck That doesn't work. E.g. (().
$endgroup$
– orlp
yesterday




$begingroup$
@MooingDuck That doesn't work. E.g. (().
$endgroup$
– orlp
yesterday











7












$begingroup$

Since we can just ignore all alphanumeric characters, we will assume the string contains only parentheses from now on. As in the question, there is only one kind of parenthesis, "()".



If we keep removing balanced parentheses until no more balanced parentheses can be removed, all remaining parentheses must look like like "))…)((…(", which are all unbalanced parentheses. This observation suggests that we should find first that turning point, before which we have unbalanced closing parentheses only and after which we have unbalanced opening parentheses only.



Here is the algorithm. In a nutshell, it computes the turning point first. Then it outputs extra closing parenthesis, scanning the string from the start to the right until the turning point. Symmetrically, it outputs extra opening parenthesis, scanning from the end to the left until the turning point.





Let str be the string as an array of characters, whose size is $n$.



Initialize turning_point=0, maximum_count=0, count=0. For each i from 0 to n-1 do the following.




  1. If str[i] = ')', add 1 to count; otherwise, subtract 1.

  2. If count > maximum_count, set turning_point=i and maximum_count=count.


Now turning_point is the index of the turning point.



Reset maximum_count=0, count=0. For each i from 0 to turning_point do the following.




  1. If str[i] = ')', add 1 to count; otherwise, subtract 1.

  2. If count > maximum_count, set maximum_count = count. Output i as the index of an unbalanced closing parenthesis.


Reset maximum_count=0, count=0. For each i from n-1 to turning_point+1 downwards do the following.




  1. If str[j] = '(', add 1 to count; otherwise, subtract 1.

  2. If count > maximum_count, set maximum_count = count. Output i as the index of an unbalanced opening parenthesis.




It is clear that the algorithm runs in $O(n)$ time and $O(1)$ auxiliary memory and $O(u)$ output memory, where $u$ is the number of unbalanced parentheses.





If we analyzed the algorithm above, we will see that, in fact, we do not need to find and use the turning point at all. The nice observation that all unbalanced closing parentheses happens before all unbalanced opening parentheses can be ignored although interesting.



Here is code in Python.



Just hit "run" to see several test results.





Exercise 1. Show that the above algorithm will output a set of parentheses with the least cardinality such that the remaining parentheses are balanced.



Problem 1. Can we generalize the algorithm to the case when the string contains two kinds of parentheses such as "()"? We have to determine how to recognize and treat the new situation, the interleaving case, "([)]".






share|cite|improve this answer











$endgroup$













  • $begingroup$
    Lol, exercise 1 and problem 1, cute. The logic of the algorithm you've described is surprisingly hard to visualize. I would have to code this out tomorrow to get it.
    $endgroup$
    – temporary_user_name
    yesterday










  • $begingroup$
    It looks like I missed the rather obvious but most important explanation. The logic is, in fact, very simple. First, we output each extra opening parenthesis. Once we have passed the turning point, we output each extra closing parenthesis. Done.
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    Finding unbalanced opening parentheses is incorrect. I.e. if your arr is "())", p is 2 and p+1 falls outside of the arr boundary. Just an idea - to find unbalanced opening parentheses you could reverse arr and use part of algorithm to find unbalanced closing parentheses (of course, with reversely adapted indexes).
    $endgroup$
    – OzrenTkalcecKrznaric
    yesterday










  • $begingroup$
    @OzrenTkalcecKrznaric Exactly because $p+1$ falls outside of the boundary, there is no unbalanced opening parentheses in "())".
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    Took me a bit to understand this, but I like it, it's pretty clever.. and works at least for every case I have thought
    $endgroup$
    – dquijada
    yesterday
















7












$begingroup$

Since we can just ignore all alphanumeric characters, we will assume the string contains only parentheses from now on. As in the question, there is only one kind of parenthesis, "()".



If we keep removing balanced parentheses until no more balanced parentheses can be removed, all remaining parentheses must look like like "))…)((…(", which are all unbalanced parentheses. This observation suggests that we should find first that turning point, before which we have unbalanced closing parentheses only and after which we have unbalanced opening parentheses only.



Here is the algorithm. In a nutshell, it computes the turning point first. Then it outputs extra closing parenthesis, scanning the string from the start to the right until the turning point. Symmetrically, it outputs extra opening parenthesis, scanning from the end to the left until the turning point.





Let str be the string as an array of characters, whose size is $n$.



Initialize turning_point=0, maximum_count=0, count=0. For each i from 0 to n-1 do the following.




  1. If str[i] = ')', add 1 to count; otherwise, subtract 1.

  2. If count > maximum_count, set turning_point=i and maximum_count=count.


Now turning_point is the index of the turning point.



Reset maximum_count=0, count=0. For each i from 0 to turning_point do the following.




  1. If str[i] = ')', add 1 to count; otherwise, subtract 1.

  2. If count > maximum_count, set maximum_count = count. Output i as the index of an unbalanced closing parenthesis.


Reset maximum_count=0, count=0. For each i from n-1 to turning_point+1 downwards do the following.




  1. If str[j] = '(', add 1 to count; otherwise, subtract 1.

  2. If count > maximum_count, set maximum_count = count. Output i as the index of an unbalanced opening parenthesis.




It is clear that the algorithm runs in $O(n)$ time and $O(1)$ auxiliary memory and $O(u)$ output memory, where $u$ is the number of unbalanced parentheses.





If we analyzed the algorithm above, we will see that, in fact, we do not need to find and use the turning point at all. The nice observation that all unbalanced closing parentheses happens before all unbalanced opening parentheses can be ignored although interesting.



Here is code in Python.



Just hit "run" to see several test results.





Exercise 1. Show that the above algorithm will output a set of parentheses with the least cardinality such that the remaining parentheses are balanced.



Problem 1. Can we generalize the algorithm to the case when the string contains two kinds of parentheses such as "()"? We have to determine how to recognize and treat the new situation, the interleaving case, "([)]".






share|cite|improve this answer











$endgroup$













  • $begingroup$
    Lol, exercise 1 and problem 1, cute. The logic of the algorithm you've described is surprisingly hard to visualize. I would have to code this out tomorrow to get it.
    $endgroup$
    – temporary_user_name
    yesterday










  • $begingroup$
    It looks like I missed the rather obvious but most important explanation. The logic is, in fact, very simple. First, we output each extra opening parenthesis. Once we have passed the turning point, we output each extra closing parenthesis. Done.
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    Finding unbalanced opening parentheses is incorrect. I.e. if your arr is "())", p is 2 and p+1 falls outside of the arr boundary. Just an idea - to find unbalanced opening parentheses you could reverse arr and use part of algorithm to find unbalanced closing parentheses (of course, with reversely adapted indexes).
    $endgroup$
    – OzrenTkalcecKrznaric
    yesterday










  • $begingroup$
    @OzrenTkalcecKrznaric Exactly because $p+1$ falls outside of the boundary, there is no unbalanced opening parentheses in "())".
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    Took me a bit to understand this, but I like it, it's pretty clever.. and works at least for every case I have thought
    $endgroup$
    – dquijada
    yesterday














7












7








7





$begingroup$

Since we can just ignore all alphanumeric characters, we will assume the string contains only parentheses from now on. As in the question, there is only one kind of parenthesis, "()".



If we keep removing balanced parentheses until no more balanced parentheses can be removed, all remaining parentheses must look like like "))…)((…(", which are all unbalanced parentheses. This observation suggests that we should find first that turning point, before which we have unbalanced closing parentheses only and after which we have unbalanced opening parentheses only.



Here is the algorithm. In a nutshell, it computes the turning point first. Then it outputs extra closing parenthesis, scanning the string from the start to the right until the turning point. Symmetrically, it outputs extra opening parenthesis, scanning from the end to the left until the turning point.





Let str be the string as an array of characters, whose size is $n$.



Initialize turning_point=0, maximum_count=0, count=0. For each i from 0 to n-1 do the following.




  1. If str[i] = ')', add 1 to count; otherwise, subtract 1.

  2. If count > maximum_count, set turning_point=i and maximum_count=count.


Now turning_point is the index of the turning point.



Reset maximum_count=0, count=0. For each i from 0 to turning_point do the following.




  1. If str[i] = ')', add 1 to count; otherwise, subtract 1.

  2. If count > maximum_count, set maximum_count = count. Output i as the index of an unbalanced closing parenthesis.


Reset maximum_count=0, count=0. For each i from n-1 to turning_point+1 downwards do the following.




  1. If str[j] = '(', add 1 to count; otherwise, subtract 1.

  2. If count > maximum_count, set maximum_count = count. Output i as the index of an unbalanced opening parenthesis.




It is clear that the algorithm runs in $O(n)$ time and $O(1)$ auxiliary memory and $O(u)$ output memory, where $u$ is the number of unbalanced parentheses.





If we analyzed the algorithm above, we will see that, in fact, we do not need to find and use the turning point at all. The nice observation that all unbalanced closing parentheses happens before all unbalanced opening parentheses can be ignored although interesting.



Here is code in Python.



Just hit "run" to see several test results.





Exercise 1. Show that the above algorithm will output a set of parentheses with the least cardinality such that the remaining parentheses are balanced.



Problem 1. Can we generalize the algorithm to the case when the string contains two kinds of parentheses such as "()"? We have to determine how to recognize and treat the new situation, the interleaving case, "([)]".






share|cite|improve this answer











$endgroup$



Since we can just ignore all alphanumeric characters, we will assume the string contains only parentheses from now on. As in the question, there is only one kind of parenthesis, "()".



If we keep removing balanced parentheses until no more balanced parentheses can be removed, all remaining parentheses must look like like "))…)((…(", which are all unbalanced parentheses. This observation suggests that we should find first that turning point, before which we have unbalanced closing parentheses only and after which we have unbalanced opening parentheses only.



Here is the algorithm. In a nutshell, it computes the turning point first. Then it outputs extra closing parenthesis, scanning the string from the start to the right until the turning point. Symmetrically, it outputs extra opening parenthesis, scanning from the end to the left until the turning point.





Let str be the string as an array of characters, whose size is $n$.



Initialize turning_point=0, maximum_count=0, count=0. For each i from 0 to n-1 do the following.




  1. If str[i] = ')', add 1 to count; otherwise, subtract 1.

  2. If count > maximum_count, set turning_point=i and maximum_count=count.


Now turning_point is the index of the turning point.



Reset maximum_count=0, count=0. For each i from 0 to turning_point do the following.




  1. If str[i] = ')', add 1 to count; otherwise, subtract 1.

  2. If count > maximum_count, set maximum_count = count. Output i as the index of an unbalanced closing parenthesis.


Reset maximum_count=0, count=0. For each i from n-1 to turning_point+1 downwards do the following.




  1. If str[j] = '(', add 1 to count; otherwise, subtract 1.

  2. If count > maximum_count, set maximum_count = count. Output i as the index of an unbalanced opening parenthesis.




It is clear that the algorithm runs in $O(n)$ time and $O(1)$ auxiliary memory and $O(u)$ output memory, where $u$ is the number of unbalanced parentheses.





If we analyzed the algorithm above, we will see that, in fact, we do not need to find and use the turning point at all. The nice observation that all unbalanced closing parentheses happens before all unbalanced opening parentheses can be ignored although interesting.



Here is code in Python.



Just hit "run" to see several test results.





Exercise 1. Show that the above algorithm will output a set of parentheses with the least cardinality such that the remaining parentheses are balanced.



Problem 1. Can we generalize the algorithm to the case when the string contains two kinds of parentheses such as "()"? We have to determine how to recognize and treat the new situation, the interleaving case, "([)]".







share|cite|improve this answer














share|cite|improve this answer



share|cite|improve this answer








edited yesterday

























answered yesterday









Apass.JackApass.Jack

8,4551633




8,4551633












  • $begingroup$
    Lol, exercise 1 and problem 1, cute. The logic of the algorithm you've described is surprisingly hard to visualize. I would have to code this out tomorrow to get it.
    $endgroup$
    – temporary_user_name
    yesterday










  • $begingroup$
    It looks like I missed the rather obvious but most important explanation. The logic is, in fact, very simple. First, we output each extra opening parenthesis. Once we have passed the turning point, we output each extra closing parenthesis. Done.
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    Finding unbalanced opening parentheses is incorrect. I.e. if your arr is "())", p is 2 and p+1 falls outside of the arr boundary. Just an idea - to find unbalanced opening parentheses you could reverse arr and use part of algorithm to find unbalanced closing parentheses (of course, with reversely adapted indexes).
    $endgroup$
    – OzrenTkalcecKrznaric
    yesterday










  • $begingroup$
    @OzrenTkalcecKrznaric Exactly because $p+1$ falls outside of the boundary, there is no unbalanced opening parentheses in "())".
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    Took me a bit to understand this, but I like it, it's pretty clever.. and works at least for every case I have thought
    $endgroup$
    – dquijada
    yesterday


















  • $begingroup$
    Lol, exercise 1 and problem 1, cute. The logic of the algorithm you've described is surprisingly hard to visualize. I would have to code this out tomorrow to get it.
    $endgroup$
    – temporary_user_name
    yesterday










  • $begingroup$
    It looks like I missed the rather obvious but most important explanation. The logic is, in fact, very simple. First, we output each extra opening parenthesis. Once we have passed the turning point, we output each extra closing parenthesis. Done.
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    Finding unbalanced opening parentheses is incorrect. I.e. if your arr is "())", p is 2 and p+1 falls outside of the arr boundary. Just an idea - to find unbalanced opening parentheses you could reverse arr and use part of algorithm to find unbalanced closing parentheses (of course, with reversely adapted indexes).
    $endgroup$
    – OzrenTkalcecKrznaric
    yesterday










  • $begingroup$
    @OzrenTkalcecKrznaric Exactly because $p+1$ falls outside of the boundary, there is no unbalanced opening parentheses in "())".
    $endgroup$
    – Apass.Jack
    yesterday












  • $begingroup$
    Took me a bit to understand this, but I like it, it's pretty clever.. and works at least for every case I have thought
    $endgroup$
    – dquijada
    yesterday
















$begingroup$
Lol, exercise 1 and problem 1, cute. The logic of the algorithm you've described is surprisingly hard to visualize. I would have to code this out tomorrow to get it.
$endgroup$
– temporary_user_name
yesterday




$begingroup$
Lol, exercise 1 and problem 1, cute. The logic of the algorithm you've described is surprisingly hard to visualize. I would have to code this out tomorrow to get it.
$endgroup$
– temporary_user_name
yesterday












$begingroup$
It looks like I missed the rather obvious but most important explanation. The logic is, in fact, very simple. First, we output each extra opening parenthesis. Once we have passed the turning point, we output each extra closing parenthesis. Done.
$endgroup$
– Apass.Jack
yesterday






$begingroup$
It looks like I missed the rather obvious but most important explanation. The logic is, in fact, very simple. First, we output each extra opening parenthesis. Once we have passed the turning point, we output each extra closing parenthesis. Done.
$endgroup$
– Apass.Jack
yesterday














$begingroup$
Finding unbalanced opening parentheses is incorrect. I.e. if your arr is "())", p is 2 and p+1 falls outside of the arr boundary. Just an idea - to find unbalanced opening parentheses you could reverse arr and use part of algorithm to find unbalanced closing parentheses (of course, with reversely adapted indexes).
$endgroup$
– OzrenTkalcecKrznaric
yesterday




$begingroup$
Finding unbalanced opening parentheses is incorrect. I.e. if your arr is "())", p is 2 and p+1 falls outside of the arr boundary. Just an idea - to find unbalanced opening parentheses you could reverse arr and use part of algorithm to find unbalanced closing parentheses (of course, with reversely adapted indexes).
$endgroup$
– OzrenTkalcecKrznaric
yesterday












$begingroup$
@OzrenTkalcecKrznaric Exactly because $p+1$ falls outside of the boundary, there is no unbalanced opening parentheses in "())".
$endgroup$
– Apass.Jack
yesterday






$begingroup$
@OzrenTkalcecKrznaric Exactly because $p+1$ falls outside of the boundary, there is no unbalanced opening parentheses in "())".
$endgroup$
– Apass.Jack
yesterday














$begingroup$
Took me a bit to understand this, but I like it, it's pretty clever.. and works at least for every case I have thought
$endgroup$
– dquijada
yesterday




$begingroup$
Took me a bit to understand this, but I like it, it's pretty clever.. and works at least for every case I have thought
$endgroup$
– dquijada
yesterday


















draft saved

draft discarded




















































Thanks for contributing an answer to Computer Science Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcs.stackexchange.com%2fquestions%2f103018%2fhow-can-you-find-all-unbalanced-parens-in-a-string-in-linear-time-with-constant%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Liquibase includeAll doesn't find base path

How to use setInterval in EJS file?

Petrus Granier-Deferre