Regex replace text outside script tag












I have this HTML:




"This is simple html text <script language="javascript">simple simple text text</script> text"


I need to match only the words that are outside the script tag. For example, if I want to match “simple” and “text”, I should get results only from “This is simple html text” and the final “text”, so the result should be “simple”: 1 match, “text”: 2 matches. Could anyone help me with this? I’m using PHP.



I found a similar answer for matching text outside a tag:



(text|simple)(?![^<]*>|[^<>]*</)


(from the Stack Overflow question "Regex replace text outside html tags")
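Running the pattern above in PHP makes its behaviour concrete. In this sketch (the sample strings are illustrative), it already gives the expected counts on the question's HTML, but only because every word inside the script is followed by `</` with no `<` or `>` in between. The same rule skips words inside any element, such as `<b>text</b>`, which is why it is not specific to `<script>`.

```php
<?php
// The generic "outside any tag" pattern quoted above.
$generic = '~(text|simple)(?![^<]*>|[^<>]*</)~';

$html = 'This is simple html text <script language="javascript">simple simple text text</script> text';
preg_match_all($generic, $html, $m);
// $m[1] is ['simple', 'text', 'text']: each word inside <script> is
// followed by "</" with no "<" or ">" in between, so the lookahead rejects it.

// That rule is not specific to <script>: a word directly before any
// closing tag is skipped as well.
$other = preg_replace($generic, '*', 'simple <b>text</b> text');
// $other is '* <b>text</b> *'
```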



But I couldn't get it to work for a specific tag (script):



(text|simple)(?!(^<script*>)|[^<>]*</)
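The attempt above can be checked the same way. The sketch below (illustrative inputs) points to two problems: inside the lookahead, `^` only matches at the very start of the subject when the `m` flag is off, so the `(^<script*>)` alternative can never succeed right after a word; and `script*` means "scrip" followed by zero or more "t" characters. The attempt therefore behaves exactly like `(text|simple)(?![^<>]*</)`.

```php
<?php
// The attempted pattern, and the same pattern with its first lookahead
// alternative removed.
$attempt    = '~(text|simple)(?!(^<script*>)|[^<>]*</)~';
$simplified = '~(text|simple)(?![^<>]*</)~';

$samples = [
    'This is simple html text <script language="javascript">simple simple text text</script> text',
    'simple <b>text</b> text',
];
foreach ($samples as $s) {
    // Identical results: (^<script*>) never matches after a word, because
    // ^ (without the m flag) only matches at the start of the subject.
    var_dump(preg_replace($attempt, '*', $s) === preg_replace($simplified, '*', $s));
}

// script* means "scrip" followed by zero or more "t" characters.
var_dump(preg_match('~<script*>~', '<scrip>')); // int(1)
```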


PS: This question is not a duplicate of the strip_tags / remove-JavaScript questions, because I'm not trying to strip tags or to select the content inside the script tag. I'm trying to replace content outside the "script" tag.










php html regex preg-replace






edited Aug 27 '17 at 3:06 by Paulo A. Costa
asked Aug 26 '17 at 22:16 by Paulo A. Costa













  • Do you absolutely need matching, or will capturing groups do?

    – Vivick
    Aug 26 '17 at 22:23











  • When you want to parse HTML with confidence, use an HTML parser, not regex. SO says this over and over and over. IIRC the SO software even pops up a note that says "don't use regex to parse html".

    – mickmackusa
    Aug 27 '17 at 2:48











  • @mickmackusa, but parsers stop working when they have to parse malformed HTML. I don't think this question is a duplicate, because I'm not trying to strip tags; I'm trying to replace content outside the "script" tag.

    – Paulo A. Costa
    Aug 27 '17 at 3:00













  • Retracted dupe link, it is merely related.

    – mickmackusa
    Aug 27 '17 at 3:56
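Regarding the parser suggestion and the malformed-HTML concern in the comments above, here is a minimal sketch using PHP's DOM extension. The function name and the `<div>` wrapper are illustrative assumptions, not code from the thread. libxml's HTML parser is generally tolerant of malformed markup, although it may repair it differently from a browser.

```php
<?php
// Hypothetical helper: replace words only in text nodes that have no
// <script> ancestor, leaving script contents untouched.
function replaceOutsideScriptDom(string $html, string $pattern, string $replacement): string
{
    $doc = new DOMDocument();
    // Collect libxml's warnings about malformed markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so there is a single container to read back.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//text()[not(ancestor::script)]') as $textNode) {
        $textNode->nodeValue = preg_replace($pattern, $replacement, $textNode->nodeValue);
    }

    // Serialize the children of the wrapper <div> back to an HTML string.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    if ($wrapper === null) {
        return $html;
    }
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo replaceOutsideScriptDom(
    'This is simple html text <script language="javascript">simple simple text text</script> text',
    '~\b(?:text|simple)\b~',
    '***replaced***'
), "\n";
```

Two trade-offs are worth noting: the output is re-serialized by libxml, so the markup may be normalized (for example, unclosed tags get closed) rather than preserved byte for byte, and `loadHTML()` assumes ISO-8859-1 unless the input declares its encoding, so UTF-8 input may need a charset declaration.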



















4 Answers
































      Just an FYI: as far as tags go, it is impossible to ignore a single tag without parsing all tags.

      You can SKIP/FAIL past HTML tags and invisible content. This will find the words you're looking for.



      '~<(?:(?:(?:(script|style|object|embed|applet|noframes|noscript|noembed)(?:\s+(?>"[\S\s]*?"|\'[\S\s]*?\'|(?:(?!/>)[^>])?)+)?\s*>)[\S\s]*?</\1\s*(?=>))|(?:/?[\w:]+\s*/?)|(?:[\w:]+\s+(?:"[\S\s]*?"|\'[\S\s]*?\'|[^>]?)+\s*/?)|\?[\S\s]*?\?|(?:!(?:(?:DOCTYPE[\S\s]*?)|(?:\[CDATA\[[\S\s]*?\]\])|(?:--[\S\s]*?--)|(?:ATTLIST[\S\s]*?)|(?:ENTITY[\S\s]*?)|(?:ELEMENT[\S\s]*?))))>(*SKIP)(?!)|(?:text|simple)~'



      https://regex101.com/r/7ZGlvW/1



      Formatted

          <
          (?:
               (?:
                    (?:
                         # Invisible content; end tag req'd
                         (                             # (1 start)
                              script
                           |  style
                           |  object
                           |  embed
                           |  applet
                           |  noframes
                           |  noscript
                           |  noembed
                         )                             # (1 end)
                         (?:
                              \s+
                              (?>
                                   " [\S\s]*? "
                                |  ' [\S\s]*? '
                                |  (?:
                                        (?! /> )
                                        [^>]
                                   )?
                              )+
                         )?
                         \s* >
                    )

                    [\S\s]*? </ \1 \s*
                    (?= > )
               )

            |  (?: /? [\w:]+ \s* /? )
            |  (?:
                    [\w:]+
                    \s+
                    (?:
                         " [\S\s]*? "
                      |  ' [\S\s]*? '
                      |  [^>]?
                    )+
                    \s* /?
               )
            |  \? [\S\s]*? \?
            |  (?:
                    !
                    (?:
                         (?: DOCTYPE [\S\s]*? )
                      |  (?: \[CDATA\[ [\S\s]*? \]\] )
                      |  (?: -- [\S\s]*? -- )
                      |  (?: ATTLIST [\S\s]*? )
                      |  (?: ENTITY [\S\s]*? )
                      |  (?: ELEMENT [\S\s]*? )
                    )
               )
          )
          >
          (*SKIP)
          (?!)
       |
          (?: text | simple )




      Or, a much faster approach is to match both the tags AND the text you're looking for. Matching the tags moves past them.

      If you're doing a replace, use a callback to determine what to replace. Group 1 is a tag or an invisible-content run (group 2 captures the invisible element's name for the \2 backreference), and group 3 is the word you're looking to replace.

      So, in the callback, if group 1 matched, just return group 1. If group 3 matched, return whatever you want to replace it with.

      The regex



      '~(<(?:(?:(?:(script|style|object|embed|applet|noframes|noscript|noembed)(?:\s+(?>"[\S\s]*?"|\'[\S\s]*?\'|(?:(?!/>)[^>])?)+)?\s*>)[\S\s]*?</\2\s*(?=>))|(?:/?[\w:]+\s*/?)|(?:[\w:]+\s+(?:"[\S\s]*?"|\'[\S\s]*?\'|[^>]?)+\s*/?)|\?[\S\s]*?\?|(?:!(?:(?:DOCTYPE[\S\s]*?)|(?:\[CDATA\[[\S\s]*?\]\])|(?:--[\S\s]*?--)|(?:ATTLIST[\S\s]*?)|(?:ENTITY[\S\s]*?)|(?:ELEMENT[\S\s]*?))))>)|(text|simple)~'



      https://regex101.com/r/7ZGlvW/2
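The callback approach described above could look like the following sketch. It uses the second regex from this answer with its backslashes restored (an assumption: they appear to have been stripped when the post was copied), held in a nowdoc so no quote escaping is needed. The helper name is hypothetical; the callback logic follows the answer's description.

```php
<?php
// Group 1: a tag or an invisible-content run; group 2: the invisible
// element's name (used by \2); group 3: a target word.
$re = <<<'RE'
~(<(?:(?:(?:(script|style|object|embed|applet|noframes|noscript|noembed)(?:\s+(?>"[\S\s]*?"|'[\S\s]*?'|(?:(?!/>)[^>])?)+)?\s*>)[\S\s]*?</\2\s*(?=>))|(?:/?[\w:]+\s*/?)|(?:[\w:]+\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]?)+\s*/?)|\?[\S\s]*?\?|(?:!(?:(?:DOCTYPE[\S\s]*?)|(?:\[CDATA\[[\S\s]*?\]\])|(?:--[\S\s]*?--)|(?:ATTLIST[\S\s]*?)|(?:ENTITY[\S\s]*?)|(?:ELEMENT[\S\s]*?))))>)|(text|simple)~
RE;

function replaceWordsOutsideTags(string $re, string $html, string $replacement): string
{
    return preg_replace_callback($re, function (array $m) use ($replacement): string {
        // Group 1 always starts with "<", so a non-empty $m[1] means a tag or
        // invisible-content run matched: return it unchanged to move past it.
        if ($m[1] !== '') {
            return $m[1];
        }
        // Otherwise group 3 matched one of the target words.
        return $replacement;
    }, $html);
}

echo replaceWordsOutsideTags(
    $re,
    'simple <b title="text">text</b> <script>var text = 1;</script>',
    '***replaced***'
), "\n";
```

Note that only tags and invisible content are skipped: the word inside `<b>...</b>` is replaced, while the one in the `title` attribute and the one inside the script are left alone.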





      This regex is comparable to how SAX and DOM parsers parse tags.

      I've posted this hundreds of times on SO.



      Here is an example of how to remove all html tags:



      https://regex101.com/r/oCVkZv/1
































      • This regex works fine, but it uses a lot of memory, causing these errors: Firefox: "The connection was reset"; Chrome: "The connection was reset (net::ERR_CONNECTION_RESET)"; IE: "Internet Explorer cannot display the webpage".

        – Paulo A. Costa
        Aug 28 '17 at 1:22











      • @PauloACosta - I see you've accepted a skip/fail answer, the approach I originally posted. But, as I said, it is impossible to ignore a single tag without parsing all tags. And using skip/fail with my regex will be slower. The MEMORY problem you're getting does not come from the regex. Otherwise, for speed, I said not to use skip/fail and instead to match both the tags and the text you need using my later regex. You made the wrong choice of answer. That's too bad...

        – sln
        Aug 28 '17 at 22:04
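The thread does not establish what caused the connection resets. One cheap thing to rule out is a silent PCRE failure: `preg_replace()` returns `null` instead of throwing when it hits an error such as the backtrack limit. Here is a minimal sketch of a wrapper that surfaces this (the function name is hypothetical). If the PHP process crashes outright, for example from a stack overflow, no return-value check can catch it.

```php
<?php
// Hypothetical wrapper: turn a silent preg_replace() failure (null result)
// into an exception that names the PCRE error.
function pregReplaceOrThrow(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        $names = [
            PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
            PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
            PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
            PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
            PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
            PREG_JIT_STACKLIMIT_ERROR  => 'PREG_JIT_STACKLIMIT_ERROR',
        ];
        $code = preg_last_error();
        throw new RuntimeException('preg_replace failed: ' . ($names[$code] ?? "error code $code"));
    }
    return $result;
}
```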











          My pattern uses (*SKIP)(*FAIL) to disqualify script tags and their contents.

          text and simple will be matched on every qualifying occurrence.



          Regex Pattern: ~<script.*?/script>(*SKIP)(*FAIL)|text|simple~



          Pattern / Replacement Demo Link



          Code: (Demo)



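          // Test inputs: no match, no script tag, and a script tag at the end, in the middle, and at the start
          $strings=['This has no replacements',
          'This simple text has no script tag',
          'This simple text ends with a script tag <script language="javascript">simple simple text text</script>',
          'This is simple html text is split by a script tag <script language="javascript">simple simple text text</script> text',
          '<script language="javascript">simple simple text text</script> this text starts with a script tag'
          ];

          // The first alternative consumes a whole <script>...</script> element; (*SKIP)(*FAIL) then
          // discards that match and resumes searching after it. text|simple are the words to replace.
          $strings=preg_replace('~<script.*?/script>(*SKIP)(*FAIL)|text|simple~','***replaced***',$strings);

          // preg_replace() accepts an array of subjects and returns an array
          var_export($strings);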


          Output:



          array (
          0 => 'This has no replacements',
          1 => 'This ***replaced*** ***replaced*** has no script tag',
          2 => 'This ***replaced*** ***replaced*** ends with a script tag <script language="javascript">simple simple text text</script>',
          3 => 'This is ***replaced*** html ***replaced*** is split by a script tag <script language="javascript">simple simple text text</script> ***replaced***',
          4 => '<script language="javascript">simple simple text text</script> this ***replaced*** starts with a script tag',
          )
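A minimal hardening sketch of the same (*SKIP)(*FAIL) idea. The changes are assumptions rather than part of the answer: the `s` flag so `.` crosses newlines in multi-line scripts, `i` for upper-case tags, and `\b` so only whole words match. It also counts matches with `preg_match_all()`, since the question asks for per-word counts.

```php
<?php
$original = '~<script.*?/script>(*SKIP)(*FAIL)|text|simple~';
$hardened = '~<script\b.*?</script\s*>(*SKIP)(*FAIL)|\b(?:text|simple)\b~is';

// Illustrative input: a multi-line script and a word ("context") containing "text".
$html = 'This is simple html text <script language="javascript">' . "\n"
      . "var simple = 'text';\n"
      . '</script> text and context';

// Without `s`, `.` stops at the newline, so the script alternative fails and
// the words inside the script (and "context") are matched too.
$originalCount = preg_match_all($original, $html);

// Count qualifying matches per word.
preg_match_all($hardened, $html, $matches);
$counts = array_count_values(array_map('strtolower', $matches[0]));
// $counts is ['simple' => 1, 'text' => 2]

// Replace only qualifying matches.
$replaced = preg_replace($hardened, '***replaced***', $html);
echo $replaced, "\n";
```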






          edited Aug 29 '17 at 0:15
          answered Aug 27 '17 at 3:23 by mickmackusa

























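                  If a script tag is guaranteed to be present, simply match with

                  (.*?)<script.*</script>(.*)

                  The text outside the tag will appear in submatches 1 and 2. If the script tag is optional, use ^(.*?)(?:<script.*</script>(.*))?$ instead. With (.*?)(<script.*</script>)?(.*), the lazy first group matches nothing, the optional group is skipped (unless the string starts with the script tag), and the last group swallows the whole string.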
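Capturing the outside text is only half of a replacement; the parts still need to be rewritten and stitched back together. A minimal sketch of that step, using the anchored optional-script form plus a third group for the script element itself. The helper name and the `s` flag are assumptions, not part of the answer.

```php
<?php
// Hypothetical helper: split once around a single <script> element, then
// replace words only in the parts before and after it.
function replaceOutsideSingleScript(string $html, string $replacement): string
{
    // Group 1: text before the script; group 2: the script element;
    // group 3: text after it. Groups 2 and 3 are absent when there is no
    // script tag. The `s` flag lets `.` cross newlines inside scripts.
    if (!preg_match('~^(.*?)(?:(<script.*</script>)(.*))?$~s', $html, $m)) {
        return $html;
    }
    $words  = '~\b(?:text|simple)\b~'; // target words from the question
    $before = preg_replace($words, $replacement, $m[1]);
    $script = $m[2] ?? '';
    $after  = preg_replace($words, $replacement, $m[3] ?? '');
    return $before . $script . $after;
}

echo replaceOutsideSingleScript(
    'This is simple html text <script language="javascript">simple simple text text</script> text',
    '***replaced***'
), "\n";
```

Because `.*` inside the script group is greedy, several script tags are treated as one span from the first `<script` to the last `</script>`, so text between two scripts is left untouched.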







                  answered Aug 26 '17 at 22:41 by yacc























                      Here is another solution



                      ([\w\s]*)(?:<script.*?/script>)(.*)$


                      and here is the demo on https://regex101.com/r/1Lthi8/1
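A minimal sketch of reading the two capture groups in PHP, assuming the character class was meant to be `[\w\s]` (the backslashes appear to have been lost). It also shows a limitation: `[\w\s]*` cannot cross punctuation, so anything before the last non-word, non-space character ahead of the script is not captured.

```php
<?php
$pattern = '~([\w\s]*)(?:<script.*?/script>)(.*)$~';

$html = 'This is simple html text <script language="javascript">simple simple text text</script> text';
preg_match($pattern, $html, $m);
// $m[1] is 'This is simple html text ' and $m[2] is ' text'

// With punctuation before the script, the match starts after the comma.
preg_match($pattern, 'Hello, simple text <script>x</script> text', $m2);
// $m2[1] is ' simple text ': "Hello," was dropped
```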







                      edited Aug 26 '17 at 23:18
                      answered Aug 26 '17 at 22:49 by JBone













                      • I'm trying to replace strings outside the <script></script> tag.

                        – Paulo A. Costa
                        Aug 26 '17 at 23:05











                      • Yes, this is captured in group 1, as regex101 highlights: "This is simple html text".

                        – JBone
                        Aug 26 '17 at 23:08











                      • Match 2 is inside the tag, and the last word "text" is not being selected. Finally, this is trying to ignore all tags, not the specific tag "script".

                        – Paulo A. Costa
                        Aug 26 '17 at 23:15











                      • Ha, I see the problem: I missed that second "text". I updated my answer and the regex demo. Let me know if you still have issues or questions.

                        – JBone
                        Aug 26 '17 at 23:16











                      • Do you still have questions, or did this solution work?

                        – JBone
                        Aug 27 '17 at 12:21












































Just an FYI: as far as tags go, it is impossible to ignore a single tag without parsing all tags.



You can use SKIP/FAIL to step past HTML tags and invisible content (for example a whole <script>...</script> run).

Whatever the pattern still matches after that is the words you're looking for.



    '~<(?:(?:(?:(script|style|object|embed|applet|noframes|noscript|noembed)(?:\s+(?>"[\S\s]*?"|\'[\S\s]*?\'|(?:(?!/>)[^>])?)+)?\s*>)[\S\s]*?</\1\s*(?=>))|(?:/?[\w:]+\s*/?)|(?:[\w:]+\s+(?:"[\S\s]*?"|\'[\S\s]*?\'|[^>]?)+\s*/?)|\?[\S\s]*?\?|(?:!(?:(?:DOCTYPE[\S\s]*?)|(?:\[CDATA\[[\S\s]*?\]\])|(?:--[\S\s]*?--)|(?:ATTLIST[\S\s]*?)|(?:ENTITY[\S\s]*?)|(?:ELEMENT[\S\s]*?))))>(*SKIP)(?!)|(?:text|simple)~'



                      https://regex101.com/r/7ZGlvW/1



Formatted (the same pattern in free-spacing form, as you would write it with the x modifier):



    <
    (?:
         (?:
              (?:
                   # Invisible content; end tag req'd
                   (                             # (1 start)
                        script
                     |  style
                     |  object
                     |  embed
                     |  applet
                     |  noframes
                     |  noscript
                     |  noembed
                   )                             # (1 end)
                   (?:
                        \s+
                        (?>
                             " [\S\s]*? "
                          |  ' [\S\s]*? '
                          |  (?:
                                  (?! /> )
                                  [^>]
                             )?
                        )+
                   )?
                   \s* >
              )

              [\S\s]*? </ \1 \s*
              (?= > )
         )

      |  (?: /? [\w:]+ \s* /? )
      |  (?:
              [\w:]+
              \s+
              (?:
                   " [\S\s]*? "
                |  ' [\S\s]*? '
                |  [^>]?
              )+
              \s* /?
         )
      |  \? [\S\s]*? \?
      |  (?:
              !
              (?:
                   (?: DOCTYPE [\S\s]*? )
                |  (?: \[CDATA\[ [\S\s]*? \]\] )
                |  (?: -- [\S\s]*? -- )
                |  (?: ATTLIST [\S\s]*? )
                |  (?: ENTITY [\S\s]*? )
                |  (?: ELEMENT [\S\s]*? )
              )
         )
    )
    >
    (*SKIP)
    (?!)
    |
    (?: text | simple )
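A minimal PHP sketch of the SKIP/FAIL version on the question's input, assuming PHP 7.3+ (PCRE2). The nowdoc (<<<'RE') is only there so the backslashes and quotes in the pattern need no PHP escaping; array_count_values tallies the hits per word.

```php
<?php
// Sample input taken from the question.
$html = 'This is simple html text <script language="javascript">simple simple text text</script> text';

// The SKIP/FAIL pattern from above, kept literal inside a nowdoc.
$skipFail = <<<'RE'
~<(?:(?:(?:(script|style|object|embed|applet|noframes|noscript|noembed)(?:\s+(?>"[\S\s]*?"|'[\S\s]*?'|(?:(?!/>)[^>])?)+)?\s*>)[\S\s]*?</\1\s*(?=>))|(?:/?[\w:]+\s*/?)|(?:[\w:]+\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]?)+\s*/?)|\?[\S\s]*?\?|(?:!(?:(?:DOCTYPE[\S\s]*?)|(?:\[CDATA\[[\S\s]*?\]\])|(?:--[\S\s]*?--)|(?:ATTLIST[\S\s]*?)|(?:ENTITY[\S\s]*?)|(?:ELEMENT[\S\s]*?))))>(*SKIP)(?!)|(?:text|simple)~
RE;

// Tags and invisible-content runs are matched, then (*SKIP)(?!) discards the
// match and resumes after it, so only words outside them are returned.
$n = preg_match_all($skipFail, $html, $m);
if ($n === false) {
    // preg_* functions return false/null on PCRE errors such as the backtrack limit.
    throw new RuntimeException('preg_match_all failed, PCRE error code ' . preg_last_error());
}

$counts = array_count_values($m[0]); // hits per word, in first-seen order
print_r($counts);
```

On the sample string this should report simple => 1 and text => 2, which are the counts the question asks for.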




Or, a much faster approach is to match both the tags AND the text you're looking for.



                      Matching the tags moves past them.



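If you're doing a replace, use a callback (preg_replace_callback in PHP) to decide what to replace.

Group 1 is a tag or an invisible-content run (for example the whole <script>...</script>).

Group 2 is only the element name, used by the \2 backreference.

Group 3 is the word you're looking to replace.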



So, in the callback, if group 1 matched, just return group 1 unchanged.

If group 3 matched, return whatever you want to replace the word with.



                      The regex



    '~(<(?:(?:(?:(script|style|object|embed|applet|noframes|noscript|noembed)(?:\s+(?>"[\S\s]*?"|\'[\S\s]*?\'|(?:(?!/>)[^>])?)+)?\s*>)[\S\s]*?</\2\s*(?=>))|(?:/?[\w:]+\s*/?)|(?:[\w:]+\s+(?:"[\S\s]*?"|\'[\S\s]*?\'|[^>]?)+\s*/?)|\?[\S\s]*?\?|(?:!(?:(?:DOCTYPE[\S\s]*?)|(?:\[CDATA\[[\S\s]*?\]\])|(?:--[\S\s]*?--)|(?:ATTLIST[\S\s]*?)|(?:ENTITY[\S\s]*?)|(?:ELEMENT[\S\s]*?))))>)|(text|simple)~'



                      https://regex101.com/r/7ZGlvW/2





                      This regex is comparable to how SAX and DOM parsers parse tags.

                      I've posted this hundreds of times on SO.
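A sketch of the callback replace described above, assuming PHP 7.3+ and the group-1/group-3 pattern. Uppercasing the word is only a placeholder for whatever replacement you actually need; the null check is there because preg_replace_callback returns null when PCRE hits an error such as the backtrack limit.

```php
<?php
// Sample input taken from the question.
$html = 'This is simple html text <script language="javascript">simple simple text text</script> text';

// Group 1 = whole tag or invisible-content run, group 2 = element name
// (only used by the \2 backreference), group 3 = target word.
$re = <<<'RE'
~(<(?:(?:(?:(script|style|object|embed|applet|noframes|noscript|noembed)(?:\s+(?>"[\S\s]*?"|'[\S\s]*?'|(?:(?!/>)[^>])?)+)?\s*>)[\S\s]*?</\2\s*(?=>))|(?:/?[\w:]+\s*/?)|(?:[\w:]+\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]?)+\s*/?)|\?[\S\s]*?\?|(?:!(?:(?:DOCTYPE[\S\s]*?)|(?:\[CDATA\[[\S\s]*?\]\])|(?:--[\S\s]*?--)|(?:ATTLIST[\S\s]*?)|(?:ENTITY[\S\s]*?)|(?:ELEMENT[\S\s]*?))))>)|(text|simple)~
RE;

$out = preg_replace_callback($re, function (array $m): string {
    if ($m[1] !== '') {
        // A tag or <script>...</script> run matched: put it back unchanged.
        return $m[1];
    }
    // Group 3 matched a word outside every tag; strtoupper() is a placeholder.
    return strtoupper($m[3]);
}, $html);

if ($out === null) {
    throw new RuntimeException('preg_replace_callback failed, PCRE error code ' . preg_last_error());
}

echo $out, "\n";
```

On the sample string the script body comes back untouched, so only the three words outside it change: This is SIMPLE html TEXT <script language="javascript">simple simple text text</script> TEXT.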



Here is an example of how to remove all HTML tags:



                      https://regex101.com/r/oCVkZv/1
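A PHP equivalent might look like this, assuming the same tag/invisible-content alternation with the word branch and (*SKIP)(?!) dropped, so every tag and every <script>...</script> run (body included) is replaced with an empty string. The exact pattern in the linked demo may differ.

```php
<?php
$html = 'This is simple html text <script language="javascript">simple simple text text</script> text';

// Tag / invisible-content part only: every match is something to delete.
$tags = <<<'RE'
~<(?:(?:(?:(script|style|object|embed|applet|noframes|noscript|noembed)(?:\s+(?>"[\S\s]*?"|'[\S\s]*?'|(?:(?!/>)[^>])?)+)?\s*>)[\S\s]*?</\1\s*(?=>))|(?:/?[\w:]+\s*/?)|(?:[\w:]+\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]?)+\s*/?)|\?[\S\s]*?\?|(?:!(?:(?:DOCTYPE[\S\s]*?)|(?:\[CDATA\[[\S\s]*?\]\])|(?:--[\S\s]*?--)|(?:ATTLIST[\S\s]*?)|(?:ENTITY[\S\s]*?)|(?:ELEMENT[\S\s]*?))))>~
RE;

$text = preg_replace($tags, '', $html);
if ($text === null) {
    throw new RuntimeException('preg_replace failed, PCRE error code ' . preg_last_error());
}

echo $text, "\n";
```

With the sample input this leaves "This is simple html text  text" (note the double space where the script run was), while an ordinary tag pair such as <b>...</b> loses only its tags, not its text.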
































• This regex works fine, but it uses a lot of memory, causing these errors: Firefox: "The connection was reset"; Chrome: "(net::ERR_CONNECTION_RESET): The connection was reset"; IE: "Internet Explorer cannot display the webpage".

                        – Paulo A. Costa
                        Aug 28 '17 at 1:22











• @PauloACosta - I see you've accepted a skip/fail answer like the one I originally posted. But, as I said, it is impossible to ignore a single tag without parsing all tags, and using skip/fail with my regex will be slower. Wherever that MEMORY problem comes from, it is not the regex. For speed, I said not to use skip/fail and instead to match both the tags and the text you need with my later regex. You made the wrong choice in an answer. That's too bad...

                        – sln
                        Aug 28 '17 at 22:04
























                      edited Aug 27 '17 at 0:58

























                      answered Aug 27 '17 at 0:26









sln































