What's the most efficient way to normalize text from column into a table?
In T-SQL I have a column with some text in it with a format like the following:
[Key1:Value1:Value2:Value3:Value4:Value5]
[Key2:Value1:Value2:Value3:Value4:Value5]
[Key3:Value1:Value2:Value3:Value4:Value5]
where there can be any number of bracket sets, but usually between 3 and 6. I'm looking for a way to quickly format them into a temp table or table variable so I can report on the data. For example, I'd want the table format to be:
|Key|Column 1|Column 2|Column 3|Column 4|Column 5|
|Key 1|Value 1|Value 2|Value 3|Value 4|Value 5|
|Key 2|Value 1|Value 2|Value 3|Value 4|Value 5|
|Key 3|Value 1|Value 2|Value 3|Value 4|Value 5|
I know this is pushing the limits of SQL and should be handled through modification of the application, but I'm hoping there's something clever I can do with T-SQL for now.
sql sql-server tsql split
add a comment |
In T-SQL I have a column with some text in it with a format like the following:
[Key1:Value1:Value2:Value3:Value4:Value5]
[Key2:Value1:Value2:Value3:Value4:Value5]
[Key3:Value1:Value2:Value3:Value4:Value5]
where there can be any number of bracket sets, but usually between 3 and 6. I'm looking for a way to quickly format them into a temp table or table variable so I can report on the data. For example, I'd want the table format to be:
|Key|Column 1|Column 2|Column 3|Column 4|Column 5|
|Key 1|Value 1|Value 2|Value 3|Value 4|Value 5|
|Key 2|Value 1|Value 2|Value 3|Value 4|Value 5|
|Key 3|Value 1|Value 2|Value 3|Value 4|Value 5|
I know this is pushing the limits of SQL and should be handled through modification of the application, but I'm hoping there's something clever I can do with T-SQL for now.
sql sql-server tsql split
What are you using for your database? SQLServer Oracle, other? It will determine what options you have.
– Brad
Apr 27 '18 at 13:28
SQL Server - I believe 2014.
– Shawn Hubbard
Apr 27 '18 at 14:04
add a comment |
In T-SQL I have a column with some text in it with a format like the following:
[Key1:Value1:Value2:Value3:Value4:Value5]
[Key2:Value1:Value2:Value3:Value4:Value5]
[Key3:Value1:Value2:Value3:Value4:Value5]
where there can be any number of bracket sets, but usually between 3 and 6. I'm looking for a way to quickly format them into a temp table or table variable so I can report on the data. For example, I'd want the table format to be:
|Key|Column 1|Column 2|Column 3|Column 4|Column 5|
|Key 1|Value 1|Value 2|Value 3|Value 4|Value 5|
|Key 2|Value 1|Value 2|Value 3|Value 4|Value 5|
|Key 3|Value 1|Value 2|Value 3|Value 4|Value 5|
I know this is pushing the limits of SQL and should be handled through modification of the application, but I'm hoping there's something clever I can do with T-SQL for now.
sql sql-server tsql split
In T-SQL I have a column with some text in it with a format like the following:
[Key1:Value1:Value2:Value3:Value4:Value5]
[Key2:Value1:Value2:Value3:Value4:Value5]
[Key3:Value1:Value2:Value3:Value4:Value5]
where there can be any number of bracket sets, but usually between 3 and 6. I'm looking for a way to quickly format them into a temp table or table variable so I can report on the data. For example, I'd want the table format to be:
|Key|Column 1|Column 2|Column 3|Column 4|Column 5|
|Key 1|Value 1|Value 2|Value 3|Value 4|Value 5|
|Key 2|Value 1|Value 2|Value 3|Value 4|Value 5|
|Key 3|Value 1|Value 2|Value 3|Value 4|Value 5|
I know this is pushing the limits of SQL and should be handled through modification of the application, but I'm hoping there's something clever I can do with T-SQL for now.
sql sql-server tsql split
sql sql-server tsql split
edited Jan 18 at 23:30
John Cappelletti
45.9k62546
45.9k62546
asked Apr 27 '18 at 13:24
Shawn HubbardShawn Hubbard
605822
605822
What are you using for your database? SQLServer Oracle, other? It will determine what options you have.
– Brad
Apr 27 '18 at 13:28
SQL Server - I believe 2014.
– Shawn Hubbard
Apr 27 '18 at 14:04
add a comment |
What are you using for your database? SQLServer Oracle, other? It will determine what options you have.
– Brad
Apr 27 '18 at 13:28
SQL Server - I believe 2014.
– Shawn Hubbard
Apr 27 '18 at 14:04
What are you using for your database? SQLServer Oracle, other? It will determine what options you have.
– Brad
Apr 27 '18 at 13:28
What are you using for your database? SQLServer Oracle, other? It will determine what options you have.
– Brad
Apr 27 '18 at 13:28
SQL Server - I believe 2014.
– Shawn Hubbard
Apr 27 '18 at 14:04
SQL Server - I believe 2014.
– Shawn Hubbard
Apr 27 '18 at 14:04
add a comment |
2 Answers
2
active
oldest
votes
If you have a maximum number of columns, a little XML within a CROSS APPLY.
If unknown, you would have to go DYNAMIC.
Example
Declare @YourTable Table ([ID] varchar(50),[SomeCol] varchar(50))
Insert Into @YourTable Values
(1,'[Key1:Value1:Value2:Value3:Value4:Value5]')
,(2,'[Key2:Value1:Value2:Value3:Value4:Value5]')
,(3,'[Key3:Value1:Value2:Value3:Value4:Value5]')
Select A.ID
,B.*
From @YourTable A
Cross Apply (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From (Select Cast('<x>' + replace(replace(replace(SomeCol,'[',''),']',''),':','</x><x>')+'</x>' as xml) as xDim) as A
) B
Returns
ID Pos1 Pos2 Pos3 Pos4 Pos5 Pos6 Pos7 Pos8 Pos9
1 Key1 Value1 Value2 Value3 Value4 Value5 NULL NULL NULL
2 Key2 Value1 Value2 Value3 Value4 Value5 NULL NULL NULL
3 Key3 Value1 Value2 Value3 Value4 Value5 NULL NULL NULL
EDIT
I should add, the ltrim(rtrim(...))
is optional and the varchar(max)
is my demonstrative default.
EDIT - One String delimited with CRLF
Declare @S varchar(max)='
[Key1:Value1:Value2:Value3:Value4:Value5]
[Key2:Value1:Value2:Value3:Value4:Value5]
[Key3:Value1:Value2:Value3:Value4:Value5]
'
Select B.*
From (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>' + replace(@S,char(13)+char(10),'</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
) A
Cross Apply (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From (Select Cast('<x>' + replace(replace(replace(RetVal,'[',''),']',''),':','</x><x>')+'</x>' as xml) as xDim) as A
) B
Where A.RetVal is not null
This almost gives me what I need. The input is one long string so i still need to split each set of data into a set I can operate on here. I think this will get me there though - I'm going to tinker with it and report back.
– Shawn Hubbard
Apr 28 '18 at 13:24
@ShawnHubbard Easy fix, I wasn't clear that that was one long string. give me a moment
– John Cappelletti
Apr 28 '18 at 13:26
@ShawnHubbard See second edit
– John Cappelletti
Apr 28 '18 at 13:32
add a comment |
The fastest way to split a string when you know the maximum number of columns is to use the Cascading CROSS APPLY technique. Let's say you know that their will be no more than 10 items in your string. You could do this:
DECLARE @string varchar(1000) = '[Key1:Value1:Value2:Value3:Value4:Value5]'
SELECT
[key] = SUBSTRING(t.string,1,d1.d-1),
col1 = SUBSTRING(t.string,d1.d+1,d2.d-d1.d-1),
col2 = SUBSTRING(t.string,d2.d+1,d3.d-d2.d-1),
col3 = SUBSTRING(t.string,d3.d+1,d4.d-d3.d-1),
col4 = SUBSTRING(t.string,d4.d+1,d5.d-d4.d-1),
col5 = SUBSTRING(t.string,d5.d+1,d6.d-d5.d-1),
col6 = SUBSTRING(t.string,d6.d+1,d7.d-d5.d-1),
col7 = SUBSTRING(t.string,d7.d+1,d8.d-d5.d-1),
col8 = SUBSTRING(t.string,d8.d+1,d9.d-d5.d-1),
col9 = SUBSTRING(t.string,d9.d+1,d10.d-d5.d-1)
FROM (VALUES (REPLACE(REPLACE(@string,']',':'),'[',''))) t(string)
CROSS APPLY (VALUES (CHARINDEX(':',t.string))) d1(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d1.d+1),0))) d2(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d2.d+1),0))) d3(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d3.d+1),0))) d4(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d4.d+1),0))) d5(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d5.d+1),0))) d6(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d6.d+1),0))) d7(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d7.d+1),0))) d8(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d8.d+1),0))) d9(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d9.d+1),0))) d10(d);
To use this technique against a table with the strings stored in rows would be like this:
DECLARE @table TABLE (someid int identity, somestring varchar(1000));
INSERT @table(somestring) VALUES
('[Key1:Value1:Value2:Value3:Value4:Value5]'),
('[Key2:Value1:Value2:Value3:Value4:Value5]'),
('[Key3:Value1:Value2:Value3:Value4:Value5]'),
('[Key4:Value1:Value2:Value3:Value4:Value5:Value6:Value7:Value8]'),
('[Key5:Value1:Value2:Value3:Value4:Value5:Value6:Value7:Value8:Value9:Value10]');
SELECT *
FROM @table s
CROSS APPLY
(
SELECT
[key] = SUBSTRING(t.string,1,d1.d-1),
dCount = LEN(t.string)-LEN(REPLACE(t.string,':','')),
col1 = SUBSTRING(t.string,d1.d+1,d2.d-d1.d-1),
col2 = SUBSTRING(t.string,d2.d+1,d3.d-d2.d-1),
col3 = SUBSTRING(t.string,d3.d+1,d4.d-d3.d-1),
col4 = SUBSTRING(t.string,d4.d+1,d5.d-d4.d-1),
col5 = SUBSTRING(t.string,d5.d+1,d6.d-d5.d-1),
col6 = SUBSTRING(t.string,d6.d+1,d7.d-d6.d-1),
col7 = SUBSTRING(t.string,d7.d+1,d8.d-d7.d-1),
col8 = SUBSTRING(t.string,d8.d+1,d9.d-d8.d-1),
col9 = SUBSTRING(t.string,d9.d+1,d10.d-d9.d-1)
FROM (VALUES (REPLACE(REPLACE(s.somestring,']',':'),'[',''))) t(string)
CROSS APPLY (VALUES (CHARINDEX(':',t.string))) d1(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d1.d+1),0))) d2(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d2.d+1),0))) d3(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d3.d+1),0))) d4(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d4.d+1),0))) d5(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d5.d+1),0))) d6(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d6.d+1),0))) d7(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d7.d+1),0))) d8(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d8.d+1),0))) d9(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d9.d+1),0))) d10(d)
) split
WHERE LEN(s.somestring)-LEN(REPLACE(s.somestring,':','')) < 10
If you don't know the maximum number of possible items you could take this logic and wrap it in some Dynamic SQL that creates the correct number of CROSS APPLY's. I don't have time to put together that logic but, to get the maximum number of possible delimiters you could do something like this:
DECLARE @maxDelimiters tinyint =
(SELECT MAX(LEN(s.somestring)-LEN(REPLACE(s.somestring,':',''))) FROM @table s);
Alternatively, if you wanted to use John's technique, you could also use Dynamic SQL to create his query with the exact number of "pos" values required.
1
You've got me curious now. I'll run some benchmarks in a bit and let you know. +1
– John Cappelletti
Apr 27 '18 at 15:46
Working with a sample of 5,000 records, 5 runs each. Mine had an average of 963ms (953 min 983 max). Yours had an average of 7,635ms (7,566 min 7,773 max)
– John Cappelletti
Apr 27 '18 at 16:14
1
Thanks John... Are you running it with show execution plan on? For me that slows it to a crawl. The other problem could be the how I'm removing the brackets in the first apply, sometimes derived values slow things down a bit. I'm going to play around with this a little and post some updated code.
– Alan Burstein
Apr 27 '18 at 16:37
@JohnCappelletti Please post the compare code and sample data somewhere and leave a link in the comments. I'm also curious now...
– Zohar Peled
Apr 28 '18 at 4:34
1
@ZoharPeled on dbfiddle.uk/… slightly different than my machine, but still a dramatic diff 1.5 sec vs 5.7 sec
– John Cappelletti
Apr 28 '18 at 11:27
|
show 1 more comment
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f50063415%2fwhats-the-most-efficient-way-to-normalize-text-from-column-into-a-table%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
If you have a maximum number of columns, a little XML within a CROSS APPLY.
If unknown, you would have to go DYNAMIC.
Example
Declare @YourTable Table ([ID] varchar(50),[SomeCol] varchar(50))
Insert Into @YourTable Values
(1,'[Key1:Value1:Value2:Value3:Value4:Value5]')
,(2,'[Key2:Value1:Value2:Value3:Value4:Value5]')
,(3,'[Key3:Value1:Value2:Value3:Value4:Value5]')
Select A.ID
,B.*
From @YourTable A
Cross Apply (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From (Select Cast('<x>' + replace(replace(replace(SomeCol,'[',''),']',''),':','</x><x>')+'</x>' as xml) as xDim) as A
) B
Returns
ID Pos1 Pos2 Pos3 Pos4 Pos5 Pos6 Pos7 Pos8 Pos9
1 Key1 Value1 Value2 Value3 Value4 Value5 NULL NULL NULL
2 Key2 Value1 Value2 Value3 Value4 Value5 NULL NULL NULL
3 Key3 Value1 Value2 Value3 Value4 Value5 NULL NULL NULL
EDIT
I should add, the ltrim(rtrim(...))
is optional and the varchar(max)
is my demonstrative default.
EDIT - One String delimited with CRLF
Declare @S varchar(max)='
[Key1:Value1:Value2:Value3:Value4:Value5]
[Key2:Value1:Value2:Value3:Value4:Value5]
[Key3:Value1:Value2:Value3:Value4:Value5]
'
Select B.*
From (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>' + replace(@S,char(13)+char(10),'</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
) A
Cross Apply (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From (Select Cast('<x>' + replace(replace(replace(RetVal,'[',''),']',''),':','</x><x>')+'</x>' as xml) as xDim) as A
) B
Where A.RetVal is not null
This almost gives me what I need. The input is one long string so i still need to split each set of data into a set I can operate on here. I think this will get me there though - I'm going to tinker with it and report back.
– Shawn Hubbard
Apr 28 '18 at 13:24
@ShawnHubbard Easy fix, I wasn't clear that that was one long string. give me a moment
– John Cappelletti
Apr 28 '18 at 13:26
@ShawnHubbard See second edit
– John Cappelletti
Apr 28 '18 at 13:32
add a comment |
If you have a maximum number of columns, a little XML within a CROSS APPLY.
If unknown, you would have to go DYNAMIC.
Example
Declare @YourTable Table ([ID] varchar(50),[SomeCol] varchar(50))
Insert Into @YourTable Values
(1,'[Key1:Value1:Value2:Value3:Value4:Value5]')
,(2,'[Key2:Value1:Value2:Value3:Value4:Value5]')
,(3,'[Key3:Value1:Value2:Value3:Value4:Value5]')
Select A.ID
,B.*
From @YourTable A
Cross Apply (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From (Select Cast('<x>' + replace(replace(replace(SomeCol,'[',''),']',''),':','</x><x>')+'</x>' as xml) as xDim) as A
) B
Returns
ID Pos1 Pos2 Pos3 Pos4 Pos5 Pos6 Pos7 Pos8 Pos9
1 Key1 Value1 Value2 Value3 Value4 Value5 NULL NULL NULL
2 Key2 Value1 Value2 Value3 Value4 Value5 NULL NULL NULL
3 Key3 Value1 Value2 Value3 Value4 Value5 NULL NULL NULL
EDIT
I should add, the ltrim(rtrim(...))
is optional and the varchar(max)
is my demonstrative default.
EDIT - One String delimited with CRLF
Declare @S varchar(max)='
[Key1:Value1:Value2:Value3:Value4:Value5]
[Key2:Value1:Value2:Value3:Value4:Value5]
[Key3:Value1:Value2:Value3:Value4:Value5]
'
Select B.*
From (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>' + replace(@S,char(13)+char(10),'</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
) A
Cross Apply (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From (Select Cast('<x>' + replace(replace(replace(RetVal,'[',''),']',''),':','</x><x>')+'</x>' as xml) as xDim) as A
) B
Where A.RetVal is not null
This almost gives me what I need. The input is one long string so i still need to split each set of data into a set I can operate on here. I think this will get me there though - I'm going to tinker with it and report back.
– Shawn Hubbard
Apr 28 '18 at 13:24
@ShawnHubbard Easy fix, I wasn't clear that that was one long string. give me a moment
– John Cappelletti
Apr 28 '18 at 13:26
@ShawnHubbard See second edit
– John Cappelletti
Apr 28 '18 at 13:32
add a comment |
If you have a maximum number of columns, a little XML within a CROSS APPLY.
If unknown, you would have to go DYNAMIC.
Example
Declare @YourTable Table ([ID] varchar(50),[SomeCol] varchar(50))
Insert Into @YourTable Values
(1,'[Key1:Value1:Value2:Value3:Value4:Value5]')
,(2,'[Key2:Value1:Value2:Value3:Value4:Value5]')
,(3,'[Key3:Value1:Value2:Value3:Value4:Value5]')
Select A.ID
,B.*
From @YourTable A
Cross Apply (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From (Select Cast('<x>' + replace(replace(replace(SomeCol,'[',''),']',''),':','</x><x>')+'</x>' as xml) as xDim) as A
) B
Returns
ID Pos1 Pos2 Pos3 Pos4 Pos5 Pos6 Pos7 Pos8 Pos9
1 Key1 Value1 Value2 Value3 Value4 Value5 NULL NULL NULL
2 Key2 Value1 Value2 Value3 Value4 Value5 NULL NULL NULL
3 Key3 Value1 Value2 Value3 Value4 Value5 NULL NULL NULL
EDIT
I should add, the ltrim(rtrim(...))
is optional and the varchar(max)
is my demonstrative default.
EDIT - One String delimited with CRLF
Declare @S varchar(max)='
[Key1:Value1:Value2:Value3:Value4:Value5]
[Key2:Value1:Value2:Value3:Value4:Value5]
[Key3:Value1:Value2:Value3:Value4:Value5]
'
Select B.*
From (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>' + replace(@S,char(13)+char(10),'</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
) A
Cross Apply (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From (Select Cast('<x>' + replace(replace(replace(RetVal,'[',''),']',''),':','</x><x>')+'</x>' as xml) as xDim) as A
) B
Where A.RetVal is not null
If you have a maximum number of columns, a little XML within a CROSS APPLY.
If unknown, you would have to go DYNAMIC.
Example
Declare @YourTable Table ([ID] varchar(50),[SomeCol] varchar(50))
Insert Into @YourTable Values
(1,'[Key1:Value1:Value2:Value3:Value4:Value5]')
,(2,'[Key2:Value1:Value2:Value3:Value4:Value5]')
,(3,'[Key3:Value1:Value2:Value3:Value4:Value5]')
Select A.ID
,B.*
From @YourTable A
Cross Apply (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From (Select Cast('<x>' + replace(replace(replace(SomeCol,'[',''),']',''),':','</x><x>')+'</x>' as xml) as xDim) as A
) B
Returns
ID Pos1 Pos2 Pos3 Pos4 Pos5 Pos6 Pos7 Pos8 Pos9
1 Key1 Value1 Value2 Value3 Value4 Value5 NULL NULL NULL
2 Key2 Value1 Value2 Value3 Value4 Value5 NULL NULL NULL
3 Key3 Value1 Value2 Value3 Value4 Value5 NULL NULL NULL
EDIT
I should add, the ltrim(rtrim(...))
is optional and the varchar(max)
is my demonstrative default.
EDIT - One String delimited with CRLF
Declare @S varchar(max)='
[Key1:Value1:Value2:Value3:Value4:Value5]
[Key2:Value1:Value2:Value3:Value4:Value5]
[Key3:Value1:Value2:Value3:Value4:Value5]
'
Select B.*
From (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>' + replace(@S,char(13)+char(10),'</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
) A
Cross Apply (
Select Pos1 = ltrim(rtrim(xDim.value('/x[1]','varchar(max)')))
,Pos2 = ltrim(rtrim(xDim.value('/x[2]','varchar(max)')))
,Pos3 = ltrim(rtrim(xDim.value('/x[3]','varchar(max)')))
,Pos4 = ltrim(rtrim(xDim.value('/x[4]','varchar(max)')))
,Pos5 = ltrim(rtrim(xDim.value('/x[5]','varchar(max)')))
,Pos6 = ltrim(rtrim(xDim.value('/x[6]','varchar(max)')))
,Pos7 = ltrim(rtrim(xDim.value('/x[7]','varchar(max)')))
,Pos8 = ltrim(rtrim(xDim.value('/x[8]','varchar(max)')))
,Pos9 = ltrim(rtrim(xDim.value('/x[9]','varchar(max)')))
From (Select Cast('<x>' + replace(replace(replace(RetVal,'[',''),']',''),':','</x><x>')+'</x>' as xml) as xDim) as A
) B
Where A.RetVal is not null
edited Apr 28 '18 at 13:32
answered Apr 27 '18 at 14:10
John CappellettiJohn Cappelletti
45.9k62546
45.9k62546
This almost gives me what I need. The input is one long string so i still need to split each set of data into a set I can operate on here. I think this will get me there though - I'm going to tinker with it and report back.
– Shawn Hubbard
Apr 28 '18 at 13:24
@ShawnHubbard Easy fix, I wasn't clear that that was one long string. give me a moment
– John Cappelletti
Apr 28 '18 at 13:26
@ShawnHubbard See second edit
– John Cappelletti
Apr 28 '18 at 13:32
add a comment |
This almost gives me what I need. The input is one long string so i still need to split each set of data into a set I can operate on here. I think this will get me there though - I'm going to tinker with it and report back.
– Shawn Hubbard
Apr 28 '18 at 13:24
@ShawnHubbard Easy fix, I wasn't clear that that was one long string. give me a moment
– John Cappelletti
Apr 28 '18 at 13:26
@ShawnHubbard See second edit
– John Cappelletti
Apr 28 '18 at 13:32
This almost gives me what I need. The input is one long string so i still need to split each set of data into a set I can operate on here. I think this will get me there though - I'm going to tinker with it and report back.
– Shawn Hubbard
Apr 28 '18 at 13:24
This almost gives me what I need. The input is one long string so i still need to split each set of data into a set I can operate on here. I think this will get me there though - I'm going to tinker with it and report back.
– Shawn Hubbard
Apr 28 '18 at 13:24
@ShawnHubbard Easy fix, I wasn't clear that that was one long string. give me a moment
– John Cappelletti
Apr 28 '18 at 13:26
@ShawnHubbard Easy fix, I wasn't clear that that was one long string. give me a moment
– John Cappelletti
Apr 28 '18 at 13:26
@ShawnHubbard See second edit
– John Cappelletti
Apr 28 '18 at 13:32
@ShawnHubbard See second edit
– John Cappelletti
Apr 28 '18 at 13:32
add a comment |
The fastest way to split a string when you know the maximum number of columns is to use the Cascading CROSS APPLY technique. Let's say you know that their will be no more than 10 items in your string. You could do this:
DECLARE @string varchar(1000) = '[Key1:Value1:Value2:Value3:Value4:Value5]'
SELECT
[key] = SUBSTRING(t.string,1,d1.d-1),
col1 = SUBSTRING(t.string,d1.d+1,d2.d-d1.d-1),
col2 = SUBSTRING(t.string,d2.d+1,d3.d-d2.d-1),
col3 = SUBSTRING(t.string,d3.d+1,d4.d-d3.d-1),
col4 = SUBSTRING(t.string,d4.d+1,d5.d-d4.d-1),
col5 = SUBSTRING(t.string,d5.d+1,d6.d-d5.d-1),
col6 = SUBSTRING(t.string,d6.d+1,d7.d-d5.d-1),
col7 = SUBSTRING(t.string,d7.d+1,d8.d-d5.d-1),
col8 = SUBSTRING(t.string,d8.d+1,d9.d-d5.d-1),
col9 = SUBSTRING(t.string,d9.d+1,d10.d-d5.d-1)
FROM (VALUES (REPLACE(REPLACE(@string,']',':'),'[',''))) t(string)
CROSS APPLY (VALUES (CHARINDEX(':',t.string))) d1(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d1.d+1),0))) d2(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d2.d+1),0))) d3(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d3.d+1),0))) d4(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d4.d+1),0))) d5(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d5.d+1),0))) d6(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d6.d+1),0))) d7(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d7.d+1),0))) d8(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d8.d+1),0))) d9(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d9.d+1),0))) d10(d);
To use this technique against a table with the strings stored in rows would be like this:
DECLARE @table TABLE (someid int identity, somestring varchar(1000));
INSERT @table(somestring) VALUES
('[Key1:Value1:Value2:Value3:Value4:Value5]'),
('[Key2:Value1:Value2:Value3:Value4:Value5]'),
('[Key3:Value1:Value2:Value3:Value4:Value5]'),
('[Key4:Value1:Value2:Value3:Value4:Value5:Value6:Value7:Value8]'),
('[Key5:Value1:Value2:Value3:Value4:Value5:Value6:Value7:Value8:Value9:Value10]');
SELECT *
FROM @table s
CROSS APPLY
(
SELECT
[key] = SUBSTRING(t.string,1,d1.d-1),
dCount = LEN(t.string)-LEN(REPLACE(t.string,':','')),
col1 = SUBSTRING(t.string,d1.d+1,d2.d-d1.d-1),
col2 = SUBSTRING(t.string,d2.d+1,d3.d-d2.d-1),
col3 = SUBSTRING(t.string,d3.d+1,d4.d-d3.d-1),
col4 = SUBSTRING(t.string,d4.d+1,d5.d-d4.d-1),
col5 = SUBSTRING(t.string,d5.d+1,d6.d-d5.d-1),
col6 = SUBSTRING(t.string,d6.d+1,d7.d-d6.d-1),
col7 = SUBSTRING(t.string,d7.d+1,d8.d-d7.d-1),
col8 = SUBSTRING(t.string,d8.d+1,d9.d-d8.d-1),
col9 = SUBSTRING(t.string,d9.d+1,d10.d-d9.d-1)
FROM (VALUES (REPLACE(REPLACE(s.somestring,']',':'),'[',''))) t(string)
CROSS APPLY (VALUES (CHARINDEX(':',t.string))) d1(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d1.d+1),0))) d2(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d2.d+1),0))) d3(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d3.d+1),0))) d4(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d4.d+1),0))) d5(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d5.d+1),0))) d6(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d6.d+1),0))) d7(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d7.d+1),0))) d8(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d8.d+1),0))) d9(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d9.d+1),0))) d10(d)
) split
WHERE LEN(s.somestring)-LEN(REPLACE(s.somestring,':','')) < 10
If you don't know the maximum number of possible items you could take this logic and wrap it in some Dynamic SQL that creates the correct number of CROSS APPLY's. I don't have time to put together that logic but, to get the maximum number of possible delimiters you could do something like this:
DECLARE @maxDelimiters tinyint =
(SELECT MAX(LEN(s.somestring)-LEN(REPLACE(s.somestring,':',''))) FROM @table s);
Alternatively, if you wanted to use John's technique, you could also use Dynamic SQL to create his query with the exact number of "pos" values required.
1
You've got me curious now. I'll run some benchmarks in a bit and let you know. +1
– John Cappelletti
Apr 27 '18 at 15:46
Working with a sample of 5,000 records, 5 runs each. Mine had an average of 963ms (953 min 983 max). Yours had an average of 7,635ms (7,566 min 7,773 max)
– John Cappelletti
Apr 27 '18 at 16:14
1
Thanks John... Are you running it with show execution plan on? For me that slows it to a crawl. The other problem could be the how I'm removing the brackets in the first apply, sometimes derived values slow things down a bit. I'm going to play around with this a little and post some updated code.
– Alan Burstein
Apr 27 '18 at 16:37
@JohnCappelletti Please post the compare code and sample data somewhere and leave a link in the comments. I'm also curious now...
– Zohar Peled
Apr 28 '18 at 4:34
1
@ZoharPeled on dbfiddle.uk/… slightly different than my machine, but still a dramatic diff 1.5 sec vs 5.7 sec
– John Cappelletti
Apr 28 '18 at 11:27
|
show 1 more comment
The fastest way to split a string when you know the maximum number of columns is to use the Cascading CROSS APPLY technique. Let's say you know that their will be no more than 10 items in your string. You could do this:
DECLARE @string varchar(1000) = '[Key1:Value1:Value2:Value3:Value4:Value5]'
SELECT
[key] = SUBSTRING(t.string,1,d1.d-1),
col1 = SUBSTRING(t.string,d1.d+1,d2.d-d1.d-1),
col2 = SUBSTRING(t.string,d2.d+1,d3.d-d2.d-1),
col3 = SUBSTRING(t.string,d3.d+1,d4.d-d3.d-1),
col4 = SUBSTRING(t.string,d4.d+1,d5.d-d4.d-1),
col5 = SUBSTRING(t.string,d5.d+1,d6.d-d5.d-1),
col6 = SUBSTRING(t.string,d6.d+1,d7.d-d5.d-1),
col7 = SUBSTRING(t.string,d7.d+1,d8.d-d5.d-1),
col8 = SUBSTRING(t.string,d8.d+1,d9.d-d5.d-1),
col9 = SUBSTRING(t.string,d9.d+1,d10.d-d5.d-1)
FROM (VALUES (REPLACE(REPLACE(@string,']',':'),'[',''))) t(string)
CROSS APPLY (VALUES (CHARINDEX(':',t.string))) d1(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d1.d+1),0))) d2(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d2.d+1),0))) d3(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d3.d+1),0))) d4(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d4.d+1),0))) d5(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d5.d+1),0))) d6(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d6.d+1),0))) d7(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d7.d+1),0))) d8(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d8.d+1),0))) d9(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d9.d+1),0))) d10(d);
To use this technique against a table with the strings stored in rows would be like this:
DECLARE @table TABLE (someid int identity, somestring varchar(1000));
INSERT @table(somestring) VALUES
('[Key1:Value1:Value2:Value3:Value4:Value5]'),
('[Key2:Value1:Value2:Value3:Value4:Value5]'),
('[Key3:Value1:Value2:Value3:Value4:Value5]'),
('[Key4:Value1:Value2:Value3:Value4:Value5:Value6:Value7:Value8]'),
('[Key5:Value1:Value2:Value3:Value4:Value5:Value6:Value7:Value8:Value9:Value10]');
SELECT *
FROM @table s
CROSS APPLY
(
SELECT
[key] = SUBSTRING(t.string,1,d1.d-1),
dCount = LEN(t.string)-LEN(REPLACE(t.string,':','')),
col1 = SUBSTRING(t.string,d1.d+1,d2.d-d1.d-1),
col2 = SUBSTRING(t.string,d2.d+1,d3.d-d2.d-1),
col3 = SUBSTRING(t.string,d3.d+1,d4.d-d3.d-1),
col4 = SUBSTRING(t.string,d4.d+1,d5.d-d4.d-1),
col5 = SUBSTRING(t.string,d5.d+1,d6.d-d5.d-1),
col6 = SUBSTRING(t.string,d6.d+1,d7.d-d6.d-1),
col7 = SUBSTRING(t.string,d7.d+1,d8.d-d7.d-1),
col8 = SUBSTRING(t.string,d8.d+1,d9.d-d8.d-1),
col9 = SUBSTRING(t.string,d9.d+1,d10.d-d9.d-1)
FROM (VALUES (REPLACE(REPLACE(s.somestring,']',':'),'[',''))) t(string)
CROSS APPLY (VALUES (CHARINDEX(':',t.string))) d1(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d1.d+1),0))) d2(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d2.d+1),0))) d3(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d3.d+1),0))) d4(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d4.d+1),0))) d5(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d5.d+1),0))) d6(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d6.d+1),0))) d7(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d7.d+1),0))) d8(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d8.d+1),0))) d9(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d9.d+1),0))) d10(d)
) split
WHERE LEN(s.somestring)-LEN(REPLACE(s.somestring,':','')) < 10
If you don't know the maximum number of possible items you could take this logic and wrap it in some Dynamic SQL that creates the correct number of CROSS APPLY's. I don't have time to put together that logic but, to get the maximum number of possible delimiters you could do something like this:
DECLARE @maxDelimiters tinyint =
(SELECT MAX(LEN(s.somestring)-LEN(REPLACE(s.somestring,':',''))) FROM @table s);
Alternatively, if you wanted to use John's technique, you could also use Dynamic SQL to create his query with the exact number of "pos" values required.
1
You've got me curious now. I'll run some benchmarks in a bit and let you know. +1
– John Cappelletti
Apr 27 '18 at 15:46
Working with a sample of 5,000 records, 5 runs each. Mine had an average of 963ms (953 min 983 max). Yours had an average of 7,635ms (7,566 min 7,773 max)
– John Cappelletti
Apr 27 '18 at 16:14
1
Thanks John... Are you running it with show execution plan on? For me that slows it to a crawl. The other problem could be the how I'm removing the brackets in the first apply, sometimes derived values slow things down a bit. I'm going to play around with this a little and post some updated code.
– Alan Burstein
Apr 27 '18 at 16:37
@JohnCappelletti Please post the compare code and sample data somewhere and leave a link in the comments. I'm also curious now...
– Zohar Peled
Apr 28 '18 at 4:34
1
@ZoharPeled on dbfiddle.uk/… slightly different than my machine, but still a dramatic diff 1.5 sec vs 5.7 sec
– John Cappelletti
Apr 28 '18 at 11:27
|
show 1 more comment
The fastest way to split a string when you know the maximum number of columns is to use the Cascading CROSS APPLY technique. Let's say you know that their will be no more than 10 items in your string. You could do this:
DECLARE @string varchar(1000) = '[Key1:Value1:Value2:Value3:Value4:Value5]'
SELECT
[key] = SUBSTRING(t.string,1,d1.d-1),
col1 = SUBSTRING(t.string,d1.d+1,d2.d-d1.d-1),
col2 = SUBSTRING(t.string,d2.d+1,d3.d-d2.d-1),
col3 = SUBSTRING(t.string,d3.d+1,d4.d-d3.d-1),
col4 = SUBSTRING(t.string,d4.d+1,d5.d-d4.d-1),
col5 = SUBSTRING(t.string,d5.d+1,d6.d-d5.d-1),
col6 = SUBSTRING(t.string,d6.d+1,d7.d-d5.d-1),
col7 = SUBSTRING(t.string,d7.d+1,d8.d-d5.d-1),
col8 = SUBSTRING(t.string,d8.d+1,d9.d-d5.d-1),
col9 = SUBSTRING(t.string,d9.d+1,d10.d-d5.d-1)
FROM (VALUES (REPLACE(REPLACE(@string,']',':'),'[',''))) t(string)
CROSS APPLY (VALUES (CHARINDEX(':',t.string))) d1(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d1.d+1),0))) d2(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d2.d+1),0))) d3(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d3.d+1),0))) d4(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d4.d+1),0))) d5(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d5.d+1),0))) d6(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d6.d+1),0))) d7(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d7.d+1),0))) d8(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d8.d+1),0))) d9(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d9.d+1),0))) d10(d);
To use this technique against a table with the strings stored in rows would be like this:
DECLARE @table TABLE (someid int identity, somestring varchar(1000));
INSERT @table(somestring) VALUES
('[Key1:Value1:Value2:Value3:Value4:Value5]'),
('[Key2:Value1:Value2:Value3:Value4:Value5]'),
('[Key3:Value1:Value2:Value3:Value4:Value5]'),
('[Key4:Value1:Value2:Value3:Value4:Value5:Value6:Value7:Value8]'),
('[Key5:Value1:Value2:Value3:Value4:Value5:Value6:Value7:Value8:Value9:Value10]');
SELECT *
FROM @table s
CROSS APPLY
(
SELECT
[key] = SUBSTRING(t.string,1,d1.d-1),
dCount = LEN(t.string)-LEN(REPLACE(t.string,':','')),
col1 = SUBSTRING(t.string,d1.d+1,d2.d-d1.d-1),
col2 = SUBSTRING(t.string,d2.d+1,d3.d-d2.d-1),
col3 = SUBSTRING(t.string,d3.d+1,d4.d-d3.d-1),
col4 = SUBSTRING(t.string,d4.d+1,d5.d-d4.d-1),
col5 = SUBSTRING(t.string,d5.d+1,d6.d-d5.d-1),
col6 = SUBSTRING(t.string,d6.d+1,d7.d-d6.d-1),
col7 = SUBSTRING(t.string,d7.d+1,d8.d-d7.d-1),
col8 = SUBSTRING(t.string,d8.d+1,d9.d-d8.d-1),
col9 = SUBSTRING(t.string,d9.d+1,d10.d-d9.d-1)
FROM (VALUES (REPLACE(REPLACE(s.somestring,']',':'),'[',''))) t(string)
CROSS APPLY (VALUES (CHARINDEX(':',t.string))) d1(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d1.d+1),0))) d2(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d2.d+1),0))) d3(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d3.d+1),0))) d4(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d4.d+1),0))) d5(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d5.d+1),0))) d6(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d6.d+1),0))) d7(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d7.d+1),0))) d8(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d8.d+1),0))) d9(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d9.d+1),0))) d10(d)
) split
WHERE LEN(s.somestring)-LEN(REPLACE(s.somestring,':','')) < 10
If you don't know the maximum number of possible items you could take this logic and wrap it in some Dynamic SQL that creates the correct number of CROSS APPLY's. I don't have time to put together that logic but, to get the maximum number of possible delimiters you could do something like this:
DECLARE @maxDelimiters tinyint =
(SELECT MAX(LEN(s.somestring)-LEN(REPLACE(s.somestring,':',''))) FROM @table s);
Alternatively, if you wanted to use John's technique, you could also use Dynamic SQL to create his query with the exact number of "pos" values required.
The fastest way to split a string when you know the maximum number of columns is to use the Cascading CROSS APPLY technique. Let's say you know that their will be no more than 10 items in your string. You could do this:
DECLARE @string varchar(1000) = '[Key1:Value1:Value2:Value3:Value4:Value5]'
SELECT
[key] = SUBSTRING(t.string,1,d1.d-1),
col1 = SUBSTRING(t.string,d1.d+1,d2.d-d1.d-1),
col2 = SUBSTRING(t.string,d2.d+1,d3.d-d2.d-1),
col3 = SUBSTRING(t.string,d3.d+1,d4.d-d3.d-1),
col4 = SUBSTRING(t.string,d4.d+1,d5.d-d4.d-1),
col5 = SUBSTRING(t.string,d5.d+1,d6.d-d5.d-1),
col6 = SUBSTRING(t.string,d6.d+1,d7.d-d5.d-1),
col7 = SUBSTRING(t.string,d7.d+1,d8.d-d5.d-1),
col8 = SUBSTRING(t.string,d8.d+1,d9.d-d5.d-1),
col9 = SUBSTRING(t.string,d9.d+1,d10.d-d5.d-1)
FROM (VALUES (REPLACE(REPLACE(@string,']',':'),'[',''))) t(string)
CROSS APPLY (VALUES (CHARINDEX(':',t.string))) d1(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d1.d+1),0))) d2(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d2.d+1),0))) d3(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d3.d+1),0))) d4(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d4.d+1),0))) d5(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d5.d+1),0))) d6(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d6.d+1),0))) d7(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d7.d+1),0))) d8(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d8.d+1),0))) d9(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d9.d+1),0))) d10(d);
To use this technique against a table with the strings stored in rows would be like this:
DECLARE @table TABLE (someid int identity, somestring varchar(1000));
INSERT @table(somestring) VALUES
('[Key1:Value1:Value2:Value3:Value4:Value5]'),
('[Key2:Value1:Value2:Value3:Value4:Value5]'),
('[Key3:Value1:Value2:Value3:Value4:Value5]'),
('[Key4:Value1:Value2:Value3:Value4:Value5:Value6:Value7:Value8]'),
('[Key5:Value1:Value2:Value3:Value4:Value5:Value6:Value7:Value8:Value9:Value10]');
SELECT *
FROM @table s
CROSS APPLY
(
SELECT
[key] = SUBSTRING(t.string,1,d1.d-1),
dCount = LEN(t.string)-LEN(REPLACE(t.string,':','')),
col1 = SUBSTRING(t.string,d1.d+1,d2.d-d1.d-1),
col2 = SUBSTRING(t.string,d2.d+1,d3.d-d2.d-1),
col3 = SUBSTRING(t.string,d3.d+1,d4.d-d3.d-1),
col4 = SUBSTRING(t.string,d4.d+1,d5.d-d4.d-1),
col5 = SUBSTRING(t.string,d5.d+1,d6.d-d5.d-1),
col6 = SUBSTRING(t.string,d6.d+1,d7.d-d6.d-1),
col7 = SUBSTRING(t.string,d7.d+1,d8.d-d7.d-1),
col8 = SUBSTRING(t.string,d8.d+1,d9.d-d8.d-1),
col9 = SUBSTRING(t.string,d9.d+1,d10.d-d9.d-1)
FROM (VALUES (REPLACE(REPLACE(s.somestring,']',':'),'[',''))) t(string)
CROSS APPLY (VALUES (CHARINDEX(':',t.string))) d1(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d1.d+1),0))) d2(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d2.d+1),0))) d3(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d3.d+1),0))) d4(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d4.d+1),0))) d5(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d5.d+1),0))) d6(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d6.d+1),0))) d7(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d7.d+1),0))) d8(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d8.d+1),0))) d9(d)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d9.d+1),0))) d10(d)
) split
WHERE LEN(s.somestring)-LEN(REPLACE(s.somestring,':','')) < 10
If you don't know the maximum number of possible items you could take this logic and wrap it in some Dynamic SQL that creates the correct number of CROSS APPLY's. I don't have time to put together that logic but, to get the maximum number of possible delimiters you could do something like this:
DECLARE @maxDelimiters tinyint =
(SELECT MAX(LEN(s.somestring)-LEN(REPLACE(s.somestring,':',''))) FROM @table s);
Alternatively, if you wanted to use John's technique, you could also use Dynamic SQL to create his query with the exact number of "pos" values required.
answered Apr 27 '18 at 15:41
Alan BursteinAlan Burstein
3,8071713
3,8071713
1
You've got me curious now. I'll run some benchmarks in a bit and let you know. +1
– John Cappelletti
Apr 27 '18 at 15:46
Working with a sample of 5,000 records, 5 runs each. Mine had an average of 963ms (953 min 983 max). Yours had an average of 7,635ms (7,566 min 7,773 max)
– John Cappelletti
Apr 27 '18 at 16:14
1
Thanks John... Are you running it with show execution plan on? For me that slows it to a crawl. The other problem could be the how I'm removing the brackets in the first apply, sometimes derived values slow things down a bit. I'm going to play around with this a little and post some updated code.
– Alan Burstein
Apr 27 '18 at 16:37
@JohnCappelletti Please post the compare code and sample data somewhere and leave a link in the comments. I'm also curious now...
– Zohar Peled
Apr 28 '18 at 4:34
1
@ZoharPeled on dbfiddle.uk/… slightly different than my machine, but still a dramatic diff 1.5 sec vs 5.7 sec
– John Cappelletti
Apr 28 '18 at 11:27
|
show 1 more comment
1
You've got me curious now. I'll run some benchmarks in a bit and let you know. +1
– John Cappelletti
Apr 27 '18 at 15:46
Working with a sample of 5,000 records, 5 runs each. Mine had an average of 963ms (953 min 983 max). Yours had an average of 7,635ms (7,566 min 7,773 max)
– John Cappelletti
Apr 27 '18 at 16:14
1
Thanks John... Are you running it with show execution plan on? For me that slows it to a crawl. The other problem could be the how I'm removing the brackets in the first apply, sometimes derived values slow things down a bit. I'm going to play around with this a little and post some updated code.
– Alan Burstein
Apr 27 '18 at 16:37
@JohnCappelletti Please post the compare code and sample data somewhere and leave a link in the comments. I'm also curious now...
– Zohar Peled
Apr 28 '18 at 4:34
1
@ZoharPeled on dbfiddle.uk/… slightly different than my machine, but still a dramatic diff 1.5 sec vs 5.7 sec
– John Cappelletti
Apr 28 '18 at 11:27
1
1
You've got me curious now. I'll run some benchmarks in a bit and let you know. +1
– John Cappelletti
Apr 27 '18 at 15:46
You've got me curious now. I'll run some benchmarks in a bit and let you know. +1
– John Cappelletti
Apr 27 '18 at 15:46
Working with a sample of 5,000 records, 5 runs each. Mine had an average of 963ms (953 min 983 max). Yours had an average of 7,635ms (7,566 min 7,773 max)
– John Cappelletti
Apr 27 '18 at 16:14
Working with a sample of 5,000 records, 5 runs each. Mine had an average of 963ms (953 min 983 max). Yours had an average of 7,635ms (7,566 min 7,773 max)
– John Cappelletti
Apr 27 '18 at 16:14
1
1
Thanks John... Are you running it with show execution plan on? For me that slows it to a crawl. The other problem could be the how I'm removing the brackets in the first apply, sometimes derived values slow things down a bit. I'm going to play around with this a little and post some updated code.
– Alan Burstein
Apr 27 '18 at 16:37
Thanks John... Are you running it with show execution plan on? For me that slows it to a crawl. The other problem could be the how I'm removing the brackets in the first apply, sometimes derived values slow things down a bit. I'm going to play around with this a little and post some updated code.
– Alan Burstein
Apr 27 '18 at 16:37
@JohnCappelletti Please post the compare code and sample data somewhere and leave a link in the comments. I'm also curious now...
– Zohar Peled
Apr 28 '18 at 4:34
@JohnCappelletti Please post the compare code and sample data somewhere and leave a link in the comments. I'm also curious now...
– Zohar Peled
Apr 28 '18 at 4:34
1
1
@ZoharPeled on dbfiddle.uk/… slightly different than my machine, but still a dramatic diff 1.5 sec vs 5.7 sec
– John Cappelletti
Apr 28 '18 at 11:27
@ZoharPeled on dbfiddle.uk/… slightly different than my machine, but still a dramatic diff 1.5 sec vs 5.7 sec
– John Cappelletti
Apr 28 '18 at 11:27
|
show 1 more comment
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f50063415%2fwhats-the-most-efficient-way-to-normalize-text-from-column-into-a-table%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
What are you using for your database? SQLServer Oracle, other? It will determine what options you have.
– Brad
Apr 27 '18 at 13:28
SQL Server - I believe 2014.
– Shawn Hubbard
Apr 27 '18 at 14:04