How does mapping on an RDD work in PySpark?

I was learning PySpark when I encountered this.

from pyspark.sql import Row

df = spark.createDataFrame([Row([0,45,63,0,0,0,0]),
                            Row([0,0,0,85,0,69,0]),
                            Row([0,89,56,0,0,0,0])],
                           ['features'])



+--------------------+
|            features|
+--------------------+
|[0, 45, 63, 0, 0,...|
|[0, 0, 0, 85, 0, ...|
|[0, 89, 56, 0, 0,...|
+--------------------+



sample = df.rdd.map(lambda row: row[0]*2)
sample.collect()



[[0, 45, 63, 0, 0, 0, 0, 0, 45, 63, 0, 0, 0, 0],
 [0, 0, 0, 85, 0, 69, 0, 0, 0, 0, 85, 0, 69, 0],
 [0, 89, 56, 0, 0, 0, 0, 0, 89, 56, 0, 0, 0, 0]]

My question is: why is row[0] treated as a complete list rather than a single value?
What property produces the output above?

asked 19 hours ago

Shilpa

New contributor

pyspark apache-spark-sql rdd


1 Answer

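It is treated as a complete list because that is how you supplied it: each Row contains a single value, which is a list, and you defined it under a single column, "features". So row[0] is the first (and only) field of the row, i.e. the whole list.

When you write

df.rdd.map(lambda row: row[0]*2)

you are applying Python's * operator to a list, which repeats the list rather than multiplying its elements. In effect you are telling Spark, "I want all values in this list to occur twice", hence the output you are getting.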
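The behaviour can be reproduced without Spark at all. The following is a minimal sketch in plain Python, assuming a one-element tuple as a stand-in for each Row (pyspark's Row is a tuple subclass); the variable names are illustrative only. It contrasts list repetition with element-wise doubling.

```python
# A plain one-element tuple stands in for a pyspark Row here
# (Row is a tuple subclass), so this runs without a Spark session.
rows = [
    ([0, 45, 63, 0, 0, 0, 0],),
    ([0, 0, 0, 85, 0, 69, 0],),
    ([0, 89, 56, 0, 0, 0, 0],),
]

# row[0] is the first (and only) field of the row: the whole list.
first_field = rows[0][0]

# On a list, * means repetition, not arithmetic.
# This mirrors what `lambda row: row[0]*2` does to each row.
repeated = [row[0] * 2 for row in rows]

# To double each number, multiply element by element instead.
doubled = [[x * 2 for x in row[0]] for row in rows]

print(repeated[0])
print(doubled[0])
```

With the original single-column DataFrame, the same idea would be written as df.rdd.map(lambda row: [x * 2 for x in row[0]]).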

Now, how do you get the individual values in the list? Pass the values to Row as separate arguments and give each one its own column name:

df = spark.createDataFrame([Row(0,45,63,0,0,0,0),
                            Row(0,0,0,85,0,69,0),
                            Row(0,89,56,0,0,0,0)],
                           ['feature1', 'feature2', 'feature3', 'feature4', 'feature5', 'feature6', 'feature7'])

This should give you access to each individual value in its own dedicated column.
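To illustrate what changes once every value has its own field, here is a small sketch in plain Python. It assumes a namedtuple called FeatureRow (a hypothetical stand-in, not part of pyspark) in place of a seven-column Row, with field names matching the column names above.

```python
from collections import namedtuple

# Hypothetical stand-in for a seven-column pyspark Row; the field names
# match the column names feature1 .. feature7 used above.
FeatureRow = namedtuple("FeatureRow", [f"feature{i}" for i in range(1, 8)])

rows = [
    FeatureRow(0, 45, 63, 0, 0, 0, 0),
    FeatureRow(0, 0, 0, 85, 0, 69, 0),
    FeatureRow(0, 89, 56, 0, 0, 0, 0),
]

# Each position now holds a single number, so * is ordinary multiplication.
second_doubled = [row[1] * 2 for row in rows]

# Fields can also be read by name instead of by position.
fourth_values = [row.feature4 for row in rows]

print(second_doubled)
print(fourth_values)
```

Rows obtained from df.rdd support the same positional and by-name access, so row[1]*2 inside a map would then double a single number instead of repeating a list.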

Note: the schema syntax here is only illustrative. Please refer to the Spark docs for the exact syntax.

Hope this helps :)

answered 16 hours ago

Harjeet Kumar

3115

