How does mapping on an RDD work in PySpark?
I was learning PySpark when I encountered this.
from pyspark.sql import Row
df = spark.createDataFrame([Row([0,45,63,0,0,0,0]),
Row([0,0,0,85,0,69,0]),
Row([0,89,56,0,0,0,0])],
['features'])
+--------------------+
| features|
+--------------------+
|[0, 45, 63, 0, 0,...|
|[0, 0, 0, 85, 0, ...|
|[0, 89, 56, 0, 0,...|
+--------------------+
sample = df.rdd.map(lambda row: row[0]*2)
sample.collect()
[[0, 45, 63, 0, 0, 0, 0, 0, 45, 63, 0, 0, 0, 0],
[0, 0, 0, 85, 0, 69, 0, 0, 0, 0, 85, 0, 69, 0],
[0, 89, 56, 0, 0, 0, 0, 0, 89, 56, 0, 0, 0, 0]]
My question is: why is row[0] treated as a complete list rather than a single value?
Which property produces the output above?
pyspark apache-spark-sql rdd
asked 19 hours ago
Shilpa
1 Answer
row[0] is the complete list because you passed each Row a single list, and that list is stored under the one column "features". Here row[0] selects the first column of the row, not the first element of the list.
When you write
df.rdd.map(lambda row: row[0]*2)
you are asking for every value in the list to occur twice: multiplying a Python list by an integer repeats the list rather than scaling its elements. Hence the output you are getting.
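The repetition in the output is ordinary Python list behaviour rather than anything specific to Spark. A minimal sketch in plain Python, assuming a tuple as a stand-in for a one-column Row (the names `row`, `features`, `repeated`, and `doubled` are illustrative and not from the question):

```python
# A plain tuple stands in for a Row with a single column, "features".
row = ([0, 45, 63, 0, 0, 0, 0],)

# row[0] is the first *column* of the row, which holds the whole list.
features = row[0]

# Multiplying a list by an integer repeats the list; it does not scale elements.
repeated = features * 2

# To double each element instead, iterate over the list explicitly.
doubled = [value * 2 for value in features]

print(repeated)
print(doubled)
```

If element-wise doubling is what was intended, the same comprehension could presumably be used inside the map, for example `lambda row: [v * 2 for v in row[0]]`.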
To get individual values instead, give each value its own column:
df = spark.createDataFrame([Row(0,45,63,0,0,0,0),
Row(0,0,0,85,0,69,0),
Row(0,89,56,0,0,0,0)],
['feature1' , 'feature2' , 'feature3' , 'feature4', 'feature5' , 'feature6' , 'feature7'])
This should give you access to each value in its own dedicated column.
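With one value per column, indexing a row yields a single number, so multiplying by 2 doubles it. A rough sketch of that idea in plain Python, assuming a `namedtuple` as a hypothetical stand-in for a seven-column Row (`FeatureRow` and `double_second_feature` are made-up names for illustration):

```python
from collections import namedtuple

# Hypothetical stand-in for a Row with seven named columns.
FeatureRow = namedtuple(
    "FeatureRow",
    ["feature1", "feature2", "feature3", "feature4",
     "feature5", "feature6", "feature7"],
)

rows = [
    FeatureRow(0, 45, 63, 0, 0, 0, 0),
    FeatureRow(0, 0, 0, 85, 0, 69, 0),
    FeatureRow(0, 89, 56, 0, 0, 0, 0),
]

# The kind of function one might pass to a map over rows:
# row[1] is now a single integer, so * 2 doubles it.
def double_second_feature(row):
    return row[1] * 2

by_index = [double_second_feature(r) for r in rows]

# The same column can also be reached by name.
by_name = [r.feature2 * 2 for r in rows]

print(by_index)
print(by_name)
```

Both lists hold one doubled integer per row, in contrast to the repeated lists in the question.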
Note: the schema syntax here is only illustrative; please refer to the Spark docs for the exact syntax.
Hope this helps :)
answered 16 hours ago
Harjeet Kumar