Resample before pct_change() and missing values
I have a dataframe:

import pandas as pd

df = pd.DataFrame([['A', 'G1', '2019-01-01', 11],
                   ['A', 'G1', '2019-01-02', 12],
                   ['A', 'G1', '2019-01-04', 14],
                   ['B', 'G2', '2019-01-01', 11],
                   ['B', 'G2', '2019-01-03', 13],
                   ['B', 'G2', '2019-01-06', 16]],
                  columns=['cust', 'group', 'date', 'val'])
df

df = df.groupby(['cust', 'group', 'date']).sum()
df
The dataframe is grouped, and now I would like to calculate pct_change, but only if there is a previous date. If I do it like this:

df['pct'] = df.groupby(['cust', 'group']).val.pct_change()
df

I get the pct_change, but it takes no account of the missing dates. For example, in group ('A', 'G1') the pct for date 2019-01-04 should be np.nan, because there is no previous date 2019-01-03.
Maybe the solution would be to resample by day, so that each new row gets np.nan as val, and then do pct_change. I tried df.resample('1D', level=2), but then I get an error:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'
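For reference, the error seems to come from the 'date' level holding plain strings rather than datetimes, so resample has nothing it can treat as a time axis. A minimal sketch (using the sample data above) that parses the dates and resamples each (cust, group) group instead:

```python
import pandas as pd

df = pd.DataFrame([['A', 'G1', '2019-01-01', 11],
                   ['A', 'G1', '2019-01-02', 12],
                   ['A', 'G1', '2019-01-04', 14],
                   ['B', 'G2', '2019-01-01', 11],
                   ['B', 'G2', '2019-01-03', 13],
                   ['B', 'G2', '2019-01-06', 16]],
                  columns=['cust', 'group', 'date', 'val'])

# resample() needs real datetimes; the sample data holds plain strings
df['date'] = pd.to_datetime(df['date'])

# resample per (cust, group): days missing inside a group come back as NaN rows
s = (df.set_index('date')
       .groupby(['cust', 'group'])['val']
       .resample('D')
       .mean())
```

Missing days inside each group come back as NaN rows, which is exactly the shape needed before pct_change.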
For group ('B', 'G2') all pct_change values should be np.nan, because none of the rows has a previous date.

Expected result is: (image omitted)

How can I calculate pct_change while respecting missing dates?
Solution:

# run on the flat frame (before the groupby([...]).sum() step),
# with the 'date' column parsed via pd.to_datetime
new_df = pd.DataFrame()
for x, y in df.groupby(['cust', 'group']):
    resampled = (y.set_index('date').resample('D')
                  .val.mean().to_frame()
                  .rename({'val': 'resamp_val'}, axis=1))
    resampled = resampled.join(y.set_index('date')).fillna({'cust': x[0], 'group': x[1]})
    resampled['resamp_val_pct'] = resampled.resamp_val.pct_change(fill_method=None)
    new_df = pd.concat([new_df, resampled])

new_df = new_df[['cust', 'group', 'val', 'resamp_val', 'resamp_val_pct']]
new_df
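A loop-free sketch of the same idea (my own variant, not from the original post): resample inside the groupby, then take pct_change with fill_method=None so that a change computed across a filled-in NaN day stays NaN:

```python
import pandas as pd

df = pd.DataFrame([['A', 'G1', '2019-01-01', 11],
                   ['A', 'G1', '2019-01-02', 12],
                   ['A', 'G1', '2019-01-04', 14],
                   ['B', 'G2', '2019-01-01', 11],
                   ['B', 'G2', '2019-01-03', 13],
                   ['B', 'G2', '2019-01-06', 16]],
                  columns=['cust', 'group', 'date', 'val'])
df['date'] = pd.to_datetime(df['date'])

# resample each (cust, group) to daily frequency; gaps become NaN rows
out = (df.set_index('date')
         .groupby(['cust', 'group'])['val']
         .resample('D')
         .mean()
         .to_frame('resamp_val'))

# fill_method=None stops pct_change from forward-filling the gaps,
# so any change that spans a missing day stays NaN
out['pct'] = (out.groupby(['cust', 'group'])['resamp_val']
                 .pct_change(fill_method=None))
```

For ('A', 'G1') only 2019-01-02 gets a value (1/11 ≈ 0.0909); every row of ('B', 'G2') stays NaN, matching the expected result.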
python pandas resampling
What is expected output? – jezrael, Jan 18 at 14:35
I just provided expected result. – user3225309, Jan 18 at 14:52
asked Jan 18 at 14:21 by user3225309 (edited yesterday)
2 Answers
Check with groupby: you need to resample first, then take the pct change with a Boolean mask, since pct_change on its own forward-fills over the NaN rows:
d = {}
for x, y in df.groupby(['cust', 'group']):
    # assumes 'date' is a datetime column on the flat (un-grouped) frame
    s = y.set_index('date').resample('D').val.mean()
    # a pct is only valid when both this day and the previous day have data
    d[x] = pd.concat([s, s.pct_change().mask(s.shift().isnull() | s.isnull())], axis=1)

newdf = pd.concat(d)
newdf.columns = ['val', 'pct']
newdf
Out[651]:
val pct
date
A G1 2019-01-01 11.0 NaN
2019-01-02 12.0 0.090909
2019-01-03 NaN NaN
2019-01-04 14.0 NaN
B G2 2019-01-01 11.0 NaN
2019-01-02 NaN NaN
2019-01-03 13.0 NaN
2019-01-04 NaN NaN
2019-01-05 NaN NaN
2019-01-06 16.0 NaN
You can add reset_index(inplace=True) at the end to turn the index levels back into columns.
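For instance, on a small stand-in for newdf (the rename_axis call and the level names are my addition, so the reset columns are not left as level_0/level_1):

```python
import pandas as pd

# stand-in for the per-group series produced by the loop above
s1 = pd.Series([11.0, 12.0], index=pd.to_datetime(['2019-01-01', '2019-01-02']))
s2 = pd.Series([11.0], index=pd.to_datetime(['2019-01-01']))
newdf = pd.concat({('A', 'G1'): s1, ('B', 'G2'): s2}).to_frame('val')

# name the three index levels before resetting, so they become proper columns
newdf = newdf.rename_axis(['cust', 'group', 'date']).reset_index()
```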
First I read the answer from AI_Learning, in which I asked about resampling by group, i.e. the solution you have provided. I modified your example a bit; I will edit my question to present the solution. – user3225309, yesterday
Maybe you could try checking whether the difference between consecutive rows is not equal to 1 day, and blank out the pct_change where it is not.
import numpy as np

# assumes 'date' is a datetime column on the flat frame
df = (df.groupby(['cust', 'group', 'date'])
        .agg({'val': 'sum', 'date': [min, max]})
        .reset_index())
# flatten the MultiIndex columns produced by the multi-function agg
df.columns = ['%s%s' % (a, '_%s' % b if b else '') for a, b in df.columns]
df['date_diff'] = df['date'].diff()
df['pct_change_val'] = df.val_sum.pct_change()
# keep the pct only when the previous row is exactly one day earlier
df['pct_change_final'] = df.apply(
    lambda row: np.NaN if pd.isnull(row.date_diff)
    else np.NaN if row.date_diff != np.timedelta64(1, 'D')
    else row.pct_change_val, axis=1)
#output:
  cust group       date   date_min   date_max  val_sum                    date_diff       pct_change_val     pct_change_final
0    A    G1 2019-01-01 2019-01-01 2019-01-01       11                          NaT                  NaN                  NaN
1    A    G1 2019-01-02 2019-01-02 2019-01-02       12    1 days 00:00:00.000000000  0.09090909090909083  0.09090909090909083
2    A    G1 2019-01-04 2019-01-04 2019-01-04       14    2 days 00:00:00.000000000  0.16666666666666674                  NaN
3    B    G2 2019-01-01 2019-01-01 2019-01-01       11  -3 days +00:00:00.000000000  -0.2142857142857143                  NaN
4    B    G2 2019-01-03 2019-01-03 2019-01-03       13    2 days 00:00:00.000000000  0.18181818181818188                  NaN
5    B    G2 2019-01-06 2019-01-06 2019-01-06       16    3 days 00:00:00.000000000  0.23076923076923084                  NaN
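The row-wise apply can also be expressed with a vectorized Series.where; a sketch on a minimal stand-in frame (column names follow the output above):

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2019-01-01', '2019-01-02', '2019-01-04',
                            '2019-01-01', '2019-01-03', '2019-01-06']),
    'val_sum': [11, 12, 14, 11, 13, 16],
})
df['date_diff'] = df['date'].diff()
df['pct_change_val'] = df['val_sum'].pct_change(fill_method=None)

# keep the pct only where the previous row is exactly one day earlier;
# everywhere else (including the NaT in the first row) .where yields NaN
df['pct_change_final'] = df['pct_change_val'].where(
    df['date_diff'] == pd.Timedelta(days=1))
```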
This works. Thanks. I got an idea for another approach: would it be possible to find min/max dates for each group and then resample by day? Afterwards, I could use pct_change. For example, if for group X the min date is 2019-01-01 and the max is 2019-01-05, I could resample that group and then do the same for the rest of the groups. That way I would have a dataframe in the proper format for pct_change (and some other operations). – user3225309, yesterday
I have updated the solution. Hope it helps. – AI_Learning, yesterday
answered Jan 18 at 15:10 by W-B
answered Jan 18 at 15:30 by AI_Learning (edited yesterday)