pandas DataFrame.query expression that returns all rows by default

I recently discovered the pandas DataFrame.query method, and it does almost exactly what I need. (I had implemented my own parser because I hadn't realized query existed, but I really should be using the standard method.)

I would like my users to be able to specify the query in a configuration file. The syntax seems intuitive enough that I can expect my non-programmer (but engineer) users to figure it out.

There's just one thing missing: a way to select everything in the dataframe. Sometimes what my users want to use is every row, so they would put 'All' or something into that configuration option. In fact, that will be the default option.

I tried df.query('True') but that raised a KeyError. I tried df.query('1') but that returned the row with index 1. The empty string raised a ValueError.

The only things I can think of are 1) put an if clause every time I need to do this type of query (probably 3 or 4 times in the code) or 2) subclass DataFrame and either reimplement query, or add a query_with_all method:

import pandas as pd


class MyDataFrame(pd.DataFrame):

    def query_with_all(self, query_string):
        if query_string.lower() == 'all':
            return self
        else:
            return self.query(query_string)

And then use my own class every time instead of the pandas one. Is this the only way to do this?
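If subclassing is the route taken, one detail is worth keeping in mind: pandas methods such as query build their results through the _constructor property, so a subclass usually overrides it to avoid getting plain DataFrames back. A minimal sketch, assuming the 'all' keyword from the question (the whitespace stripping is an addition for illustration, not part of the original code):

```python
import pandas as pd


class MyDataFrame(pd.DataFrame):

    @property
    def _constructor(self):
        # Make pandas operations (including query) return MyDataFrame.
        return MyDataFrame

    def query_with_all(self, query_string):
        # 'all' (any case, surrounding spaces ignored) selects every row.
        if query_string.strip().lower() == 'all':
            return self
        return self.query(query_string)


df = MyDataFrame({'a': [1, 2, 3]})
filtered = df.query_with_all('a >= 2')      # still a MyDataFrame
chained = filtered.query_with_all('all')    # so the method is still available
```

Without the _constructor override, the result of the first call may come back as a plain DataFrame, and the chained call would then fail.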

edited Dec 20 '18 at 10:09

asked Oct 19 '17 at 3:31

moink

24529

If the user knows the column names up front, they could use df.query('a == a'), where a is one of the columns, but that doesn't seem clean. Ah, and it may not work for rows with nulls.

– Zero
Oct 19 '17 at 3:38
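To illustrate the null caveat in this comment: NaN never compares equal to itself, so a == a silently drops the rows where a is null. A small sketch with a made-up column a:

```python
import pandas as pd

# Hypothetical data: the middle row has a null in column 'a'.
df = pd.DataFrame({'a': [1.0, None, 3.0]})

# The comparison is False wherever 'a' is NaN, so that row is filtered out.
result = df.query('a == a')
print(len(df), len(result))
```

So this trick only selects everything when the chosen column is guaranteed to contain no nulls.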

Or, have a global all_true = [True]*len(df) and then refer to it with df.query('@all_true'), perhaps? Or, if it isn't a constraint, have a reserved column that is all True and refer to it with df.query('_all_true_col')?

– Zero
Oct 19 '17 at 3:42

Zero, the columns will change, but there is one column that is absolutely required to be there and not be null, so I will keep that in mind as an option. I don't think I would make my users put that in the config file; rather, I would replace 'all' with that for internal use. But it's still not as clean as I would like, as you mention.

– moink
Oct 19 '17 at 3:43

Zero, as to your second suggestion, I would need to use the same query on different dataframes of different lengths, without knowing the length ahead of time.

– moink
Oct 19 '17 at 3:48

@Thomas, I ended up implementing my own module with something quite similar to the code I showed (though I didn't end up using inheritance), along with several other functions that operate on queries.

– moink
Jun 22 '18 at 12:02

python pandas dataframe


2 Answers

+100

Keep things simple, and use a function:

def query_with_all(data_frame, query_string):
    if query_string == "all":
        return data_frame
    return data_frame.query(query_string)

Whenever you need to use this type of query, just call the function with the data frame and the query string. There's no need to use any extra if statements or to subclass pd.DataFrame.
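As a usage illustration, here is a slightly more forgiving variant of the function above. It is only a sketch: the ALL_SENTINEL name, the case-insensitive matching, and treating a missing config value as "all" are assumptions added for the example, not part of the original answer.

```python
import pandas as pd

# Hypothetical keyword that users put in the config file to select every row.
ALL_SENTINEL = "all"


def query_with_all(data_frame, query_string=ALL_SENTINEL):
    """Return every row for the sentinel (or None), else run DataFrame.query."""
    if query_string is None or query_string.strip().lower() == ALL_SENTINEL:
        return data_frame
    return data_frame.query(query_string)


df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, None, 2.0]})

everything = query_with_all(df, " All ")   # sentinel, any case or spacing
default = query_with_all(df)               # option left at its default
subset = query_with_all(df, "a > 1")       # ordinary query string
```

Because the sentinel is handled before query is ever called, rows containing NaN are returned untouched.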

If you're restricted to using df.query, you can use a global variable

ALL = slice(None)

df.query('@ALL', engine='python')

If you're not allowed to use global variables, and if your DataFrame isn't MultiIndexed, you can use

df.query('tuple()')

All of these will properly handle NaN values.

edited Dec 20 '18 at 19:01

coldspeed

126k23127214

answered Dec 20 '18 at 1:15

Joshua

1,623717

Well, this is an obvious choice (and also in the OP), but the idea would be to keep this inside query if at all possible?

– coldspeed
Dec 20 '18 at 5:54

@coldspeed Sorry for not reading your post / the comments thoroughly. I've added two solutions that stay (mostly) inside the query.

– Joshua
Dec 20 '18 at 6:51

Hmm, I've tried both, and both throw errors. Did you use any options with query? The first one gives "ValueError: unknown type object" and the second one "TypeError: unsupported expression type: <class 'tuple'>". Any idea?

– coldspeed
Dec 20 '18 at 6:53

What versions are you running? I have pd.__version__ = '0.23.4', np.__version__ = '1.15.4', sys.version='3.7.1 (default, Oct 23 2018, 14:07:42) \n[Clang 4.0.1 (tags/RELEASE_401/final)]'.

– Joshua
Dec 20 '18 at 16:45

Same versions, no difference. I think these will work if you add engine='python' as an argument. Your second option will not work on MultiIndexed dataframes.

– coldspeed
Dec 20 '18 at 18:59


df.query('ilevel_0 in ilevel_0') will always return the full dataframe, even when the index contains NaN values or the dataframe is completely empty.

In your particular case you could then define a global variable all_true = 'ilevel_0 in ilevel_0' (as suggested in the comments by Zero) so that your engineers could use the name of the global variable in their config file instead.

This expression is just a slightly hacky way of doing what you were attempting with query('True'). ilevel_0 is a more explicit way of making sure you are referring to the index. See the docs for more details on using in and ilevel_0: https://pandas.pydata.org/pandas-docs/stable/indexing.html#the-query-method
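One way this could be wired up is sketched below. The QUERY_ALIASES table and resolve_query helper are hypothetical names introduced for the example. Note also that the pandas documentation describes ilevel_0 as the name for an unnamed index level, so the sketch assumes the index has no name.

```python
import pandas as pd

# Hypothetical alias table: config keyword -> real query expression.
QUERY_ALIASES = {"all": "ilevel_0 in ilevel_0"}


def resolve_query(config_value):
    """Translate a config keyword into a query string; pass others through."""
    return QUERY_ALIASES.get(config_value.strip().lower(), config_value)


# Unnamed index, with a NaN in the data to show it does not affect the result.
df = pd.DataFrame({"a": [1.0, None, 3.0]}, index=["x", "y", "z"])

everything = df.query(resolve_query("All"))   # alias expands to the index test
subset = df.query(resolve_query("a > 1"))     # regular expressions are untouched
```

With this arrangement every call site still goes through df.query, and only the string is translated.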

answered Dec 20 '18 at 7:32

jorijnsmit

614522
