Is it possible to get the average time between a range of timestamps in Cassandra?
I have an events table that store an event name and time with the owner (device_id). I want to know the average time of which an event happens.
I know Cassandra support AVG() but I'm sure it wasn't designed for this use-case.
I played around with counters but the fact that they are not consistent and can only be used inside a table with no other fields make it hard to implement properly and would create a mess.
Simple events table:
CREATE TABLE tests.events(
device_id TEXT,
event_name TEXT,
event_time TIMESTAMP,
PRIMARY KEY(device_id, event_name)
)
WITH CLUSTERING ORDER BY (event_time DESC)
WITH default_time_to_live = 2592000; // 30 days
My current query:
SELECT * FROM events WHERE device_id = "abcd" AND event_time >= '2019-01-01 00:00:00+0200' AND event_time <= '2019-01-02 00:00:00+0200' LIMIT 100 ALLOW FILTERING;
I retrieve the last 100 events from a device. Is it possible, directly in Cassandra, to return the average time between events?
Let's say I have the following event_time:
2019-01-01 10:00:00
2019-01-01 11:00:00
2019-01-01 11:30:00
The average would be 45 (minutes).
I'm planning on eventually rely on some sort of machine learning code in which I can feed it events after a grace period, but I'm nowhere near that point yet so I'm looking for a temporary workaround.
cassandra
add a comment |
I have an events table that store an event name and time with the owner (device_id). I want to know the average time of which an event happens.
I know Cassandra support AVG() but I'm sure it wasn't designed for this use-case.
I played around with counters but the fact that they are not consistent and can only be used inside a table with no other fields make it hard to implement properly and would create a mess.
Simple events table:
CREATE TABLE tests.events(
device_id TEXT,
event_name TEXT,
event_time TIMESTAMP,
PRIMARY KEY(device_id, event_name)
)
WITH CLUSTERING ORDER BY (event_time DESC)
WITH default_time_to_live = 2592000; // 30 days
My current query:
SELECT * FROM events WHERE device_id = "abcd" AND event_time >= '2019-01-01 00:00:00+0200' AND event_time <= '2019-01-02 00:00:00+0200' LIMIT 100 ALLOW FILTERING;
I retrieve the last 100 events from a device. Is it possible, directly in Cassandra, to return the average time between events?
Let's say I have the following event_time:
2019-01-01 10:00:00
2019-01-01 11:00:00
2019-01-01 11:30:00
The average would be 45 (minutes).
I'm planning on eventually rely on some sort of machine learning code in which I can feed it events after a grace period, but I'm nowhere near that point yet so I'm looking for a temporary workaround.
cassandra
add a comment |
I have an events table that store an event name and time with the owner (device_id). I want to know the average time of which an event happens.
I know Cassandra support AVG() but I'm sure it wasn't designed for this use-case.
I played around with counters but the fact that they are not consistent and can only be used inside a table with no other fields make it hard to implement properly and would create a mess.
Simple events table:
CREATE TABLE tests.events(
device_id TEXT,
event_name TEXT,
event_time TIMESTAMP,
PRIMARY KEY(device_id, event_name)
)
WITH CLUSTERING ORDER BY (event_time DESC)
WITH default_time_to_live = 2592000; // 30 days
My current query:
SELECT * FROM events WHERE device_id = "abcd" AND event_time >= '2019-01-01 00:00:00+0200' AND event_time <= '2019-01-02 00:00:00+0200' LIMIT 100 ALLOW FILTERING;
I retrieve the last 100 events from a device. Is it possible, directly in Cassandra, to return the average time between events?
Let's say I have the following event_time:
2019-01-01 10:00:00
2019-01-01 11:00:00
2019-01-01 11:30:00
The average would be 45 (minutes).
I'm planning on eventually rely on some sort of machine learning code in which I can feed it events after a grace period, but I'm nowhere near that point yet so I'm looking for a temporary workaround.
cassandra
I have an events table that store an event name and time with the owner (device_id). I want to know the average time of which an event happens.
I know Cassandra support AVG() but I'm sure it wasn't designed for this use-case.
I played around with counters but the fact that they are not consistent and can only be used inside a table with no other fields make it hard to implement properly and would create a mess.
Simple events table:
CREATE TABLE tests.events(
device_id TEXT,
event_name TEXT,
event_time TIMESTAMP,
PRIMARY KEY(device_id, event_name)
)
WITH CLUSTERING ORDER BY (event_time DESC)
WITH default_time_to_live = 2592000; // 30 days
My current query:
SELECT * FROM events WHERE device_id = "abcd" AND event_time >= '2019-01-01 00:00:00+0200' AND event_time <= '2019-01-02 00:00:00+0200' LIMIT 100 ALLOW FILTERING;
I retrieve the last 100 events from a device. Is it possible, directly in Cassandra, to return the average time between events?
Let's say I have the following event_time:
2019-01-01 10:00:00
2019-01-01 11:00:00
2019-01-01 11:30:00
The average would be 45 (minutes).
I'm planning on eventually rely on some sort of machine learning code in which I can feed it events after a grace period, but I'm nowhere near that point yet so I'm looking for a temporary workaround.
cassandra
cassandra
asked Jan 19 at 23:48
HypeWolfHypeWolf
12412
12412
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
You can try to use user-defined aggregate functions, like described in documentation. I think that you can almost re-use the function that is shown there, only change the code for state initialization. You'll need to change the enable_user_defined_functions
to true in your cassandra.yaml
file.
To get good performance, you'll need to make sure that that aggregations are executed only inside single partition. You can look into this blog post for more examples & best practices.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54272351%2fis-it-possible-to-get-the-average-time-between-a-range-of-timestamps-in-cassandr%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
You can try to use user-defined aggregate functions, like described in documentation. I think that you can almost re-use the function that is shown there, only change the code for state initialization. You'll need to change the enable_user_defined_functions
to true in your cassandra.yaml
file.
To get good performance, you'll need to make sure that that aggregations are executed only inside single partition. You can look into this blog post for more examples & best practices.
add a comment |
You can try to use user-defined aggregate functions, like described in documentation. I think that you can almost re-use the function that is shown there, only change the code for state initialization. You'll need to change the enable_user_defined_functions
to true in your cassandra.yaml
file.
To get good performance, you'll need to make sure that that aggregations are executed only inside single partition. You can look into this blog post for more examples & best practices.
add a comment |
You can try to use user-defined aggregate functions, like described in documentation. I think that you can almost re-use the function that is shown there, only change the code for state initialization. You'll need to change the enable_user_defined_functions
to true in your cassandra.yaml
file.
To get good performance, you'll need to make sure that that aggregations are executed only inside single partition. You can look into this blog post for more examples & best practices.
You can try to use user-defined aggregate functions, like described in documentation. I think that you can almost re-use the function that is shown there, only change the code for state initialization. You'll need to change the enable_user_defined_functions
to true in your cassandra.yaml
file.
To get good performance, you'll need to make sure that that aggregations are executed only inside single partition. You can look into this blog post for more examples & best practices.
answered 2 days ago
Alex OttAlex Ott
28.1k35273
28.1k35273
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54272351%2fis-it-possible-to-get-the-average-time-between-a-range-of-timestamps-in-cassandr%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown