Is it possible to get the average time between a range of timestamps in Cassandra?












0















I have an events table that store an event name and time with the owner (device_id). I want to know the average time of which an event happens.



I know Cassandra support AVG() but I'm sure it wasn't designed for this use-case.



I played around with counters but the fact that they are not consistent and can only be used inside a table with no other fields make it hard to implement properly and would create a mess.



Simple events table:



CREATE TABLE tests.events(
device_id TEXT,
event_name TEXT,
event_time TIMESTAMP,
PRIMARY KEY(device_id, event_name)
)
WITH CLUSTERING ORDER BY (event_time DESC)
WITH default_time_to_live = 2592000; // 30 days


My current query:



SELECT * FROM events WHERE device_id = "abcd" AND event_time >= '2019-01-01 00:00:00+0200' AND event_time <= '2019-01-02 00:00:00+0200' LIMIT 100 ALLOW FILTERING;


I retrieve the last 100 events from a device. Is it possible, directly in Cassandra, to return the average time between events?



Let's say I have the following event_time:



2019-01-01 10:00:00
2019-01-01 11:00:00
2019-01-01 11:30:00


The average would be 45 (minutes).



I'm planning on eventually rely on some sort of machine learning code in which I can feed it events after a grace period, but I'm nowhere near that point yet so I'm looking for a temporary workaround.










share|improve this question



























    0















    I have an events table that store an event name and time with the owner (device_id). I want to know the average time of which an event happens.



    I know Cassandra support AVG() but I'm sure it wasn't designed for this use-case.



    I played around with counters but the fact that they are not consistent and can only be used inside a table with no other fields make it hard to implement properly and would create a mess.



    Simple events table:



    CREATE TABLE tests.events(
    device_id TEXT,
    event_name TEXT,
    event_time TIMESTAMP,
    PRIMARY KEY(device_id, event_name)
    )
    WITH CLUSTERING ORDER BY (event_time DESC)
    WITH default_time_to_live = 2592000; // 30 days


    My current query:



    SELECT * FROM events WHERE device_id = "abcd" AND event_time >= '2019-01-01 00:00:00+0200' AND event_time <= '2019-01-02 00:00:00+0200' LIMIT 100 ALLOW FILTERING;


    I retrieve the last 100 events from a device. Is it possible, directly in Cassandra, to return the average time between events?



    Let's say I have the following event_time:



    2019-01-01 10:00:00
    2019-01-01 11:00:00
    2019-01-01 11:30:00


    The average would be 45 (minutes).



    I'm planning on eventually rely on some sort of machine learning code in which I can feed it events after a grace period, but I'm nowhere near that point yet so I'm looking for a temporary workaround.










    share|improve this question

























      0












      0








      0








      I have an events table that store an event name and time with the owner (device_id). I want to know the average time of which an event happens.



      I know Cassandra support AVG() but I'm sure it wasn't designed for this use-case.



      I played around with counters but the fact that they are not consistent and can only be used inside a table with no other fields make it hard to implement properly and would create a mess.



      Simple events table:



      CREATE TABLE tests.events(
      device_id TEXT,
      event_name TEXT,
      event_time TIMESTAMP,
      PRIMARY KEY(device_id, event_name)
      )
      WITH CLUSTERING ORDER BY (event_time DESC)
      WITH default_time_to_live = 2592000; // 30 days


      My current query:



      SELECT * FROM events WHERE device_id = "abcd" AND event_time >= '2019-01-01 00:00:00+0200' AND event_time <= '2019-01-02 00:00:00+0200' LIMIT 100 ALLOW FILTERING;


      I retrieve the last 100 events from a device. Is it possible, directly in Cassandra, to return the average time between events?



      Let's say I have the following event_time:



      2019-01-01 10:00:00
      2019-01-01 11:00:00
      2019-01-01 11:30:00


      The average would be 45 (minutes).



      I'm planning on eventually rely on some sort of machine learning code in which I can feed it events after a grace period, but I'm nowhere near that point yet so I'm looking for a temporary workaround.










      share|improve this question














      I have an events table that store an event name and time with the owner (device_id). I want to know the average time of which an event happens.



      I know Cassandra support AVG() but I'm sure it wasn't designed for this use-case.



      I played around with counters but the fact that they are not consistent and can only be used inside a table with no other fields make it hard to implement properly and would create a mess.



      Simple events table:



      CREATE TABLE tests.events(
      device_id TEXT,
      event_name TEXT,
      event_time TIMESTAMP,
      PRIMARY KEY(device_id, event_name)
      )
      WITH CLUSTERING ORDER BY (event_time DESC)
      WITH default_time_to_live = 2592000; // 30 days


      My current query:



      SELECT * FROM events WHERE device_id = "abcd" AND event_time >= '2019-01-01 00:00:00+0200' AND event_time <= '2019-01-02 00:00:00+0200' LIMIT 100 ALLOW FILTERING;


      I retrieve the last 100 events from a device. Is it possible, directly in Cassandra, to return the average time between events?



      Let's say I have the following event_time:



      2019-01-01 10:00:00
      2019-01-01 11:00:00
      2019-01-01 11:30:00


      The average would be 45 (minutes).



      I'm planning on eventually rely on some sort of machine learning code in which I can feed it events after a grace period, but I'm nowhere near that point yet so I'm looking for a temporary workaround.







      cassandra






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Jan 19 at 23:48









      HypeWolfHypeWolf

      12412




      12412
























          1 Answer
          1






          active

          oldest

          votes


















          0














          You can try to use user-defined aggregate functions, like described in documentation. I think that you can almost re-use the function that is shown there, only change the code for state initialization. You'll need to change the enable_user_defined_functions to true in your cassandra.yaml file.



          To get good performance, you'll need to make sure that that aggregations are executed only inside single partition. You can look into this blog post for more examples & best practices.






          share|improve this answer























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54272351%2fis-it-possible-to-get-the-average-time-between-a-range-of-timestamps-in-cassandr%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            0














            You can try to use user-defined aggregate functions, like described in documentation. I think that you can almost re-use the function that is shown there, only change the code for state initialization. You'll need to change the enable_user_defined_functions to true in your cassandra.yaml file.



            To get good performance, you'll need to make sure that that aggregations are executed only inside single partition. You can look into this blog post for more examples & best practices.






            share|improve this answer




























              0














              You can try to use user-defined aggregate functions, like described in documentation. I think that you can almost re-use the function that is shown there, only change the code for state initialization. You'll need to change the enable_user_defined_functions to true in your cassandra.yaml file.



              To get good performance, you'll need to make sure that that aggregations are executed only inside single partition. You can look into this blog post for more examples & best practices.






              share|improve this answer


























                0












                0








                0







                You can try to use user-defined aggregate functions, like described in documentation. I think that you can almost re-use the function that is shown there, only change the code for state initialization. You'll need to change the enable_user_defined_functions to true in your cassandra.yaml file.



                To get good performance, you'll need to make sure that that aggregations are executed only inside single partition. You can look into this blog post for more examples & best practices.






                share|improve this answer













                You can try to use user-defined aggregate functions, like described in documentation. I think that you can almost re-use the function that is shown there, only change the code for state initialization. You'll need to change the enable_user_defined_functions to true in your cassandra.yaml file.



                To get good performance, you'll need to make sure that that aggregations are executed only inside single partition. You can look into this blog post for more examples & best practices.







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered 2 days ago









                Alex OttAlex Ott

                28.1k35273




                28.1k35273






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54272351%2fis-it-possible-to-get-the-average-time-between-a-range-of-timestamps-in-cassandr%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Liquibase includeAll doesn't find base path

                    How to use setInterval in EJS file?

                    Petrus Granier-Deferre