Using Postgres To summarize detail Records.

Using your database to condition data and move data from detail tables into summary tables can be more efficient and simpler than writing complex application code to do the same.  This example shows how to take some data from a detail table and summarize by minute and by hour.  If you were doing this in Java or C#, you would want to make sure that the process was atomic and done in a roleback section incase something goes awry.

STEPS:

  1. First determine what you want to summarize:
    We will be summarizing detailed records of memory usage by process for a compex system with many different applications where each application will have many different jvms on many different machines.
    select to_timestamp(to_char(eventtime,'yyyy-MM-dd HH24:MI') || ':00', 'yyyy-MM-dd HH24:MI:SS') as ttime,
    machine_id,
    application_id,
    to_number(p.pid,'999999999') as pid,
    avg(responsetime) as avg_response ,
    max(responsetime) as max_response ,
    avg(concurrent_reqs) as avg_concurrent ,
    max(concurrent_reqs) as max_concurrent ,
    count(*) as cnt
    from events e, processes p
    where
    e.process_id=p.process_id and
    eventtime >#startTime# and
    eventtime<#endTime#
    group by ttime, machine_id, application_id, p.pid order by ttime
    ;
  2. Then build the Target Table.  As a simple hint, make sure that the values in the group by clause, and the select clause exactly match the columns in the table where you want to stuff the data.
  3. Here is the final example for the sql above:
       INSERT INTO sumry_evnts_mh_ap_pid_min
(eventtime, machine_id, application_id,
pid, avg_response, max_response, avg_concurrent, max_concurrent, cnt)
(select to_timestamp(to_char(eventtime,'yyyy-MM-dd HH24:MI') || ':00', 'yyyy-MM-dd HH24:MI:SS') as ttime,
machine_id,
application_id,
to_number(p.pid,'999999999') as pid,
avg(responsetime) as avg_response ,
max(responsetime) as max_response ,
avg(concurrent_reqs) as avg_concurrent ,
max(concurrent_reqs) as max_concurrent ,
count(*) as cnt
from events e, processes p
where
e.process_id=p.process_id and
eventtime >#startTime# and
eventtime<#endTime#
group by ttime, machine_id, application_id, p.pid order by ttime)

Using the same strategy for By Hour Summaries:

          INSERT INTO sumry_evnts_mh_ap_pid_hr
(eventtime, machine_id, application_id,
pid, avg_response, max_response,
avg_concurrent, max_concurrent, cnt)
select to_timestamp( to_char(eventtime,'yyyy-MM-dd HH24') || ':00:00', 'yyyy-MM-dd HH24:MI:SS') as ttime ,
machine_id ,
application_id,
to_number(p.pid,'999999999'),
avg(responsetime) as avg_response ,
max(responsetime) as max_response ,
avg(concurrent_reqs) as avg_concurrent,
max(concurrent_reqs) as max_concurrent,
count(*) as cnt
from events e, processes p
where
e.process_id=p.process_id and
eventtime &gt; #startTime# and
eventtime&lt;#endTime#
group by ttime , machine_id, application_id, p.pid order by ttime