timestamps, formatting, and internals

Started by David Salisburyalmost 14 years ago15 messagesgeneral
Jump to latest
#1David Salisbury
salisbury@globe.gov

I'm trying to debug an intermittent problem I'm seeing in one of our rollup scripts.

I'll try to summarize. A table has a measured_at field, of which I calculate another
time value based on that field and a longitude value, called solar_noon, and I summarize
min/max values grouped around this solarnoon. While I'm doing this I also calculate a
minimum time difference between the calcualted solar noon value and all the measured_at times
within the group. I then join this summary table back with the original table it's
summarizing, trying to pick out the one record in it that has the measured_at value that's closest
to the solarnoon value of the grouping.

Clear as mud? Anyways, there seems to be a problem on that last part. I'm thinking
the join on these date values is a bit funky. Perhaps things aren't matching up on micro
second values, but it's hard to know with queries if I'm seeing what the DB is seeing, as
date values are stored in seconds and what queries give you is a format of that.

So one question I have is if there a way to set PG in the way Oracle does it..
set nls_date_format = 'YYYY...' so I can query and see exactly what PG is seeing,
even to the microseconds? Is there a config parameter I can set in PG so that calculations
are done only to the second? It seems this join doesn't always find a record that's closest
to solar noon, and therefore drops the summary and join record all together.

PG 9.0, Linux

Here's the immediate code I'm thinking is in question.

SELECT DISTINCT on ( site_id, solarnoon )
yada.
WHERE
sds.site_id = sd.site_id
and
calculate_local_solar_noon( sd.measured_at, sds.longitude ) = sds.solarnoon
-- match with the record that's closest to solarnoon.
-- At this point we know the time difference,
-- but not whether it's more or less. The higher level
-- DISTINCT clause removes any duplicates caused should
-- solarnoon fall exactly between two measured_at times.
and
enough_measurements > 0
and
(
(
sd.measured_at = (
calculate_local_solar_noon(sd.measured_at,sds.longitude) + (
sds.minimum_time_between_measured_and_solarnoon::text ||
' secs'
)::interval
)
)
or
(
sd.measured_at = (
calculate_local_solar_noon(sd.measured_at,sds.longitude) - (
sds.minimum_time_between_measured_and_solarnoon::text ||
' secs'
)::interval
)
)
)
) end_distinct

#2David Salisbury
salisbury@globe.gov
In reply to: David Salisbury (#1)
Re: timestamps, formatting, and internals

Actually, figured I'd post the whole function, painful as it
might be for anyone to read. If anyone sees something that's a bit
of a risk ( like perhaps the whole thing ;)

On 5/18/12 5:19 PM, David Salisbury wrote:

I'm trying to debug an intermittent problem I'm seeing in one of our
rollup scripts.

I'll try to summarize. A table has a measured_at field, of which I
calculate another
time value based on that field and a longitude value, called solar_noon,
and I summarize
min/max values grouped around this solarnoon. While I'm doing this I
also calculate a
minimum time difference between the calcualted solar noon value and all
the measured_at times
within the group. I then join this summary table back with the original
table it's
summarizing, trying to pick out the one record in it that has the
measured_at value that's closest
to the solarnoon value of the grouping.

Clear as mud? Anyways, there seems to be a problem on that last part.
I'm thinking
the join on these date values is a bit funky. Perhaps things aren't
matching up on micro
second values, but it's hard to know with queries if I'm seeing what the
DB is seeing, as
date values are stored in seconds and what queries give you is a format
of that.

So one question I have is if there a way to set PG in the way Oracle
does it..
set nls_date_format = 'YYYY...' so I can query and see exactly what PG
is seeing,
even to the microseconds? Is there a config parameter I can set in PG so
that calculations
are done only to the second? It seems this join doesn't always find a
record that's closest
to solar noon, and therefore drops the summary and join record all
together.

PG 9.0, Linux

execute "
CREATE OR REPLACE FUNCTION rollup_and_insert_subdays()
returns void as $$
declare v_created_at timestamp without time zone;
BEGIN
v_created_at := now();
INSERT INTO air_temp_dailies (
uuid,
site_id,
organizationid,
protocol_id,
measured_at,
current_temp_c,
maximum_temp_c,
minimum_temp_c,
created_at,
comments
)
SELECT
subd_summary.uuid,
subd_summary.site_id,
subd_summary.organizationid,
subd_summary.protocol_id,
subd_summary.measured_at,
subd_summary.current_temp_c,
subd_summary.max_temp_c,
subd_summary.min_temp_c,
v_created_at as updated_at,
'automated station rollup' as comments

FROM (
SELECT DISTINCT on ( site_id, solarnoon )
site_id,
organizationid,
protocol_id,
uuid,
solarnoon,
measured_at,
current_temp_c,
max_temp_c,
min_temp_c
FROM (
SELECT
sd.site_id,
sd.organizationid,
sd.protocol_id,
sd.uuid,
sds.solarnoon,
sd.measured_at,
sd.current_temp_c,

sds.max_temp_c as max_temp_c,
sds.min_temp_c as min_temp_c
FROM (
SELECT
site_id,
longitude,
calculate_local_solar_noon(measured_at,longitude) as solarnoon,
max(current_temp_c) as max_temp_c,
min(current_temp_c) as min_temp_c,
min(
abs(
cast(
extract(
epoch FROM (
measured_at -
calculate_local_solar_noon(measured_at,longitude)
)
) as integer
)
)
) as
minimum_time_between_measured_and_solarnoon,
trunc(count(*)/93) as enough_measurements
FROM (
SELECT
site_id,
current_temp_c,
measured_at,
ST_X(point) as longitude
FROM
air_temp_sub_days INNER JOIN sites on (
air_temp_sub_days.site_id = sites.id
)
) as appending_longitude_to_sub_day_values
WHERE
measured_at is not null
GROUP BY
site_id,
calculate_local_solar_noon(measured_at,longitude),
-- we assume that for any site, for any given solarnoon, the
-- longitude is constant, so adding this value in the
-- group by has no effect other than allowing these values
-- to percolate to the outer select
longitude
) sds,
air_temp_sub_days sd
WHERE
sds.site_id = sd.site_id
and
calculate_local_solar_noon(
sd.measured_at,
sds.longitude
) = sds.solarnoon
-- match with the record that's closest to solarnoon.
-- At this point we know the time difference,
-- but not whether it's more or less. The higher level
-- DISTINCT clause removes any duplicates caused should
-- solarnoon fall exactly between two measured_at times.
and
enough_measurements > 0
and
(
(
sd.measured_at = (
calculate_local_solar_noon(sd.measured_at,sds.longitude) + (
sds.minimum_time_between_measured_and_solarnoon::text ||
' secs'
)::interval
)
)
or
(
sd.measured_at = (
calculate_local_solar_noon(sd.measured_at,sds.longitude) - (
sds.minimum_time_between_measured_and_solarnoon::text ||
' secs'
)::interval
)
)
)
) end_distinct
) subd_summary
LEFT OUTER JOIN air_temp_dailies on
subd_summary.site_id = air_temp_dailies.site_id
and
subd_summary.measured_at = air_temp_dailies.measured_at
WHERE
air_temp_dailies.site_id is null
and
air_temp_dailies.measured_at is null
order by subd_summary.site_id, subd_summary.measured_at
;

END $$ LANGUAGE plpgsql;

#3David Salisbury
salisbury@globe.gov
In reply to: David Salisbury (#2)
Re: timestamps, formatting, and internals

Oh.. and while I'm polluting this list (sorry) it's a timestamp field
without a time zone.

thanks for any ideas,

-Dave

#4Adrian Klaver
adrian.klaver@aklaver.com
In reply to: David Salisbury (#1)
Re: timestamps, formatting, and internals

On 05/18/2012 04:19 PM, David Salisbury wrote:

I'm trying to debug an intermittent problem I'm seeing in one of our
rollup scripts.

I'll try to summarize. A table has a measured_at field, of which I
calculate another
time value based on that field and a longitude value, called solar_noon,
and I summarize
min/max values grouped around this solarnoon. While I'm doing this I
also calculate a
minimum time difference between the calcualted solar noon value and all
the measured_at times
within the group. I then join this summary table back with the original
table it's
summarizing, trying to pick out the one record in it that has the
measured_at value that's closest
to the solarnoon value of the grouping.

Clear as mud? Anyways, there seems to be a problem on that last part.
I'm thinking
the join on these date values is a bit funky. Perhaps things aren't
matching up on micro
second values, but it's hard to know with queries if I'm seeing what the
DB is seeing, as
date values are stored in seconds and what queries give you is a format
of that.

I am not sure I follow. Timestamps(which is what I think you are
referring to) can be stored with up to microsecond precision and the
values will be returned at the precision specified.

See here;
http://www.postgresql.org/docs/9.0/interactive/datatype-datetime.html

So one question I have is if there a way to set PG in the way Oracle
does it..
set nls_date_format = 'YYYY...' so I can query and see exactly what PG
is seeing,
even to the microseconds?

Maybe you are looking for data formatting?:
http://www.postgresql.org/docs/9.0/interactive/functions-formatting.html

Is there a config parameter I can set in PG so

that calculations
are done only to the second? It seems this join doesn't always find a
record that's closest
to solar noon, and therefore drops the summary and join record all
together.

PG 9.0, Linux

--
Adrian Klaver
adrian.klaver@gmail.com

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: David Salisbury (#1)
Re: timestamps, formatting, and internals

On 05/19/2012 10:34 AM, David Salisbury wrote:

CCing the list.

On 5/19/12 8:12 AM, Adrian Klaver wrote:

I hope no one looks further into the problem as the case is closed. It
was a coding
problem and not a time matchup problem. Late Friday afternoons just
aren't my most
shining moments. ;)

But I do still wonder... Is there a setting that I can set a default
formatting of the date. If I do a "select timestamp '01-jan-2012'" I
would want
it to always return everything, down to the last microsecond. A "Give me
everything!"
setting without doing some to_char ugliness. It would just return..

01-jan-2012 00:00:00.000

<NOTE: A timestamp contains a date, but a date is not a
timestamp(actually it is one fixed at midnight). To avoid confusion you
might want to be specific what type of time you are working with.>

It will return what is provided:) In other words it depends on how the
timestamp field is set up. That is determined by the precision parameter
passed to the timezone type when creating or altering the field. As far
as ad-hoc timestamp as shown above the best you can do is change the
DateStyle but that only goes to two decimal places.:

test=# SET datestyle ='SQL';
SET
test=# SELECT timestamp '01-jan-2012';
timestamp
---------------------
01/01/2012 00:00:00
(1 row)

In either case it will return all significant digits:

test=# SELECT '01-jan-2012 00:00:00.000012'::timestamp;
timestamp
----------------------------
2012-01-01 00:00:00.000012

And conversely, is there a setting so that any time value will always be
rounded to the second,
shaving off the micro-seconds. So that select now(); would return, and
store _internally_! ..

2012-05-19 11:25:12.000

Change the precision of the timestamp field to 0. See below:

test=# \d timestamp_test
Table "public.timestamp_test"
Column | Type | Modifiers
---------+-----------------------------+-----------
id | integer | not null
txt_fld | text |
ts_fld | timestamp with time zone |
ts_fld2 | timestamp(0) with time zone |
Indexes:
"timestamp_test_pkey" PRIMARY KEY, btree (id)

test=# SELECT * from timestamp_test ;
id | txt_fld | ts_fld | ts_fld2
----+---------+-------------------------------+------------------------
7 | test4 | 2009-12-24 13:37:32.499764-08 | 2009-12-24 13:37:32-08
8 | t | 2010-05-20 08:13:28.157027-07 | 2010-05-20 12:13:28-07
9 | t | 2010-05-20 08:13:43.265383-07 | 2010-05-20 10:13:43-07
10 | t | 2010-05-20 08:13:53.718519-07 | 2010-05-20 13:13:54-07
11 | s | 2011-03-25 09:00:00.124-07 | 2011-03-25 14:15:13-07
12 | s | 2011-03-25 09:12:00.124-07 | 2011-03-25 14:16:27-07

Thanks for the reply and any help!

-ds

--
Adrian Klaver
adrian.klaver@gmail.com

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Salisbury (#2)
Re: timestamps, formatting, and internals

David Salisbury <salisbury@globe.gov> writes:

Actually, figured I'd post the whole function, painful as it
might be for anyone to read. If anyone sees something that's a bit
of a risk ( like perhaps the whole thing ;)

Well, I don't know exactly what's causing your issue, but I see a few
things that seem rather dubious:

min(
abs(
cast(
extract(
epoch FROM (
measured_at -
calculate_local_solar_noon(measured_at,longitude)
)
) as integer
)
)
) as
minimum_time_between_measured_and_solarnoon,

Is there a really good reason to force the interval value to integer
here? I forget offhand whether you get truncation or rounding when you
do that, but in either case it's entirely likely that the computed min()
will be less than the actual difference for *any* specific real site, if
the smallest such difference has a fractional part. I'd lose the CAST
step and see what happens.

(
(
sd.measured_at = (
calculate_local_solar_noon(sd.measured_at,sds.longitude) + (
sds.minimum_time_between_measured_and_solarnoon::text ||
' secs'
)::interval
)
)
or
(
sd.measured_at = (
calculate_local_solar_noon(sd.measured_at,sds.longitude) - (
sds.minimum_time_between_measured_and_solarnoon::text ||
' secs'
)::interval
)
)
)

Because of the CAST above, these tests are guaranteed to fail if the
measured_at value has a fractional-second part, and I'm not sure why you
are assuming that that should be zero. Also, the above is an expensive,
grotty, imprecise way to convert a number back to an interval. Consider
using

sds.minimum_time_between_measured_and_solarnoon * interval '1 second'

or even better, what about

abs (extract (epoch from (
sd.measured_at -
calculate_local_solar_noon(sd.measured_at,sds.longitude)))) <=
sds.minimum_time_between_measured_and_solarnoon

which seems to me to be a lot more naturally related to what you're
doing to compute minimum_time_between_measured_and_solarnoon in the
first place.

regards, tom lane

#7Jasen Betts
jasen@xnet.co.nz
In reply to: David Salisbury (#1)
Re: timestamps, formatting, and internals

On 2012-05-18, David Salisbury <salisbury@globe.gov> wrote:

So one question I have is if there a way to set PG in the way Oracle does it..

probably not.

set nls_date_format = 'YYYY...' so I can query and see exactly what PG is seeing,
even to the microseconds?

set datestyle to 'ISO';

Is there a config parameter I can set in PG so that calculations
are done only to the second?

no, but you can truncate explicitly
date_trunc('second',your_timestamp_expression)

The query: "show integer_datetimes;" should return 'on' which means
timestamps are microsecond precision if it returns 'off' your database
was built with floating point timstamps and equality tests will be
unreliable,

It seems this join doesn't always find a record that's closest
to solar noon, and therefore drops the summary and join record all together.

you didn't show the actual join only the where clause.

given your task I would create CTEs finding the first record before
and after local solar noon, then combine them with a union and feed
the result of that to another disctint on to filter the most
appropriate record for each site.

--
⚂⚃ 100% natural

#8David Salisbury
salisbury@globe.gov
In reply to: Jasen Betts (#7)
Re: timestamps, formatting, and internals

On 5/27/12 12:25 AM, Jasen Betts wrote:

The query: "show integer_datetimes;" should return 'on' which means
timestamps are microsecond precision if it returns 'off' your database
was built with floating point timstamps and equality tests will be
unreliable,

I find that rather interesting. I was told that I was losing microseconds
when I extracted an epoch from the difference between two timestamps and casted
that value to an integer. So if I have integer timestamps ( your case above )
I get microseconds, but integer epochs is without microseconds?

Thanks,

-ds

#9Adrian Klaver
adrian.klaver@aklaver.com
In reply to: David Salisbury (#8)
Re: timestamps, formatting, and internals

On 05/29/2012 04:28 PM, David Salisbury wrote:

On 5/27/12 12:25 AM, Jasen Betts wrote:

The query: "show integer_datetimes;" should return 'on' which means
timestamps are microsecond precision if it returns 'off' your database
was built with floating point timstamps and equality tests will be
unreliable,

I find that rather interesting. I was told that I was losing microseconds
when I extracted an epoch from the difference between two timestamps and
casted
that value to an integer. So if I have integer timestamps ( your case
above )
I get microseconds, but integer epochs is without microseconds?

test=> SELECT extract(epoch from(now() - (now() - interval '1.345577 sec')));
date_part
-----------
1.345577

test=> SELECT extract(epoch from(now() - (now() - interval '1.345577 sec')))::int;
date_part
-----------
1

An integer is an integer so you will lose all the fractional parts:)

Thanks,

-ds

--
Adrian Klaver
adrian.klaver@gmail.com

#10Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Adrian Klaver (#9)
Re: timestamps, formatting, and internals

On 05/29/2012 07:08 PM, Adrian Klaver wrote:

On 05/29/2012 04:28 PM, David Salisbury wrote:

On 5/27/12 12:25 AM, Jasen Betts wrote:

The query: "show integer_datetimes;" should return 'on' which means
timestamps are microsecond precision if it returns 'off' your database
was built with floating point timstamps and equality tests will be
unreliable,

I find that rather interesting. I was told that I was losing microseconds
when I extracted an epoch from the difference between two timestamps and
casted
that value to an integer. So if I have integer timestamps ( your case
above )
I get microseconds, but integer epochs is without microseconds?

Think I realize where the confusion is now. When Jasen mentioned integer
datetimes he was referring to the internal storage format Postgres uses
to record the datetime value. Via the magic of programming(others will
have to fill that part in) the internal format can represent time down
to microseconds even though the value is actually stored as an
eight-byte integer. When you do an explicit cast of a timestamp value to
integer you are asking that the value be only a whole number and the
decimal portion is discarded. In other words the internal integer
encodes the decimal values the external integer does not.

Thanks,

-ds

--
Adrian Klaver
adrian.klaver@gmail.com

#11David Salisbury
salisbury@globe.gov
In reply to: Adrian Klaver (#10)
Re: timestamps, formatting, and internals

On 5/30/12 9:42 AM, Adrian Klaver wrote:

Think I realize where the confusion is now. When Jasen mentioned integer
datetimes he was referring to the internal storage format Postgres uses
to record the datetime value. Via the magic of programming(others will
have to fill that part in) the internal format can represent time down
to microseconds even though the value is actually stored as an
eight-byte integer. When you do an explicit cast of a timestamp value to
integer you are asking that the value be only a whole number and the
decimal portion is discarded. In other words the internal integer
encodes the decimal values the external integer does not.

Thanks! I was looking for some sort of verification along these lines.
So in my mind, the internal storage of a timestamp would be the number
of milliseconds since 1970 ( or similar ). But to me, if I cast something
that is an integer into an integer it would still be an integer ;) , and
still hold the milliseconds. Perhaps if I cast a datetime into a bigint it'll
still hold the number of ms? Some sort of parameter setting for dates
would be nice to be able to default a date/time format down to the ms, w/o
having to explicitly format it with every select... imho.

-ds

#12Adrian Klaver
adrian.klaver@aklaver.com
In reply to: David Salisbury (#11)
Re: timestamps, formatting, and internals

On 05/30/2012 01:48 PM, David Salisbury wrote:

On 5/30/12 9:42 AM, Adrian Klaver wrote:

Think I realize where the confusion is now. When Jasen mentioned integer
datetimes he was referring to the internal storage format Postgres uses
to record the datetime value. Via the magic of programming(others will
have to fill that part in) the internal format can represent time down
to microseconds even though the value is actually stored as an
eight-byte integer. When you do an explicit cast of a timestamp value to
integer you are asking that the value be only a whole number and the
decimal portion is discarded. In other words the internal integer
encodes the decimal values the external integer does not.

Thanks! I was looking for some sort of verification along these lines.
So in my mind, the internal storage of a timestamp would be the number
of milliseconds since 1970 ( or similar ). But to me, if I cast something
that is an integer into an integer it would still be an integer ;) , and
still hold the milliseconds. Perhaps if I cast a datetime into a bigint
it'll
still hold the number of ms? Some sort of parameter setting for dates
would be nice to be able to default a date/time format down to the ms, w/o
having to explicitly format it with every select... imho

The client does not see the internal value. That value is decoded when
presented to the client and encoded when the client presents a datetime
value. Any integer casting you do as the client will not change that. As
to datetime formatting, I believe that was covered in another thread:)

test=> SELECT now();
now
-------------------------------
2012-05-30 14:25:28.719475-07
(1 row)

test=> SELECT '2012-05-30'::TIMESTAMP;
timestamp
---------------------
2012-05-30 00:00:00
(1 row)

test=> SELECT '2012-05-30 2:26:45.56789'::TIMESTAMP;
timestamp
---------------------------
2012-05-30 02:26:45.56789
(1 row)

A timestamp will return values that are significant.

-ds

--
Adrian Klaver
adrian.klaver@gmail.com

#13Jasen Betts
jasen@xnet.co.nz
In reply to: David Salisbury (#1)
Re: timestamps, formatting, and internals

On 2012-05-29, David Salisbury <salisbury@globe.gov> wrote:

On 5/27/12 12:25 AM, Jasen Betts wrote:

The query: "show integer_datetimes;" should return 'on' which means
timestamps are microsecond precision if it returns 'off' your database
was built with floating point timstamps and equality tests will be
unreliable,

I find that rather interesting. I was told that I was losing microseconds
when I extracted an epoch from the difference between two timestamps and casted
that value to an integer. So if I have integer timestamps ( your case above )
I get microseconds, but integer epochs is without microseconds?

yeah, the microseconds appear as fractions of seconds, so in the
conversion to integer epoch they get rounded off.

Thanks,

-ds

--
⚂⚃ 100% natural

#14Jasen Betts
jasen@xnet.co.nz
In reply to: David Salisbury (#1)
Re: timestamps, formatting, and internals

On 2012-05-30, David Salisbury <salisbury@globe.gov> wrote:

On 5/30/12 9:42 AM, Adrian Klaver wrote:

Think I realize where the confusion is now. When Jasen mentioned integer
datetimes he was referring to the internal storage format Postgres uses
to record the datetime value. Via the magic of programming(others will
have to fill that part in) the internal format can represent time down
to microseconds even though the value is actually stored as an
eight-byte integer. When you do an explicit cast of a timestamp value to
integer you are asking that the value be only a whole number and the
decimal portion is discarded. In other words the internal integer
encodes the decimal values the external integer does not.

Thanks! I was looking for some sort of verification along these lines.
So in my mind, the internal storage of a timestamp would be the number
of milliseconds since 1970 ( or similar ). But to me, if I cast something
that is an integer into an integer it would still be an integer ;) , and
still hold the milliseconds.

It's internally stored as int8, but treated arithmetically as a number
of millionths. "Fixed point" is the apropiate term I think.

Perhaps if I cast a datetime into a bigint it'll
still hold the number of ms?

only if you multiply it by 1000000

Some sort of parameter setting for dates
would be nice to be able to default a date/time format down to the ms, w/o
having to explicitly format it with every select... imho.

sounds like a potential foot-gun to me.

--
⚂⚃ 100% natural

#15Mark Morgan Lloyd
markMLl.pgsql-general@telemetry.co.uk
In reply to: Jasen Betts (#13)
Re: timestamps, formatting, and internals

Jasen Betts wrote:

On 2012-05-29, David Salisbury <salisbury@globe.gov> wrote:

On 5/27/12 12:25 AM, Jasen Betts wrote:

The query: "show integer_datetimes;" should return 'on' which means
timestamps are microsecond precision if it returns 'off' your database
was built with floating point timstamps and equality tests will be
unreliable,

I find that rather interesting. I was told that I was losing microseconds
when I extracted an epoch from the difference between two timestamps and casted
that value to an integer. So if I have integer timestamps ( your case above )
I get microseconds, but integer epochs is without microseconds?

yeah, the microseconds appear as fractions of seconds, so in the
conversion to integer epoch they get rounded off.

I think you need to consider what you're actually computing and
measuring. My understanding is that Meeus's Equation of Time calculation
is good to something like 250mSec so that's the limit of your accuracy,
but as soon as you start taking refraction and atmospheric turbulence
into account- even with the Sun high above the horizon- you're going to
degrade that.

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]