Trying to compute the median

Started by Maciej Blizińskialmost 22 years ago3 messagesgeneral
Jump to latest
#1Maciej Bliziński
maciej.blizinski@dobranet.polbox.pl

Hello,

I started to write the query that should compute the median.
Surprisingly, I get following error message:

"server closed the connection unexpectedly This probably means the server
terminated abnormally before or while processing the request."

I am using PostgreSQL 7.4.2, compiled from source under
Slackware-current, Linux 2.6.4.

Here's the query (it's not finished of course, but generates the error):

------------8<------------- ------------8<-------------

CREATE TEMPORARY TABLE test (
value INTEGER PRIMARY KEY
);

INSERT INTO test VALUES (-1);
INSERT INTO test VALUES (0);
INSERT INTO test VALUES (3);
INSERT INTO test VALUES (5);
INSERT INTO test VALUES (8);

SELECT
count(value) AS count,
CASE
WHEN mod(count(value), 2) = 1
THEN
/* odd number of elements */
(
SELECT value FROM test AS t2
ORDER BY
value ASC
OFFSET (count(t1.value) / 2)::INTEGER
LIMIT 1
)
ELSE
/* even number of elements */
0.0
END
AS median
FROM
test AS t1
;

------------8<------------- ------------8<-------------

Is it a PostgreSQL bug, or is my query so badly broken?

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Maciej Bliziński (#1)
Re: Trying to compute the median

Maciej =?iso-8859-2?Q?Blizi=F1ski?= <maciej.blizinski@dobranet.polbox.pl> writes:

I started to write the query that should compute the median.
Surprisingly, I get following error message:
"server closed the connection unexpectedly

Yeah, that's a bug. The patch is attached if you need it. However, I
think you will wind up looking for some other way to solve the problem,
because this query won't scale well to large datasets.

regards, tom lane

Index: setrefs.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/optimizer/plan/setrefs.c,v
retrieving revision 1.97
diff -c -r1.97 setrefs.c
*** setrefs.c	8 Aug 2003 21:41:50 -0000	1.97
--- setrefs.c	11 May 2004 12:45:54 -0000
***************
*** 189,195 ****
  		case T_Sort:
  		case T_Unique:
  		case T_SetOp:
- 		case T_Limit:
  			/*
  			 * These plan types don't actually bother to evaluate their
--- 189,194 ----
***************
*** 201,206 ****
--- 200,214 ----
  			 * reprocessing subplans that also appear in lower levels of
  			 * the plan tree!
  			 */
+ 			break;
+ 		case T_Limit:
+ 			/*
+ 			 * Like the plan types above, Limit doesn't evaluate its
+ 			 * tlist or quals.  It does have live expressions for
+ 			 * limit/offset, however.
+ 			 */
+ 			fix_expr_references(plan, ((Limit *) plan)->limitOffset);
+ 			fix_expr_references(plan, ((Limit *) plan)->limitCount);
  			break;
  		case T_Agg:
  		case T_Group:
Index: subselect.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/optimizer/plan/subselect.c,v
retrieving revision 1.83.2.1
diff -c -r1.83.2.1 subselect.c
*** subselect.c	25 Nov 2003 23:59:32 -0000	1.83.2.1
--- subselect.c	11 May 2004 12:45:54 -0000
***************
*** 1018,1023 ****
--- 1018,1030 ----
  							  &context);
  			break;
+ 		case T_Limit:
+ 			finalize_primnode(((Limit *) plan)->limitOffset,
+ 							  &context);
+ 			finalize_primnode(((Limit *) plan)->limitCount,
+ 							  &context);
+ 			break;
+ 
  		case T_Hash:
  			finalize_primnode((Node *) ((Hash *) plan)->hashkeys,
  							  &context);
***************
*** 1029,1035 ****
  		case T_Sort:
  		case T_Unique:
  		case T_SetOp:
- 		case T_Limit:
  		case T_Group:
  			break;
--- 1036,1041 ----
#3Maciej Bliziński
maciej.blizinski@dobranet.polbox.pl
In reply to: Tom Lane (#2)
Re: Trying to compute the median

Tom Lane napisaďż˝:

Yeah, that's a bug. The patch is attached if you need it. However, I
think you will wind up looking for some other way to solve the problem,
because this query won't scale well to large datasets.

I already did some googling and found two examples of median calculating
functions, but they didn't convince me; one of them was creating
temporary table to calculate the median, and it doesn't look for me like
a well-scaling solution.

What would you suggest then?

Maciej Blizi�ski