Make printtup a bit faster

Started by Andy Fan over 1 year ago, 15 messages

zhihuifan1213@163.com

over 1 year ago

I often see printtup take a noticeable share of the perf report. Take
"SELECT * FROM pg_class" for example:

85.65% 3.25% postgres postgres [.] printtup

The high-level design of printtup is:

1. A pre-allocated StringInfo, DR_printtup.buf, stores the data for
each tuple.
2. For each datum in the tuple, it calls the type-specific out function
and gets a cstring.
3. After getting the cstring, it computes the length and appends both
the length and the data to DR_printtup.buf.
4. After all the datums are handled, socket_putmessage copies them into
PqSendBuffer.
5. When PqSendBuffer fills up to PqSendBufferSize, the send syscall
sends the data to the client (copying it from user space to kernel
space again).
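
A minimal sketch of steps 2 and 3, for illustration only. DemoBuf,
demo_oid_out and demo_send_counted are hypothetical stand-ins (plain
malloc and a fixed-size buffer instead of palloc and StringInfo), not
the PostgreSQL code; the point is just to show where the allocation,
the strlen and the copy happen for every datum.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for StringInfo: a fixed-capacity byte buffer. */
typedef struct
{
	char		data[256];
	size_t		len;
} DemoBuf;

/* Stand-in for a type's out function: returns a freshly allocated cstring. */
static char *
demo_oid_out(uint32_t value)
{
	char	   *result = malloc(12);	/* 10 digits + '\0', rounded up */

	if (result != NULL)
		snprintf(result, 12, "%u", (unsigned) value);
	return result;
}

/* Steps 2 and 3: get the cstring, measure it, copy length and bytes. */
static int
demo_send_counted(DemoBuf *buf, uint32_t value)
{
	char	   *str = demo_oid_out(value);	/* one allocation per datum */
	size_t		slen;
	unsigned char be[4];

	if (str == NULL)
		return -1;
	slen = strlen(str);			/* the out function knew this already */
	if (sizeof(buf->data) - buf->len < 4 + slen)
	{
		free(str);
		return -1;
	}
	/* 4-byte length in network byte order, then the text itself */
	be[0] = (unsigned char) (slen >> 24);
	be[1] = (unsigned char) (slen >> 16);
	be[2] = (unsigned char) (slen >> 8);
	be[3] = (unsigned char) slen;
	memcpy(buf->data + buf->len, be, 4);
	memcpy(buf->data + buf->len + 4, str, slen);	/* second copy of the text */
	buf->len += 4 + slen;
	free(str);
	return 0;
}
```

Calling demo_send_counted(&buf, 1259) on an empty buffer leaves 8 bytes
in it: the length 4 followed by "1259".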

Part of the slowness is caused by memcpy, strlen and the palloc in the
out function.

8.35% 8.35% postgres libc.so.6 [.] __strlen_avx2
4.27% 0.00% postgres libc.so.6 [.] __memcpy_avx_unaligned_erms
3.93% 3.93% postgres postgres [.] palloc (part of them caused by out function)
5.70% 5.70% postgres postgres [.] AllocSetAlloc (part of them caused by printtup.)

My high-level proposal is to define a type-specific print function, like:

oidprint(Datum datum, StringInfo buf)
textprint(Datum datum, StringInfo buf)

Such a function should append both the data and its length to buf
directly.

For the oidprint case, we can avoid:

1. the dedicated palloc in the oid out function.
2. the memcpy from that memory into DR_printtup.buf.

For the textprint case, we can avoid:

1. strlen, since we can get the length from varlena.vl_len.

int2/4/8/timestamp/date/time are similar to oid, and numeric and varchar
are similar to text. This covers almost all of the commonly used types.
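
A rough sketch of what the two print functions save, under the same
simplifying assumptions as before: DemoBuf, put_be32, demo_oid_print
and demo_text_print are hypothetical names, and a fixed-size buffer
stands in for StringInfo. The fixed-width case formats the digits in
place and backfills the length; the variable-length case uses the
length it is given instead of calling strlen.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for StringInfo: a fixed-capacity byte buffer. */
typedef struct
{
	char		data[256];
	size_t		len;
} DemoBuf;

/* Store v at dst as 4 bytes in network byte order. */
static void
put_be32(char *dst, uint32_t v)
{
	unsigned char be[4];

	be[0] = (unsigned char) (v >> 24);
	be[1] = (unsigned char) (v >> 16);
	be[2] = (unsigned char) (v >> 8);
	be[3] = (unsigned char) v;
	memcpy(dst, be, 4);
}

/* Fixed-width case: format in place, then backfill the length. */
static int
demo_oid_print(DemoBuf *buf, uint32_t value)
{
	int			n;

	/* 4 length bytes + at most 10 digits + snprintf's trailing '\0' */
	if (sizeof(buf->data) - buf->len < 4 + 11)
		return -1;
	n = snprintf(buf->data + buf->len + 4, 11, "%u", (unsigned) value);
	put_be32(buf->data + buf->len, (uint32_t) n);
	buf->len += 4 + (size_t) n;	/* no separate allocation, no extra copy */
	return 0;
}

/* Variable-length case: the caller already knows nbytes, so no strlen(). */
static int
demo_text_print(DemoBuf *buf, const char *bytes, uint32_t nbytes)
{
	if (sizeof(buf->data) - buf->len < 4 + (size_t) nbytes)
		return -1;
	put_be32(buf->data + buf->len, nbytes);
	memcpy(buf->data + buf->len + 4, bytes, nbytes);
	buf->len += 4 + (size_t) nbytes;
	return 0;
}
```

In the real backend the length for the text case would come from the
varlena header rather than from a parameter.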

Hard-coding the relationship between the commonly used types and their
{type}print function OIDs doesn't look great, but adding a new attribute
to pg_type looks too aggressive. Anyway, that is the next topic to
discuss.

If a type's print function is not defined, we can still use the out
function (and PrinttupAttrInfo caches FmgrInfo rather than
FunctionCallInfo, so there is some optimization to be had in this step
as well).

This proposal covers steps 2 and 3. If we want to be more aggressive,
we can let the xxxprint functions write to PqSendBuffer directly, but
that is more complex and needs some infrastructure changes. The memcpy
in step 4 accounts for "1.27% __memcpy_avx_unaligned_erms" in my case
above.

What do you think?

--
Best Regards
Andy Fan

David Rowley

dgrowleyml@gmail.com

over 1 year ago

In reply to: Andy Fan (#1)

Re: Make printtup a bit faster

On Thu, 29 Aug 2024 at 21:40, Andy Fan <zhihuifan1213@163.com> wrote:

Usually I see printtup in the perf-report with a noticeable ratio.

Part of the slowness is caused by "memcpy", "strlen" and palloc in
outfunction.

Yeah, it's a pretty inefficient API from a performance point of view.

My high level proposal is define a type specific print function like:

oidprint(Datum datum, StringInfo buf)
textprint(Datum datum, StringInfo buf)

I think what we should do instead is make the output functions take a
StringInfo and just pass it the StringInfo where we'd like the bytes
written.

That of course would require rewriting all the output functions for
all the built-in types, so not a small task. Extensions make that job
harder. I don't think it would be good to force extensions to rewrite
their output functions, so perhaps some wrapper function could help us
align the APIs for extensions that have not been converted yet.

There's a similar problem with input functions not having knowledge of
the input length. You only have to look at textin() to see how useful
that could be. Fixing that would probably make COPY FROM horrendously
faster. Team that up with SIMD for the delimiter char search and COPY
would go a bit better still. Neil Conway did propose the SIMD part in [1],
but it's just not nearly as good as it could be when having to still
perform the strlen() calls.
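
To illustrate why a known input length matters for the delimiter
search, here is a small sketch that scans 8 bytes at a time. It uses
portable 64-bit SWAR as a stand-in for real SIMD and is not the code
proposed in [1]; find_delim_swar is a hypothetical name. Reading whole
8-byte chunks is only safe because len is known up front.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return the offset of the first byte equal to delim in buf[0..len),
 * or len if there is none.
 */
static size_t
find_delim_swar(const char *buf, size_t len, char delim)
{
	const uint64_t ones = UINT64_C(0x0101010101010101);
	const uint64_t highs = UINT64_C(0x8080808080808080);
	const uint64_t pattern = ones * (unsigned char) delim;
	size_t		i = 0;

	for (; i + 8 <= len; i += 8)
	{
		uint64_t	chunk;
		uint64_t	x;

		memcpy(&chunk, buf + i, 8);	/* safe unaligned load */
		x = chunk ^ pattern;		/* matching bytes become zero */
		if ((x - ones) & ~x & highs)	/* is any byte of x zero? */
			break;			/* yes: locate it bytewise below */
	}

	/* Finish the chunk that matched, or the tail shorter than 8 bytes. */
	for (; i < len; i++)
	{
		if (buf[i] == delim)
			return i;
	}
	return len;
}
```

With a NUL-terminated string of unknown length, the loop would first
need a strlen() pass (or a per-byte '\0' check) before it could read
8 bytes at once.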

I had planned to work on this for PG18, but I'd be happy for some
assistance if you're willing.

David

[1]: /messages/by-id/CAOW5sYb1HprQKrzjCsrCP1EauQzZy+njZ-AwBbOUMoGJHJS7Sw@mail.gmail.com

Tom Lane

tgl@sss.pgh.pa.us

over 1 year ago

In reply to: David Rowley (#2)

Re: Make printtup a bit faster

David Rowley <dgrowleyml@gmail.com> writes:

[ redesign I/O function APIs ]
I had planned to work on this for PG18, but I'd be happy for some
assistance if you're willing.

I'm skeptical that such a thing will ever be practical. To avoid
breaking un-converted data types, all the call sites would have to
support both old and new APIs. To avoid breaking non-core callers,
all the I/O functions would have to support both old and new APIs.
That probably adds enough overhead to negate whatever benefit you'd
get.

regards, tom lane

Andy Fan

zhihuifan1213@163.com

over 1 year ago

In reply to: David Rowley (#2)

Re: Make printtup a bit faster

David Rowley <dgrowleyml@gmail.com> writes:

Hello David,

My high level proposal is define a type specific print function like:

oidprint(Datum datum, StringInfo buf)
textprint(Datum datum, StringInfo buf)

I think what we should do instead is make the output functions take a
StringInfo and just pass it the StringInfo where we'd like the bytes
written.

That of course would require rewriting all the output functions for
all the built-in types, so not a small task. Extensions make that job
harder. I don't think it would be good to force extensions to rewrite
their output functions, so perhaps some wrapper function could help us
align the APIs for extensions that have not been converted yet.

I have the same concern as Tom that this method looks too
aggressive. That's why I said:

"If a type's print function is not defined, we can still using the out
function."

AND

"Hard coding the relationship between [common] used type and {type}print
function OID looks not cool, Adding a new attribute in pg_type looks too
aggressive however. Anyway this is the next topic to talk about."

What would be the extra benefit if we redesigned all the out functions?

There's a similar problem with input functions not having knowledge of
the input length. You only have to look at textin() to see how useful
that could be. Fixing that would probably make COPY FROM horrendously
faster. Team that up with SIMD for the delimiter char search and COPY
go a bit better still. Neil Conway did propose the SIMD part in [1],
but it's just not nearly as good as it could be when having to still
perform the strlen() calls.

OK, I think I understand the need for the input functions to know the
input length, and it is good to know about the SIMD part for the
delimiter char search. strlen looks like a delimiter char search ('\0')
as well. I'm not sure whether strlen has been implemented with SIMD,
but if not, why?

I had planned to work on this for PG18, but I'd be happy for some
assistance if you're willing.

I see you have done a lot of amazing work on cache-line-friendly data
structure design, branch prediction optimization and SIMD optimization.
I'd like to try some myself. I'm not sure if I can meet the target; what
if we handle the out and in functions separately (possibly by different
people)?

--
Best Regards
Andy Fan

David Rowley

dgrowleyml@gmail.com

over 1 year ago

In reply to: Tom Lane (#3)

Re: Make printtup a bit faster

On Fri, 30 Aug 2024 at 03:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

[ redesign I/O function APIs ]
I had planned to work on this for PG18, but I'd be happy for some
assistance if you're willing.

I'm skeptical that such a thing will ever be practical. To avoid
breaking un-converted data types, all the call sites would have to
support both old and new APIs. To avoid breaking non-core callers,
all the I/O functions would have to support both old and new APIs.
That probably adds enough overhead to negate whatever benefit you'd
get.

Scepticism is certainly good when it comes to such a large API change.
I don't want to argue with you, but I'd like to state a few things
about why I think you're wrong on this...

So, we currently return cstrings in our output functions. Let's take
jsonb_out() as an example, to build that cstring, we make a *new*
StringInfoData on *every call* inside JsonbToCStringWorker(). That
gives you 1024 bytes before you need to enlarge it. However, it's
maybe not all bad as we have some size estimations there to call
enlargeStringInfo(), only that's a bit wasteful as it does a
repalloc() which memcpys the freshly allocated 1024 bytes allocated in
initStringInfo() when it doesn't yet contain any data. After
jsonb_out() has returned and we have the cstring, only we forgot the
length of the string, so most places will immediately call strlen() or
do that indirectly via appendStringInfoString(). For larger JSON
documents, that'll likely require pulling cachelines back into L1
again. I don't know how modern CPU cacheline eviction works, but if it
was as simple as FIFO then the strlen() would flush all those
cachelines only for memcpy() to have to read them back again for
output strings larger than L1.

If we rewrote all of core's output functions to use the new API, then
the branch to test the function signature would be perfectly
predictable and amount to an extra cmp and jne/je opcode. So, I just
don't agree with the overheads negating the benefits comment. You're
probably off by 1 order of magnitude at the minimum and for
medium/large varlena types likely 2-3+ orders. Even a simple int4out
requires a palloc()/memcpy. If we were outputting lots of data, e.g.
in a COPY operation, the output buffer would seldom need to be
enlarged as it would quickly adjust to the correct size.

For the input functions, the possible gains are extensive too.
textin() is a good example, it uses cstring_to_text(), but could be
changed to use cstring_to_text_with_len(). Knowing the input string
length also opens the door to SIMD. Take int4in() as an example, if
pg_strtoint32_safe() knew its input length there are a bunch of
prechecks that could be done with either 64-bit SWAR or with SIMD.
For example, if you knew you had an 8-char string of decimal digits
then converting that to an int32 is quite cheap. It's impossible to
overflow an int32 with 8 decimal digits, so no overflow checks need to
be done until there are at least 10 decimal digits. ca6fde922 seems
like good enough example of the possible gains of SIMD vs
byte-at-a-time processing. I saw some queries go 4x faster there and
that was me trying to keep the JSON document sizes realistic.
byte-at-a-time is just not enough to saturate RAM speed. Take DDR5,
for example, Wikipedia says it has a bandwidth of 32–64 GB/s, so
unless we discover room temperature superconductors, we're not going
to see any massive jump in clock speeds any time soon, and with 5 or
6GHz CPUs, there's just no way to get anywhere near that bandwidth by
processing byte-at-a-time. For some sort of naive strcpy() type
function, you're going to need at least a cmp and mov, even if those
were latency=1 (which they're not, see [1]), you can only do 2.5
billion of those two per second on a 5GHz processor. I've not tested,
but hypothetically (assuming latency=1) that amounts to processing
2.5GB/s, i.e. a long way from DDR5 RAM speed and that's not taking
into account having to increment pointers to the next byte on each
loop.
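
The int4in() point above (no overflow checks are needed below 10
decimal digits) can be sketched as follows. This is only an
illustration under stated assumptions: parse_int32_len is a
hypothetical helper, not pg_strtoint32_safe(), it handles unsigned
decimal digits only, and it uses a plain loop where the text suggests
SWAR or SIMD prechecks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Parse a run of decimal digits of known length into an int32.
 * Returns false on an empty string, a non-digit, or overflow.
 */
static bool
parse_int32_len(const char *s, size_t len, int32_t *out)
{
	int64_t		acc = 0;
	size_t		i;

	if (len == 0)
		return false;

	if (len <= 9)
	{
		/* Fast path: at most 999999999, which always fits in int32. */
		for (i = 0; i < len; i++)
		{
			unsigned	d = (unsigned) ((unsigned char) s[i] - '0');

			if (d > 9)
				return false;
			acc = acc * 10 + d;
		}
		*out = (int32_t) acc;
		return true;
	}

	/* Slow path: 10 or more digits, so check the range after each one. */
	for (i = 0; i < len; i++)
	{
		unsigned	d = (unsigned) ((unsigned char) s[i] - '0');

		if (d > 9)
			return false;
		acc = acc * 10 + d;
		if (acc > INT32_MAX)
			return false;
	}
	*out = (int32_t) acc;
	return true;
}
```

Choosing between the two paths takes a single comparison on len, which
is exactly the information a cstring-only API does not provide.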

David

[1]: https://www.agner.org/optimize/instruction_tables.pdf

David Rowley

dgrowleyml@gmail.com

over 1 year ago

In reply to: Andy Fan (#4)

Re: Make printtup a bit faster

On Fri, 30 Aug 2024 at 12:10, Andy Fan <zhihuifan1213@163.com> wrote:

What would be the extra benefit we redesign all the out functions?

If I've understood your proposal correctly, it sounds like you want to
invent a new "print" output function for each type to output the Datum
onto a StringInfo, if that's the case, what would be the point of
having both versions? If there's anywhere we call output functions
where the resulting value isn't directly appended to a StringInfo,
then we could just use a temporary StringInfo to obtain the cstring
and its length.

David

Andy Fan

zhihuifan1213@163.com

over 1 year ago

In reply to: David Rowley (#6)

Re: Make printtup a bit faster

David Rowley <dgrowleyml@gmail.com> writes:

On Fri, 30 Aug 2024 at 12:10, Andy Fan <zhihuifan1213@163.com> wrote:

What would be the extra benefit we redesign all the out functions?

If I've understood your proposal correctly, it sounds like you want to
invent a new "print" output function for each type to output the Datum
onto a StringInfo,

Mostly yes, but not for [each type at once], just for the [commonly used
types], like int2/4/8, float4/8, date/time/timestamp, text/.. and so on.

if that's the case, what would be the point of having both versions?

The biggest benefit would be compatibility.

In my opinion, the print function (which need not be in pg_type at all)
is an optional optimization. In some performance-critical paths, such as
printtup, we can replace the out function with the print function. If
such a path finds a type without a print function defined, it just keeps
the old way.

It is kind of like a support function for procs, but for data types.
This way, the changes would be much smaller and could be done step by
step.

If there's anywhere we call output functions
where the resulting value isn't directly appended to a StringInfo,
then we could just use a temporary StringInfo to obtain the cstring
and its length.

I think this is true, but it requires some changes to the callers' code.

--
Best Regards
Andy Fan

David Rowley

dgrowleyml@gmail.com

over 1 year ago

In reply to: Andy Fan (#7)

Re: Make printtup a bit faster

On Fri, 30 Aug 2024 at 13:04, Andy Fan <zhihuifan1213@163.com> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

If there's anywhere we call output functions
where the resulting value isn't directly appended to a StringInfo,
then we could just use a temporary StringInfo to obtain the cstring
and its length.

I think this is true, but it requests some caller's code change.

Yeah, calling code would need to be changed to take advantage of the
new API, however, the differences in which types support which API
could be hidden inside OutputFunctionCall(). That function could just
fake up a StringInfo for any types that only support the old cstring
API. That means we don't need to add handling for both cases
everywhere we need to call the output function. It's possible that
could make some operations slightly slower when only the old API is
available, but then maybe not as we do now have read-only StringInfos.
Maybe the StringInfoData.data field could just be set to point to the
given cstring using initReadOnlyStringInfo() rather than doing
appendBinaryStringInfo() onto yet another buffer for the old API.
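
A sketch of the dispatch idea, assuming hypothetical names throughout
(DemoBuf, DemoTypeIO, demo_output_call and friends are not PostgreSQL
APIs). Callers only ever see the buffer-appending interface; the
wrapper checks whether the type has been converted and otherwise falls
back to the cstring function. For simplicity the fallback here copies
the cstring into the buffer rather than pointing a read-only buffer at
it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for StringInfo: a fixed-capacity byte buffer. */
typedef struct
{
	char		data[256];
	size_t		len;
} DemoBuf;

/* Old-style API: returns a NUL-terminated string. */
typedef const char *(*old_out_fn) (uint32_t value);

/* New-style API: appends directly to the caller's buffer. */
typedef int (*new_out_fn) (uint32_t value, DemoBuf *buf);

typedef struct
{
	old_out_fn	old_out;	/* always set */
	new_out_fn	new_out;	/* NULL if the type is not converted yet */
} DemoTypeIO;

/* Callers use only this; the test on new_out is the one extra branch. */
static int
demo_output_call(const DemoTypeIO *io, uint32_t value, DemoBuf *buf)
{
	const char *str;
	size_t		slen;

	if (io->new_out != NULL)
		return io->new_out(value, buf);

	/* Unconverted type: old path, including the strlen and the copy. */
	str = io->old_out(value);
	slen = strlen(str);
	if (sizeof(buf->data) - buf->len < slen)
		return -1;
	memcpy(buf->data + buf->len, str, slen);
	buf->len += slen;
	return 0;
}

/* Example old-style function (static scratch space instead of palloc). */
static const char *
legacy_out(uint32_t value)
{
	static char scratch[12];

	snprintf(scratch, sizeof(scratch), "%u", (unsigned) value);
	return scratch;
}

/* Example converted function: formats straight into the buffer. */
static int
converted_out(uint32_t value, DemoBuf *buf)
{
	int			n;

	if (sizeof(buf->data) - buf->len < 11)
		return -1;
	n = snprintf(buf->data + buf->len, 11, "%u", (unsigned) value);
	buf->len += (size_t) n;
	return 0;
}
```

Both a DemoTypeIO with only legacy_out and one with converted_out
produce the same bytes through demo_output_call(), so call sites do not
need separate handling for the two cases.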

David

Andy Fan

zhihuifan1213@163.com

over 1 year ago

In reply to: David Rowley (#8)

Re: Make printtup a bit faster

David Rowley <dgrowleyml@gmail.com> writes:

On Fri, 30 Aug 2024 at 13:04, Andy Fan <zhihuifan1213@163.com> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

If there's anywhere we call output functions
where the resulting value isn't directly appended to a StringInfo,
then we could just use a temporary StringInfo to obtain the cstring
and its length.

I think this is true, but it requests some caller's code change.

Yeah, calling code would need to be changed to take advantage of the
new API, however, the differences in which types support which API
could be hidden inside OutputFunctionCall(). That function could just
fake up a StringInfo for any types that only support the old cstring
API. That means we don't need to add handling for both cases
everywhere we need to call the output function.

We can do this, but the printtup case (which stands for the
performance-critical paths) would still need to bypass
OutputFunctionCall(), since it uses the fake StringInfo and then a
memcpy is needed again, IIUC.

Besides the above, my major concern is that your proposal needs to
change [all the types' out functions at once], which is too aggressive
for me. In a fresh setup without any extensions created, "SELECT
count(*) FROM pg_type" already returns 627, so the other part of my
previous reply is more important to me.

It is great that both of us feel the current strategy is not good for
performance :)

--
Best Regards
Andy Fan

#10

Andy Fan

zhihuifan1213@163.com

over 1 year ago

In reply to: David Rowley (#5)

1 attachment(s)

Re: Make printtup a bit faster

David Rowley <dgrowleyml@gmail.com> writes:

On Fri, 30 Aug 2024 at 03:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

[ redesign I/O function APIs ]
I had planned to work on this for PG18, but I'd be happy for some
assistance if you're willing.

I'm skeptical that such a thing will ever be practical. To avoid
breaking un-converted data types, all the call sites would have to
support both old and new APIs. To avoid breaking non-core callers,
all the I/O functions would have to support both old and new APIs.
That probably adds enough overhead to negate whatever benefit you'd
get.

So, we currently return cstrings in our output functions. Let's take
jsonb_out() as an example, to build that cstring, we make a *new*
StringInfoData on *every call* inside JsonbToCStringWorker(). That
gives you 1024 bytes before you need to enlarge it. However, it's
maybe not all bad as we have some size estimations there to call
enlargeStringInfo(), only that's a bit wasteful as it does a
repalloc() which memcpys the freshly allocated 1024 bytes allocated in
initStringInfo() when it doesn't yet contain any data. After
jsonb_out() has returned and we have the cstring, only we forgot the
length of the string, so most places will immediately call strlen() or
do that indirectly via appendStringInfoString(). For larger JSON
documents, that'll likely require pulling cachelines back into L1
again. I don't know how modern CPU cacheline eviction works, but if it
was as simple as FIFO then the strlen() would flush all those
cachelines only for memcpy() to have to read them back again for
output strings larger than L1.

The attached is a PoC of this idea. No matter which method is adopted
(rewriting all the out functions or an optional print function), I think
the benefit will be similar. In the test case below, it shows a 10%+
improvement (0.134ms vs 0.110ms).

create table demo as select oid as oid1, relname::text as text1, relam,
relname::text as text2 from pg_class;

pgbench:
select * from demo;

--
Best Regards
Andy Fan

Attachments:

v20240830-0001-Avoiding-some-memcpy-strlen-palloc-in-prin.patch (text/x-diff)

From 5ace763a5126478da1c3cb68d5221d83e45d2f34 Mon Sep 17 00:00:00 2001
From: Andy Fan <zhihuifan1213@163.com>
Date: Fri, 30 Aug 2024 12:50:54 +0800
Subject: [PATCH v20240830 1/1] Avoiding some memcpy, strlen, palloc in
 printtup.

https://www.postgresql.org/message-id/87wmjzfz0h.fsf%40163.com
---
 src/backend/access/common/printtup.c | 40 ++++++++++++++++++++++------
 src/backend/utils/adt/oid.c          | 20 ++++++++++++++
 src/backend/utils/adt/varlena.c      | 17 ++++++++++++
 src/include/catalog/pg_proc.dat      |  9 ++++++-
 4 files changed, 77 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/common/printtup.c b/src/backend/access/common/printtup.c
index c78cc39308..ecba4a7113 100644
--- a/src/backend/access/common/printtup.c
+++ b/src/backend/access/common/printtup.c
@@ -19,6 +19,7 @@
 #include "libpq/pqformat.h"
 #include "libpq/protocol.h"
 #include "tcop/pquery.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
@@ -49,6 +50,7 @@ typedef struct
 	bool		typisvarlena;	/* is it varlena (ie possibly toastable)? */
 	int16		format;			/* format code for this column */
 	FmgrInfo	finfo;			/* Precomputed call info for output fn */
+	FmgrInfo	p_finfo;        /* Precomputed call info for print fn if any */
 } PrinttupAttrInfo;
 
 typedef struct
@@ -274,10 +276,25 @@ printtup_prepare_info(DR_printtup *myState, TupleDesc typeinfo, int numAttrs)
 		thisState->format = format;
 		if (format == 0)
 		{
-			getTypeOutputInfo(attr->atttypid,
-							  &thisState->typoutput,
-							  &thisState->typisvarlena);
-			fmgr_info(thisState->typoutput, &thisState->finfo);
+			/*
+			 * If the type defines a print function, then use it
+			 * rather than outfunction.
+			 *
+			 * XXX: need a generic function to improve the if-elseif.
+			 */
+			if (attr->atttypid == OIDOID)
+				fmgr_info(F_OIDPRINT, &thisState->p_finfo);
+			else if (attr->atttypid == TEXTOID)
+				fmgr_info(F_TEXTPRINT, &thisState->p_finfo);
+			else
+			{
+				getTypeOutputInfo(attr->atttypid,
+								  &thisState->typoutput,
+								  &thisState->typisvarlena);
+				fmgr_info(thisState->typoutput, &thisState->finfo);
+				/* mark print function is invalid */
+				thisState->p_finfo.fn_oid = InvalidOid;
+			}
 		}
 		else if (format == 1)
 		{
@@ -355,10 +372,17 @@ printtup(TupleTableSlot *slot, DestReceiver *self)
 		if (thisState->format == 0)
 		{
 			/* Text output */
-			char	   *outputstr;
-
-			outputstr = OutputFunctionCall(&thisState->finfo, attr);
-			pq_sendcountedtext(buf, outputstr, strlen(outputstr));
+			if (thisState->p_finfo.fn_oid)
+			{
+				FunctionCall2(&thisState->p_finfo, attr, PointerGetDatum(buf));
+			}
+			else
+			{
+				char	   *outputstr;
+
+				outputstr = OutputFunctionCall(&thisState->finfo, attr);
+				pq_sendcountedtext(buf, outputstr, strlen(outputstr));
+			}
 		}
 		else
 		{
diff --git a/src/backend/utils/adt/oid.c b/src/backend/utils/adt/oid.c
index 56fb1fd77c..cc85d920c8 100644
--- a/src/backend/utils/adt/oid.c
+++ b/src/backend/utils/adt/oid.c
@@ -53,6 +53,26 @@ oidout(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(result);
 }
 
+Datum
+oidprint(PG_FUNCTION_ARGS)
+{
+	Oid			o = PG_GETARG_OID(0);
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+	uint32 *lenp;
+	uint32 data_len;
+
+	/* 12 is the max length for an oid's text presentation. */
+	enlargeStringInfo(buf, sizeof(int32) + 12);
+
+	/* note the position for len */
+	lenp = (uint32 *) (buf->data + buf->len);
+	data_len = pg_snprintf(buf->data + buf->len + sizeof(int), 12, "%u", o);
+	*lenp = pg_hton32(data_len);
+	buf->len += sizeof(uint32) + data_len;
+
+	PG_RETURN_VOID();
+}
+
 /*
  *		oidrecv			- converts external binary format to oid
  */
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 7c6391a276..3b7006d54a 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -594,6 +594,23 @@ textout(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(TextDatumGetCString(txt));
 }
 
+
+Datum
+textprint(PG_FUNCTION_ARGS)
+{
+	text		*txt = (text *) pg_detoast_datum((struct varlena *)PG_GETARG_POINTER(0));
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+	uint32 text_len = VARSIZE(txt) - VARHDRSZ;
+	uint32 ni = pg_hton32(text_len);
+
+	enlargeStringInfo(buf, sizeof(int32) + text_len);
+	memcpy((char *pg_restrict) buf->data + buf->len, &ni, sizeof(uint32));
+	memcpy(buf->data + buf->len + sizeof(int), VARDATA(txt), text_len);
+	buf->len += sizeof(uint32) + text_len;
+
+	PG_RETURN_VOID();
+}
+
 /*
  *		textrecv			- converts external binary format to text
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 85f42be1b3..74eeead4de 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -100,6 +100,10 @@
 { oid => '47', descr => 'I/O',
   proname => 'textout', prorettype => 'cstring', proargtypes => 'text',
   prosrc => 'textout' },
+{
+  oid => '8907', descr => 'I/O',
+  proname => 'textprint', prorettype => 'void', proargtypes => 'text internal',
+  prosrc => 'textprint' },
 { oid => '48', descr => 'I/O',
   proname => 'tidin', prorettype => 'tid', proargtypes => 'cstring',
   prosrc => 'tidin' },
@@ -4718,7 +4722,10 @@
 { oid => '1799', descr => 'I/O',
   proname => 'oidout', prorettype => 'cstring', proargtypes => 'oid',
   prosrc => 'oidout' },
-
+{
+  oid => '9771', descr => 'I/O',
+  proname => 'oidprint', prorettype => 'void', proargtypes => 'oid internal',
+  prosrc => 'oidprint'},
 { oid => '3058', descr => 'concatenate values',
   proname => 'concat', provariadic => 'any', proisstrict => 'f',
   provolatile => 's', prorettype => 'text', proargtypes => 'any',
-- 
2.45.1

#11

Andreas Karlsson

andreas@proxel.se

over 1 year ago

In reply to: David Rowley (#2)

Re: Make printtup a bit faster

On 8/29/24 1:51 PM, David Rowley wrote:

I had planned to work on this for PG18, but I'd be happy for some
assistance if you're willing.

I am interested in working on this, unless Andy Fan wants to do this
work. :) I believe that optimizing the out, in and send functions would
be worth the pain. I get Tom's objections but I do not think adding a
small check would add much overhead compared to the gains we can get.

And given that all of in, out and send could be optimized I do not like
the idea of duplicating all three in the catalog.

David, have you given any thought to the cleanest way to check whether
the new API or the old one is to be used for these functions? If not I
can figure out something myself, just wondering if you already had
something in mind.

Andreas

#12

Andy Fan

zhihuifan1213@163.com

over 1 year ago

In reply to: Andy Fan (#10)

Re: Make printtup a bit faster

Andy Fan <zhihuifan1213@163.com> writes:

The attached is PoC of this idea, not matter which method are adopted
(rewrite all the outfunction or a optional print function), I think the
benefit will be similar. In the blew test case, it shows us 10%+
improvements. (0.134ms vs 0.110ms)

After working on more {type}_print functions, I think this is pretty
much a third I/O function, which means confusing extra maintenance
effort. So I agree that refactoring the existing out functions is a
better idea. I'd like to work on the _print function bodies first for
easy review and testing. After all, if there are common issues in these
changes, it is better to know that before we work on the 700+ out
functions.

--
Best Regards
Andy Fan

#13

Andy Fan

zhihuifan1213@163.com

over 1 year ago

In reply to: Andreas Karlsson (#11)

Re: Make printtup a bit faster

Hello David & Andreas,

On 8/29/24 1:51 PM, David Rowley wrote:

I had planned to work on this for PG18, but I'd be happy for some
assistance if you're willing.

I am interested in working on this, unless Andy Fan wants to do this
work. :) I believe that optimizing the out, in and send functions would
be worth the pain. I get Tom's objections but I do not think adding a
small check would add much overhead compared to the gains we can get.

Just to be clearer, I'd like to work on the out functions only, due to
my internal assignment. (Since David planned this for PG18, it is better
to say things clearly early on.) I will post part of the out (print)
function refactoring in the next 2 days. I think it deserves a double
check before working on *all* the out functions.

select count(*), count(distinct typoutput) from pg_type;
 count | count
-------+-------
   621 |    97
(1 row)

select typoutput, count(*) from pg_type group by typoutput having
count(*) > 1 order by 2 desc;

--
Best Regards
Andy Fan

#14

Tom Lane

tgl@sss.pgh.pa.us

over 1 year ago

In reply to: Andy Fan (#13)

Re: Make printtup a bit faster

Andy Fan <zhihuifan1213@163.com> writes:

Just to be clearer, I'd like work on the out function only due to my
internal assignment. (Since David planned it for PG18, so it is better
say things clearer eariler). I'd put parts of out(print) function
refactor in the next 2 days. I think it deserves a double check before
working on *all* the out function.

Well, sure. You *cannot* write a patch that breaks existing output
functions. Not at the start, and not at the end either. You
should focus on writing the infrastructure and, for starters,
converting just a few output functions as a demonstration. If
that gets accepted then you can work on converting other output
functions a few at a time. But they'll never all be done, because
we can't realistically force extensions to convert.

There are lots of examples of similar incremental conversions in our
project's history. I think the most recent example is the "soft error
handling" work (d9f7f5d32, ccff2d20e, and many follow-on patches).

regards, tom lane

#15

Andy Fan

zhihuifan1213@163.com

over 1 year ago

In reply to: Tom Lane (#14)

4 attachment(s)

Re: Make printtup a bit faster

... I'd put parts of out(print) function
refactor in the next 2 days. I think it deserves a double check before
working on *all* the out function.

Well, sure. You *cannot* write a patch that breaks existing output
functions. Not at the start, and not at the end either. You
should focus on writing the infrastructure and, for starters,
converting just a few output functions as a demonstration. If
that gets accepted then you can work on converting other output
functions a few at a time. But they'll never all be done, because
we can't realistically force extensions to convert.

There are lots of examples of similar incremental conversions in our
project's history. I think the most recent example is the "soft error
handling" work (d9f7f5d32, ccff2d20e, and many follow-on patches).

Thank you for this example! What I want is a smaller step than what you
described. Our goal is to make the out functions take an extra
StringInfo input to avoid the extra palloc, memcpy and strlen, so the
*final state* is:

1). implement all the out functions with (datum, StringInfo) as inputs.
2). change all the callers, like printtup or any other function.
3). any extension that isn't in core has to change the out functions
for its data types. The patch in this thread can't help in this area,
but I guess it would not be very hard for extension authors.

The current (intermediate) stage is:
- I finished part of step (1), 17 functions in total, and named them
print functions. The function bodies are exactly the same as the out
functions in the final stage, so this part is reviewable.
- I use them in the printtup use case, so it is testable (for
correctness and performance).

So I would like some of you to double check these function bodies; if
anything is wrong, I can change it more easily (compared with having
made the same effort on all the types' functions). Does that make sense?

Patches 0001 ~ 0003 are related and can be reviewed or committed
separately, and 0004 is the main part of the above.

--
Best Regards
Andy Fan

Attachments:

v20240912-0003-add-unlikely-hint-for-enlargeStringInfo.patch (text/x-diff)

From 4fa462d02902e7ac278a312ad60f43c52f403753 Mon Sep 17 00:00:00 2001
From: Andy Fan <zhihuifan1213@163.com>
Date: Wed, 11 Sep 2024 12:25:52 +0800
Subject: [PATCH v20240912 3/4] add unlikely hint for enlargeStringInfo.

enlargeStringInfo  has a noticeable ratio in perf peport with a
"select * from pg_class" workload). So add a unlikely  hint in
enlargeStringinfo to avoid some overhead.
---
 src/common/stringinfo.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/common/stringinfo.c b/src/common/stringinfo.c
index eb9d6502fc..838d9b80d0 100644
--- a/src/common/stringinfo.c
+++ b/src/common/stringinfo.c
@@ -297,7 +297,7 @@ enlargeStringInfo(StringInfo str, int needed)
 	 * Guard against out-of-range "needed" values.  Without this, we can get
 	 * an overflow or infinite loop in the following.
 	 */
-	if (needed < 0)				/* should not happen */
+	if (unlikely(needed < 0))				/* should not happen */
 	{
 #ifndef FRONTEND
 		elog(ERROR, "invalid string enlargement request size: %d", needed);
@@ -306,7 +306,7 @@ enlargeStringInfo(StringInfo str, int needed)
 		exit(EXIT_FAILURE);
 #endif
 	}
-	if (((Size) needed) >= (MaxAllocSize - (Size) str->len))
+	if (unlikely(((Size) needed) >= (MaxAllocSize - (Size) str->len)))
 	{
 #ifndef FRONTEND
 		ereport(ERROR,
-- 
2.45.1
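
For readers unfamiliar with branch hints, the following is a rough standalone
sketch of what an unlikely() style macro does. The names my_unlikely,
MY_MAX_ALLOC and can_enlarge are hypothetical, and the macro assumes the
GCC/Clang builtin __builtin_expect, with a plain test as the fallback on other
compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC/Clang builtin; falls back to a plain test elsewhere. */
#if defined(__GNUC__)
#define my_unlikely(x) __builtin_expect((x) != 0, 0)
#else
#define my_unlikely(x) ((x) != 0)
#endif

/* Hypothetical allocation cap used only for this sketch. */
#define MY_MAX_ALLOC ((size_t) 0x3fffffff)

/*
 * Return 1 if "needed" more bytes fit after "len" used bytes, else 0.
 * Both error checks are marked as rarely taken so the compiler can lay
 * out the common path as the fall-through.  Assumes len <= MY_MAX_ALLOC.
 */
int
can_enlarge(size_t len, int needed)
{
    if (my_unlikely(needed < 0))
        return 0;
    if (my_unlikely((size_t) needed >= MY_MAX_ALLOC - len))
        return 0;
    return 1;
}
```

The hint changes only code layout and branch prediction defaults, not the
result, so any gain has to be confirmed by measurement on the target workload.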

v20240912-0001-Refactor-float8out_internval-for-better-pe.patch (text/x-diff)

From dc1475195f1350745e45a5b7db381354f99d83da Mon Sep 17 00:00:00 2001
From: Andy Fan <zhihuifan1213@163.com>
Date: Wed, 11 Sep 2024 12:19:57 +0800
Subject: [PATCH v20240912 1/4] Refactor float8out_internal for better
 performance

Some users, like cube and geo, need to call float8out_internal to get a
string and then copy it into their own StringInfo. In this commit,
we let the caller provide a buffer to float8out_internal so that it
can put the data into the buffer directly. This commit also reuses the
existing string length to avoid another strlen call in appendStringInfoString.
---
 contrib/cube/cube.c             | 17 +++++++++++++++--
 src/backend/utils/adt/float.c   | 14 ++++++++------
 src/backend/utils/adt/geo_ops.c | 34 ++++++++++++++++++++++-----------
 src/include/catalog/pg_type.h   |  1 +
 src/include/utils/float.h       |  2 +-
 5 files changed, 48 insertions(+), 20 deletions(-)

diff --git a/contrib/cube/cube.c b/contrib/cube/cube.c
index 1fc447511a..a239acf35c 100644
--- a/contrib/cube/cube.c
+++ b/contrib/cube/cube.c
@@ -12,6 +12,7 @@
 
 #include "access/gist.h"
 #include "access/stratnum.h"
+#include "catalog/pg_type.h"
 #include "cubedata.h"
 #include "libpq/pqformat.h"
 #include "utils/array.h"
@@ -295,26 +296,38 @@ cube_out(PG_FUNCTION_ARGS)
 	StringInfoData buf;
 	int			dim = DIM(cube);
 	int			i;
+	int str_len;
 
 	initStringInfo(&buf);
 
 	appendStringInfoChar(&buf, '(');
+
+	/* 3 for ", " and 1 for '\0'. */
+	enlargeStringInfo(&buf, (MAXFLOAT8LEN + 4) * dim);
 	for (i = 0; i < dim; i++)
 	{
 		if (i > 0)
 			appendStringInfoString(&buf, ", ");
-		appendStringInfoString(&buf, float8out_internal(LL_COORD(cube, i)));
+		float8out_internal(LL_COORD(cube, i), buf.data + buf.len, &str_len);
+		buf.len += str_len;
+		buf.data[buf.len] = '\0';
 	}
 	appendStringInfoChar(&buf, ')');
 
 	if (!cube_is_point_internal(cube))
 	{
 		appendStringInfoString(&buf, ",(");
+
+		/* 3 for ", " and 1 for '\0'. */
+		enlargeStringInfo(&buf, (MAXFLOAT8LEN + 4) * dim);
 		for (i = 0; i < dim; i++)
 		{
 			if (i > 0)
 				appendStringInfoString(&buf, ", ");
-			appendStringInfoString(&buf, float8out_internal(UR_COORD(cube, i)));
+
+			float8out_internal(UR_COORD(cube, i), buf.data + buf.len, &str_len);
+			buf.len += str_len;
+			buf.data[buf.len] = '\0';
 		}
 		appendStringInfoChar(&buf, ')');
 	}
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index f709c21e1f..1f31f8540e 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -531,22 +531,24 @@ float8out(PG_FUNCTION_ARGS)
  * float8out_internal - guts of float8out()
  *
  * This is exposed for use by functions that want a reasonably
- * platform-independent way of outputting doubles.
- * The result is always palloc'd.
+ * platform-independent way of outputting doubles.  If ascii is NULL, the
+ * result is palloc'd.  The string length is stored in *len.
  */
 char *
-float8out_internal(double num)
+float8out_internal(double num, char *ascii, int *len)
 {
-	char	   *ascii = (char *) palloc(32);
 	int			ndig = DBL_DIG + extra_float_digits;
 
+	if (ascii == NULL)
+		ascii = (char *) palloc(MAXFLOAT8LEN);
+
 	if (extra_float_digits > 0)
 	{
-		double_to_shortest_decimal_buf(num, ascii);
+		*len = double_to_shortest_decimal_buf(num, ascii);
 		return ascii;
 	}
 
-	(void) pg_strfromd(ascii, 32, ndig, num);
+	*len = pg_strfromd(ascii, 32, ndig, num);
 	return ascii;
 }
 
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index 07d1649c7b..59f2feaa59 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -29,6 +29,7 @@
 #include <float.h>
 #include <ctype.h>
 
+#include "catalog/pg_type.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
 #include "nodes/miscnodes.h"
@@ -202,10 +203,12 @@ single_decode(char *num, float8 *x, char **endptr_p,
 static void
 single_encode(float8 x, StringInfo str)
 {
-	char	   *xstr = float8out_internal(x);
+	int str_len;
+	enlargeStringInfo(str, MAXFLOAT8LEN + 1);
+	float8out_internal(x, str->data + str->len, &str_len);
 
-	appendStringInfoString(str, xstr);
-	pfree(xstr);
+	str->len += str_len;
+	str->data[str->len] = '\0';
 }								/* single_encode() */
 
 static bool
@@ -254,12 +257,20 @@ fail:
 static void
 pair_encode(float8 x, float8 y, StringInfo str)
 {
-	char	   *xstr = float8out_internal(x);
-	char	   *ystr = float8out_internal(y);
+	int data_len;
+	/* the additional 2 is for ',' and '\0' */
+	enlargeStringInfo(str, MAXFLOAT8LEN * 2 + 2);
 
-	appendStringInfo(str, "%s,%s", xstr, ystr);
-	pfree(xstr);
-	pfree(ystr);
+	float8out_internal(x, str->data + str->len, &data_len);
+	str->len += data_len;
+
+	str->data[str->len] = ',';
+	str->len++;
+
+	float8out_internal(y, str->data + str->len, &data_len);
+	str->len += data_len;
+
+	str->data[str->len] = '\0';
 }
 
 static bool
@@ -1023,9 +1034,10 @@ Datum
 line_out(PG_FUNCTION_ARGS)
 {
 	LINE	   *line = PG_GETARG_LINE_P(0);
-	char	   *astr = float8out_internal(line->A);
-	char	   *bstr = float8out_internal(line->B);
-	char	   *cstr = float8out_internal(line->C);
+	int datalen;
+	char	   *astr = float8out_internal(line->A, NULL, &datalen);
+	char	   *bstr = float8out_internal(line->B, NULL, &datalen);
+	char	   *cstr = float8out_internal(line->C, NULL, &datalen);
 
 	PG_RETURN_CSTRING(psprintf("%c%s%c%s%c%s%c", LDELIM_L, astr, DELIM, bstr,
 							   DELIM, cstr, RDELIM_L));
diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h
index e925969732..1ab8e4e4e9 100644
--- a/src/include/catalog/pg_type.h
+++ b/src/include/catalog/pg_type.h
@@ -344,6 +344,7 @@ MAKE_SYSCACHE(TYPENAMENSP, pg_type_typname_nsp_index, 64);
 
 #endif							/* EXPOSE_TO_CLIENT_CODE */
 
+#define MAXFLOAT8LEN 32
 
 extern ObjectAddress TypeShellMake(const char *typeName,
 								   Oid typeNamespace,
diff --git a/src/include/utils/float.h b/src/include/utils/float.h
index 7d1badd292..65c395299d 100644
--- a/src/include/utils/float.h
+++ b/src/include/utils/float.h
@@ -47,7 +47,7 @@ extern float8 float8in_internal(char *num, char **endptr_p,
 extern float4 float4in_internal(char *num, char **endptr_p,
 								const char *type_name, const char *orig_string,
 								struct Node *escontext);
-extern char *float8out_internal(float8 num);
+extern char *float8out_internal(float8 num, char *ascii, int *len);
 extern int	float4_cmp_internal(float4 a, float4 b);
 extern int	float8_cmp_internal(float8 a, float8 b);
 
-- 
2.45.1
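
The calling convention introduced by this patch (optional caller-supplied
buffer, length returned through a pointer) can be sketched outside the backend
as below. This is an illustration only: format_double, append_pair and
FLOAT_BUF_LEN are hypothetical names, malloc replaces palloc, and "%.15g" is
an assumed format rather than what float8out_internal actually uses.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FLOAT_BUF_LEN 32        /* hypothetical, mirrors MAXFLOAT8LEN */

/*
 * Format num into dst, or into freshly malloc'd memory when dst is NULL.
 * The string length (excluding the NUL) is stored in *len.
 */
char *
format_double(double num, char *dst, int *len)
{
    if (dst == NULL)
    {
        dst = malloc(FLOAT_BUF_LEN);
        if (dst == NULL)
            return NULL;
    }
    *len = snprintf(dst, FLOAT_BUF_LEN, "%.15g", num);
    return dst;
}

/*
 * Append "x,y" to out at offset *used without temporary strings.
 * The caller must guarantee 2 * FLOAT_BUF_LEN + 1 free bytes at out + *used.
 */
void
append_pair(char *out, size_t *used, double x, double y)
{
    int     n;

    format_double(x, out + *used, &n);
    *used += (size_t) n;
    out[(*used)++] = ',';
    format_double(y, out + *used, &n);
    *used += (size_t) n;
    out[*used] = '\0';
}
```

With an 80-byte array and used starting at 0, append_pair(out, &used, 1.5, -2.0)
writes both numbers straight into the destination, which is the same saving
pair_encode gets from skipping the two temporary strings.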

v20240912-0002-Continue-to-remove-some-unnecesary-strlen-.patch (text/x-diff)

From 8b4ba05a9c0e767c1d053365e70966e1d9544179 Mon Sep 17 00:00:00 2001
From: Andy Fan <zhihuifan1213@163.com>
Date: Wed, 11 Sep 2024 12:21:39 +0800
Subject: [PATCH v20240912 2/4] Continue to remove some unnecessary strlen calls

sprintf returns the number of characters printed (not including the
trailing '\0'), which is exactly what strlen would return. So we can reuse
that value and avoid a strlen call.
---
 src/backend/utils/adt/datetime.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 7abdc62f41..586bec8466 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -4594,6 +4594,7 @@ EncodeInterval(struct pg_itm *itm, int style, char *str)
 	int			fsec = itm->tm_usec;
 	bool		is_before = false;
 	bool		is_zero = true;
+	int			data_len;
 
 	/*
 	 * The sign of year and month are guaranteed to match, since they are
@@ -4651,11 +4652,11 @@ EncodeInterval(struct pg_itm *itm, int style, char *str)
 					char		sec_sign = (hour < 0 || min < 0 ||
 											sec < 0 || fsec < 0) ? '-' : '+';
 
-					sprintf(cp, "%c%d-%d %c%lld %c%lld:%02d:",
-							year_sign, abs(year), abs(mon),
-							day_sign, (long long) i64abs(mday),
-							sec_sign, (long long) i64abs(hour), abs(min));
-					cp += strlen(cp);
+					data_len = sprintf(cp, "%c%d-%d %c%lld %c%lld:%02d:",
+									   year_sign, abs(year), abs(mon),
+									   day_sign, (long long) i64abs(mday),
+									   sec_sign, (long long) i64abs(hour), abs(min));
+					cp += data_len;
 					cp = AppendSeconds(cp, sec, fsec, MAX_INTERVAL_PRECISION, true);
 					*cp = '\0';
 				}
@@ -4665,16 +4666,16 @@ EncodeInterval(struct pg_itm *itm, int style, char *str)
 				}
 				else if (has_day)
 				{
-					sprintf(cp, "%lld %lld:%02d:",
-							(long long) mday, (long long) hour, min);
-					cp += strlen(cp);
+					data_len = sprintf(cp, "%lld %lld:%02d:",
+									   (long long) mday, (long long) hour, min);
+					cp += data_len;
 					cp = AppendSeconds(cp, sec, fsec, MAX_INTERVAL_PRECISION, true);
 					*cp = '\0';
 				}
 				else
 				{
-					sprintf(cp, "%lld:%02d:", (long long) hour, min);
-					cp += strlen(cp);
+					data_len = sprintf(cp, "%lld:%02d:", (long long) hour, min);
+					cp += data_len;
 					cp = AppendSeconds(cp, sec, fsec, MAX_INTERVAL_PRECISION, true);
 					*cp = '\0';
 				}
-- 
2.45.1
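
The idea behind this patch can be shown in isolation: the return value of
sprintf/snprintf already equals strlen of what was just written, so the write
pointer can be advanced by it. In the sketch below encode_hms is a
hypothetical helper, and it uses snprintf with explicit bounds checks rather
than the unbounded sprintf used in EncodeInterval.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "H:MM:SS" into str (capacity "size") and return the string length,
 * or -1 if the output does not fit.  Each snprintf return value is used to
 * advance cp, so no strlen call is needed.
 */
int
encode_hms(char *str, size_t size, long long hour, int min, int sec)
{
    char   *cp = str;
    size_t  left = size;
    int     n;

    n = snprintf(cp, left, "%lld:%02d:", hour, min);
    if (n < 0 || (size_t) n >= left)
        return -1;
    cp += n;                    /* instead of cp += strlen(cp) */
    left -= (size_t) n;

    n = snprintf(cp, left, "%02d", sec);
    if (n < 0 || (size_t) n >= left)
        return -1;
    cp += n;

    return (int) (cp - str);
}
```

The pointer difference at the end gives the total length for free, which is
the same 'end_pos - start_pos' trick that EncodeDateOnly and EncodeTimeOnly
use in patch 0004.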

v20240912-0004-Make-printtup-a-bit-faster-intermediate-st.patch (text/x-diff)

From bbeb539f85c80ffe44f71c12aabd725c8b29ead4 Mon Sep 17 00:00:00 2001
From: Andy Fan <zhihuifan1213@163.com>
Date: Thu, 12 Sep 2024 10:03:57 +0000
Subject: [PATCH v20240912 4/4] Make printtup a bit faster (intermediate
 state).

Currently an out function usually allocates its own memory and fills it
with the cstring. After printtup gets the cstring, it computes the
string length and copies the string into its own StringInfo. So there is
some waste in this workflow.

In the desired case, the out function should take a StringInfo as an input
and write the data into the StringInfo's buffer directly. This way,
there is no extra memory allocation or memory copy, and most strlen calls
can probably be avoided, since most out functions can compute the length
easily. For example: a) snprintf returns the length of the encoded string;
b) the varlena header stores the length; c) we know the start position
before we encode a Datum and the end position after encoding it, so the
length is simply 'end_pos - start_pos'.

Since we have 79 out functions to change, this patch only finishes part of
them by using new print functions, and I would like a review of it. If
anything is wrong, it is better to know earlier.
---
 src/backend/access/common/printtup.c |  80 ++++++++++++++++++--
 src/backend/utils/adt/char.c         |  32 ++++++++
 src/backend/utils/adt/date.c         |  74 ++++++++++++++++++-
 src/backend/utils/adt/datetime.c     |  17 ++++-
 src/backend/utils/adt/float.c        |  53 +++++++++++++-
 src/backend/utils/adt/int.c          |  32 ++++++++
 src/backend/utils/adt/int8.c         |  16 ++++
 src/backend/utils/adt/numeric.c      |  68 +++++++++++++++--
 src/backend/utils/adt/oid.c          |  16 ++++
 src/backend/utils/adt/timestamp.c    | 106 ++++++++++++++++++++++++++-
 src/backend/utils/adt/varchar.c      |  25 +++++++
 src/backend/utils/adt/varlena.c      |  16 ++++
 src/include/catalog/pg_proc.dat      |  83 ++++++++++++++++++++-
 src/include/lib/stringinfo.h         |  19 +++++
 src/include/utils/date.h             |   2 +-
 src/include/utils/datetime.h         |   8 +-
 16 files changed, 618 insertions(+), 29 deletions(-)

diff --git a/src/backend/access/common/printtup.c b/src/backend/access/common/printtup.c
index c78cc39308..05f2f76b77 100644
--- a/src/backend/access/common/printtup.c
+++ b/src/backend/access/common/printtup.c
@@ -19,6 +19,7 @@
 #include "libpq/pqformat.h"
 #include "libpq/protocol.h"
 #include "tcop/pquery.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
@@ -49,6 +50,7 @@ typedef struct
 	bool		typisvarlena;	/* is it varlena (ie possibly toastable)? */
 	int16		format;			/* format code for this column */
 	FmgrInfo	finfo;			/* Precomputed call info for output fn */
+	FmgrInfo	p_finfo;        /* Precomputed call info for print fn if any */
 } PrinttupAttrInfo;
 
 typedef struct
@@ -243,6 +245,47 @@ SendRowDescriptionMessage(StringInfo buf, TupleDesc typeinfo,
 	pq_endmessage_reuse(buf);
 }
 
+static Oid
+get_type_printfn_tmp(Oid type)
+{
+	switch(type)
+	{
+		case OIDOID:
+			return F_OIDPRINT;
+		case TEXTOID:
+			return F_TEXTPRINT;
+		case FLOAT4OID:
+			return F_FLOAT4PRINT;
+		case FLOAT8OID:
+			return F_FLOAT8PRINT;
+		case INT2OID:
+			return F_INT2PRINT;
+		case INT4OID:
+			return F_INT4PRINT;
+		case INT8OID:
+			return F_INT8PRINT;
+		case TIMEOID:
+			return F_TIMEPRINT;
+		case TIMETZOID:
+			return F_TIMETZPRINT;
+		case TIMESTAMPOID:
+			return F_TIMESTAMPPRINT;
+		case TIMESTAMPTZOID:
+			return F_TIMESTAMPTZPRINT;
+		case INTERVALOID:
+			return F_INTERVAL_PRINT;
+		case NUMERICOID:
+			return F_NUMERIC_PRINT;
+		case BPCHAROID:
+			return F_BPCHARPRINT;
+		case VARCHAROID:
+			return F_VARCHARPRINT;
+		case CHAROID:
+			return F_CHARPRINT;
+	}
+	return InvalidOid;
+}
+
 /*
  * Get the lookup info that printtup() needs
  */
@@ -274,10 +317,18 @@ printtup_prepare_info(DR_printtup *myState, TupleDesc typeinfo, int numAttrs)
 		thisState->format = format;
 		if (format == 0)
 		{
-			getTypeOutputInfo(attr->atttypid,
-							  &thisState->typoutput,
-							  &thisState->typisvarlena);
-			fmgr_info(thisState->typoutput, &thisState->finfo);
+			Oid print_fn = get_type_printfn_tmp(attr->atttypid);
+			if (print_fn != InvalidOid)
+				fmgr_info(print_fn, &thisState->p_finfo);
+			else
+			{
+				getTypeOutputInfo(attr->atttypid,
+								  &thisState->typoutput,
+								  &thisState->typisvarlena);
+				fmgr_info(thisState->typoutput, &thisState->finfo);
+				/* mark print function is invalid */
+				thisState->p_finfo.fn_oid = InvalidOid;
+			}
 		}
 		else if (format == 1)
 		{
@@ -355,10 +406,23 @@ printtup(TupleTableSlot *slot, DestReceiver *self)
 		if (thisState->format == 0)
 		{
 			/* Text output */
-			char	   *outputstr;
-
-			outputstr = OutputFunctionCall(&thisState->finfo, attr);
-			pq_sendcountedtext(buf, outputstr, strlen(outputstr));
+			if (thisState->p_finfo.fn_oid)
+			{
+				/*
+				 * Use print function if it is defined.
+				 *
+				 * XXX: we can remove this if statement once we refactor all
+				 * the out function.
+				 */
+				FunctionCall2(&thisState->p_finfo, attr, PointerGetDatum(buf));
+			}
+			else
+			{
+				char	   *outputstr;
+
+				outputstr = OutputFunctionCall(&thisState->finfo, attr);
+				pq_sendcountedtext(buf, outputstr, strlen(outputstr));
+			}
 		}
 		else
 		{
diff --git a/src/backend/utils/adt/char.c b/src/backend/utils/adt/char.c
index 5ee94be0d1..e9f8ba8cf3 100644
--- a/src/backend/utils/adt/char.c
+++ b/src/backend/utils/adt/char.c
@@ -83,6 +83,38 @@ charout(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(result);
 }
 
+Datum
+charprint(PG_FUNCTION_ARGS)
+{
+	char	ch = PG_GETARG_CHAR(0);
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+	char	*result;
+	uint32	data_len;
+
+	result = outStringReserveLen(buf, 5);
+
+	if (IS_HIGHBIT_SET(ch))
+	{
+		result[0] = '\\';
+		result[1] = TOOCTAL(((unsigned char) ch) >> 6);
+		result[2] = TOOCTAL((((unsigned char) ch) >> 3) & 07);
+		result[3] = TOOCTAL(((unsigned char) ch) & 07);
+		result[4] = '\0';
+		data_len = 4;
+	}
+	else
+	{
+		/* This produces acceptable results for 0x00 as well */
+		result[0] = ch;
+		result[1] = '\0';
+		data_len = 1;
+	}
+
+	outStringCompletePhase(buf, data_len);
+
+	PG_RETURN_VOID();
+}
+
 /*
  *		charrecv			- converts external binary format to char
  *
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 9c854e0e5c..4f5c939d2a 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -202,6 +202,31 @@ date_out(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(result);
 }
 
+Datum
+date_print(PG_FUNCTION_ARGS)
+{
+	DateADT		date = PG_GETARG_DATEADT(0);
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+	char *data;
+	uint32 data_len;
+
+	struct pg_tm tt,
+			   *tm = &tt;
+
+	data = outStringReserveLen(buf, MAXDATELEN + 1);
+
+	if (DATE_NOT_FINITE(date))
+		data_len = EncodeSpecialDate(date, data);
+	else
+	{
+		j2date(date + POSTGRES_EPOCH_JDATE,
+			   &(tm->tm_year), &(tm->tm_mon), &(tm->tm_mday));
+		data_len =  EncodeDateOnly(tm, DateStyle, data);
+	}
+	outStringCompletePhase(buf, data_len);
+	PG_RETURN_VOID();
+}
+
 /*
  *		date_recv			- converts external binary format to date
  */
@@ -290,13 +315,21 @@ make_date(PG_FUNCTION_ARGS)
 /*
  * Convert reserved date values to string.
  */
-void
+int
 EncodeSpecialDate(DateADT dt, char *str)
 {
 	if (DATE_IS_NOBEGIN(dt))
+	{
 		strcpy(str, EARLY);
+		/* the return value can be computed at compiling time. */
+		return strlen(EARLY);
+	}
 	else if (DATE_IS_NOEND(dt))
+	{
 		strcpy(str, LATE);
+		/* the return value can be computed at compiling time. */
+		return strlen(LATE);
+	}
 	else						/* shouldn't happen */
 		elog(ERROR, "invalid argument for EncodeSpecialDate");
 }
@@ -1514,6 +1547,25 @@ time_out(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(result);
 }
 
+Datum
+time_print(PG_FUNCTION_ARGS)
+{
+	TimeADT		time = PG_GETARG_TIMEADT(0);
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+	struct pg_tm tt,
+			   *tm = &tt;
+	fsec_t		fsec;
+	char	*data;
+	uint32 data_len;
+
+	data = outStringReserveLen(buf, MAXDATELEN + 1);
+	time2tm(time, tm, &fsec);
+	data_len = EncodeTimeOnly(tm, fsec, false, 0, DateStyle, data);
+	outStringCompletePhase(buf, data_len);
+
+	PG_RETURN_VOID();
+}
+
 /*
  *		time_recv			- converts external binary format to time
  */
@@ -2328,6 +2380,26 @@ timetz_out(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(result);
 }
 
+Datum
+timetz_print(PG_FUNCTION_ARGS)
+{
+	TimeTzADT		*time = PG_GETARG_TIMETZADT_P(0);
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+	struct pg_tm tt,
+			   *tm = &tt;
+	fsec_t		fsec;
+	char	*data;
+	uint32 data_len;
+	int tz;
+
+	data = outStringReserveLen(buf, MAXDATELEN + 1);
+	timetz2tm(time, tm, &fsec, &tz);
+	data_len = EncodeTimeOnly(tm, fsec, true, tz, DateStyle, data);
+	outStringCompletePhase(buf, data_len);
+
+	PG_RETURN_VOID();
+}
+
 /*
  *		timetz_recv			- converts external binary format to timetz
  */
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 586bec8466..a6ee310c52 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -4223,9 +4223,10 @@ EncodeTimezone(char *str, int tz, int style)
 /* EncodeDateOnly()
  * Encode date as local time.
  */
-void
+int
 EncodeDateOnly(struct pg_tm *tm, int style, char *str)
 {
+	char *start = str;
 	Assert(tm->tm_mon >= 1 && tm->tm_mon <= MONTHS_PER_YEAR);
 
 	switch (style)
@@ -4297,6 +4298,7 @@ EncodeDateOnly(struct pg_tm *tm, int style, char *str)
 		str += 3;
 	}
 	*str = '\0';
+	return str - start;
 }
 
 
@@ -4307,10 +4309,13 @@ EncodeDateOnly(struct pg_tm *tm, int style, char *str)
  * a time zone (the difference between time and timetz types), tz is the
  * numeric time zone offset, style is the date style, str is where to write the
  * output.
+ *
+ * returns the strlen of the encoded format.
  */
-void
+int
 EncodeTimeOnly(struct pg_tm *tm, fsec_t fsec, bool print_tz, int tz, int style, char *str)
 {
+	char *start = str;
 	str = pg_ultostr_zeropad(str, tm->tm_hour, 2);
 	*str++ = ':';
 	str = pg_ultostr_zeropad(str, tm->tm_min, 2);
@@ -4319,6 +4324,7 @@ EncodeTimeOnly(struct pg_tm *tm, fsec_t fsec, bool print_tz, int tz, int style,
 	if (print_tz)
 		str = EncodeTimezone(str, tz, style);
 	*str = '\0';
+	return str - start;
 }
 
 
@@ -4337,11 +4343,14 @@ EncodeTimeOnly(struct pg_tm *tm, fsec_t fsec, bool print_tz, int tz, int style,
  *	ISO - yyyy-mm-dd hh:mm:ss+/-tz
  *	German - dd.mm.yyyy hh:mm:ss tz
  *	XSD - yyyy-mm-ddThh:mm:ss.ss+/-tz
+ *
+ *  return the strlen of the encoded data.
  */
-void
+int
 EncodeDateTime(struct pg_tm *tm, fsec_t fsec, bool print_tz, int tz, const char *tzn, int style, char *str)
 {
 	int			day;
+	char		*start = str;
 
 	Assert(tm->tm_mon >= 1 && tm->tm_mon <= MONTHS_PER_YEAR);
 
@@ -4501,6 +4510,8 @@ EncodeDateTime(struct pg_tm *tm, fsec_t fsec, bool print_tz, int tz, const char
 		str += 3;
 	}
 	*str = '\0';
+
+	return str - start;
 }
 
 
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index 1f31f8540e..54ea40c1ef 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -333,6 +333,32 @@ float4out(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(ascii);
 }
 
+
+Datum
+float4print(PG_FUNCTION_ARGS)
+{
+	float4		num = PG_GETARG_FLOAT4(0);
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+	int data_len;
+	char *ascii;
+	int			ndig = FLT_DIG + extra_float_digits;
+
+	ascii = outStringReserveLen(buf, 32);
+
+	if (extra_float_digits > 0)
+		data_len = float_to_shortest_decimal_buf(num, ascii);
+	else
+		data_len =  pg_strfromd(ascii, 32, ndig, num);
+	if (data_len == -1)
+	{
+		/* XXX, think more of this. */
+		elog(ERROR, "failed on float4print");
+	}
+	outStringCompletePhase(buf, data_len);
+
+	PG_RETURN_VOID();
+}
+
 /*
  *		float4recv			- converts external binary format to float4
  */
@@ -523,10 +549,35 @@ Datum
 float8out(PG_FUNCTION_ARGS)
 {
 	float8		num = PG_GETARG_FLOAT8(0);
+	int len;
 
-	PG_RETURN_CSTRING(float8out_internal(num));
+	PG_RETURN_CSTRING(float8out_internal(num, NULL, &len));
 }
 
+Datum
+float8print(PG_FUNCTION_ARGS)
+{
+	float8		num = PG_GETARG_FLOAT8(0);
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+	int data_len;
+	char *ascii;
+
+	ascii = outStringReserveLen(buf, 32);
+
+	float8out_internal(num, ascii, &data_len);
+
+	if (data_len == -1)
+	{
+		/* XXX, think more of this. */
+		elog(ERROR, "failed on float8print");
+	}
+
+	outStringCompletePhase(buf, data_len);
+
+	PG_RETURN_VOID();
+}
+
+
 /*
  * float8out_internal - guts of float8out()
  *
diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 234f20796b..8a7a184885 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -80,6 +80,22 @@ int2out(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(result);
 }
 
+
+Datum
+int2print(PG_FUNCTION_ARGS)
+{
+	int16		arg1 = PG_GETARG_INT16(0);
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+	char *data;
+	uint32 data_len;
+
+	data = outStringReserveLen(buf, 7);
+	data_len = pg_itoa(arg1, data);
+	outStringCompletePhase(buf, data_len);
+
+	PG_RETURN_VOID();
+}
+
 /*
  *		int2recv			- converts external binary format to int2
  */
@@ -304,6 +320,22 @@ int4out(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(result);
 }
 
+Datum
+int4print(PG_FUNCTION_ARGS)
+{
+	int32		arg1 = PG_GETARG_INT32(0);
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+	char *data;
+	uint32 data_len;
+
+	data = outStringReserveLen(buf, 12);
+	data_len = pg_ltoa(arg1, data);
+	outStringCompletePhase(buf, data_len);
+
+	PG_RETURN_VOID();
+}
+
+
 /*
  *		int4recv			- converts external binary format to int4
  */
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 54fa3bc379..a2e575ca5f 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -76,6 +76,22 @@ int8out(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(result);
 }
 
+Datum
+int8print(PG_FUNCTION_ARGS)
+{
+	int64		arg1 = PG_GETARG_INT64(0);
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+	char *data;
+	uint32 data_len;
+
+	data = outStringReserveLen(buf, MAXINT8LEN + 1);
+	data_len = pg_lltoa(arg1, data);
+	outStringCompletePhase(buf, data_len);
+
+	PG_RETURN_VOID();
+}
+
+
 /*
  *		int8recv			- converts external binary format to int8
  */
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 15b517ba98..68635ac74f 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -516,7 +516,7 @@ static bool set_var_from_non_decimal_integer_str(const char *str,
 static void set_var_from_num(Numeric num, NumericVar *dest);
 static void init_var_from_num(Numeric num, NumericVar *dest);
 static void set_var_from_var(const NumericVar *value, NumericVar *dest);
-static char *get_str_from_var(const NumericVar *var);
+static char *get_str_from_var(const NumericVar *var, StringInfo buf);
 static char *get_str_from_var_sci(const NumericVar *var, int rscale);
 
 static void numericvar_serialize(StringInfo buf, const NumericVar *var);
@@ -839,11 +839,52 @@ numeric_out(PG_FUNCTION_ARGS)
 	 */
 	init_var_from_num(num, &x);
 
-	str = get_str_from_var(&x);
+	str = get_str_from_var(&x, NULL);
 
 	PG_RETURN_CSTRING(str);
 }
 
+Datum
+numeric_print(PG_FUNCTION_ARGS)
+{
+	Numeric		num = PG_GETARG_NUMERIC(0);
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+
+	NumericVar	x;
+
+	/*
+	 * Handle NaN and infinities
+	 */
+	if (NUMERIC_IS_SPECIAL(num))
+	{
+		const char* special_str;
+		char	*data;
+		uint32 data_len;
+
+		if (NUMERIC_IS_PINF(num))
+			special_str = "Infinity";
+		else if (NUMERIC_IS_NINF(num))
+			special_str = "-Infinity";
+		else
+			special_str = "NaN";
+
+		data_len = strlen(special_str);
+		data = outStringReserveLen(buf, data_len + 1);
+		memcpy(data, special_str, data_len + 1);
+		outStringCompletePhase(buf, data_len);
+		PG_RETURN_VOID();
+	}
+
+	/*
+	 * Get the number in the variable format.
+	 */
+	init_var_from_num(num, &x);
+
+	(void) get_str_from_var(&x, buf);
+
+	PG_RETURN_VOID();
+}
+
 /*
  * numeric_is_nan() -
  *
@@ -1046,7 +1087,7 @@ numeric_normalize(Numeric num)
 
 	init_var_from_num(num, &x);
 
-	str = get_str_from_var(&x);
+	str = get_str_from_var(&x, NULL);
 
 	/* If there's no decimal point, there's certainly nothing to remove. */
 	if (strchr(str, '.') != NULL)
@@ -7491,7 +7532,7 @@ set_var_from_var(const NumericVar *value, NumericVar *dest)
  *	Returns a palloc'd string.
  */
 static char *
-get_str_from_var(const NumericVar *var)
+get_str_from_var(const NumericVar *var, StringInfo buf)
 {
 	int			dscale;
 	char	   *str;
@@ -7519,7 +7560,14 @@ get_str_from_var(const NumericVar *var)
 	if (i <= 0)
 		i = 1;
 
-	str = palloc(i + dscale + DEC_DIGITS + 2);
+	if (buf == NULL)
+	{
+		str = palloc(i + dscale + DEC_DIGITS + 2);
+	}
+	else
+	{
+		str = outStringReserveLen(buf, i + dscale + DEC_DIGITS + 2);
+	}
 	cp = str;
 
 	/*
@@ -7618,6 +7666,12 @@ get_str_from_var(const NumericVar *var)
 	 * terminate the string and return it
 	 */
 	*cp = '\0';
+
+	if (buf != NULL)
+	{
+		uint32 data_len = cp - str;
+		outStringCompletePhase(buf, data_len);
+	}
 	return str;
 }
 
@@ -7691,7 +7745,7 @@ get_str_from_var_sci(const NumericVar *var, int rscale)
 
 	power_ten_int(exponent, &tmp_var);
 	div_var(var, &tmp_var, &tmp_var, rscale, true);
-	sig_out = get_str_from_var(&tmp_var);
+	sig_out = get_str_from_var(&tmp_var, NULL);
 
 	free_var(&tmp_var);
 
@@ -8344,7 +8398,7 @@ numericvar_to_double_no_overflow(const NumericVar *var)
 	double		val;
 	char	   *endptr;
 
-	tmp = get_str_from_var(var);
+	tmp = get_str_from_var(var, NULL);
 
 	/* unlike float8in, we ignore ERANGE from strtod */
 	val = strtod(tmp, &endptr);
diff --git a/src/backend/utils/adt/oid.c b/src/backend/utils/adt/oid.c
index 56fb1fd77c..db34d9b6ea 100644
--- a/src/backend/utils/adt/oid.c
+++ b/src/backend/utils/adt/oid.c
@@ -53,6 +53,22 @@ oidout(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(result);
 }
 
+Datum
+oidprint(PG_FUNCTION_ARGS)
+{
+	Oid			o = PG_GETARG_OID(0);
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+	uint32 data_len;
+	char *data;
+
+	/* 12 is the max length for an oid's text presentation. */
+	data = outStringReserveLen(buf, 12);
+	data_len = pg_snprintf(data, 12, "%u", o);
+	outStringCompletePhase(buf, data_len);
+
+	PG_RETURN_VOID();
+}
+
 /*
  *		oidrecv			- converts external binary format to oid
  */
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index db9eea9098..9068682bca 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -95,7 +95,7 @@ static bool AdjustIntervalForTypmod(Interval *interval, int32 typmod,
 static TimestampTz timestamp2timestamptz(Timestamp timestamp);
 static Timestamp timestamptz2timestamp(TimestampTz timestamp);
 
-static void EncodeSpecialInterval(const Interval *interval, char *str);
+static int EncodeSpecialInterval(const Interval *interval, char *str);
 static void interval_um_internal(const Interval *interval, Interval *result);
 
 /* common code for timestamptypmodin and timestamptztypmodin */
@@ -252,6 +252,33 @@ timestamp_out(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(result);
 }
 
+Datum
+timestamp_print(PG_FUNCTION_ARGS)
+{
+	Timestamp	timestamp = PG_GETARG_TIMESTAMP(0);
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+	struct pg_tm tt,
+			   *tm = &tt;
+	fsec_t		fsec;
+	char	*data;
+	uint32 data_len;
+
+	data = outStringReserveLen(buf, MAXDATELEN + 1);
+
+	if (TIMESTAMP_NOT_FINITE(timestamp))
+		data_len = EncodeSpecialTimestamp(timestamp, data);
+	else if (timestamp2tm(timestamp, NULL, tm, &fsec, NULL, NULL) == 0)
+		data_len = EncodeDateTime(tm, fsec, false, 0, NULL, DateStyle, data);
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+				 errmsg("timestamp out of range")));
+
+	outStringCompletePhase(buf, data_len);
+
+	PG_RETURN_VOID();
+}
+
 /*
  *		timestamp_recv			- converts external binary format to timestamp
  */
@@ -796,6 +823,36 @@ timestamptz_out(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(result);
 }
 
+Datum
+timestamptz_print(PG_FUNCTION_ARGS)
+{
+	TimestampTz	timestamp = PG_GETARG_TIMESTAMPTZ(0);
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+	int tz;
+	const char *tzn;
+	struct pg_tm tt,
+			   *tm = &tt;
+	fsec_t		fsec;
+	char	*data;
+	uint32 data_len;
+
+	data = outStringReserveLen(buf, MAXDATELEN + 1);
+
+	if (TIMESTAMP_NOT_FINITE(timestamp))
+		data_len = EncodeSpecialTimestamp(timestamp, data);
+	else if (timestamp2tm(timestamp, &tz, tm, &fsec, &tzn, NULL) == 0)
+		data_len = EncodeDateTime(tm, fsec, true, tz, tzn, DateStyle, data);
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+				 errmsg("timestamp out of range")));
+
+	outStringCompletePhase(buf, data_len);
+
+	PG_RETURN_VOID();
+}
+
+
 /*
  *		timestamptz_recv			- converts external binary format to timestamptz
  */
@@ -989,6 +1046,35 @@ interval_out(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(result);
 }
 
+Datum
+interval_print(PG_FUNCTION_ARGS)
+{
+	Interval   *span = PG_GETARG_INTERVAL_P(0);
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+	struct pg_itm tt,
+			   *itm = &tt;
+	char *data;
+	uint32 data_len;
+
+	data = outStringReserveLen(buf, MAXDATELEN + 1);
+
+	if (INTERVAL_NOT_FINITE(span))
+		data_len = EncodeSpecialInterval(span, data);
+	else
+	{
+		interval2itm(*span, itm);
+		EncodeInterval(itm, IntervalStyle, data);
+		/*
+		 * XXX: making EncodeInterval return the string length is error-prone,
+		 * so call strlen directly on the result.
+		 */
+		data_len = strlen(data);
+	}
+	outStringCompletePhase(buf, data_len);
+
+	PG_RETURN_VOID();
+}
+
 /*
  *		interval_recv			- converts external binary format to interval
  */
@@ -1582,26 +1668,40 @@ out_of_range:
 /* EncodeSpecialTimestamp()
  * Convert reserved timestamp data type to string.
  */
-void
+int
 EncodeSpecialTimestamp(Timestamp dt, char *str)
 {
 	if (TIMESTAMP_IS_NOBEGIN(dt))
+	{
 		strcpy(str, EARLY);
+		return strlen(EARLY);
+	}
 	else if (TIMESTAMP_IS_NOEND(dt))
+	{
 		strcpy(str, LATE);
+		return strlen(LATE);
+	}
 	else						/* shouldn't happen */
 		elog(ERROR, "invalid argument for EncodeSpecialTimestamp");
 }
 
-static void
+static int
 EncodeSpecialInterval(const Interval *interval, char *str)
 {
 	if (INTERVAL_IS_NOBEGIN(interval))
+	{
 		strcpy(str, EARLY);
+		return strlen(EARLY);
+	}
 	else if (INTERVAL_IS_NOEND(interval))
+	{
 		strcpy(str, LATE);
+		return strlen(LATE);
+	}
 	else						/* shouldn't happen */
 		elog(ERROR, "invalid argument for EncodeSpecialInterval");
+
+	return 0;
 }
 
 Datum
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index 0c219dcc77..db4cc7c3ef 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -223,6 +223,25 @@ bpcharout(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(TextDatumGetCString(txt));
 }
 
+Datum
+bpcharprint(PG_FUNCTION_ARGS)
+{
+	Datum		txt = PG_GETARG_DATUM(0);
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+
+	/* XXX: could be improved by copying the varlena data into buf directly. */
+	char	*data = TextDatumGetCString(txt);
+	uint32	data_len = strlen(data);
+	char	*target;
+
+	target = outStringReserveLen(buf, data_len);
+	memcpy(target, data, data_len);
+	outStringCompletePhase(buf, data_len);
+
+	PG_RETURN_VOID();
+}
+
+
 /*
  *		bpcharrecv			- converts external binary format to bpchar
  */
@@ -520,6 +539,12 @@ varcharout(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(TextDatumGetCString(txt));
 }
 
+Datum
+varcharprint(PG_FUNCTION_ARGS)
+{
+	return bpcharprint(fcinfo);
+}
+
 /*
  *		varcharrecv			- converts external binary format to varchar
  */
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 7c6391a276..488d770bd2 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -594,6 +594,22 @@ textout(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(TextDatumGetCString(txt));
 }
 
+
+Datum
+textprint(PG_FUNCTION_ARGS)
+{
+	text		*txt = (text *) pg_detoast_datum((struct varlena *)PG_GETARG_POINTER(0));
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(1);
+	uint32 text_len = VARSIZE(txt) - VARHDRSZ;
+	char *data;
+
+	data = outStringReserveLen(buf, text_len);
+	memcpy(data, VARDATA(txt), text_len);
+	outStringCompletePhase(buf, text_len);
+
+	PG_RETURN_VOID();
+}
+
 /*
  *		textrecv			- converts external binary format to text
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 85f42be1b3..ab251a653b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -4718,7 +4718,6 @@
 { oid => '1799', descr => 'I/O',
   proname => 'oidout', prorettype => 'cstring', proargtypes => 'oid',
   prosrc => 'oidout' },
-
 { oid => '3058', descr => 'concatenate values',
   proname => 'concat', provariadic => 'any', proisstrict => 'f',
   provolatile => 's', prorettype => 'text', proargtypes => 'any',
@@ -12255,4 +12254,86 @@
   proargnames => '{summarized_tli,summarized_lsn,pending_lsn,summarizer_pid}',
   prosrc => 'pg_get_wal_summarizer_state' },
 
+{
+  oid => '9771', descr => 'I/O',
+  proname => 'oidprint', prorettype => 'void', proargtypes => 'oid internal',
+  prosrc => 'oidprint'},
+{
+  oid => '8907', descr => 'I/O',
+  proname => 'textprint', prorettype => 'void', proargtypes => 'text internal',
+  prosrc => 'textprint' },
+
+{
+  oid => '9234', descr => 'I/O',
+  proname => 'float4print', prorettype => 'void', proargtypes => 'float4 internal',
+  prosrc => 'float4print' },
+{
+  oid => '6313', descr => 'I/O',
+  proname => 'float8print', prorettype => 'void', proargtypes => 'float8 internal',
+  prosrc => 'float8print' },
+
+{
+  oid => '4099', descr => 'I/O',
+  proname => 'int2print', prorettype => 'void', proargtypes => 'int2 internal',
+  prosrc => 'int2print' },
+{
+  oid => '4100', descr => 'I/O',
+  proname => 'int4print', prorettype => 'void', proargtypes => 'int4 internal',
+  prosrc => 'int4print' },
+{
+  oid => '4551', descr => 'I/O',
+  proname => 'int8print', prorettype => 'void', proargtypes => 'int8 internal',
+  prosrc => 'int8print' },
+
+{
+  oid => '4552', descr => 'I/O',
+  proname => 'timeprint', prorettype => 'void', proargtypes => 'time internal',
+  prosrc => 'time_print' },
+
+{
+  oid => '4553', descr => 'I/O',
+  proname => 'timetzprint', prorettype => 'void', proargtypes => 'timetz internal',
+  prosrc => 'timetz_print' },
+
+{
+  oid => '4554', descr => 'I/O',
+  proname => 'dateprint', prorettype => 'void', proargtypes => 'date internal',
+  prosrc => 'date_print'},
+
+
+{
+  oid => '4555', descr => 'I/O',
+  proname => 'timestampprint', prorettype => 'void', proargtypes => 'timestamp internal',
+  prosrc => 'timestamp_print'},
+
+{
+  oid => '4556', descr => 'I/O',
+  proname => 'timestamptzprint', prorettype => 'void', proargtypes => 'timestamptz internal',
+  prosrc => 'timestamptz_print'},
+
+{
+  oid => '4557', descr => 'I/O',
+  proname => 'interval_print', prorettype => 'void', proargtypes => 'interval internal',
+  prosrc => 'interval_print'},
+
+{
+  oid => '4558', descr => 'I/O',
+  proname => 'numeric_print', prorettype => 'void', proargtypes => 'numeric internal',
+  prosrc => 'numeric_print'},
+
+{
+  oid => '4559', descr => 'I/O',
+  proname => 'charprint', prorettype => 'void', proargtypes => 'char internal',
+  prosrc => 'charprint'},
+
+{
+  oid => '4560', descr => 'I/O',
+  proname => 'bpcharprint', prorettype => 'void', proargtypes => 'bpchar internal',
+  prosrc => 'bpcharprint'},
+
+{
+  oid => '4561', descr => 'I/O',
+  proname => 'varcharprint', prorettype => 'void', proargtypes => 'varchar internal',
+  prosrc => 'varcharprint'},
+
 ]
diff --git a/src/include/lib/stringinfo.h b/src/include/lib/stringinfo.h
index cd9632e3fc..893a7825a2 100644
--- a/src/include/lib/stringinfo.h
+++ b/src/include/lib/stringinfo.h
@@ -240,4 +240,23 @@ extern void enlargeStringInfo(StringInfo str, int needed);
  */
 extern void destroyStringInfo(StringInfo str);
 
+/*
+ * outString - helpers for type-specific print functions (length word + data).
+ */
+static inline char *
+outStringReserveLen(StringInfo buf, uint32 data_len)
+{
+	/* sizeof(uint32) is for storing the data_len itself. */
+	enlargeStringInfo(buf, sizeof(uint32) + data_len);
+	return buf->data + buf->len + sizeof(uint32);
+}
+
+/* define outStringCompletePhase as macro to avoid including pg_bswap.h */
+#define outStringCompletePhase(buf, data_len) \
+do { \
+	*(uint32 *) ((buf)->data + (buf)->len) = pg_hton32(data_len); \
+	(buf)->len += sizeof(uint32) + (data_len); \
+} while (0)
+
+
 #endif							/* STRINGINFO_H */
diff --git a/src/include/utils/date.h b/src/include/utils/date.h
index aaed6471a6..5fe73d29da 100644
--- a/src/include/utils/date.h
+++ b/src/include/utils/date.h
@@ -103,7 +103,7 @@ extern TimestampTz date2timestamptz_opt_overflow(DateADT dateVal, int *overflow)
 extern int32 date_cmp_timestamp_internal(DateADT dateVal, Timestamp dt2);
 extern int32 date_cmp_timestamptz_internal(DateADT dateVal, TimestampTz dt2);
 
-extern void EncodeSpecialDate(DateADT dt, char *str);
+extern int EncodeSpecialDate(DateADT dt, char *str);
 extern DateADT GetSQLCurrentDate(void);
 extern TimeTzADT *GetSQLCurrentTime(int32 typmod);
 extern TimeADT GetSQLLocalTime(int32 typmod);
diff --git a/src/include/utils/datetime.h b/src/include/utils/datetime.h
index e4ac2b8e7f..9d994fd851 100644
--- a/src/include/utils/datetime.h
+++ b/src/include/utils/datetime.h
@@ -330,11 +330,11 @@ extern int	DetermineTimeZoneAbbrevOffset(struct pg_tm *tm, const char *abbr, pg_
 extern int	DetermineTimeZoneAbbrevOffsetTS(TimestampTz ts, const char *abbr,
 											pg_tz *tzp, int *isdst);
 
-extern void EncodeDateOnly(struct pg_tm *tm, int style, char *str);
-extern void EncodeTimeOnly(struct pg_tm *tm, fsec_t fsec, bool print_tz, int tz, int style, char *str);
-extern void EncodeDateTime(struct pg_tm *tm, fsec_t fsec, bool print_tz, int tz, const char *tzn, int style, char *str);
+extern int EncodeDateOnly(struct pg_tm *tm, int style, char *str);
+extern int EncodeTimeOnly(struct pg_tm *tm, fsec_t fsec, bool print_tz, int tz, int style, char *str);
+extern int EncodeDateTime(struct pg_tm *tm, fsec_t fsec, bool print_tz, int tz, const char *tzn, int style, char *str);
 extern void EncodeInterval(struct pg_itm *itm, int style, char *str);
-extern void EncodeSpecialTimestamp(Timestamp dt, char *str);
+extern int EncodeSpecialTimestamp(Timestamp dt, char *str);
 
 extern int	ValidateDate(int fmask, bool isjulian, bool is2digits, bool bc,
 						 struct pg_tm *tm);
-- 
2.45.1
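
For readers who want to try the reserve/complete idea outside the tree, here is a minimal standalone sketch. It is not the patch's code: OutBuf and the outbuf_* functions are hypothetical stand-ins for StringInfo, outStringReserveLen and outStringCompletePhase, and it uses malloc/realloc and htonl instead of palloc and pg_hton32. It also writes the length word with memcpy, on the assumption that the position of the length word inside the buffer may not be 4-byte aligned.

```c
#include <arpa/inet.h>			/* htonl, ntohl */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for StringInfo: a growable byte buffer. */
typedef struct
{
	char	   *data;
	size_t		len;			/* bytes used so far */
	size_t		cap;			/* bytes allocated */
} OutBuf;

static void
outbuf_init(OutBuf *buf)
{
	buf->cap = 64;
	buf->len = 0;
	buf->data = malloc(buf->cap);
	if (buf->data == NULL)
		abort();
}

/*
 * Make room for a 4-byte length word plus up to max_len payload bytes and
 * return the address where the payload should be written.  buf->len is not
 * advanced here; that happens in outbuf_complete().
 */
static char *
outbuf_reserve(OutBuf *buf, uint32_t max_len)
{
	size_t		need = buf->len + sizeof(uint32_t) + max_len;

	if (need > buf->cap)
	{
		while (buf->cap < need)
			buf->cap *= 2;
		buf->data = realloc(buf->data, buf->cap);
		if (buf->data == NULL)
			abort();
	}
	return buf->data + buf->len + sizeof(uint32_t);
}

/*
 * Store the real payload length in network byte order in front of the
 * payload, then advance buf->len past both.
 */
static void
outbuf_complete(OutBuf *buf, uint32_t data_len)
{
	uint32_t	netlen = htonl(data_len);

	/* The caller must not have written more than it reserved. */
	assert(buf->len + sizeof(uint32_t) + data_len <= buf->cap);
	memcpy(buf->data + buf->len, &netlen, sizeof(netlen));
	buf->len += sizeof(uint32_t) + data_len;
}

/* Append one value whose length is already known (the "textprint" case). */
static void
outbuf_put_text(OutBuf *buf, const char *src, uint32_t src_len)
{
	char	   *dst = outbuf_reserve(buf, src_len);

	memcpy(dst, src, src_len);
	outbuf_complete(buf, src_len);
}

/* Read back the length word stored at the given offset. */
static uint32_t
outbuf_field_len(const OutBuf *buf, size_t offset)
{
	uint32_t	netlen;

	memcpy(&netlen, buf->data + offset, sizeof(netlen));
	return ntohl(netlen);
}
```

A caller that does not know the length up front (the interval_print case) would reserve the maximum size, format into the returned pointer, and then pass the actual length to outbuf_complete().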