Modifying update_attstats of analyze.c for C Strings
Hi,
I am trying to implement functionality similar to ANALYZE, but it needs to
store different values for the MCV/histogram bounds (the values are valid
and are held in inp->str[][]) when the column under consideration is
varchar (C strings). I have written a function *dummy_update_attstats* with
the following changes. Everything else remains the same as in
*update_attstats* of *~/src/backend/commands/analyze.c*
---
{
    ArrayType *arry;

    if (strcmp(col_type, "varchar") == 0)
        arry = construct_array(stats->stavalues[k],
                               stats->numvalues[k],
                               CSTRINGOID,
                               -2,
                               false,
                               'c');
    else
        arry = construct_array(stats->stavalues[k],
                               stats->numvalues[k],
                               stats->statypid[k],
                               stats->statyplen[k],
                               stats->statypbyval[k],
                               stats->statypalign[k]);
    values[i++] = PointerGetDatum(arry);    /* stavaluesN */
}
---
and I update hist_values in the appropriate function as:
---
if (strcmp(col_type, "varchar") == 0)
    hist_values[i] = datumCopy(CStringGetDatum(inp->str[i][j]),
                               false,
                               -2);
---
I tried this based on the following reference:
/messages/by-id/attachment/20352/vacattrstats-extend.diff
My issue is: when I use my approach for strings, the MCV/histogram_bounds
values in pg_stats are not surrounded by double quotes (" "). That is,
with the normal *update_attstats*, histogram_bounds for *TPCH
nation(n_name)* is: *"ALGERIA ","ARGENTINA ",...*
With *dummy_update_attstats* as above, histogram_bounds for *TPCH
nation(n_name)* is: *ALGERIA,ARGENTINA,...*
This becomes a problem if a string contains commas (','), as for example in
the *n_comment* column of the *nation* table.
Could someone point out the problem and suggest a solution?
Thank you.
--
Regards,
Ashoke
As a follow-up question,
I found some varchar columns whose histogram_bounds are not surrounded by
double quotes (" ") even with the default implementation.
Ex: *c_name* column of the *Customer* table
I also found histogram_bounds in which only some strings are surrounded by
double quotes and others are not.
Ex: *c_address* column of the *Customer* table
Why are there such inconsistencies? How is this determined?
Thank you.
On Tue, Jul 8, 2014 at 10:52 AM, Ashoke <s.ashoke@gmail.com> wrote:
--
Regards,
Ashoke
OK, I was able to figure out that when a string contains spaces,
PostgreSQL surrounds it with double quotes.
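
For reference, here is a minimal sketch of that quoting rule as I understand it from the PostgreSQL documentation on array output: an element is double-quoted if it is an empty string, matches the word NULL, or contains curly braces, the delimiter character, double quotes, backslashes, or white space. The function names *element_needs_quotes* and *equals_ignore_case* are hypothetical helpers written for illustration; they are not PostgreSQL functions, and this is not the backend's actual code.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: case-insensitive string equality in plain C. */
static bool equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return false;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/*
 * Hypothetical helper (NOT a PostgreSQL function): returns true if the
 * text form of an array element would be double-quoted on output,
 * following the rule described in the array documentation.
 * delim is the element delimiter, ',' for most types.
 */
bool element_needs_quotes(const char *s, char delim)
{
    const char *p;

    if (s[0] == '\0')
        return true;                /* empty string */
    if (equals_ignore_case(s, "NULL"))
        return true;                /* would otherwise read back as NULL */

    for (p = s; *p != '\0'; p++)
    {
        if (*p == '"' || *p == '\\' || *p == '{' || *p == '}' ||
            *p == delim || isspace((unsigned char) *p))
            return true;            /* special character or white space */
    }
    return false;                   /* safe to print unquoted */
}
```

Under this rule, *element_needs_quotes("ALGERIA ", ',')* is true because of the trailing space, while *element_needs_quotes("ALGERIA", ',')* is false. A string containing a comma is also quoted, so the quoting depends on each element's content rather than on the column type, which would explain why only some values in the same histogram_bounds appear quoted.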
On Tue, Jul 8, 2014 at 12:04 PM, Ashoke <s.ashoke@gmail.com> wrote:
--
Regards,
Ashoke