Index size

Started by Samuel Williamsover 9 years ago21 messagesgeneral

space.ship.traveller@gmail.com

over 9 years ago

I'd like to understand a bit more about indexes in PG.

When I have a row in a table, and an index, say, for a single column,
does that duplicate the entire row on disk? Or is there some kind of
id lookup involved?

Thanks.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Adrian Klaver

adrian.klaver@aklaver.com

over 9 years ago

In reply to: Samuel Williams (#1)

Re: Index size

On 11/29/2016 03:30 PM, Samuel Williams wrote:

I'd like to understand a bit more about indexes in PG.

https://www.postgresql.org/docs/9.5/static/indexam.html

"An index is effectively a mapping from some data key values to tuple
identifiers, or TIDs, of row versions (tuples) in the index's parent
table. A TID consists of a block number and an item number within that
block (see Section 63.6). This is sufficient information to fetch a
particular row version from the table. ..."

When I have a row in a table, and an index, say, for a single column,
does that duplicate the entire row on disk? Or is there some kind of
id lookup involved?

Thanks.

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Samuel Williams

space.ship.traveller@gmail.com

over 9 years ago

In reply to: Adrian Klaver (#2)

Re: Index size

Is there any reason why for the same data set, and same indexes, that
the data in postgres would be significantly larger than
innodb/mariadb?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Karsten Hilbert

Karsten.Hilbert@gmx.net

over 9 years ago

In reply to: Samuel Williams (#3)

Re: Index size

On Thu, Dec 01, 2016 at 12:38:37AM +1300, Samuel Williams wrote:

Is there any reason why for the same data set, and same indexes, that
the data in postgres would be significantly larger than
innodb/mariadb?

Sure: because they do entirely different things on-disk.

Regards,
Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Adrian Klaver

adrian.klaver@aklaver.com

over 9 years ago

In reply to: Samuel Williams (#3)

Re: Index size

On 11/30/2016 03:38 AM, Samuel Williams wrote:

Is there any reason why for the same data set, and same indexes, that
the data in postgres would be significantly larger than
innodb/mariadb?

Hard to say without the table/index definitions and some indication of
what the data is.

What version of Postgres?

Also how are you measuring the size of the data set?

Have you looked at/tried the functions here?:

https://www.postgresql.org/docs/9.5/static/functions-admin.html#FUNCTIONS-ADMIN-DBOBJECT

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Samuel Williams

space.ship.traveller@gmail.com

over 9 years ago

In reply to: Adrian Klaver (#5)

Re: Index size

Thanks everyone for your feedback so far. I've done a bit more digging:

MySQL in MBytes (about 350 million rows):

Postgres (about 250 million rows):

So, the index is a bit bigger, plus there is also the PKEY index which
increases disk usage by another whole index. Keep in mind in the
above, MySQL has about 40% more data.

With some indexes, it looks like MySQL might not be adding all data to
the index (e.g. ignoring NULL values). Does MySQL ignore null values
in an index? Can we get the same behaviour in Postgres to minimise
usage? What would be the recommendation here?

For the composite indexes, if any value is null, we are also less
interested in it, so it could be ignored from the index.

Thanks,
Samuel

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Steve Atkins

steve@blighty.com

over 9 years ago

In reply to: Samuel Williams (#6)

Re: Index size

On Dec 3, 2016, at 3:57 PM, Samuel Williams <space.ship.traveller@gmail.com> wrote:

Thanks everyone for your feedback so far. I've done a bit more digging:

MySQL in MBytes (about 350 million rows):

index_user_event_on_what_category_id_created_at_latlng | 22806.00
index_user_event_for_reporting | 18211.00
index_user_event_on_created_at | 9519.00
index_user_event_on_user_id | 6884.00
index_user_event_on_poi_id | 4891.00
index_user_event_on_deal_id | 3979.00

Postgres (about 250 million rows):

index_user_event_on_what_category_id_created_at_latlng | 25 GB
index_user_event_for_reporting | 19 GB
index_user_event_on_created_at | 7445 MB
index_user_event_on_user_id | 7274 MB
index_user_event_on_deal_id | 7132 MB
index_user_event_on_poi_id | 7099 MB

So, the index is a bit bigger, plus there is also the PKEY index which
increases disk usage by another whole index. Keep in mind in the
above, MySQL has about 40% more data.

With some indexes, it looks like MySQL might not be adding all data to
the index (e.g. ignoring NULL values). Does MySQL ignore null values
in an index? Can we get the same behaviour in Postgres to minimise
usage? What would be the recommendation here?

It's unlikely anyone will be able to usefully answer the questions you
should be asking without seeing the schema and index definitions,
and maybe some clues about how you're querying the data.

Cheers,
Steve

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Melvin Davidson

melvin6925@gmail.com

over 9 years ago

In reply to: Steve Atkins (#7)

Re: Index size

On Sat, Dec 3, 2016 at 9:32 PM, Steve Atkins <steve@blighty.com> wrote:

On Dec 3, 2016, at 3:57 PM, Samuel Williams <space.ship.traveller@gmail.

com> wrote:

Thanks everyone for your feedback so far. I've done a bit more digging:

MySQL in MBytes (about 350 million rows):

index_user_event_on_what_category_id_created_at_latlng | 22806.00
index_user_event_for_reporting | 18211.00
index_user_event_on_created_at | 9519.00
index_user_event_on_user_id | 6884.00
index_user_event_on_poi_id | 4891.00
index_user_event_on_deal_id | 3979.00

Postgres (about 250 million rows):

index_user_event_on_what_category_id_created_at_latlng | 25 GB
index_user_event_for_reporting | 19 GB
index_user_event_on_created_at | 7445 MB
index_user_event_on_user_id | 7274 MB
index_user_event_on_deal_id | 7132 MB
index_user_event_on_poi_id | 7099 MB

So, the index is a bit bigger, plus there is also the PKEY index which
increases disk usage by another whole index. Keep in mind in the
above, MySQL has about 40% more data.

With some indexes, it looks like MySQL might not be adding all data to
the index (e.g. ignoring NULL values). Does MySQL ignore null values
in an index? Can we get the same behaviour in Postgres to minimise
usage? What would be the recommendation here?

It's unlikely anyone will be able to usefully answer the questions you
should be asking without seeing the schema and index definitions,
and maybe some clues about how you're querying the data.

Cheers,
Steve

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

You are comparing apples to oranges. MySQL and PostgreSQL engines are
different by design, so likewise
the size of the indexes will be different.
You may as well ask why a 2015 Porsche 911 Carrera 6 cyl, 3.4 L,
Auto(AM-S7) gets 22 MPG,
but the 2016 Jaguar XF MPG gets 24 mpg.

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

Samuel Williams

space.ship.traveller@gmail.com

over 9 years ago

In reply to: Melvin Davidson (#8)

Re: Index size

So, uh, my main question was, does MySQL add null values to an index, and
is this different from Postgres. The schema is irrelevant, except that the
column allows null values. I noticed when you create an index you can add a
where clause. Could it be I should add WHERE the fields are not null?

Here is the schema. Any advice or suggestions most welcome.

-- PostgreSQL database dump

-- Dumped from database version 9.5.4

-- Dumped by pg_dump version 9.5.4

SET statement_timeout = 0;

SET lock_timeout = 0;

SET client_encoding = 'UTF8';

SET standard_conforming_strings = on;

SET check_function_bodies = false;

SET client_min_messages = warning;

SET row_security = off;

SET search_path = public, pg_catalog;

SET default_tablespace = '';

SET default_with_oids = false;

-- Name: user_event; Type: TABLE; Schema: public; Owner: http

CREATE TABLE user_event (

id integer NOT NULL,

user_id integer NOT NULL,

latitude numeric(9,6),

longitude numeric(9,6),

active boolean DEFAULT true NOT NULL,

poi_id integer,

deal_id integer,

category_id integer,

what character varying(32) NOT NULL,

locale character varying(8),

created_at timestamp without time zone NOT NULL,

parameters jsonb

);

ALTER TABLE user_event OWNER TO http;

-- Name: user_event_id_seq; Type: SEQUENCE; Schema: public; Owner: http

CREATE SEQUENCE user_event_id_seq

START WITH 1

INCREMENT BY 1

NO MINVALUE

NO MAXVALUE

CACHE 1;

ALTER TABLE user_event_id_seq OWNER TO http;

-- Name: user_event_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner:
http

ALTER SEQUENCE user_event_id_seq OWNED BY user_event.id;

-- Name: id; Type: DEFAULT; Schema: public; Owner: http

ALTER TABLE ONLY user_event ALTER COLUMN id SET DEFAULT
nextval('user_event_id_seq'::regclass);

-- Name: user_event_pkey; Type: CONSTRAINT; Schema: public; Owner: http

ALTER TABLE ONLY user_event

ADD CONSTRAINT user_event_pkey PRIMARY KEY (id);

-- Name: index_user_event_for_reporting; Type: INDEX; Schema: public;
Owner: http

CREATE INDEX index_user_event_for_reporting ON user_event USING btree
(latitude, longitude, created_at, user_id);

-- Name: index_user_event_on_created_at; Type: INDEX; Schema: public;
Owner: http

CREATE INDEX index_user_event_on_created_at ON user_event USING btree
(created_at);

-- Name: index_user_event_on_deal_id; Type: INDEX; Schema: public; Owner:
http

CREATE INDEX index_user_event_on_deal_id ON user_event USING btree
(deal_id);

-- Name: index_user_event_on_poi_id; Type: INDEX; Schema: public; Owner:
http

CREATE INDEX index_user_event_on_poi_id ON user_event USING btree (poi_id);

-- Name: index_user_event_on_user_id; Type: INDEX; Schema: public; Owner:
http

CREATE INDEX index_user_event_on_user_id ON user_event USING btree
(user_id);

-- Name: index_user_event_on_what_category_id_created_at_latlng; Type:
INDEX; Schema: public; Owner: http

CREATE INDEX index_user_event_on_what_category_id_created_at_latlng ON
user_event USING btree (what, category_id, created_at, latitude, longitude);

-- PostgreSQL database dump complete

#10

Samuel Williams

space.ship.traveller@gmail.com

over 9 years ago

In reply to: Melvin Davidson (#8)

Re: Index size

Melvin, of course there are differences. However, I suspect there are at
least SOME tangible differences which can be identified.

On 4 December 2016 at 15:53, Melvin Davidson <melvin6925@gmail.com> wrote:

Show quoted text

On Sat, Dec 3, 2016 at 9:32 PM, Steve Atkins <steve@blighty.com> wrote:

On Dec 3, 2016, at 3:57 PM, Samuel Williams <

space.ship.traveller@gmail.com> wrote:

Thanks everyone for your feedback so far. I've done a bit more digging:

MySQL in MBytes (about 350 million rows):

index_user_event_on_what_category_id_created_at_latlng | 22806.00
index_user_event_for_reporting | 18211.00
index_user_event_on_created_at | 9519.00
index_user_event_on_user_id | 6884.00
index_user_event_on_poi_id | 4891.00
index_user_event_on_deal_id | 3979.00

Postgres (about 250 million rows):

index_user_event_on_what_category_id_created_at_latlng | 25 GB
index_user_event_for_reporting | 19 GB
index_user_event_on_created_at | 7445 MB
index_user_event_on_user_id | 7274 MB
index_user_event_on_deal_id | 7132 MB
index_user_event_on_poi_id | 7099 MB

So, the index is a bit bigger, plus there is also the PKEY index which
increases disk usage by another whole index. Keep in mind in the
above, MySQL has about 40% more data.

With some indexes, it looks like MySQL might not be adding all data to
the index (e.g. ignoring NULL values). Does MySQL ignore null values
in an index? Can we get the same behaviour in Postgres to minimise
usage? What would be the recommendation here?

It's unlikely anyone will be able to usefully answer the questions you
should be asking without seeing the schema and index definitions,
and maybe some clues about how you're querying the data.

Cheers,
Steve

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

You are comparing apples to oranges. MySQL and PostgreSQL engines are
different by design, so likewise
the size of the indexes will be different.
You may as well ask why a 2015 Porsche 911 Carrera 6 cyl, 3.4 L,
Auto(AM-S7) gets 22 MPG,
but the 2016 Jaguar XF MPG gets 24 mpg.

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#11

Melvin Davidson

melvin6925@gmail.com

over 9 years ago

In reply to: Samuel Williams (#10)

Re: Index size

On Sun, Dec 4, 2016 at 4:43 AM, Samuel Williams <
space.ship.traveller@gmail.com> wrote:

Melvin, of course there are differences. However, I suspect there are at
least SOME tangible differences which can be identified.

On 4 December 2016 at 15:53, Melvin Davidson <melvin6925@gmail.com> wrote:

On Sat, Dec 3, 2016 at 9:32 PM, Steve Atkins <steve@blighty.com> wrote:

On Dec 3, 2016, at 3:57 PM, Samuel Williams <

space.ship.traveller@gmail.com> wrote:

Thanks everyone for your feedback so far. I've done a bit more digging:

MySQL in MBytes (about 350 million rows):

index_user_event_on_what_category_id_created_at_latlng | 22806.00
index_user_event_for_reporting | 18211.00
index_user_event_on_created_at | 9519.00
index_user_event_on_user_id | 6884.00
index_user_event_on_poi_id | 4891.00
index_user_event_on_deal_id | 3979.00

Postgres (about 250 million rows):

index_user_event_on_what_category_id_created_at_latlng | 25 GB
index_user_event_for_reporting | 19 GB
index_user_event_on_created_at | 7445 MB
index_user_event_on_user_id | 7274 MB
index_user_event_on_deal_id | 7132 MB
index_user_event_on_poi_id | 7099 MB

So, the index is a bit bigger, plus there is also the PKEY index which
increases disk usage by another whole index. Keep in mind in the
above, MySQL has about 40% more data.

With some indexes, it looks like MySQL might not be adding all data to
the index (e.g. ignoring NULL values). Does MySQL ignore null values
in an index? Can we get the same behaviour in Postgres to minimise
usage? What would be the recommendation here?

It's unlikely anyone will be able to usefully answer the questions you
should be asking without seeing the schema and index definitions,
and maybe some clues about how you're querying the data.

Cheers,
Steve

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

You are comparing apples to oranges. MySQL and PostgreSQL engines are
different by design, so likewise
the size of the indexes will be different.
You may as well ask why a 2015 Porsche 911 Carrera 6 cyl, 3.4 L,
Auto(AM-S7) gets 22 MPG,
but the 2016 Jaguar XF MPG gets 24 mpg.

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

*Samuel, *

*Please note that the accepted convention in this forum is to bottom post.
Please do not top post.>Melvin, of course there are differences. However, I
suspect there are at least SOME tangible differences which can
be identified.*

*I have to ask, WHY do you think it is important to identify any
differences in the index methods.*

*What is important is that you understand why you need indexes and how they
are used.Tutorial -
Indexeshttps://www.tutorialspoint.com/postgresql/postgresql_indexes.htm
<https://www.tutorialspoint.com/postgresql/postgresql_indexes.htm>CREATE
INDEXhttps://www.postgresql.org/docs/9.4/static/sql-createindex.html
<https://www.postgresql.org/docs/9.4/static/sql-createindex.html>*
--

*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#12

Andreas Joseph Krogh

andreas@visena.com

over 9 years ago

In reply to: Melvin Davidson (#11)

Re: Index size

På søndag 04. desember 2016 kl. 16:51:54, skrev Melvin Davidson <
melvin6925@gmail.com <mailto:melvin6925@gmail.com>>:

On Sun, Dec 4, 2016 at 4:43 AM, Samuel Williams <
space.ship.traveller@gmail.com <mailto:space.ship.traveller@gmail.com>> wrote:
Melvin, of course there are differences. However, I suspect there are at least
SOME tangible differences which can be identified.

On 4 December 2016 at 15:53, Melvin Davidson <melvin6925@gmail.com
<mailto:melvin6925@gmail.com>> wrote: On Sat, Dec 3, 2016 at 9:32 PM, Steve
Atkins<steve@blighty.com <mailto:steve@blighty.com>> wrote:

On Dec 3, 2016, at 3:57 PM, Samuel Williams <space.ship.traveller@gmail.com

<mailto:space.ship.traveller@gmail.com>> wrote:

Thanks everyone for your feedback so far. I've done a bit more digging:

MySQL in MBytes (about 350 million rows):

index_user_event_on_what_category_id_created_at_latlng | 22806.00
index_user_event_for_reporting | 18211.00
index_user_event_on_created_at | 9519.00
index_user_event_on_user_id | 6884.00
index_user_event_on_poi_id | 4891.00
index_user_event_on_deal_id | 3979.00

Postgres (about 250 million rows):

index_user_event_on_what_category_id_created_at_latlng | 25 GB
index_user_event_for_reporting | 19 GB
index_user_event_on_created_at | 7445 MB
index_user_event_on_user_id | 7274 MB
index_user_event_on_deal_id | 7132 MB
index_user_event_on_poi_id | 7099 MB

So, the index is a bit bigger, plus there is also the PKEY index which
increases disk usage by another whole index. Keep in mind in the
above, MySQL has about 40% more data.

With some indexes, it looks like MySQL might not be adding all data to
the index (e.g. ignoring NULL values). Does MySQL ignore null values
in an index? Can we get the same behaviour in Postgres to minimise
usage? What would be the recommendation here?

It's unlikely anyone will be able to usefully answer the questions you
should be asking without seeing the schema and index definitions,
and maybe some clues about how you're querying the data.

Cheers,
Steve
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org
<mailto:pgsql-general@postgresql.org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
<http://www.postgresql.org/mailpref/pgsql-general>

You are comparing apples to oranges. MySQL and PostgreSQL engines are
different by design, so likewise
the size of the indexes will be different.
You may as well ask why a 2015 Porsche 911 Carrera 6 cyl, 3.4 L, Auto(AM-S7)
gets 22 MPG,
but the 2016 Jaguar XF MPG gets 24 mpg.

-- Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

Samuel,
Please note that the accepted convention in this forum is to bottom post.
Please do not top post.

The accepted convention in this forum is to not have purple signature:-)

-- Andreas Joseph Krogh

#13

Samuel Williams

space.ship.traveller@gmail.com

over 9 years ago

In reply to: Andreas Joseph Krogh (#12)

Re: Index size

Melvin, uh... I'm a software engineer... since when was it a problem to
want to know how things work and why they are different? If you have
nothing to contribute of a relevant technical nature, please don't reply,
I'm really not interested.....

#14

Joshua D. Drake

jd@commandprompt.com

over 9 years ago

In reply to: Samuel Williams (#6)

Re: Index size

On 12/03/2016 03:57 PM, Samuel Williams wrote:

With some indexes, it looks like MySQL might not be adding all data to
the index (e.g. ignoring NULL values). Does MySQL ignore null values
in an index? Can we get the same behaviour in Postgres to minimise
usage? What would be the recommendation here?

I don't think you are going to find many answers about the MySQL
implementation (since it will be storage engine specific and it isn't
Postgres) on a PostgreSQL list.

Sincerely,

--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#15

Melvin Davidson

melvin6925@gmail.com

over 9 years ago

In reply to: Samuel Williams (#13)

Re: Index size

On Mon, Dec 5, 2016 at 8:06 PM, Samuel Williams <
space.ship.traveller@gmail.com> wrote:

Melvin, uh... I'm a software engineer... since when was it a problem to
want to know how things work and why they are different? If you have
nothing to contribute of a relevant technical nature, please don't reply,
I'm really not interested.....

Samuel,

since when was it a problem to want to know how things work and why they

are different?
I am a retired DBA/Software Engineer/Customer Engineer. I have worked all
aspects of IT.
Hardware, Software, Database, etc. I never said there was a problem wanting
to know
the difference between things. What I did say was that there IS a
difference. You are
concentrating your inquiry in the wrong area. If you want to know something
specific about
a PostgreSQL index, then ask that, but don't expect anyone to explain why
it's different
from a MySQL index, because there is no point in it.

Could it be I should add WHERE the fields are not null?

That depends on the queries they will be used with.
If you have a WHERE clause in your query that requires a not null field,
then the index
will be useful, providing the table is large enough to make the index scan
faster than
a sequential scan.

So if you have a query like:
SELECT id, user_id, poi_id
FROM user_event
WHERE deal_id IS NULL;

Then, yes, adding a WHERE clause to index_user_event_on_deal_id for NULL
deal_id's
is fine.

But if you want
SELECT id, user_id, poi_id
FROM user_event
WHERE deal_id = <some_value> ;
Then you don't need the WHERE deal_id IS NULL in your index.

BTW, most dba's prefer simpler, shorter, index names.
EG: idx_user_event_deal_id

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#16

K. Brannen

kbrannen@pwhome.com

over 9 years ago

In reply to: Melvin Davidson (#15)

Re: Index size

Samuel Williams <space.ship.traveller@gmail.com> wrote:

So, uh, my main question was, does MySQL add null values to an index, and is this different from Postgres...

Samuel,

A quick google says that Mysql does index NULLs. Ask a Mysql group to get a more definitive answer.

More relevant to your original question, I'll go out on a limb as I struggle to recall a fuzzy memory.

The difference between Mysql and Postgresql is fundamental architecture, so yes the index creation will be very different, as others have said. IIRC (and I may not be), Mysql stores where a row is on the disk via the PK index. That means that secondary indexes point to the proper row in the PK index, which does mean that when you use a secondary index to get data that there is a double look up. They claim that's faster for updates and other stuff because a change to a row only requires 1 index to be changed.

Postgresql stores the direct disk location in each index, which slows down updates a little, but makes selects faster (and I find this really amusing because so many people say Mysql is great because it's fast at reads, yet architecturally PG is faster). If I'm wrong here, I'm sure I'll be corrected. :)

So you can see that Mysql indexes should be smaller than PG indexes because of what they carry. Personally, I think the diff is small enough I'm not going to worry about it, but math is such that some numbers are smaller than others. :) So that should explain what you're seeing.

My opinion is that you shouldn't worry about the index size. Which DB does what you want the best? That obviously depends on what your needs are, but after using both Mysql and PG, I'll take PG whenever possible, thank you. Mysql has gotten better over the last 5-8 years, but there are still many pits of quicksand ready to swallow you up there that PG doesn't have. If you know where those pits are and/or don't travel into that part of the jungle, great for you; personally, I prefer to get the job done without having to watch where I'm walking. ;)

HTH,
Kevin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Import Notes

Resolved by subject fallback

#17

Samuel Williams

space.ship.traveller@gmail.com

over 9 years ago

In reply to: K. Brannen (#16)

Re: Index size

Thanks Kevin, that makes sense. Yeah, I understand the architectural
difference a bit more now. I also read that when you change a column
which is not index, all the indexes for that row need to be updated
anyway. Is that correct?

On 7 December 2016 at 05:27, <kbrannen@pwhome.com> wrote:

Samuel Williams <space.ship.traveller@gmail.com> wrote:

So, uh, my main question was, does MySQL add null values to an index, and is this different from Postgres...

Samuel,

A quick google says that Mysql does index NULLs. Ask a Mysql group to get a more definitive answer.

More relevant to your original question, I'll go out on a limb as I struggle to recall a fuzzy memory.

The difference between Mysql and Postgresql is fundamental architecture, so yes the index creation will be very different, as others have said. IIRC (and I may not be), Mysql stores where a row is on the disk via the PK index. That means that secondary indexes point to the proper row in the PK index, which does mean that when you use a secondary index to get data that there is a double look up. They claim that's faster for updates and other stuff because a change to a row only requires 1 index to be changed.

Postgresql stores the direct disk location in each index, which slows down updates a little, but makes selects faster (and I find this really amusing because so many people say Mysql is great because it's fast at reads, yet architecturally PG is faster). If I'm wrong here, I'm sure I'll be corrected. :)

So you can see that Mysql indexes should be smaller than PG indexes because of what they carry. Personally, I think the diff is small enough I'm not going to worry about it, but math is such that some numbers are smaller than others. :) So that should explain what you're seeing.

My opinion is that you shouldn't worry about the index size. Which DB does what you want the best? That obviously depends on what your needs are, but after using both Mysql and PG, I'll take PG whenever possible, thank you. Mysql has gotten better over the last 5-8 years, but there are still many pits of quicksand ready to swallow you up there that PG doesn't have. If you know where those pits are and/or don't travel into that part of the jungle, great for you; personally, I prefer to get the job done without having to watch where I'm walking. ;)

HTH,
Kevin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#18

Melvin Davidson

melvin6925@gmail.com

over 9 years ago

In reply to: Samuel Williams (#17)

Re: Index size

On Fri, Dec 9, 2016 at 6:40 PM, Samuel Williams <
space.ship.traveller@gmail.com> wrote:

Thanks Kevin, that makes sense. Yeah, I understand the architectural
difference a bit more now. I also read that when you change a column
which is not index, all the indexes for that row need to be updated
anyway. Is that correct?

On 7 December 2016 at 05:27, <kbrannen@pwhome.com> wrote:

Samuel Williams <space.ship.traveller@gmail.com> wrote:

So, uh, my main question was, does MySQL add null values to an index,

and is this different from Postgres...

Samuel,

A quick google says that Mysql does index NULLs. Ask a Mysql group to

get a more definitive answer.

More relevant to your original question, I'll go out on a limb as I

struggle to recall a fuzzy memory.

The difference between Mysql and Postgresql is fundamental architecture,

so yes the index creation will be very different, as others have said. IIRC
(and I may not be), Mysql stores where a row is on the disk via the PK
index. That means that secondary indexes point to the proper row in the PK
index, which does mean that when you use a secondary index to get data that
there is a double look up. They claim that's faster for updates and other
stuff because a change to a row only requires 1 index to be changed.

Postgresql stores the direct disk location in each index, which slows

down updates a little, but makes selects faster (and I find this really
amusing because so many people say Mysql is great because it's fast at
reads, yet architecturally PG is faster). If I'm wrong here, I'm sure I'll
be corrected. :)

So you can see that Mysql indexes should be smaller than PG indexes

because of what they carry. Personally, I think the diff is small enough
I'm not going to worry about it, but math is such that some numbers are
smaller than others. :) So that should explain what you're seeing.

My opinion is that you shouldn't worry about the index size. Which DB

does what you want the best? That obviously depends on what your needs are,
but after using both Mysql and PG, I'll take PG whenever possible, thank
you. Mysql has gotten better over the last 5-8 years, but there are still
many pits of quicksand ready to swallow you up there that PG doesn't have.
If you know where those pits are and/or don't travel into that part of the
jungle, great for you; personally, I prefer to get the job done without
having to watch where I'm walking. ;)

HTH,
Kevin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

*>I also read that when you change a column which is not index, all the

indexes for that row need to be updated anyway. Is that correct?That is

not correct. Indexes are changed under the following conditions:A. An
insert is done to the table which involves an index.B. A delete is done to
the table which involves an index.C. An update is done that involves
columns included in an index.*

*D. An index is REINDEXed *

*Indexes point to the tid of the row for which the column(s) in the
indexare involved. So if columns updated are not involved in the index,
there is no need to change the
index.https://www.postgresql.org/docs/9.4/static/sql-createindex.html
<https://www.postgresql.org/docs/9.4/static/sql-createindex.html>https://www.postgresql.org/docs/9.4/static/sql-reindex.html
<https://www.postgresql.org/docs/9.4/static/sql-reindex.html>*
--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#19

Peter J. Holzer

hjp-pgsql@hjp.at

over 9 years ago

In reply to: Melvin Davidson (#18)

Re: Index size

On 2016-12-09 21:45:35 -0500, Melvin Davidson wrote:

On Fri, Dec 9, 2016 at 6:40 PM, Samuel Williams <space.ship.traveller@gmail.com>
wrote:

I also read that when you change a column which is not index, all the
indexes for that row need to be updated anyway. Is that correct?

That is not correct. Indexes are changed under the following conditions:
A. An insert is done to the table which involves an index.
B. A delete is done to the table which involves an index.
C. An update is done that involves columns included in an index.
D. An index is REINDEXed

Indexes point to the tid of the row for which the column(s) in the index
are involved. So if columns updated are not involved in the index,
there is no need to change the index.

I don't think this is generally correct. The TID is a (block,item)
tuple. It the updated version of the row doesn't fit into the same block
it has to be stored in a different block, so the TID will change (AIUI
there is a bit of trickery to avoid changing the TID if the new version
is stored in the same block). This means that all the index entries for
this row (not just for the changed field) will have to be updated. You
can set fillfactor to a smaller value to make this less likely.

#20

Melvin Davidson

melvin6925@gmail.com

over 9 years ago

In reply to: Peter J. Holzer (#19)

Re: Index size

On Sat, Dec 10, 2016 at 5:42 PM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

On 2016-12-09 21:45:35 -0500, Melvin Davidson wrote:

On Fri, Dec 9, 2016 at 6:40 PM, Samuel Williams <

space.ship.traveller@gmail.com>

wrote:

I also read that when you change a column which is not index, all the
indexes for that row need to be updated anyway. Is that correct?

That is not correct. Indexes are changed under the following conditions:
A. An insert is done to the table which involves an index.
B. A delete is done to the table which involves an index.
C. An update is done that involves columns included in an index.
D. An index is REINDEXed

Indexes point to the tid of the row for which the column(s) in the index
are involved. So if columns updated are not involved in the index,
there is no need to change the index.

I don't think this is generally correct. The TID is a (block,item)
tuple. It the updated version of the row doesn't fit into the same block
it has to be stored in a different block, so the TID will change (AIUI
there is a bit of trickery to avoid changing the TID if the new version
is stored in the same block). This means that all the index entries for
this row (not just for the changed field) will have to be updated. You
can set fillfactor to a smaller value to make this less likely.

hp

--
_ | Peter J. Holzer | A coding theorist is someone who doesn't
|_|_) | | think Alice is crazy.
| | | hjp@hjp.at | -- John Gordon
__/ | http://www.hjp.at/ | http://downlode.org/Etext/alicebob.html

*Yes, I see your point, but the case where the row does not fit into the
same block would only occur with unlimited field types such as var[],
bytea[], etc. I believe that to be the exception, and not the rule, so can
we agree that we are both right in that for the general case indexes are
updated as I have described and for the exception they act as you describe?
*--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#21

Christophe Pettus

xof@thebuild.com

over 9 years ago

In reply to: Melvin Davidson (#20)