mysql load_file() function

Started by Ben-Nes Yonatan · about 21 years ago · 18 messages · general
#1Ben-Nes Yonatan
da@canaan.co.il

Hi all,

Does anyone know if PostgreSQL has a function that works like
MySQL's load_file()?
I need it for uploading big files... I hit a memory limit when I try
to send a query with the file data embedded in it, and I received this
response to a question about it on the php-general mailing list:
"load_file() is a mysql internal function. the file is read by the mysql
server, so it completely bypasses php and also the client libraries. you
need to have the file privilege."
Or maybe there is another way to upload big queries without running into
the memory limit?

Another recommendation I received is to just upload the files to
the file system... but then I'd lose the integrity of the database
(though I can add safety checks to verify that everything is as it should
be), so I'd prefer to find a way to insert it into the DB.

My server is as follows:
1. PHP 4.3.9
2. DB - Postgresql 7.4
3. Apache 1.3.26

Thanks in advance,
Ben-Nes Yonatan

#2David Fetter
david@fetter.org
In reply to: Ben-Nes Yonatan (#1)
Re: mysql load_file() function

On Fri, Feb 04, 2005 at 09:27:08AM +0200, Ben-Nes Yonatan wrote:

Hi all,

Does anyone know if PostgreSQL has a function that works like
MySQL's load_file()?

I am not quite sure what load_file() does, but check the COPY command
and the analogous \copy in psql. As with many other PostgreSQL
commands, COPY has a man page in lower case:

man copy
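
As a rough sketch of the psql form (the table and file name here are just
placeholders):

CREATE TABLE items (id integer, name text);
\copy items from '/tmp/items.txt'

\copy reads the file on the client side, so it only has to be readable by
the user running psql, not by the database server.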

I need it for uploading big files... I hit a memory limit when I try
to send a query with the file data embedded in it, and I received this
response to a question about it on the php-general mailing list:
"load_file() is a mysql internal function. the file is read by the mysql
server, so it completely bypasses php and also the client libraries. you
need to have the file privilege."
Or maybe there is another way to upload big queries without running into
the memory limit?

Another recommendation I received is to just upload the files to
the file system... but then I'd lose the integrity of the database
(though I can add safety checks to verify that everything is as it should
be), so I'd prefer to find a way to insert it into the DB.

My server is as follows:
1. PHP 4.3.9
2. DB - Postgresql 7.4

Consider getting PostgreSQL 8.0.1 (or whatever the latest is when you
get this message ;)

3. Apache 1.3.26

Thanks in advance,
Ben-Nes Yonatan

HTH :)

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!

#3Ben-Nes Yonatan
da@canaan.co.il
In reply to: Ben-Nes Yonatan (#1)
Re: mysql load_file() function

On Fri, Feb 04, 2005 at 09:27:08AM +0200, Ben-Nes Yonatan wrote:

Hi all,

Does anyone know if PostgreSQL has a function that works like
MySQL's load_file()?

I am not quite sure what load_file() does, but check the COPY command
and the analogous \copy in psql. As with many other PostgreSQL
commands, COPY has a man page in lower case:

man copy

First, thanks for your answer David, but I'm afraid I still have a problem
with this solution... I'm not trying to upload a big file whose contents
should be divided into plenty of rows; I want to upload a big file
(wav, ppt, whatever...) into one column of one row... Maybe I can use COPY
to upload the rows to a temp table and then build the file in the primary
table from there, so that only PostgreSQL is working in its own
environment... though that seems "too heavy". Isn't there a more elegant
solution?

I need it for uploading big files... I hit a memory limit when I try
to send a query with the file data embedded in it, and I received this
response to a question about it on the php-general mailing list:
"load_file() is a mysql internal function. the file is read by the mysql
server, so it completely bypasses php and also the client libraries. you
need to have the file privilege."
Or maybe there is another way to upload big queries without running into
the memory limit?

Another recommendation I received is to just upload the files to
the file system... but then I'd lose the integrity of the database
(though I can add safety checks to verify that everything is as it should
be), so I'd prefer to find a way to insert it into the DB.

My server is as follows:
1. PHP 4.3.9
2. DB - Postgresql 7.4

Consider getting PostgreSQL 8.0.1 (or whatever the latest is when you
get this message ;)

Soon :)

3. Apache 1.3.26

Thanks in advance,
Ben-Nes Yonatan

HTH :)

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!

What's HTH? And for what should I vote? ;)

Thanks a lot
Yonatan

#4Venkatesh Babu
venkatbabukr@yahoo.com
In reply to: Ben-Nes Yonatan (#3)
Update command too slow

Hello,

We have a table cm_quotastates which has exactly
4624564 rows and 25 columns and 9 indexes... Out of
these, our code retrieves 75262 rows and modifies just
one column in each row... but updating these to the
database is taking some significant time (around 20
minutes)... Tried the following with the update:

(a) updating after removing all 9 indexes
associated with the table
(b) updating the 75K rows in batches
(c) VACUUM ANALYZE on the table before updating

but none of these help and the update still takes the same
amount of time. Is there anything else that can be
done so that the update takes less time... Also, where
can I find info about how a Postgres update actually
works?

Thanks,
Venkatesh


#5Shridhar Daithankar
ghodechhap@ghodechhap.net
In reply to: Ben-Nes Yonatan (#3)
Re: mysql load_file() function

On Friday 04 Feb 2005 7:04 pm, Ben-Nes Yonatan wrote:
First, thanks for your answer David, but I'm afraid I still have a problem
with this solution... I'm not trying to upload a big file whose contents
should be divided into plenty of rows; I want to upload a big file
(wav, ppt, whatever...) into one column of one row... Maybe I can use COPY
to upload the rows to a temp table and then build the file in the primary
table from there, so that only PostgreSQL is working in its own
environment... though that seems "too heavy". Isn't there a more elegant
solution?

Maybe this will help..

http://www.postgresql.org/docs/8.0/static/datatype-binary.html
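
For example, a bytea column can hold the file contents directly. A rough
sketch (the table name is made up, and real binary data would normally be
passed as an escaped/bound parameter from the client rather than written
inline):

CREATE TABLE files (
    name text PRIMARY KEY,
    data bytea
);

-- 'Hello' written as hex via decode(); a client library would normally
-- escape and bind the real binary payload instead of inlining it
INSERT INTO files (name, data) VALUES ('hello.txt', decode('48656c6c6f', 'hex'));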

HTH :)

What's HTH? And for what should I vote? ;)

HTH means "Hope this/that helps"..

Shridhar

#6Sean Davis
sdavis2@mail.nih.gov
In reply to: Ben-Nes Yonatan (#3)
Re: mysql load_file() function

On Feb 4, 2005, at 8:34 AM, Ben-Nes Yonatan wrote:

On Fri, Feb 04, 2005 at 09:27:08AM +0200, Ben-Nes Yonatan wrote:

Hi all,

Does anyone know if PostgreSQL has a function that works like
MySQL's load_file()?

I am not quite sure what load_file() does, but check the COPY command
and the analogous \copy in psql. As with many other PostgreSQL
commands, COPY has a man page in lower case:

man copy

First, thanks for your answer David, but I'm afraid I still have a problem
with this solution... I'm not trying to upload a big file whose contents
should be divided into plenty of rows; I want to upload a big file
(wav, ppt, whatever...) into one column of one row... Maybe I can use COPY
to upload the rows to a temp table and then build the file in the primary
table from there, so that only PostgreSQL is working in its own
environment... though that seems "too heavy". Isn't there a more elegant
solution?

Try looking here--again if I understand what you are trying to do....

http://www.postgresql.org/docs/8.0/interactive/largeobjects.html

Large objects in PostgreSQL can be files, images, etc. There are
several functions for manipulating them (loading, etc.) available in
PostgreSQL.
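
For the server-side route, a rough sketch using the built-in
lo_import()/lo_export() functions (the table name and paths are
placeholders; the file must sit on the database server's filesystem, and in
these versions server-side lo_import() needs superuser rights -- from psql,
\lo_import does the same thing from the client side):

CREATE TABLE media (
    id serial PRIMARY KEY,
    filename text,
    content oid
);

-- import a file from the server's filesystem into a large object
INSERT INTO media (filename, content)
    VALUES ('talk.ppt', lo_import('/tmp/talk.ppt'));

-- later, write it back out to the server's filesystem
SELECT lo_export(content, '/tmp/talk_copy.ppt')
    FROM media WHERE filename = 'talk.ppt';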

Sean

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Venkatesh Babu (#4)
Re: Update command too slow

Venkatesh Babu <venkatbabukr@yahoo.com> writes:

We have a table cm_quotastates which has exactly
4624564 rows and 25 columns and 9 indexes... Out of
these, our code retrieves 75262 rows and modifies just
one column in each row... but updating these to the
database is taking some significant time (around 20
minutes)... Tried the following with the update

Any triggers or foreign keys on that table? Also, what PG version is
this? Are you doing this in a single UPDATE command, or 75262 separate
commands?

regards, tom lane

#8Venkatesh Babu
venkatbabukr@yahoo.com
In reply to: Tom Lane (#7)
Re: Update command too slow

Hi,

There aren't any triggers, but there are 75262 update
statements. The problem is that we have a datatype
called "Collection"; we fetch the data rows into it,
modify the data, and call Collection.save(). This save
method generates one update statement per record
present in it.

Thanks,
Venkatesh

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

Venkatesh Babu <venkatbabukr@yahoo.com> writes:

We have a table cm_quotastates which has exactly
4624564 rows and 25 columns and 9 indexes... Out of
these, our code retrieves 75262 rows and modifies just
one column in each row... but updating these to the
database is taking some significant time (around 20
minutes)... Tried the following with the update

Any triggers or foreign keys on that table? Also, what PG version is
this? Are you doing this in a single UPDATE command, or 75262 separate
commands?

regards, tom lane


#9Venkatesh Babu
venkatbabukr@yahoo.com
In reply to: Venkatesh Babu (#8)
Re: Update command too slow

Hi,

There aren't any foreign keys and we are currently
using Postgres version 7.4...

--- Venkatesh Babu <venkatbabukr@yahoo.com> wrote:

Hi,

There aren't any triggers, but there are 75262 update
statements. The problem is that we have a datatype
called "Collection"; we fetch the data rows into it,
modify the data, and call Collection.save(). This save
method generates one update statement per record
present in it.

Thanks,
Venkatesh

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

Venkatesh Babu <venkatbabukr@yahoo.com> writes:

We have a table cm_quotastates which has exactly
4624564 rows and 25 columns and 9 indexes... Out of
these, our code retrieves 75262 rows and modifies just
one column in each row... but updating these to the
database is taking some significant time (around 20
minutes)... Tried the following with the update

Any triggers or foreign keys on that table? Also, what PG version is
this? Are you doing this in a single UPDATE command, or 75262 separate
commands?

regards, tom lane


#10Bruce Momjian
bruce@momjian.us
In reply to: Venkatesh Babu (#8)
Re: Update command too slow

Venkatesh Babu <venkatbabukr@yahoo.com> writes:

Hi,

There aren't any triggers, but there are 75262 update
statements. The problem is that we have a datatype
called "Collection"; we fetch the data rows into it,
modify the data, and call Collection.save(). This save
method generates one update statement per record
present in it.

Are you running in autocommit mode? You might try disabling autocommit and
committing once at the end of the transaction. Postgres runs faster when not
committing more than necessary.
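
Roughly, the difference looks like this (purely illustrative; the column
names are made up):

-- autocommit: every statement is its own transaction
UPDATE cm_quotastates SET quota_used = 0 WHERE quota_id = 1;
UPDATE cm_quotastates SET quota_used = 0 WHERE quota_id = 2;

-- explicit transaction: one commit for the whole batch
BEGIN;
UPDATE cm_quotastates SET quota_used = 0 WHERE quota_id = 1;
UPDATE cm_quotastates SET quota_used = 0 WHERE quota_id = 2;
COMMIT;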

--
greg

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Venkatesh Babu (#8)
Re: Update command too slow

Venkatesh Babu <venkatbabukr@yahoo.com> writes:

There aren't any triggers, but there are 75262 update
statements. The problem is that we have a datatype
called "Collection"; we fetch the data rows into it,
modify the data, and call Collection.save(). This save
method generates one update statement per record
present in it.

Well, that's going to be dog-slow in any case compared to putting the
logic on the server side, but a couple of things you could possibly
do: make sure all of this is in one transaction block (a commit per
row updated is a lot of overhead) and use a prepared statement for the
UPDATE to get you out from under the repeated parse/plan overhead.
Check the UPDATE's plan, too, and make sure it's an indexscan on the
primary key rather than anything less efficient.
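
As a rough illustration of that pattern (the quota_id and quota_used column
names are invented here):

BEGIN;
PREPARE upd (integer, numeric) AS
    UPDATE cm_quotastates SET quota_used = $2 WHERE quota_id = $1;
-- one EXECUTE per modified row, all inside the same transaction
EXECUTE upd(12345, 42.0);
-- ... repeat for the remaining rows ...
COMMIT;

-- and check that the plan is an index scan on the primary key
EXPLAIN UPDATE cm_quotastates SET quota_used = 0 WHERE quota_id = 12345;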

regards, tom lane

#12Chris Browne
cbbrowne@acm.org
In reply to: Tom Lane (#7)
Re: Update command too slow

Oops! venkatbabukr@yahoo.com (Venkatesh Babu) was seen spray-painting on a wall:

There aren't any triggers, but there are 75262 update
statements. The problem is that we have a datatype
called "Collection"; we fetch the data rows into it,
modify the data, and call Collection.save(). This save
method generates one update statement per record
present in it.

Is that "Collection" in your application the entire table?

If it is, then you can get a nice win thus:

--> Delete from stbl; --- One fast statement
--> COPY stbl from stdin; --- Another fast statement
row 1 data
row 2 data
row 3 data
...
row 75262 data
\.

That update would be REALLY fast!

Even if it isn't, consider trying the following transaction:

BEGIN;
select * into temp table stbl_12341 from stbl limit 0;
-- Note that 12341 is the PID of your process, so that should be
-- pretty unique

copy stbl_12341 from stdin; -- Load your 75262 rows in one fell swoop
row 1 data
row 2 data
...
row 75262 data
\.

-- Now, delete from stbl all the rows that are in the replacement table...
delete from stbl where pkey in (select pkey from stbl_12341);
insert into stbl (select * from stbl_12341);
COMMIT;

Both approaches will be WAY faster than doing the processing row by
row.
--
let name="cbbrowne" and tld="gmail.com" in name ^ "@" ^ tld;;
http://linuxfinances.info/info/postgresql.html
"Waving away a cloud of smoke, I look up, and am blinded by a bright,
white light. It's God. No, not Richard Stallman, or Linus Torvalds,
but God. In a booming voice, He says: "THIS IS A SIGN. USE LINUX, THE
FREE Unix SYSTEM FOR THE 386." -- Matt Welsh

#13Ben-Nes Yonatan
da@canaan.co.il
In reply to: Ben-Nes Yonatan (#1)
Re: mysql load_file() function


Try looking here--again if I understand what you are trying to do....

http://www.postgresql.org/docs/8.0/interactive/largeobjects.html

Large objects in PostgreSQL can be files, images, etc. There are several
functions for manipulating them (loading, etc.) available in PostgreSQL.

Sean

Hi again David and Sean, and thanks for your help,

Sean, you were correct!! Hooray for you!! You don't know how much time I
spent on that problem (besides everything else, on other mailing lists too).
I don't understand why the manual doesn't say anything about the better
memory usage that the large objects interface gives with other languages
such as PHP... it just says that you can store up to 2GB, versus the newer
TOAST storage which can hold 1GB, which is more than enough for 99.99% of
projects....

Anyway, thanks a lot again to both of you,
Ben-Nes Yonatan

#14guegue
guegue@gmail.com
In reply to: Venkatesh Babu (#4)
Re: Update command too slow

How are you updating this table? Do you use UPDATE ... WHERE ...?

On Fri, 4 Feb 2005 05:57:32 -0800 (PST), Venkatesh Babu
<venkatbabukr@yahoo.com> wrote:


Hello,

We have a table cm_quotastates which has exactly
4624564 rows and 25 columns and 9 indexes... Out of
these, our code retrieves 75262 rows and modifies just
one column in each row... but updating these to the
database is taking some significant time (around 20
minutes)... Tried the following with the update:

(a) updating after removing all 9 indexes
associated with the table
(b) updating the 75K rows in batches
(c) VACUUM ANALYZE on the table before updating

but none of these help and the update still takes the same
amount of time. Is there anything else that can be
done so that the update takes less time... Also, where
can I find info about how a Postgres update actually
works?

Thanks,
Venkatesh


#15guegue
guegue@gmail.com
In reply to: Venkatesh Babu (#9)
Re: Update command too slow

You mention you use one update statement per record. This may be too
basic, but anyway, it just happened to me...

Do you use a WHERE clause in your UPDATE statement, and if so, is the
column you filter on indexed?

javier

On Sat, 5 Feb 2005 03:14:52 -0800 (PST), Venkatesh Babu
<venkatbabukr@yahoo.com> wrote:


Hi,

There aren't any foreign keys and we are currently
using Postgres version 7.4...

--- Venkatesh Babu <venkatbabukr@yahoo.com> wrote:

Hi,

There aren't any triggers, but there are 75262 update
statements. The problem is that we have a datatype
called "Collection"; we fetch the data rows into it,
modify the data, and call Collection.save(). This save
method generates one update statement per record
present in it.

Thanks,
Venkatesh

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

Venkatesh Babu <venkatbabukr@yahoo.com> writes:

We have a table cm_quotastates which has exactly
4624564 rows and 25 columns and 9 indexes... Out of
these, our code retrieves 75262 rows and modifies just
one column in each row... but updating these to the
database is taking some significant time (around 20
minutes)... Tried the following with the update

Any triggers or foreign keys on that table? Also, what PG version is
this? Are you doing this in a single UPDATE command, or 75262 separate
commands?

regards, tom lane


#16Venkatesh Babu
venkatbabukr@yahoo.com
In reply to: Tom Lane (#11)
Re: Update command too slow

Hello,

Thanks for providing the info... I tried disabling
autocommit as suggested by Mr. Greg Stark: I issued
the command "set autocommit to off", but got
the following error message:

ERROR: SET AUTOCOMMIT TO OFF is no longer supported

Also, I can't implement the suggestions of Mr.
Christopher Browne, because I'm not working with the
database directly. There is an abstract layer built
over the database. This abstract layer provides an
interface between application objects and the data tables
corresponding to those objects. Our application is
developed over this abstract layer. In fact, we are
using the "Collection" datatype provided by this layer.
Collection is similar to Java vectors in that it can
store any kind of persistable objects; it also
implements the save method (which updates the tables
corresponding to each object present in the
collection), hence one update statement is generated per
object present in the collection.

All I can do is play with the indexes for the tables or
change Postgres settings. I hope the problem is clear
now... Also, the suggestions of Mr. Tom Lane on
transaction blocking and making use of prepared
statements and indexes on the primary key have been taken
care of.... I forgot to mention that even though I deleted
all the indexes, I ensured that the index on the primary key
was not deleted....

To give more background information, we've migrated
the database from DB2 to Postgres.... Things were fine
in DB2... Is this migration having any effect on the
poor performance of updates (I mean to say, is this
problem happening due to some improper migration???)

Thanks,
Venkatesh

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

Venkatesh Babu <venkatbabukr@yahoo.com> writes:

There aren't any triggers, but there are 75262 update
statements. The problem is that we have a datatype
called "Collection"; we fetch the data rows into it,
modify the data, and call Collection.save(). This save
method generates one update statement per record
present in it.

Well, that's going to be dog-slow in any case compared to putting the
logic on the server side, but a couple of things you could possibly
do: make sure all of this is in one transaction block (a commit per
row updated is a lot of overhead) and use a prepared statement for the
UPDATE to get you out from under the repeated parse/plan overhead.
Check the UPDATE's plan, too, and make sure it's an indexscan on the
primary key rather than anything less efficient.

regards, tom lane


#17Venkatesh Babu
venkatbabukr@yahoo.com
In reply to: guegue (#15)
Re: Update command too slow

Hi,

The WHERE clause is used in the update statements, and the
column in the WHERE clause is indexed...

but the updates are still slow.

Thanks,
Venkatesh

--- guegue <guegue@gmail.com> wrote:

You mention you use one update statement per record. This may be too
basic, but anyway, it just happened to me...

Do you use a WHERE clause in your UPDATE statement, and if so, is the
column you filter on indexed?

javier


#18Doug McNaught
doug@mcnaught.org
In reply to: Venkatesh Babu (#16)
Re: Update command too slow

Venkatesh Babu <venkatbabukr@yahoo.com> writes:

Hello,

Thanks for providing the info... I tried disabling
autocommit as suggested by Mr. Greg Stark: I issued
the command "set autocommit to off", but got
the following error message:

ERROR: SET AUTOCOMMIT TO OFF is no longer supported

Autocommit is handled by the drivers now.

Also, I can't implement the suggestions of Mr.
Christopher Browne, because I'm not working with the
database directly. There is an abstract layer built
over the database. This abstract layer provides an
interface between application objects and the data tables
corresponding to those objects. Our application is
developed over this abstract layer. In fact, we are
using the "Collection" datatype provided by this layer.
Collection is similar to Java vectors in that it can
store any kind of persistable objects; it also
implements the save method (which updates the tables
corresponding to each object present in the
collection), hence one update statement is generated per
object present in the collection.

Sounds like Hibernate--is that what you're using? Make sure you use
your mapping library's transaction mechanism to execute the save()
inside a transaction and you may get some speedup.

-Doug