Once You Know, You Newegg

mySQL How many rows?

Return the number of rows changed (updated, deleted, etc.) by the last command.

This is usually used after a non-SELECT execute statement.
$rowcount = $sth->rows;

#so,
$st->rows ##returns 1

This is handy for many reasons. I use it to bail out of a sub when no rows are returned and for loop security redundancy.
So I can do a :
until ($rows==0){
foreach (@somearray){
print $_.'<br />’;
$rows–;
}
}

Which in some cases, I am not actually using @somearray rows! I may just need to process @somearray up to a certain point.

Perl mySQL Create Table

$result = $dbh->prepare(“CREATE TABLE `?` If NOT EXISTS ( `ID` INT( 255 ) NOT NULL AUTO_INCREMENT , `URL` VARCHAR( 255 ) NOT NULL , PRIMARY KEY ( `ID` )) ENGINE = MYISAM ;”);
$result->execute($host);
$result = $dbh->prepare(“INSERT INTO `?` (URL) VALUES (‘?’)”);
$result->execute($host, $href);
print “Host added: ” . $host . “\n”;


So here is a sub from one of my scripts showing this in use.

#!/usr/bin/perl -w
use strict;
use CGI;
use CGI ‘:standard’;
use CGI::Carp qw(fatalsToBrowser warningsToBrowser);
print CGI::header(-expires => ‘-1d’);
use DBI;
$|=1;
my $us_phone_regex =’1?\W*([2-9][0-8][0-9])\W*([2-9][0-9]{2})\W*([0-9]{4})(\se?x?t?(\d*))?’;
my $email_regex =’^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$’;
my $link_regex = ‘^(((ht|f)tp(s?))\://)?(www.|[a-zA-Z].)[a-zA-Z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\’\\\+&%\$#\=~_\-]+))*$’;
##
sub makecitytable{
$city = shift;
$pagelink = shift;
$email = shift;
$phone= shift;
#######################################################
unless( $pagelink =~ m/$link_regex/ ){$pagelink=’failed’;}
#######################################################
unless( $phone =~ m/$us_phone_regex/ ){$phone=’none’;}
#######################################################
unless( $email =~ m/$email_regex/ ){$email=’none’;}
#######################################################
############# I create a table based on a city name $city. Then populate it with all the links found
my ($moe,$larry,$curly) = hookup();
my $hookup = DBI->connect($moe,$larry,$curly) or die “$DBI::errstr”;
### if I wanted unique email or whatever:
### $result = “CREATE TABLE IF NOT EXISTS `$city` (`ad_page` VARCHAR(255),PRIMARY KEY(ad_page),`email` VARCHAR(255),`phone` VARCHAR(26),UNIQUE KEY `email` (`email`))”;
### end unique example
$result = “CREATE TABLE IF NOT EXISTS `$city` (`ad_page` VARCHAR(255),PRIMARY KEY(ad_page),`email` VARCHAR(255),`phone` VARCHAR(26))”;
$result = $hookup->prepare($result);
$result->execute() or die “SQL Error: $DBI::errstr\n”;
#$result->execute($city) or die “Make table did not execute”;
$result = “INSERT INTO `$city` (ad_page,email,phone) VALUES (?,?,?) ON DUPLICATE KEY UPDATE `email`=values(email), `phone`=values(phone)” or die “Prepare insert did not execute”;
$result = $hookup->prepare($result);
$result->execute($pagelink,$email,$phone) or die “Insert email and phone did not execute”;
print qq~ added: pagelink=$pagelink email=$email and phone=$phone into $city
~;
$result->finish;
$hookup->disconnect;
#exit;
}# END SUB
############### connect to the database ######################
sub hookup {
my $moe = ‘DBI:mysql:databasename:localhost:3306′;
my $larry = ‘username’;
my $curly = ‘password’;
return (“$moe”,”$larry”,”$curly”);
}

mySQL Update – On Duplicate Key

MySQL Performance Tip: ON DUPLICATE KEY is faster than INSERT IGNORE, but failing with a duplicate key error is a lot faster

I’m working on merging a couple of pretty big MySQL databases.

Each database consists of just three tables, but each of those tables has at least 9 million rows.

Because I need to maintain and update a number of data associations, it was necessary to write a custom script to perform the merge. (I chose to use Perl and DBI.)

At one point in my script, I need to do a bunch of “insert this row from database B into database A if it doesn’t exist in database A” type operations. Doing a separate lookup for each record would cost roughly 9 million queries.

At first, I just went ahead and did the insert, allowing the single insert to fail with a unique key error if the record already existed. This caused a bunch of DBD errors to fill the screen, none of which really concerned me, because it didn’t have any bearing on the result. Still, all of those messages printed to STDERR could bury a more serious error, so I looked into handling it another way.

I found this post that recommends using either INSERT IGNORE or ON DUPLICATE KEY to handle these queries. After some investigation, I found that while ON DUPLICATE KEY is a lot faster than INSERT IGNORE, just allowing the thing to fail is the fastest option.

First, some simplified pseudo-code to give you an idea of what I’m doing:

FOR EACH RECORD IN A.TABLE1
FOR EACH RECORD IN A.TABLE2
IF THAT RECORD NOT EXISTS IN B.TABLE2,
DO AN INSERT INTO B.TABLE2

I wanted to avoid doing a separate SQL query at step 3, as that query would have to run 9 million+ times. Since I have a unique key set on the columns I’m interested in, any duplicate rows will not be inserted, thus maintaining my data integrity.

Here are the speed benchmarks I ran for my particular script with each of the three options I tried. The numbers are in terms of iterations of my particular loop (which does a lot more than the pseudo-code above) so the numbers should only be considered relative to each other.

Option 1: Let the query fail with a Unique Key error.

INSERT INTO A (COL1, COL2) VALUES (val1, val2);

Loop iterations per second: 24.5 (avg)

Option 2: INSERT … IGNORE
view plaincopy to clipboardprint?

INSERT IGNORE INTO A (COL1, COL2) VALUES (val1, val2);

Loop iterations per second: 13.3 (avg)

Option 3: INSERT … ON DUPLICATE KEY

INSERT INTO A (COL1, COL2) VALUES (val1, val2) ON DUPLICATE KEY UPDATE id=id;

Loop iterations per second: 18.2 (avg)

As you can see, the fastest of the three options is letting the query fail, followed by ON DUPLICATE KEY as the second fastest.

I’ll leave it to the MySQL experts to explain why this is the case. I have some guesses, but I can’t say for sure.

But I do know one thing: On my setup, for my purposes, letting those queries fail is the quickest option.

mySQL get data into arrays

Okay, say you have a database with some banner data. This was used for a PHP script but, I have not tested in Perl.
The placement there is a column `placement` which could contain data like sideadtop, sideadmiddle, sideadbottom or any position info.
Below will get each found `placement` into its’ own array.

$result = @mysql_query(“SELECT url, image, placement FROM advert WHERE user=’1′ ORDER BY placement”);
$i = 0;
while($process = @mysql_fetch_assoc($result)){
$row[$i] = $process;
$placement = $row[$1][‘placement’];
$$placement[‘url’] = $row[$1][‘url’];
$$placement[‘image’] = $row[$1][‘image’];
$i++;
}

so now we have the three arrays
$sideadtop,$sideadmiddle,$sideadbottom
with contents url and image
sorted by placement alphabetically ascending


Here is another way to get data into rows:
############################### get lines into individual arrays ###################################
#This approach stores the results of a SQL call in an array. In this example,
#the query is calling for rows with three columns of data, and printing each
#row

use DBI;

$dbh = DBI->connect(‘DBI:mysql:[DATABASE]‘, ‘[USER]‘, ‘[PASSWORD]‘)
|| die “ERROR: $DBI::errstr”;

$query = “[SQL QUERY GOES HERE]“;

$sth = $dbh->prepare($query);

$sth->execute();

$data = $sth->fetchall_arrayref();
$sth->finish;

foreach $data ( @$data) {

($variable1, $variable2, $variable3) = @$data;

print “$variable1\n”;
print “$variable2\n”;
print “$variable3\n”;

}

$dbh->disconnect(

Access mySQL data Efficiently

Elegance, Efficiency, and Robustness

All of the above is fine when you know precisely what the query will be in advance. What about when you need to create a query where one or two of the variables might change? For example, say that you’re writing an application that looks up prices for furniture in a database. The user enters the name of the item of furniture, the program sends the query to the database, and bammo, you’ve got the price. (Sorry, I’m shopping for sofas right now and getting a bit obsessed.) How do you do it?

Well, one way would be to create a new SQL statement each time you run the query. This will mostly work, but it’s not recommended:

# A bad way of doing it while (my $furniture_name = ) { chomp($furniture_name); my $sth = $dbh->prepare("SELECT price FROM furniture_prices WHERE furniture_name='$furniture_name'") or die "Couldn't prepare: '$DBI::errstr'; stopped"; $sth->execute() or die "Couldn't execute: '$DBI::errstr'"; my ($price) = $sth->fetchrow_array(); print STDOUT "Item: $furniture_name Price: $price\n"; }

What’s the problem with this? (Other than that there’s no user interface.) Well, what if one of the chairs is a “Tess of the D’Ubervilles Chair”? Then we’d suddenly get an error, as the apostrophe would get read as an SQL end-quote. The second problem is efficiency. The computer must waste time in every query preparing a statement which is almost exactly like the statement it ran a second ago.

How can we solve these problems? Well, DBI allows the use of things called placeholders in statements. Placeholders tell the database, “at execution time, we’re going to plug a value into here.” Then, you pass the values to fill into the placeholders using the execute() command. Using placeholders, the code would look like this:

# A better way of doing it my $sth = $dbh->prepare("SELECT price FROM furniture_prices WHERE furniture_name=?") or die "Couldn't prepare: '$DBI::errstr'; stopped"; while (my $furniture_name = ) { chomp($furniture_name); $sth->execute($furniture_name) or die "Couldn't execute: '$DBI::errstr'"; my ($price) = $sth->fetchrow_array(); print STDOUT "Item: $furniture_name Price: $price\n"; }

There, the statement handle is only prepared once, and we pass $furniture_type via the execute() method. Plus, when placeholders are used, DBI and the driver know enough to escape any troublesome characters like “‘”.

All of the above works relatively quickly if there’s only one row in the response. But what if we wanted to print out a list of all of the different kinds of loveseats in the database? One might write something like this:

# One way of reading multiple rows my $sth = $dbh->prepare("SELECT furniture_name, price FROM furniture_prices WHERE furniture_type=?") or die "Couldn't prepare: '$DBI::errstr'; stopped"; while (my $furniture_type = ) { chomp($furniture_type); $sth->execute($furniture_type) or die "Couldn't execute: '$DBI::errstr'"; while ( my ($furniture_name, $price) = $sth->fetchrow_array() ) { print STDOUT "Item: $furniture_name Price: $price\n"; } }

Each fetchrow_array() adds a little bit of extra time. The variable has to be read out of the database, copied to the array, then copied to the database. It’ll work, but you can do it much faster. (And cooler.) How? Well, many databases allow you to bind variables directly to a statement, so that as soon as the row is fetched, the variable is filled in with the column’s value.

You bind a scalar variable to a column using the bind_col() method.

$sth->bind_col($column, \$scalar)

So, a better way of doing the above would be:

# Faster way of reading multiple rows my $sth = $dbh->prepare("SELECT furniture_name, price FROM furniture_prices WHERE furniture_type=?") or die "Couldn't prepare: '$DBI::errstr'; stopped"; while (my $furniture_type = ) { my ($furniture_name, $price); chomp($furniture_type); $sth->execute($furniture_type) or die "Couldn't execute: '$DBI::errstr'"; $sth->bind_col(1, \$furniture_name); $sth->bind_col(2, \$price); while ( $sth->fetch() ) { print STDOUT "Item: $furniture_name Price: $price\n"; } }

That’s the most efficient way of reading from the database. The fetch() call above is really just a handy alias to fetchrow_arrayref(), but since we’re not doing anything with the array reference, it’s clearer to say fetch().

So that’s about it. When you’re done with the database, call $dbh->disconnect() to close the connection. If you need or want more information on DBI, do ‘perldoc DBI’, go to the DBI pages. Programming the Perl DBI is an excellent book by Alligator Descartes and Tim Bunce.

Note 1: Different database drivers sometimes handle connection differently; at time of writing DBD::Pg is one of these. See your database driver documentation for how to connect if you’re having problems connecting.

I’m going to go through some things that I thought were lacking in this tutorial. They aren’t necessary, but are helpful.

Selectrow_array:

One thing many perl coders try to do is make their code as short as possible. One way to shorten the code is to use selectrow_array, instead of fetcrow_array. This is helpful if you are only executing a query once, and only getting one row of data.

my @row = $dbh->selectrow_array(q{SELECT my_data1, my_data2 WHERE furniture_type=?},undef, $furniture_type);

This is an extremely simple, and elegant way of selecting, and will be as optimized as any selectrow, that is executed without binding, as $furniture_type is the data executed.

RaiseError:

I find RaiseError to be extremely helpful for mission critical queries, and the majority of queries are mission critical to a script. Therefore it may be easier to just set RaiseError, and stop using or die.

$dbh = DBI->connect($DBDSN, $DBUser, $DBPassword, {RaiseError => 1});

To unset RaiseError on a query in which you want to specify something else to do, if the query doesn’t work just add the code: $dbh->{RaiseError} = 0;, and set raiserror back to 1, after you are done.

Bind_columns:

my ($furniture_name, $price); chomp($furniture_type); $sth->execute($furniture_type) or die "Couldn't execute: '$DBI::errstr'"; $sth->bind_col(1, \$furniture_name); $sth->bind_col(2, \$price);

You don’t need to bind a specific column here. Instead you can just bind all the columns in order, with bind_columns.
So instead of all that code you can write:

chomp($furniture_type); $sth->execute($furniture_type) or die "Couldn't execute: '$DBI::errstr'"; $sth->bind_columns(\my ($furniture_name, $price));

This gets rid of pre-declaring the variables, and using bind_col twice.

Update: Did some research yesterday and found that bind_columns is faster for *all* queries! Strange, but it seems that bind_columns is faster than a fetchrow_arrayref, even if only one row is being caught. This renders fetchrow_array(ref) obselete in a sense, without using bind_columns, as a few lines of code makes a query much faster, regardless of how many rows are being returned.

Perl Push function

Perl's push() function is used to push a value or values onto the end of an array, which increases the number of elements. The new values then become the last elements in the array. It returns the new total number of elements in the array. It's easy to confuse this function with unshift(), which adds elements to the beginning of an array.

@myNames = ('Larry', 'Curly');
push(@myNames, 'Moe');

Picture a row of numbered boxes, going from left to right. The push() function would push the new value or values on to the right side of the array, and increase the elements. In the examples, the value of @myNames becomes ('Larry', 'Curly', 'Moe').

The array can also be thought of as a stack - picture of a stack of numbered boxes, starting with 0 on the top and increasing as it goes down. The push() function would push the value into the bottom of the stack, and increase the elements.

@myNames = (
'Larry',
'Curly'
);
push(@myNames, 'Moe');

You can push multiple values onto the array directly:

@myNames = ('Larry', 'Curly');
push(@myNames, ('Moe', 'Shemp'));

Or by pushing on an array:

@myNames = ('Larry', 'Curly');
@moreNames = ('Moe', 'Shemp');
push(@myNames, @moreNames);

Custom PHP.ini for shared servers

Here is a php.ini that you can plop into any directory where your server has issues with your script. This overrides the memory, email and a couple other limitations.
Great for phplist and wordpress scripts when there are issues because you are on a shared environment. Be sure to change the email address section to your preferred email.

[PHP]
engine = On
short_open_tag = On
asp_tags = Off
precision = 12
y2k_compliance = On
output_buffering = Off
zlib.output_compression = Off
implicit_flush = Off
unserialize_callback_func=
serialize_precision = 100
allow_call_time_pass_reference = On
safe_mode = Off
safe_mode_gid = Off
safe_mode_include_dir =
safe_mode_exec_dir =
safe_mode_allowed_env_vars = PHP_
safe_mode_protected_env_vars = LD_LIBRARY_PATH
disable_functions =
disable_classes =
expose_php = On
max_execution_time = 60 ; Maximum execution time of each script, in seconds
max_input_time = 120 ; Maximum amount of time each script may spend parsing request data
memory_limit = 512M ; Maximum amount of memory a script may consume (32MB)
error_reporting = E_ALL & ~E_NOTICE
display_errors = On
display_startup_errors = Off
log_errors = On
ignore_repeated_errors = Off
ignore_repeated_source = Off
report_memleaks = On
track_errors = Off
error_log = error_log
variables_order = “EGPCS”
register_globals = Off
register_argc_argv = On
post_max_size = 8M
gpc_order = “GPC”
magic_quotes_gpc = On
magic_quotes_runtime = Off
magic_quotes_sybase = Off
auto_prepend_file =
auto_append_file =
default_mimetype = “text/html”
include_path = “.:/usr/lib/php:/usr/local/lib/php”
doc_root =
user_dir =
extension_dir = “/usr/local/lib/php/extensions/no-debug-non-zts-20060613″
zend_extension=”/usr/local/IonCube/ioncube_loader_lin_5.2.so”
zend_extension_ts=”/usr/local/IonCube/ioncube_loader_lin_5.2_ts.so”
extension=”eaccelerator.so”
eaccelerator.shm_size=”16″
eaccelerator.cache_dir=”/tmp/eaccelerator”
eaccelerator.enable=”1″
eaccelerator.optimizer=”1″
eaccelerator.check_mtime=”1″
eaccelerator.debug=”0″
eaccelerator.filter=””
eaccelerator.shm_max=”0″
eaccelerator.shm_ttl=”0″
eaccelerator.shm_prune_period=”0″
eaccelerator.shm_only=”0″
eaccelerator.compress=”1″
eaccelerator.compress_level=”9″
enable_dl = On
;;;;;;;;;;;;;;;;
; File Uploads ;
;;;;;;;;;;;;;;;;
file_uploads = On
upload_max_filesize = 50M
;;;;;;;;;;;;;;;;;;
; Fopen wrappers ;
;;;;;;;;;;;;;;;;;;
allow_url_fopen = Off
default_socket_timeout = 60
;;;;;;;;;;;;;;;;;;;;;;
; Dynamic Extensions ;
;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;
; Module Settings ;
;;;;;;;;;;;;;;;;;;;
[Syslog]
define_syslog_variables().
define_syslog_variables = Off
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;[mail function]
; For Win32 only.
;SMTP = localhost
;smtp_port = 26
; For Win32 only.
;sendmail_from = me@localhost.com
;sendmail_from = email@your.com
; For Unix only. You may supply arguments as well (default: “sendmail -t -i”).
;sendmail_path = “/usr/sbin/sendmail -t -i”
;sendmail_path = “/usr/sbin/sendmail -t -i -f email@your.com -r email@your.com”
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[Java]
[SQL]
sql.safe_mode = Off
[ODBC]
odbc.allow_persistent = On
odbc.check_persistent = On
odbc.max_persistent = -1
odbc.max_links = -1
odbc.defaultlrl = 4096
odbc.defaultbinmode = 1
[MySQL]
mysql.allow_persistent = On
mysql.max_persistent = -1
mysql.max_links = -1
mysql.default_port =
mysql.default_socket =
mysql.default_host =
mysql.default_user =
mysql.default_password =
mysql.connect_timeout = 60
mysql.trace_mode = Off
[mSQL]
msql.allow_persistent = On
msql.max_persistent = -1
msql.max_links = -1
[PostgresSQL]
pgsql.allow_persistent = On
pgsql.auto_reset_persistent = Off
pgsql.max_persistent = -1
pgsql.max_links = -1
pgsql.ignore_notice = 0
pgsql.log_notice = 0
[Sybase]
sybase.allow_persistent = On
sybase.max_persistent = -1
sybase.max_links = -1
sybase.min_error_severity = 10
sybase.min_message_severity = 10
sybase.compatability_mode = Off
[Sybase-CT]
sybct.allow_persistent = On
sybct.max_persistent = -1
sybct.max_links = -1
sybct.min_server_severity = 10
sybct.min_client_severity = 10
[dbx]
dbx.colnames_case = “unchanged”
[bcmath]
bcmath.scale = 0
[browscap]
[Informix]
ifx.default_host =
ifx.default_user =
ifx.default_password =
ifx.allow_persistent = On
ifx.max_persistent = -1
ifx.max_links = -1
ifx.textasvarchar = 0
ifx.byteasvarchar = 0
ifx.charasvarchar = 0
ifx.blobinfile = 0
ifx.nullformat = 0
[Session]
session.save_handler = files
session.save_path = /tmp
session.use_cookies = 1
session.name = PHPSESSID
session.auto_start = 0
session.cookie_lifetime = 0
session.cookie_path = /
session.cookie_domain =
session.serialize_handler = php
session.gc_probability = 1
session.gc_divisor = 100
session.gc_maxlifetime = 1440
register_globals
session.bug_compat_42 = 1
session.bug_compat_warn = 1
session.referer_check =
session.entropy_length = 0
session.entropy_file =
session.cache_limiter = nocache
session.cache_expire = 180
session.use_trans_sid = 0
url_rewriter.tags = “a=href,area=href,frame=src,input=src,form=,fieldset=”
[MSSQL]
mssql.allow_persistent = On
mssql.max_persistent = -1
mssql.max_links = -1
mssql.min_error_severity = 10
mssql.min_message_severity = 10
mssql.compatability_mode = Off
mssql.secure_connection = Off
[Assertion]
[Ingres II]
ingres.allow_persistent = On
ingres.max_persistent = -1
ingres.max_links = -1
ingres.default_database =
ingres.default_user =
ingres.default_password =
[Verisign Payflow Pro]
pfpro.defaulthost = “test-payflow.verisign.com”
pfpro.defaultport = 443
pfpro.defaulttimeout = 30
[com]
[Printer]
[mbstring]
[FrontBase]
[Crack]
[exif]
mbstring.internal_encoding
extension=pdo.so
extension=pdo_sqlite.so
extension=sqlite.so
extension=pdo_mysql.so
[Zend]
zend_extension_manager.optimizer=/usr/local/Zend/lib/Optimizer-3.3.3
zend_extension_manager.optimizer_ts=/usr/local/Zend/lib/Optimizer_TS-3.3.3
zend_optimizer.version=3.3.3
zend_extension=/usr/local/Zend/lib/ZendExtensionManager.so
zend_extension_ts=/usr/local/Zend/lib/ZendExtensionManager_TS.so