<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Kamibu &#187; Regular Expressions</title>
	<atom:link href="http://blog.kamibu.com/category/regular-expressions/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.kamibu.com</link>
	<description></description>
	<lastBuildDate>Sat, 14 Nov 2009 10:51:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>vim regexp magic</title>
		<link>http://blog.kamibu.com/2008/09/23/vim-regexp-magic/</link>
		<comments>http://blog.kamibu.com/2008/09/23/vim-regexp-magic/#comments</comments>
		<pubDate>Tue, 23 Sep 2008 14:12:42 +0000</pubDate>
		<dc:creator>dionyziz</dc:creator>
				<category><![CDATA[Regular Expressions]]></category>

		<guid isPermaLink="false">http://blog.kamibu.com/?p=44</guid>
		<description><![CDATA[Input, this one big liner: function UnitUserSettingsSave( tInteger $dobd, tInteger $dobm, tInteger $doby, tText $gender, tInteger $place, tInteger $education, tInteger $school, tInteger $mood, tText $sex, tText $religion, tText $politics, tText $slogan, tText $aboutme, tText $favquote, tText $haircolor, tText $eyecolor, tInteger $height, tInteger $weight, tText $smoker, tText $drinker, tText $email, tText $msn, tText $gtalk, tText $skype, [...]]]></description>
			<content:encoded><![CDATA[<p>Input, this one big liner:</p>
<p>function UnitUserSettingsSave( tInteger $dobd, tInteger $dobm, tInteger $doby, tText $gender, tInteger $place, tInteger $education, tInteger $school, tInteger $mood, tText $sex, tText $religion, tText $politics, tText $slogan, tText $aboutme, tText $favquote, tText $haircolor, tText $eyecolor, tInteger $height, tInteger $weight, tText $smoker, tText $drinker, tText $email, tText $msn, tText $gtalk, tText $skype, tText $yahoo, tText $web, tText $oldpassword, tText $newpassword, tText $emailprofilecomment, tText $notifyprofilecomment, tText $emailphotocomment, tText $notifyphotocomment, tText $emailpollcomment, tText $notifypollcomment, tText $emailjournalcomment, tText $notifyjournalcomment, tText $emailreply, tText $notifyreply, tText $emailfriendaddition, tText $notifyfriendaddition, tText $emailtagcreation, tText $notifytagcreation, tText $emailfavourite, tText $notifyfavourite ) {</p>
<p>Output, this beautifully spaced multiliner:</p>
<p><pre class="php"><span style="color: #000000; font-weight: bold;">function</span> UnitUserSettingsSave<span style="color: #66cc66;">&#40;</span> tInteger <span style="color: #0000ff;">$dobd</span>, tInteger <span style="color: #0000ff;">$dobm</span>,
          tInteger <span style="color: #0000ff;">$doby</span>, tText <span style="color: #0000ff;">$gender</span>,
          tInteger <span style="color: #0000ff;">$place</span>, tInteger <span style="color: #0000ff;">$education</span>,
          tInteger <span style="color: #0000ff;">$school</span>, tInteger <span style="color: #0000ff;">$mood</span>,
          tText <span style="color: #0000ff;">$sex</span>, tText <span style="color: #0000ff;">$religion</span>,
          tText <span style="color: #0000ff;">$politics</span>, tText <span style="color: #0000ff;">$slogan</span>,
          tText <span style="color: #0000ff;">$aboutme</span>, tText <span style="color: #0000ff;">$favquote</span>,
          tText <span style="color: #0000ff;">$haircolor</span>, tText <span style="color: #0000ff;">$eyecolor</span>,
          tInteger <span style="color: #0000ff;">$height</span>, tInteger <span style="color: #0000ff;">$weight</span>,
          tText <span style="color: #0000ff;">$smoker</span>, tText <span style="color: #0000ff;">$drinker</span>,
          tText <span style="color: #0000ff;">$email</span>, tText <span style="color: #0000ff;">$msn</span>,
          tText <span style="color: #0000ff;">$gtalk</span>, tText <span style="color: #0000ff;">$skype</span>,
          tText <span style="color: #0000ff;">$yahoo</span>, tText <span style="color: #0000ff;">$web</span>,
          tText <span style="color: #0000ff;">$oldpassword</span>, tText <span style="color: #0000ff;">$newpassword</span>,
          tText <span style="color: #0000ff;">$emailprofilecomment</span>, tText <span style="color: #0000ff;">$notifyprofilecomment</span>,
          tText <span style="color: #0000ff;">$emailphotocomment</span>, tText <span style="color: #0000ff;">$notifyphotocomment</span>,
          tText <span style="color: #0000ff;">$emailpollcomment</span>, tText <span style="color: #0000ff;">$notifypollcomment</span>,
          tText <span style="color: #0000ff;">$emailjournalcomment</span>, tText <span style="color: #0000ff;">$notifyjournalcomment</span>,
          tText <span style="color: #0000ff;">$emailreply</span>, tText <span style="color: #0000ff;">$notifyreply</span>,
          tText <span style="color: #0000ff;">$emailfriendaddition</span>, tText <span style="color: #0000ff;">$notifyfriendaddition</span>,
          tText <span style="color: #0000ff;">$emailtagcreation</span>, tText <span style="color: #0000ff;">$notifytagcreation</span>,
          tText <span style="color: #0000ff;">$emailfavourite</span>, tText <span style="color: #0000ff;">$notifyfavourite</span> <span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span></pre></p>
<p>How? With one command, in vim:</p>
<p><pre class="php">:s/\<span style="color: #66cc66;">&#40;</span>.\<span style="color: #66cc66;">&#123;</span><span style="color: #cc66cc;">-1</span>,<span style="color: #66cc66;">&#125;</span>\<span style="color: #66cc66;">&#41;</span>,\<span style="color: #66cc66;">&#40;</span>.\<span style="color: #66cc66;">&#123;</span><span style="color: #cc66cc;">-1</span>,<span style="color: #66cc66;">&#125;</span>\<span style="color: #66cc66;">&#41;</span>,/\<span style="color: #cc66cc;">1</span>,\<span style="color: #cc66cc;">2</span>,\r        /ig</pre></p>
<p>What does it do? </p>
<p>First off, :s/<em>needle</em>/<em>replacement</em>/g searches the current line for the regular expression <em>needle</em> and replaces each match with <em>replacement</em>. Only the current line is searched because we didn&#8217;t specify a range before the &#8220;s&#8221;. &#8220;s&#8221; is the Ex command that we&#8217;re running, short for &#8220;substitute&#8221;. The &#8220;g&#8221; flag after the final slash stands for &#8220;global&#8221;, meaning that every occurrence in the line is replaced, not just the first. The &#8220;i&#8221; flag makes the match case-insensitive; it has no effect on this particular pattern, which contains no letters.</p>
<p>Now, for the <em>needle</em> expression. It can be essentially split into two parts:<br />
1) \(.\{-1,}\),<br />
2) \(.\{-1,}\),</p>
<p>These two expressions are identical. Each matches some text, denoted by a dot, followed by a comma (the one you see at the end of each expression). A single dot matches just one character, so we add the lazy quantifier \{-1,} after it, which lets it match one or more characters: as many as it needs to reach the comma at the end.</p>
<p>The expression .\{-1,} means: match one or more of any character. Because the quantifier is lazy, it matches as few characters as possible, stopping as soon as a comma comes right afterwards (the comma itself is matched by the literal comma in the pattern, not by the dot).</p>
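<p>To see the difference between a greedy and a lazy quantifier, here is a minimal sketch in Python rather than vim. Python spells the lazy &#8220;one or more&#8221; quantifier +? where vim spells it \{-1,}. The sample string is made up for illustration.</p>

```python
import re

# Hypothetical sample text, shorter than the real argument list.
sample = "tInteger $a, tText $b, tText $c,"

# Greedy: .+ takes as much as it can while still leaving one comma to match.
greedy = re.match(r"(.+),", sample).group(1)

# Lazy: .+? stops at the first comma it reaches, like .\{-1,} in vim.
lazy = re.match(r"(.+?),", sample).group(1)

print(greedy)  # everything up to the last comma
print(lazy)    # only the first argument
```

<p>The greedy version swallows all three arguments, while the lazy one captures just the first, which is what we need to split the list two arguments at a time.</p>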
<p>So both expressions tied together match anything followed by a comma followed by anything followed by a comma. Translation? They match two of the arguments of those provided in the function argument list.</p>
<p>The parentheses around each of them, written \( and \), capture what is within so that it can be used in the replacement string. Our replacement string is simply &#8220;\1,\2,\r        &#8221;. It replaces \1 with the first parenthesized match, adds a comma, replaces \2 with the second parenthesized match, and adds another comma. Finally, it adds a new line (\r) and some whitespace.</p>
<p>With the &#8220;global&#8221; flag, the substitution is applied repeatedly along the line, so a new line is added after every second argument.</p>
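<p>For readers who want to try the same idea outside vim, here is a rough Python equivalent of the substitution. It is only a sketch: the function name wrap_arguments and its indent parameter are made up for this example, and Python writes the new line in the replacement as \n where vim uses \r.</p>

```python
import re


def wrap_arguments(line, indent=8):
    """Break the line after every second comma, like the vim command above."""
    # \1 and \2 are the two captured arguments; a new line and some
    # indentation are appended after the second comma.
    replacement = r"\1,\2," + "\n" + " " * indent
    return re.sub(r"(.+?),(.+?),", replacement, line, flags=re.IGNORECASE)


# Hypothetical shortened signature, used only for illustration.
print(wrap_arguments("function f( tInteger $a, tText $b, tText $c, tText $d, tText $e ) {"))
```

<p>Each match consumes two arguments and their commas, so the leftover single argument at the end stays on the last line untouched.</p>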
]]></content:encoded>
			<wfw:commentRss>http://blog.kamibu.com/2008/09/23/vim-regexp-magic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Change collation on all columns of a database</title>
		<link>http://blog.kamibu.com/2008/05/25/change-collation-on-all-columns-of-a-database/</link>
		<comments>http://blog.kamibu.com/2008/05/25/change-collation-on-all-columns-of-a-database/#comments</comments>
		<pubDate>Sun, 25 May 2008 10:51:08 +0000</pubDate>
		<dc:creator>dionyziz</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Regular Expressions]]></category>

		<guid isPermaLink="false">http://blog.kamibu.com/?p=36</guid>
		<description><![CDATA[I recently needed to change the collation of each and every column of every table in a database from &#8216;latin1&#8217; to &#8216;utf8&#8217;. Although the table collations were correct, the column collations were incorrect. It&#8217;s a cumbersome process to perform manually, and there&#8217;s apparently no real automated way to do it without a [...]]]></description>
			<content:encoded><![CDATA[<p>I recently needed to change the collation of each and every column of every table in a database from &#8216;latin1&#8217; to &#8216;utf8&#8217;. Although the table collations were correct, the column collations were incorrect. It&#8217;s a cumbersome process to perform manually, and there&#8217;s apparently <a href="http://lists.mysql.com/mysql/186286">no real automated way to do it without a script</a>. Although collation information is only meta-data, not actual data, I found this problem interesting.</p>
<p>Changing the collation of a single column is easy enough with one MySQL query:</p>
<p><pre class="php">ALTER TABLE `moods` 
CHANGE `mood_label` `mood_label` text CHARACTER SET utf8 COLLATE utf8_unicode_ci;</pre></p>
<p>Changing all the columns is more difficult. Here&#8217;s a small script that I came up with to do it recently:</p>
<p><pre class="php">dionyziz@orion:~$ mysqldump -u root --password=<span style="color: #cc66cc;">1234</span> \
--no-data --no-create-db --compact ccbeta \
|egrep <span style="color: #ff0000;">'CREATE TABLE|latin1'</span> \
|sed <span style="color: #ff0000;">'s/CREATE TABLE `<span style="color: #000099; font-weight: bold;">\(</span>.*<span style="color: #000099; font-weight: bold;">\)</span>` (/;ALTER TABLE `<span style="color: #000099; font-weight: bold;">\1</span>`/'</span> \
|sed <span style="color: #ff0000;">'s/character set latin1/CHARACTER SET utf8 COLLATE utf8_unicode_ci/'</span> \
|sed <span style="color: #ff0000;">'s/  `<span style="color: #000099; font-weight: bold;">\(</span>.*<span style="color: #000099; font-weight: bold;">\)</span>`/ CHANGE `<span style="color: #000099; font-weight: bold;">\1</span>` `<span style="color: #000099; font-weight: bold;">\1</span>`/'</span>&gt;columns
dionyziz@orion:~$ php -r <span style="color: #ff0000;">'file_put_contents( &quot;columns&quot;, 
    preg_replace( &quot;#^;|ALTER TABLE `.*`(<span style="color: #000099; font-weight: bold;">\\</span>s*;|$)#&quot;, &quot;&quot;, 
    preg_replace( &quot;#,(<span style="color: #000099; font-weight: bold;">\\</span>s*);#&quot;, &quot;;<span style="color: #000099; font-weight: bold;">\\</span>1&quot;, 
    file_get_contents( &quot;columns&quot; ) ) ) );'</span>
dionyziz@orion:~$ mysql -u root --password=<span style="color: #cc66cc;">1234</span> ccbeta &lt;columns</pre></p>
<p>Let&#8217;s go through it step-by-step.</p>
<p><pre class="php">mysqldump -u root --password=<span style="color: #cc66cc;">1234</span> --no-data --no-create-db --compact ccbeta</pre></p>
<p>This creates a list of CREATE TABLE statements for all our tables. That&#8217;s good because it&#8217;ll allow us to determine whether the collation of a column is incorrect. Here&#8217;s an example CREATE TABLE statement:</p>
<p><pre class="php">CREATE TABLE `albums` <span style="color: #66cc66;">&#40;</span>
  `album_id` int<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">11</span><span style="color: #66cc66;">&#41;</span> NOT <span style="color: #000000; font-weight: bold;">NULL</span> auto_increment,
  `album_userid` int<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">11</span><span style="color: #66cc66;">&#41;</span> NOT <span style="color: #000000; font-weight: bold;">NULL</span> <span style="color: #000000; font-weight: bold;">default</span> <span style="color: #ff0000;">'0'</span>,
  `album_created` datetime NOT <span style="color: #000000; font-weight: bold;">NULL</span> <span style="color: #000000; font-weight: bold;">default</span> <span style="color: #ff0000;">'0000-00-00 00:00:00'</span>,
  `album_name` text character set latin1 NOT <span style="color: #000000; font-weight: bold;">NULL</span>,
  `album_description` text character set latin1 NOT <span style="color: #000000; font-weight: bold;">NULL</span>,
  PRIMARY KEY  <span style="color: #66cc66;">&#40;</span>`album_id`<span style="color: #66cc66;">&#41;</span>,
  KEY `album_userid` <span style="color: #66cc66;">&#40;</span>`album_userid` <span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#41;</span> ENGINE=MyISAM AUTO_INCREMENT=<span style="color: #cc66cc;">55</span> <span style="color: #000000; font-weight: bold;">DEFAULT</span> CHARSET=utf8 COLLATE=utf8_unicode_ci;</pre></p>
<p>In this example, the `album_name` and `album_description` columns still use latin1 and need their character set and collation changed.</p>
<p><pre class="php">egrep <span style="color: #ff0000;">'CREATE TABLE|latin1'</span></pre></p>
<p>This filter keeps only the lines that contain &#8220;CREATE TABLE&#8221; or &#8220;latin1&#8221;. That&#8217;s useful since it leaves just the table names, each followed by a list of its incorrectly collated columns, if any. The result would be something like this:</p>
<p><pre class="php">CREATE TABLE `relations` <span style="color: #66cc66;">&#40;</span>
CREATE TABLE `searches` <span style="color: #66cc66;">&#40;</span>
  `search_query` text character set latin1 NOT <span style="color: #000000; font-weight: bold;">NULL</span>,
CREATE TABLE `shoutbox` <span style="color: #66cc66;">&#40;</span>
  `shout_text` text character set latin1 NOT <span style="color: #000000; font-weight: bold;">NULL</span>,
  `shout_delreason` text character set latin1 NOT <span style="color: #000000; font-weight: bold;">NULL</span>,</pre></p>
<p>(with more entries potentially)</p>
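<p>The same filtering step can be sketched in Python, which makes it easy to experiment without a database. The dump text below is a shortened, hypothetical example, not real mysqldump output.</p>

```python
import re

# Hypothetical, shortened dump used only for illustration.
dump = """CREATE TABLE `albums` (
  `album_id` int(11) NOT NULL auto_increment,
  `album_name` text character set latin1 NOT NULL,
  PRIMARY KEY  (`album_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;"""

# Keep a line if it mentions CREATE TABLE or latin1, as the egrep call does.
wanted = re.compile(r"CREATE TABLE|latin1")
kept = [line for line in dump.splitlines() if wanted.search(line)]

for line in kept:
    print(line)
```

<p>Only the table header and the latin1 column survive; the integer column, the key and the table options are dropped.</p>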
<p>Good. Now all we need to do is modify these lines to make them ALTER TABLE lines:</p>
<p><pre class="php">sed <span style="color: #ff0000;">'s/CREATE TABLE `<span style="color: #000099; font-weight: bold;">\(</span>.*<span style="color: #000099; font-weight: bold;">\)</span>` (/;ALTER TABLE `<span style="color: #000099; font-weight: bold;">\1</span>`/'</span></pre></p>
<p>Ah, the magic of regular expressions. This removes the final &#8220;(&#8221; of every CREATE TABLE line, as we don&#8217;t need it, and changes the word &#8220;CREATE&#8221; into &#8220;ALTER&#8221;. It also adds a semicolon in front of the ALTER TABLE statement (to terminate the previous statement).</p>
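<p>As a sketch, the same rewrite in Python looks like this. The helper name create_to_alter is made up for the example; note that Python writes capturing groups as plain parentheses and needs the literal &#8220;(&#8221; escaped, the opposite of sed&#8217;s basic syntax.</p>

```python
import re


def create_to_alter(line):
    """Turn a CREATE TABLE header line into the start of an ALTER TABLE."""
    # The table name between the backticks is captured and reused as \1;
    # the trailing " (" is matched but not copied to the output.
    return re.sub(r"CREATE TABLE `(.*)` \(", r";ALTER TABLE `\1`", line)


print(create_to_alter("CREATE TABLE `searches` ("))
```

<p>Lines that don&#8217;t contain a CREATE TABLE header, such as the column lines, pass through unchanged.</p>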
<p><pre class="php">sed <span style="color: #ff0000;">'s/character set latin1/CHARACTER SET utf8 COLLATE utf8_unicode_ci/'</span></pre></p>
<p>Straightforward enough: this changes the character set clause from latin1 to utf8 and adds the correct collation as well.</p>
<p><pre class="php">sed <span style="color: #ff0000;">'s/  `<span style="color: #000099; font-weight: bold;">\(</span>.*<span style="color: #000099; font-weight: bold;">\)</span>`/ CHANGE `<span style="color: #000099; font-weight: bold;">\1</span>` `<span style="color: #000099; font-weight: bold;">\1</span>`/'</span></pre></p>
<p>Finally, this adds the word &#8220;CHANGE&#8221; in front of every column line and repeats the column name: the first occurrence tells MySQL which column to change, and the second gives the name it should have afterwards (the same one, since we aren&#8217;t renaming anything). The result is:</p>
<p><pre class="php">;ALTER TABLE `relations`
;ALTER TABLE `searches`
 CHANGE `search_query` `search_query` text CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT <span style="color: #000000; font-weight: bold;">NULL</span>,
;ALTER TABLE `shoutbox`
 CHANGE `shout_text` `shout_text` text CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT <span style="color: #000000; font-weight: bold;">NULL</span>,
 CHANGE `shout_delreason` `shout_delreason` text CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT <span style="color: #000000; font-weight: bold;">NULL</span>,</pre></p>
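<p>The two substitutions that act on column lines can be sketched in Python as follows. The function name rewrite_column is hypothetical; like sed without the g flag, each step touches only the first occurrence on the line.</p>

```python
import re


def rewrite_column(line):
    """Rewrite one latin1 column line into a CHANGE clause."""
    # Step 1: swap the character set clause and add the collation.
    line = line.replace(
        "character set latin1",
        "CHARACTER SET utf8 COLLATE utf8_unicode_ci",
        1,
    )
    # Step 2: prefix CHANGE and repeat the column name (old name, new name).
    return re.sub(r"  `(.*)`", r" CHANGE `\1` `\1`", line, count=1)


print(rewrite_column("  `search_query` text character set latin1 NOT NULL,"))
```

<p>The column name appears twice in the output because CHANGE expects both the existing name and the new one, even when they are identical.</p>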
<p>Pretty close to what we actually want. You&#8217;ll notice three problems:</p>
<ul>
<li>There are empty ALTER statements</li>
<li>There&#8217;s an extra comma after the last column of every statement (provided all your tables have a primary key, as they should, so that every column line is followed by a key line)</li>
<li>There&#8217;s a redundant semicolon at the beginning</li>
</ul>
<p>These problems cannot easily be fixed with sed, because sed processes its input line by line and the fixes span several lines. A sed expert might be able to provide a better solution, but I prefer to use PHP&#8217;s PCRE functions (preg_replace). To use PHP, let&#8217;s first save our current result into a file:</p>
<p><pre class="php">&gt;columns</pre></p>
<p>Time to run our PHP code on the target file:</p>
<p><pre class="php">php -r <span style="color: #ff0000;">'file_put_contents( &quot;columns&quot;, 
    preg_replace( &quot;#^;|ALTER TABLE `.*`(<span style="color: #000099; font-weight: bold;">\\</span>s*;|$)#&quot;, &quot;&quot;, 
    preg_replace( &quot;#,(<span style="color: #000099; font-weight: bold;">\\</span>s*);#&quot;, &quot;;<span style="color: #000099; font-weight: bold;">\\</span>1&quot;, 
    file_get_contents( &quot;columns&quot; ) ) ) );'</span></pre></p>
<p>Let&#8217;s analyze it briefly.</p>
<p><pre class="php"><a href="http://www.php.net/file_get_contents"><span style="color: #000066;">file_get_contents</span></a><span style="color: #66cc66;">&#40;</span> <span style="color: #ff0000;">&quot;columns&quot;</span> <span style="color: #66cc66;">&#41;</span>;</pre></p>
<p>This, simply enough, reads the &#8220;columns&#8221; file into memory. Now we&#8217;ll perform two regular expression replacements:</p>
<p>First, we&#8217;ll match the following regular expression:</p>
<p><pre class="php"><span style="color: #808080; font-style: italic;">#,(\s*);# </span></pre></p>
<p>(note that the # characters are delimiters that wrap the regular expression; they aren&#8217;t part of the regular expression itself)</p>
<p>Anything matching this will be replaced by ;\1. This means that a comma followed by any whitespace (including a new line) followed by a semicolon will be replaced by only a semicolon (and the same whitespace). This simply removes the redundant comma at the end of every ALTER statement.</p>
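<p>Here is a small Python sketch of this first replacement, run on a made-up fragment of the &#8220;columns&#8221; file. Python&#8217;s re module accepts the same pattern, just without the # delimiters.</p>

```python
import re

# Hypothetical fragment of the intermediate file, for illustration only.
statements = (
    ";ALTER TABLE `searches`\n"
    " CHANGE `search_query` `search_query` text NOT NULL,\n"
    ";ALTER TABLE `shoutbox`\n"
)

# A comma, optional whitespace (including the line break), then a semicolon:
# keep the whitespace, move the semicolon to where the comma was.
fixed = re.sub(r",(\s*);", r";\1", statements)

print(fixed)
```

<p>The semicolon that used to open the next statement now closes the previous one, and the stray comma is gone.</p>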
<p>Second, we&#8217;ll match the following:</p>
<p><pre class="php"><span style="color: #808080; font-style: italic;">#^;|ALTER TABLE `.*`(\s*;|$)# </span></pre></p>
<p>Anything matching will be removed. This regular expression matches two alternatives, separated by the first pipe (alternation) character.</p>
<p>The first part is:</p>
<p><pre class="php"><span style="color: #808080; font-style: italic;">#^;# </span></pre></p>
<p>It removes the semicolon at the very beginning of the file (the one in front of the first ALTER TABLE in our example).</p>
<p>The second part is:</p>
<p><pre class="php"><span style="color: #808080; font-style: italic;">#ALTER TABLE `.*`(\s*;|$)# </span></pre></p>
<p>This will look for empty ALTER TABLE statements (an ALTER TABLE statement followed only by whitespace and a semicolon or an end-of-file) and remove them. </p>
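<p>Both replacements can be chained in Python in the same order as the nested preg_replace calls. This is only a sketch: the function name clean and the sample input are made up, and in the sample the last table has no latin1 columns, so its leading semicolon is what terminates the previous statement.</p>

```python
import re


def clean(sql):
    """Apply the comma fix first, then drop the leading ';' and empty ALTERs."""
    sql = re.sub(r",(\s*);", r";\1", sql)
    # Without a multi-line flag, ^ only matches at the start of the text and
    # $ only at its end (or just before a final line break).
    return re.sub(r"^;|ALTER TABLE `.*`(\s*;|$)", "", sql)


# Hypothetical intermediate file contents, for illustration only.
raw = (
    ";ALTER TABLE `relations`\n"
    ";ALTER TABLE `searches`\n"
    " CHANGE `search_query` `search_query` text NOT NULL,\n"
    ";ALTER TABLE `empty`\n"
)

print(clean(raw))
```

<p>Only the statement for `searches` is left; the two tables without columns to change disappear along with the leading semicolon.</p>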
<p>Finally, we&#8217;ll write the result back to the file we read from:</p>
<p><pre class="php">file_put_contents<span style="color: #66cc66;">&#40;</span> <span style="color: #ff0000;">&quot;columns&quot;</span>, ... <span style="color: #66cc66;">&#41;</span>;</pre></p>
<p>Now if we <em>cat</em> that file, we&#8217;ll see that it contains all ALTER statements in the form we want them:</p>
<p><pre class="php">ALTER TABLE `searches`
 CHANGE `search_query` `search_query` text CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT <span style="color: #000000; font-weight: bold;">NULL</span>;
ALTER TABLE `shoutbox`
 CHANGE `shout_text` `shout_text` text CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT <span style="color: #000000; font-weight: bold;">NULL</span>,
 CHANGE `shout_delreason` `shout_delreason` text CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT <span style="color: #000000; font-weight: bold;">NULL</span>;</pre></p>
<p>Excellent. Finally, let&#8217;s execute it:</p>
<p><pre class="php">mysql -u root --password=<span style="color: #cc66cc;">1234</span> ccbeta &lt;columns</pre></p>
<p>You can also add &#8216;time&#8217; in front of it to measure how long it takes. We can now verify that the collations were changed successfully by performing our initial dump again and grepping for &#8216;latin1&#8217; to confirm that no matches remain.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.kamibu.com/2008/05/25/change-collation-on-all-columns-of-a-database/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
