<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
> <channel><title>Project 2061 Techlog &#187; PHP</title> <atom:link href="http://techlog.p2061.org/category/php/feed/" rel="self" type="application/rss+xml" /><link>http://techlog.p2061.org</link> <description>Blogging from Project 2061 technology group</description> <lastBuildDate>Tue, 06 Sep 2011 14:49:39 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3.1</generator> <item><title>Optimizing database queries in CakePHP</title><link>http://techlog.p2061.org/2011/08/19/optimizing-database-queries-in-cakephp/</link> <comments>http://techlog.p2061.org/2011/08/19/optimizing-database-queries-in-cakephp/#comments</comments> <pubDate>Fri, 19 Aug 2011 17:01:09 +0000</pubDate> <dc:creator>bsweeney</dc:creator> <category><![CDATA[CakePHP]]></category> <category><![CDATA[PHP]]></category> <category><![CDATA[Web Development]]></category> <category><![CDATA[CakePHP 1.2.x]]></category> <category><![CDATA[CakepHP 1.3.x]]></category> <category><![CDATA[MySQL]]></category> <category><![CDATA[optimization]]></category> <category><![CDATA[sql]]></category> <guid
isPermaLink="false">http://techlog.p2061.org/?p=182</guid> <description><![CDATA[We&#8217;ve been using CakePHP for a while now. I&#8217;ve found it to be very helpful for quick prototyping and development of core site functionality. This is nice because it allows more time for tweaking a site&#8217;s user interface, functionality, and performance. That last one, performance, can occasionally be a challenge. One of the means of [...]]]></description> <content:encoded><![CDATA[<p>We&#8217;ve been using CakePHP for a while now. I&#8217;ve found it to be very helpful for quick prototyping and development of core site functionality. This is nice because it allows more time for tweaking a site&#8217;s user interface, functionality, and performance. That last one, performance, can occasionally be a challenge. One of the means of optimizing a site&#8217;s performance is through the crafting of database queries that take into account indexes and conditions. But because CakePHP is building the SQL we have to keep in mind how CakePHP does this as we construct our query parameters.</p><p>Let&#8217;s review one of those situations. Say we have the following (simplified) models:</p><div
class="wp_syntax"><div
class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">class</span> Item <span style="color: #000000; font-weight: bold;">extends</span> AppModel <span style="color: #009900;">&#123;</span>
  <span style="color: #000000; font-weight: bold;">var</span> <span style="color: #000088;">$hasMany</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Answer'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #000000; font-weight: bold;">var</span> <span style="color: #000088;">$belongsTo</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Packet'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">class</span> Student <span style="color: #000000; font-weight: bold;">extends</span> AppModel <span style="color: #009900;">&#123;</span>
  <span style="color: #000000; font-weight: bold;">var</span> <span style="color: #000088;">$hasMany</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Answer'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">class</span> Answer <span style="color: #000000; font-weight: bold;">extends</span> AppModel <span style="color: #009900;">&#123;</span>
  <span style="color: #000000; font-weight: bold;">var</span> <span style="color: #000088;">$belongsTo</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Student'</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'Item'</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'Packet'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">class</span> Packet <span style="color: #000000; font-weight: bold;">extends</span> AppModel <span style="color: #009900;">&#123;</span>
  <span style="color: #000000; font-weight: bold;">var</span> <span style="color: #000088;">$hasMany</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Answer'</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'Student'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #000000; font-weight: bold;">var</span> <span style="color: #000088;">$hasAndBelongsToMany</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Item'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div><p>Let me explain the model relationships. We have test items that are assigned to packets. Packets are sent out for testing to a group of students, so each answer is unique to a combination of packet, item, and student. The (somewhat circular) relationships between these models provide easy access to specific data. For example, let&#8217;s say we want to know all the answers from a particular state for a few of our items. Here&#8217;s an example query that presents some optimization issues:</p><div
class="wp_syntax"><div
class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$queryOpts</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
  <span style="color: #0000ff;">'fields'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Item.*'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
  <span style="color: #0000ff;">'conditions'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Item.id'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">1995</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1726</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1971</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1707</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1972</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
  <span style="color: #0000ff;">'contain'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
    <span style="color: #0000ff;">'Answer'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
      <span style="color: #0000ff;">'fields'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Answer.*'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
      <span style="color: #0000ff;">'conditions'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
        <span style="color: #0000ff;">'Answer.packet_id IN (SELECT id FROM packets WHERE state = &quot;CA&quot;)'</span>
      <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#41;</span>
  <span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$items</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">Item</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">find</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'all'</span><span style="color: #339933;">,</span><span style="color: #000088;">$queryOpts</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div><p>You&#8217;ll note that I&#8217;m not using CakePHP syntax for the Answer model conditional. We had been using this conditional in our web application prior to moving to CakePHP. Since there is not, so far as I can tell, an easy way to work with SQL subqueries we decided to start by forcing a query similar to the original.</p><p>Running the above creates the following SQL statement when querying the answers table:</p><div
class="wp_syntax"><div
class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #ff0000;">`Answer`</span><span style="color: #66cc66;">.*</span> <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`answers`</span> <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">`Answer`</span> <span style="color: #993333; font-weight: bold;">WHERE</span> <span style="color: #ff0000;">`Answer`</span><span style="color: #66cc66;">.</span><span style="color: #ff0000;">`packet_id`</span> <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">SELECT</span> id <span style="color: #993333; font-weight: bold;">FROM</span> packets <span style="color: #993333; font-weight: bold;">WHERE</span> packet_type <span style="color: #66cc66;">=</span> <span style="color: #ff0000;">&quot;F&quot;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AND</span> <span style="color: #ff0000;">`Answer`</span><span style="color: #66cc66;">.</span><span style="color: #ff0000;">`item_id`</span> <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1995</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1726</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1971</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1707</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1972</span><span style="color: #66cc66;">&#41;</span></pre></div></div><p>We have a fairly large data set so the above query takes ~32 seconds to run on my system. But that query is not optimized; reordering the conditions in the WHERE clause produces the following query:</p><div
class="wp_syntax"><div
class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #ff0000;">`Answer`</span><span style="color: #66cc66;">.*</span> <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`answers`</span> <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">`Answer`</span> <span style="color: #993333; font-weight: bold;">WHERE</span> <span style="color: #ff0000;">`Answer`</span><span style="color: #66cc66;">.</span><span style="color: #ff0000;">`item_id`</span> <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1995</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1726</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1971</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1707</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1972</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AND</span> <span style="color: #ff0000;">`Answer`</span><span style="color: #66cc66;">.</span><span style="color: #ff0000;">`packet_id`</span> <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">SELECT</span> id <span style="color: #993333; font-weight: bold;">FROM</span> packets <span style="color: #993333; font-weight: bold;">WHERE</span> packet_type <span style="color: #66cc66;">=</span> <span style="color: #ff0000;">&quot;F&quot;</span><span style="color: #66cc66;">&#41;</span></pre></div></div><p>The optimized query performs significantly better, &lt;1 second.</p><p>Since for this example we know which items we want we can update the query to manually force the desired ordering:</p><div
class="wp_syntax"><div
class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$queryOpts</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
  <span style="color: #0000ff;">'fields'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Item.*'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
  <span style="color: #0000ff;">'conditions'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Item.id'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">1995</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1726</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1971</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1707</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1972</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
  <span style="color: #0000ff;">'contain'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
    <span style="color: #0000ff;">'Answer'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
      <span style="color: #0000ff;">'fields'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Answer.*'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
      <span style="color: #0000ff;">'conditions'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
        <span style="color: #0000ff;">'Item.id'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">1995</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1726</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1971</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1707</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1972</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
        <span style="color: #0000ff;">'Answer.packet_id IN (SELECT id FROM packets WHERE state = &quot;CA&quot;)'</span>
      <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#41;</span>
  <span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$items</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">Item</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">find</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'all'</span><span style="color: #339933;">,</span><span style="color: #000088;">$queryOpts</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div><p>But what if we don&#8217;t know which items we want? Say we&#8217;re pulling items (and answers) as part of another containable query. Then we have no way to specify the items in the Answer model conditionals. But we can improve the query so that <a
title="gauravgs asked How do I write a subquery in cakephp at CakePHP Questions" href="http://ask.cakephp.org/questions/view/how_do_i_write_a_subquery_in_cakephp">we&#8217;re not using a subquery</a> each time (i.e. we&#8217;re doing it the &#8220;CakePHP&#8221; way):</p><div
class="wp_syntax"><div
class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$fields</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Packet.id'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$conditions</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Packet.packet_type'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #0000ff;">'F'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$packet_list</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">Item</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">Packet</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">find</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'list'</span><span style="color: #339933;">,</span><span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'fields'</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$fields</span><span style="color: #339933;">,</span><span style="color: #0000ff;">'conditions'</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$conditions</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000088;">$queryOpts</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
  <span style="color: #0000ff;">'fields'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Item.*'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
  <span style="color: #0000ff;">'conditions'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Item.id'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">1995</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1726</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1971</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1707</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">1972</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
  <span style="color: #0000ff;">'contain'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
    <span style="color: #0000ff;">'Answer'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
      <span style="color: #0000ff;">'fields'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Answer.*'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
      <span style="color: #0000ff;">'conditions'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
        <span style="color: #0000ff;">'Answer.packet_id'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #000088;">$packet_list</span>
      <span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#41;</span>
  <span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$items</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">Item</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">find</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'all'</span><span style="color: #339933;">,</span><span style="color: #000088;">$queryOpts</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div><p>Which produces the following two queries:</p><div
class="wp_syntax"><div
class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #ff0000;">`Packet`</span><span style="color: #66cc66;">.</span><span style="color: #ff0000;">`id`</span> <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`packets`</span> <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">`Packet`</span> <span style="color: #993333; font-weight: bold;">WHERE</span> <span style="color: #ff0000;">`Packet`</span><span style="color: #66cc66;">.</span><span style="color: #ff0000;">`packet_type`</span> <span style="color: #66cc66;">=</span> <span style="color: #ff0000;">'F'</span>;
<span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #ff0000;">`Answer`</span><span style="color: #66cc66;">.*,</span> <span style="color: #ff0000;">`Answer`</span><span style="color: #66cc66;">.</span><span style="color: #ff0000;">`item_id`</span> <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">`answers`</span> <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">`Answer`</span> <span style="color: #993333; font-weight: bold;">WHERE</span> <span style="color: #ff0000;">`Answer`</span><span style="color: #66cc66;">.</span><span style="color: #ff0000;">`packet_id`</span> <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1278</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1277</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1276</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1274</span><span style="color: #66cc66;">,...,</span><span style="color: #cc66cc;">1726</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1971</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1972</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">1995</span><span style="color: #66cc66;">&#41;</span>;</pre></div></div><p>Again we&#8217;re seeing the kind of performance we want, &lt;1 second for both queries. As good as when we knew all our conditions in advance. The main drawback to doing it this way is the size of the query string sent to your database. If your first query is pulling a large number of results then your second query will be quite large. And most databases have limits on how large a query can be.</p><p>This isn&#8217;t the only way to get to the end point either. 
As with any programming task there are a variety of ways to arrive at the desired result. For example, you could break up the query into multiple queries, each building on the previous one. Sticking with our items/answers query you could first get the items in one query. Then <code>foreach</code> through the results and run a query to get the answers, integrating the results manually. It&#8217;s more work, and likely would decrease performance since you&#8217;re increasing the number of database queries, but if you can&#8217;t seem to optimize your all-in-one queries it&#8217;s worthwhile to try other options to see if you can get better performance.</p><p>Aside: performance problems due to poorly optimized queries can be somewhat mitigated by query caching. Even so, you&#8217;re better off with an optimized query to begin with since it saves you from a performance hit every time the relevant cached result is flushed.</p> ]]></content:encoded> <wfw:commentRss>http://techlog.p2061.org/2011/08/19/optimizing-database-queries-in-cakephp/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Integrating calculated fields and model data in CakePHP</title><link>http://techlog.p2061.org/2011/05/13/integrating-calculated-fields-and-model-data-in-cakephp/</link> <comments>http://techlog.p2061.org/2011/05/13/integrating-calculated-fields-and-model-data-in-cakephp/#comments</comments> <pubDate>Fri, 13 May 2011 20:48:26 +0000</pubDate> <dc:creator>bsweeney</dc:creator> <category><![CDATA[PHP]]></category> <category><![CDATA[Web Development]]></category> <category><![CDATA[CakePHP]]></category> <category><![CDATA[CakePHP 1.2.x]]></category> <category><![CDATA[fail]]></category> <guid
isPermaLink="false">http://techlog.p2061.org/?p=493</guid> <description><![CDATA[(This is mostly a summary of Dealing with calculated fields in CakePHP&#8217;s find().) One of the great things about CakePHP is that if it doesn&#8217;t have some core functionality you want/need there are easy ways to add it. More and more I&#8217;m taking advantage of this ability. This all comes about because I wanted to [...]]]></description> <content:encoded><![CDATA[<p>(This is mostly a summary of <a
title="Dealing with calculated fields in CakePHP's find()" href="http://nuts-and-bolts-of-cakephp.com/2008/09/29/dealing-with-calculated-fields-in-cakephps-find/">Dealing with calculated fields in CakePHP&#8217;s find()</a>.)</p><p>One of the great things about CakePHP is that if it doesn&#8217;t have some core functionality you want/need there are easy ways to add it. More and more I&#8217;m taking advantage of this ability. This all comes about because I wanted to have calculated fields available inline with the model data. By calculated fields I mean results that are not data columns (e.g. <code
lang="sql">SELECT *, CURDATE() AS current_date FROM users</code> &#8230; yes, that&#8217;s a fairly contrived example).</p><p>By default Cake places results from calculated fields outside the model data, like this:</p><div
class="wp_syntax"><div
class="code"><pre class="php" style="font-family:monospace;"><span style="color: #990000;">Array</span>
<span style="color: #009900;">&#40;</span>
    <span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">Array</span>
        <span style="color: #009900;">&#40;</span>
            <span style="color: #009900;">&#91;</span>User<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">Array</span>
                <span style="color: #009900;">&#40;</span>
                    <span style="color: #009900;">&#91;</span>id<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=&gt;</span> <span style="color: #cc66cc;">1</span>
                    <span style="color: #009900;">&#91;</span>username<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=&gt;</span> aaas
                <span style="color: #009900;">&#41;</span>
&nbsp;
            <span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">Array</span>
                <span style="color: #009900;">&#40;</span>
                    <span style="color: #009900;">&#91;</span>current_date<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=&gt;</span> <span style="color: #0000ff;">'2011-05-13'</span>
                <span style="color: #009900;">&#41;</span>
&nbsp;
        <span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#41;</span></pre></div></div><p>What we want is to place the &#8220;current_date&#8221; calculated field inside the User model, so it&#8217;s more naturally accessed with <code
lang="php">$users[0]['User']['current_date']</code> instead of <code
lang="php">$users[0][0]['current_date']</code>. This is easy enough to do through a model&#8217;s <code><a
href="http://book.cakephp.org/view/1050/afterFind">afterFind()</a></code> callback method (to make it widely available place the function in app_model.php).</p><div
class="wp_syntax"><div
class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">function</span> afterFind<span style="color: #009900;">&#40;</span><span style="color: #000088;">$results</span><span style="color: #339933;">,</span> <span style="color: #000088;">$primary</span><span style="color: #339933;">=</span><span style="color: #009900; font-weight: bold;">false</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$primary</span> <span style="color: #339933;">==</span> <span style="color: #009900; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span>Set<span style="color: #339933;">::</span><span style="color: #004000;">check</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$results</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'0.0'</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;&amp;</span> Set<span style="color: #339933;">::</span><span style="color: #004000;">check</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$results</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">&quot;0.&quot;</span> <span style="color: #339933;">.</span> <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">alias</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
			<span style="color: #000088;">$fields</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array_keys</span><span style="color: #009900;">&#40;</span> <span style="color: #000088;">$results</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #b1b100;">foreach</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$results</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$key</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$value</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
				<span style="color: #b1b100;">foreach</span><span style="color: #009900;">&#40;</span> <span style="color: #000088;">$fields</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$fieldName</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
					<span style="color: #000088;">$results</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$key</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">alias</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$fieldName</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$value</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$fieldName</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
				<span style="color: #009900;">&#125;</span>
				<span style="color: #990000;">unset</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$results</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$key</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #009900;">&#125;</span>
		<span style="color: #009900;">&#125;</span>
	<span style="color: #009900;">&#125;</span>
	<span style="color: #b1b100;">return</span> <span style="color: #000088;">$results</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div><p>And yet, even as I add such enhanced functionality to my web apps I&#8217;m finding limits. The above logic is complicated a little because you don&#8217;t know what type of results you&#8217;re getting. The results you get from calling, for example, <code>find('all')</code> versus <code>find('count')</code> are not the same. But Cake doesn&#8217;t give any hints to the <code>afterFind()</code> callback, and as a result additional logic is included to try and guess at our data structure. The above adds a quick but incomplete hack by a) checking that the results are for the primary model queried (e.g. not from contained model), b) checking for the presence of nested numerical keys, and c) checking that there is model data to integrate with.</p><p>The take away is that while the code produces the desired data structure, in its current form it does so only for specific results.</p> ]]></content:encoded> <wfw:commentRss>http://techlog.p2061.org/2011/05/13/integrating-calculated-fields-and-model-data-in-cakephp/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>suhosin to [internal web app]: you talk too much</title><link>http://techlog.p2061.org/2010/12/03/suhosin-to-internal-web-app-you-talk-too-much/</link> <comments>http://techlog.p2061.org/2010/12/03/suhosin-to-internal-web-app-you-talk-too-much/#comments</comments> <pubDate>Fri, 03 Dec 2010 16:13:07 +0000</pubDate> <dc:creator>bsweeney</dc:creator> <category><![CDATA[PHP]]></category> <category><![CDATA[System Administration]]></category> <category><![CDATA[php]]></category> <category><![CDATA[security]]></category> <category><![CDATA[suhosin]]></category> <guid
isPermaLink="false">http://techlog.p2061.org/?p=375</guid> <description><![CDATA[Following up on my earlier post, I&#8217;ve had to make some further configuration adjustments to avoid suhosin-related restrictions in one of our custom web applications. This particular application has a function that generates a summary of data from student assessments. The summaries can be generated based on groupings of packets and items. Depending on the [...]]]></description> <content:encoded><![CDATA[<p>Following up on my <a
href="http://techlog.p2061.org/2010/11/05/suhosin-gives-wordpress-the-smackdown/">earlier post</a>, I&#8217;ve had to make some further configuration adjustments to avoid suhosin-related restrictions in one of our custom web applications. This particular application has a function that generates a summary of data from student assessments. The summaries can be generated based on groupings of packets and items. Depending on the filtering parameters selected there can be a fairly large number of packets and items. Not all of the packets necessarily contain the items of interest, but it&#8217;s always easier to select all if you want an overall summary of item performance.</p><p>I recently noticed the following alert in the system log:</p><blockquote><p>ALERT &#8211; configured POST variable limit exceeded &#8211; dropped variable &#8216;included_packet_ids[]&#8217; (attacker <em>USER_IP_ADDRESS</em>, file <em>REPORT_FILTERING_PAGE</em>)</p></blockquote><p>One of the reasons I use POST variables on this page is to avoid the relatively small data size limit of GET. Suhosin adds limits of its own, including one on the number of variables that can be submitted in a single POST request, and each selected packet counts as one variable. Our limit was set at 1000, meaning there were over 1000 packets selected. This points to a need to adjust how the filter &#8220;selects all&#8221; &#8230; but for now I&#8217;ve adjusted the suhosin limit upward by modifying the <code>suhosin.post.max_vars</code> setting.</p><p><strong>References:</strong></p><ul><li><a
href="http://blog.motane.lu/2009/09/15/alert-configured-post-variable-limit-exceeded/">ALERT &#8211; configured POST variable limit exceeded ::Suhosin configuration blocks script | Tudor Barbu&#8217;s professional blog</a></li></ul> ]]></content:encoded> <wfw:commentRss>http://techlog.p2061.org/2010/12/03/suhosin-to-internal-web-app-you-talk-too-much/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>suhosin to WordPress: go on a diet</title><link>http://techlog.p2061.org/2010/11/05/suhosin-to-wordpress-go-on-a-diet/</link> <comments>http://techlog.p2061.org/2010/11/05/suhosin-to-wordpress-go-on-a-diet/#comments</comments> <pubDate>Fri, 05 Nov 2010 20:45:46 +0000</pubDate> <dc:creator>bsweeney</dc:creator> <category><![CDATA[PHP]]></category> <category><![CDATA[System Administration]]></category> <category><![CDATA[WordPress]]></category> <category><![CDATA[php]]></category> <category><![CDATA[security]]></category> <category><![CDATA[suhosin]]></category> <guid
isPermaLink="false">http://techlog.p2061.org/?p=352</guid> <description><![CDATA[We were seeing a lot of suhosin alerts in the system messages log of the type: ALERT &#8211; script tried to increase memory_limit to 268435456 bytes which is above the allowed value (attacker SERVER_IP_ADDRESS, file WP_MAIN_ADMIN_PAGE, line 96) The source of the issue is WordPress. The application is trying to raise the memory limit and [...]]]></description> <content:encoded><![CDATA[<p>We were seeing a lot of suhosin alerts in the system messages log of the type:</p><blockquote><p>ALERT &#8211; script tried to increase memory_limit to 268435456 bytes which is above the allowed value (attacker <em>SERVER_IP_ADDRESS</em>, file <em>WP_MAIN_ADMIN_PAGE</em>, line 96)</p></blockquote><p>The source of the issue is WordPress. The application is trying to raise the memory limit and suhosin won&#8217;t let it. Apparently <a
href="http://hakre.wordpress.com/2010/08/17/mysteries-about-the-wordpress-memory-limit/">WordPress will try to set a 256MB memory limit</a> before executing certain functions. The necessity of adjusting this setting seems questionable to me, but I also understand that it&#8217;s often better to play it safe when developing software for public consumption.</p><p>I don&#8217;t particularly like applications attempting to specify their own resource usage in a web environment. In my mind applications should specify a required/recommended memory limit in the system requirements and stay away from adjusting this setting behind-the-scenes. Tell me during setup if the current setting may result in non-optimal performance or even a halt in script execution. That&#8217;s not how it&#8217;s done here, but really no harm is done in the long run beyond the annoyance of suhosin throwing errors at the system logs.</p><p>There are two easy fixes to the problem:</p><ol><li>Set the PHP memory limit to 256MB</li><li>Modify the suhosin.memory_limit parameter to 256MB</li></ol><p>In our particular situation it&#8217;s just as easy to set the PHP memory limit. There&#8217;s always a risk of overloading the physical resources, but this site receives little enough traffic that I&#8217;m not concerned about the right confluence of requests occurring to cause a crash.</p><p><strong>References:</strong></p><ul><li><a
href="http://hakre.wordpress.com/2010/08/17/mysteries-about-the-wordpress-memory-limit/">Mysteries about the WordPress Memory Limit | hakre on wordpress</a></li><li><a
href="http://www.hardened-php.net/suhosin/configuration.html#suhosin.memory_limit">Hardened-PHP Project &#8211; PHP Security &#8211; Configuration</a></li></ul> ]]></content:encoded> <wfw:commentRss>http://techlog.p2061.org/2010/11/05/suhosin-to-wordpress-go-on-a-diet/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> <item><title>CakePHP and CURRENT_TIMESTAMP</title><link>http://techlog.p2061.org/2009/04/17/cakephp-and-current_timestamp/</link> <comments>http://techlog.p2061.org/2009/04/17/cakephp-and-current_timestamp/#comments</comments> <pubDate>Fri, 17 Apr 2009 20:15:54 +0000</pubDate> <dc:creator>bsweeney</dc:creator> <category><![CDATA[CakePHP]]></category> <category><![CDATA[MySQL]]></category> <category><![CDATA[PHP]]></category> <category><![CDATA[Web Development]]></category> <guid
isPermaLink="false">http://techlog.p2061.org/?p=229</guid> <description><![CDATA[As of CakePHP 1.2.1.8004 you can&#8217;t use CURRENT_TIMESTAMP as the default for a column. With MySQL when the default is set to CURRENT_TIMESTAMP the current date+time is inserted for a new row that doesn&#8217;t specifically define the value of the column. From what I can tell, CakePHP specifies the values for all columns when [...]]]></description> <content:encoded><![CDATA[<p>As of CakePHP 1.2.1.8004 you can&#8217;t use CURRENT_TIMESTAMP as the default for a column. With MySQL when the default is set to CURRENT_TIMESTAMP the current date+time is inserted for a new row that doesn&#8217;t specifically define the value of the column.</p><p>From what I can tell, CakePHP specifies the values for all columns when it creates a new record. A column with a defined default that is not specifically set by the user is manually set by CakePHP (rather than letting MySQL handle defaults upon record insertion). But CakePHP doesn&#8217;t understand the CURRENT_TIMESTAMP keyword and so treats it as a string and wraps it in quotes. This breaks the resulting INSERT statement and you receive an error:</p><blockquote><p>Incorrect datetime value: &#8216;CURRENT_TIMESTAMP&#8217;</p></blockquote><p>Interestingly, columns that are named &#8220;created&#8221; and &#8220;modified&#8221; receive special handling by CakePHP. These columns are treated like auto-update columns by CakePHP and it sets them as expected. With the special handling of these columns in mind it is possible to get around the CURRENT_TIMESTAMP bug by following the recommended settings for created/modified columns, i.e. specifying the field as DATETIME with a default of NULL. CakePHP will automatically update the columns when inserting/updating records.</p><p>References:</p><ul><li><a
href="http://dev.mysql.com/doc/refman/5.0/en/timestamp.html">MySQL Reference Manual: 10.3.1.1. TIMESTAMP Properties</a></li><li><a
href="http://book.cakephp.org/view/69/created-and-modified">CakePHP Cookbook: 3.7.2.3 created and modified</a></li></ul> ]]></content:encoded> <wfw:commentRss>http://techlog.p2061.org/2009/04/17/cakephp-and-current_timestamp/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>CakePHP, Apache, and Ampersands</title><link>http://techlog.p2061.org/2009/01/15/cakephp-apache-and-ampersands/</link> <comments>http://techlog.p2061.org/2009/01/15/cakephp-apache-and-ampersands/#comments</comments> <pubDate>Thu, 15 Jan 2009 19:44:23 +0000</pubDate> <dc:creator>bsweeney</dc:creator> <category><![CDATA[CakePHP]]></category> <category><![CDATA[PHP]]></category> <category><![CDATA[Web Development]]></category> <guid
isPermaLink="false">http://techlog.p2061.org/?p=169</guid> <description><![CDATA[I&#8217;m still learning CakePHP. My most recent experimentation is with forms and URL redirecting. In the process I noticed that if the URL includes an ampersand, even in URL-encoded form, CakePHP will read in the URL incorrectly. In my case, I was taking a form submission and redirecting it to a new page, appending the [...]]]></description> <content:encoded><![CDATA[<p>I&#8217;m still learning CakePHP. My most recent experimentation is with forms and URL redirecting. In the process I noticed that if the URL includes an ampersand, even in URL-encoded form, CakePHP will read in the URL incorrectly.</p><p>In my case, I was taking a form submission and redirecting it to a new page, appending the submitted data as URL parameters (e.g. of the form /controller/action/<strong>name:value</strong>). CakePHP handles ampersands in POST-submitted forms just fine, but during the redirect I noticed that the value of the parameter was cut off at the ampersand. For example, I&#8217;m creating a search form for a list of items (located at /items/index). When you submit the form with a value of &#8220;ball &amp; chain&#8221; the controller sees it just fine. The form is submitted to the search action (/items/search) which takes the values, constructs a new URL, and redirects back to the item list (/items/index/Search.keywords:ball+%26+chain). However, the index action only sees &#8220;ball &#8221; &#8230; the rest of the value is assumed to be additional key/value pairs in the querystring.</p><p>This problem seems like a pretty major issue to me. A <a
href="http://www.mail-archive.com/tickets-cakephp@googlegroups.com/msg00775.html">bug report</a> has already been filed, but I don&#8217;t believe it will be addressed anytime soon. And for good reason: the bug is actually in Apache&#8217;s mod_rewrite module rather than in CakePHP. When mod_rewrite processes a URL it unescapes any escaped characters (newer releases unescape to two levels). What this means for CakePHP is that any escaped characters in the URL become normal characters in the querystring (%26 becomes &amp;).</p><p>The best overview of the problem has a good work-around, but it requires modification of the core CakePHP scripts. I&#8217;ll leave that to the experts. On our install I&#8217;ve found that a triple-escaped value will only be unescaped twice, resulting in the proper escaping after being parsed by mod_rewrite.</p><p><strong>References<br
/> </strong></p><ul><li><a
href="http://www.dracos.co.uk/code/apache-rewrite-problem/">The Apache mod_rewrite Problem</a></li><li><a
title="alternate source: http://web.archive.org/web/20090515124524/http://www.usenet-forums.com/apache-web-server/39839-mod_rewrite-ampersand.html" href="http://www.usenet-forums.com/apache-web-server/39839-mod_rewrite-ampersand.html" class="broken_link">mod_rewrite &amp; ampersand</a></li><li><a
href="https://issues.apache.org/bugzilla/show_bug.cgi?id=23295">Escape problem in mod_rewrite [P] action&#8230;</a></li><li><a
href="https://issues.apache.org/bugzilla/show_bug.cgi?id=32328">Make mod_rewrite escaping optional / expose internal map</a></li><li><a
href="http://drupal.org/node/68886">Handle ampersands in search queries and other URLs when clean URLs are on</a></li><li><a
href="http://www.mediawiki.org/wiki/Manual:Short_URL/Ampersand_solution">Short URL/Ampersand solution</a></li></ul> ]]></content:encoded> <wfw:commentRss>http://techlog.p2061.org/2009/01/15/cakephp-apache-and-ampersands/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>The Mechanical Turk Knows All</title><link>http://techlog.p2061.org/2008/11/21/the-mechanical-turk-knows-all/</link> <comments>http://techlog.p2061.org/2008/11/21/the-mechanical-turk-knows-all/#comments</comments> <pubDate>Fri, 21 Nov 2008 19:28:24 +0000</pubDate> <dc:creator>bsweeney</dc:creator> <category><![CDATA[MySQL]]></category> <category><![CDATA[PHP]]></category> <category><![CDATA[Web Development]]></category> <guid
isPermaLink="false">http://techlog.p2061.org/?p=91</guid> <description><![CDATA[With a large set of test packets in need of transcription, little time to do so, and even less desire to spend the money on temporary workers, I needed to think of some other way to perform the task. Luckily I was able to come up with a method using a tool from Amazon that [...]]]></description> <content:encoded><![CDATA[<p>With a large set of test packets in need of transcription, little time to do so, and even less desire to spend the money on temporary workers, I needed to think of some other way to perform the task. Luckily I was able to come up with a method using a tool from Amazon that proved relatively easy to implement: Mechanical Turk.</p><p><span
id="more-91"></span></p><p><strong>Data Collection Hassles</strong></p><p>For the past few years the Assessment team has sent out a large batch of items in what we call pilot tests. These tests are used to gauge the effectiveness of assessment items. The format consists of an item followed by a set of questions about the item (Was there anything confusing?, Did the picture help?, Is answer A correct?, etc.). The questions have set answers (Yes, No, Not Sure) and/or a space for the student to provide a short written response.</p><p>In previous years we have used temporary workers to input the student responses into the Items Utility. This proved to be extremely slow and relatively expensive.</p><p>Last year we attempted a new method of data collection using an automated system to capture the Yes/No/Not Sure responses. This system used optical mark recognition (OMR) and required the packets to be formatted in a consistent manner on each page. Formatted in this way we could create a master file indicating how to judge answer selection. This proved fairly successful, though we did have to outsource the scanning to a third party. The results of the scan came back as Excel documents. A little PHP coding and I easily imported the scanned data into the Items Utility. The main drawback was that we still had to hire temps to enter the written responses.</p><p>This year we embarked on a new automated scanning system for the Yes/No/Not Sure responses that proved even better than in previous years. This system uses an in-house scanner and design software. A lot of prep work had to be performed by the researchers to set up their test packets. And we did have to use specific printers to produce the packets. But I believe in the end we saved money, particularly since the majority of the leg work had already been done in figuring out how to import the data into the utility. 
The only snag is that we again had no way of capturing the written responses electronically.</p><p>However, one of the benefits to the new system is that in addition to creating a data file for the Yes/No/Not Sure values it can produce images of the scanned regions. At first I thought maybe we could find some way of performing character recognition on the scanned images. Unfortunately, there&#8217;s just no economical way to do that on hand-written text. So our only real option was, again, transcription by a human being. Past experience with the time and money required made the option unpalatable and we were likely going to just move ahead without the hand-written data.</p><p><strong>Enter the Turk</strong></p><p>Amazon recently released a web-based service, <a
href="http://aws.amazon.com/mturk/">Mechanical Turk</a> (MTurk), that provides access to a distributed workforce. The service provides a means of creating what Amazon calls &#8220;Human Intelligence Tasks&#8221; (HITs), tasks that are currently too complex for a computer but ideally suited for your average person. We decided to try the service out for transcription of the written responses.</p><p>Since MTurk is web-based, the HITs are set up as web pages. An online editor makes setting up HIT templates fairly simple, though it helps to know a bit of HTML. Each template consists of instructions and a form where the worker enters the requested information. It&#8217;s up to the requester to determine the amount paid for each HIT completed (starting as low as one cent). Amazon tacks on an overhead fee for each HIT that is completed by a worker and accepted by the requester.</p><p>The template uses variables to fill in each HIT with unique information. Once the template is set up you provide a data file that contains the variable data. MTurk then publishes your HITs on a public interface where anyone in the world with an internet connection can perform the work.</p><p><strong>Using MTurk</strong></p><p>As I said, the program we used to scan the packets is able to save scanned regions as images. These images were organized such that we could easily determine the packet/student/item combination based solely on the file name and directory in which the image was located. This information is all we needed to insert data into the Items Utility, and thus provided an easy identifier to use with MTurk.</p><p>To minimize the amount of time a worker spends on each HIT it&#8217;s good to try and have all the data the user needs on the screen. Since a MTurk HIT is a web page it can reference images located on external servers. I used URLs for the image scan as values for the template variables. 
The HIT displayed the image side by side with a text area where the workers could input the transcribed text.</p><p>Initially I thought to just link to the original images. A quick test, however, and I realized that the size of the images would be a problem. The images were larger than needed and any bandwidth constraints on the worker end would significantly increase the time per HIT. Smaller files were needed, and to save space I thought to dynamically resize the images using PHP. A sample proved the folly of this idea. Our system resources are limited and the processing power and memory required to enable this for more than a few simultaneous workers caused the HITs to load slower than if we hadn&#8217;t resized to begin with. In the end we decided to resize all images, a scenario we had avoided due to space constraints. The smaller image would be referenced on each HIT with a link to the full size in cases where a zoomed view might be beneficial.</p><p>Since Amazon tacks on a handling fee for each HIT I attempted to organize our HITs in a way that would work out best for both us and the workers. We needed HITs that could be completed quickly but that also collected sufficient data so that the cost-per-HIT would balance out against the amount of work required. Plus we were concerned about providing equitable pay for the work being done. Based on our testing we were able to make a determination of how many images per HIT and at what rate would keep our fees acceptable while also resulting in an equitable wage.</p><p>Our due diligence paid off. We had <span>48,351 HITs comprising 193,401 images. </span>The work was completed within a couple of days, which was beyond my wildest estimates. 
The amount paid and the time taken to complete the work were significantly less than what we would have needed had we hired traditional temporary workers to do the same.</p><p>Once the work was complete MTurk provided a data file that included the original data along with the worker input, all the information we needed to import the data into the utility.</p><p><strong>Wrapping Up</strong></p><p>While this was a bit of a learning process, using MTurk to transcribe this data proved successful. I fully expect to use this system in the future. This has been more of a review of the process, but there were a few technical aspects involved. Rather than go into detail on these I&#8217;m providing links to the files involved which should provide sufficient information in that regard (except for perhaps the import script).</p><ul><li><a
href="http://techlog.p2061.org/wp-content/uploads/2008/12/filenamesphp.txt">PHP Script to Generate a List of Files</a></li><li><a
href="http://techlog.p2061.org/wp-content/uploads/2008/12/resizeimgphp.txt">PHP Script to Resize Images</a></li><li><a
href="http://techlog.p2061.org/wp-content/uploads/2008/11/files_noblankssample.csv">Data Set Sent to MTurk</a></li><li><a
href="http://techlog.p2061.org/wp-content/uploads/2008/11/template.htm">MTurk HIT Template</a></li><li><a
href="http://techlog.p2061.org/wp-content/uploads/2008/11/batch_3838_resultsample.csv">Result Data Set Provided by MTurk</a></li><li><a
href="http://techlog.p2061.org/wp-content/uploads/2008/11/mturk_importsample.txt">Simplified Result Data Set</a></li><li><a
href="http://techlog.p2061.org/wp-content/uploads/2008/12/mturk_importphp.txt">PHP Script to Import Data</a></li></ul> ]]></content:encoded> <wfw:commentRss>http://techlog.p2061.org/2008/11/21/the-mechanical-turk-knows-all/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> <item><title>Playing nice with CPU usage</title><link>http://techlog.p2061.org/2008/08/14/playing-nice-with-cpu-usage/</link> <comments>http://techlog.p2061.org/2008/08/14/playing-nice-with-cpu-usage/#comments</comments> <pubDate>Thu, 14 Aug 2008 21:32:18 +0000</pubDate> <dc:creator>bsweeney</dc:creator> <category><![CDATA[PHP]]></category> <category><![CDATA[System Administration]]></category> <guid
isPermaLink="false">http://techlog.p2061.org/?p=86</guid> <description><![CDATA[I needed to do a mass resampling of around 280,000 images. There are a number of ways to do this, but I settled on doing it via PHP because the images are stored on our web server, the total size of the images is large (~10GB), and I didn&#8217;t want to kill my machine trying [...]]]></description> <content:encoded><![CDATA[<p>I needed to do a mass resampling of around 280,000 images. There are a number of ways to do this, but I settled on doing it via PHP because the images are stored on our web server, the total size of the images is large (~10GB), and I didn&#8217;t want to kill my machine trying to get it done.</p><p>PHP is ideal for a task such as this: parsing directories and subdirectories for images is easy; resampling using the built-in library (GD) is a breeze; specifying the destination as a subdirectory is simple. The one minus was processor usage. Performing image manipulation eats up the CPU in a big way.</p><p>Luckily Linux systems have a built-in utility for addressing a situation like this: <code>nice</code>. <code>nice</code> will &#8220;run a program with modified scheduling priority.&#8221; I&#8217;m running the image manipulation script using the following command:</p><blockquote><p><code>nice --adjustment=19 php script.php</code></p></blockquote><p>If nothing else is going on the script will use whatever resources are available. When anything with a higher priority executes, that program will take precedence over the script with regard to system resources. The script should thus not affect the responsiveness of the web server. 
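</p><p>For illustration, here is a minimal sketch of what a resampling script along these lines could look like. It is an example rather than the script actually used: the source directory, the <code>resized</code> subdirectory name, the 800-pixel target width, and the JPEG-only filter are all assumptions. It also calls <code>proc_nice()</code>, which lets a PHP script lower its own priority on POSIX systems, as an alternative to launching it through <code>nice</code>.</p><div
class="wp_syntax"><div
class="code"><pre class="php" style="font-family:monospace;">&lt;?php
// Illustrative sketch only; the path, width, and directory name are assumptions.
$sourceDir = '/path/to/images'; // hypothetical location of the originals
$maxWidth  = 800;               // hypothetical target width in pixels

// Lower this process's own priority (POSIX only); similar in effect to `nice`.
if (function_exists('proc_nice')) {
    proc_nice(19);
}

// Walk the directory tree; by default only files (leaves) are returned.
$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($sourceDir, FilesystemIterator::SKIP_DOTS)
);

foreach ($files as $file) {
    // Only handle .jpg files, and skip anything already written to a "resized" directory.
    if (strtolower($file-&gt;getExtension()) !== 'jpg'
        || basename($file-&gt;getPath()) === 'resized') {
        continue;
    }

    $src = imagecreatefromjpeg($file-&gt;getPathname());
    if ($src === false) {
        continue; // unreadable or corrupt image
    }

    // Scale down to the target width, preserving the aspect ratio.
    $width     = imagesx($src);
    $height    = imagesy($src);
    $newWidth  = min($maxWidth, $width);
    $newHeight = max(1, (int) round($height * $newWidth / $width));

    $dst = imagecreatetruecolor($newWidth, $newHeight);
    imagecopyresampled($dst, $src, 0, 0, 0, 0, $newWidth, $newHeight, $width, $height);

    // Write the result to a "resized" subdirectory next to the original.
    $outDir = $file-&gt;getPath() . '/resized';
    if (!is_dir($outDir)) {
        mkdir($outDir, 0755, true);
    }
    imagejpeg($dst, $outDir . '/' . $file-&gt;getFilename(), 85);

    unset($src, $dst); // release the image memory before the next file
}</pre></div></div><p>Either way, the resampling only gets the CPU time that nothing more important is asking for. 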
This is the reason I was searching for this kind of functionality.</p> ]]></content:encoded> <wfw:commentRss>http://techlog.p2061.org/2008/08/14/playing-nice-with-cpu-usage/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Creating Graphs in PHP</title><link>http://techlog.p2061.org/2008/06/19/creating-graphs-in-php/</link> <comments>http://techlog.p2061.org/2008/06/19/creating-graphs-in-php/#comments</comments> <pubDate>Thu, 19 Jun 2008 20:13:03 +0000</pubDate> <dc:creator>bsweeney</dc:creator> <category><![CDATA[PHP]]></category> <category><![CDATA[Web Development]]></category> <guid
isPermaLink="false">http://techlog.p2061.org/?p=68</guid> <description><![CDATA[I was tasked with creating a demonstration of a dynamically generated graph for one of the grant proposals being worked on at present. This isn&#8217;t something we&#8217;ve really done before in PHP, but I had a feeling it would not be a unique requirement. A quick search revealed that a PEAR package exists for just [...]]]></description> <content:encoded><![CDATA[<p>I was tasked with creating a demonstration of a dynamically generated graph for one of the grant proposals being worked on at present. This isn&#8217;t something we&#8217;ve really done before in PHP, but I had a feeling it would not be a unique requirement. A quick search revealed that a PEAR package exists for just this purpose: <a
href="http://pear.php.net/package/Image_Graph">Image_Graph</a>.</p><p>Installation was not too difficult, though I did have to ensure that a few dependencies were installed before PEAR would load the package. Documentation for the module is not that great<del
datetime="2010-12-23T17:02:37+00:00">, but there are a number of <a
title="http://pear.veggerby.dk/samples/">samples</a> and a fairly active <a
title="http://pear.veggerby.dk/forum/">discussion forum</a></del>. (There used to be a decent community, but the site appears to have gone under.)</p><p>I did not find the graphing object very intuitive. It seems to me that a lot of the properties and functions are abstracted from the objects they affect. I suspect this was a design decision to allow for greater flexibility, but it may also be due to an older coding style. The current branch appears to have been developed in earnest starting in 2005, and it&#8217;s been a few years since a new release. Even so, my initial exposure leads me to believe the package may be stable enough for even a production-level project so long as sufficient QA has been performed.</p><p>It took me only a couple of days to hack together the demonstration. Though the requirements of the script were few, I decided to push the limits of my object-oriented PHP and create the graphing component as a class. Overall it was a good first try at producing something of this nature. You can find the demonstrator at <a
href="http://flora.p2061.org/weather_data/">http://flora.p2061.org/weather_data/</a>.</p> ]]></content:encoded> <wfw:commentRss>http://techlog.p2061.org/2008/06/19/creating-graphs-in-php/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Items Utility: Piloting Data Entry Updates</title><link>http://techlog.p2061.org/2008/06/18/items-utility-piloting-data-entry-updates/</link> <comments>http://techlog.p2061.org/2008/06/18/items-utility-piloting-data-entry-updates/#comments</comments> <pubDate>Wed, 18 Jun 2008 22:05:41 +0000</pubDate> <dc:creator>bsweeney</dc:creator> <category><![CDATA[PHP]]></category> <category><![CDATA[Web Development]]></category> <guid
isPermaLink="false">http://techlog.p2061.org/?p=67</guid> <description><![CDATA[Piloting packets for this year are starting to come in and the data from these packets needs to be entered into the items utility. Because the questions in the packet have changed from previous years, and due to the rigid structure of the piloting data entry form and data tables, I decided to rewrite the [...]]]></description> <content:encoded><![CDATA[<p>Piloting packets for this year are starting to come in and the data from these packets needs to be entered into the items utility. Because the questions in the packet have changed from previous years, and due to the rigid structure of the piloting data entry form and data tables, I decided to rewrite the data entry form. The new format allows more flexibility on the front end without requiring modification of the back end. In addition, I modified the way PHP interacts with the packet data from MySQL by utilizing a structured array to hold the data.</p><p><span
id="more-67"></span></p><p><strong>Updating the interface<br
/> </strong></p><p>First up is the form organization. Previously the forms were designed, I believe, with the researchers in mind. The most important data entry sections were placed at the top of the page with everything else following. Once we started using temps to enter the data, however, this format proved to be a hindrance to efficiency. So to improve efficiency I regrouped the form into logical sections: student responses, student demographics, and researcher notes. Plus, to allow for easier scanning I organized the student responses section so that it resembles the paper form.</p><p>In addition to organizing the form more logically, I wanted to make portions of the form collapsible. Since a lot of the data entry is performed by temps, there is no reason for them to have to wade through the researcher notes, which are at the top of the page by request of the researchers. Brian W. has commented on the utility of Dreamweaver&#8217;s Spry widgets so I thought I&#8217;d give them a try. I must admit that they work quite nicely and are easy to implement, and I found just what I needed in the Collapsible Panel widget. The benefit of using the panels is that I could position the researcher notes at the top of the page without significantly hindering the ability of the temps to access the student responses by having this panel closed by default.</p><p>Which brings me to a minor change I made to the functionality of the panels: having the appropriate panels open when the data entry form is loaded (e.g. student responses always open; researcher notes closed unless otherwise requested by the user; demographics only open if no data has been entered). I used a combination of PHP and JavaScript to set the initial state. Unfortunately this seems to have caused some problems for Dreamweaver. When you open the packet data entry form in Dreamweaver the program appears to freeze. 
It appears Dreamweaver is attempting to process the Spry widgets and choking on the PHP-derived JavaScript variables. Eventually Dreamweaver informs you that &#8220;A script in file &#8230; has been running for a long time. Do you want to continue?&#8221; Answering no to this will return Dreamweaver to normal operation. I could get around this problem by setting the state of the panels after they have been initialized, but this produces a disorienting collapse of panels after the page has loaded.</p><p>The final interface modification was a small script that makes it possible to deselect radio buttons. The main purpose of developing this feature was to streamline the form which, in previous revisions, had an extra option to clear the selected response.</p><p><strong>Improving the foundation<br
/> </strong></p><p>As I mentioned, while the form is still hard-coded (each question coded by hand) I did lay out the foundation for a more dynamic approach. First I redeveloped the table that stores the data related to a packet. I reused the packets and packet_items tables, but created a new table called packet_data that stores only the question number, answer, and any comments. The benefit here is that forms with varying numbers of questions can be stored in this one table. I also moved the student metadata into a separate table, a design decision more in line with database normalization practice.</p><p>Another modification I made was to use a structured array to define the questions in the packet. While this has not been fully developed yet, I have used it to significantly simplify the script that saves the data. Rather than specifying each question and its corresponding HTML form element and writing the query to insert that information in the database, I designed the PHP to loop through the array and generate the SQL code on the fly.</p><p>Finally, I used another structured array to hold data related to the packet, packet items, and students while generating the packet data entry page. I do not utilize any methods to automate the form generation at present, though there is little to prevent this from being developed. One reason I decided to code the form by hand is to allow for a level of customization that is difficult to accommodate when automating the visual side of things. Even so, utilizing structured arrays allows for more consistency between PHP pages.</p><p><strong>Moving forward</strong></p><p>Data from years prior has not yet been ported into the new structure. I do not expect this task to require too much work, but I am not sure when I&#8217;ll be able to perform the task. 
At the very least this will be part of the redevelopment taking place around Subversion, if not sooner.</p><p>To fully accommodate the data in the utility I&#8217;ll need to also update the summary tables, the school report generator, and packet data export. These updates need to be completed soon in order to support the item review, which will begin shortly.</p> ]]></content:encoded> <wfw:commentRss>http://techlog.p2061.org/2008/06/18/items-utility-piloting-data-entry-updates/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> </channel> </rss>
