A weakness in PHP

One of my biggest frustrations with PHP on Apache is that the PHP script does not start until the HTTP request has been completely received. In other words, I cannot read the incoming POST data until all of it has been received and processed by PHP. I have run into this issue before, and I am running into it again. After searching the web and asking around in the community, I am convinced that it just is not possible.

At a previous job, I worked on a file upload monitor that would show the user the progress of a file upload in real time. The idea was that the PHP script would write its progress to a temporary file as it read from the stream, and AJAX calls could read that file to show the user the progress. This did not work, because the PHP script did not start until the file was completely uploaded. I found a smaller free and open-source project (I can’t remember the name) that solved the problem by posting the file to a Perl script, which did what I had wanted PHP to do. The cool thing is that at the time I did not have a clear grasp of the concept, but now I know enough to write it all without the help of that project.
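
Here is a minimal sketch of the progress-file idea, written in Python rather than PHP. It is not the original project's code. The function names `copy_with_progress` and `read_progress` are hypothetical, as are the JSON progress layout and the write-then-rename step. The copy loop reads the incoming stream in fixed-size pieces and rewrites a small progress file after each piece. A separate, AJAX-polled handler could then read that file.

```python
import io
import json
import os
import tempfile


def copy_with_progress(src, dst, total_bytes, progress_path, chunk_size=64 * 1024):
    """Copy src to dst piece by piece, recording progress after each piece.

    The function name and the JSON layout are illustrative assumptions.
    """
    received = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        received += len(chunk)
        # Write to a scratch file, then rename it over the progress file,
        # so a polling request should never see a half-written file.
        tmp_path = progress_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"received": received, "total": total_bytes}, f)
        os.replace(tmp_path, progress_path)
    return received


def read_progress(progress_path):
    """What a hypothetical AJAX-polled handler might return: the latest record."""
    with open(progress_path) as f:
        return json.load(f)


if __name__ == "__main__":
    # Simulate a 200 KB upload arriving on an in-memory stream.
    upload = io.BytesIO(b"x" * 200_000)
    saved = io.BytesIO()
    progress_file = os.path.join(tempfile.mkdtemp(), "progress.json")
    copy_with_progress(upload, saved, 200_000, progress_file)
    print(read_progress(progress_file))
```

This only helps if the server hands the script the request body while it is still arriving, which is exactly what PHP on Apache would not do.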

Now I am considering writing (or rather improving) a web service that processes files sent over HTTP in XML format. My big idea is to read the XML as a stream, start processing the files as they arrive (maybe even throwing in a little multithreading, which is another thing I have had trouble with in PHP), and send the output back in real time. The biggest optimization would be sending the response XML back before the request has finished arriving. There is one problem, though, and this is where I had to do some investigating and read the HTTP RFC: an HTTP response normally carries a pesky little header called Content-Length.
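
Reading the XML as a stream can be sketched with the standard library's `xml.etree.ElementTree.XMLPullParser`. It accepts data piece by piece and reports each element once its closing tag has arrived. The request layout (`<files><file name="...">...</file></files>`) is hypothetical, as are the helpers `stream_files` and `_completed_files`. The real format would depend on the service.

```python
import xml.etree.ElementTree as ET


def _completed_files(parser):
    """Yield (name, text) for every <file> element the parser has finished."""
    for _event, elem in parser.read_events():
        if elem.tag == "file":
            yield elem.get("name"), elem.text or ""
            elem.clear()  # drop content that has already been handed off


def stream_files(byte_chunks):
    """Parse XML incrementally from an iterable of bytes (e.g. socket reads).

    The <files><file name="...">...</file></files> layout is hypothetical.
    """
    parser = ET.XMLPullParser(events=("end",))
    for chunk in byte_chunks:
        parser.feed(chunk)
        if hasattr(parser, "flush"):  # newer Pythons: parse buffered input now
            parser.flush()
        yield from _completed_files(parser)
    parser.close()  # raises ParseError if the document is incomplete
    yield from _completed_files(parser)


if __name__ == "__main__":
    pieces = [b'<files><file name="a.txt">hel', b'lo</file><file name="b',
              b'.txt">world</file></files>']
    for name, text in stream_files(pieces):
        print(name, text)
```

Each `<file>` can be handed off for processing, or to a worker thread, as soon as it is yielded. There is no need to wait for the whole request body.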

How am I supposed to say how much data I’m sending in the response before I’m done receiving the data to process? After reading the section on Content-Length, I found an exception: the Content-Length header can be omitted when the Transfer-Encoding is set to ‘chunked’. I was thrown off at first, because I’d never heard of such a thing. With chunked transfer encoding, the body is sent as a series of “chunks”. Each chunk starts with its length in hexadecimal followed by CRLF, then comes the data itself followed by another CRLF, then the next chunk, and so on, until a zero-length chunk marks the end. This way I could read each file, process it, and send the result back as a chunk while the remaining files are still being transferred to the web service. Of course, I’d have to set the socket to non-blocking first.
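
The chunk format is simple enough to frame and unframe by hand. This is a minimal Python sketch. The names `encode_chunk`, `encode_body`, `LAST_CHUNK`, and `decode_chunked` are illustrative. The decoder assumes the whole body is already in memory, and it ignores chunk extensions and trailer headers.

```python
LAST_CHUNK = b"0\r\n\r\n"  # zero-length chunk: end of the body (no trailers)


def encode_chunk(data: bytes) -> bytes:
    """Frame one chunk: length in hex, CRLF, the data, CRLF."""
    if not data:
        raise ValueError("an empty chunk would end the body; send LAST_CHUNK instead")
    return (b"%X\r\n" % len(data)) + data + b"\r\n"


def encode_body(pieces):
    """Turn an iterable of byte strings into a complete chunked body."""
    for piece in pieces:
        if piece:  # skip empty pieces so they are not mistaken for the end
            yield encode_chunk(piece)
    yield LAST_CHUNK


def decode_chunked(body: bytes) -> bytes:
    """Reassemble a complete chunked body (ignores extensions and trailers)."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end].split(b";", 1)[0], 16)
        pos = line_end + 2
        if size == 0:
            return bytes(out)
        out += body[pos:pos + size]
        pos += size + 2  # skip the data and the CRLF after it


if __name__ == "__main__":
    body = b"".join(encode_body([b"<result>1</result>", b"<result>2</result>"]))
    print(body)
    print(decode_chunked(body))
```

Because each result is framed independently, the server can write a chunk the moment a file is processed, with no total length needed up front.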

Here’s my conclusion. One word. Python/mod_python. I tested mod_python with Apache and had a little trouble configuring it. Once I got it running and executing my Python code, I was able to test it from PHP over a raw socket and see the Python code executing before I closed the socket. Speaking HTTP over a raw socket is a double-edged sword. You have to send your HTTP headers explicitly and parse the response headers yourself, but at the same time you can specify that you’re doing a chunked transfer, or some other crazy setting. I’ve had good results so far, and can’t wait to get a better grip on Python. Oh, and Python supports multithreading, which is cool too.
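
The raw-socket test can be sketched in Python as well; the original test was done from PHP. Several pieces are hypothetical: the header values, the request path, and the helpers `request_head`, `send_chunked_post`, `read_all`, and `post`. The headers are written by hand and the body goes out chunk by chunk, so the server can start working before the request is finished. The `Connection: close` header asks the server to close the connection after replying. Reading until `recv` returns empty bytes then collects the whole reply.

```python
import socket


def request_head(host: str, path: str) -> bytes:
    """Hand-written request line and headers for a chunked POST."""
    lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: text/xml",       # assumed body type for the XML service
        "Transfer-Encoding: chunked",   # no Content-Length needed
        "Connection: close",            # ask the server to close after replying
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_chunked_post(sock, host, path, pieces):
    """Send the headers, then each piece as its own chunk, then the last chunk."""
    sock.sendall(request_head(host, path))
    for piece in pieces:
        if piece:  # an empty chunk would signal the end of the body
            sock.sendall((b"%X\r\n" % len(piece)) + piece + b"\r\n")
    sock.sendall(b"0\r\n\r\n")


def read_all(sock, bufsize=4096):
    """Read until the peer closes the connection; returns raw, unparsed bytes."""
    parts = []
    while True:
        data = sock.recv(bufsize)
        if not data:
            return b"".join(parts)
        parts.append(data)


def post(host, port, path, pieces):
    """Connect, stream the request, and return the raw response bytes."""
    with socket.create_connection((host, port)) as sock:
        send_chunked_post(sock, host, path, pieces)
        return read_all(sock)
```

A real client would still have to pull the status line, headers, and possibly chunked response body out of what `post` returns. That is the other edge of the sword.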