
Dklab_SoapClient: parallel SOAP queries, reconnect, timeout processing



2008-02-04

You can help develop and improve this library on GitHub.

Dklab_SoapClient is an extended version of the standard PHP class SoapClient. It is intended for parallel remote procedure calls in high-load projects.

For dummies 

A remote procedure is a function that runs on another machine in a cluster. SOAP is a protocol for exchanging structured messages in distributed computing environments; it is a W3C standard.

Here is the list of capabilities that Dklab_SoapClient adds to the built-in PHP SoapClient:

  • The key feature of the library is simultaneous, parallel execution of queries to several remote procedures. If a page on your site is built from 5 remote blocks, each of which takes 100 ms to generate, they can be run in parallel and you will get the whole page not in 500 ms, but in about 100 ms.
  • Reconnect in case of failure to establish a connection. Unfortunately, our world is not perfect, and the first attempt to connect to a SOAP server may end with a timeout. This happens particularly often when the project is spread over several data centers. Dklab_SoapClient lets you set a connect timeout (e.g. 1 second) and retry the connection a specified number of times. This can lower the probability of complete failure by a factor of thousands, because a reconnect almost always helps in case of packet loss.
  • Support for a timeout on receiving data. If a page is built from remote blocks and one of the blocks hangs, the whole page hangs. Yet it is not a big problem if only one block is missing while all the others are present. You can specify how long Dklab_SoapClient should wait for a response from a remote procedure. If the time limit is exceeded, a PHP exception is thrown, and you can handle it as you like without interrupting the loading of the other blocks.
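
The arithmetic behind the first two points can be sketched in a few lines. This is a rough model, not a measurement: it assumes the blocks are independent, that running requests in parallel adds no overhead, and that each connection attempt fails independently with the same probability. The function names and the 1% failure rate are invented for this illustration.

```php
<?php
// Rough model only: function names and numbers are illustrative assumptions.

// Sequential calls cost the sum of the block times.
function sequentialTimeMs(array $blockTimesMs)
{
    return array_sum($blockTimesMs);
}

// Parallel calls cost roughly the time of the slowest block (overhead ignored).
function parallelTimeMs(array $blockTimesMs)
{
    return max($blockTimesMs);
}

// If one connection attempt fails with probability $p and attempts are
// independent, all $attempts fail together with probability $p to the
// power of $attempts.
function totalFailureProbability($p, $attempts)
{
    return pow($p, $attempts);
}

$blocks = array(100, 100, 100, 100, 100);
printf("sequential: %d ms, parallel: about %d ms\n",
    sequentialTimeMs($blocks), parallelTimeMs($blocks));
printf("1 attempt: %.6f, 3 attempts: %.6f\n",
    totalFailureProbability(0.01, 1), totalFailureProbability(0.01, 3));
```

Under these assumptions, going from one attempt to three turns a 1-in-100 failure into roughly a 1-in-a-million failure. Real networks rarely fail independently, so treat the figure as an upper bound on the benefit.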

The Dklab_SoapClient code is contained in a single 20K file. It also supports all the features of the built-in SoapClient (see the documentation):

  • Work with cookies. One procedure can call setcookie(), and another can read the value of that cookie later.
  • Work with PHP sessions. One procedure writes data to the session, and another reads it.
  • Support for WSDL schemas and passing of complex business objects.
  • Catching exceptions that are raised in a remote procedure.

Compatibility and performance

In short, SOAP and the SoapClient class are the most convenient and most efficient tool for remote procedure calls in PHP. Why?

For dummies 

If your PHP version does not provide the SoapClient and SoapServer classes, check whether the standard extensions soap and curl are loaded in php.ini.
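
A quick way to run that check from PHP itself, rather than reading php.ini by hand, might look like the sketch below. It uses only extension_loaded() and class_exists() from the PHP core; the labels in the array are arbitrary.

```php
<?php
// Report whether the extensions and classes mentioned above are available.
$checks = array(
    'soap extension'   => extension_loaded('soap'),
    'curl extension'   => extension_loaded('curl'),
    'SoapClient class' => class_exists('SoapClient'),
    'SoapServer class' => class_exists('SoapServer'),
);
foreach ($checks as $name => $ok) {
    echo str_pad($name, 18) . ($ok ? "available" : "MISSING") . "\n";
}
```

Run it both from the command line and through the web server, since the two often use different php.ini files.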

SOAP is a standard and popular XML protocol for remote procedure calls on the Web. It is supported in all major programming languages, so you can write the SOAP server in any language (e.g. C++, Java or, of course, PHP) and use Dklab_SoapClient as the client.

Another advantage of SOAP in PHP is the ability to pass objects of any structure. For example, if the remote procedure returns an array of arrays of objects of some class, you will get the same array of arrays of objects in the client code.

The PHP SoapClient class is written in C, so it performs well even though it uses an XML protocol for data exchange.

For dummies 

One of the disadvantages of the protocol is that it is very verbose. But if you use the standard SoapClient and SoapServer classes (Dklab_SoapClient is based on the built-in PHP class SoapClient), you will not notice that disadvantage.

Usage examples

SOAP server code is quite simple

In a typical situation it is quite simple to write a SOAP server in PHP: it is enough to create an object of the SoapServer class and call its handle() method.

Listing 2: SOAP server: file http://example.com/soapserver.php
<?php
// SOAP class to be used for request handling.
class MyServer
{
    public function getComplexData($some)
    {
        return array("obj" => (object)array("prop" => $some), "some" => "thing");
    }
    public function slowMethod($sleep)
    {
        sleep($sleep);
        return "slept for $sleep seconds";
    }
}
// Create and run the server.
$soapServer = new SoapServer(null, array('uri' => 'urn:myschema'));
$soapServer->setObject(new MyServer());
$soapServer->handle();

For dummies 

SOAP can also work in WSDL mode. In that mode you must manually specify method names and parameter types in a special WSDL file. This mode is more complex to use, but it can be advantageous in some cases. See the documentation.

Example: simple query via SOAP client

Listing 3: Simple query to SOAP server
<?php
require_once "../../lib/Dklab/SoapClient.php";
$client = new Dklab_SoapClient(null, array(
    'location' => "http://dklab.ru/lib/Dklab_SoapClient/demo/test/Dklab_SoapClient/soapserver.php",
    'uri' => 'urn:myschema',
    'timeout' => 3,
));
$data = $client->getComplexData(array("abc")); // call MyServer::getComplexData()
$text = $client->slowMethod(1);                // call MyServer::slowMethod()

echo "<pre>";
print_r($data);
print_r($text);

Listing 4: Result: as you can see, the structure of both the parameter and the result is preserved.
Array
(
    [obj] => stdClass Object
        (
            [prop] => Array
                (
                    [0] => abc
                )

        )

    [some] => thing
)
slept for 1 seconds

Example: parallel queries (client->async->method())

Listing 5: Parallel queries, 1 second each
<?php
require_once "../../lib/Dklab/SoapClient.php";
$client = new Dklab_SoapClient(null, array(
    'location' => "http://dklab.ru/lib/Dklab_SoapClient/demo/test/Dklab_SoapClient/soapserver.php",
    'uri' => 'urn:myschema',
    'timeout' => 3,
));
// Send all the requests in parallel (note the "async" property).
$requests = array();
for ($i = 0; $i < 4; $i++) {
    $requests[] = $client->async->slowMethod(1);
}
// Now - print all results in 1 second, not in 4 seconds.
$t0 = microtime(true);
echo "<pre>";
foreach ($requests as $request) {
    echo $request->getResult() . "\n";
}
echo sprintf("Total time: %.2f seconds", microtime(true) - $t0);

Listing 6: Result: total time is about 1 second, not 4
slept for 1 seconds
slept for 1 seconds
slept for 1 seconds
slept for 1 seconds
Total time: 1.09 seconds

Example: reconnect

Listing 7: Three reconnect attempts
<?php
require_once "../../lib/Dklab/SoapClient.php";
$client = new Dklab_SoapClient(null, array(
    'location' => "http://microsoft.com:8876", // non-existent address
    'uri' => 'urn:myschema',
    'response_validator' => 'responseValidator',
    'timeout' => 1,
));
echo "<pre>";
try {
    $client->someMethod();
} catch (Exception $e) {
    echo $e->getMessage() . "\n";
}

/**
 * Must return true if the response is valid, false if it is not and we
 * need to reconnect, or throw an exception if the attempt limit is reached.
 */
function responseValidator($response, $numberOfAttempt)
{
    if ($response['http_code'] != 200 || !strlen($response['body'])) {
        if ($numberOfAttempt < 3) {
            echo date("r") . ": Failed after $numberOfAttempt attempts, retrying...\n";
            return false;
        } else {
            throw new SoapFault("Client", date("r") . ": Exception: failed after $numberOfAttempt attempts!");
        }
    }
    return true;
}

Since we used the address http://microsoft.com:8876/, which does not exist, even 3 connection attempts will not succeed. The listing below shows the output of the script:

Listing 8: Result: we did not connect at the end. But if we did, it would be good.
Wed, 03 Feb 2009 23:49:55 +0300: Failed after 1 attempts, retrying...
Wed, 03 Feb 2009 23:49:56 +0300: Failed after 2 attempts, retrying...
Wed, 03 Feb 2009 23:49:57 +0300: Exception: failed after 3 attempts!
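
The validator contract from the listing above (return true to accept, return false to retry, throw to give up) can be tried without any network at all. The sketch below is a simulation: callWithRetries() and the two fake transports are invented for this illustration and are not the library's internal code. Only the http_code and body keys of the response array are taken from the listing, and RuntimeException stands in for SoapFault so that the example runs without the soap extension.

```php
<?php
// Hypothetical simulation of a retry loop; NOT the library's actual code.
function callWithRetries($transport, $validator)
{
    for ($attempt = 1; ; $attempt++) {
        $response = call_user_func($transport, $attempt);
        // The validator may return true (accept), return false (retry),
        // or throw an exception (give up); the exception propagates.
        if (call_user_func($validator, $response, $attempt)) {
            return $response;
        }
    }
}

// Same contract as responseValidator() in the listing, minus the output.
function quietValidator($response, $numberOfAttempt)
{
    if ($response['http_code'] != 200 || !strlen($response['body'])) {
        if ($numberOfAttempt < 3) {
            return false;
        }
        throw new RuntimeException("failed after $numberOfAttempt attempts");
    }
    return true;
}

// Fake transport: the first two attempts "time out", the third succeeds.
function flakyTransport($attempt)
{
    return $attempt < 3
        ? array('http_code' => 0, 'body' => '')
        : array('http_code' => 200, 'body' => '<ok/>');
}

// Fake transport that never succeeds.
function deadTransport($attempt)
{
    return array('http_code' => 0, 'body' => '');
}

$ok = callWithRetries('flakyTransport', 'quietValidator');
echo "succeeded with body: " . $ok['body'] . "\n";

try {
    callWithRetries('deadTransport', 'quietValidator');
} catch (RuntimeException $e) {
    echo "gave up: " . $e->getMessage() . "\n";
}
```

Keeping the retry policy in a callback means the caller, not the transport, decides what counts as a bad response and how many attempts are worth making.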

Example: data timeout

Listing 9: Call a procedure that runs for 3 seconds
<?php
require_once "../../lib/Dklab/SoapClient.php";
$client = new Dklab_SoapClient(null, array(
    'location' => "http://dklab.ru/lib/Dklab_SoapClient/demo/test/Dklab_SoapClient/soapserver.php",
    'uri' => 'urn:myschema',
    'timeout' => 1,
));
try {
    // 3 seconds is greater than the 1-second timeout, so an exception is thrown.
    $t0 = microtime(true);
    $client->slowMethod(3);
} catch (Exception $e) {
    echo $e->getMessage() . sprintf(" in %.2fs", microtime(true) - $t0) . "\n";
}

Listing 10: Result: timeout after 1 sec.
Response is timed out in 1.03s

Analogs and similar technologies

Apache Thrift

Apache Thrift is used, for example, at Facebook. Pros: versioning of the data structures being passed, support for reconnects, and connection pools that keep track of "dead" servers. Cons:

  1. No support for parallel queries in the PHP client.
  2. A rather "heavy" client written in pure PHP (in particular, the protocol implementation is written in PHP, not in C).
  3. Overall system complexity. For example, you are required to define the data schema explicitly and generate PHP code from it. (In many cases that is an advantage, but in some it is just overhead.)

cURL-multi

cURL-multi is a tool for executing parallel HTTP queries; Dklab_SoapClient uses it as the transport for SoapClient. It cannot pass objects directly, but objects can be serialized and the serialized data exchanged (which is not very convenient, of course).
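
To show what exchanging serialized data involves, here is a minimal sketch using PHP's own serialize() and unserialize(). The HTTP transfer is left out: the string in $payload stands for the response body. The Block class is invented for this example.

```php
<?php
// Sketch: exchanging a PHP value as a string, as one would have to do
// over plain HTTP. The Block class is invented for this example.
class Block
{
    public $title;
    public $items;

    public function __construct($title, array $items)
    {
        $this->title = $title;
        $this->items = $items;
    }
}

// "Server" side: turn the value into a string for the HTTP response body.
$payload = serialize(array('block' => new Block('News', array('a', 'b'))));

// "Client" side: restore it. Only do this with data from a trusted source,
// since unserialize() can instantiate arbitrary classes.
$data = unserialize($payload);

echo get_class($data['block']) . ': ' . $data['block']->title . "\n";
```

Part of the inconvenience is visible here: both sides must load the same class definition, the format is specific to PHP, and error reporting and timeouts have to be handled by hand.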

XML-RPC

XML-RPC is another protocol for remote procedure calls. Sadly, it does not offer as many features as SOAP does. For example, you cannot distinguish a PHP associative array from a PHP object when you pass data. On the other hand, the XML notation in this protocol is simpler than in SOAP.
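
The array-versus-object ambiguity follows from the XML-RPC data model, which has a single struct type for named members. The sketch below uses a tiny hand-written encoder, xmlrpcValue(), made up for this illustration (it is not a real library function and covers only strings, integers, lists, and structs), to show that both PHP values end up as the same XML.

```php
<?php
// Minimal, hypothetical XML-RPC value encoder written for this example;
// it handles only strings, integers, lists, and structs.
function xmlrpcValue($value)
{
    if (is_object($value)) {
        $value = get_object_vars($value); // objects can only become structs
    }
    if (is_array($value)) {
        // A list has keys 0..n-1; anything else is treated as a struct.
        $isList = ($value === array())
            || array_keys($value) === range(0, count($value) - 1);
        $xml = '';
        if ($isList) {
            foreach ($value as $item) {
                $xml .= xmlrpcValue($item);
            }
            return "<value><array><data>$xml</data></array></value>";
        }
        foreach ($value as $name => $item) {
            $xml .= '<member><name>' . htmlspecialchars($name) . '</name>'
                . xmlrpcValue($item) . '</member>';
        }
        return "<value><struct>$xml</struct></value>";
    }
    if (is_int($value)) {
        return "<value><int>$value</int></value>";
    }
    return '<value><string>' . htmlspecialchars((string)$value) . '</string></value>';
}

$asArray  = xmlrpcValue(array('prop' => 'abc'));
$asObject = xmlrpcValue((object)array('prop' => 'abc'));

echo $asArray . "\n";
var_dump($asArray === $asObject);
```

Since the two encodings are identical, the receiving side has no way to tell which PHP type was sent.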

Summary

With the Dklab_SoapClient library you can build a page of your website from blocks, as if from a construction set. Every block is requested separately and independently of the others, and all the queries run in parallel. If one of the blocks does not arrive within the timeout, you can skip it and leave it off the page. (By the way, this is the basic principle of XScript, which is developed at Yandex, the most popular Russian search engine, although XScript uses the CORBA protocol.)
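
The "skip the block that failed" idea can be sketched without any SOAP calls. In the simulation below, plain functions stand in for pending remote requests, and one of them throws the way a timed-out call would. All names (renderPage(), fetchNews(), and so on) are invented for this illustration.

```php
<?php
// Sketch of page assembly. The fetch functions stand in for pending remote
// requests; all names here are invented for the illustration.
function renderPage(array $blocks)
{
    $html = '';
    foreach ($blocks as $name => $fetch) {
        try {
            $html .= "<div id=\"$name\">" . call_user_func($fetch) . "</div>\n";
        } catch (Exception $e) {
            // A failed or timed-out block is skipped; the others still render.
            $html .= "<!-- block $name skipped: " . $e->getMessage() . " -->\n";
        }
    }
    return $html;
}

function fetchNews()    { return 'Latest news'; }
function fetchWeather() { throw new Exception('Response is timed out'); }
function fetchFooter()  { return 'Contacts'; }

$page = renderPage(array(
    'news'    => 'fetchNews',
    'weather' => 'fetchWeather',
    'footer'  => 'fetchFooter',
));
echo $page;
```

With the library, each fetch function would presumably wrap a getResult() call on a request started through the async property, on the assumption that a timed-out asynchronous request raises its exception from getResult().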

Unlike Apache Thrift, the Dklab_SoapClient class does not support working with a pool of SOAP servers. Why? It is simple: I believe that a server pool should be organized not at the client level, but at the level of a load balancer (e.g. HAProxy). In that case it is the balancer, not the library, that takes "dead" machines out of rotation, which is more reliable.

At the moment the library has "beta" status and has not been field-tested enough, though its functionality (including parallel requests and timeouts) is covered by tests (about 20 demanding tests). There is also a hard-to-reproduce problem in the Windows builds of the standard PHP modules cURL-multi and SoapClient that sometimes crashes PHP (the Unix builds have no such problem).

We will be glad to receive any comments and bug reports; the best place to leave them is the forum.





Dmitry Koterov, Dk lab. ©1999-2014