The story of BizTalk and how my messages disappeared - RIP.
BizTalk provides us with a function known as Recoverable Interchange Processing, and if you aren’t familiar with the concept of an ‘Interchange’, it’s really about the granularity of messages. For example, let’s say I have a message that contains distinct messages within it, i.e. a collection of contained messages; this is known as an interchange. A simple singular message is also technically a complete interchange in its own right. In both scenarios each message will carry the same interchange ID, but where an envelope contains multiple messages each message will receive its own message ID. Multi-message interchanges are generally split up by a custom pipeline component, but it is also possible to split them through configuration alone by employing the XmlDisassembler. For more discussion on this please check Richard’s post. I should point out that the rest of this post documents what I believe may be a bug in BTS 2006 and assumes BizTalk knowledge with regard to schema definitions, specifically envelope schemas that contain multiple messages.
Let’s say I have an envelope schema and wish to separate out the good messages contained within the envelope from the bad and have them processed, whilst the bad messages get suspended and eventually dealt with. OK, so let’s use recoverable interchange processing. Here’s what I noticed recently, given the following message:
<?xml version="1.0" encoding="utf-8" ?>
<ns0:People xmlns:ns0="http://RecoverableInterchange.People">
  <ns1:Person xmlns:ns1="http://RecoverableInterchange.Person">
    <firstName>simon</firstName>
    <lastName>segal</lastName>
  </ns1:Person>
  <ns1:Person xmlns:ns1="http://RecoverableInterchange.Person">
    <firstName>mark</firstName>
    <lastName>harris</lastName>
  </ns1:Person>
  <ns1:Person xmlns:ns1="http://RecoverableInterchange.Person">
    <firstName>steve</firstName>
    <lastName>cassidy</lastName>
  </ns1:Person>
</ns0:People>
which conforms to the following message schemas:
<?xml version="1.0" encoding="utf-16"?>
<xs:schema xmlns:ns0="http://RecoverableInterchange.Person" xmlns:b="http://schemas.microsoft.com/BizTalk/2003" xmlns="http://RecoverableInterchange.People" targetNamespace="http://RecoverableInterchange.People" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:import schemaLocation=".\person.xsd" namespace="http://RecoverableInterchange.Person" />
  <xs:annotation>
    <xs:appinfo>
      <b:schemaInfo is_envelope="yes" xmlns:b="http://schemas.microsoft.com/BizTalk/2003" />
      <b:references>
        <b:reference targetNamespace="http://RecoverableInterchange.Person" />
      </b:references>
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="People">
    <xs:annotation>
      <xs:appinfo>
        <b:recordInfo body_xpath="/*[local-name()='People' and namespace-uri()='http://RecoverableInterchange.People']" />
      </xs:appinfo>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="1" maxOccurs="unbounded" ref="ns0:Person" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Note the underlined section above: this allows BizTalk to identify this message type to the receive port pipeline’s XmlDisassembler.
The envelope schema imports the following schema:
<?xml version="1.0" encoding="utf-16"?>
<xs:schema xmlns:b="http://schemas.microsoft.com/BizTalk/2003" xmlns="http://RecoverableInterchange.Person" targetNamespace="http://RecoverableInterchange.Person" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Person" type="Person" />
  <xs:complexType name="Person">
    <xs:sequence>
      <xs:element name="firstName" type="xs:string" />
      <xs:element name="lastName" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
Now, given these schemas and the example message instance, we have a perfectly good envelope schema with a multi-message instance that should present no issues, given I have the following subscriptions:
The RipSendPortOk port has the following filter:
BTS.ReceivePortName == RipReceivePort
Therefore, given the following message instance, we would expect two valid messages to be processed and one invalid message (the PersonBad node) to produce an error. The problem here is that the PersonBad element does not conform to the Person schema imported into our envelope schema.
<?xml version="1.0" encoding="utf-8" ?>
<ns0:People xmlns:ns0="http://RecoverableInterchange.People">
  <ns1:Person xmlns:ns1="http://RecoverableInterchange.Person">
    <firstName>simon</firstName>
    <lastName>segal</lastName>
  </ns1:Person>
  <ns1:PersonBad xmlns:ns1="http://RecoverableInterchange.Person">
    <firstName>mark</firstName>
    <lastName>harris</lastName>
  </ns1:PersonBad>
  <ns1:Person xmlns:ns1="http://RecoverableInterchange.Person">
    <firstName>steve</firstName>
    <lastName>cassidy</lastName>
  </ns1:Person>
</ns0:People>
Ok, so no surprises here, recoverable interchange is working exactly as expected.
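To make the expected outcome concrete, here is a minimal C# sketch of the idea behind recoverable interchange: each child of the envelope is judged on its own, so one bad child does not take the good ones down with it. This is not how the XmlDisassembler is implemented; the class and method names are made up for illustration, and checking only the element name is a simplifying assumption standing in for real schema validation.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical illustration only; BizTalk does this work inside the XmlDisassembler.
public static class EnvelopeSplitSketch
{
    static readonly XNamespace PersonNs = "http://RecoverableInterchange.Person";

    // Sorts each child of the envelope root into "good" (named Person in the
    // expected namespace) or "bad" (anything else). Name matching stands in
    // for full schema validation, which is an assumption of this sketch.
    public static void Split(string envelopeXml, List<XElement> good, List<XElement> bad)
    {
        XElement root = XDocument.Parse(envelopeXml).Root;
        foreach (XElement child in root.Elements())
        {
            if (child.Name == PersonNs + "Person")
                good.Add(child);
            else
                bad.Add(child);
        }
    }

    public static void Main()
    {
        string envelope =
            "<ns0:People xmlns:ns0='http://RecoverableInterchange.People'>" +
            "<ns1:Person xmlns:ns1='http://RecoverableInterchange.Person'>" +
            "<firstName>simon</firstName><lastName>segal</lastName></ns1:Person>" +
            "<ns1:PersonBad xmlns:ns1='http://RecoverableInterchange.Person'>" +
            "<firstName>mark</firstName><lastName>harris</lastName></ns1:PersonBad>" +
            "<ns1:Person xmlns:ns1='http://RecoverableInterchange.Person'>" +
            "<firstName>steve</firstName><lastName>cassidy</lastName></ns1:Person>" +
            "</ns0:People>";

        var good = new List<XElement>();
        var bad = new List<XElement>();
        Split(envelope, good, bad);

        // Two Person elements are kept for processing; PersonBad is set aside.
        Console.WriteLine("processed: {0}, suspended: {1}", good.Count, bad.Count);
    }
}
```

Note that the sketch can only split an envelope that parses in the first place; if the envelope itself is not well-formed XML there are no children to separate.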
Next I try the following message, which exhibits an altogether different problem: this time it is malformed XML and therefore cannot conform to any schema. Note the closing People tag, <//ns0:People>, which contains two forward slashes and not one.
<?xml version="1.0" encoding="utf-8" ?>
<ns0:People xmlns:ns0="http://RecoverableInterchange.People">
  <ns1:Person xmlns:ns1="http://RecoverableInterchange.Person">
    <firstName>simon</firstName>
    <lastName>malformed</lastName>
  </ns1:Person>
  <ns1:Person xmlns:ns1="http://RecoverableInterchange.Person">
    <firstName>mark</firstName>
    <lastName>malformed</lastName>
  </ns1:Person>
  <ns1:Person xmlns:ns1="http://RecoverableInterchange.Person">
    <firstName>steve</firstName>
    <lastName>malformed</lastName>
  </ns1:Person>
<//ns0:People>
This results in a suspended / resumable message, which is the expected behaviour.
Our next and final message is as follows:
<?xml version="1.0" encoding="utf-8" ?>
<?xml version="1.0" encoding="utf-8" ?>
<ns0:People xmlns:ns0="http://RecoverableInterchange.People">
  <ns1:Person xmlns:ns1="http://RecoverableInterchange.Person">
    <firstName>simon</firstName>
    <lastName>malformed</lastName>
  </ns1:Person>
  <ns1:Person xmlns:ns1="http://RecoverableInterchange.Person">
    <firstName>mark</firstName>
    <lastName>malformed</lastName>
  </ns1:Person>
  <ns1:Person xmlns:ns1="http://RecoverableInterchange.Person">
    <firstName>steve</firstName>
    <lastName>malformed</lastName>
  </ns1:Person>
</ns0:People>
What’s wrong with this, you ask? Look closely; yep, it’s got duplicate XML declarations at the very beginning. Now if you’re like me then you’re thinking that this interchange is going to fail as a whole and therefore all three contained messages will be suspended. Guess again! When this message comes into the port, it is indeed picked out as being BAD and does in fact get suspended. The Event Viewer reports the following two errors, which clearly show that the XML disassembler in the pipeline cannot make sense of the message and match it to anything expected.
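Both of the malformed cases above, the doubled slash in the closing tag and the duplicate XML declaration, are rejected by any conforming XML parser before schemas even come into play. The following C# sketch shows that with the standard XmlReader; the class and method names are hypothetical, and the sample strings are shortened stand-ins for the messages in this post.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: reads the whole document and reports whether it parses.
public static class WellFormedCheck
{
    public static bool IsWellFormed(string xml, out string error)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                // Reading to the end forces the parser over every node.
                while (reader.Read()) { }
            }
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            // The parser gives up at the first well-formedness violation.
            error = ex.Message;
            return false;
        }
    }

    public static void Main()
    {
        string[] samples =
        {
            "<?xml version=\"1.0\"?><People><Person/></People>",
            "<?xml version=\"1.0\"?><People><Person/><//People>",
            "<?xml version=\"1.0\"?> <?xml version=\"1.0\"?><People><Person/></People>"
        };

        foreach (string sample in samples)
        {
            string error;
            bool ok = IsWellFormed(sample, out error);
            Console.WriteLine("{0} : {1}", ok ? "well-formed" : "rejected", error ?? sample);
        }
    }
}
```

A check like this in front of the receive location would be one way to spot such messages early, though whether that is appropriate depends on where you want failures to surface.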
Looking at this you might expect that the malformed XML won’t clear the gatekeeper and the entire envelope will be suspended (that would have been my guess). Even though the messages contained within are correct per se, the entire original message is not valid XML, so there are no Recoverable Interchange benefits to be had because the whole message is considered rubbish in this particular case (after all, it’s invalid XML, so fair enough). The next thing I tried was to resume this suspended message, which should duly be expected to fail and become promptly re-suspended, but you guessed it, no chocolates. Looking further into the issue I discovered that my message had actually fallen into a BizTalk black hole altogether and I had no evidence of my message at all except for the following two errors reported in the Event Viewer.
The first error message indicates that the error occurred again but suffered a routing failure, and the second message tells us that an error occurred whilst trying to re-suspend the message. Unfortunately, what none of this tells us is that BizTalk has also gone and deleted the message, just when I wanted to recover failed messages! Upon further investigation it turns out that I had incorrectly set both subscribing send ports with exactly the same filter predicates.
What I should have done was set up one send port subscribing on the receive port’s name and another send port subscribing to errors on the same receive port. What I had actually done, mistakenly, was to set both send ports’ subscription filters identically, and it would seem that this, when combined with enabling routing for failed messages, produces this error. Beware the combination of malformed XML, enabled routing for failed messages and RIP, or your message might not Rest In Peace! Finally, I want to mention that the example envelope and messages in the RIP scenario presented here are based on the example provided by Richard Blewett on his blog.
REST Vs. SOAP - Is the writing on the wall for WSDL & SOAP?
I have the overwhelming feeling that Microsoft will be putting more and more effort into REST, ATOM and a variety of other technologies that do not toe the SOAP / WSDL / Web Service & WS* party lines. This is extremely good for those who felt that all the technology on that stack was becoming overly complex and burdensome and, to boot, was not adopted by the larger internet players, from whom all of us corporate behind-the-firewall types can learn a lot. Over the past 3 or 4 years it has become more and more pronounced to me that there were two competing worlds of technology: one for the internet-scale thinkers and doers and the other for the corporate small-scale network folks. The latter became ever more curious about the black magic being practiced by the likes of Google and Amazon, wanted to come to terms with why it seemed so different, and saw benefit in borrowing those ideas and applying them to the problems in their own environments.
We should take into account that Microsoft provides some highly usable and reliable tools for the corporate world; for example, programming against an object (albeit a proxy) generated off a WSDL document provides the kind of familiar comfort we feel with an old pair of socks.
Enter REST, ATOM and some of their friends, and I can’t help but notice the change in pitch of the new breed of Microsoft technology development teams. A lot of excitement seems to be being generated around REST, and remarks like “keep using SOAP and WSDL if it works for you and you’re comfortable with it” only seem to support the notion that the focus is shifting. Throw in the fact that the recruiting at Redmond also seems to be somewhat focused on a blend of great academic minds and community thinkers and celebrities, and the change in culture at Microsoft is palpable. This is all very good, very good indeed.
WCF Transactions - Treat with care.
The most canonical example usage of WCF Transactions (according to Juval) is the following :
A client application wraps (n) service calls in a single transaction and relies on the ACID safe haven of System.Transactions to save the day. In my opinion, one of the core characteristics of WCF is that its design favours (and Microsoft like to demonstrate its usefulness via) a remote object pattern where developers program against an object’s polymorphic signatures, i.e. its behavioural aspects, via a proxy. This can be seen when we consider code such as proxy.UpdateAccount(); or proxy.RemoveCustomerPurchase(purchaseId); for example, where proxy is a variable name for an object classed from a CustomerAccount class or, more precisely, implementing the ICustomerAccount interface and its operation contracts. Why am I highlighting this design characteristic? It’s my view that this characteristic has permeated its way into the entire design of WCF; its impact is felt in many corners of the platform and affects us, for example, in the choices we are required to make on concurrency and instancing. However, before we get into that, let’s look at the example below as per the canonical example.
Figure 1.0
Listing 1.0
using System;
using System.ServiceModel;
using System.Transactions;

public void Call_A_B_C(Guid purchaseId)
{
    //…..create the binding and the addresses

    // Channel factories are typed on the service contract interface, not the implementing class.
    ChannelFactory<IServiceA> factoryA = new ChannelFactory<IServiceA>(bindingA, addressA);
    IServiceA channelA = factoryA.CreateChannel();
    ChannelFactory<IServiceB> factoryB = new ChannelFactory<IServiceB>(bindingB, addressB);
    IServiceB channelB = factoryB.CreateChannel();
    ChannelFactory<IServiceC> factoryC = new ChannelFactory<IServiceC>(bindingC, addressC);
    IServiceC channelC = factoryC.CreateChannel();

    // All three calls run under the ambient transaction created by the scope.
    using (TransactionScope scope = new TransactionScope())
    {
        channelA.RemoveCustomerPurchase(purchaseId);
        channelB.UpdateInventoryRequest(purchaseId);
        channelC.AmendBillingNotice(purchaseId);

        // Complete() votes to commit; without it the transaction rolls back on dispose.
        scope.Complete();
    }
    //…..etc
}
Let’s also assume that our Services A, B and C have the following transaction flow attributes set up correctly, so our client-initiated (ambient) transaction flows through to the services.
[ServiceContract]
public interface IServiceA
{
    [OperationContract]
    [TransactionFlow(TransactionFlowOption.Mandatory)]
    bool RemoveCustomerPurchase(Guid purchaseId);
}

public class ServiceA : IServiceA
{
    // The operation executes inside the transaction flowed from the client.
    [OperationBehavior(TransactionScopeRequired = true)]
    public bool RemoveCustomerPurchase(Guid purchaseId)
    {
        //…Database access etc…
        return true; // placeholder result so the listing compiles
    }
}
The idea here of course is that calling all three services in a single transaction will make our services behave in a way where the state of our systems (most likely kept in a database repository) will remain consistent in the case of a failure at any point through the transaction - it’s an all or nothing proposition. Combined, this aggregation of requested operations represents a discrete logical business workflow of some kind, where all operations or business actions must succeed or fail as a single ATOMIC transaction.
Again, let’s note that these (three) operations (service method calls) are modelled more like traditional business object behaviours and will likely include some database interaction (we will talk about the File IO in figure 1.0 further on) and other potentially long running code. These operations typically implement a request / response message exchange pattern, and by lining up all three of these requests inside a transaction and locking resources we immediately create friction with our ability to scale and increase our chances of database deadlocks and service timeouts. Also worth noting is that the default IsolationLevel for the TransactionScope is Serializable, which happens to be the most expensive isolation level and will lock whatever our service touches until it has finished doing its work.
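The point about the default isolation level is easy to confirm, and the same API lets a client ask for something cheaper. The sketch below uses only System.Transactions; the class and method names are hypothetical, and the choice of ReadCommitted with a 30 second timeout is an assumption for illustration, not a recommendation for every workload.

```csharp
using System;
using System.Transactions;

// Hypothetical helper showing how a client can inspect and override the isolation level.
public static class IsolationLevelSketch
{
    // Returns the isolation level a plain TransactionScope gives the ambient transaction.
    public static IsolationLevel DefaultLevel()
    {
        using (var scope = new TransactionScope())
        {
            IsolationLevel level = Transaction.Current.IsolationLevel;
            scope.Complete();
            return level;
        }
    }

    // Starts a new transaction with an explicit isolation level and timeout.
    public static IsolationLevel ExplicitLevel(IsolationLevel requested, TimeSpan timeout)
    {
        var options = new TransactionOptions
        {
            IsolationLevel = requested,
            Timeout = timeout
        };

        using (var scope = new TransactionScope(TransactionScopeOption.RequiresNew, options))
        {
            IsolationLevel level = Transaction.Current.IsolationLevel;
            scope.Complete();
            return level;
        }
    }

    public static void Main()
    {
        Console.WriteLine("default:  {0}", DefaultLevel());
        Console.WriteLine("explicit: {0}",
            ExplicitLevel(IsolationLevel.ReadCommitted, TimeSpan.FromSeconds(30)));
    }
}
```

A lower isolation level and a short timeout reduce how long locks are held, but they do not remove the underlying problem of holding a transaction open across several request / response calls.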
As a consultant I have seen my share of Web / WCF Services running out there in the wild, and it’s pretty common to see them spread over a variety of IIS machines and in some Windows Services running under the SCM. Furthermore, some of these services are large (have hundreds of operations), have lots of database access code, and all hit the same tables in the same DBs in any variety of order. In our example above, it’s not uncommon to see Service A, B and C all accessing the same database and tables, and it’s also common for a variety of disparate applications to use all three services independently and not in a universally consistent manner. Let’s say we have client X which uses Service A and C, client Y which uses Service A, B and C, and client Z which uses Service A only. Imagine that these client applications use any number of permutations in aggregating service method calls within a surrounding transaction; while these long running operations are having distributed locks applied to both the services and database queries, other client requests are waiting and become more likely to time out.
This approach does not scale. Let’s imagine we are using RepeatableRead for an isolation level (not the default Serializable) and we have multiple clients making distributed transaction calls across services. All the locking going on in the database is going to slow things up and our services will be waiting for locks to release, which in turn will make our clients wait for our services to return responses, and gradually our system will grind and time out, suffering more and more as further users come online.
There is an underlying complexity in WCF’s declarative architecture: many configuration options can be strung together to state an intent about instancing, concurrency, atomicity, transport characteristics, security, and the list goes on. Sometimes it’s possible to create a collection of declarative statements that are individually acceptable but, when applied together, are redundant or countermanding. It’s possible in the scenario we are discussing that service operations under load will be queuing up calls and locking out other clients whilst this potentially weighty transaction is occurring across different web servers and databases, and should one of them start to cave in under this load then our system is going to spiral down and every part of it will probably become non-responsive. In the meantime, who knows how many important messages are being lost on the wire without any store and forward strategy.
These are indeed some of the reasons that distributed transactions are a bad idea and that it may be worth considering transactional asynchronous messaging in these scenarios. Store your message and then get it across the wire under a transaction (so it’s not lost), letting each endpoint worry about storing its own messages and processing them on a different thread from the one managing the service method. You could of course use the MSMQ binding, but it’s a limiting design choice, and if you require HTTP as the transport it’s not viable.
Also, if you have to deal with state at the point of handling the message, then separating the threads, or perhaps even the processes, that receive messages from those that handle them makes for a more scalable and performant system, lending itself well to dealing with long-lived service state. Perhaps your service design includes dealing with various messages over time, where the messages need to be correlated to one another as in a Saga.
Client passes message and flows transaction to >> Service A >> Service A passes message and flows transaction to >> Service B >> Service B passes message and flows transaction to >> Service C. Now if anything goes wrong in the hops between client and services here then we expect everything that happened to be rolled back, right? Wrong! With ASMX or the WCF out-of-the-box bindings that don’t include MSMQ, there is no durable storage of our messages, so if the client below in Figure 2.0 sends an order, something goes wrong somewhere and we roll back, we just lost our message (hope it wasn’t an order, right?). Furthermore, in Figure 2.0 we could add more clients which use any one of the services depicted exclusively. For example, Client B comes along and calls Service C only; meanwhile the depicted client (A) is making a distributed transaction across aggregated calls to Service A, B and C, where it is currently waiting for Service A to complete its work. Service A (on behalf of client A) is accessing the same database table as Service C (on behalf of client B); there is a deadlock in the database, a victim is chosen and someone rolls back. However, whilst they were blocking and the transactions were open on our Per Call, single-entrant WCF Services, all the other clients requesting work were held up waiting and potentially timed out as well.
Figure 2.0
It’s generally a good idea to keep your transactions in WCF open for as short a period of time as possible, and that’s why using WCF as pure message transport (i.e. absolutely no business logic or database interaction) can help you achieve better throughput and scale and be more fault resistant. Transport your messages transactionally from queues (where the message originates) to other queues located at the message destination; this will protect against lost messages. It’s possible to use WCF entirely as the transport protocol enabler of choice and not follow the CRUD RPC styled messaging pattern.
Figure 3.0
Because we read the message off the originating queue and send it to the target queue within the scope of a transaction, our message will simply roll back if there’s a problem. If we pursue the concepts outlined in Figure 3.0 to their conclusion, we might consider MSMQ or SQL Service Broker to provide us with our queues, and Service Broker does offer us some potential Smart Client benefits in being so close to our endpoints in this design scenario. Let’s be clear however: we are proposing using Service Broker for its queues and only its ability to provide queues. We do not propose to build services, contracts et al in Service Broker, thus avoiding any ensuing licensing issues which Service Broker will create if you implement your services using that technology.
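The guarantee being relied on here is simply that a message only leaves the originating queue once it has safely arrived at the target. The C# sketch below illustrates that rule with in-memory collections; everything in it is hypothetical (the class, the method, the use of LinkedList as a stand-in queue), and a real implementation would rely on a transactional queue such as MSMQ or Service Broker rather than a try/catch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a transactional queue-to-queue transfer.
public static class QueueTransferSketch
{
    // Tries to move the message at the head of 'source' via 'deliver'.
    // The message is removed from 'source' only after delivery succeeds,
    // which mimics a rollback: on failure it stays where it was.
    public static bool TransferOne(LinkedList<string> source, Action<string> deliver)
    {
        if (source.Count == 0)
            return false;

        string message = source.First.Value;
        try
        {
            deliver(message);
        }
        catch (Exception)
        {
            return false; // delivery failed, message remains at the head of source
        }

        source.RemoveFirst();
        return true;
    }

    public static void Main()
    {
        var source = new LinkedList<string>(new[] { "order-1", "order-2" });
        var target = new List<string>();

        // First attempt: the destination is unavailable, so nothing is lost.
        bool moved = TransferOne(source, m => { throw new InvalidOperationException("target down"); });
        Console.WriteLine("moved: {0}, still queued: {1}", moved, source.Count);

        // Second attempt: delivery succeeds and the message leaves the source.
        moved = TransferOne(source, m => target.Add(m));
        Console.WriteLine("moved: {0}, still queued: {1}, delivered: {2}", moved, source.Count, target.Count);
    }
}
```

The business logic then runs against the target queue in its own short transaction, which keeps the cross-machine part of the work limited to moving the message.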







