Data is a cost issue

One of the major challenges that software developers and testers face every day stems from the inability to get real data. A lot of the time, as a developer, you’re interacting with downstream services and have to use whatever data happens to be in that environment, because getting the actual data you need for your scenario is time-consuming. You often can’t find the data you need and have to pull it from production, which creates a whole new set of challenges.

Further complicating matters, personal data from production cannot be used, as doing so increases the risk of data theft, loss, or exposure. Take the recent breach at Yahoo, where 500 million email accounts were hacked, or the recent breach of data for some 68 million LinkedIn users. Those breaches occurred at the more secure production level, yet it is not uncommon for production data to be used in the development world, which tends to be far less secure. Operating in this way poses a huge risk to an organization’s brand and reputation. As a result, sensitive data must be scrubbed or masked, a time-consuming process that requires data expertise.

Use service virtualization to overcome data costs

Either way, data is a cost issue because it slows you down. By using service virtualization, you can not only control the behavior and functionality of dependent applications to stabilize your test environment, but also fully control the data sources of those dependencies and provide whatever data you need to do your job that day. At this point, the rules change, because you now control not only the data but also the logic. You can create services that behave the way you want them to, rather than strictly following their normal behavior patterns.

In a previous article, I discussed service virtualization, which rests on the same basic principles, but there the focus was on service logic. This article takes the next step and discusses data control. To get started, let’s look at the data challenges that testers and developers face every day.

A typical data day in a developer’s life

In the early stages of application development, the data required for testing is often simple, because the full functionality of the service has not yet been implemented. As you continue to add functionality, the maturity of your tests and the complexity of your data grow along with it.

As an example, let’s reuse the scenario from my previous article: I’m an airline, and I’m working on a ticketing page feature. I need to verify that the user can get a ticket, and depending on how far in the future the flight is, the user will get one of several responses that change as the departure date approaches. At the beginning of development, I could simply generate a bunch of complex data with flights for the next three months, enough to cover all the tests I need so far. The problem, of course, is that I’ve just lit the fuse on a time bomb. In three months, this beautiful data will be out of date, and I may well have forgotten about it. Suddenly all of my tests start failing at exactly the wrong time, right as the release approaches and I have no time to regenerate the data… Sound familiar?
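
To make the time bomb concrete, here is a minimal sketch in Python. The fixture contents, field names, and the bookable_flights helper are all hypothetical, but they show how a data set generated with absolute future dates quietly expires:

```python
# Hypothetical static fixture generated "for the next three months".
# The absolute dates are the time bomb: once they pass, any test that
# expects a bookable future flight stops finding one.
from datetime import date

FLIGHT_FIXTURE = [
    {"flight": "AA101", "departs": "2017-01-15", "status": "SCHEDULED"},
    {"flight": "AA202", "departs": "2017-02-20", "status": "SCHEDULED"},
]

def bookable_flights(fixture, today=None):
    """Return only the flights that are still in the future."""
    today = today or date.today()
    return [f for f in fixture if date.fromisoformat(f["departs"]) > today]

# Fine while the fixture is fresh; silently becomes an empty list later.
print(bookable_flights(FLIGHT_FIXTURE))
```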

Forge a path to sustainable development

By introducing service virtualization early in the development process, you lay the foundation for solving these data challenges. Data for a virtual service can come from many places, but in the beginning, a simple virtual service starts with fixed data. You create these fixtures, or mocks, to answer what-if scenarios during testing and to keep things very simple. The idea here is, “I just need one service, and it will respond with this particular payload.”
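
As a rough illustration of that first stage, here is a minimal fixed-payload stub built only on the Python standard library. The port, endpoint behavior, and payload fields are assumptions for the sketch, not any particular tool’s API:

```python
# A minimal fixed-data stub: whatever the request, it answers with the
# same canned payload. This is the "I just need one service and it will
# respond with this particular payload" stage.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

CANNED_RESPONSE = {"flight": "AA101", "status": "SCHEDULED", "gate": "B12"}

class FixedStub(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(CANNED_RESPONSE).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), FixedStub).serve_forever()
```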

As virtual services mature, it becomes necessary to separate the data from the service so that if you want to add logic to the simulation, you don’t actually have to open the virtual service to manipulate the data. In fact, sophisticated users create virtual services by letting data sources handle most of the logic. They can then hand over the data source to testers or test data management teams to plug in any data that the service may need in the future. Adding new functionality to a service is as simple as adding a line to a data source. This allows virtualized work to be shared, and a single virtual service can accommodate multiple teams. Virtual services become living organisms that can grow and change as needed.
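
One way to picture that separation is a sketch like the following, where the virtual service is reduced to a lookup and everything it “knows” lives in a data source the testers own. The file name, columns, and lookup key are hypothetical:

```python
# The service itself only looks up the incoming request key; testers add
# new behavior by adding rows to the data source, never by editing code.
import csv

def load_responses(path="flights.csv"):
    # Each row maps a request key (flight number) to a canned response.
    with open(path, newline="") as f:
        return {row["flight"]: row for row in csv.DictReader(f)}

def virtual_service(requested_flight, responses):
    # No branching here: everything the service "knows" lives in the
    # data source handed over to the test data management team.
    return responses.get(requested_flight, {"error": "UNKNOWN_FLIGHT"})

# Example flights.csv:
# flight,status,gate
# AA101,SCHEDULED,B12
# AA202,DELAYED,C03
```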

Where does this data come from?

Once the development team has created the initial simple service, it’s time for the test team to take over. The test team will have more complex data requirements. Where does this data come from? Normally, you get it through record and playback, which is usually the first step when creating a virtual service. You record transactions between your application and a dependent back-end system and use that recording to create your virtual service. This gives you a very usable baseline data source that can be extended whenever needed. In the case of my airline, this would allow us to capture realistic flight numbers and destinations. The data will have all the necessary complexity, including multi-segment and international flights. The recorded data source handles all the complex request/response relationships, and since subsequent changes to the “real” data can simply be re-recorded and merged into the existing virtual service, retrieving new data becomes trivial.
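
Record-and-playback tooling varies by product, but the recording step can be pictured as a simple pass-through proxy that captures each request/response pair for later playback. The back-end URL, port, and in-memory recording below are assumptions for the sketch:

```python
# A simplified recording proxy: forward each request to the real back end,
# return its answer unchanged, and keep the request/response pair so it
# can seed a virtual service later. Error handling and persistence omitted.
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKEND = "http://backend.example.com"   # hypothetical dependent system
RECORDING = []                           # in practice, written to disk

class RecordingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        with urllib.request.urlopen(BACKEND + self.path) as upstream:
            body = upstream.read()
        RECORDING.append({"request": self.path, "response": body.decode()})
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 9090), RecordingProxy).serve_forever()
```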

The data we record is not from production, which protects us from data leakage in lower-level environments. The challenge is that, because this data is not from production, it is not as complete or up to date. This is where the generation and manipulation of data becomes a powerful feature of service virtualization.

Nonexistent data can be supplemented with simple data generation to complete the data set we need. In my airline example, the flight date in the response can always be today’s date, offset by three months. With data generation, this task becomes trivial.
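
A sketch of that date-offset idea, with hypothetical field names: instead of storing an absolute date, the virtual service computes it from today at response time, so the data never expires:

```python
# Generate the flight date relative to "today" (three months ahead by
# default) instead of recording an absolute date that will go stale.
from datetime import date, timedelta

def generated_flight(days_ahead=90):
    return {
        "flight": "AA101",
        "departs": (date.today() + timedelta(days=days_ahead)).isoformat(),
        "status": "SCHEDULED",
    }

print(generated_flight())   # always a date in the future
```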

We can continue to massage and manipulate data by providing dynamic data to manage any “undefined” request/response relationships. These are the types of relationships that would never exist in a static data set. In the airline example, assume that when the request is made to the downstream component, it provides the user’s current location, which will be used as the departure point in the response. Because our test cases are constantly changing, a real service must maintain all current locations in order to provide them in the response. By using virtual services, you don’t need to maintain all locations, you can simply dynamically return the user’s current location as the starting city.
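
In code, that dynamic relationship can be as simple as echoing a request field back into the response. The field names here are hypothetical:

```python
# The departure city is not looked up anywhere; it is simply echoed back
# from the incoming request, so no location data ever needs maintaining.
def virtual_flight_search(request):
    return {
        "departure": request["currentLocation"],   # echoed, not stored
        "destination": "LHR",
        "status": "SCHEDULED",
    }

print(virtual_flight_search({"currentLocation": "SFO"}))
# {'departure': 'SFO', 'destination': 'LHR', 'status': 'SCHEDULED'}
```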

Finally, negative data can be supplied statically or inserted into a data source to facilitate negative or exception testing. In my airline example, this might mean inserting a random canceled or delayed flight to verify that the user is notified before leaving for the airport.
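
A final sketch, again with hypothetical structure, shows what injecting that negative case into an otherwise happy-path data set might look like:

```python
# Flip one randomly chosen flight to CANCELED or DELAYED so the
# notification logic in the application under test gets exercised.
import random

happy_path = [
    {"flight": "AA101", "status": "SCHEDULED"},
    {"flight": "AA202", "status": "SCHEDULED"},
]

def with_negative_case(flights):
    flights = [dict(f) for f in flights]   # do not mutate the original set
    victim = random.choice(flights)
    victim["status"] = random.choice(["CANCELED", "DELAYED"])
    return flights

print(with_negative_case(happy_path))
```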