Enthused: A bunch of things I learned at GTAC 2014 (Day 1)

[ Note 0: This became very long.

Note 1: I will try to come back and properly link to the efforts/projects mentioned in this post, but for now, you can "google" most of these and find them successfully.

Note 2: These observations have my usual biases, especially web application testing. Apologies in advance to mobile testing people doing really interesting things. ]

GTAC Schedule

Skip to Day 2

Google (Ankit keynote) uses "push on amber" rather than "push on green" as a continuous delivery philosophy. If I understood correctly, the primary motivation is that this keeps a "push decision", exploratory testing, and other manual steps in the loop, while still being informed by test results.
A "hermetic test" is one that could run on a (Wifi-disabled) airplane, with no network. Such tests should be the goal of teams moving fast.

We're going to present MSL tomorrow ... we use it to achieve this on UI tests at FINRA. When this was mentioned, there was much pumping of fist on our row.
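
To make the "airplane test" idea concrete, here is a minimal Python sketch. It is my own illustration, not Google's tooling and not MSL. The names FakeQuoteService, portfolio_value, and no_network are hypothetical. The code under test talks to an in-memory fake, and the test blocks socket creation so any accidental network call fails loudly.

```python
import socket
from contextlib import contextmanager


class FakeQuoteService:
    """In-memory stand-in for a remote service (hypothetical)."""

    def __init__(self, prices):
        self._prices = dict(prices)

    def price(self, symbol):
        return self._prices[symbol]


def portfolio_value(holdings, service):
    """Code under test: depends only on the service object it is handed."""
    return sum(qty * service.price(sym) for sym, qty in holdings.items())


@contextmanager
def no_network():
    """Crude guard: make socket.socket raise while the block runs."""
    real_socket = socket.socket

    def blocked(*args, **kwargs):
        raise RuntimeError("network access attempted in a hermetic test")

    socket.socket = blocked
    try:
        yield
    finally:
        socket.socket = real_socket  # always restore the real constructor


def test_portfolio_value_is_hermetic():
    service = FakeQuoteService({"AAA": 10.0, "BBB": 2.5})
    with no_network():
        assert portfolio_value({"AAA": 3, "BBB": 4}, service) == 40.0
```

The guard only catches code that looks up socket.socket at call time, so treat it as a tripwire rather than a real sandbox.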

Google (Ankit keynote) advocates a testing philosophy antithetical to the "ice cream cone" anti-pattern (see here). You can see the same idea in our MSL preso here. Again, much pumping of fist when this was mentioned. May have even been an "Amen".
Google has a tool called Sheriff for identifying and AUTOMATICALLY REMOVING flaky tests.
James Graham and others at Mozilla are working on tests for browser interoperability (via the W3C Open Web Platforms standard, a work in progress). To be somewhat neutral (I guess), they've developed their own client-side test framework (looks like *Unit, in JS). They also developed a Runner, and an "expectation file format" which tells a web server what to serve up for a particular test.
Karin Lundberg's talk (although focused on testing Mobile Chrome) brought up some interesting general testing advice:

Developers tend to prefer using the same language for test dev and feature dev.
When introducing new test frameworks, interesting examples and plans for utilities are important.
Chrome has a performance testing tool called Telemetry.
Tests (always) need to be able to run in a tool-agnostic but tool-compatible way.

Google did a broad analysis of the code coverage metrics on its projects and came up with a standard of 85%. They also have enhanced their code review tool with unit test coverage numbers.

I did not handle this talk well as I do not care for isolated discussions of code coverage. It is necessary (obviously you can't find a bug that you don't cover); but it is not sufficient. Yet somehow, it was the only test quality metric that got much attention today. How can we possibly have a uniform standard for code coverage without critical standards for other metrics right beside it?
[ To be clear, the presenter today wasn't suggesting anything about the "sufficiency" of coverage. The extension of the code review tool is interesting, and code coverage is important. ]

Cat.js is a tool for converting annotations in mobile application code into test cases that can run on real devices.
Vishal from Dropbox explained how they are abandoning Jenkins CI in favor of a largely homegrown CI solution called Changes. [I see no changes ... wake up in the mornin' and I ask myself ... ] The motive is better support for sharding of tests and running things in parallel.

For resource management and allocation, they use Mesos on top of LXC.
[ Speaking of LXC, no one mentioned Docker today ... ]
One thing that wasn't mentioned was whether app instances were being duplicated or just test driver machines (e.g. Jenkins' "slave nodes"). If each test does its own setup of data, does it not need its own app container, 1:1?
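
For anyone wondering what "sharding of tests" might look like in practice, here is a small Python sketch. This is an assumption on my part, not how Changes works: it splits tests across shards using historical durations, always handing the longest remaining test to the least-loaded shard. The function name shard_tests and the test names are hypothetical.

```python
import heapq


def shard_tests(durations, num_shards):
    """Greedy split: longest remaining test goes to the least-loaded shard."""
    # Heap entries are (total_seconds_assigned, shard_index).
    heap = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]
    # Sort by duration descending, then name, so the result is deterministic.
    ordered = sorted(durations.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, seconds in ordered:
        load, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (load + seconds, idx))
    return shards


durations = {
    "test_login": 30,
    "test_checkout": 25,
    "test_search": 20,
    "test_profile": 5,
}
shards = shard_tests(durations, 2)
```

Each shard could then be run on its own worker. Whether each worker also needs its own app instance is exactly the open question above.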

Gareth Bowles from Netflix talked about all of the testing that they do IN PRODUCTION.

The coolest thing to me was the fact that they have coverage enabled in production. They use Cobertura, which does add an overhead of around 5% on their production systems. This is probably because Cobertura doesn't work via dynamic instrumentation but static (you have to generate an instrumented version of the compiled classes and add this to the classpath for data to be available). He did mention that they have extensions to their base AMIs, Tomcat installs, etc. to make this relatively easy to enable.
One interesting problem at Netflix's "Internet scale": There is no possibility for a "QC environment with prod-like volume", because prod-like volume is HUGE (a significant chunk of all US Internet traffic, for example).
They have various "monkeys" which can bring down instances (Chaos Monkey), and simulate the failure of availability zones and entire regions of service in AWS. And they run these tools in production (giving warning for the latter two, though the instance termination is on by default in production).

Jay Srinivasan's talk on an infrastructure for real device testing described a real device infrastructure and accompanying test runner:

It can (obviously) target tests to a given set of devices
It can simulate network latency and locality issues
It also comes with a "robot" for systematically exploring the application space through UI events [ this reminded me of the Android GUI Ripper work Atif did with the group from Naples ].

Also from Jay's talk, a study of 1-star mobile app reviews found that around 40% of underlying issues related to crashes and functional bugs (not the harder-to-test performance issues, or even "usability" or something else fancy).
Celal Ziftci presented a machine learning tool which can infer invariants from analysis of production logs and enforce these in real time (in production, after some optional filtering).

From what I understood, at Google, they use protocol buffers to enable somewhat standardized logging of serialized objects at critical points in the system (and especially between systems).
They can also compare Prod invariants to invariants picked up during test to gain some confidence (or not) about the usage coverage of testing.
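
A toy Python sketch of the invariant idea, to show the shape of it. This is a deliberately simplistic stand-in, not the machine learning tool from the talk. It assumes log records are already parsed into dicts and only infers numeric min/max ranges. The names infer_invariants, violations, and coverage_gaps are hypothetical.

```python
def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def infer_invariants(records):
    """Infer a (min, max) range for each numeric field present in every record."""
    if not records:
        return {}
    common = set.intersection(*(set(r) for r in records))
    invariants = {}
    for field in common:
        values = [r[field] for r in records]
        if all(_is_number(v) for v in values):
            invariants[field] = (min(values), max(values))
    return invariants


def violations(record, invariants):
    """Describe each way a new record breaks the inferred invariants."""
    problems = []
    for field, (lo, hi) in sorted(invariants.items()):
        if field not in record:
            problems.append(f"{field} missing")
        elif not _is_number(record[field]) or not lo <= record[field] <= hi:
            problems.append(f"{field}={record[field]} outside [{lo}, {hi}]")
    return problems


def coverage_gaps(prod_inv, test_inv):
    """Fields whose production range is not contained in the range seen in tests."""
    gaps = []
    for field, (plo, phi) in sorted(prod_inv.items()):
        if field not in test_inv:
            gaps.append(field)
            continue
        tlo, thi = test_inv[field]
        if plo < tlo or phi > thi:
            gaps.append(field)
    return gaps


prod_logs = [{"latency_ms": 12, "items": 1}, {"latency_ms": 40, "items": 3}]
test_logs = [{"latency_ms": 12, "items": 1}, {"latency_ms": 40, "items": 2}]
prod_inv = infer_invariants(prod_logs)
test_inv = infer_invariants(test_logs)
```

Here coverage_gaps(prod_inv, test_inv) flags "items", because production saw a value that the tests never exercised. That is the kind of prod-versus-test comparison described above, in miniature.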

Nan Li is one of Jeff Offutt's former students at GMU who created a language for providing mappings from state machines of applications to "concrete test cases" (which can be executed).

Of course I love model-based testing, but one of the main reasons my academic advisor invented GUI Ripping 10 years ago was that constructing state machines is very complex for any non-trivial application. Perhaps on constrained domains (or ridiculously important ones like space ships) this is worth trying.
Nan will be applying these techniques at Medidata Solutions, doing something with genomics. It may be a great fit, but I don't see it scaling to web apps.
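
To give a feel for the state-machine-to-concrete-test mapping, here is a tiny Python sketch. It is not Nan's language, just my own illustration under heavy assumptions: a hypothetical three-transition login model, one abstract test per transition (shortest path to the source state plus the transition itself), and a hand-written table mapping each abstract event to concrete UI steps.

```python
from collections import deque

# Hypothetical model: (source_state, event) -> destination_state
TRANSITIONS = {
    ("logged_out", "login_ok"): "home",
    ("logged_out", "login_bad"): "logged_out",
    ("home", "logout"): "logged_out",
}

# Hypothetical mapping from abstract events to concrete, executable steps
CONCRETE = {
    "login_ok": ["type user 'alice'", "type password 'secret'", "click Login"],
    "login_bad": ["type user 'alice'", "type password 'wrong'", "click Login"],
    "logout": ["click Logout"],
}


def abstract_tests(transitions, start):
    """Return one event sequence per reachable transition (edge coverage)."""
    paths = {start: []}  # shortest event sequence reaching each state
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for (src, event), dst in transitions.items():
            if src == state and dst not in paths:
                paths[dst] = paths[src] + [event]
                queue.append(dst)
    return [paths[src] + [event] for (src, event) in transitions if src in paths]


def concretize(events, mapping):
    """Expand abstract events into the concrete steps a driver would run."""
    return [step for event in events for step in mapping[event]]


tests = abstract_tests(TRANSITIONS, "logged_out")
steps = [concretize(t, CONCRETE) for t in tests]
```

Even in this toy, someone has to write and maintain both TRANSITIONS and CONCRETE by hand, which is the scaling worry I mentioned above.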

Orange is using Raspberry Pis to help automate testing of real set-top box devices.

These are my notes. What are yours? What did I get wrong?


October 29, 2014
