Experimenting with Sustainable Engagement (Sustainable Growth Day ‘19)


[APPLAUSE]

ROJAHN AHMADI: Hey, guys. So I'm Rojahn, one of the founders of NeuroNation. NeuroNation is a brain health company; I'll come to that in a couple of minutes. [INAUDIBLE] Yes. But before that, I'd like to ask a simple question: what should people do more of, watch TV or take care of their health and education? So who is for TV? Whoa. OK. Some people here work in other industries. [LAUGHTER]

I think most people would agree that education and health are somehow good. However, what are people actually doing? They spend 2.8 hours per day watching TV, and then it's, I don't have time to take care of my education, to take care of my health. And we are a brain health company, so this is our competitor: the TV. Our competitor is not some education app or health app that we try to win a couple of minutes from. No, people really spend a lot of time here, so I think it's OK if we take some of those minutes out of that.
NeuroNation is a brain health company. Our first two programs: one is a cognitive training that improves memory and attention, and the other is a mindfulness program that helps you deal better with stress. We like to think of behavioral science as one way to shift those minutes a little bit over to us. So I'll talk about some theories today, and hopefully this gives you some inspiration to experiment as well.

That leads me to the agenda for today, which is inspired by the shortest definition of science I could find: science is the systematic way of acquiring knowledge through observation and experimentation. This is why I've structured the agenda like this. First, we look into our processes; I saw a lot of discussions here on processes, so I hope there are some interesting things in that. Then we go to the observation, the engagement KPIs. And then the experiment: I'll share one of the experiments we did in this format.
Our process: at NeuroNation, we work based on the high-tempo testing framework, which I got to know from the book "Hacking Growth" by Sean Ellis. It's a four-step procedure that you repeat over and over again. Step one is analyze: you analyze where the problem is. Step two, you generate ideas. Step three, you prioritize those ideas. And step four, you put the experiment out. Let's look into each step.
weekly meeting where we look at the problem in the product. So how do we do this? We map the customer
journey into one funnel. So we have one place
to look at things. And this funnel, of course,
depends on the product. In our case, it would maybe
start with the installs. So getting from here,
the installation rate over to registration,
onboarding completion, initial retention. And then hopefully,
at some point, you have this customer
success KPI, or North Star, as [INAUDIBLE] mentioned. And then the question is
where the problem is, right? So you might have a lot of
discussions around that. Now, we don’t necessarily
go for the spot where we have the biggest
number, because there might be smarter ways than that. Fortunately, Google shares
a lot of benchmarks. So we have benchmark data to
compare, where do we have, actually, the difference
to the benchmark? So maybe a 70% here
might be OK for now. So you compare to benchmark. And then this is the weak spot. This is where we will
Now, who gets this data? The data analyst gets the data. And what tools do we use? We use Firebase connected to Google BigQuery to get at the low-level behavioral data, and then you will also need the Play Console, at least for the install rates.
So let's say we took initial retention as the weak spot to work on; that is then the focus for the sprint. Now it's about ideas: how do we solve this? Who generates the ideas? Not the one product manager. We try to have the team generate ideas. Some things might come from customer research, maybe some problems you found there. But we also like to use theories from behavioral science; there are a lot of learnings there that we can apply, for example the self-determination theory that I'll share later today. So the ideas are generated by the team. Then you have lots of ideas, and the question is which ones to go for.
Prioritize, step three. Here we basically give a score to each idea: the ICE score from the high-tempo testing framework. It's built of three components: impact, confidence, and ease. Impact says, if the assumption behind this idea holds true, how high would the impact on the KPI be? Confidence: how much data do we have to support this? Maybe we did something similar in the past, or we saw something interesting at some Google conference; then we have a higher confidence. And ease: how easy is it to implement? You give 1 to 10 for each of these.

And again, the team votes, because otherwise you get the dominant voice in the room. We all know this: in a meeting there is some dominant person, and then we do whatever that dominant person says. So no, we go for team voting, in a simple table, so that we can average over those votes. Then we have the ICE score for each idea, and we can go for the idea with the maximum ICE score.
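As a rough sketch of that scoring step, here is a small Python example. It assumes each team member scores impact, confidence, and ease from 1 to 10 and that the three components are averaged into a single ICE score; the ideas and votes are invented for illustration. It also shows the impact-only sort that comes up later as a fallback.

```python
from statistics import mean

# Each idea gets (impact, confidence, ease) votes from several team members,
# all on a 1-10 scale. Idea names and numbers are hypothetical.
votes = {
    "shorter_onboarding": [(8, 5, 6), (7, 4, 7), (9, 5, 5)],
    "pace_feedback_prompt": [(7, 6, 8), (6, 7, 8), (8, 6, 9)],
    "weekly_progress_email": [(4, 7, 9), (5, 6, 9), (4, 7, 8)],
}

def ice_scores(votes):
    """Average the team's votes per component and combine them into one score."""
    scores = {}
    for idea, ballots in votes.items():
        impact = mean(b[0] for b in ballots)
        confidence = mean(b[1] for b in ballots)
        ease = mean(b[2] for b in ballots)
        # Simple average of the three components; other teams sum or multiply.
        scores[idea] = {"impact": impact, "ice": (impact + confidence + ease) / 3}
    return scores

scores = ice_scores(votes)
by_ice = sorted(scores, key=lambda i: scores[i]["ice"], reverse=True)
by_impact = sorted(scores, key=lambda i: scores[i]["impact"], reverse=True)
print("Next experiment by ICE:   ", by_ice[0])
print("Next experiment by impact:", by_impact[0])
```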
maximum ICE score idea. And we’re in this testing phase. So for this, of
course, the dev team will now build this feature. Then the data
analyst needs to just make sure we have a
proper split test. The tools are set up. For that, we use
Firebase A/B tester. And then, hopefully the variants
are set up in the proper way. And then, of course, we
need to test each variant. Often times we might have
problems in some variants, So we have to test all of them. And then we release that. So if we did all
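Part of that "test each variant" step can be automated as a daily check that every variant is actually receiving data. The sketch below only illustrates the kind of query we mean: the project, dataset, and experiment user-property names are placeholders rather than the real schema, and it assumes the google-cloud-bigquery client library is installed and authenticated against the Firebase export.

```python
from google.cloud import bigquery

# Placeholder names; the real Firebase export dataset is analytics_<property_id>,
# and Firebase A/B Testing stores the variant in a firebase_exp_<n> user property.
PROJECT = "my-firebase-project"
DATASET = "analytics_123456789"
EXPERIMENT_PROPERTY = "firebase_exp_12"

client = bigquery.Client(project=PROJECT)

# Count yesterday's active users per experiment variant. A variant with zero
# (or drastically fewer) users usually means the setup is broken.
query = f"""
SELECT
  (SELECT up.value.string_value
     FROM UNNEST(user_properties) AS up
     WHERE up.key = '{EXPERIMENT_PROPERTY}') AS variant,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM `{PROJECT}.{DATASET}.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY variant
ORDER BY variant
"""

for row in client.query(query).result():
    print(f"variant={row.variant} users={row.users}")
```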
If we did all these steps, then hopefully by the next time we get to step one, the next weekly meeting or maybe the one after that, we see some success on initial retention, and our focus area has received some attention. This is the process we repeat over and over again.
We had some learnings as we applied this process. First of all, the team has to set a target for the number of experiments started per week. This is important so that we don't end up with, yeah, let's build this, and then one year later we'll know whether it's successful. That doesn't make sense, right? The world is moving too fast for that. So set a number of experiments per week, because most of them will fail. We know this: most experiments will fail, and we can't wait that long for them. Of course, if you have a number, it's easy to fake it by coming up with random experiments; that's not the purpose. So better a smaller number, but meaningful experiments.

And you might apply this again and again and, after a couple of months, still not have any high-impact result. Then, instead of sorting by ICE score, you can sort by the impact factor alone for a while, in case there hasn't been success for quite some time. That is one way to do it.
And then the formats were very important: daily stand-ups to get the experiments the team agreed on out as fast as possible, and the weekly meetings to make sure everybody is looking at the right data and focused on the right weak spot. So that is our method.
Now to the KPIs, to engagement. We saw this funnel; the question is how it relates to the North Star metric. So what is this North Star metric? It depends on the product, of course. For brain training it's a little easier than for some other apps, because there's a lot of research on it. Neuroscience has shown that if people train for ten hours in total over some period, maybe a couple of months, there are measurable improvements in their lives. That is why this is our North Star metric; the ten hours come from neuroscience.

Now the question is, how do we get from this North Star, this long-term customer success KPI, to something we can work with? We cannot start an experiment and wait until users reach ten hours of training; we would need half a year again.
So from there, we do research, customer research, on blockers and predictors. That can be, for example, data scientists identifying some predictive factor, or it can be interviews with customers. The overall question is: why are customers not getting there? Once we know that, we can derive a short-term engagement KPI and then do rapid experimentation on that one.
For brain training, in our case, the North Star is the ten hours of training, and the short-term engagement KPI is 20 exercises completed. An exercise takes about one and a half minutes, and 20 exercises completed in the first week, for the segment of new users, is the thing, because then we know the majority will return the next week. But it's a process: you always try to find better and better predictors. This is what we have right now.
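To show how such a predictor can be sanity-checked, here is a minimal Python sketch. The per-user numbers are invented; in practice they would come from the exercise-completed events in the analytics data.

```python
# Hypothetical data for a cohort of new users: exercises completed in week 1
# and whether the user returned in week 2.
cohort = [
    {"week1_exercises": 25, "returned_week2": True},
    {"week1_exercises": 4,  "returned_week2": False},
    {"week1_exercises": 22, "returned_week2": True},
    {"week1_exercises": 19, "returned_week2": False},
    {"week1_exercises": 31, "returned_week2": True},
]

THRESHOLD = 20  # short-term engagement KPI: 20 exercises in the first week

def return_rate(users):
    """Share of users in the group who came back the following week."""
    return sum(u["returned_week2"] for u in users) / len(users) if users else 0.0

hit = [u for u in cohort if u["week1_exercises"] >= THRESHOLD]
miss = [u for u in cohort if u["week1_exercises"] < THRESHOLD]

print(f"Reached KPI: {len(hit)} users, week-2 return rate {return_rate(hit):.0%}")
print(f"Missed KPI:  {len(miss)} users, week-2 return rate {return_rate(miss):.0%}")
```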
Now for the theory that we applied: self-determination theory. What is this theory? We actually got to know it at Playtime 2018 in Europe. The theory basically says that if you put the user in the driver's seat, so that they can decide things themselves, it's not the machine telling you what to do; it's more, OK, I can decide some things. Then you start to build ownership, and that is the way to connect to users' intrinsic motivation to return and to build a habit. The theory has three pillars to work with, and we took the autonomy pillar as the foundation for our experiment. I'll now share what we did with it.
Our focus, by the way, was early retention, the week-one retention, this part of the customer journey. We also saw in customer research that the training was quite difficult for some people, especially older people. We have a quite broad audience, and for them it was just too difficult. And of course you also have younger users who find it too easy. The system already personalizes the training, but how fast? The pace of the training, how fast it adapts to you, how fast it increases the difficulty, was still not perfect. That is what customer research showed. So the idea was to try this theory here as well: ask the user for feedback and let the user decide on the pace. Let them build ownership there. This is what we did.
How did it look in the first version? Remember, rapid experimentation, so we didn't want to wait too long; this was launched in one week or so. It was a simple screen after an exercise. If you performed particularly badly or particularly well, the system would show you this screen and ask whether the pace was maybe too low for you, or actually quite challenging. Maybe you want to briefly reflect on that and then decide: OK, I want to increase or decrease the pace so it fits better with my way of learning. That is what we did in the first version.
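Here is a minimal Python sketch of that trigger logic as described: show the prompt only after an unusually weak or unusually strong result, and apply whatever the user chooses, with skipping always allowed. The score thresholds and the pace step are assumptions for illustration, not the real parameters.

```python
from dataclasses import dataclass

@dataclass
class TrainingState:
    pace: float = 1.0  # multiplier on how quickly difficulty increases

# Illustrative thresholds: only a clearly weak or clearly strong result
# triggers the feedback screen, so most sessions stay friction-free.
LOW_SCORE, HIGH_SCORE = 0.3, 0.9
PACE_STEP = 0.2

def should_ask_for_feedback(score: float) -> bool:
    """score is the normalized result of the last exercise, in [0, 1]."""
    return score <= LOW_SCORE or score >= HIGH_SCORE

def apply_user_choice(state: TrainingState, choice: str) -> TrainingState:
    """choice is 'increase', 'decrease', or 'skip'; the user is never forced."""
    if choice == "increase":
        state.pace += PACE_STEP
    elif choice == "decrease":
        state.pace = max(0.2, state.pace - PACE_STEP)
    return state

state = TrainingState()
if should_ask_for_feedback(score=0.95):
    state = apply_user_choice(state, "increase")
print(state.pace)  # 1.2 after the user asked for a faster pace
```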
And the early results looked like this. For older users, 30-plus, we saw a 42% increase in conversion. So it seemed we had found something that connected to the findings from customer research, where they found the training too difficult or somehow not suited to them. They found some value in it, and we got more people to unlock the full training plan, because our free part is of course very limited: only five exercises. So there was some signal, although engagement didn't really improve yet. But the younger users had really bad results: minus 9% on engagement and minus 24% on the conversion KPI. If it had only been the positive result, we could have rolled it out and moved on to the next thing, but this didn't give us an easy answer.
So what did we do next? We did a lot of things; I'll just mention a couple here. One thing we thought was, OK, maybe these younger users and teenagers somehow don't like this kind of smiley, and they're also not very positive. So we went over to something more neutral, more like a scientist, something else to look at. But it didn't actually matter, so that didn't work. And many other things didn't work either.

Then we came to this insight. We watched videos of customers using the app, and we saw that friction seemed to be the problem. They would come to this screen. It doesn't come after every exercise; you have to perform quite badly or quite well for that. But it came often enough to annoy them. They didn't want the screen, and then we got more [INAUDIBLE] there. So the insight was: autonomy comes at a price. And we wanted to reduce this friction.
Before, we had the single screen that would ask the question whether you'd like to increase or decrease the pace. What we did now was merge this component into an existing screen that users already had: the post-exercise screen. After each exercise, you see your result and so on there, and we just added one more box (you can scroll there) that asks about this autonomy thing. They could say increase pace or decrease pace, but they wouldn't be forced to answer. They could just say, [INAUDIBLE] continue my session, I don't care. Still, the majority answered the question.
That was one thing we did. The other thing was: maybe not everybody wants to be in the driver's seat, maybe they're not sure about that. So we asked in onboarding whether they would be willing to answer questions at all during training, or whether they want the system to decide everything. Now, it mattered a lot what wording you put there. Whatever sounds exciting, they will click. We had to come up with three or four variants of this until, at some point, we got something meaningful. It ended up quite abstract: regular follow-up questions about adjusting training complexity, and then you could say activate or deactivate. If it had been automatic versus manual, they would have gone for automatic. So this really matters. Here we got 90% who said, OK, activate, and 10% who did not want any questions. That is what we did there.
And then finally, I think it was the sixth iteration of the experiment (again, weekly or biweekly experiments), we fixed the problem. For the 30-plus users, the uplift in engagement went from plus 1% to plus 7% compared to the control group, and the conversion was again strongly positive. I don't have the final number here, because we didn't wait for significance on this one; it was the same component, just merged into the other screen. But most interestingly, the young users, who previously had minus 9% engagement, went to plus 10%. So friction really was the problem, not the visuals or anything else. And their conversion was stable, the same as control, so that's fine. Then we could finally roll it out. Yes.
So the learnings I can give you today are these three. First, the high-tempo testing framework: really think in terms of fast experimentation, not months. In our team, if somebody came up with something that needs six to eight months to develop, we would probably say no. It's really fast experimentation: let's first learn whether the basic assumption shows any signal, and if so, dive into it; then we can develop the six-month feature. So that is the high-tempo testing framework, and really aligning the team on it. It's not, OK, you do this, I do this, and then nobody is responsible for the bottom line. No, we have the one funnel, and we work fast on it.

Second, the engagement KPI should connect to some North Star metric. I think this is very hard, but there can be a lot of value in it, because as app developers we're not there to just take people's time; we have to deliver value. Everybody knows this. But working on finding the thing that has some impact in their lives is, I think, important, because that is what we can then work on.

And third, just as an inspiration, look at self-determination theory and maybe try out one of those things: give the user a little more choice, but without creating too much friction. Yes. So now we have questions.

[APPLAUSE]

SPEAKER: Thank you.

ROJAHN AHMADI: Yeah. You were the first one, I think.

AUDIENCE: Thank you. Can you describe– it sounded like you were doing these tests at a pretty quick pace, right?
ROJAHN AHMADI: Mm-hmm.

AUDIENCE: Weekly or almost biweekly?

ROJAHN AHMADI: Yeah.

AUDIENCE: Can you describe the nature? Because it seems like it would be pretty difficult to have such– especially, like, you seem to have detailed results out of every week.

ROJAHN AHMADI: Mm-hmm.

AUDIENCE: Are we talking, like– what kind of numbers were you looking at? What type of users, behavioral?

ROJAHN AHMADI: So where do we get data fast enough, right? I guess that's the kind of question? Or–

AUDIENCE: Sure. Well, how were you measuring that [INAUDIBLE] retention? And was it behavioral tests, user [? usability– ?]

ROJAHN AHMADI: OK. How do we measure that?

AUDIENCE: Uh-huh.
ROJAHN AHMADI: Well, we were collecting data with Firebase and connecting it to Google BigQuery, and then we would look at behavioral data. In our case, the engagement KPI was completing an exercise. We actually have it both in Firebase and in our back end, but we usually took BigQuery. The app tracked an event that an exercise was completed, and that is how we measure it. Yeah.
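For context, pulling that measurement out of the Firebase export to BigQuery looks roughly like the query below. The project, dataset, and event name are assumptions based on the standard Firebase Analytics export, not the actual NeuroNation schema.

```python
from google.cloud import bigquery

# Placeholder project; the real export dataset is analytics_<property_id>.
client = bigquery.Client(project="my-firebase-project")

# Daily count of completed exercises per user from the Firebase event export.
query = """
SELECT user_pseudo_id,
       event_date,
       COUNT(*) AS exercises_completed
FROM `my-firebase-project.analytics_123456789.events_*`
WHERE event_name = 'exercise_completed'  -- assumed event name
GROUP BY user_pseudo_id, event_date
"""

for row in client.query(query).result():
    print(row.user_pseudo_id, row.event_date, row.exercises_completed)
```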
And then we usually had to discuss with the devs how to reduce the time needed for a feature so that it can go out fast enough. Right now we're trying to hit a number of four per week, but that includes the whole funnel, so it would include store listing experiments and so on. If in the product we have one release every one to two weeks, that is, I think, the minimum that should be there.
could please say your name and company before you ask
your question, that’d be great. ROJAHN AHMADI: Next one. [? Right ?] [? there. ?] AUDIENCE: Danielle, Google. Sorry, I’m cheating. [LAUGHTER] Were you tempted to personalize
the experience for users by age after that experiment
you showed and just show the one that was really
awesome for the over-30 crowd? Were you tempted to
just bifurcate and– ROJAHN AHMADI: Yeah. You mean like– AUDIENCE: –funnel them? ROJAHN AHMADI:
This moment, right? AUDIENCE: Yeah. [LAUGHTER] Were you tempted to just
have two experiences? ROJAHN AHMADI: Yeah. Of course there was the
first person in the room– yeah, OK. No problem. For the oldies, let’s
just do the autonomy. And the others will
not see anything. And we were like, OK, guys, this
makes the product so complex to manage. And then we will, at some point,
have very fragmented product. And this was not
an easy discussion. But we said, we have to invest
some more time, some more experiments. There is some signal. Let’s really find out if we
can’t fix the problem here before we go to that. This was really a no-go area. Fragmented product, and then
customers would call and say, I don’t have this thing. Why does the other have it? So we did not want to have that. AUDIENCE: Hi. I’m [INAUDIBLE],, with
AUDIENCE: Hi. I'm [INAUDIBLE], with Coffee Meets Bagel. Thank you so much for your presentation. You talked a lot about the velocity of testing and how important that is, because most tests will fail. I'm wondering if you could share a few things that you or your team did that really significantly increased the [? velocity ?] [INAUDIBLE]. For example, I think some of the tools and processes that we can put in place to do readouts faster can really impact [INAUDIBLE].

ROJAHN AHMADI: Mm-hmm. Do you mean development-wise, or in what way?

AUDIENCE: Like, are there tools that you use for readouts, or tools that you use for setting up tests, or processes, even?
ROJAHN AHMADI: Yeah. For the tests, it's really just Firebase A/B Testing and Google BigQuery connected to it; that's what we use to set up the experiments. So the question is how we increase velocity in general, right? Well, it's a lot of discussions, I would say, because the developer always wants to, yeah, add this and make it proper and so on. And it's always like: if you don't know yet whether this is going to be successful, just build the thing, because in maybe 80% of the cases it fails, and then you didn't need the proper, clean system. You can clean up once we know it is successful. We always have this discussion. But I think the hard part really is just this development part, because other than that, you have the tools, the A/B testers. BigQuery also shows you very fast if an experiment fails.
also, if an experiment fails. So we have these daily
standups where we always look at the current
state of the experiment. So for example, there
might be no data coming in for the variants– so some
problem there in the setup. Or we know very fast that
this is not going to win. And then after
two or three days, we might even cancel
the experiments, on to the next one. And then, also, it was
very important for us when we were
learning that we need to prepare all the concepts
before the developer [? somehow ?] comes
with the first question, because this is a
way to get slow. So you say, OK,
here’s my experiment. And then, actually, you didn’t
think about all the details. And then the developer
starts asking. And then, again, you lose
four days until the developer has all the answers. So we tried to
build up a backlog of really ready experiments,
and not like we make this– it’s 90% ready, and
then the 10%, no problem when the developer comes. So we’ll try to avoid that. Really a backlog of two or
three really ready experiments. [APPLAUSE]
