Analytics in Action | SAS Technology Connection from SAS Global Forum 2019

MODERATOR: Welcome to the
SAS Global Forum Technology Connection. Please welcome the Chief
Operating Officer and Chief Technology Officer of
SAS, Oliver Schabenberger. OLIVER SCHABENBERGER:
Good morning. Good morning, and welcome
to the Technology Connection at SAS Global Forum. I’m Oliver Schabenberger,
COO and CTO of SAS and your emcee for the morning. And I want to reveal one
big secret right up front– I do own a pair of jeans. The theme of the conference
is “Analytics In Action.” And during the
next 90 minutes, we want to show you exactly that– how we solve important problems
using data and analytics using SAS technology. I speak to organizations and
customers around the world, and many conversations
have a common thread. My industry is going
through a transformation, digital transformation. Physical assets, books,
cars, computers, stores are turning into bits and bytes. The world is drowning
in data, and we’re not taking advantage of it. We know we have to
do something, and we know we have to get it right. But we can’t find the
talent to implement a data-driven business. We do not know where
and how to get started. And now there is extreme hype
around artificial intelligence and machine learning, the
secret weapon in this fight. But my organization is not
wielding that sword yet. Are we falling further behind? We do not want to
add to the hype. We do not want to
add to the confusion. Over the next 90
minutes, we want to make analytics,
machine learning, and artificial intelligence
real, bring it to life. Yes, AI is overhyped. But it’s also real and powerful. Many technologies
and methodologies are today swept under the
AI umbrella, and that’s OK. Someone quipped that,
quote, “You only call it AI until it becomes useful. Then you find
another name for it.” Our current form of narrow
artificial intelligence is data driven. And that distinguishes
this era of AI from the approaches of the past. We tried to create machine automation through handcrafted knowledge systems– expert systems, where software developers poured our expertise into machine instructions. And that works well for systems that are defined by clear rules, when the task is to capture logic, not to interact with a complex and dynamic world. The incredible
improvements we experienced in computer vision and
natural language understanding in just the last decade are
based on a different approach. We worked for years
on handcrafted models for object detection, facial
recognition, natural language translation, and so on. And despite honing
those algorithms by the best of our
species, that performance does not come close to what
we can accomplish today with data-driven
approaches, approaches that let algorithms discover
patterns from data rather than coding logic. The powerful message
here is not that machines are taking over the world. It is that we are learning
that we can generate tremendous value by unlocking
the information, patterns, and the behaviors that
are captured in data, that we are understanding that
this is a new era of machine automation governed
by algorithms that are derived from
data or that have shaped themselves iteratively. And we are learning how to
use this power at scale, how to apply it across
an enterprise. During the next 90 minutes, you
will see analytics in action. At the center of the
technology connection this morning and
throughout the conference are analytics, data, the
SAS user, and not hype. I’ve been part of many
changes and transformations in the analytics market
and in our software. Our innovation is
customer driven: innovate to meet your needs, and
create tools and solutions that help you innovate. And for this to work, we have
to communicate, work together, talk to each other,
learn from each other, exchange openly what
works and what does not. Your feedback is
all important to us. Here to recognize
one of our SAS users with the Annual
User Feedback Award is Annette Harris,
Senior Vice President for Technical Support at SAS. Annette? [APPLAUSE] ANNETTE HARRIS: Hello. Thank you for being with us
at SAS Global Forum 2019. The theme for this
year’s conference is “Analytics In Action,”
and our winner today is a perfect example
of someone who is visionary in examining ways
that artificial intelligence and machine learning can be
used for demand forecasting and planning capabilities. He provided input that resulted
in the creation of a new demand planning capability, Assisted Demand Planning, that uses machine learning to boost forecast value added. He has shared ways that his
company is using SAS Forecast Server and SAS Demand-Driven
Planning and Optimization. He has also engaged with
our product management at SAS to discuss
functional requirements for the upcoming Demand
Planning Solution on SAS Viya. He also led the deployment of
the SAS solution to 48 countries globally. So on behalf of SAS, I am proud
to present the 2019 SAS User Feedback Award to Dr.
Davis Wu of Nestle. [APPLAUSE] DAVIS WU: Thank you very much. Thank you, Annette. I’m really glad my
contribution adds value to SAS product development. At Nestle, SAS has become an important tool for everyday processes in
demand planning globally. And some of the
users and supporters are with me this morning. In fact, the success is attributable to the great teamwork at Nestle. Today I would like to thank
especially my sponsor Oliver Gleron, who is
also with us today and will be in a panel
discussion tomorrow. Also thanks to the support from
Nestle IT, Francois, Raghav, and from SAS, Jonathan Riches
and many other colleagues. Thank you very much. Thank you. MODERATOR: Congratulations
to the 2019 SAS User Feedback Award winner, Dr. Davis Wu. OLIVER SCHABENBERGER:
Thank you, Annette. Thank you, Davis. Earlier I mentioned
customer driven innovation. We want to innovate
to meet your needs, and we want to empower you to
innovate with our products. Organizations are learning
about the power of analytics, and we are learning about
their needs for applications. Working together, we
can generate value for the organization, for its
constituents, and customers. It is especially rewarding
when that collaboration has a positive effect on lives,
affects them, maybe improves them, maybe even saves them. Analytics is an
opportunity and a necessity in the transformation
of health care. About two years ago, we
partnered with the Amsterdam University Medical Center
to use computer vision and predictive analytics
to improve care for cancer patients. Ladies and gentlemen,
please join me in welcoming Dr. Geert
Kazemier, Professor of Surgery and Director of Surgical
Oncology at the Amsterdam University Medical Center. Good morning, Geert. How are you? GEERT KAZEMIER: Good. Good to be here. OLIVER SCHABENBERGER:
Nice to see you. Geert, thank you so
much for being with us. Thank you for the
partnership and for being here at Global Forum and
sharing with the audience about the important work
you’re doing at Amsterdam UMC. Tell us about the medical
problem we’re trying to solve and the kind of patients
we are trying to help. GEERT KAZEMIER: Oliver, in the
project that we do together, the patients that we’re aiming
to help have what we call colorectal liver metastases. So those patients have
large bowel cancer, and the cancer has
spread to the liver. Colorectal cancer is about
the third most common type of cancer in the Western
world, and those metastases occur in about half
of the patients. So half of the
patients, the tumor does not stay in the large
bowel but travels to the liver. OLIVER SCHABENBERGER:
Well, you have to let that sink
in for a moment. One of the most common
cancers worldwide, and half of the patients
experience liver metastases. What sort of treatments are
prescribed for these patients? GEERT KAZEMIER: We have
several, but the best available treatment for these patients
is surgical removal, resection of the tumor. That’s my daily work. Unfortunately, it may not
be safe to do this resection initially because the
tumor is too large, or you have too many tumors. And those patients
can become resectable if we give them
chemotherapy upfront. So we give them chemo
first and then operate. And those are the
patients that we are focusing on in the project. OLIVER SCHABENBERGER:
So we’d like to focus on patients who
might undergo therapy to shrink tumors in order
to make them candidates for resection. Today, how do your
physicians assess whether a patient might be
responding to chemotherapy and is on the path to
becoming resectable? GEERT KAZEMIER: Now,
our radiologists do it. They use what we call the RECIST criteria. RECIST is actually
an acronym that stands for Response Evaluation
Criteria In Solid Tumors. And to evaluate those
RECIST criteria, a radiologist selects two
lesions in a patient’s image, as shown on the screen. And for each lesion,
the radiologist manually measures its largest
diameter in the slides before and after the chemo. If the sum of the diameters
decreases by at least 30% after treatment,
that’s a good thing. The tumor shrinks. The patient is classified as
responding to the therapy. But if the sum of the diameters
increases by 20% or more, the patient is progressing. The cancer is progressing,
which is a bad thing. If it stays about the same,
the patient is called stable. And this classification
of a patient determines how we proceed
with the treatment.
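To make the rule concrete, here is a minimal Python sketch of the sum-of-diameters classification Geert describes. It is deliberately simplified– the full RECIST guideline includes additional rules (new lesions, minimum absolute growth) that are omitted here:

```python
def classify_recist(diams_before_mm, diams_after_mm):
    """Simplified RECIST response classification.

    diams_before_mm / diams_after_mm: largest diameters (mm) of the
    selected target lesions, measured before and after chemotherapy.
    """
    before = sum(diams_before_mm)
    after = sum(diams_after_mm)
    change = (after - before) / before
    if change <= -0.30:   # sum of diameters shrank by at least 30%
        return "responding"
    if change >= 0.20:    # sum of diameters grew by 20% or more
        return "progressing"
    return "stable"

# Example: 40 mm + 25 mm shrinking to 28 mm + 15 mm is a ~34% reduction.
print(classify_recist([40, 25], [28, 15]))  # -> responding
```

OLIVER SCHABENBERGER: OK. So I’m putting on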
my data science head for a minute here to frame that
challenge that you’re facing. The selection of
a treatment path depends on the
classification of a patient as responding, stable,
or progressing. The classification is made
based on a rule-based system. The decision input is measurements made manually on medical images– actually, on a single image slice. So it seems to me
that the radiologists have to make some subjective
decisions in following the RECIST guidelines, such
as which lesions to look at and which image slices. Also, the RECIST criteria do not take into account all the details we could have available from modern scanners, like the 3D geometry
of the lesions. And finally– maybe, maybe– we could develop a better predictive model of patient response than just summing the diameters of two lesions. Am I on the right
track with this? GEERT KAZEMIER: Yeah,
you’re absolutely right. I mean, up until now
that was not possible. It’s just the two
lesions because more is too much work for them. The manual process
at this moment takes even more than 20 minutes
per scan for a radiologist to do. So we believe that those
medical imaging analytics that you guys have
on the SAS platform can provide alternative
criteria that are indeed more objective,
accurate, and automated. OLIVER SCHABENBERGER:
To make decisions based on data that are
objective, optimized, can be applied to all the data
because they scale, and can be carried out quickly and
consistently– that sounds to me like a win-win situation. Well, let’s see how we tackle
this problem with analytics and how much
progress we’ve made. Please meet Fijoy Vadakkumpadan,
Senior Staff Scientist in our Computer Vision Team. [VIDEO PLAYBACK] – I was a very curious
kid growing up, often tinkering with various
electronic and mechanical devices at my parents’ house. When I came across
computers in my late teens, that opened up an
entirely new world of things that I
could build and fix, these new things being
computer programs. And I haven’t stopped
coding ever since. A personal experience
that I had a few years ago changed the way I view my work. In 2015, my wife
and I were pregnant with identical twin girls. Towards the end
of the pregnancy, we were getting a detailed
ultrasound imaging exam almost twice a week. Because of all these exams,
we discovered early on that one of the
girls was not growing as fast as she should have. So we decided to move forward
with a planned C-section instead of waiting for
natural delivery, which would have been unsafe at that point. The C-section went
well, and we now have two healthy and
happy girls at home. If it weren’t for
medical image analytics, the outcome could have
been very different. And I’m very deeply
touched by that experience. The realization that I can– or my work can help make a
similar impact on someone else’s life is very gratifying. There’s no doubt
that medical imaging has revolutionized medicine. But at the same
time, this revolution has brought new
challenges to the clinic. A radiologist
typically has to look at thousands of images per day. And this is where my team at
SAS has stepped in to help. We have extended
the SAS platform to process medical images. The SAS platform now
provides an environment where users can
build applications that convert medical
image data to insights that can drive decision making. My hope is that
this work can help improve the lives
of radiologists and associated health
care professionals. It may even help
save a life one day. [END PLAYBACK] OLIVER SCHABENBERGER:
Good morning, Fijoy. FIJOY VADAKKUMPADAN:
Good morning, Oliver. OLIVER SCHABENBERGER: Fijoy,
the team has been working on extending the SAS platform
for medical image processing, using it to develop applications
that can help oncological teams like Geert’s. What type of data have you
received from Amsterdam UMC? FIJOY VADAKKUMPADAN: Oliver,
we have received 3D CT images from Geert’s team,
and also RECIST data for a number of patients. The images are stored in DICOM
format, which as you know is the most popular
format used in the clinic. Geert’s team has also
provided contours of liver and lesions drawn
by expert radiologists on each of these scans. OLIVER SCHABENBERGER:
Well, can we take a look? Let’s see what it looks like. FIJOY VADAKKUMPADAN: Absolutely. What you see on screen is
a Python Jupyter notebook connected to SAS Viya. The example that I’m
going to show you is that of a female
patient who was 73 years old at the time
of her hospital visit. Maybe some of you
in the audience have a person like that near
and dear to you in your lives. On the screen are her
data from multiple sources loaded, integrated, and
processed all in SAS Viya. This is a 3D visualization
that I can interact with. The image slices that you see on
screen are three perpendicular slices from her CT scan. Along with the slices,
you also see the surface of her liver in transparent blue
and the surfaces of her lesions in orange. OLIVER SCHABENBERGER:
Geert, Fijoy’s application can capture these
highly detailed 3D geometries of the lesions
and the liver from your data. What are your thoughts on this
when you see these images? GEERT KAZEMIER: Yeah,
I’m very excited. I mean, it’s amazing to
see how far you guys came. Those details– patient
specific geometry is exactly the kind of information
ignored by the current RECIST criteria worldwide, actually. I can’t wait– I have
to be honest– to see the new criteria we can come
up with and use those data. FIJOY VADAKKUMPADAN:
Sure, Geert. The first criterion
that we looked at was the total lesion volume
in each of these scans. We can compute quantities like
that using a specialized action in SAS Viya now.
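The demo computes this with a specialized Viya action; conceptually, total lesion volume from a segmentation mask is just the voxel count times the voxel volume, as in this NumPy sketch (the function name and dimensions are illustrative):

```python
import numpy as np

def total_lesion_volume_ml(mask, voxel_dims_mm):
    """Total lesion volume from a 3D binary segmentation mask.

    mask: array that is 1 inside the lesion contours, 0 elsewhere.
    voxel_dims_mm: (dx, dy, dz) voxel spacing from the CT header.
    """
    voxel_volume_mm3 = float(np.prod(voxel_dims_mm))
    return mask.sum() * voxel_volume_mm3 / 1000.0   # mm^3 -> mL

# Toy example: a 20 x 20 x 20-voxel lesion at 0.8 x 0.8 x 2.5 mm spacing.
mask = np.zeros((512, 512, 100), dtype=np.uint8)
mask[100:120, 100:120, 40:60] = 1
print(total_lesion_volume_ml(mask, (0.8, 0.8, 2.5)))  # 12.8 mL
```

Let me run that action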
and show you the results. What you see on the
x-axis are 10 patient IDs. And on the y-axis, we
have total lesion volumes. The blue bar shows the lesion
volume before any therapy. The orange bar shows the total
lesion volume after therapy. Now, therapy was continued
for some patients. And for those
patients, the green bar shows the total lesion volume
after continued therapy. It’s clear from this plot
that this volumetric metric captures the tumor shrinkage
that occurs in most cases during therapy. OLIVER SCHABENBERGER: OK. That sounds great. It looks like we’re
trending in the expected direction with the criteria. I assume that this
total lesion volume is more accurate than just a
RECIST diameter, because we’re working with a 3D volume. Do we have any
quantitative evidence for how this might improve
evaluation of the treatment response? FIJOY VADAKKUMPADAN: We do. What I did was to
take each volume value and from that calculate the
diameter of a sphere that has the same volume. Let’s call it the 3D diameter
and look at the results for an example patient. On the screen are data from
a 69-year-old male patient. On the left, you see
his RECIST diameter going from 32 millimeter
to 24 millimeter. That is about 25%– that is exactly 25% reduction. Now, that didn’t quite
meet the 30% threshold that RECIST has to be
considered responsive, so he was classified as
stable by the radiologist. Now look at his 3D diameter. It goes from 33 millimeter
to 23 millimeter, which is about 30% reduction. If we use the same threshold
of 30% for this new metric, he can be classified
as responsive.
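The conversion Fijoy describes follows from the sphere volume formula V = (π/6)·d³, so d = (6V/π)^(1/3). A quick check of the example patient’s numbers:

```python
import math

def sphere_equivalent_diameter(volume_mm3):
    """Diameter of the sphere with the same volume: V = (pi/6) * d^3."""
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)

def percent_reduction(before, after):
    return 100.0 * (before - after) / before

# The patient's 3D diameter drops from 33 mm to 23 mm:
print(percent_reduction(33, 23))   # ~30.3%, over the 30% cutoff
# versus his RECIST diameter, 32 mm to 24 mm:
print(percent_reduction(32, 24))   # exactly 25%, under the cutoff
```

GEERT KAZEMIER: And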
that’s very important. That actually can
be life changing, because we know that patients
who we call responsive, they can benefit from surgery. And patients that we
call stable cannot. So this can save lives, since
we know that chemo alone can never cure a patient. We most certainly
need to investigate this new metric further. OLIVER SCHABENBERGER:
So we have a new metric that is potentially more
accurate than RECIST. But it’s still based on manual
delineations of the tumor boundaries, which means
that it doesn’t quite address all the
limitations of RECIST, in terms of the
subjectivity of the work. Do you have anything
that will address those particular limitations? FIJOY VADAKKUMPADAN: That’s
a good point, Oliver. I want to show you preliminary
results of applying the object detection capability of SAS
Viya for response assessment. First I took the
pre-processed data that I showed earlier
and generated bounding boxes of lesions in all slices. Take a look. What you see on the
screen are example slices with rectangles
around tumors. Using these data, I trained a
convolutional neural network based deep learning
model in SAS Viya.
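For readers who want to try something similar, DLPy (SAS’s deep learning Python API on top of SWAT) exposes prebuilt detection architectures. This is a sketch only– argument names and values are illustrative, the table names are hypothetical, and signatures vary by DLPy release:

```python
from dlpy.applications import Tiny_YoloV2  # SAS deep learning Python API

# 'conn' is the swat.CAS connection from earlier; all arguments here
# are illustrative and should be checked against your DLPy release.
model = Tiny_YoloV2(
    conn,
    n_classes=1,            # a single object class: "lesion"
    width=416, height=416,
)

# 'ct_slices_labeled' is a hypothetical CAS table of image slices with
# the radiologist-drawn bounding boxes as training labels.
model.fit(data='ct_slices_labeled', max_epochs=60, lr=0.001)
model.predict(data='ct_slices_test')
```

Let me show you a plot that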
illustrates the training process. OLIVER SCHABENBERGER:
So to make it clear, these are the bounding boxes
determined by radiologists. FIJOY VADAKKUMPADAN: Yes. OLIVER SCHABENBERGER: Now we’re
training a computer vision model on that data. FIJOY VADAKKUMPADAN:
Based on that. The loss function here on the y-axis
is the objective function that is minimized during training. You can see that it
gradually decreases with the number of epochs
on the x-axis, which is the number of passes
through the training data, indicating convergence. This is what you want to
see when you train a model. Now, let’s score this trained
model on a set of test slices. OLIVER SCHABENBERGER:
So now we’re looking at how well the
model you trained performs. FIJOY VADAKKUMPADAN: While
the model is running, it’s called TinyYOLOV2. It has nine convolutional
layers and about 11 million parameters. It looks like the model
has finished running. Let me scroll it
up so you can see. What you see on the
screen are results of automatic lesion
detection performed in SAS Viya on some example slices. OLIVER SCHABENBERGER:
This is very impressive. So we now have an
AI model trained. Geert, what impact would such an automatic metric, derived from the model, have for teams like yours? How can this be deployed in the clinic? GEERT KAZEMIER: Yeah,
first such automation will save those radiologists
a lot of time– as I explained to you earlier,
20 minutes per scan. This is very important, given
that some of our radiologists spend about a third of their daily work on RECIST. OLIVER SCHABENBERGER:
A third every day. GEERT KAZEMIER: And I
can share a secret with you, Oliver. They don’t consider
these measuring tasks the most inspiring part of
their job, as you can imagine. And secondly, it provides a more
objective response assessment metric that will help us to
treat patients consistently. I’m very, very, very
impressed with the results. FIJOY VADAKKUMPADAN: We
have actually a plot that shows the objective metric. What I did was to take these
bounding boxes and then calculated a single lesion-size
metric for each scan based on the side lengths
of the bounding boxes. Let’s call this
the YOLO diameter and look at the results
for all patients.
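The transcript doesn’t spell out the exact formula behind the YOLO diameter, so the aggregation below is purely an assumption– one plausible choice is to sum each detected box’s mean side length per scan:

```python
import numpy as np

def yolo_diameter(boxes_mm):
    """One lesion-size number per scan from detected bounding boxes.

    boxes_mm: list of (width, height) box side lengths in mm. Summing
    each box's mean side length is an illustrative choice, not
    necessarily the formula used in the demo.
    """
    return sum(float(np.mean(box)) for box in boxes_mm)

before = yolo_diameter([(30, 34), (18, 20)])   # pre-therapy boxes
after = yolo_diameter([(22, 24), (12, 14)])    # post-therapy boxes
print(100 * (before - after) / before)         # ~29% shrinkage
```

Again, on the x-axis,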
you see the patient IDs. On the y-axis, now we
have the YOLO diameter. The colors have the
same meaning as before. You can see that this new metric
captures the shrinkage of tumor that occurs during
therapy in most cases, just like the 3D volumetric
that we looked at earlier. What you’ve just seen
is a demonstration of the value proposition of
Viya in medical image analytics, specifically its ability to
support applications that can almost fully automatically
go from raw images to objective metrics that
may be used in the clinic. OLIVER SCHABENBERGER:
That’s wonderful. Now, looking ahead, it seems to me that the new criteria we’re developing and deriving may have applications beyond colorectal cancer and liver metastases. Where do you see
applications outside of this? GEERT KAZEMIER: Yeah,
most definitely. First, the new
criteria we’re deriving may be applicable to
other solid tumors. I mean, this is just a use
case that we came up with– other tumors, like breast
cancer, lung cancer. And secondly, some
of those new criteria by themselves, or in
combination with other data– I could imagine genomic
data, your DNA, or whatever– may help us to predict outcome
of surgery and overall patient survival much better
than we do now. And such predictive analytics
is extremely important to us. We know that not all
patients respond to surgery or chemotherapy equally well. OLIVER SCHABENBERGER:
Yeah, we’ve made some great
progress here to develop more reliable and repeatable
metrics for medical images. It helps with automation,
saving precious time of medical
professionals, and when we talk about artificial
intelligence augmenting us, supporting us, making
us better at what we do, this is exactly what
we have in mind. But we’ve really only scratched
the surface of what’s possible. Geert, I totally agree. Predictive analytics based on
combining better intelligence about medical images with
other sources of data, genetic information,
environmental information, is the next logical
and important step. And personalized medicine,
reliably predicting what will happen to the
patient rather than to an average patient–
that should be our goal. Fijoy, where can
we find out more about medical image
analytics on the SAS platform and the SAS partnership
with Amsterdam UMC? FIJOY VADAKKUMPADAN: We
have two breakout sessions on these topics, one presented
by myself and Dr. Joost Huiskens, and another
by Dr. Xindian Long. Please check them out. OLIVER SCHABENBERGER:
Geert and Fijoy, thank you very much for being
with us today and for the very important work that you do. FIJOY VADAKKUMPADAN:
Thank you, Oliver. OLIVER SCHABENBERGER: Good job. [APPLAUSE] Ladies and gentlemen, you just
experienced the following– medical image
processing in SAS Viya to improve estimates of
tumor lesion size and volume, augmenting a clinician by
applying a machine learning model, and the
power of combining data sources in the service
of predicting health outcomes. In this demo, Fijoy worked with
an artificial neural network to recognize tumor
lesions on those images. And while the algorithm allows
us to process more images, extracting better
information faster, such automation also
raises important questions. Are the algorithms reliable? Can they be trusted? Are they performing as
expected and anticipated by their designers? Are they equally accurate
for men and women? Are factors that
matter accounted for? Are protected classes
indeed protected? Saying that software
works as coded has never been an acceptable answer. All software works as coded. In this era of machine learning
and artificial intelligence, we must rethink our
approach and ask not whether algorithms work as designed, but whether they work as intended. A set of data is a
snapshot of the world. It does not tell us
how the world works. Take all the patient
data in the world, and algorithms can find patterns
and correlate conditions with outcomes. But they cannot learn medicine. The desire and need for
transparent and fair decisions naturally leads us to questions
about interpretability, explainability, and
bias of algorithms. None of this is new, but
it is amplified today because of the speed and
the scale with which we can automate human tasks
and the new domains, as you have just
seen, into which data automation has penetrated. We rightly want to
know how we fare when important decisions
about our lives are arbitrated by technology
that is outside of our control. A poorly placed ad is
much less consequential than a misdiagnosed disease, a college admission denied, or a financial reputation harmed by misrepresenting a disadvantaged group. Interpretability involves a
mathematical understanding of the outputs of a
machine learning model. How does the model react
to changes in the inputs, for example. Explainability goes
further than that. It involves full
verbal explanation of how a model functions,
what parts of it were derived automatically,
what parts were modified in post-processing, how the model meets regulations, and so forth. Here to discuss and demonstrate
model interpretability and bias is Xin Hunt, software developer
in the AI and Machine Learning R&D at SAS. [VIDEO PLAYBACK] – The first time I really
got interested in software was in college. I was in an engineering
program, so we had some programming classes. One of the classes was teaching a compiled language. It was really fun and really
got me interested in software developing. I think what I’m doing,
what I’m building, is going to have a big impact on
the future of machine learning because in order
for the general public to accept certain tools, these kinds of models– for society to accept them– you have to understand them. And also it’s really fun. I like working with
the people here. We have a wonderful,
dedicated, hardworking group of people who are super-smart. And ever since I came
here as an intern, I felt like it’s a great
group to work with. Everybody was so
friendly and so smart. And all our products
are rigorously tested, so we know it’s going to be easy
to use and robust and reliable. One thing about SAS software is
that so much dedication and innovation goes into it. We have whole groups working on cutting-edge machine learning and AI algorithms. It’s also, I think– SAS software is for everyone,
from novice practitioners to data scientists, very
senior data scientists, you can always find a platform
that suits you best. [END PLAYBACK] OLIVER SCHABENBERGER:
Good morning, Xin. Welcome to the stage. Xin, this is your first
Global Forum, right? XIN HUNT: Yes, very
excited to be here. OLIVER SCHABENBERGER:
Way to start out. Xin, many machine
learning models and AI models we are building today
are not easily understandable. We cannot just look at their
parameters and figure out what’s going on and
make sense of it. And it’s these type
of models that we want to focus on right now. Xin, how would
interpretability help the radiologists,
the clinicians, in the lesion
detection application Geert and Fijoy just showed us? XIN HUNT: I’d love to
tell you all about that. But before that, let’s take a
step back and take a quick look at one of the difficulties
detection algorithms tend to have. So if you look at
the demo right here, we’ll see that for each of
the lesions the model detects, it gives you a
probability that the lesion actually exists there. So this means the algorithm has to set a threshold. And in the end, the model
only shows you a bounding box if the probability is
higher than that threshold. This is tricky to set. OLIVER SCHABENBERGER:
So depending on where we set it, we could have more false positives, or we might miss a lesion
that actually exists. How can we mitigate that risk? XIN HUNT: Yes. So let me show you
an example first. Here in the middle, you
see the ground truth labeled by the clinician. On the left and right,
we intentionally set the threshold a
little bit too high and a little bit too low. And you can see
that in both cases, you’re met with either false
positives or false negatives. OLIVER SCHABENBERGER: So
what do we do about this? How do we set those thresholds? XIN HUNT: Exactly. OLIVER SCHABENBERGER:
Or how do we explain how those images are detected? XIN HUNT: Right. For these cases for
medical applications, it’s extremely tricky because
even a small number of mistakes is dangerous. So what we really need is a
clinician’s final decision and judgment. So luckily for a good
model, most of the mistakes are made right
near the threshold. I call those the marginal cases. As you can see on the
right, the marginal cases tend to have low contrast
and irregular shapes. Those are best recognized
by a trained professional. So what we want to
do is have the model take a look at the
images first, label those it’s confident
about in green as lesions directly, and
pass on those marginal cases to the clinician so they
can make the final decision.
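The triage logic Xin describes can be written in a few lines. The two cutoffs below are illustrative; in practice they would be tuned against the clinical tolerance for false positives and false negatives:

```python
def triage_detections(detections, confident=0.80, marginal=0.40):
    """Split detections into auto-labeled vs. clinician-review sets.

    detections: list of (bounding_box, probability) pairs produced
    by the detection model.
    """
    auto_labeled = [d for d in detections if d[1] >= confident]
    needs_review = [d for d in detections
                    if marginal <= d[1] < confident]
    return auto_labeled, needs_review

boxes = [((12, 40, 30, 28), 0.93), ((200, 180, 22, 25), 0.55)]
green, orange = triage_detections(boxes)
print(len(green), "auto-labeled;", len(orange), "sent for review")
```

OLIVER SCHABENBERGER: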
It’s almost like giving the clinician
a virtual assistant. The model explains,
or tries to explain, what it sees in the image. XIN HUNT: Exactly. It’s like an
assistant– actually, let’s fire up our
assistant here. In this assistant
here, we combine the capability of
model interpretability and our natural
language generation to generate a short
report for the clinician. OLIVER SCHABENBERGER: So
you’re running the Shapley method in SAS Viya. XIN HUNT: Yes. The Shapley method
we’re running here, we actually call it HyperShap. It’s a patent pending algorithm
we developed here at SAS. We patented this very scalable,
accurate model agnostic explainer based on Shapley
values, which gives you an idea how each variable–
or in this case how each pixel– contributes
to the final decision made by the model.
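HyperShap itself is SAS’s scalable, proprietary implementation, but the quantity it computes is the classical Shapley value: a feature’s average marginal contribution over all subsets of the other features. This brute-force sketch (exponential in the number of features, so only viable for tiny examples) just shows the definition:

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, baseline, instance):
    """Exact Shapley values for a handful of features (brute force)."""
    names = list(instance)
    n = len(names)

    def value(subset):
        # Score with features in 'subset' at the instance's values
        # and all others held at the baseline.
        x = dict(baseline)
        x.update({f: instance[f] for f in subset})
        return predict(x)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            for s in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value(set(s) | {f}) - value(set(s)))
        phi[f] = total
    return phi

model = lambda x: 2 * x['a'] + x['b']   # toy model
print(shapley_values(model, {'a': 0, 'b': 0}, {'a': 1, 'b': 3}))
# -> {'a': 2.0, 'b': 3.0}; values sum to f(instance) - f(baseline)
```

OLIVER SCHABENBERGER: And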
without those performance improvements, without
that scalability, we would not be able actually
to automate that virtual assistant that you’re showing us now. XIN HUNT: Right. OLIVER SCHABENBERGER:
So the results are back. What do we see here? XIN HUNT: Let’s take a look. The report says, hey, I
found two lesions here with high probabilities. So I labeled them directly
in green on the left. There’s one more area on
the top of the image labeled in orange, because I’m
not super-sure about it. The red pixels in the
explanations in that area show why the model thought
there could be a lesion. OLIVER SCHABENBERGER:
And the text, where does that come from? XIN HUNT: That is from the
natural language generation tool I was talking about. It can be changed to fit
any type of application we’re running. OLIVER SCHABENBERGER:
So we want to reduce the workload of the
clinicians by doing an initial pass with the model. But why does the clinician need
to know what the model thinks? XIN HUNT: There
are a few reasons. First of all, you see
that the marginal cases we are passing on
to the clinicians tend to have low contrast, and
it’s hard for really anyone to see. So if we can highlight here,
with the red pixels, and show where the
area really is, the clinicians can
make a decision faster and more reliably. It also– OLIVER SCHABENBERGER:
Yeah, go ahead. XIN HUNT: It also works
as a feedback loop, where the explanations– if the
model makes a mistake, the clinician can send
the explanations back to the person who
built the model, and it can potentially be used
to figure out what went wrong and to further
improve that model. OLIVER SCHABENBERGER: That’s
a very important point. When we talk about augmentation,
it’s not just the machine augmenting us. It’s also us
augmenting the machine. It’s really
augmenting both ways. XIN HUNT: Yes. OLIVER SCHABENBERGER:
That’s exciting. That’s wonderful. So we have a model that now
makes itself interpretable. The computer vision
model explains what it sees, both visually
and in natural language. Let’s shift gears a little bit. And I’m going to take on
a different persona now. I’m in charge of college
admissions at a university or in a county or state. And I’m thinking about
using machine learning– machine learning to gauge
maybe a student’s propensity or aptitude for college. And I’ve heard there’s some
really cool machine learning stuff out there in AI. And so I asked the
data science team to come up with a
model, which they did. And they handed it to me. They said, it’s a gradient
boosting thingamajiggy. It’s really, really cool. I don’t know what that means. So should I not deploy
this model for real? Should I use this
to score students and use this in
college admissions? Xin, you’re my ethicist. Thank god you’re here. Tell me what I do
with this model. XIN HUNT: Sure. So let’s first load the
model and take a look. So now we have the model. The first thing we
will want to see is what is in there,
what variables are contributing to the
decision process of that model. So what we do is we
run partial dependence to analyze all the potential
variables that possibly would be used in the data set and take
a look at their contribution.
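The demo runs this in SAS Viya; as an open-source illustration of the same idea, scikit-learn’s partial_dependence sweeps one variable over a grid while averaging the rest out, and a flat curve flags a variable the model barely uses. The data below is synthetic, with the fifth variable irrelevant by construction:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import partial_dependence

# Synthetic stand-in for the admissions data: the first four columns
# drive the outcome; the fifth ("hs_rank") is noise by construction.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, :4].sum(axis=1) + rng.normal(size=1000) > 0
model = GradientBoostingClassifier().fit(X, y)

for j, name in enumerate(['sat', 'math', 'gpa', 'extra', 'hs_rank']):
    pd_vals = partial_dependence(model, X, [j])['average']
    print(name, 'effect range:', round(pd_vals.max() - pd_vals.min(), 3))
```

OLIVER SCHABENBERGER: All right. We’ve got a graph back. What does that tell us? XIN HUNT: We see in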
the data set there are five relevant
variables, including SAT score, the highest
math class the student took in school, GPA, extracurricular
activities, and high school ranking. The analysis found that
out of the five variables, four of them have a significant
contribution to the decision making process. And this one variable, the
high school rank variable, does not affect the
model very much. So it’s probably not
being used by the model. OLIVER SCHABENBERGER: Oh, and
I see you used natural language generation to help me
actually understand what that graph says. That’s great. So that makes sense to me. I see the probability
for college admission depends on your SAT score,
goes up with an increasing SAT score. That makes sense. I feel more comfortable
now about this model. But I still don’t quite
know how it works. What would happen if I applied
this model to the students? XIN HUNT: So one thing
we will want to see is if the model is fair
and unbiased, especially towards different
groups of people. So here we have,
say, two counties, and we want to make sure that
the model is behaving fairly to the students from them. So what we run here– OLIVER SCHABENBERGER: So we
have sort of an expectation how the model should behave. And now we’re
comparing the reality against the expectation. XIN HUNT: Yes. Here I ran two things. On the left is the ICE
plot, Individual Conditional Expectation. On the right is the partial
dependence plot by county. So on the left, each
line is an individual, how their probability
of admission would change if you changed their SAT score, holding everything else constant. On the right-hand side
is the group average. So what we see here, there is
actually a small discrepancy between the two groups.
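Continuing the synthetic sketch from above: an ICE plot draws one curve per individual– the prediction as a function of, say, SAT score with everything else held at that individual’s values– and the partial dependence curve is just their average. Drawing them separately per group exposes exactly the kind of discrepancy Xin points out (the county mask here is hypothetical):

```python
from sklearn.inspection import PartialDependenceDisplay

county_b = X[:, 4] > 0   # hypothetical group indicator for the sketch

# kind='both' overlays the per-student ICE curves and their average
# (the partial dependence) for feature 0 ("sat"), per county.
PartialDependenceDisplay.from_estimator(model, X[county_b], [0], kind='both')
PartialDependenceDisplay.from_estimator(model, X[~county_b], [0], kind='both')
```

OLIVER SCHABENBERGER: Well,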
I don’t know– what was that, Individual Conditional– I don’t know what that
means, Individual Conditional Expectation. But I can look at the
plot on the right, and I’m not comfortable. So if students have the same
SAT score– say, 1,000– then if they live in County B or
go to school in County B, they are less likely
get into college compared to a student in County A. XIN HUNT: Yes, that’s what
the explanation for the model is saying to us. OLIVER SCHABENBERGER: I
would not have expected that. We provide the
same resources, we have the same quality
teachers in the counties. What could explain
that difference? XIN HUNT: Well, since our
models are trained on the data, usually we want to find out what
was causing it from the data. So the first step is to
take a look at our data and see what’s different between
those two groups in the data, and that will give us an idea
of why the model predicts different
probabilities for them. Here I plot– on the left
is the mean difference between the two counties,
using County B as a baseline. We have four dots, four
different variables. And we see, out of the four
variables used by the model, three of them are
pretty similar. Their difference
is close to zero. And only one
variable stands out. It’s the highest math level. County A students tend to have a higher math level than County B.
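The group comparison Xin describes is a simple per-variable mean difference. A pandas sketch with made-up records:

```python
import pandas as pd

# Made-up records standing in for the training data.
df = pd.DataFrame({
    'county':       ['A', 'A', 'A', 'B', 'B', 'B'],
    'sat':          [1180, 1240, 1100, 1190, 1230, 1090],
    'highest_math': [4, 5, 4, 2, 4, 2],   # e.g. 4 = pre-calculus
})

# Mean difference per variable, County B as the baseline; values far
# from zero flag where the two groups' data actually differ.
means = df.groupby('county').mean(numeric_only=True)
print(means.loc['A'] - means.loc['B'])
```

OLIVER SCHABENBERGER: Oh, OK. I see what’s driving this. If you take higher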
math classes, then this is a contributing factor to
increasing the probability that you get into college. XIN HUNT: Right. OLIVER SCHABENBERGER: But I
would not have expected that, because I thought that the
math levels we’re offering in the counties are similar. XIN HUNT: Well, there
are two possibilities. One is the two counties
are actually offering different educational programs. In that case, you would
want to change the model to include that county
information so we don’t penalize students
from County B by just being in a different county. On the other hand, if
we assume– or we know– that the two counties are offering similar classes and students are taking them, but we are seeing a difference
in the data, then that means we
could be collecting data that’s not representative
of the student population. OLIVER SCHABENBERGER:
So now we’re starting to talk about the
root cause of a model deviating from our expectation. It could be the model is
wrong, in which case the model needs to be corrected,
or the input data does not represent what
we really had in mind. And then should we
correct the model, or should we correct the data? XIN HUNT: It depends
on the assumption. Here we assume that
the data is bad because we assume the
two counties are actually offering similar classes. Students take them
similarly too. So the distribution difference we are seeing in students taking classes points to a data problem. Then we want to either
recollect the data, or if that’s not feasible
we balance the data. OLIVER SCHABENBERGER:
I don’t have funds to go out and collect
data on all the students in all the counties now. But I see that
this is unexpected. Distribution of the students
in the highest math level should be the same. Can we just focus
on those students and add more samples for that? XIN HUNT: Yes, we can do that. We can resample the
data to increase the percentage of
County B students with high math classes, so that
the distributions of the two counties are similar in the end.
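Continuing the pandas sketch: one simple way to balance the data is to upsample the underrepresented slice, here County B students with high math levels (the cutoff and counts are illustrative):

```python
# Upsample County B students with high math levels so the two
# counties' distributions match more closely.
high_math_b = df[(df.county == 'B') & (df.highest_math >= 4)]
extra = high_math_b.sample(n=2, replace=True, random_state=0)
balanced = pd.concat([df, extra], ignore_index=True)
# ...then retrain the model on 'balanced' and redraw the ICE and
# partial dependence plots to check that the gap has closed.
```

OLIVER SCHABENBERGER: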
And we would have to retrain the model, then? XIN HUNT: Yes, we will
have to retrain the model. And we do that and plot out the
partial dependence and ICE plot again. On the left is the original
plots we saw earlier. And on the right is
after the data balancing, the two counties’ differences
are now very small. And basically they’re not
statistically significant. OLIVER SCHABENBERGER: Xin, thank
you very much for joining us and for demoing this morning. XIN HUNT: Thank you. OLIVER SCHABENBERGER:
It was wonderful. [APPLAUSE] Thank you. Well, should we
correct the model, or should we change the data? We just showed you how
to identify and correct potential bias in a model. I think there’s a very
important message here– that this is not a task that’s
left to the data scientist alone. It requires agreements
on policy, regulations, and a clear definition of
what success looks like, as well as an understanding
of the data we expect, what it should be representative
of, and of the data that we have. This is really a
conversation for all of us. Ladies and gentlemen, this
segment you saw the following– a complex computer vision
model that makes itself interpretable, a patent
pending enhancement to the popular
Shapley method that makes that
interpretability scalable, and how to examine and
correct data in a model for possible bias. Putting analytics into
action invariably requires automation of data flows, data
processing, and decisioning. We are dealing with
increasingly voluminous data, and automation
allows us to scale data prep and data processing. We are dealing with
increasingly varied data, unstructured data from
logs, transcripts, and voice recordings. Automating natural
language processing ensures that these data
are not left behind. And we are dealing with
increasingly complex models. Finding the best model
and its best parameters and hyperparameters is really
facilitated through automation. And maybe most
importantly, we are democratizing analytics, and
allowing and enabling everyone to consume and to
produce analytics. The business analyst,
the field engineer, police officers at
headquarters and on the street should be able to produce and
consume right-time insights. Last night, during
the opening session, we introduced you to New Hanover
County in North Carolina, home of the city of
Wilmington and ground zero for the opioid epidemic. The extent of this
epidemic comes into focus when you think about
this statistic– 12% of the population of New
Hanover County, one in eight, are abusing opioids. This has huge
impact on children. With SAS Visual
Investigator on Viya, the Department of
Social Services can bring together
disparate data sources from law enforcement, case
management, 911 calls, and generate in near real
time rule-based alerts when a child’s risk
level has increased. Now, let’s kick this up a notch. What if– what if we could use
the historical data to develop a machine learning
model to predict a risk score for every child? And that score can
accompany the alert and helps the social
worker to prioritize visits and follow-ups. How then could we automate the
modeling and deployment steps and derive a model that we feel
good about, a model that we trust? Here to put
analytics into action are Susan Haller, Director
of Advanced Analytics R&D, and Dragos Coles, Senior Machine Learning Developer at SAS. [VIDEO PLAYBACK] – I have been at SAS for
20 years, over 20 years. So I’ve spent half
of my life here. And what I find exciting
is that every day I come through the door
I’m happy to be here, and I’m excited about
the new challenges that are presented to me,
working with my colleagues to come up with creative ways
to kind of solve those problems. – I mean, work is
one thing, right? Work is important. It’s important you like to work. But it’s probably
just as important that you like the people
that you work with. – We have created
a new product that allows you to build dynamic
and automated machine learning models. – If you want to do
machine learning, a data scientist would
go through multiple steps to be able to model and build
that final model, right? We’re taking all
that work and we’re hiding it behind one click. – This particular project
has been super exciting to me since day one. If you think about it,
we’re taking analytics and we’re making them
accessible to everyone. – You know, we talk
to a lot of customers who, when you mention
machine learning, they’re interested in it. They’ve heard the terminology. But they’re afraid of it, so
they don’t know how to get started. This is going to be an enabling
technology for those users. It’s rewarding when
you work on something that will be a real application
that somebody can use. So I’m not talking about things
that are just cool because they sound cool, but things
that are cool because they can have an impact. – At the end of the day, I hope
that the work that I’m doing helps our customers do their
job better and more efficiently, so make them more productive,
enable them to answer more complex business
problems, allow them to look in their data
and find information that may help them make a difference. [END PLAYBACK] [APPLAUSE] OLIVER SCHABENBERGER:
Susan and Dragos, thank you for joining us today. Before we start out,
I want to point out that the technology
you’re about to see is not yet in use by
New Hanover County. We are showing
technology that will soon be available from SAS. OK– Susan, your role is now
the senior data scientist, and you’re guiding
Dragos, a business analyst at the Department
of Social Services. Dragos, you are
about to be augmented by artificial intelligence
and machine learning. Good luck. DRAGOS COLES: I’m excited. SUSAN HALLER: Thank you, Oliver. As you’ve just
heard, we have been tasked with building a
machine learning model to generate and assign a
safety risk score to each of the kids who
are being followed by the Department
of Social Services. As you can imagine,
lots of people are interested in the
field of machine learning, but not everybody knows
how to get started in building such a model. With that in mind, our
team of data scientists has built a very simple
and custom web application that the business analyst
in the department, such as my colleague
Dragos, can use to get started building a
dynamic and automated model. So we’re going to spend
just a few minutes with you this morning walking through
building that model using this custom application while
at the same time walking you through each of the steps
that we’re executing underneath the covers. So Dragos, let’s get started. DRAGOS COLES: OK. So what do you want me to do,
just fill in these parameters? SUSAN HALLER: That’s it. DRAGOS COLES: That’s
simple enough. So assign a project name,
select a data source. SUSAN HALLER: So here,
the data science team has gone ahead and identified a
handful of tables that could be useful in this model exercise. Considering what we’ve been
asked to build, let’s go ahead and select the
child safety data. DRAGOS COLES: OK. And what’s our goal here? SUSAN HALLER: Now
that we have our data, we’re presented with a list
of variables in that data. And by goal, we’re
simply asking for you to identify the
variable that represents the goal or the outcome
that we’re trying to predict in this model. DRAGOS COLES: OK. So in this case, we’re going
with their safety risk flag. SUSAN HALLER: That’s right. That’s it. You have now provided all
of the required information that I need for you to go ahead
and start building a model. All that’s left is for Dragos to
click that Build Model button. Behind that button is
a very powerful tool coming from SAS that offers an
API for dynamic automated model building.
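To make the idea of an API behind the button concrete, here is what the click might translate to. The endpoint, payload, and authentication below are entirely hypothetical stand-ins, not the actual SAS API:

```python
import requests

# Hypothetical endpoint and payload standing in for the automated
# model-building API behind the Build Model button.
resp = requests.post(
    'https://viya.example.com/automl/models',     # made-up URL
    headers={'Authorization': 'Bearer <token>'},
    json={
        'projectName': 'child_safety_risk',
        'dataSource': 'CHILD_SAFETY',
        'target': 'safety_risk_flag',
    },
)
job = resp.json()
print(job)   # poll the job until the automated pipeline finishes
```

DRAGOS COLES: OK. So, I mean, this sounds really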
simple, but what is an API? SUSAN HALLER: Ah. With an API– an application programming interface– anyone can build
their own custom application as we’ve seen here based
on their business problem, while at the same time embedding
and leveraging SAS’ machine learning capabilities. DRAGOS COLES: Maybe
that’s too easy. I’ll just run it. SUSAN HALLER: Let’s run it. DRAGOS COLES: OK, so
right now machine learning is running behind the scenes. Does that include any
data preparation steps? SUSAN HALLER: Of course. Imagine, if you will,
that this API is simply emulating what I
as a data scientist would do if I had been
tasked with building this model by hand. So first I’m going
to explore my data. Are there any issues
that I need to resolve? Second, I’m going to iterate
through different data preparation techniques–
transformations, imputation, things such as that. And finally, I’m even going
to automate the building of features for you. DRAGOS COLES: OK. As a data scientist,
though, you have to consider different
type of models when you want to build
the best model, right? What’s available here? SUSAN HALLER: So
the API is obviously going to consider a variety
of different models, finding the best model
type for your data. It’s going to look
at things like gradient boosting models, neural networks, random forests, to name a few. DRAGOS COLES: OK, sounds good. But one thing that I heard
about data scientists working on projects
like this is they go through this
iterative process of data preparation, some
feature engineering, and then more modeling. Is that iterative process
running behind the scenes? SUSAN HALLER: This is
where the intelligent part of the automation
comes into play. So at each step along
the way, the API is going to
continually reassess. It’s going to add
steps to the model. It’s going to remove things
that are no longer necessary. It may go back and
revisit existing steps and make modifications to them. And when the API is happy
with the data preparation and the model that
it’s built, it goes one step further and
creates an ensemble model, trying to improve our
overall model accuracy. DRAGOS COLES: Wow, Susan. I mean, it really
sounds like what we have here is a data scientist
behind the click, right? It’s kind of you
behind a button. SUSAN HALLER: I guess
you can say that. And in just a few
short minutes, you can see here that as we walk
through each of the steps that we’re running
behind that API, Dragos has gone
ahead and created a model that helps us predict
that safety risk score. DRAGOS COLES: OK. Now, we got all this
output from the API. Since I’m new to
this, let me see if I can understand
what’s happening here. If we look at the project
summary on the top left side, we’re getting a summary of the project, and it seems a little bit like this text might be dynamic. It tells us that our model was selected based on the KS statistic
on the Test partition. We have an accuracy
rate of about 90%. SUSAN HALLER: I’m
glad you noticed that. Worth mentioning, included
in this automation process is natural language generation,
where we’re dynamically building this text for you based
on your model and your data. DRAGOS COLES: OK. If I look over to
the right side, I see that our best model
is a gradient boosting model with 10% misclassification. On the bottom left, the
most important variables plot seems to be listing our
predictive attributes, sorted by relative importance. And looking at these attributes,
I can understand some of them, because I know the data. So we have school reports
in the last 60 days. We have the parental
attachment score. I can intuitively understand
where these prefixes are coming from, like impute or transform. This PC1 and PC3, I’m pretty
sure those variables are not in the original data. You know, I really
wish I could see what happened behind the scenes
so I can understand where these things are coming from. SUSAN HALLER: You are in luck. So if you will, go ahead and
select that Open Pipeline link at the top of your application. Now, when Dragos executed the
API to build his dynamic model, he also created a new
project in a SAS product called Visual Data Mining
and Machine Learning. Visual Data Mining
and Machine Learning provides a very nice visual
representation and editable representation of each of
the steps of the model that was created for us. DRAGOS COLES: OK. So you’re saying that the
process is transparent and now this
project is editable? SUSAN HALLER: That’s
exactly right. And remember, dynamic as well– so data specific. Had Dragos selected a
different data source or even a different goal
for that matter, this pipeline could
look vastly different. DRAGOS COLES: OK, let’s
go through this pipeline a little bit. It looks like the orange nodes
are data pre-processing nodes. So we see we have
some transformations, we have Variable Selection,
Imputation, Feature Extraction here. I mean, this is
fairly intuitive, just understanding the process. SUSAN HALLER: And it’s
these exact data preparation steps that resulted
in those variables that Dragos inquired
about just a minute ago in his variable
importance listing. The feature extraction
node, for example, is running a principal
component analysis. And that principal
component analysis is creating some
new features for us that were labeled
PC1, PC2, and PC3, and we found those as
significant in our model.
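As an open-source analogue of the generated pipeline (not the SAS implementation itself): imputation and feature extraction feed a gradient boosting model, and the PCA step is exactly where names like PC1, PC2, and PC3 come from. Editing a node’s property and rerunning, as in the demo, corresponds to resetting a parameter and refitting:

```python
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier

pipe = Pipeline([
    ('impute', SimpleImputer(strategy='median')),
    ('scale', StandardScaler()),
    ('features', PCA(n_components=3)),           # yields PC1, PC2, PC3
    ('model', GradientBoostingClassifier(n_estimators=75)),
])

# The demo's edit -- 75 trees down to 50 -- then rerun the node:
pipe.set_params(model__n_estimators=50)
# pipe.fit(X_train, y_train)   # refit on the project's training data
```

DRAGOS COLES: OK. Looking further down, we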
have our modeling nodes. It looks like the green
ones are the modeling nodes. You mentioned that the
project is editable, right? So if I select a node, now I
get a property panel over there on the right side. I can edit those properties? SUSAN HALLER: That’s
exactly right. So here we’re looking at
the properties associated with the Gradient
Boosting model. But every node in our pipeline
has a similar property listing. Not only do you see the
properties themselves, but you also see
the optimal value for each property that was
selected by the automation process. So I, as a data scientist,
if I wanted to come in here and start changing
things, see if I could make some modifications,
could easily do so. So for example, I
might want to see if I could reduce the complexity
of my gradient boosting model while at the same time
retaining the same accuracy. The optimization process
selected 75 trees for the gradient boosting model. Dragos, why don’t you go
ahead and change it to 50? You see we can easily do
this, he can rerun the node, and update the model. DRAGOS COLES: So
what if I want to add a new node in the project? Can I do that? SUSAN HALLER: Of course. So just like you can edit
the properties to update your model, you can
also insert new steps. And you can do that by dragging
nodes from the tools palette that he has expanded here into
any step within your pipeline. So it’s a very editable
process, also very flexible. If you notice,
there are two nodes listed on the palette that allow
you to inject your own custom code. That custom code can be
SAS-based code, obviously, or it can be open source,
if you want to include R or Python into your model. DRAGOS COLES: OK. I mean, we have a
project that gave us a good model we’re
happy with, right? So how are we going
to take this model and put it in the hands of the consumer so they can start making more informed decisions? SUSAN HALLER:
Excellent question. Obviously we all know
that building the model is only the first
step in the process. It’s just as important that
we’re able to deploy this model and get the model into
the hands of those who want to consume it. So at this point,
Dragos, let’s go ahead and leave the SAS Visual Data
Mining and Machine Learning product and go back into
your custom application. You see a Deploy Model button
embedded in this application. Why don’t you go
ahead and click that? DRAGOS COLES: OK. Is this another one of those
APIs you were talking about? SUSAN HALLER: Of course. Just like we had a button that
allowed us access to an API for dynamic and
automated model building, we have embedded a
similar button here that surfaces another SAS API
for one click model deployment. DRAGOS COLES: I mean, Susan,
I’m really excited about this. In about 10 minutes,
you showed me how to leverage machine
learning behind the scenes with the click of a button. I can open that project that
gets created behind the scenes. I can use it as a learning
tool or as a prototyping tool, and then we deployed a
model also fairly easily. I feel really enabled. Thank you. SUSAN HALLER: I’m happy
you’re excited about the API. More importantly, that
something like this will enable and empower
Dragos and other data analysts in the department to continue
building models such as this in the future. And if you consider
our specific use case, imagine now that when
an agent in the field gets an alert that a child
needs a follow-up visit, that alert is now augmented
with a model-based risk score indicative of their safety. DRAGOS COLES: Wow. Awesome. OLIVER SCHABENBERGER:
(SINGING) Happy birthday. Happy birthday to you. Happy birthday, Dear Susan. Happy birthday to you. SUSAN HALLER: Thank you. OLIVER SCHABENBERGER: Well done. And happy birthday. SUSAN HALLER: Thank you. OLIVER SCHABENBERGER: Dragos. That was amazing. And DCSH County is quite
advanced in its use of machine learning. Of course, it’s a
fictitious county named after Dragos
Coles and Susan Haller, but there’s nothing fictitious
about the application or the demo. Susan and Dragos,
thank you very much. Ladies and gentlemen, you just
experienced the following– automating the
iterative construction of a complex machine
learning model in 10 minutes by simply calling one API;
transparency of the resulting model– you can examine, you can
understand, you can modify; and deploying a final model just
as easily by simply calling one API. Digital transformation
and analytics are not science projects. While pilot projects
and POCs are important to prove
feasibility and ROI, the goal is to impact the
organization positively by increasing revenue,
lowering costs, raising safety, maybe by launching
a new business. And there are many
barriers to success in data-driven initiatives,
chief among them lack of talent, lack of data of
the right quality and quantity, difficulty
operationalizing analytics, taking it from the
science project to operational excellence. Susan and Dragos
showed us how SAS helps overcome these barriers. Automation of the
model building process, automation of the
model selection process through challenging
existing models, automation of the data
preparation and feature engineering steps, abstraction
of steps that previously required deep programming expertise
and deep analytic expertise, choosing your desired
level of automation from an open API to
a visual interface to programming interfaces. We call this
intelligent automation. It is data led,
dynamic, transparent, and you can look under
the hood any time. Automation does not
mean to look away. Automation does not mean
you cannot intervene. It is not the same as autonomy. Analytics is not
a science project, and it is not the domain of
only statisticians and data scientists– not anymore. Everyone can contribute,
everyone can consume, everyone can produce. We’ve just now developed and
deployed predictive analytics. For each case and child, we
can predict a risk score. Why have we not fully operationalized the model yet? How do we put it in
the hands of the users? Please meet our next
contestant, Sebastian Charrot, Senior Manager in our
Scottish R&D team. [VIDEO PLAYBACK] – I recently became a dad, so
I don’t have much spare time. But when I do I like to do
a bit of art and drawing. My dad was a cartoonist for a
number of French newspapers, so as soon as I
could hold a pen, I was trying to imitate him. And there’s something quite
satisfying about the emotion you get when you’re
really deep in drawing. It’s quite similar to the
flow that you get when you’re solving a programming problem. If I think back to when I
first entered the world of work after graduating, I
still remember the sense of deep satisfaction
of knowing that I was working on a
real product, solving real problems for real people. Once you get a taste for
that, it’s hard to give up. So the bigger police
forces currently raise around a million
intelligence reports a year. That’s a million trips
back to the office to raise the information that
they’ve gathered in the field. That’s a lot of wasted time, effort, and manpower. Having Mobile Investigator
means that you’re no longer desk bound to
access the information or the capabilities that
you need to do your job. It means maximizing the time
that you have in the field and allowing you to access
all those rich and powerful capabilities on the go. And it marks the
first time that we’ll be surfacing the operational
and investigative powers of Viya to users in the field. So it’s a big step. So we release a lot
of software at SAS. And it’s easy to fall into
the mindset of thinking about your work in terms of the
releases that you ship or the bugs you fix or the
features that you implement. But in reality, we’re
not in the business of delivering features. We’re in the business of solving
problems for our customers. I’m very fortunate
to be in a position where I think I know the
challenges that our customers face and actually have the
power to do something about it. I work with some of the most
wickedly smart, terrifyingly capable, generous,
and creative people, and it’s a real joy
to be able to build great things with them. [END PLAYBACK] [APPLAUSE] OLIVER SCHABENBERGER:
Seb, welcome to the stage. SEBASTIAN CHARROT:
Thank you very much. OLIVER SCHABENBERGER:
Seb, what are the applications of the model
Dragos has just shown us? In the lab, we use
machine learning to detect and flag children who are
potentially at high risk. Now that we have the data,
what do we do with it? How do we make use of
it in the field, put it in the hands of
those who need it? SEBASTIAN CHARROT: Well,
SAS has a powerful suite of tools which allow our
users to triage alerts, manage their intelligence,
and then coordinate any investigations that
need to follow from those. Until recently, however,
access to those capabilities was limited to users sitting
at their desks in the office or the station, which
is why I’m really proud to announce that we
recently launched SAS Mobile Investigator, a mobile
application which surfaces the operational and
investigative powers of SAS Viya to users in the field. So if we pick up where
Susan and Dragos left off and continue our
scenario, let’s say that I’m a police officer
working in the Child Protection Unit. So it’s my job to liaise
with social workers, visit certain at-risk children,
assess the situations, and then determine any
necessary course of action that we need to take. And how do I know who to visit? Well, using Susan
and Dragos’ model, we can generate
a number of tasks to visit the highest
risk children and assign those tasks to
myself and other officers in the field. So why don’t we just jump
in and see how it plays out? OK, so on my home
screen here, you see at the bottom right-hand corner, I have Mobile Investigator installed. So we’ll launch the app, and
we’ll sign into the system. Now, the first screen
you’ll see here will be the Mobile
Investigator homepage. It’s your one-stop shop for all
functionality in the system. And on the banner, you’ll see I
have a number of notifications. Now, clicking this will take
me to my prioritized task view. So that’s a view of
every task that’s been assigned to me in the system. OLIVER SCHABENBERGER:
So the model that Dragos and Susan
developed is already running? It’s prioritized your tasks
based on the risk score? SEBASTIAN CHARROT: Absolutely. So the highest
risk is at the top. So it looks like Jack is
indeed the highest risk child on my list. So we’ll click into his
records and have a look. So we have an address. We even have it
plotted on the map. So how about we
just go visit Jack? Now, there’s a button underneath
my map here to navigate. If I click that, it’ll
take Jack’s address and then launch my
external map app and show me a route. Now, that’s the first of many
examples of Mobile Investigator tapping into native
capabilities to streamline things for its users. OLIVER SCHABENBERGER: Let’s
pause on this for a second, just so we can appreciate this. The only other way I could
have previously accessed all this information would have been to turn the car around, go back to the station, go to the
desk, do some research, and then head back out again. SEBASTIAN CHARROT: Absolutely. OLIVER SCHABENBERGER: A lot
of wasted time and effort, now eliminated by just
placing that information right into the hands of the police
officer or the child safety person. SEBASTIAN CHARROT: Exactly. Now, let’s say
we’re heading there. We jump into the
car, and my partner is driving using
those directions. And while we’re en route, I
want to do a bit of research to see what else
we know about Jack. So I can take a look here. There’s some basic details,
everything you’d expect. He’s a nine-year-old boy. I can see his
family details, so I know that Pete and
Jane are his parents. And crucially, I can
see the risk factors that have come into play to
determine Jack’s high risk score. So these are all things
that we should maybe be looking at which explain
our level of concern. And I could really
drill into those and apprise myself
of that if I wanted. Now, as well as all
this core information, I can also see
any documents that have been uploaded and
associated with this file. So I can see a couple of prior
social care visit reports, and we even have a
photograph of Jack. OLIVER SCHABENBERGER:
Recognize Jack? That was this COO/CTO
a few years ago. SEBASTIAN CHARROT:
So additionally, I can see that Jack’s
file here forms part of a much larger network
of information in our system. So he’s actually related to
other reports in our data. And a couple of things
jump out immediately. Firstly, I see that
Pete Marsh, his dad, is suspected to be in possession
of an unlicensed firearm. Now, that’s an officer
safety concern. And it’s going to
change my approach to how I carry out my task. OLIVER SCHABENBERGER:
So this information you’re receiving in the field
is now affecting, shaping how you approach the task. SEBASTIAN CHARROT: Absolutely. So I may choose to not
go into the premises. Or I may choose to bring backup. But regardless, I’m
aware and informed. So having this information
in the field ahead of time can save officer lives. Now, secondly, I can
see that Jane, his mom, was arrested only
yesterday on a DUI. Now, that’s timely and
relevant information which I need to have
access to and which is going to shape my overall
evaluation of Jack’s situation. And lastly, I see that
Jack has been involved in a number of school incidents
in the recent past which maybe I want to discuss with him
and his family when I sit down. OLIVER SCHABENBERGER:
So you visit Jack. You conduct an interview
and an assessment. While the information is fresh
in your mind, what do you do? How can you record
your findings? SEBASTIAN CHARROT: Yep. There’s one last
piece of research I want to do before
we do that, and that’s a neighborhood search. So we know how crucial the
quality of a neighborhood is to the welfare of a child. So what I want to do is
click this top button to launch my
neighborhood search. It’ll take my current location. It’ll search my
immediate vicinity for any relevant intelligence
or investigations or incidents that could be of interest to me. So we’ll kick off that search. And actually, when
the results come back, I see there’s a fair amount
of drug-related activity in the neighborhood. So that’s also something
that’s going to factor into my overall assessment.
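A vicinity search like this one reduces to a radius filter over geocoded records. A minimal sketch follows; the record fields, coordinates, and one-kilometer radius are invented purely for illustration.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

def nearby(records, here, radius_km=1.0):
    """Keep only the records within radius_km of the current position."""
    lat, lon = here
    return [r for r in records
            if haversine_km(lat, lon, r["lat"], r["lon"]) <= radius_km]

# Invented sample data: one drug-related incident near the officer.
incidents = [{"type": "drug-related", "lat": 55.950, "lon": -3.190}]
print(nearby(incidents, here=(55.951, -3.188)))
```

So as you see, now it’s time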
to raise a new visit report. So I’ll click a
button to do that, and I can start filling this
in to my heart’s content. Jack was fine. I’m always amazed when that
works with a Scottish accent. OLIVER SCHABENBERGER:
Technology is amazing. SEBASTIAN CHARROT: It’s amazing. Now, I can really
start fleshing this out with all the information
I’ve gathered during the course of my visit. And what you’ll notice is that
Mobile Investigator is also capturing and adding
in its own information to augment my report. So the visit date has
been set automatically. I’ve been set as the
reporter, as well as details of how to contact me– that’s not my real number– as well as the county
that I was in– in fact, the exact location that I was
in when I raised that report.
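The auto-captured context Seb describes amounts to stamping each report with what the session and the device already know. A minimal sketch, assuming the app can read the signed-in user and a GPS fix; the field names are invented, not Mobile Investigator's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class VisitReport:
    """A field report augmented with auto-captured context."""
    narrative: str
    reporter: str          # signed-in user, set automatically
    latitude: float        # from the device's GPS fix
    longitude: float
    visit_date: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

def new_report(narrative, user, gps_fix):
    """The app fills in everything except the narrative itself."""
    lat, lon = gps_fix
    return VisitReport(narrative, reporter=user, latitude=lat, longitude=lon)

report = new_report("Jack was fine.", "s.charrot", (35.78, -78.75))
```

OLIVER SCHABENBERGER: Yep. It’s about automating the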
obvious, time-consuming, and possibly error-prone tasks. Why spend time on that? SEBASTIAN CHARROT: Yeah. And it provides crucial
context for the report that I’m raising. Now, if anything else
comes to my attention, I could always just take a
photo of it and upload that. So I’ll take a photo of
this terrific audience. But that could just as easily
be a picture of the neighborhood or drug paraphernalia
or really anything that I deem to be of relevance. So now all that information
is in the system. OLIVER SCHABENBERGER:
Maybe it’s obvious, but I want to point out
just how powerful this is. The information in your
report is now available using SAS Visual
Investigator to everyone who has access to
SAS VI at the station or through Mobile
Investigator in the field. Systems are updated
in real time, not through an
overnight batch job. And with that new
information available, we can do additional reporting. We could even kick off
that modeling pipeline Dragos and Susan
developed a moment ago. Because the data collected
through our site visit might contain important
information and insights that might change how we do
the risk score calculations. SEBASTIAN CHARROT: Exactly. Now, imagine a world
without Mobile Investigator. I would have had to
go back to the station to raise that report, and
it might be one of a dozen that I have to raise every day. Having this app means that
I can make that information available to everyone
as soon as we know it and not as soon as
traffic or bureaucracy allows. OLIVER SCHABENBERGER: Indeed. For the first time we have
placed the power of SAS Viya into the hands of
operational users, allowing them access to data
and analytics wherever they are. The users can spend
more time in the field, are better informed,
better equipped, and can do their job
more effectively. Seb, great work. Thank you very much. SEBASTIAN CHARROT: Yes. OLIVER SCHABENBERGER:
I don’t want to imagine a world without
Mobile Investigator. [APPLAUSE] Ladies and gentlemen, you just
experienced the following– a highly flexible
application that can be customized for
almost any use case; real-time interaction
between back end and front line, analytics on the go; the
blending of systems of records, systems of engagement, and
systems of intelligence. You heard that term throughout
the morning, the model. We are building a model, testing
a model, deploying a model. Models are at the
heart of analytics, at the heart of data science. But they are no longer
just narrowly defined statistical models,
like finite mixture or proportional hazards models. Today, models are
complex pipelines of data transformations,
data reductions, with internal tournaments
and ensembles of approaches. The input is data,
the output can be a report, a prediction,
a recommendation, a classification, and so on. How many models do you
have in your organization? Susan and Dragos
built one for us. Do you have two,
three, 400, 2,000? When you work with models,
some of the major challenges are knowing whether
they are still valid. How can I track their
version, their vintage? Is this model superior
to one developed in a different language
with different libraries? How do I move the model from
the sandbox into production? How do I deploy the
model in a data stream, in Hadoop, or inside a database, or capture its endpoint with an API? I have models in
SAS, R, and Python. How do I manage them all? Model management
is a key ingredient in making analytics
real, in making analytics stick in operation. With SAS Model Manager, you can control the versioning of models,
compare them, test them, and publish them. You can monitor their
performance over time, challenge them, retrain
them, and update them. You can integrate open source
models in your data science pipeline and govern them
alongside SAS models. Please look for presentations on
Visual Data Mining and Machine Learning and Model Manager in the Quad, in super demos, and in paper sessions.
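To ground the model-management ideas above, here is a minimal sketch of the kind of bookkeeping involved: versioned registration, cross-language comparison, and champion selection. All names and metrics are invented for illustration and are not SAS Model Manager's actual interface.

```python
from dataclasses import dataclass

@dataclass
class ModelVersion:
    """One registered version of a named model."""
    name: str
    version: int
    language: str      # e.g. "SAS", "Python", "R"
    test_auc: float    # performance captured at registration time

class ModelRegistry:
    def __init__(self):
        self._versions = {}

    def register(self, mv):
        self._versions.setdefault(mv.name, []).append(mv)

    def champion(self, name):
        # The best-performing version wins, whatever language built it.
        return max(self._versions[name], key=lambda v: v.test_auc)

registry = ModelRegistry()
registry.register(ModelVersion("risk-score", 1, "SAS", test_auc=0.87))
registry.register(ModelVersion("risk-score", 2, "Python", test_auc=0.89))
print(registry.champion("risk-score").version)   # the challenger wins: 2
```

This is the technology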
journey we took you on today. The theme of the tech
connection this morning was “Analytics in Action.” We use SAS technology
to tackle problems in health care,
child safety, fraud, and security intelligence. Problems that can only
be solved through data and analytic automation
exist in many, many fields. Here to discuss a
domain that is near and dear to all of our hearts,
the health of our planet, is John Gibson, Chairman for
Energy Technology at Tudor, Pickering, Holt, and Company. Welcome, John. [APPLAUSE] JOHN GIBSON: How
you doing, Oliver? OLIVER SCHABENBERGER:
Hello, my friend. JOHN GIBSON: Good to see you. OLIVER SCHABENBERGER:
Come on in. John, thank you for being
here at Global Forum. We’re in Texas, the nerve center
of the oil and gas industry. And I admire your boots. Do you admire my boots? You have deep roots in
the oil and gas industry, and you’re an absolute
expert in that field. Share a little bit with the
audience your background. JOHN GIBSON: Well,
Oliver, believe it or not, my first use of SAS was about
1988 at Chevron Research. And so I’ve been a user since then–
anything now, though. I couldn’t do a demo for you. OLIVER SCHABENBERGER:
We can automate this. We can visualize it. Visual programming is very easy. We’ll get you into the Quad in front of a lectern. JOHN GIBSON: Well, in my
career after Chevron, I have had the opportunity to
run two of the largest software companies in oil and gas. So Landmark Graphics, which
we sold to Halliburton. Then left Halliburton and
did Paradigm Geophysical, which is now Emerson
E&P. And so I was CEO of both of
those organizations and helped build
those platforms, which really do a lot
of computer vision and other work for the
subsurface there. So I had a lot of work in
the software technology area. OLIVER SCHABENBERGER: John,
probably the most important topic to the oil
and gas industry– and I think to the world– is carbon, CO2,
greenhouse gases. All link quite closely
to climate change. How much carbon is being
created on an annual basis? JOHN GIBSON: Well,
on an annual basis, we’re at about 36 gigatons. OLIVER SCHABENBERGER: 36– JOHN GIBSON: Gigatons–
which you and I were talking about. It’s hard to
visualize 36 gigatons. So to try to create
a mental image, if you’re familiar
with Jerry’s World, the AT&T Stadium, if we took
all of the air out of it and extracted the CO2, we’d
get about 2.2 tons of CO2. So we only need 18 billion
or so Jerry’s Worlds in order to extract
the amount of CO2 we’re emitting each year
above the carbon cycle. OLIVER SCHABENBERGER:
Billion, with a B. JOHN GIBSON: B, billion.
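For anyone checking the arithmetic, the figures quoted above can be verified as a rough back-of-the-envelope calculation:

```python
# Back-of-the-envelope check of the figures quoted above.
annual_emissions_tons = 36e9   # 36 gigatons of CO2 per year, as quoted
tons_per_stadium = 2.2         # CO2 extracted from one stadium's air, as quoted

stadiums_needed = annual_emissions_tons / tons_per_stadium
print(f"{stadiums_needed:.1e}")  # ~1.6e10, i.e. on the order of the
                                 # "18 billion or so" quoted on stage
```

OLIVER SCHABENBERGER: We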
always talk about technology and we focus on
its urgencies, what it wants, the progress
we made, that today’s better than the Stone Age
because of technology. And then there are
these side effects, the unintended consequences of
technology, like CO2 emissions, like greenhouse gases. What is going to
happen to the world if we do not address
carbon dioxide? JOHN GIBSON: So, you sort
of put me on the spot as an oil and gas guy. And we’ve been on the spot
for the last few years. If we don’t address carbon
dioxide as a hydrocarbon industry, we can’t sustain
the hydrocarbon industry. We’ll have to go to a
different form of energy. We can’t see CO2 levels
grow from 410 to 450 without having a plan
to begin to address them. We now estimate it could take up
to 300,000 years for the Earth to restore the carbon level
if we just left it alone. And so we’re going to have to
take positive actions in order to actually reduce the
levels as we’re growing them. OLIVER SCHABENBERGER:
You mentioned this amazing– this huge
number, this mind-boggling number of annual output. So if you know how
much it is, why don’t we do something about it? Isn’t this just an attribution
problem, who generates what? JOHN GIBSON: Well, it is. I mean, I kind of
follow politics on this, and it’s getting to
be very political. The Green New Deal– I won’t ask everybody
to shout out if they’re for it or against it. But directionally,
that tells you where our country, where the
sentiment of our government’s going. And so as a result, you’re
going to see regulations come. We’ve got about 60 bills that
are going to be introduced, 30 in the House of
Representatives, 30 in the Senate. In the absence of
a strong EPA, we’re seeing congressional
efforts in that. So even as we’re
speaking now, we’re very close to
launching OCO-3, which is our newest carbon emission
satellite here in the US. And so it got no approval
from the White House, but it got approval
from Congress, and it will be going up shortly. OLIVER SCHABENBERGER: So
how do data and analytics play a role in all this? Who collects the data today? What do organizations
need to know? JOHN GIBSON: Well, the
regulation which is coming is really– most people are using
the Greenhouse Gas Protocol. And so under that protocol, you report in scope one,
scope two, scope three– which is what do
you use directly, what do you use indirectly–
so electricity generated that might come
in under scope two– and then scope three would
include business travel. So if you’re sitting on
a United Airlines flight coming here to the
conference, what portion of the
emissions from that should you be accounting back? Now, as it turns out,
one company’s scope one is another company’s
scope three. And so you can see the
hydrocarbon industry has Uber as scope three. And then Uber has the
hydrocarbon industry, which produces the fuel it burns, in scope one. So just the sheer accounting
and reporting of this is going to require some
significant analytical models going forward.
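The scope accounting John describes maps naturally onto a simple rollup. A minimal sketch, with every company name and figure invented purely for illustration:

```python
# Minimal sketch of Greenhouse Gas Protocol scope accounting.
# Every company and figure below is invented for illustration.
emissions = {
    # (company, scope): tons of CO2-equivalent
    ("RideCo", 1): 120_000,    # fuel burned directly by its fleet
    ("RideCo", 2): 8_000,      # purchased electricity
    ("RideCo", 3): 30_000,     # business travel, suppliers, and so on
    ("OilCo", 1): 2_500_000,
    ("OilCo", 3): 900_000,     # customers (like RideCo) burning its fuel
}

def total(company):
    """Roll up scope one, two, and three for one reporting entity."""
    return sum(tons for (name, _), tons in emissions.items()
               if name == company)

# Note the overlap John points out: RideCo's scope-one fuel burn
# also appears inside OilCo's scope three.
print(total("RideCo"), total("OilCo"))
```

OLIVER SCHABENBERGER: I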
read a fascinating article about the actual carbon
footprint of some of the things we’re doing today. There was a carbon footprint study of streaming platforms, and we thought the carbon
footprint was high when we all used vinyl on turntables. But actually it turns
out, the carbon footprint might be higher for
the streaming music because of all the back end
computing and the energy we have to generate
to support that. JOHN GIBSON: There’s
no question there’s unintended consequences. We tried to remove carbon,
and we increased it. We see that in Europe, where
an intent to be carbon neutral ends up increasing
carbon because we end up having to outsource
power to coal plants. We’ve also seen an
elimination of coal in the US, and we’ve seen coal consumption
grow by 3% globally. So we’ve underestimated
the human element, which is that need
for cheap energy in order to grow the quality
of life in other countries. And so consequently we’re
doing the right thing here, and we’re getting
the wrong outcome. And so it’s a very
complicated problem. OLIVER SCHABENBERGER: Yeah. Carbon accounting
systems– so it’s rolling up all the contributions. You mentioned scope
one, two, three. Where do you see SAS fitting
into this urgent need to address carbon? JOHN GIBSON: Well, there’s
no question that SAS I think could have a tremendous role. And I’m hoping that
the end of this session is the beginning of a new
journey for SAS in climate accounting, because
each company, if you’re one of the
chief data scientists here or chief technology officer,
you should be thinking about, how do you do scope one, how
do you do scope two and scope three, and build a model? And then understand, as
you turn those knobs, do you get the
desired consequence or an unintended consequence? And how does that risk
performance really get coordinated or communicated
to a board of directors? You’re at the board level at
SAS with these carbon models and how that’s going to
create financial risk for organizations. I hope next year I’m here and
we have somebody actually doing a demo for you that’s
really showing how they’ve done their climate model. OLIVER SCHABENBERGER: How
about you come back and drive that demo for us? JOHN GIBSON: Well, I’m not
sure I’m the right guy. OLIVER SCHABENBERGER:
Something we have not mentioned much today is
IOT and connectivity. But I see opportunities for
when everything is connected, when devices are
talking to each other, just as they report how
much electricity they need, maybe they can start reporting
their carbon footprint, without us even having to ask–
you know, their scope one, two, three contributions–
and we could roll it up. JOHN GIBSON: There’s
no question– OLIVER SCHABENBERGER:
Use technology to address that problem
with technology. JOHN GIBSON: It
has to be that way. I mean, it can’t
be a system where– if we take a look,
there’s a quote on a slide from KPMG that 75% of global companies producing the majority of the revenue don’t have any statement
on climate change. In the US, 50% don’t have a
statement on climate change. Very few are doing Greenhouse Gas Protocol reporting. In the absence of data,
we get no progress. And I think that with SAS and
with a data-driven activity associated with climate,
we have a future. Without it, we
have a real problem that’s continuing
to accrete if we put more and more CO2 in the air. OLIVER SCHABENBERGER: Well,
let’s work on the problem to secure the future. John, thank you for sharing
your insights with us. JOHN GIBSON: Thank you so much. I appreciate it so much. Thank you. OLIVER SCHABENBERGER:
And John will be here presenting on
Tuesday about predicting the unpredictable. Technology is unstoppable. It’s who we are and what we do– not just at SAS, as a species. Technologies are all the
inventions of the human mind, not just tools and gadgets. Analytics, the multidisciplinary effort to derive insight from data, is technology. And as such, it exhibits the
same urgency as all technology. It wants to reorganize. It wants to become more
distributed, abundant, and accessible. What we have shown this
morning is how these organizing principles manifest
themselves, enabling insight and decisioning based on
data by those without degrees in data science– jobs made easier,
more productive, decisions made more reliably
and faster, analytics that follows the data. It becomes more distributed. It’s supplied to the right
person at the right place and time. Analytics moves from
science projects into operations, the
hospital, the Department of Social Services, the field
engineer, and the police officer. At SAS, we are on a mission– on a mission to remove barriers
to producing and consuming analytics through
visual interfaces at parity with
programming interfaces, through open source
integration, through APIs that make building and
deploying models simpler, through automation of
analytics, embedding analytics. You can see this play out
throughout this conference in talks, super-demos,
and in the Quad. Look for it– analytics in
action, hidden in plain sight. Enjoy the rest of
the conference, and thank you very much. [APPLAUSE]
