
Sunday, August 16, 2020

Weekly Digest, August 17

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link.  

Announcement

Featured Resources and Technical Contributions 

Featured Articles

Picture of the Week

Source: article flagged with a + 

To make sure you keep getting these emails, please add  mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click here. Follow us: Twitter | Facebook.



from Featured Blog Posts - Data Science Central https://ift.tt/3h5xZU1
via Gabe's Musings

Sunday, August 9, 2020

Weekly Digest, August 10

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link.  

Featured Resources and Technical Contributions 

Featured Articles

Picture of the Week

Source: article flagged with a + 

To make sure you keep getting these emails, please add  mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click here. Follow us: Twitter | Facebook.



from Featured Blog Posts - Data Science Central https://ift.tt/3iqcihE
via Gabe's Musings

Wednesday, August 5, 2020

The Census Bureau Says Less Than 9% of US Companies are Using AI! Really?

Summary:  Less than 9%?  What this study really shows and what we should take away from it.

 

Wow. Less than 9%! Can this be true? Well, according to a large-scale survey conducted by the US Census Bureau, it's actually a little worse than that, since the 9% applies to a basket of advanced technologies, some of which, like RFID, don't involve AI at all. So what's the story?

It turns out the Census Bureau has a mandate to survey US businesses about their use of technology. This report on their most recent work was presented in July at a National Bureau of Economic Research conference. There are several twists to this story, but the main one is how we reconcile this result with the many other surveys we all read, which say that between 20% and 33% of companies are implementing AI 'at scale' and that a much larger percentage are right behind. Even that estimate dates back to surveys conducted in 2018; based on the current literature, you'd have to believe adoption is much, much higher.

 

About the Census Survey

The “2018 Annual Business Survey (ABS)” (survey data from 2017) sought to measure the degree to which US companies are adopting advanced technologies. It first asked about ‘digitization’, on the reasoning that it is a precursor to all advanced technologies, then about ‘cloud computing’, and finally about a basket of ‘advanced technologies’, some of which relate to AI and some of which don't.

The advantage the Census has over other surveys we've seen is that response is legally required. It mailed out about 850,000 surveys and received 583,000 responses, almost 69%. That's much better than the 20% or 30% response rates of the large business surveys we've reported on in the past. This lets the Bureau project over the entire US business population with some statistical accuracy and without the curse of respondent bias.

The first twist, and it's unclear why they elected to do this, is that the survey includes only “all private, non-agricultural sectors of the economy”. The exclusion of public companies immediately removes at least the largest 4,000 or 5,000 US companies, which, logically, are the leaders in AI adoption.

So the results of this survey are a kind of trickle-down story about what the rest of us are doing, with most responses coming from very small companies:

  • 1 to 9 employees        75%
  • 10 to 49 employees    20%
  • 50 to 249 employees    4%
  • 250+ employees           1%

Unfortunately, without access to the raw data we can't refine their findings further to look at even the largest companies in this group.

And the findings about ‘advanced technologies’ relate to these categories which fall within AI:

  • Machine Learning      8%
  • Voice Recognition      5%
  • Machine Vision          7%
  • Robotics                    3%
  • Natural Language      2%
  • Automated Vehicles   8%

Versus these categories which don’t relate to AI according to the definitions given with the survey:

  • Touchscreens           9%
  • RFID                         1%
  • Augmented Reality    8%

If you've been quick to add these up, you'll see the AI categories total a little over 10%, but this includes overlaps. The real answer for utilization of AI is likely much smaller, more like 6% across this group.

 

Do Survey Respondents Even Recognize AI?

We've written in the past about how difficult it is to measure adoption. There's no end of organizations conducting surveys. If a large company has implemented a chatbot in one operation, do we give it credit for adoption (as many surveys do)? Do the folks who respond to these surveys know what's going on in other parts of their companies if those organizations are large and dispersed?

In smaller companies like those surveyed, you'd expect respondents to know if, for example, their operation used a robot. However, so much of AI is now buried in the applications these companies use that they may be completely unaware of it.

A further confounding factor is that the survey specifically asked respondents about the use of these technologies in “the production of goods and services”. Were they sophisticated enough to include or exclude the many types of AI found in support systems like HR and finance, which heavily incorporate machine vision and NLP? Probably not.

 

What’s the Degree of Adoption?

The survey did allow the respondents to provide some information about the degree to which they had adopted these technologies expressed as percentage ranges over their entire company.

 

What’s the Takeaway?

There are several.

Despite AI companies wanting to know about market size and penetration, the traditional voluntary survey of large companies suffers from low response rates and respondent bias, making any statistical conclusions little better than guesses.

In smaller companies, respondents are unlikely to have a sufficiently deep understanding of AI to spot all the places it may already be in use within their applications.

In large companies the specific respondents answering surveys are unlikely to be completely aware of all the AI applications planned or used throughout a large dispersed corporation.

Particularly in large companies, do we give credit for AI adoption if there is one chatbot in customer service or one ML model embedded in a purchased recommender? The real metric ought to be total spend, internal and external, on AI.

Increasingly, AI is disappearing into the infrastructure of all sorts of applications we purchase where users may be completely unaware.  Computer vision, chatbots, NLP, and even ML algos are increasingly embedded in all sorts of day-to-day applications.  It’s no longer relevant to ask about AI on the assumption that there has been a specific in-house project to develop and deploy an application that has captured significant attention and development effort.

Finally, of real concern is what government policy makers may make of this.  Focusing AI policy or allocating resources to this end of the market without understanding the shortcomings of the survey may seriously misallocate funding and efforts. 

It’s too much to hope that we’ll see an end to these ‘AI Adoption’ surveys.  AI is here.  You may not recognize all the activities you undertake that use AI, but that’s a good thing.  As a smaller private company it’s not necessary to make a conscious decision to adopt AI.  The vendors who provide your apps and services will see to that.

 

 

Other articles by Bill Vorhies.

About the author:  Bill is Contributing Editor for Data Science Central.  Bill is also President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001.  His articles have been read more than 2.1 million times.

Bill@DataScienceCentral.com or Bill@Data-Magnum.com

 



from Featured Blog Posts - Data Science Central https://ift.tt/2PiKrUl
via Gabe's Musings

Saturday, August 1, 2020

Calculus For Data Science: What Do You Really Need to Know?

This one picture shows what areas of calculus and linear algebra are most useful for data scientists.

If you read any article worth its salt on the topic Math Needed for Data Science, you'll see calculus mentioned. Calculus (and its closely related counterpart, linear algebra) has some very narrow (but very useful) applications to data science. If you have a decent algebra background (which I'm assuming you do, if you're a data scientist!) then you can learn all of the calculus you need in a few hours of study.

You don't usually need to know exactly how to take derivatives, minimize sums of squares, or create clustering algorithms from scratch--there are calculators for that! But if you have a general idea of what's working in the background, you'll be able to recognize when results don't make sense, or what better alternatives might be available.
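As a concrete illustration, here is a minimal sketch (using sympy, with a made-up two-point dataset) of the single calculus move that underlies least squares: differentiate the sum of squared errors and set the derivative to zero.

import sympy as sp

# Hypothetical data: fit y = b*x through the origin to two points (1, 2) and (2, 3).
b = sp.symbols('b')
points = [(1, 2), (2, 3)]

# Sum of squared errors as a symbolic expression in b.
sse = sum((y - b * x) ** 2 for x, y in points)

# The calculus step: set d(SSE)/db = 0 and solve for b.
b_hat = sp.solve(sp.diff(sse, b), b)[0]
print(sp.expand(sse), b_hat)   # b_hat = 8/5

That "differentiate, set to zero, solve" step is essentially all the calculus a least-squares fit needs; software does the same thing at much larger scale.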

References

MATH7502: Mathematics for Data Science 2 (Linear Algebra and Topics in Multivariable Calculus).

How Much Math Do You Need to Become a Data Scientist?

Cluster Analysis: Basic Concepts and Algorithms

The Mathematics Behind Principal Component Analysis

Lossy Compression

Fuzzy Relation Calculus in the Compression and Decompression of Fuzzy Relations



from Featured Blog Posts - Data Science Central https://ift.tt/3gmBncQ
via Gabe's Musings

Feature engine python package for feature engineering

 

In this post, we explore a new Python package for feature engineering.

 

Feature engineering is the process of using domain knowledge of the data to transform existing features or to create new variables from existing ones, for use in machine learning. Using feature engineering, we can pre-process raw data and make it suitable for use in machine learning algorithms.

 

The package covers the following functions:

1. Missing Data Imputation

  1. Complete Case Analysis
  2. Mean / Median / Mode Imputation
  3. Random Sample Imputation
  4. Replacement by Arbitrary Value
  5. End of Distribution Imputation
  6. Missing Value Indicator

 

2. Categorical Encoding

  1. One hot encoding
  2. Count and Frequency encoding
  3. Target encoding / Mean encoding
  4. Ordinal encoding
  5. Weight of Evidence
  6. Rare label encoding

 

3. Variable transformation

  1. Logarithm transformation - log(x)
  2. Reciprocal transformation - 1 / x
  3. Square root transformation - sqrt(x)
  4. Exponential transformation - exp(x)
  5. Yeo-Johnson transformation
  6. Box-Cox transformation

 

4. Discretisation

  1. Equal width discretisation
  2. Equal Frequency discretisation
  3. Discretisation using decision trees

 

5. Outliers

  1. Outlier removal
  2. Treating outliers as missing values
  3. Top / bottom / zero coding
  4. Discretisation

 

6. Feature Scaling

  1. Standardisation
  2. Min-Max Scaling
  3. Maximum Absolute Scaling
  4. Robust Scaling.
  5. Mean normalisation
  6. Scaling to unit length

 

8. Feature Creation

 

9. Aggregating Transaction Data

 

From the GitHub page:

Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models. Feature-engine's transformers follow scikit-learn conventions, with fit() and transform() methods that first learn the transformation parameters from the data and then transform it.
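As a quick illustration of that fit()/transform() pattern, here is a minimal sketch of median imputation with feature-engine on a toy dataset. Note that module paths and class names have shifted between feature-engine releases, so treat the import below as indicative rather than definitive.

import pandas as pd
from feature_engine.imputation import MeanMedianImputer  # path/class name may differ in older releases

# Small toy dataset with missing values.
df = pd.DataFrame({'age': [25, None, 40, 35],
                   'income': [50000, 62000, None, 48000]})

# Learn the medians from the data, then transform - the scikit-learn style API.
imputer = MeanMedianImputer(imputation_method='median', variables=['age', 'income'])
imputer.fit(df)
df_t = imputer.transform(df)
print(df_t)

The same fit-then-transform pattern applies to the encoders, discretisers, and outlier handlers listed above, which is what makes the transformers easy to chain inside a scikit-learn Pipeline.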

Feature engine package on github

Documentation of feature engine package 

Package created by Dr. Soledad Galli

I plan to contribute to this package. In August, at Data Science Central, I also plan to create a mini e-book on feature engineering which will use this package (co-authored with Aysa Tajeri). Feature engineering is a complex, multifaceted domain. Our goal is to present an overview of feature engineering for various domains. The proposed outline is:

  • Understanding the feature engineering pipeline
  • Concepts/ maths techniques you need to understand feature engineering
  • Implementing feature engineering using the package above
  • Applications in industries


from Featured Blog Posts - Data Science Central https://ift.tt/33in1GE
via Gabe's Musings

Sunday, July 26, 2020

Weekly Digest, July 27

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link.  

Announcements

Featured Resources and Technical Contributions 

Featured Articles

Picture of the Week

Source: article flagged with a + 

To make sure you keep getting these emails, please add  mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click here. Follow us: Twitter | Facebook.



from Featured Blog Posts - Data Science Central https://ift.tt/2BwyhUo
via Gabe's Musings

5 common causes of friction between data scientists and the rest of the stakeholders

Overview:

As a data scientist, have you ever been frustrated that your stakeholders don’t see the value that you bring to the table? You may ask yourself, “How far should I go in explaining the work I do or what my models are doing?” If that sounds like you, then pay close attention to this post and the next, as they are all about improving collaboration between data scientists and other stakeholders. 

 

This is a two-part post: this article covers the underlying assumptions and gaps in understanding that cause friction between data scientists and stakeholders; my other article offers concrete steps for better collaboration.

 

Comprehensibility:

Machine Learning (ML) models are inherently complex and hard to explain. Data scientists know that the events between data input and output generation are not easily mapped to an explainable process. Before the adoption of ML techniques, the world was much simpler; it was easier to explain how the output was generated. Back then, data science operated on a rules-based system; in fact, it still does, even though the rules are now more complex.

 

Governance:

More importantly, in the world before ML models, the rules were governed by stakeholders throughout the process. (Note that my usage of the word stakeholder is broad...this could be a general manager, business owner, marketing lead, product manager, etc.) While this is no longer the case, stakeholders can still have a lot of say in the ML process. For example, stakeholders still hold the key to the input data in many environments. Sometimes they even own much of the process that leads to data output.

 

Investment:

Further, a lack of understanding can lead to miscommunication, which damages trust between the parties. This lack of trust can seriously impede a data scientist’s ability to provide the right level of support throughout the end-to-end process of model building (i.e., collecting, building, deploying, and iterating on the models). 

 

Oftentimes, management support for resources/time/capital allocation is needed to get better outcomes from ML models. Without such investments, the results of ML models are either subpar, or even worse, a waste of time. Remember, a model that predicts with 50% accuracy is of no use. 

 

The above are some of the primary gaps from the stakeholders’ point of view (POV) resulting in misaligned expectations and a lack of trust. While these gaps impact all parties involved, here are the implications from a data scientist’s point of view.

 

Lack of support and guidance:

I have heard data scientists express that when their company started setting up a data science team or hiring new data scientists, there was a great deal of excitement; however, the support and enthusiasm they experienced at the beginning faltered after a few months or quarters. This is not to say all companies are alike, but this pain is most often experienced when companies start the journey of incorporating data science principles and techniques into their products and processes. Again, if the stakeholders feel they are not getting what they want due to communication issues, data scientists end up feeling let down and/or neglected.

 

Misaligned expectations:

Companies often hire the wrong type of data scientists or the wrong level of seniority for a project. This usually happens when the company is getting started with data science and has no clear understanding of what it wants from this team/role/person. This misalignment ends up further souring relationships while wasting time and effort across the board.

 

The gaps listed above are indeed fixable. It takes time, effort, learning, and setting up some frameworks so that both parties, data scientists and stakeholders, can foster better collaboration and achieve more together. Check out some proposed solutions for fixing communication gaps between data scientists and stakeholders in my next article.



from Featured Blog Posts - Data Science Central https://ift.tt/30Rvtd8
via Gabe's Musings

Blockdrop to Accelerate Neural Network training by IBM Research

Scaling AI with Dynamic Inference Paths in Neural Networks

Introduction

IBM Research, with the help of the University of Texas at Austin and the University of Maryland, has worked to speed up neural networks by creating a technology called BlockDrop. Behind the design of this technology lies the objective, and the promise, of speeding up convolutional neural network operations without any loss of fidelity, which can offer great cost savings to the ML community.

This could further enhance and expedite the application of neural nets and boost their performance, particularly on devices and cloud/edge servers with limited computing capability and power budgets.

Increases in accuracy have been accompanied by increasingly complex and deep network architectures. This presents a problem for domains where fast inference is essential, particularly in delay-sensitive and real-time scenarios such as autonomous driving, robotic navigation, or user-interactive applications on mobile devices.

Further research shows that regularization techniques designed for fully connected layers are less effective for convolutional layers, as activation units in these layers are spatially correlated and information can still flow through convolutional networks despite dropout.

The BlockDrop method introduced by IBM Research is a “complementary method to existing model compression techniques”: as a form of structured dropout, it drops spatially correlated information, resulting in compressed representations. The residual blocks of a neural network can be kept for evaluation or further pruned for greater speed.

The figure below illustrates the BlockDrop mechanism for a given image input to the convolutional network. The green regions in the two right-hand panels mark the activation units that contain semantic information about the input image; dropping activations at random is not effective at removing this semantic information.

Nearby activations in a network contain closely related information. The strategy employed in spatial dropout methods is therefore to drop contiguous regions that represent a similar area and context, whether by color or shape. This removes specific semantic information (e.g., a head or feet), pushing the remaining units to learn more detailed features for classifying the input image.


Source

Policy Network for Dynamic Inference Paths

The BlockDrop mechanism learns to dynamically choose which layers of a deep network to execute during inference, so as to reduce total computation without degrading prediction accuracy. It exploits the robustness of Residual Networks (ResNets) by dropping layers that aren't necessary for achieving the desired level of accuracy, resulting in the dynamic selection of residual blocks for each novel image. Thus it aids in:

  • Allocating system resources in a more efficient manner with the objective of saving cost.
  • Facilitating further insights into ResNets, e.g., whether and how different blocks encode information about objects and understanding the dynamics behind encoding object-level features.
  • Achieving minimal block usage through more compressed representations by emphasizing decisions at the image level. These image-specific decisions, made from features at different layers of hidden neurons, help to optimally drop blocks.

For example, given a pre-trained ResNet, a policy network is trained in an “associative reinforcement learning setting for the dual reward of utilizing a minimal number of blocks while preserving recognition accuracy”.

Experiments on CIFAR and ImageNet reveal that the learned policies not only accelerate inference but also encode meaningful visual information. With this method, a ResNet-101 model achieves a speedup of 20% on average, going as high as 36% for some images, while maintaining the same 76.4% top-1 accuracy on ImageNet.

BlockDrop strategy learns a model, referred to as the policy network, that, given a novel input image, outputs the posterior probabilities of all the binary decisions for dropping or keeping each block in a pre-trained ResNet.

The policy network is trained using curriculum learning to maximize a reward that incentivizes the use of as few blocks as possible while preserving the prediction accuracy.
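To make that reward concrete, here is a minimal, hypothetical PyTorch sketch of the reward described in the BlockDrop paper (roughly R = 1 − (u/K)² for a correct prediction, −γ otherwise); the tensor shapes and names are made up for illustration and this is not the authors' implementation.

import torch

def block_drop_reward(correct, policy, penalty=5.0):
    # correct: (batch,) bool tensor; policy: (batch, num_blocks) 0/1 keep/drop decisions.
    num_blocks = policy.shape[1]
    block_usage = policy.sum(dim=1) / num_blocks      # fraction of blocks kept per image
    reward = 1.0 - block_usage ** 2                   # fewer blocks -> higher reward
    reward[~correct] = -penalty                       # wrong prediction -> negative reward
    return reward

# Hypothetical usage: probs would come from a small policy CNN, one probability per residual block.
probs = torch.rand(4, 33)                  # batch of 4 images, 33 residual blocks (e.g., ResNet-101)
policy = torch.bernoulli(probs)            # sampled keep(1)/drop(0) decisions
correct = torch.tensor([True, True, False, True])
print(block_drop_reward(correct, policy))

The quadratic term in block usage is what pushes the policy toward aggressively dropping blocks whenever the prediction can survive it.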

In addition, the pre-trained ResNet is jointly fine-tuned with the policy network to produce feature transformations targeted for block-dropping behavior. The method is an instantiation of associative reinforcement learning, where all the decisions are taken in a single step given the context (i.e., the input instance). This results in lightweight policy execution that scales to very deep networks.

A recurrent model (e.g., an LSTM) could also serve as the policy network; however, the research findings show a CNN to be more efficient with similar performance.

The figure below gives a conceptual overview of BlockDrop, which learns a policy to select the minimal configuration of blocks needed to correctly classify a given input image. The resulting instance-specific paths through the network reflect the image's difficulty, with easier samples using fewer blocks. The patterns of blocks also encode meaningful visual information, corresponding to clusters of visual features.


Source -IBM

The figure above depicts the policy network architecture of BlockDrop. For any given new image, the policy network outputs drop and keep decisions for each block in a pre-trained ResNet. The active blocks that remain are then used to evaluate the prediction.

Both block usage and prediction accuracy contribute cumulatively to the policy reward. The policy network is trained to optimize the expected reward with a curriculum learning strategy, which helps to provide a practical approach to optimizing this non-convex objective.

To attain this objective, the policy network is jointly fine-tuned with the ResNet.


Source -IBM

The figure above shows samples from ImageNet. The top row contains images that are classified with high accuracy using the fewest blocks (by removing redundancy), while the samples in the bottom row utilize the most blocks.

Samples using fewer blocks are indeed easier to identify, since they contain single frontal-view objects positioned in the center, while samples that require more blocks contain several objects, occlusion, or cluttered backgrounds.

This supports the hypothesis “that block usage is a function of instance difficulty”, where BlockDrop automatically learns to “sort” images into easy or hard cases.

Usage (Reference https://github.com/Tushar-N/blockdrop.git)


Library and Usage
git clone https://github.com/Tushar-N/blockdrop.git
pip install -r requirements.txt
wget -O blockdrop-checkpoints.tar.gz https://www.cs.utexas.edu/~tushar/blockdrop/blockdrop-checkpoints.tar.gz
tar -zxvf blockdrop-checkpoints.tar.gz


# Train a model on CIFAR 10 built upon a ResNet-110
python cl_training.py --model R110_C10 --cv_dir cv/R110_C10_cl/ --lr 1e-3 --batch_size 2048 --max_epochs 5000

# Train a model on ImageNet built upon a ResNet-101
python cl_training.py --model R101_ImgNet --cv_dir cv/R101_ImgNet_cl/ --lr 1e-3 --batch_size 2048 --max_epochs 45 --data_dir data/imagenet/

# Finetune a ResNet-110 on CIFAR 10 using the checkpoint from cl_training
python finetune.py --model R110_C10 --lr 1e-4 --penalty -10 --pretrained cv/cl_training/R110_C10/ckpt_E_5300_A_0.754_R_2.22E-01_S_20.10_#_7787.t7 --batch_size 256 --max_epochs 2000 --cv_dir cv/R110_C10_ft_-10/

# Finetune a ResNet-101 on ImageNet using the checkpoint from cl_training
python finetune.py --model R101_ImgNet --lr 1e-4 --penalty -5 --pretrained cv/cl_training/R101_ImgNet/ckpt_E_4_A_0.746_R_-3.70E-01_S_29.79_#_484.t7 --data_dir data/imagenet/ --batch_size 320 --max_epochs 10 --cv_dir cv/R101_ImgNet_ft_-5/

python test.py --model R110_C10 --load cv/finetuned/R110_C10_gamma_10/ckpt_E_2000_A_0.936_R_1.95E-01_S_16.93_#_469.t7
python test.py --model R101_ImgNet --load cv/finetuned/R101_ImgNet_gamma_5/ckpt_E_10_A_0.764_R_-8.46E-01_S_24.77_#_10.t7

R110_C10 Model Output
Accuracy: 0.936
Block Usage: 16.933 ± 3.717
FLOPs/img: 1.81E+08 ± 3.43E+07
Unique Policies: 469

ImageNet Model Output
Accuracy: 0.764
Block Usage: 24.770 ± 0.980
FLOPs/img: 1.25E+10 ± 4.28E+08
Unique Policies: 10

Conclusion

In this blog, we have discussed the BlockDrop strategy, which aims to speed up neural network inference. It has the following characteristics:

  • Speeds up AI-based computer vision operations and saves server running time.
  • Takes approximately 200 times less power per pixel than comparable systems using traditional hardware.
  • Facilitates the deployment of top-performing deep neural network models on mobile devices by effectively reducing the storage and computational costs of such networks.
  • Determines the minimal configuration of layers, or blocks, needed to correctly classify a given input image; simpler images allow more layers to be dropped, saving more time.
  • Has been applied to ResNets for faster inference by selectively choosing which residual blocks to evaluate, in a learned and optimized manner conditioned on the input.
  • Extensive experiments on CIFAR and ImageNet show considerable gains over existing methods in the efficiency/accuracy trade-off.

References

  1. BlockDrop: Dynamic Inference Paths in Residual Networks https://arxiv.org/pdf/1711.08393.pdf
  2. https://www.ibm.com/blogs/research/2018/12/ai-year-review/


from Featured Blog Posts - Data Science Central https://ift.tt/30Nchx7
via Gabe's Musings

P Value vs Critical Value

P-values and critical values are so similar that they are often confused. They both do the same thing: enable you to support or reject the null hypothesis in a test. But they differ in how you get to make that decision. In other words, they are two different approaches to the same result. This picture sums up the p value vs critical value approaches.
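As a minimal sketch of the two approaches (using scipy on made-up data for a one-sample t-test), note that both routes below always lead to the same reject/fail-to-reject decision:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.4, scale=1.0, size=30)   # hypothetical data
mu0, alpha = 5.0, 0.05                             # H0: mean = 5.0

t_stat, p_value = stats.ttest_1samp(sample, mu0)

# p-value approach: reject H0 if p < alpha
reject_by_p = p_value < alpha

# critical-value approach: reject H0 if |t| exceeds the critical t
t_crit = stats.t.ppf(1 - alpha / 2, df=len(sample) - 1)
reject_by_crit = abs(t_stat) > t_crit

print(t_stat, p_value, t_crit, reject_by_p, reject_by_crit)

The p-value route compares a probability to alpha; the critical-value route compares the test statistic to a cutoff, but they are two views of the same decision rule.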


References

James Jones. Probability Values.

How to find critical values.

Hartmann, K., Krois, J., Waske, B. (2018): E-Learning Project SOGA: Statistics and Geospatial Data Analysis. Department of Earth Sciences, Freie Universitaet Berlin.



from Featured Blog Posts - Data Science Central https://ift.tt/3f3wYtI
via Gabe's Musings

Sunday, July 19, 2020

Weekly Digest, July 20

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link.  

Featured Resources and Technical Contributions 

Featured Articles

Announcement

Picture of the Week

Source: article flagged with a + 

To make sure you keep getting these emails, please add  mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click here. Follow us: Twitter | Facebook.



from Featured Blog Posts - Data Science Central https://ift.tt/2WE8yAV
via Gabe's Musings

Anomaly Detection from Head and Abdominal Fetal ECG — A Case study of IOT anomaly detection using Generative Adversarial Networks


Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal Metrics

Motivation

In this blog, we discuss the role of Variational Auto-Encoders in detecting anomalies in fetal ECG signals.

Variational Auto-Encoders offer a way to accurately determine anomalies in seasonal metrics occurring at regular intervals (daily/weekly/bi-weekly/monthly, or periodic events at finer granularities of minutes or seconds), so as to facilitate timely action from the concerned team. Such timely actions help to recover from serious issues (such as those addressed by predictive maintenance) in web applications, retail, IoT, telecom, and the healthcare industry.

The metrics/KPIs that play an important role in determining anomalies contain noise that is assumed to be independent, zero-mean Gaussian at every point. The seasonal KPIs are thus composed of seasonal patterns with local variations, plus the statistics of the Gaussian noise.

Role of IoT/Wearables

Portable, low-power fetal ECG collectors such as wearables have been designed for research and analysis, and can collect maternal abdominal ECG signals in real time. The ECG data can be sent to a smartphone client via Bluetooth to individually analyse the signals captured from the fetal head and the maternal abdomen. The extracted fetal ECG signals can then be used to detect any anomaly in fetal behavior.

Variation Auto-Encoder

Deep Bayesian networks employ black-box learning with neural networks to express the relationships between variables in the training dataset. Variational Auto-Encoders are deep Bayesian networks, often used for training and prediction, that use neural networks to model the posteriors of the distributions.

Variational Auto-Encoders (VAEs) support optimization by setting a lower bound on the likelihood via a reparameterization of the Evidence Lower Bound (ELBO). Maximizing the ELBO has two effects: the likelihood term makes the generated sample (image/data) more correlated with the latent variable, making the model more deterministic, while the KL divergence between the posterior and the prior is minimized.
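For reference, the quantity being maximized is the textbook form of the evidence lower bound (this is the standard VAE objective, not anything specific to Donut):

ELBO(x) = E_{q(z|x)}[ log p(x|z) ] − KL( q(z|x) || p(z) ) ≤ log p(x)

The first term is the reconstruction likelihood described above, and the second term is the KL penalty that keeps the approximate posterior close to the prior.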

Characteristics/Architecture of Donut

Donut learns the normal pattern of a partially abnormal x and finds a good posterior in order to estimate how well x follows the normal pattern. The fundamental characteristic of Donut is its ability to find good posteriors by reconstructing normal points within abnormal windows. This property is built into its training through M-ELBO (Modified ELBO), which turns out to be superior to simply excluding all windows containing anomalies and missing points from the training data.

To summarize, the three techniques employed in the VAE-based anomaly detection algorithm of the Donut architecture are the following:

  • Modified ELBO – Ensures that, on average, a certain minimum number of bits of information is encoded per latent variable, or per group of latent variables. This helps to increase the information capacity and reconstruction accuracy.
  • Missing data injection for training – A kind of data augmentation procedure used to fill the missing points as zeros. It amplifies the effect of the ELBO by injecting the missing data before the training epoch starts and recovering the missing points after the epoch is finished.
  • MCMC imputation for better anomaly detection – Improves posterior estimation using synthetically generated missing points.


The network structure of Donut. Gray nodes are random variables, and white nodes are layers. Source: Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications.

The data preparation stage deals with standardization, missing value injection, and grouping the data into sliding windows (of length W over the key metric), where each point x_t is processed as the window x_{t−W+1}, ..., x_t. The training process encompasses the Modified ELBO and missing data injection. In the final prediction stage, MCMC imputation (as shown in the figure below) is applied to yield a better posterior distribution.


MCMC Imputation and Anomaly Detection Source

To know more about the ELBO in VAEs, check out https://medium.com/@hfdtsinghua/derivation-of-elbo-in-vae-25ad7991fdf7 or refer to the references below.

File Imports

import numpy as np
import pandas as pd
import csv
import matplotlib.pyplot as plt
import seaborn as sns
import mne
from donut import complete_timestamp, standardize_kpi
from sklearn.metrics import accuracy_score

sns.set(rc={'figure.figsize': (11, 4)})

Loading and TimeStamping the data

Here we add timestamps to the fetal ECG data, under the assumption that each data point is recorded at an interval of 1 second (although the dataset source states that the signals are recorded at 1 kHz). We further resample the data at an interval of 1 minute by taking the average of 60 samples.

data_path = '../abdominal-and-direct-fetal-ecg-database-1.0.0/'
file_name = 'r10.edf'

edf = mne.io.read_raw_edf(data_path + file_name)
header = ','.join(edf.ch_names)
np.savetxt('r10.csv', edf.get_data().T, delimiter=',', header=header)

df = pd.read_csv('r10.csv')
periods = df.shape[0]

dti = pd.date_range('2018-01-01', periods=periods, freq='s')
print(dti.shape, df.shape)
df['DateTs'] = dti

df.set_index('DateTs')
df.index = pd.to_datetime(df.index, unit='s')
df1 = df.resample('1T').mean()

Once the data is indexed by timestamps, we plot the individual features and explore any seasonality patterns. We also add a label feature, marking potential anomalies in the input data, by flagging large fluctuations in the fetal head signal (>= 0.00025 or <= -0.00025). We chose the head signal because it closely resembles the signal curves and spikes of the 4 abdominal signals.

Data Labelling and Plotting the Features

There are 5 signals in total (one from the fetal head and 4 from the abdomen); we label the data and plot each feature:

df1.rename_axis('timestamp', inplace=True)
cols = df1.columns.tolist()   # column names to plot (defined here; the original snippet omits this line)
print(cols, df1.index.name)

df1['label'] = np.where((df1['# Direct_1'] >= .00025) | (df1['# Direct_1'] <= -.00025), 1, 0)
print(df1.head(5))

for i in range(0, len(cols)):
    if cols[i] != 'timestamp':
        plt.figure(figsize=(20, 10))
        plt.plot(df1[cols[i]], marker='^', color='red')
        plt.title(cols[i])
        plt.savefig('figs/f_' + str(i) + '.png')





Training the Model Using the Donut VAE

df2 = df1.reset_index()
df2 = df2.reset_index(drop=True)  # drop the index, instead use it as a feature vector before discovering the missing data points

# Read the raw data for the 1st feature Direct_1
timestamp, values, labels = df2['timestamp'], df2['# Direct_1'], df2['label']
# If there is no label, simply use all zeros.
labels = np.zeros_like(values, dtype=np.int32)

# Complete the timestamp, and obtain the missing point indicators.
timestamp, missing, (values, labels) = \
    complete_timestamp(timestamp, (values, labels))

# Split the training and testing data.
test_portion = 0.3
test_n = int(len(values) * test_portion)
train_values, test_values = values[:-test_n], values[-test_n:]
train_labels, test_labels = labels[:-test_n], labels[-test_n:]
train_missing, test_missing = missing[:-test_n], missing[-test_n:]

# Standardize the training and testing data.
train_values, mean, std = standardize_kpi(
    train_values, excludes=np.logical_or(train_labels, train_missing))
test_values, _, _ = standardize_kpi(test_values, mean=mean, std=std)

import tensorflow as tf
from donut import Donut
from tensorflow import keras as K
from tfsnippet.modules import Sequential
from donut import DonutTrainer, DonutPredictor

# We build the entire model within the scope of `model_vs`,
# it should hold exactly all the variables of `model`, including
# the variables created by Keras layers.
with tf.variable_scope('model') as model_vs:
    model = Donut(
        h_for_p_x=Sequential([
            K.layers.Dense(50, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
            K.layers.Dense(50, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
        ]),
        h_for_q_z=Sequential([
            K.layers.Dense(50, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
            K.layers.Dense(50, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
        ]),
        x_dims=120,
        z_dims=5,
    )

trainer = DonutTrainer(model=model, model_vs=model_vs, max_epoch=512)
predictor = DonutPredictor(model)

with tf.Session().as_default():
    trainer.fit(train_values, train_labels, train_missing, mean, std)
    test_score = predictor.get_score(test_values, test_missing)

    pred_score = np.array(test_score).reshape(-1, 1)
    print(len(test_missing), len(train_missing), len(pred_score), len(test_values))
    y_pred = np.argmax(pred_score, axis=1)

The model is trained with default parameters as listed below:

use_regularization_loss=True,
max_epoch=512,
batch_size=256,
valid_batch_size=1024,
valid_step_freq=100,
initial_lr=0.001,
optimizer=tf.train.AdamOptimizer,
grad_clip_norm=10.0  # Clip gradient by this norm.

The model summary, with its trainable parameters and hidden layers, is as follows:

Trainable Parameters (24,200 in total)
donut/p_x_given_z/x_mean/bias            (120,)      120
donut/p_x_given_z/x_mean/kernel          (50, 120)   6,000
donut/p_x_given_z/x_std/bias             (120,)      120
donut/p_x_given_z/x_std/kernel           (50, 120)   6,000
donut/q_z_given_x/z_mean/bias            (5,)        5
donut/q_z_given_x/z_mean/kernel          (50, 5)     250
donut/q_z_given_x/z_std/bias             (5,)        5
donut/q_z_given_x/z_std/kernel           (50, 5)     250
sequential/forward/_0/dense/bias         (50,)       50
sequential/forward/_0/dense/kernel       (5, 50)     250
sequential/forward/_1/dense_1/bias       (50,)       50
sequential/forward/_1/dense_1/kernel     (50, 50)    2,500
sequential_1/forward/_0/dense_2/bias     (50,)       50
sequential_1/forward/_0/dense_2/kernel   (120, 50)   6,000
sequential_1/forward/_1/dense_3/bias     (50,)       50
sequential_1/forward/_1/dense_3/kernel   (50, 50)    2,500

These parameters correspond to the Donut model constructed in the snippet above (the two Sequential hidden networks h_for_p_x and h_for_q_z, with x_dims=120 and z_dims=5).

This Donut network uses a variational auto-encoder ("Auto-Encoding Variational Bayes", Kingma and Welling), which is a deep Bayesian network with observed variable x and latent variable z. The VAE is built using TFSnippet (a library for writing and testing TensorFlow models). The generative process starts from the latent variable z with prior distribution p(z) and a hidden network h(z), and then produces the observed variable x with distribution p(x | h(z)). For the posterior inference p(z | x), variational inference techniques are adopted to train a separate distribution q(z | h(x)).

Here each Sequential call creates a multi-layer perceptron with 2 hidden layers of 50 units and ReLU activation. The two hidden networks "h_for_p_x" and "h_for_q_z" are created using the same Sequential function (as evident from the model summary: sequential and sequential_1), and they represent the hidden networks for "p_x_given_z" and "q_z_given_x".

Plotting the Anomalies/Non-Anomalies together or Individually

We plot the anomalies (in red) together with the non-anomalies (in green), and also superimpose both in the same graph to analyse their combined behaviour.

In the Donut prediction, the higher the prediction score, the less anomalous the data. We choose -3 as the threshold for flagging anomalous points.

We also compute the number of inliers and outliers and plot them against the timestamps along the x-axis.

plt.figure(figsize=(20, 10))
split_test = int(test_portion * df.shape[0])

anomaly = np.where(pred_score > -3, 0, 1)

df3 = df2.iloc[-anomaly.shape[0]:]
df3['outlier'] = anomaly
df3.reset_index(drop=True)

print(df3.head(2), df3.shape)
print("Split", split_test, df3.shape)
di = df3[df3['outlier'] == 0]
do = df3[df3['outlier'] == 1]

di = di.set_index(['timestamp'])
do = do.set_index(['timestamp'])

print("Outlier and Inlier Numbers", do.shape, di.shape, di.columns, do.columns)

outliers = pd.Series(do['# Direct_1'], do.index)
inliers = pd.Series(di['# Direct_1'], di.index)

plt.plot(do['# Direct_1'], marker='^', color='red', label="Anomalies")
plt.plot(di['# Direct_1'], marker='^', color='green', label="Non Anomalies")

plt.legend(['Anomalies', 'Non Anomalies'])
plt.title('Anomalies and Non Anomalies from Fetal Head Scan')
plt.show()

di = di.reset_index()
do = do.reset_index()
plt.figure(figsize=(20, 10))

do.plot.scatter(y='# Direct_1', x='timestamp', marker='^', color='red', label="Anomalies")

plt.legend(['Anomalies'])
plt.xlim(df3['timestamp'].min(), df3['timestamp'].max())
plt.ylim(-.0006, .0006)
plt.title('Anomalies from Fetal Head Scan')
plt.show()

plt.figure(figsize=(20, 10))
di.plot.scatter(y='# Direct_1', x='timestamp', marker='^', color='green', label="Non Anomalies")
plt.legend(['Non Anomalies'])
plt.xlim(df3['timestamp'].min(), df3['timestamp'].max())
plt.ylim(-.0006, .0006)
plt.title('Non Anomalies from Fetal Head Scan')
plt.show()

Anomaly Plots for Direct electrocardiogram recorded from fetal head

The three consecutive plots display anomalous and non-anomalous points, plotted against each other or separately as labeled, for the signals obtained from the fetal head scan.





Anomaly Plots for Direct electrocardiogram recorded from maternal abdomen

The three consecutive plots display anomalous and non-anomalous points, plotted against each other or separately as labeled, for the signals obtained from the maternal abdomen.



Conclusion

Some of the key learnings from the Donut architecture are:

  • Dimensionality-reduction-based anomaly detection techniques need a reconstruction mechanism to identify the variance and, consequently, the anomalies.
  • Anomaly detection with generative models needs to be trained with both normal and abnormal data.
  • Not relying on data imputation by any algorithm weaker than VAE, as this may degrade the performance.
  • In order to discover the anomalies fast, the reconstruction probability for the last point in every window of x is computed.

We should also explore other variants of auto-encoders (RNN, LSTM, LSTM with attention networks, stacked convolutional bidirectional LSTM) for discovering anomalies from IoT devices.

The complete source code is available at https://github.com/sharmi1206/featal-ecg-anomaly-detection

References

  1. https://physionet.org/content/adfecgdb/1.0.0/
  2. Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications – https://arxiv.org/abs/1802.03903
  3. Don’t Blame the ELBO! A Linear VAE Perspective on Posterior Collapse : https://papers.nips.cc/paper/9138-dont-blame-the-elbo-a-linear-vae-perspective-on-posterior-collapse.pdf
  4. https://github.com/NetManAIOps/donut — Installation and API Usage
  5. Understanding disentangling in β-VAE https://arxiv.org/pdf/1804.03599.pdf%20.
  6. A Fetal ECG Monitoring System Based on the Android Smartphone : https://www.mdpi.com/1424-8220/19/3/446


from Featured Blog Posts - Data Science Central https://ift.tt/32GGj8e
via Gabe's Musings

Overview on Forecasting Models in Power BI

Time series forecasting in PBI is based on the well-known smoothing technique called Exponential Smoothing (ES). ES assigns exponentially decreasing weights from the newest to the oldest observations. ES can also be used for time series with trend and seasonality. This model is usually used to make short-term forecasts, as longer-term forecasts using this technique can be quite unreliable. Collectively, the methods are sometimes referred to as ETS models, referring to explicit modeling of Error, Trend and Seasonality.

Types of Exponential Smoothing models in PBI

  • Simple exponential smoothing - uses a weighted moving average with exponentially decreasing weights
  • Holt's trend-corrected double exponential smoothing - usually more reliable than the single procedure for handling data that shows a trend
  • Triple exponential smoothing - usually more reliable for parabolic trends or data that shows both trend and seasonality (a statsmodels sketch of this variant appears after the reference code below)

 

Handling the missing values

In some cases, your timeline might be missing some historical values. Does this pose a problem?

Not usually – the forecasting chart can automatically fill in some values to provide a forecast. If the total number of missing values is less than 40% of the total number of data points, the algorithm will perform linear interpolation prior to performing the forecast.

If more than 40% of your values are missing, try to fill in more data, or perhaps aggregate values into larger time units, to ensure that a more complete data series is available for analysis.
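Outside of Power BI you can mimic that behaviour yourself; here is a minimal pandas sketch (with a hypothetical monthly series) that linearly interpolates only when less than 40% of the points are missing, and otherwise falls back to aggregating into larger time units:

import pandas as pd
import numpy as np

s = pd.Series([10.0, 12.0, np.nan, 15.0, np.nan, 18.0],
              index=pd.date_range('2020-01-01', periods=6, freq='M'))

missing_frac = s.isna().mean()
if missing_frac < 0.40:
    s_ready = s.interpolate(method='linear')   # fill small gaps before forecasting
else:
    s_ready = s.resample('Q').mean()           # too sparse: aggregate to a coarser time unit

print(missing_frac)
print(s_ready)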

Reference Code:

import requests
import pandas as pd
import json
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, Holt
import numpy as np
%matplotlib inline
plt.style.use('Solarize_Light2')

r = requests.get('https://ift.tt/3jkfBsa')
jobj = json.loads(r.text[18:-1])
data = jobj[0]['data']
df = pd.DataFrame(data, columns=['time','data']).set_index('time')
train = df.iloc[100:-10, :]
test = df.iloc[-10:, :]
train.index = pd.to_datetime(train.index)
test.index = pd.to_datetime(test.index)
pred = test.copy()

model = SimpleExpSmoothing(np.asarray(train['data']))
model._index = pd.to_datetime(train.index)

fit1 = model.fit()
pred1 = fit1.forecast(9)
fit2 = model.fit(smoothing_level=.2)
pred2 = fit2.forecast(9)
fit3 = model.fit(smoothing_level=.5)
pred3 = fit3.forecast(9)

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(train.index[150:], train.values[150:])
ax.plot(test.index, test.values, color="gray")
for p, f, c in zip((pred1, pred2, pred3), (fit1, fit2, fit3), ('#ff7823', '#3c763d', 'c')):
    ax.plot(train.index[150:], f.fittedvalues[150:], color=c)
    ax.plot(test.index, p, label="alpha="+str(f.params['smoothing_level'])[:3], color=c)
plt.title("Simple Exponential Smoothing")
plt.legend();

model = Holt(np.asarray(train['data']))
model._index = pd.to_datetime(train.index)

fit1 = model.fit(smoothing_level=.3, smoothing_slope=.05)
pred1 = fit1.forecast(9)
fit2 = model.fit(optimized=True)
pred2 = fit2.forecast(9)
fit3 = model.fit(smoothing_level=.3, smoothing_slope=.2)
pred3 = fit3.forecast(9)

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(train.index[150:], train.values[150:])
ax.plot(test.index, test.values, color="gray")
for p, f, c in zip((pred1, pred2, pred3), (fit1, fit2, fit3), ('#ff7823', '#3c763d', 'c')):
    ax.plot(train.index[150:], f.fittedvalues[150:], color=c)
    ax.plot(test.index, p, label="alpha="+str(f.params['smoothing_level'])[:4]+", beta="+str(f.params['smoothing_slope'])[:4], color=c)
plt.title("Holt's Exponential Smoothing")
plt.legend();
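The snippet above covers simple and Holt's (double) smoothing; for the triple (Holt-Winters) variant mentioned earlier, a minimal sketch with statsmodels might look like the following. The synthetic monthly series is made up purely for illustration.

from statsmodels.tsa.holtwinters import ExponentialSmoothing
import numpy as np
import pandas as pd

# Synthetic monthly series with an upward trend and yearly seasonality.
idx = pd.date_range('2015-01-01', periods=60, freq='M')
y = pd.Series(50 + 0.5 * np.arange(60) + 10 * np.sin(2 * np.pi * np.arange(60) / 12), index=idx)

# Additive trend and additive seasonality with a 12-month cycle.
hw = ExponentialSmoothing(y, trend='add', seasonal='add', seasonal_periods=12).fit()
print(hw.forecast(12))   # forecast the next 12 months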

 

Evaluating the Forecast

Hindcasting and adjusting confidence intervals are two good ways to evaluate the quality of the forecast.

A hindcast is one way to verify whether the model is doing a good job. If the observed value doesn't exactly match the predicted value, it does not mean the forecast is all wrong; instead, consider both the amount of variation and the direction of the trend line. Predictions are a matter of probability and estimation, so a predicted value that is close to, but not exactly the same as, the real value can be a better indicator of prediction quality than an exact match. In general, when a model too closely mirrors the values and trends within the input dataset, it might be overfitted, meaning it likely won't provide good predictions on new data.

You are the best judge of how reliable the input data is, and what the real range of possible predictions might be.



from Featured Blog Posts - Data Science Central https://ift.tt/2OEUoej
via Gabe's Musings

Thursday, July 16, 2020

ANOVA vs Regression in One Picture

If you scour the internet for "ANOVA vs Regression", you might be confused by the results. Are they the same? Or aren't they? The answer is that they can be the same procedure, if you set them up that way. But there are differences between the two methods. This one picture sums up those differences.
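For the "same procedure" case, here is a minimal sketch on a made-up three-group dataset: a one-way ANOVA run as a regression on a categorical predictor with statsmodels gives the identical F test that scipy's one-way ANOVA reports.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from scipy import stats

# Hypothetical data: one numeric response, one 3-level factor.
df = pd.DataFrame({
    'group': ['a'] * 4 + ['b'] * 4 + ['c'] * 4,
    'y':     [5, 6, 7, 6, 8, 9, 9, 10, 4, 5, 4, 6],
})

# Regression with a categorical predictor...
model = ols('y ~ C(group)', data=df).fit()
# ...summarized as an ANOVA table: same F statistic and p-value as a one-way ANOVA.
print(sm.stats.anova_lm(model, typ=2))
print(stats.f_oneway(*(g['y'] for _, g in df.groupby('group'))))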

References

ANOVA vs Regression



from Featured Blog Posts - Data Science Central https://ift.tt/32lcN87
via Gabe's Musings