
Tuesday, September 15, 2020

Stumped by Bayes' Theorem? Try This Simple Workaround

Bayes' Theorem formula.

Bayes' Theorem, which The Stanford Encyclopedia of Philosophy calls "...a simple mathematical formula," can be surprisingly difficult to actually solve. If you struggle with Bayesian logic, solving the "simple" formula involves little more than guesswork. You have to translate a problem into "A given B" and "B given A," cross your fingers that your guess for what A and B are is right, double-check your thoughts, get thoroughly lost, and punch the resulting fractions into a calculator. The calculator will spit out an answer that may or may not be correct, as you have no idea what your point-oh-something solution means in terms of the original problem. If this sounds like you, you're not alone: various studies have shown that the vast majority of physicians can't work the formula either.

But there's a more intuitive way to get to the same answer, without the counter-intuitive formula. The procedure in question? None other than the humble probability tree.

How to Use a Tree to Solve Bayes' Formula

This example problem is adapted from a problem in Gigerenzer & Hoffrage's How to Improve Bayesian Reasoning Without Instruction: Frequency Formats.

Out of 1,000 patients, 10 have a rare disease. Eight of those diseased individuals display symptoms. Out of the 990 healthy individuals, 95 display symptoms. What is the probability a patient with symptoms actually has the disease?

Here's the traditional textbook method, using the Bayesian algorithm.
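Plugging in the numbers above (P(disease) = 10/1000, P(symptoms | disease) = 8/10, P(symptoms | no disease) = 95/990), the worked formula is:

$$P(\text{disease}\mid\text{symptoms})=\frac{P(\text{symptoms}\mid\text{disease})\,P(\text{disease})}{P(\text{symptoms}\mid\text{disease})\,P(\text{disease})+P(\text{symptoms}\mid\text{no disease})\,P(\text{no disease})}=\frac{0.8\times 0.01}{0.8\times 0.01+\tfrac{95}{990}\times 0.99}=\frac{0.008}{0.103}\approx 0.078$$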

If you're good with numbers, you may be able to see immediately that you can answer this question with a simple ratio: number of diseased people with symptoms / total number of people with symptoms.

Now let's construct the same answer with a probability tree:

From there, the math is a simple ratio:

Number of people with disease and symptoms (8) / Total number with symptoms (8 + 95)

which gives us:

8 / 103 = 0.078.

Let's try another example (borrowed from Bayes' Theorem Problems):

You want to know a patient’s probability of having liver disease if they are an alcoholic. 10% of patients at a certain clinic have liver disease. Five percent of the clinic’s patients are alcoholics. Out of those patients diagnosed with liver disease, 7% are alcoholics.

Like the first problem, the first branch here is also "disease", but the second branch needs to address "alcoholism" instead of "symptoms". We're not told "how many" patients, so I'll use 1,000--which is usually a sufficient number for problems like this. We're also not told explicitly the number of alcoholics (or the percentage of alcoholics without liver disease), but you can use a little logical deduction:

Out of 1,000 patients, 5% (50 total) are alcoholic,

7% of the 100 patients with liver disease are alcoholic. That gives you 7 (green box), leaving 43 for the orange box.

Now all we have to do is figure out the ratio:

Number of people with disease and alcoholism (7) / Total number with alcoholism (50)

which gives us:

7 / 50 = 0.14

Which is exactly the same answer you would get by actually working the formula. In fact, I've never come across a Bayes' related problem that can't be answered with a probability tree and a little logical reasoning. So if the formula is giving you headaches, just do what I did--and ditch it in favor of a more intuitive approach.
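For readers who want to double-check, a few lines of Python confirm that the tree ratios above match the formula for both problems:

import fractions

# Problem 1: disease given symptoms
p_disease, p_sym_given_d, p_sym_given_not_d = 10/1000, 8/10, 95/990
bayes1 = (p_sym_given_d * p_disease) / (
    p_sym_given_d * p_disease + p_sym_given_not_d * (1 - p_disease))
print(round(bayes1, 3), float(fractions.Fraction(8, 103)))   # ~0.078 both ways

# Problem 2: liver disease given alcoholism
p_liver, p_alcoholic, p_alc_given_liver = 0.10, 0.05, 0.07
bayes2 = (p_alc_given_liver * p_liver) / p_alcoholic
print(bayes2, 7 / 50)                                        # 0.14 both ways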

References

Gigerenzer, G., & Hoffrage, U. (1995). How to Improve Bayesian Reasoning Without Instruction: Frequency Formats. Psychological Review, 102(4), 684–704. www.apa.org/journals/rev/

Gould, S. J. (1992). Bully for brontosaurus: Further reflections in natural history. New York: Penguin Books.

Bayes' Theorem



from Featured Blog Posts - Data Science Central https://ift.tt/35DGBOX
via Gabe's Musings

Saturday, September 12, 2020

4 Steps to Building a Video Search System


As its name suggests, searching for videos by image is the process of retrieving from the repository videos that contain frames similar to the input image. One of the key steps is to turn videos into embeddings, which is to say, extract the key frames and convert their features to vectors. Now, some curious readers might wonder: what is the difference between searching for a video by image and searching for an image by image? In fact, searching for the key frames in videos is equivalent to searching for an image by image.

You can refer to our previous article Milvus x VGG: Building a Content-based Image Retrieval System if interested.

1. System overview

The following diagram illustrates the typical workflow of such a video search system.


When importing videos, we use the OpenCV library to cut each video into frames, extract vectors of the key frames using image feature extraction model VGG, and then insert the extracted vectors (embeddings) into Milvus. We use Minio for storing the original videos and Redis for storing correlations between videos and vectors.

When searching for videos, we use the same VGG model to convert the input image into a feature vector and query Milvus for the most similar vectors. Then, the system retrieves the corresponding videos from Minio, according to the correlations stored in Redis, and displays them on its interface.
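As a rough sketch of the key-frame embedding step described above (an illustration assuming OpenCV and a Keras VGG16 model; the repository's actual code differs):

import cv2
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

model = VGG16(weights="imagenet", include_top=False, pooling="avg")  # 512-dim feature vectors

def frame_embeddings(video_path, every_n=30):
    """Sample every n-th frame of a video and return one VGG feature vector per sampled frame."""
    cap = cv2.VideoCapture(video_path)
    vectors, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % every_n == 0:
            rgb = cv2.cvtColor(cv2.resize(frame, (224, 224)), cv2.COLOR_BGR2RGB)
            x = preprocess_input(np.expand_dims(rgb.astype("float32"), 0))
            vectors.append(model.predict(x, verbose=0)[0])   # embedding to insert into Milvus
        i += 1
    cap.release()
    return vectors

The resulting vectors would then be inserted into Milvus, with the video-to-vector correlations stored in Redis and the original files kept in Minio, as described above.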

2. Data preparation

In this article, we use about 100,000 GIF files from Tumblr as a sample dataset in building an end-to-end solution for searching for video. You can use your own video repositories.

3. Deployment

The code for building the video retrieval system in this article is on GitHub.

Step 1: Build Docker images.

The video retrieval system requires five Docker images: Milvus v0.7.1, Redis, Minio, the front-end interface, and the back-end API. You need to build the front-end interface and back-end API images yourself, while you can pull the other three directly from Docker Hub.

# Get the video search code
$ git clone -b 0.10.0 https://github.com/JackLCL/search-video-demo.git

# Build front-end interface docker and api docker images
$ cd search-video-demo && make all

Step 2: Configure the environment.

Here we use docker-compose.yml to manage the above-mentioned five containers. See the following table for the configuration of docker-compose.yml:


The IP address 192.168.1.38 in the table above is the address of the server we used to build the video retrieval system in this article. You need to update it to your own server address.

You need to manually create storage directories for Milvus, Redis, and Minio, and then add the corresponding paths in docker-compose.yml. In this example, we created the following directories:

/mnt/redis/data /mnt/minio/data /mnt/milvus/db

You can configure Milvus, Redis, and Minio in docker-compose.yml as follows:


Step 3: Start the system.

Use the modified docker-compose.yml to start up the five docker containers to be used in the video retrieval system:

$ docker-compose up -d

Then, you can run docker-compose ps to check whether the five docker containers have started up properly. The following screenshot shows a typical interface after a successful startup.


Now, you have successfully built a video search system, though the database has no videos.

Step 4: Import videos.

In the deploy directory of the system repository lies import_data.py, the script for importing videos. You only need to update the path to the video files and the import interval before running the script.


data_path: The path to the videos to import.

time.sleep(0.5): The interval at which the system imports videos. The server we used to build the video search system has 96 CPU cores, so an interval of 0.5 seconds is recommended. Set the interval to a greater value if your server has fewer CPU cores; otherwise, the import process will put a burden on the CPU and create zombie processes.
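Conceptually, the import loop looks something like the following simplified sketch (hypothetical; the real import_data.py also handles the Minio upload and the Milvus/Redis indexing described earlier):

import os
import time

data_path = "/data/gifs"                 # path to the videos to import (assumption)

for name in sorted(os.listdir(data_path)):
    video_path = os.path.join(data_path, name)
    # upload_and_index(video_path)       # placeholder for the actual Minio/Milvus/Redis calls
    print("imported", video_path)
    time.sleep(0.5)                      # throttle imports; increase on machines with fewer cores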

Run import_data.py to import videos.

$ cd deploy
$ python3 import_data.py

Once the videos are imported, you are all set with your own video search system!

4. Interface display

Open your browser and enter 192.168.1.38:8001 to see the interface of the video search system as shown below.


Toggle the gear switch in the top right to view all videos in the repository.

Click on the upload box on the top left to input a target image. As shown below, the system returns videos containing the most similar frames.

Next, have fun with our video search system!

5. Build your own

In this article, we used Milvus to build a system for searching for videos by images. This exemplifies the application of Milvus in unstructured data processing.

Milvus is compatible with multiple deep learning frameworks, and it makes possible searches in milliseconds for vectors at the scale of billions. Feel free to take Milvus with you to more AI scenarios: https://github.com/milvus-io/milvus.



from Featured Blog Posts - Data Science Central https://ift.tt/2ZAyNd0
via Gabe's Musings

5 Challenges To Be Prepared For Before Scaling Machine Learning Models

Machine Learning (ML) models are designed for defined business goals. ML model productionizing refers to hosting, scaling, and running an ML model on top of relevant datasets. ML models in production also need to be resilient and flexible for future changes and feedback. A recent study by Forrester identifies improving customer experience and improving profitability and revenue growth as the key goals organizations plan to achieve specifically through ML initiatives.

Though gaining worldwide acclaim, ML models are hard to translate into active business gains. A plethora of engineering, data, and business concerns become bottlenecks while handling live data and putting ML models into production. As per our poll, 43% of people said they get roadblocked in ML model production and integration. It is important to ensure that ML models deliver their end objectives as intended by businesses, as their adoption across organizations globally is increasing at an unprecedented rate, thanks to robust and inexpensive open-source infrastructure. Gartner predicts that over 40% of the world’s leading organizations plan to actually deploy AI solutions by the end of 2020. To understand the common pitfalls in productionizing ML models, let’s dive into the top 5 challenges that organizations face.

1. Complexities with Data

One would need about a million relevant records to train an ML model on top of the data. And it cannot be just any data. Data feasibility and predictability risks jump into the picture. Assessing whether we have relevant data sets, and whether we can get them fast enough to make predictions on top of them, isn’t straightforward. Getting contextual data is also a problem. In one of Sigmoid’s ML scaling engagements with Yum! Brands, some of the company’s brands, like KFC (with a new loyalty program), didn’t have enough customer data. Having data isn’t enough either. Most ML teams start with a non-data-lake approach and train ML models on top of their traditional data warehouses. With traditional data systems, data scientists often spend 80% of their time cleaning and managing data rather than training models. A strong governance system and data cataloging are also required so that data is shared transparently and cataloged well enough to be leveraged again. Due to this data complexity, the cost of maintaining and running an ML model relative to the return diminishes over time.

2. Engineering and Deployment

Once the data is available, the infrastructure and technical stacks have to be finalized as per the use case and future resilience. ML systems can be quite difficult to engineer, and a wide breadth of technology is available in the machine learning space. Standardizing different technology stacks in different areas, while choosing each one so that it won’t make productionizing harder, is crucial for the model’s success. For instance, data scientists may use tools like Pandas and code in Python, but these don’t necessarily translate well to a production environment where Spark or PySpark is more desirable. Improperly engineered technical solutions can cost quite a bit. And then the lifecycle challenges of managing and stabilizing multiple models in production can become unwieldy too.



3. Integration Risks

A scalable production environment that is well integrated with different datasets and modeling technologies is crucial for the ML model to be successful. Integrating different teams and operational systems is always challenging. Complicated codebases have to be made into well-structured systems ready to be pushed into production. In the absence of a standardized process for taking a model to production, the team can get stuck at any stage. Workflow automation is necessary for different teams to integrate into the workflow system and test. If the model isn’t tested at the right stage, the entire ecosystem has to be fixed at the end. Technology stacks have to be standardized, or else integration can be a real nightmare. Integration is also a crucial time to make sure that the machine learning experimentation framework isn’t a one-time wonder; otherwise, if the business environment changes or a catastrophic event occurs, the model will cease to provide value.

4. Testing and Model Sustenance

Testing machine learning models is difficult but is as important as, if not more important than, the other steps of the production process. Understanding results, running health checks, monitoring model performance, watching out for data anomalies, and retraining the model together close the entire productionizing cycle. Even after running the tests, a proper machine learning lifecycle management tool might be needed to watch for issues that are invisible in tests.


5. Assigning Roles and Communication

Maintaining transparent communication across data science, data engineering, DevOps, and other relevant teams is pivotal to ML models’ success. But assigning roles, giving detailed access, and monitoring for every team is complex. Strong collaboration and an overdose of communication are essential to identify risk across different areas at an early stage. Keeping data scientists deeply involved can also decide the future of the ML model.

In addition to the above challenges, unforeseen events such as COVID-19 have to be watched out for. When customers’ buying behaviors suddenly change, the solutions from the past cease to apply, and the absence of new data to adequately train models becomes a roadblock. Scaling ML models isn’t easy. Watch out for our next piece on the best practices to productionize ML models at scale.

Watch the full presentation here



from Featured Blog Posts - Data Science Central https://ift.tt/2RpfLBy
via Gabe's Musings

6 Most Important Data Science Skills

Data science is a collective pool of various algorithms, tools, and machine learning principles that work in unison to extract hidden patterns from raw data. It requires a diverse set of skills and demands knowledge spanning mathematics, science, communication, and business. By honing a diverse skill set, data scientists gain the ability to analyze numbers and influence decisions.

The core objective of data scientists lies in bridging the gap between numbers and actions by using information to affect real-world decisions. This demands excellent communication skills, along with an understanding of the difference between data science and big data analysis and the ability to turn both into recommendations for businesses.

DATA VISUALIZATION

Probably a major responsibility of a data scientist is to make data as presentable as possible, so that users get better insights into raw data and can derive the desired information from it. Visualizations are important in the first place because they guide the thought process of the people viewing them for further analysis. They are used to create impactful data stories that communicate an entire set of information in a systematic format, so that audiences are able to extract meaning from it and detect problem areas in order to propose solutions.

Without data visualization tools, it would be practically impossible to implement change or cater to the desired problems. Today, there are many data visualization tools to select from. In most programming languages, you’ll find libraries that enable visualization of data. In JavaScript, data can be visualized using the D3.js visualization library; Python uses Matplotlib and pandas, while R offers many data visualization tools, including ggplot2.

Tableau is the most trending, high-level platform that offers amazing data visualization options extracting data from many different sources.
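As a small illustration, here is a minimal (hypothetical) pandas/Matplotlib example of the kind of chart such libraries produce:

import pandas as pd
import matplotlib.pyplot as plt

# Small, made-up dataset: monthly sales by region
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "north": [120, 135, 160, 150],
    "south": [80, 95, 90, 110],
})
df.plot(x="month", kind="bar")          # pandas delegates the drawing to Matplotlib
plt.ylabel("Sales (units)")
plt.title("Monthly sales by region")
plt.show()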

DATA WRANGLING

Often the data comes from a variety of sources and needs remodelling before informational insights can be derived. It is important to make the data free from imperfections such as inconsistent formatting, missing values, etc. Data wrangling allows you to bring the data to a uniform level that can be processed further easily. Obviously, for data scientists to use data to its best, it is important to possess the knowledge of organizing clean data from unmanageable raw data.
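For example, a short (hypothetical) pandas sketch of the kind of cleanup described above:

import pandas as pd

# Made-up records with typical imperfections: inconsistent formatting, missing values, duplicates
raw = pd.DataFrame({
    "customer": ["Alice", "alice ", "Bob", "Bob"],
    "amount": ["100", "100", None, "250"],
})
clean = raw.copy()
clean["customer"] = clean["customer"].str.strip().str.title()   # fix inconsistent formatting
clean["amount"] = pd.to_numeric(clean["amount"])                 # unify types
clean = clean.drop_duplicates().dropna(subset=["amount"])        # remove duplicates and missing values
print(clean)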

PROGRAMMING LANGUAGES & SOFTWARE

Data scientists deal with raw data that comes from a variety of sources and in different formats. Such data is filled with misspellings, duplications, misinformation, and incorrect formats that can mislead your results. To correctly present the data, it is important to extract the data, clean it, analyze it, and visualize it. Below are five broadly used tools that are strongly recommended for data scientists:

  1. R: R is a programming language that is widely used for data visualization, statistical analysis, and predictive modelling. It has been around for many years and has contributed largely to data analysis through its huge package network (CRAN), which provides a complete toolkit that allows analysts to perform various data-related tasks.
  2. Python: Python initially was not looked upon as a data analytics tool. The pandas Python library enables vectorized processing operations and efficient data storage. This high-level programming language is fast, user-friendly, easy to learn, and powerful. It has been used for general programming purposes for a long time now and therefore allows easy merging of general-purpose code and Python data processing.
  3. Tableau: Having lately emerged as an amazing data visualization tool, Tableau, from a Seattle-based software company, offers an exclusive suite of high-end products that surpass data science resources such as R and Python for visualization. Although Tableau lacks the ultimate efficiency in reshaping and cleaning data and doesn’t provide options for procedural computations or offline algorithms, it is increasingly becoming a popular tool for data analysis and visualization due to its highly interactive interface and efficiency in creating beautiful, dynamic dashboards.
  4. SQL: Structured Query Language (SQL) is a special-purpose programming language for extracting and curating data held in relational database management systems. SQL allows users to write queries and insert, update, modify, and delete data. Though all of this can also be done using R and Python, writing SQL code often yields more efficient output and provides reproducible scripts.
  5. Hadoop: Hadoop, an open-source software framework, fosters distributed processing of large data sets across clusters of computers using simple programming models. Hadoop is largely used in industry due to its immense computing power, fault tolerance, flexibility, and scalability. It enables programming models such as MapReduce that allow processing of vast amounts of data.

STATISTICS

Though there are many automated statistical tests embedded within software, a data scientist needs to possess a rational statistical sensibility to apply the most relevant test for performing result-oriented interpretations. A solid knowledge of linear algebra and multivariable calculus assist data scientists in building analysis routines as needed.

Data scientists are expected to understand linear regression and exponential and logarithmic relationships, while also knowing how to use complex techniques such as neural networks. Most statistical functions are done by computers in minutes; however, understanding the basics is essential in order to extract their full potential. A major task of data scientists lies in deriving the desired output from computers, and this can be done by posing the right questions and learning how to make computers answer them. Computer science is backed in many ways by mathematics, and therefore data scientists need a clear understanding of mathematical functions to be able to efficiently write code that makes computers do their job perfectly.

ARTIFICIAL INTELLIGENCE & MACHINE LEARNING

AI is one of the most trending topics today. It empowers machines by providing intelligence in the real sense, minimizing manual intervention to extreme levels. Machine learning works on algorithms that automatically learn rules from data and is largely used in search engine optimization, data mining, medical diagnosis, market analysis, and many other areas. Understanding the concepts of AI and machine learning plays a vital role in learning industry needs and is therefore at the forefront of the data science skills a data scientist must possess.

MICROSOFT EXCEL

Even before any of the modern data analysis tools existed, MS Excel was there. It is probably one of the oldest and most popular data tools.

Although there are now multiple options to replace MS Excel, it has been proven that Excel offers some really surprising benefits over others. It allows you to name and create ranges, sort/filter/manage data, create pivot charts, clean data, and look up certain data among millions of records. So, even though you might feel that MS Excel is outdated, let me tell you it is absolutely not. Non-technical people still prefer using Excel as their only tool for storing and managing data. It is an important prerequisite for data scientists to have an in-depth understanding of Microsoft Excel to be able to connect to the data source and efficiently pick data in the desired format.



from Featured Blog Posts - Data Science Central https://ift.tt/32mPMRQ
via Gabe's Musings

Studying Risk-based Algomorphology Using Thunderbird Charts

I recently wrote about a stock trading approach that I call skipjack: it allows the user to trade directly from special charts by exploiting skipjack waveforms and formations.  I mentioned that I use an application - a simulation environment - to trade from these charts.  Below I present a screenshot of SimTactics, the application that I created to support this simulated trading.  To the right is a notepad containing four trades generated by an autopilot feature of SimTactics.  Over the time period, the Nasdaq Composite increased 41.72 percent.  SimTactics squeezed out 61.44 percent before closing out its position at the end of the data file.  SimTactics makes use of a risk-based index constructed from the same waveforms that support skipjack stock trading.

The main problem with manual stock trading is that it requires a lot of effort.  Because a human is involved, the results might be inconsistent.  Also, markets move quickly - perhaps faster than the skipjack trading model can handle well.  I have shown however how manual trading can sometimes deliver relatively high returns over short time periods.  It is a perspective worth maintaining.  But in this blog, I will be introducing the use of Thunderbird Charts to study algomorphology.  This technique supports a different type of trading.

SimTactics uses combinations of buy-and-sell percentages to respond to a "trigger package."  This package is made up of the Levi-Tate group of technical metrics that I designed for skipjack.  For example, a snap buy-and-sell combination of 23/24 means that SimTactics buys at a risk level of 23 percent or below and sells at 24 percent or above.  Now, if a matrix is created to study many possible combinations, the result is a Thunderbird Chart, as shown below.  This chart contains data from the year prior to the stock market crash of October 1987 for the Dow Jones Industrial Average.
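To make the combination idea concrete, here is a hypothetical sketch (synthetic data, not SimTactics code or its actual risk index) of scanning a grid of buy/sell risk thresholds:

import numpy as np

rng = np.random.default_rng(0)
n = 250
price = 100 * np.cumprod(1 + rng.normal(0.0005, 0.01, n))   # synthetic price path
risk = rng.uniform(0, 100, n)                                # synthetic risk index (0-100 percent)

def simulate(buy_at, sell_at):
    """Buy when risk <= buy_at, sell when risk >= sell_at; return the total return."""
    cash, shares = 1.0, 0.0
    for p, r in zip(price, risk):
        if shares == 0 and r <= buy_at:
            shares, cash = cash / p, 0.0
        elif shares > 0 and r >= sell_at:
            cash, shares = shares * p, 0.0
    return cash + shares * price[-1] - 1.0

# The grid of buy/sell combinations is the matrix behind a Thunderbird-style chart
grid = np.array([[simulate(b, s) for s in range(5, 100, 5)] for b in range(5, 100, 5)])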

Any combination in the deep purple area would have resulted in a 30 to 40 percent return.  But wait, what about the crash?  Indeed, let us step back and consider the period before, during, and after the crash.  Suddenly, the returns are much lower.  This only tells us that the best place to be during a crash is not in the stock market.  However, it is no simple matter to simply sit on large amounts of cash, particularly for portfolio managers.  Even for retail investors, it is unclear precisely how long to avoid the market.  Avoidance also represents loss of opportunity.  The light green area below experienced returns from 20 to 40 percent - significantly above the market return of 5.44 percent during the same period.

There is some level of overlap in terms of optimal positioning.  The chart below shows that best-performing combinations occupied a thin band on the Thunderbird Chart.  It is a curious sweet spot well worth further investigation.

If it is possible to buy a market-indexed ETF rather than individual stock, personally I would avoid the uncertainty posed by selection.  However, everyone is different.  The stock below increased about 11.79 percent.  The chance of beating the market is reasonably good within the deep purple area - keeping in mind that I can refer to the spreadsheet grid.  Then there is quite literally a sweet spot in an unorthodox location that I myself would never deliberately choose.  Why?  Well, the optimum - or maybe I should call it optimus, after the transformer - is in motion.  The spot will likely move in the future.  At the same time, the whole idea of studying algomorphology is to ascertain what underlying phenomena bring about the quantitative outcomes.

So far, given these charts, I suspect readers have a sense of "normal" versus "abnormal" tactical placement.  But I will be studying the point in greater detail - examining different investments under changing market conditions and cycles.  I find this an interesting way to think about investments and algorithms.



from Featured Blog Posts - Data Science Central https://ift.tt/2FqGzih
via Gabe's Musings

What is the connection between AI, Cloud-Native and Edge devices?

 

I was asked this question: What is the connection between AI, Cloud-Native and Edge devices?

  

On first impressions, it sounds like an amalgamation of every conceivable buzzword around - but I think there is a coherent answer which points to a business need. 

 

Let us start with the term ‘Cloud Native.’

 

Cloud-native computing is an approach in software development that utilizes cloud computing technologies such as

  • Containers
  • Microservices
  • Continuous Delivery
  • DevOps

 

Using Cloud Native technologies, we can create loosely coupled systems that are scalable and resilient.

 

In practice, this means

a) The system is built as a set of microservices that run in Docker containers

b) The containers may be orchestrated via Kubernetes

c) The deployment of the containers is managed through a CI/CD process

 

In itself, this approach is valuable and follows a stack that is rapidly emerging at the Enterprise level. 

 

But how does it tie to Edge devices?

  1. Docker allows you to create a single packaged deployment through a container, which creates a virtualized environment at the target device. AI models are trained in the cloud and deployed on edge devices. The docker/ cloud-native format enables you to run AI in containers across various environments, including at the Edge. The container-based architecture is especially relevant for AI on edge devices because of the diversity of devices.
  2. Secondly, AI models need to be refreshed and deployed frequently – including on edge devices. For this reason, also, the cloud-native and container architecture helps.

 

Welcome thoughts and comments

Image source: Cloud Native Definition

 



from Featured Blog Posts - Data Science Central https://ift.tt/3isVCqh
via Gabe's Musings

Thursday, September 3, 2020

Free online book - Machine Learning from Scratch

Hi all,

I'm writing to share a book I just published that I think many of you might find interesting or useful. 

The book is called "Machine Learning from Scratch." It provides complete derivations of the most common algorithms in ML (OLS, logistic regression, naive Bayes, trees, boosting, neural nets, etc.) both in theory and math. It also demonstrates constructions of each of these methods from scratch in Python using only numpy.

My aim with the book is to provide a very thorough rundown of the fitting process behind the algorithms we see every day. I hope that seeing the models derived in math or constructed in code helps readers understand the models at a deeper level and feel more comfortable optimizing them for their own work.

Any comments or questions would be very much appreciated either on this post, on the book's github, or to me directly at dafrdman@gmail.com. 

The book is available here

What this Book Covers

This book covers the building blocks of the most common methods in machine learning. This set of methods is like a toolbox for machine learning engineers. Those entering the field of machine learning should feel comfortable with this toolbox so they have the right tool for a variety of tasks. Each chapter in this book corresponds to a single machine learning method or group of methods. In other words, each chapter focuses on a single tool within the ML toolbox.

In my experience, the best way to become comfortable with these methods is to see them derived from scratch, both in theory and in code. The purpose of this book is to provide those derivations. Each chapter is broken into three sections. The concept sections introduce the methods conceptually and derive their results mathematically. The construction sections show how to construct the methods from scratch using Python. The implementation sections demonstrate how to apply the methods using packages in Python like scikit-learn, statsmodels, and tensorflow.

Why this Book

There are many great books on machine learning written by more knowledgeable authors and covering a broader range of topics. In particular, I would suggest An Introduction to Statistical Learning, Elements of Statistical Learning, and Pattern Recognition and Machine Learning, all of which are available online for free.

While those books provide a conceptual overview of machine learning and the theory behind its methods, this book focuses on the bare bones of machine learning algorithms. Its main purpose is to provide readers with the ability to construct these algorithms independently. Continuing the toolbox analogy, this book is intended as a user guide: it is not designed to teach users broad practices of the field but rather how each tool works at a micro level.

Who this Book is for

This book is for readers looking to learn new machine learning algorithms or understand algorithms at a deeper level. Specifically, it is intended for readers interested in seeing machine learning algorithms derived from start to finish. Seeing these derivations might help a reader previously unfamiliar with common algorithms understand how they work intuitively. Or, seeing these derivations might help a reader experienced in modeling understand how different algorithms create the models they do and the advantages and disadvantages of each one.

This book will be most helpful for those with practice in basic modeling. It does not review best practices—such as feature engineering or balancing response variables—or discuss in depth when certain models are more appropriate than others. Instead, it focuses on the elements of those models.

What Readers Should Know

The concept sections of this book primarily require knowledge of calculus, though some require an understanding of probability (think maximum likelihood and Bayes’ Rule) and basic linear algebra (think matrix operations and dot products). The appendix reviews the math and probability needed to understand this book. The concept sections also reference a few common machine learning methods, which are introduced in the appendix as well. The concept sections do not require any knowledge of programming.

The construction and code sections of this book use some basic Python. The construction sections require understanding of the corresponding concept sections and familiarity with creating functions and classes in Python. The code sections require neither.



from Featured Blog Posts - Data Science Central https://ift.tt/3lQAInh
via Gabe's Musings

Sunday, August 30, 2020

Dynamism of IoT-powered Gas Monitoring Solution

IoT is a stay-ahead technology that is providing an effective paradigm shift in multiple industries, including the oil and gas sector. With the help of real-time monitoring and advanced analytics features, this disruptive technology is allowing the O&G companies to effectively monitor their assets ranging from pipeline networks to machines and pumping equipment.

Currently, technology investment in the gas sector depends heavily on mobility, asset management, cloud adoption, and analytics. It is further targeted at reducing infrastructure costs, a goal greatly supported by the usage of IoT. The Internet of Things is more focused on generating data efficiencies and more informed decision-making.

Along with improving operational and asset efficiency, IoT holds great potential to increase output in some cases. For instance, Intel has claimed that IoT-powered infrastructure and data analytics procedures used in gas wells can improve results by 30%. Also, McKinsey estimated that offshore platforms are performing at only 77% of maximum production levels.

Hence, if applied accurately, IoT and advanced analytics tools allow the gas companies to increase their potential in collecting massive data and gain valuable insights for the future. Moreover, it can generate a greater ROI of up to 30 to 50 times the actual investment in a very short deployment period.

The Need for An IoT-powered Gas Monitoring Solution

A large variety of applications and processes in the gas industry use highly dangerous flammable and toxic gases. The inevitable occasional escape of such gases creates a hazardous environment for workers and even nearby residents. This can result in devastating incidents involving asphyxiation, unwanted explosions, and loss of life.

In most industries, one of the safety plans to avoid such situations is to implement a gas monitoring solution as an early warning system where it is easier and safer to pre-detect the toxicity in the atmosphere.

It Serves Multiple Areas

Apart from huge oil and gas refineries and plants, a gas monitoring solution can also be used in commercial areas like parking lots, laboratories, sewage treatment plants, hospitals, and swimming pools. Such areas are prone to high risk due to heavy machinery and plant operations. A slight gas leak could lead to major havoc and cost human lives if the necessary action is not taken. Let's find out how this system works.

Workability

  • Real-time Monitoring
  • End-to-end Solution
  • Control Visualization
  • Immediate Alerts

These are the key factors on which a gas monitoring solution relies. The sensors installed on the assets allow you to sense the presence of gas concentrations in the atmosphere at all times and from any location. This is a completely tailored solution that is enabled with all the necessary hardware and software capabilities to simplify your purchases. Moreover, it is equipped with advanced analytics to optimize and keep control of industrial operations. Along with full control, this solution enables remote monitoring to help identify the presence of toxic gases in remotely located infrastructures. After detecting the presence of gases, it immediately alerts the authorities to stay prepared with the necessary measures.

A gas monitoring solution comprises the latest sensor devices that automatically fetch data from the assets and allow the authorities to predict necessary results for improved facility maintenance. A centralized dashboard is a versatile management desk for managers to not only supervise environmental conditions but also perform administrative and managerial tasks for the plant facility.

Here are some of the benefits that you can avail by using a gas monitoring solution at your premises.

Benefits of Implementing IoT in the Gas Industry

Remote Monitoring

                How convenient is it to be able to monitor the industrial facility from anywhere and at any time? This is what IoT provides to the industries. It allows effective remote monitoring of the industrial equipment or assets through sensor devices. In remotely located oil and gas infrastructure, toxic gases are produced in a high ratio. The managers use remote monitoring to find out the presence of these gases in those infrastructures and fetch real-time data to take immediate actions in case there are hints of any mishap occurring. Though it is riskier for the plant authorities to manually go and supervise the infrastructure, they use remote monitoring through IoT as a significant alternative to avoid the chances of gas explosions. This, in turn, also saves a huge amount of money spent in constructing the entire framework for oil and gas production.

Predictive/Preventive Maintenance

                Many O&G companies require real-time monitoring of the machinery to keep a regular check on their condition and performance. Also, detecting harmful and toxic gases within the industrial facility is of major concern to avoid disasters. This is significantly possible with the help of an IoT-powered gas monitoring solution that offers remote services to the authorities. This allows the facilities to react immediately via predictive maintenance. The solution utilizes advanced capabilities of the sensors which are installed on the industrial equipment to identify the presence of harmful gases. With the help of sensors, a quick alert is generated that helps the workers and any other authorities evacuate the premises in case of any disaster. Also, these sensors send valuable data on a cloud-based platform for the authorities to predict future scenarios and avoid chaotic situations. Using predictive maintenance, the managers can analyze different scenarios and bring forth necessary outcomes to work upon. 

Asset Management

                An IoT-powered gas monitoring solution leverages the use of analytics and wireless connectivity to offer an improved and consolidated asset management process. It is an end-to-end solution that provides remote monitoring of the industrial assets to avoid explosive disasters within the facilities. Apart from smart asset management it also provides useful insights to keep the equipment up-to-date. Moreover, a gas monitoring solution helps the plant authorities to operate the assets with the help of predictive maintenance, making the automated workflows more intelligent. It helps provide real-time visibility of the assets and their performance history for better analysis of their working conditions.

Implementing an IoT-powered solution to detect harmful gases is the most productive asset to your industry. Hence, IoT is giving us major goals to implement technology for our safety purposes. Its smart techniques and innovative concepts not only help you live in safer surroundings but also provide automation to resolve task complexities. Who would have known that it’ll be possible to measure accurate gas concentrations in the air even in huge refineries where the risk rate is 99.99%? But IoT is significantly making it possible and enabling smart plant management with reduced life risk.



from Featured Blog Posts - Data Science Central https://ift.tt/3b78Whi
via Gabe's Musings

Skipjack Stock Trading Environment

While the skipjack tuna is a reasonably attractive fish, it is perhaps less so if its colours are inverted as in the case below.  Later in the blog I will be introducing the "Destroying Angel Formation."  This can be found during a full-spectrum inversal of the skipjack waveforms, which might occur before a major stock-market correction or crash.

Skipjack as a form of technical analysis is primarily about math and geometry.  It is not a psychological approach.  Unlike studying Japanese candlesticks, I make little or no effort to guess or assess the state-of-mind of investors.  I would say that it is meant to exploit market tendency - this being the predisposition to form skipjacks.  In terms of making this determination that a market has tended and will tend to do anything, it is necessary to study its history and then to some extent assume that it will persist.  This represents both the strength and weakness of skipjack because of course it is impossible to be completely certain. 

I introduced the skipjack in my previous blog.  Since there seems to be some interest in how the trades are made and the rationale behind the trading, I am providing a bit of an elaboration here.  I have yet to determine the full extent and pace to which I will be sharing information about the methodology.

The underlying skipjack shape or object is the skipjack waveform as presented below.  Within this waveform, the value of a stock might increase perhaps 6 percent over several days or weeks.  I know this is not saying much these days given that stocks sometimes increase more than 10 percent on a single day!  Well, if I ever develop a model that can predict a 10 percent jump in a single day, that would certainly be an accomplishment.  So no, using the skipjack is not nearly as lucrative as simply being at the right place and time - e.g. to catch a sudden spike.

Generally speaking in order for a skipjack to work, its "action line" (on the chart) has to enjoy fluctuating near the water line or neutral.  In some markets - for example during a euphoric, non-correcting rise - the action does not do this, and it is necessary to make use of a slightly different technique.  If an investor is game, a buy-and-hold strategy might be more efficient.  This being said, if an investor holds precisely during this time, there is an elevated chance of holding during a major market correction.

For a period of time as the action line forms the waveform shown above - again, assuming it completes the form - the "effect line" begins to become highly correlated with the price of the stock.  Since the effect line tends to have an entry point near the base (the paa) and an exit near the head (the ulo), the hope here is that the investor will hold the stock during the rise.  This is the general concept.  However, I consider the exact details important - e.g. the exact entry and exit points.

I took out my ruler and physically measured the distance between the hypotenuses (in mm) and the change in value (in dollars) of the highest and lowest points of the effect line; this results in the gradient shown below.  The correlation during the rise during this particular waveform by the way was more than 0.99.  I know readers might be thinking, an investor practically needs a drafting table to trade this way.  Well, I suppose software can do a fair amount of the work especially if it is designed to use the "triggers" for the algorithm.  But I prefer the lived experience of using my hands to handle a ruler, protractor, and mechanical pencils - and then gazing out at the graphical outcomes of the data while sipping coffee.

Below in blue just using a quick visual inspection, I point out some skipjacks on the Nasdaq Composite.  The interesting thing about this sample data is that it contains a horrific stock market crash - the first I personally encountered - Black Monday or the Crash of October 1987.  I bet the crash sticks out even using this waveform.  See it?

Like other forms of technical analysis, skipjack makes use of formations.  Understandably, since the trading methodology was inspired by my fear of stock-market crashes, the first formation, called a Full Spectrum Inversal or the Destroying Angel, is of a stock market crash.  There is no healthy-looking skipjack anywhere near the Destroying Angel below.  There is a rather sickly-looking skipjack at the far left, which if played likely would have caused the trader to exit at around October 6 - the "crash" being often associated with October 19.  Between October 6 and 19, I personally would have run for the hills given the technical developments.  However, as I mentioned earlier, the skipjack was so off-formation, I likely would not have played it at all.

I hope this blog has explained the general trading technique using skipjacks reasonably well.  As I take this opportunity to map out the scenery and develop different formations, I admit that nothing is nearly as enjoyable as having all the time in the world and not trying to publish a book or sell investment products.  For me for the most part, this is research.  Of course, having some protection against a major correction or stock-market crash doesn't hurt.



from Featured Blog Posts - Data Science Central https://ift.tt/34Pi74K
via Gabe's Musings

Saturday, August 22, 2020

It's tempting to think that GPT-3 will solve all NLP problems but it does not

In my previous blog what is driving the innovation in nlp and gpt3 , I talked about how GPT3 has evolved from the basic transformer architecture.

Based on that blog, a start-up approached me saying that they had an idea which they felt could only be implemented by GPT3.

They were eagerly waiting to be approved (isn’t everybody - he he!)

Apart from waiting for GPT-3, there was another critical flaw in their argument:

Their idea was not generative, i.e. it did not need GPT-3 in the first place (or, for that matter, any similar architecture).

It’s tempting to think that GPT-3 will solve all the NLP problems... but it does not.

Let me explain what I mean by this.

Below is the basic flow of NLP services and a listing of NLP applications

NLP services include:

  • Text Summarization
  • Text Generation
  • Chatbots
  • Machine Translation
  • Text to Speech
  • Text Classification
  • Sentence Similarity
  • Finding similar sentences

 

Image source – Dr Amita Kapoor

While many of these are generative- not all of them are.

GPT-3 and transformer-based applications basically address the generative elements of NLP.

That still leaves a large number of other applications which use NLP but are not generative (for example Text classification or Text summarization).

 

You can also look at the same situation from the perspective of word embeddings. Word embeddings are a type of word representation that allows words with similar meaning to have a similar representation.

Historically, word2vec and GloVe have worked well for word embeddings, but these were shallow approaches. Transformers solve this problem by providing functionality similar to what we see in transfer learning for CNNs (so that not all layers need to be trained if you use a pre-trained model).
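As a toy illustration of "similar meaning, similar representation" (made-up 4-dimensional vectors; real embeddings are learned and have hundreds of dimensions):

import numpy as np

emb = {
    "king":  np.array([0.90, 0.80, 0.10, 0.30]),
    "queen": np.array([0.88, 0.82, 0.12, 0.35]),
    "car":   np.array([0.10, 0.20, 0.90, 0.70]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["king"], emb["queen"]))  # high: similar meaning, similar vector
print(cosine(emb["king"], emb["car"]))    # lower: dissimilar meaning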

 

To conclude

Hence, we can say that GPT-3 is very interesting and will continue to be so.

However, there will always be a subset of NLP applications that will not be covered by any of the transformer-based approaches, because they are not generative.

 



from Featured Blog Posts - Data Science Central https://ift.tt/2QjO393
via Gabe's Musings

Alternative to the Arithmetic, Geometric, and Harmonic Means

Given n observations x1, ..., xn, the generalized mean (also called power mean) is defined as 
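$$M_p = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^p\right)^{1/p}$$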

The case p = 1 corresponds to the traditional arithmetic mean, while p = 0 yields the geometric mean, and p = -1 yields the harmonic mean. See here for details. This metric is favored by statisticians. It is a particular case of the quasi-arithmetic mean

Here I introduce another kind of mean called exponential mean, also based on a parameter p, that may have an appeal to data scientists and machine learning professionals. It is also a special case of the quasi-arithmetic mean. Though the concept is basic, there is very little if any literature about it. It is related to the LogSumExp and the Log semiring. It is defined as follows:
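$$m_p = \log_p\left(\frac{1}{n}\sum_{i=1}^{n} p^{x_i}\right)$$

i.e. the base-p logarithm of the average of the values p^xi (this form is consistent with the limiting behaviour described below).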

Here the logarithm is in base p, with p positive. When p tends to 0, mp is the minimum of the observations. When p tends to 1, it yields the classic arithmetic mean, and as p tends to infinity, it yields the maximum of the observations. 

I tested both means (exponential and power means) for various values of p ranging between 0 and 2. See above chart, where the X-axis represents the parameter p, and the Y-axis represents the mean. The test data set consists of 10 numbers randomly chosen between 0 and 1, with an average value of 0.53. Note that if p = 1, then mp = Mp = 0.53 is the standard arithmetic mean. 
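A minimal Python sketch reproducing this comparison (using the definitions above; the random numbers here are not the article's exact data):

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 10)            # 10 observations between 0 and 1

def power_mean(x, p):
    return (np.mean(x ** p)) ** (1 / p)

def exponential_mean(x, p):
    # base-p logarithm of the average of p**x_i (definition sketched above)
    return np.log(np.mean(p ** x)) / np.log(p)

for p in [0.01, 0.5, 0.99, 1.01, 2, 10, 100]:   # p = 1 itself is excluded (log base 1 is undefined)
    print(p, power_mean(x, p), exponential_mean(x, p))
# Both approach the arithmetic mean near p = 1; the exponential mean approaches
# min(x) as p tends to 0 and max(x) as p tends to infinity.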

The blue curve in the above chart is very well approximated by a logarithm function, except when p is very close to zero or p is extremely large. The red curve is well approximated by a second-degree polynomial. Convergence to the maximum of the observations (equal to 0.89 here), as p tends to infinity, occurs much faster with the power mean than with the exponential mean. Note that the minimum is 0.07, and the exponential mean will start approaching that value only when p is extremely close to zero.

Finally, the central limit theorem applies to both the power and exponential means as the number n of observations becomes larger and larger.



from Featured Blog Posts - Data Science Central https://ift.tt/3gpOwky
via Gabe's Musings

Tuesday, August 18, 2020

What is driving the innovation in NLP and GPT-3?

2019 and 2020 have seen rapid strides in NLP

What’s driving the rapid strides in NLP and will this trend continue?

Here is a simple way to explain the rise and rise of NLP

Today, GPT-3 is displaying some amazing results. Some call it more like AGI (Artificial General Intelligence). Created by OpenAI with a large investment from Microsoft, GPT stands for Generative Pretrained Transformer

The three words offer a clue to the success and future trajectory of NLP

  • Let’s start with ‘Transformer’. Introduced in 2017, the Transformer is a deep learning model designed for NLP. Like recurrent neural networks (RNNs), Transformers handle sequential data. However, unlike RNNs, due to the attention mechanism, Transformers do not require that the data be processed in a sequential manner. This allows for much more parallelization in Transformers (in comparison to RNNs).  In turn, parallelization during training allows for training on larger datasets. 
  • This in turn has led to the second benefit of transformers i.e. the possibility of pre-trained models. This is similar to Transfer learning in CNNs and it allows you to build  more complex models on top of existing models. The earliest example of this is BERT (Bidirectional Encoder Representations from Transformers). BERT itself led to other models trained in specific domains such as BioBERT: a pre-trained biomedical language representation model for biomedical text mining
  • Finally, the model is Generative. GPT-3 is the best example of this. GPT-3 is a transformer-based model trained on 45 TB of text data with 175 billion parameters. The generative ability of GPT-3 is magical - producing everything from SQL queries to basic UI.

Conclusions

The Transformer mechanism is the main innovation driving NLP. Transformers enable new models to be built on the foundations of other models (like transfer learning does for CNNs). As the ability to train on larger corpora grows, transformer-based models like GPT will become more ‘magical’.

With contributions from Vineet Jaiswal

Image source: OpenAI   



from Featured Blog Posts - Data Science Central https://ift.tt/3iUxb4O
via Gabe's Musings

Monday, August 17, 2020

Summarizing Most Popular Text-to-Image Synthesis methods with Python

Comparative Study of Different Adversarial Text to Image Methods

Introduction

Automatic synthesis of realistic images from text has become popular with deep convolutional and recurrent neural network architectures to aid in learning discriminative text feature representations.

Although the discriminative power and strong generalization properties of attribute representations are attractive, building them is a complex process that requires domain-specific knowledge. Over the years the techniques have evolved as adversarial networks in the machine learning space continue to evolve.

In comparison, natural language offers an easy, general, and flexible way to identify and describe objects across multiple domains by means of visual categories. The best approach is to combine the generality of text descriptions with the discriminative power of attributes.

This blog addresses different text to image synthesis algorithms using GAN (Generative Adversarial Network) that aims to directly map words and characters to image pixels with natural language representation and image synthesis techniques.

The featured algorithms learn a text feature representation that captures the important visual details and then use these features to synthesize a compelling image that a human might mistake for real.

1. Generative Adversarial Text to Image Synthesis

  • This image synthesis mechanism uses deep convolutional and recurrent text encoders to learn a correspondence function with images by conditioning the model conditions on text descriptions instead of class labels.
  • An effective approach that enables text-based image synthesis using a character-level text encoder and class-conditional GAN. The purpose of the GAN is to view (text, image) pairs as joint observations and train the discriminator to judge pairs as real or fake.
  • Equipped with a manifold interpolation regularizer (regularization procedure which encourages interpolated outputs to appear more realistic) for the GAN generator that significantly improves the quality of generated samples.
  • Both the generator network G and the discriminator network D are trained to enable feed-forward inference by conditioning only on textual features.


Source, LICENSE- Apache 2.0

  • Discriminator D has several layers of stride-2 convolution with spatial batch normalization followed by leaky ReLU.
  • The GAN is trained in mini-batches with SGD (Stochastic Gradient Descent).
  • In addition to the real/fake inputs to the discriminator during training, it is also fed with the third type of input consisting of real images with mismatched text, which aids the discriminator to score it as fake.
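A toy sketch of that matching-aware discriminator objective (stand-in tensors and a throwaway network, not the paper's actual architecture):

import torch
import torch.nn as nn

class ToyDiscriminator(nn.Module):
    """Scores an (image feature, text embedding) pair as real-and-matching (1) or not (0)."""
    def __init__(self, img_dim=64, txt_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 64), nn.LeakyReLU(0.2),
            nn.Linear(64, 1), nn.Sigmoid())
    def forward(self, img, txt):
        return self.net(torch.cat([img, txt], dim=1))

D = ToyDiscriminator()
bce = nn.BCELoss()
real_img, fake_img = torch.randn(8, 64), torch.randn(8, 64)   # stand-ins for image features
txt, mismatched_txt = torch.randn(8, 32), torch.randn(8, 32)  # stand-ins for text embeddings
ones, zeros = torch.ones(8, 1), torch.zeros(8, 1)

# real + matching text -> 1; fake + matching text -> 0; real + mismatched text -> 0
d_loss = bce(D(real_img, txt), ones) \
       + 0.5 * (bce(D(fake_img, txt), zeros) + bce(D(real_img, mismatched_txt), zeros))
d_loss.backward()   # one mini-batch SGD step would follow, as described above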

The below figure illustrates text to image generation samples of different types of birds.


Source — (Open Source Apache 2.0 License)

Library and Usage

git clone https://github.com/zsdonghao/text-to-image.git  [TensorFlow 1.0+, TensorLayer 1.4+, NLTK: for tokenizer]
python downloads.py     [download Oxford-102 flower dataset and caption files (run this first)]
python data_loader.py   [load data for further processing]
python train_txt2im.py  [train a text to image model]
python utils.py         [helper functions]
python models.py        [models]

2. Multi-Scale Gradient GAN for Stable Image Synthesis

Multi-Scale Gradient Generative Adversarial Network (MSG-GAN) is responsible for handling instability in gradients passing from the discriminator to the generator that become uninformative, due to a learning imbalance during training. It uses an effective technique that allows the flow of gradients from the discriminator to the generator at multiple scales helping to generate synchronized multi-scale images.

  • The discriminator not only looks at the final output (highest resolution) of the generator but also at the outputs of the intermediate layers as illustrated in the below figure. As a result, the discriminator becomes a function of multiple scale outputs of the generator (by using concatenation operations) and importantly, passes gradients to all the scales simultaneously.


The architecture of MSG-GAN for generating synchronized multi-scale images. Source — (Open Source MIT License)

  • MSG-GAN is robust to changes in the learning rate and shows a more consistent improvement in image quality compared to progressive growing (ProGAN).
  • MSG-GAN shows the same convergence behavior and consistency across all resolutions, and images generated at higher resolutions preserve the symmetry of certain features, such as the same color for both eyes or earrings in both ears. The training phase also allows a better understanding of image properties (e.g., quality and diversity).
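
Below is a minimal PyTorch-style sketch of the multi-scale idea: the generator returns images at several resolutions and the discriminator consumes all of them, so gradients reach every scale at once. G and D here are hypothetical stand-ins, not the official MSG-GAN code.

import torch
import torch.nn.functional as F

def msg_discriminator_scores(G, D, real_images, latent):
    # hypothetical G returns a list of images at increasing resolutions (e.g. 4x4 ... full size)
    fake_pyramid = G(latent)

    # build a matching pyramid of the real image by downsampling to each scale
    real_pyramid = [F.interpolate(real_images, size=img.shape[-2:]) for img in fake_pyramid]

    # hypothetical D accepts the whole pyramid and concatenates each scale with the
    # activations of the corresponding block, so gradients flow to all generator scales
    real_score = D(real_pyramid)
    fake_score = D([img.detach() for img in fake_pyramid])
    return real_score, fake_score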

Library and Usage

git clone https://github.com/akanimax/BMSG-GAN.git   # PyTorch
python train.py --depth=7 \
                --latent_size=512 \
                --images_dir=<path to images> \
                --sample_dir=samples/exp_1 \
                --model_dir=models/exp_1

3. T2F-text-to-face-generation-using-deep-learning (StackGAN++ and ProGAN)

  • The ProGAN architecture works on the principle of adding new layers that model increasingly fine details as training progresses. Both the generator and the discriminator start by producing low-resolution images and add finer detail in subsequent steps, which makes training more stable and faster.
  • The StackGAN architecture consists of multiple generators and discriminators arranged in a tree-like structure. The different branches of the tree represent images of varying scales, all belonging to the same scene. StackGAN is known for modeling several related approximate distributions, including multi-scale image distributions and joint conditional and unconditional image distributions.
  • T2F combines ProGAN and StackGAN: ProGAN handles the synthesis of facial images, while StackGAN contributes the text encoding, with conditioning augmentation as its core mechanism. The textual description is encoded into a summary vector using an LSTM network. This summary vector (the embedding in the diagram below) is passed through the Conditioning Augmentation block (a single linear layer) to obtain the textual part of the GAN's latent vector, using a VAE-like reparameterization technique (see the sketch after this list).
  • The second part of the latent vector is random Gaussian noise. The resulting latent vector is fed to the generator part of the GAN, while the embedding is also fed to the final layer of the discriminator for conditional distribution matching. Training of the GAN proceeds layer by layer, with each new layer adding spatial resolution.
  • A fade-in technique is used whenever a new layer is introduced, which helps preserve previously learned information.
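
As a rough illustration, here is a minimal PyTorch sketch of the Conditioning Augmentation step described above; the class name, dimensions, and the way the noise is concatenated are assumptions for illustration, not T2F's actual implementation.

import torch
import torch.nn as nn

class ConditioningAugmentation(nn.Module):
    def __init__(self, embed_dim=256, cond_dim=128):
        super().__init__()
        self.proj = nn.Linear(embed_dim, 2 * cond_dim)   # single linear layer

    def forward(self, sentence_embedding):
        mu, log_var = self.proj(sentence_embedding).chunk(2, dim=-1)
        eps = torch.randn_like(mu)
        return mu + eps * torch.exp(0.5 * log_var)        # VAE-like reparameterized text code

embed = torch.randn(4, 256)                    # LSTM summary vectors (batch of 4, assumed size)
c = ConditioningAugmentation()(embed)          # textual part of the latent vector
z = torch.randn(4, 128)                        # random Gaussian part
latent = torch.cat([c, z], dim=-1)             # input to the GAN generator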


T2F architecture for generating face from textual descriptions, Source, LICENSE-MIT

The figure below illustrates how facial images are generated from their textual captions.

Library and Usage


Source — https://github.com/akanimax/T2F.git , LICENSE-MIT

git clone https://github.com/akanimax/T2F.git
pip install -r requirements.txt
mkdir training_runs
mkdir training_runs/generated_samples training_runs/losses training_runs/saved_models
python train_network.py --config=configs/11.comf

4. Object-driven Text-to-Image Synthesis via Adversarial Training


AttnGAN Source LICENSE — MIT

  • Object-driven Attentive GAN (Obj-GAN) performs fine-grained text-to-image synthesis in two steps: first a semantic layout (class labels, bounding boxes, shapes of salient objects) is generated, and then the image is synthesized by a deconvolutional image generator.
  • Semantic layout generation takes the sentence as input, from which Obj-GAN generates a sequence of objects specified by their bounding boxes (with class labels) and shapes.
  • The box generator is trained as an attentive seq2seq model to generate a sequence of bounding boxes, followed by a shape generator to predict and generate the shape of each object in its bounding box.
  • In the image generation step, the object-driven attentive generator and object-wise discriminator are designed to enable image generation conditioned on the semantic layout generated in the first step. The generator concentrates on synthesizing the image region within a bounding box by focusing on words that are most relevant to the object in that bounding box.
  • Attention-driven context vectors encode information from the words that are most relevant to a given image region; both patch-wise and object-wise context vectors are computed for the defined image regions (a minimal sketch follows this list).
  • A Fast R-CNN based object-wise discriminator is also used; it provides rich object-wise discrimination signals that help determine whether the synthesized object matches the text description and the pre-generated layout.
  • Object-driven attention (attending to the most relevant words and pre-generated class labels) performs better than traditional grid attention and is capable of generating complex scenes in high quality.
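
The following is a minimal PyTorch sketch of a generic attention-driven context vector, where each region feature attends over word embeddings and the context vector is the attention-weighted sum of those words. It illustrates the idea only and is not the Obj-GAN code (which, as noted below, has not been released).

import torch
import torch.nn.functional as F

def context_vectors(region_feats, word_feats):
    # region_feats: (num_regions, dim), word_feats: (num_words, dim)
    scores = region_feats @ word_feats.t()        # relevance of each word to each region
    attn = F.softmax(scores, dim=-1)              # attention weights per region
    return attn @ word_feats                      # one context vector per image region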

The open-source code for Obj-GAN from Microsoft is not available yet.


Source– (License-OpenSource)

5. MirrorGAN

  • MirrorGAN emphasizes global-local attentive features within a semantic-preserving text-to-image-to-text framework.
  • MirrorGAN is equipped to learn text-to-image generation by re-description. It is composed of three modules: “a semantic text embedding module (STEM), a global-local collaborative attentive module for cascaded image generation (GLAM), and a semantic text regeneration and alignment module (STREAM)”.
  • STEM generates word- and sentence-level embeddings, using a recurrent neural network (RNN) to embed the given text description into local word-level features and global sentence-level features.
  • GLAM is a multi-stage cascaded generator built by stacking three image generation networks sequentially to generate target images from coarse to fine scales. During generation it leverages both local word attention and global sentence attention, progressively enhancing the diversity and semantic consistency of the generated images.
  • STREAM aims to regenerate the text description from the generated image so that the image semantically aligns with the given text description.
  • The word-level attention model takes in contextually related neighboring words to generate an attentive word-context feature. At each stage it takes the word embeddings and the visual feature as input. The word embeddings are first mapped into the common semantic space of the visual features by a perception layer and multiplied with the visual feature to obtain the attention score; the attentive word-context feature is then obtained as the inner product between the attention score and the projected word embeddings (see the sketch after this list).
  • MirrorGAN's semantic text regeneration and alignment modules keep the input text and the output image in sync by regenerating the text description from the generated image, so the output finally aligns semantically with the given description. An encoder-decoder image captioning framework is used to generate these captions, with a convolutional neural network (CNN) as the encoder and an RNN as the decoder.
  • MirrorGAN outperforms AttnGAN by a large margin in all settings, demonstrating the value of the text-to-image-to-text framework and the global-local collaborative attentive module, since it generates high-quality images whose semantics are consistent with the input text descriptions.
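
Here is a minimal PyTorch sketch of the word-level attention step described above; the dimensions, the perception layer, and the softmax placement are illustrative assumptions rather than MirrorGAN's exact implementation.

import torch
import torch.nn.functional as F

def word_attention(visual_feats, word_embeds, perception):
    # visual_feats: (regions, vis_dim); word_embeds: (words, word_dim)
    words_vis = perception(word_embeds)                       # project words into the visual space
    attn = F.softmax(visual_feats @ words_vis.t(), dim=-1)    # attention score per (region, word)
    return attn @ words_vis                                   # attentive word-context feature

perception = torch.nn.Linear(300, 512)                        # word_dim -> vis_dim (assumed sizes)
context = word_attention(torch.randn(64, 512), torch.randn(12, 300), perception)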

Library and Usage

git clone git@github.com:komiya-m/MirrorGAN.git   # python 3.6.8, keras 2.2.4, tensorflow 1.12.0
# dependencies: easydict, pandas, tqdm
cd MirrorGAN
python main_clevr.py
python pretrain_STREAM.py
python train.py

6. StoryGAN

  • Story visualization takes a multi-sentence paragraph as input and generates a sequence of images as output, one for each sentence.
  • The story visualization task is a sequential conditional generation problem that jointly considers the current input sentence and the contextual information.
  • StoryGAN focuses less on the continuity between generated images (frames) and more on the global consistency across dynamic scenes and characters.
  • It relies on the Text2Gist component of the Context Encoder, which dynamically tracks the story flow while providing the image generator with both local and global conditional information.
  • A two-level discriminator and the recurrent structure on the inputs help to enhance image quality and ensure consistency between the generated images and the story being visualized.

The figure below illustrates the StoryGAN architecture. The variables shown as gray solid circles are the input story S and the individual sentences s1, . . . , sT, together with their random noise vectors. The generator network is built from customized components: a Story Encoder, a Context Encoder, and an Image Generator. Two discriminators on top determine whether each image-sentence pair and each image-sequence/story pair is real or fake.


The framework of StoryGAN, Source– LICENSE-MIT

The story discriminator distinguishes real from fake stories by concatenating the feature vectors of the images and sentences in the story. The product of the image and text features is embedded into a compact feature representation that serves as input to a fully connected layer with a sigmoid non-linearity, which predicts whether the image-story pair is real or fake (a minimal sketch follows the figure below).


Structure of the story discriminator, Source , LICENSE-MIT
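
The following is a minimal PyTorch sketch of that real/fake scoring head; the class name, projection layers, pooling over the story, and dimensions are illustrative assumptions rather than StoryGAN's actual code.

import torch
import torch.nn as nn

class StoryDiscriminatorHead(nn.Module):
    def __init__(self, img_dim=512, txt_dim=512, joint_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, joint_dim)
        self.txt_proj = nn.Linear(txt_dim, joint_dim)
        self.fc = nn.Linear(joint_dim, 1)

    def forward(self, img_feats, txt_feats):
        # img_feats / txt_feats: (batch, T, dim), one feature per frame / sentence,
        # concatenated over the whole story before being combined here
        joint = self.img_proj(img_feats) * self.txt_proj(txt_feats)   # element-wise product
        joint = joint.mean(dim=1)                                     # compact story-level feature
        return torch.sigmoid(self.fc(joint))                          # real / fake score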

Library and Usage

git clone https://github.com/yitong91/StoryGAN.git   # Python 2.7, PyTorch, cv2
python main_clevr.py

7. Keras-text-to-image

In keras-text-to-image, text-to-image translation is achieved using a GAN together with Word2Vec and recurrent neural networks.

It uses a DCGAN (Deep Convolutional Generative Adversarial Network), which was a breakthrough in GAN research because it introduced major architectural changes to tackle problems such as training instability, mode collapse, and internal covariate shift.


Sample DCGAN Architecture to generate 64×64 RGB pixel images from the LSUN dataset, Source, License -MIT

Library and Usage

git clone https://github.com/chen0040/keras-text-to-image.git

import os
import sys
import numpy as np
from random import shuffle


def train_DCGan_text_image():
    # train the DCGAN on image-caption pairs
    seed = 42
    np.random.seed(seed)

    current_dir = os.path.dirname(__file__)
    # add the keras_text_to_image module to the system path
    sys.path.append(os.path.join(current_dir, '..'))
    current_dir = current_dir if current_dir != '' else '.'

    img_dir_path = current_dir + '/data/pokemon/img'
    txt_dir_path = current_dir + '/data/pokemon/txt'
    model_dir_path = current_dir + '/models'

    img_width = 32
    img_height = 32
    img_channels = 3

    from keras_text_to_image.library.dcgan import DCGan
    from keras_text_to_image.library.utility.img_cap_loader import load_normalized_img_and_its_text

    image_label_pairs = load_normalized_img_and_its_text(img_dir_path, txt_dir_path,
                                                         img_width=img_width, img_height=img_height)

    shuffle(image_label_pairs)

    gan = DCGan()
    gan.img_width = img_width
    gan.img_height = img_height
    gan.img_channels = img_channels
    gan.random_input_dim = 200
    gan.glove_source_dir_path = './very_large_data'

    batch_size = 16
    epochs = 1000
    gan.fit(model_dir_path=model_dir_path, image_label_pairs=image_label_pairs,
            snapshot_dir_path=current_dir + '/data/snapshots',
            snapshot_interval=100,
            batch_size=batch_size,
            epochs=epochs)


def load_generate_image_DCGaN():
    # load a trained model and generate images from the captions
    seed = 42
    np.random.seed(seed)

    current_dir = os.path.dirname(__file__)
    sys.path.append(os.path.join(current_dir, '..'))
    current_dir = current_dir if current_dir != '' else '.'

    img_dir_path = current_dir + '/data/pokemon/img'
    txt_dir_path = current_dir + '/data/pokemon/txt'
    model_dir_path = current_dir + '/models'

    img_width = 32
    img_height = 32

    from keras_text_to_image.library.dcgan import DCGan
    from keras_text_to_image.library.utility.image_utils import img_from_normalized_img
    from keras_text_to_image.library.utility.img_cap_loader import load_normalized_img_and_its_text

    image_label_pairs = load_normalized_img_and_its_text(img_dir_path, txt_dir_path,
                                                         img_width=img_width, img_height=img_height)

    shuffle(image_label_pairs)

    gan = DCGan()
    gan.load_model(model_dir_path)

    for i in range(3):
        image_label_pair = image_label_pairs[i]
        normalized_image = image_label_pair[0]
        text = image_label_pair[1]

        image = img_from_normalized_img(normalized_image)
        image.save(current_dir + '/data/outputs/' + DCGan.model_name + '-generated-' + str(i) + '-0.png')
        for j in range(3):
            generated_image = gan.generate_image_from_text(text)
            generated_image.save(current_dir + '/data/outputs/' + DCGan.model_name +
                                 '-generated-' + str(i) + '-' + str(j) + '.png')

Conclusion

Here I have presented some of the popular techniques for generating images from text. You can explore more techniques at https://github.com/topics/text-to-image. Happy Coding!!



from Featured Blog Posts - Data Science Central https://ift.tt/2E9OmAm
via Gabe's Musings