Working on a Proposal

Draft - Use of Machine Learning in Small Businesses

Since the 1980s, the cost of computer storage has changed dramatically. In 1981, Apple was charging approximately $700,000 per gigabyte {Anonymous:2016wf}. As of 2017, the cost per gigabyte is less than $0.03 (Klein 2017). This near-zeroing of storage costs has led to a significant increase in digital data retention. According to EMC Digital (2014), from 2013 to 2020 the digital universe will grow by a factor of 10, from 4.4 trillion gigabytes to 44 trillion. Those 44 trillion gigabytes would amount to almost one byte of data for every star in the universe. In 2013, only 5% of the 4.4 trillion gigabytes was analyzed; by 2020, this percentage could grow to 22% (EMC Digital 2014). The low cost and rapid growth in the volume of stored data have been mirrored by the growth and development of artificial intelligence (AI) and machine learning (ML) techniques.
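The figures above imply two rates that are easy to check. The short Python sketch below uses only the numbers quoted in the paragraph and computes the overall drop in cost per gigabyte and the compound annual growth rate (CAGR) that a tenfold increase over seven years implies. The constant and function names are illustrative.

```python
# Figures quoted in the paragraph above; everything else is derived from them.
COST_PER_GB_1981 = 700_000.0  # US dollars, Apple, 1981
COST_PER_GB_2017 = 0.03       # US dollars, 2017 (quoted as an upper bound)
DATA_2013 = 4.4               # trillion gigabytes
DATA_2020 = 44.0              # trillion gigabytes (projected)


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: the constant yearly rate that turns start into end."""
    return (end / start) ** (1 / years) - 1


price_drop = COST_PER_GB_1981 / COST_PER_GB_2017
data_growth = cagr(DATA_2013, DATA_2020, 2020 - 2013)

print(f"Cost per gigabyte fell by a factor of roughly {price_drop:,.0f}")
print(f"Projected data growth: about {data_growth:.0%} per year")
```

Because $0.03 is an upper bound, the drop in cost is a factor of at least about 23 million. The tenfold growth corresponds to roughly 39% per year, since 1.39 raised to the seventh power is about 10.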

According to MathWorks, machine learning can be defined as:

“Machine learning is a data analytics technique that teaches computers to do what comes naturally to humans and animals: learn from experience. Machine learning algorithms use computational methods to ‘learn’ information directly from data without relying on a predetermined equation as a model. The algorithms adaptively improve their performance as the number of samples available for learning increases.”

As companies collect ever larger quantities of data, their ability to analyze the data and make inferences and predictions from it increases dramatically. Machine learning techniques typically require a great deal of the domain knowledge and expertise that is a hallmark of AI systems (Delisle, St-Pierre, and Copeck 2006). Because of the level of knowledge and expertise needed to use these systems, machine learning is used most effectively by large companies such as Google, Netflix, Amazon, Salesforce, IBM, and Facebook (Sparks 2017). Google, Amazon, and Facebook use machine learning directly to generate display advertisements: when a user is logged into any of these systems, previous searches are used to display related ads. On both Google and Facebook, search history and user preferences were used to great effect in recent elections in both the United States and Europe (Polonski 2017).

There is a large body of research applying machine learning techniques to primary care, operating room, and hospital patient scheduling. Some of this research describes tools that focus on ease of use and accessibility, and it could help guide the development of a planning system for small businesses and manufacturers.

The envisioned tools would automate the process of schedule development. Automating this process is expected to yield two benefits. First, it will reduce the time and manpower needed to produce a schedule. Second, an automated process would make it easier to collect scheduling data. The availability of scheduling data is a key component of a successful model: by comparing historical scheduling data, more accurate and timely schedules can be developed.

Research Proposal

This research will focus on identifying the ML techniques best suited for use by small businesses, with an emphasis on schedule optimization. Additionally, a set of recommendations will be developed to guide data retention and storage policy. The overall results are expected to include the following:

• Identification of ML techniques suited for use by small businesses
• Reduction in man-hours to develop a schedule
• Storage of scheduling data
• Optimization of manufacturing processes
• Ease of schedule interpretation
• Data retention recommendations

To achieve these benefits, this research will build on existing scheduling methodologies and use them to identify solutions that can assist small businesses.

Literature Review

Automated scheduling studies can be divided into three basic approaches (Kusiak and Ahn 1992):

• Operations research (OR) approach
• Artificial intelligence (AI)-based approach
• Combination of OR and AI-based approaches

Operations Research (OR) Approach

The operations research (OR) approach can be defined as a systematic approach to problem solving (Rajgopal 2018). The steps of OR, as described by Rajgopal (2018), are shown in Figure 1. In more basic terms, Luckman and Stringer (1974) defined OR as:

“…the securing of improvement in social systems by means of scientific method.”

Many small to medium businesses and manufacturers lack systematic planning. According to Everett and Watson (1998), one of the primary causes of small business problems appears to be a lack of appropriate management skills, both at start-up and on a continuing basis. Overall, automated manufacturing scheduling has numerous benefits (Cowan 1985), including:

• Fast response to market demands
• Better quality
• Reduced cost
• Enhanced performance
• Better resource utilization
• Shorter lead time
• Reduced work in process
• Flexibility

Figure 1: Operations Research Steps

These benefits can be advanced further by developing a tool to automate the scheduling process. An automated process will reduce costs, better utilize resources, and increase flexibility.

Artificial Intelligence (AI)-based Approach

In recent years, computing power has reached a level where machine learning and rudimentary artificial intelligence are accessible to a wider audience. Many machine learning algorithms and methodologies draw on Bayesian theory. Bayes' theorem was first described in an essay by Thomas Bayes published posthumously in 1763 (Howson and Urbach 2006). The theorem formalizes the idea of revising a probability in light of additional information. Stated mathematically, the probability of event A given that event B has occurred is:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

The use and development of Bayesian methods have grown rapidly since the 1980s (Howson and Urbach 2006).
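The revision of a probability described above can be made concrete with a small Python sketch. The scenario and all numbers are hypothetical: A is "a machine has a developing fault" and B is "a vibration alarm fires". The denominator P(B) is obtained from the law of total probability.

```python
def bayes_posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """Return P(A | B) using Bayes' theorem.

    prior               -- P(A)
    likelihood          -- P(B | A)
    false_positive_rate -- P(B | not A)
    """
    # Law of total probability gives the evidence term P(B).
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers for a shop-floor alarm.
p_fault = 0.02              # P(A): 2% of machines have a developing fault
p_alarm_given_fault = 0.90  # P(B | A)
p_alarm_given_ok = 0.05     # P(B | not A)

posterior = bayes_posterior(p_fault, p_alarm_given_fault, p_alarm_given_ok)
print(f"P(fault | alarm) = {posterior:.3f}")
```

With these assumed numbers the posterior is only about 0.27: because faults are rare, most alarms are false, and the new evidence raises the probability of a fault from 2% without making it certain.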

Machine learning can be defined as “a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that machines should be able to learn and adapt through experience” {SAS:wd}. As this definition suggests, machine learning and Bayesian methods share the idea of adaptation through experience.

It is plausible that the growth of Bayesian methods corresponds with the rise of personal computers. Additionally, the ability to collect and store vast amounts of data, or Big Data, has proved to be prime fuel for Bayesian and machine learning methodologies. According to SAS (n.d.), an analytics software company, Big Data can be defined as:

“Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.”

Combination of OR and AI-based Approaches

Research into automated scheduling has been documented across numerous industries, including healthcare, laboratory research, NASA operations, and college scheduling, as well as manufacturing. In healthcare, access to care and willingness to use it can be limited by the number of health providers in an area. Additionally, patient satisfaction can be directly correlated with a patient’s willingness to go to a doctor. In recent years, numerous studies have reviewed scheduling methodologies to streamline appointment scheduling. An optimized schedule can help maximize access to a limited number of health providers, and it also increases patient satisfaction, which can positively affect a person’s willingness to go to the doctor (Oh, Muriel, and Balasubramanian 2014).

According to the 2014 National Ambulatory Medical Care Survey (CDC 2014), 34% of small practices consisted of a single physician. Scheduling studies typically use advanced tools and analytic packages to determine an optimized schedule, and they tend to overlook that many of these tools are inaccessible to small primary care practices. Oh, Muriel, and Balasubramanian (2014) address this gap by focusing on scheduling optimization in primary care practices using Microsoft Excel.
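The core trade-off in appointment scheduling, patient waiting time versus physician idle time, can be explored with a very small Monte Carlo simulation. The sketch below is in Python rather than Excel and is not the model of Oh, Muriel, and Balasubramanian; it assumes punctual patients, a single physician, exponentially distributed consultation times, and hypothetical parameters (16 patients, 15-minute mean consultation).

```python
from __future__ import annotations

import random
import statistics


def simulate_session(slot_minutes: float, n_patients: int, mean_service: float,
                     rng: random.Random) -> tuple[float, float]:
    """Simulate one clinic session with punctual patients booked every slot_minutes.

    Returns (total patient waiting time, physician idle time between patients), in minutes.
    """
    physician_free_at = 0.0
    total_wait = 0.0
    total_idle = 0.0
    for i in range(n_patients):
        arrival = i * slot_minutes
        start = max(arrival, physician_free_at)
        total_wait += start - arrival
        total_idle += max(0.0, arrival - physician_free_at)
        # Assumption: exponential consultation times; real visit data would replace this.
        physician_free_at = start + rng.expovariate(1 / mean_service)
    return total_wait, total_idle


def evaluate(slot_minutes: float, n_patients: int = 16, mean_service: float = 15.0,
             runs: int = 2000, seed: int = 1) -> tuple[float, float]:
    """Average wait and idle time over many simulated sessions (Monte Carlo)."""
    rng = random.Random(seed)  # same seed for every slot length: common random numbers
    results = [simulate_session(slot_minutes, n_patients, mean_service, rng)
               for _ in range(runs)]
    return (statistics.fmean(w for w, _ in results),
            statistics.fmean(idle for _, idle in results))


for slot in (10, 15, 20):
    wait, idle = evaluate(slot)
    print(f"{slot}-minute slots: mean total wait {wait:6.1f} min, mean idle {idle:5.1f} min")
```

Reusing the same seed for every slot length means each candidate schedule is tested against the same simulated consultation times, so differences in the results reflect the schedule rather than random noise. Shorter slots reduce idle time at the cost of longer waits; a practice would pick the slot length that best balances the two.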

Automated scheduling can generate significant cost savings. As noted by Boyd and Savory (2001), a laboratory that used a genetic algorithm to schedule its personnel was able to save 37 hours of labor per month and approximately $11,000 per year.

NASA has used automated scheduling to optimize the use of limited resources, including the Deep Space Network (DSN) and the Hubble Space Telescope. In the case of the DSN, automated scheduling was used to identify and carry out a transition from an activity-oriented to a request-oriented scheduling approach ({Johnston:2010hy}).

University class and exam scheduling is a complex and time-consuming administrative function. Investigations into this subject include Vermuyten et al. (2016), Babaei, Karimpour, and Hadidi (2015), and De Causmaecker, Demeester, and Vanden Berghe (2009). These studies focus on optimizing classroom usage, teaching assignments, and student flows, and they also identify methods to optimize the overall scheduling process.

Numerous studies investigate automated scheduling in manufacturing. Many current systems can be classified as decision support systems (DSS) rather than a true combination of OR and AI-based approaches (Monfared and Yang 2007). According to Chaudhry, Mahmood, and Shami (2011), “The objective of scheduling is to find a way to assign and sequence the use of these shared resources such that production constraints are satisfied and production costs are minimized.” One innovative approach to developing a manufacturing schedule is the use of genetic algorithms (GAs), a heuristic search and optimization technique inspired by natural evolution (McCall 2005). According to {WhatIstheGenetic:oBSdwLTb}, the GA process is as follows:

“The GA repeatedly modifies a population of individual solutions. At each step, the GA selects individual solutions at random from the current population to be parents and uses them to produce the children for the next generation. Over successive generations, the population “evolves” toward an optimal solution.”
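The GA loop quoted above can be sketched for a small scheduling problem. The Python example below is a minimal illustration under assumptions that are not taken from any cited study: eight hypothetical jobs on a single machine, each with a processing time and a due time, and an objective of minimizing total tardiness. Each individual is a job order; parents are chosen by tournament selection, combined with a simple order-crossover variant, and occasionally mutated by swapping two jobs. These operator choices are common for permutation problems but are only one option.

```python
from __future__ import annotations

import random

# Hypothetical jobs for one machine: (processing hours, due time in hours from shift start).
JOBS = [(4, 10), (2, 6), (6, 20), (3, 8), (5, 18), (1, 4), (7, 30), (2, 12)]


def total_tardiness(sequence: list[int]) -> int:
    """Fitness: total hours by which jobs finish past their due times (lower is better)."""
    clock = tardiness = 0
    for job in sequence:
        processing, due = JOBS[job]
        clock += processing
        tardiness += max(0, clock - due)
    return tardiness


def order_crossover(p1: list[int], p2: list[int], rng: random.Random) -> list[int]:
    """Order crossover variant: keep a slice of parent 1, fill the rest in parent 2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    middle = p1[a:b + 1]
    rest = [job for job in p2 if job not in middle]
    return rest[:a] + middle + rest[a:]


def swap_mutation(sequence: list[int], rate: float, rng: random.Random) -> list[int]:
    """With probability `rate`, swap two randomly chosen jobs."""
    sequence = sequence[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(sequence)), 2)
        sequence[i], sequence[j] = sequence[j], sequence[i]
    return sequence


def tournament(population: list[list[int]], rng: random.Random, k: int = 3) -> list[int]:
    """Select a parent: the fittest of k individuals drawn at random."""
    return min(rng.sample(population, k), key=total_tardiness)


def genetic_schedule(pop_size: int = 40, generations: int = 200,
                     mutation_rate: float = 0.2, seed: int = 0) -> list[int]:
    rng = random.Random(seed)
    n = len(JOBS)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the best sequence so far survives unchanged.
        children = [min(population, key=total_tardiness)]
        while len(children) < pop_size:
            child = order_crossover(tournament(population, rng), tournament(population, rng), rng)
            children.append(swap_mutation(child, mutation_rate, rng))
        population = children
    return min(population, key=total_tardiness)


best = genetic_schedule()
print("job order:", best, "total tardiness:", total_tardiness(best), "hours")
```

The hypothetical data make the result easy to sanity-check: no order can do better than 3 hours of total tardiness, and sorting jobs by due date, [5, 1, 3, 0, 7, 4, 2, 6], reaches that bound. A real shop would replace `JOBS` and the fitness function with its own constraints, such as multiple machines, setup times, or machine downtime.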

The literature contains several examples of GAs applied in manufacturing. Jalilvand-Nejad and Fattahi (2013), for example, combine a mathematical model with a GA to solve the cyclic flexible job shop scheduling problem. In describing possible future work, they note:

“Another difficulty for implying the results of considered paper in a part manufacturing firm is machine failures. As an opportunity to develop the considered paper is to study the cyclic flexible job shop scheduling problem including machine failures.”

To create a GA schedule that can factor in machine failure and downtime, historical data is required.
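One simple way historical downtime data could feed a schedule is through standard reliability measures: mean time between failures (MTBF), mean time to repair (MTTR), and availability, MTBF / (MTBF + MTTR). The Python sketch below computes them from a hypothetical machine log; the record layout, machine names, and 720-hour observation window are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Downtime:
    """One downtime event from a hypothetical machine log (hours since the log began)."""
    machine: str
    failed_at: float
    repaired_at: float


def reliability_summary(log: list[Downtime], machine: str, observed_hours: float) -> dict[str, float]:
    """MTBF, MTTR and availability for one machine over an observation window."""
    events = [e for e in log if e.machine == machine]
    failures = len(events)
    downtime = sum(e.repaired_at - e.failed_at for e in events)
    uptime = observed_hours - downtime
    mtbf = uptime / failures if failures else float("inf")  # mean time between failures
    mttr = downtime / failures if failures else 0.0         # mean time to repair
    # With at least one failure, MTBF / (MTBF + MTTR) equals uptime / observed_hours.
    return {"failures": failures, "mtbf": mtbf, "mttr": mttr,
            "availability": uptime / observed_hours}


log = [
    Downtime("press_1", 120.0, 124.0),
    Downtime("press_1", 410.0, 412.5),
    Downtime("lathe_2", 300.0, 301.0),
    Downtime("press_1", 650.0, 655.5),
]
summary = reliability_summary(log, "press_1", observed_hours=720.0)
print(summary)
```

In this example press_1 fails three times, with an MTBF of 236 hours, an MTTR of 4 hours, and availability of about 98.3%. A scheduling model could, for instance, reduce each machine's planned capacity by its availability or reserve buffer time sized by its MTTR.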

Lack of Data

Dexter and Ledolter (2005) used Bayesian techniques to predict operating room times, which drive operating room and surgeon schedules, for procedures with few or no historical data. In that study, prediction bounds for the durations of the various procedures could still be reasonably calculated.
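The general idea of a Bayesian prediction bound with little data can be sketched as follows. This is not Dexter and Ledolter's model. It is a deliberately simple substitute that assumes log durations are normally distributed with a known case-to-case spread, places a normal prior on the mean log duration (for example, from similar procedures), and updates that prior with whatever cases exist. The function name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def upper_prediction_bound(observed_minutes, prior_median, prior_sd_log, within_sd_log,
                           coverage=0.90):
    """Upper prediction bound (minutes) for the duration of the next case.

    Simplifying assumptions (not Dexter and Ledolter's model):
    * log(duration) ~ Normal(mu, within_sd_log**2), with within_sd_log treated as known;
    * prior belief about mu ~ Normal(log(prior_median), prior_sd_log**2), e.g. taken
      from similar procedures.
    """
    prior_precision = 1 / prior_sd_log ** 2
    data_precision = len(observed_minutes) / within_sd_log ** 2
    sum_logs = sum(math.log(x) for x in observed_minutes)

    # Conjugate normal update of mu; with no observations the prior is returned unchanged.
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * math.log(prior_median)
                 + sum_logs / within_sd_log ** 2) / post_precision

    # Predictive spread combines case-to-case variation and remaining uncertainty about mu.
    predictive_sd = math.sqrt(within_sd_log ** 2 + 1 / post_precision)
    z = NormalDist().inv_cdf(coverage)
    return math.exp(post_mean + z * predictive_sd)


# Hypothetical procedure: similar cases suggest a median of about 90 minutes.
no_history = upper_prediction_bound([], prior_median=90, prior_sd_log=0.5, within_sd_log=0.3)
two_cases = upper_prediction_bound([70, 80], prior_median=90, prior_sd_log=0.5, within_sd_log=0.3)
print(f"90% upper bound with no history: {no_history:.0f} min; after two cases: {two_cases:.0f} min")
```

Under these assumed numbers, the bound is usable even with no history (roughly 190 minutes) and tightens to roughly 122 minutes after just two observed cases. This is the behavior that makes Bayesian methods attractive when a small business has little recorded scheduling data.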

Motivation

As companies collect ever larger quantities of data, their ability to analyze the data and make inferences and predictions from it increases dramatically. Machine learning techniques typically require a great deal of the domain knowledge and expertise that is a hallmark of AI systems (Delisle, St-Pierre, and Copeck 2006). Because of the level of knowledge and expertise needed to use these systems, machine learning is used most effectively by large companies. The goal of this research is to identify ML techniques that are well suited for small businesses that lack ML subject matter experts. The project will also develop data storage and retention recommendations to support future analysis.

Problem Identification

To make use of “Big Data,” data must be stored and accessible. In many small businesses, scheduling data is not stored in a format suitable for analysis; tools such as Microsoft Outlook and Excel are often used instead. At one local manufacturing plant, for example, schedules are built in Excel by cutting and pasting cells that are color-coded by job. Every part of this process is labor intensive. Historical scheduling data may be saved digitally, but it is not used analytically to inform future scheduling decisions. With the growth of ML and the expanding market for easier-to-use ML tools, small businesses should consider using machine learning to optimize their schedules.
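A first step away from color-coded cells is to record each scheduled job as a structured row that holds both the planned and the actual times. The Python sketch below shows one possible layout written as CSV, a format any spreadsheet can open, and derives a simple measure (overrun in minutes) from it. The field names, job IDs, and timestamps are hypothetical.

```python
from __future__ import annotations

import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import datetime
from typing import Optional


@dataclass
class ScheduleRecord:
    """One planned-versus-actual schedule entry (hypothetical field names)."""
    job_id: str
    machine: str
    planned_start: str            # ISO 8601 text sorts correctly and opens in any spreadsheet
    planned_end: str
    actual_end: Optional[str] = None  # filled in when the job is finished


def write_records(records: list[ScheduleRecord], stream) -> None:
    """Write records as CSV with a header row; missing values become empty cells."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(ScheduleRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


def overrun_minutes(record: ScheduleRecord) -> Optional[float]:
    """Minutes a job finished after its planned end (negative if early, None if unfinished)."""
    if record.actual_end is None:
        return None
    delta = datetime.fromisoformat(record.actual_end) - datetime.fromisoformat(record.planned_end)
    return delta.total_seconds() / 60


records = [
    ScheduleRecord("J-101", "press_1", "2018-03-05T07:00", "2018-03-05T11:00", "2018-03-05T11:45"),
    ScheduleRecord("J-102", "lathe_2", "2018-03-05T07:00", "2018-03-05T09:30"),
]
buffer = io.StringIO()  # a real system would write to a file or database instead
write_records(records, buffer)
print(buffer.getvalue())
print("J-101 overrun:", overrun_minutes(records[0]), "minutes")
```

Keeping the plan and the outcome in the same row is the design choice that matters: once a few months of such rows exist, the differences between planned and actual times become the historical data that the scheduling models discussed above require.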

Goals and Objectives

This research will focus on identifying the ML models and techniques best suited for use in small businesses. Once models have been identified, an analysis will be performed to document the steps a business needs to take to use them. The use of ML is expected to lead to the following:

• Identification of machine learning techniques suited for small business use
• Reduction in man-hours to develop a schedule
• Storage of scheduling data
• Optimization of processes via scheduling
• Ease of schedule interpretation

Using a combination of OR and AI-based approaches, the study will review and apply machine learning techniques such as genetic algorithms and Bayesian methods to develop an automated schedule. Approaches such as that of Dexter and Ledolter (2005) will be used to build models when little or no historical data is available. To develop these models, this research will review various scheduling considerations and methodologies.

Research Approach

As stated previously, the research objectives are as follows:

• Identification of machine learning techniques suited for small business use
• Reduction in man-hours to develop a schedule
• Storage of scheduling data
• Optimization of processes via scheduling
• Ease of schedule interpretation

To meet the research objectives, an extensive literature review will be conducted to identify machine learning techniques suitable for use by small businesses. These techniques are expected to have the following characteristics:

• Ease of use and interpretation
• Reliability in scenarios that lack historical data
• Repeatability

As previously noted, machine learning techniques often require domain knowledge and expertise. For a small business to use ML, techniques will need to be easy to use and interpretable. The ML tools identified in this study will therefore need to require minimal user expertise.

Many small businesses and manufacturers do not store historical scheduling data and the associated variables in formats that can be easily analyzed. The ML tools identified will need to take data availability into account. Additionally, as a small business begins to use ML techniques, recommendations will be developed for retaining data in a way that supports future analysis.

Many studies have identified highly specialized ML tools that work only in the specific scenario studied. This study will instead focus on more robust solutions that could be applied to multiple small business scenarios with minimal modification.

The Models

This research will focus on the development of scheduling models. Using sample data from a small manufacturer, models will be tested to identify solutions that provide the following benefits:

• Reduction in man-hours to develop a schedule
• Storage of scheduling data
• Optimization of manufacturing processes
• Ease of schedule interpretation
• Minimization of idle time
• Minimization of wait times

References

Babaei, Hamed, Jaber Karimpour, and Amin Hadidi. 2015. “A Survey of Approaches for University Course Timetabling Problem.” Computers and Industrial Engineering 86: 43–59. doi:10.1016/j.cie.2014.11.010.

Boyd, J C, and J Savory. 2001. “Genetic Algorithm for Scheduling of Laboratory Personnel.” Clinical Chemistry 47 (1): 118–23.

CDC. 2014. “National Ambulatory Medical Care Survey.” Centers for Disease Control and Prevention. https://www.cdc.gov/nchs/data/ahcd/namcs_summary/2014_namcs_web_tables.pdf.

Chaudhry, I A, S Mahmood, and M Shami. 2011. “Simultaneous Scheduling of Machines and Automated Guided Vehicles in Flexible Manufacturing Systems Using Genetic Algorithms.” Journal of Central South University … 18 (5): 1473–86. doi:10.1007/s11771-011-0863-7.

Cowan, D A. 1985. Is CIM Achievable. Proceedings of the 3rd European Conference on ….

De Causmaecker, Patrick, Peter Demeester, and Greet Vanden Berghe. 2009. “A Decomposed Metaheuristic Approach for a Real-World University Timetabling Problem.” European Journal of Operational Research 195 (1): 307–18. doi:10.1016/j.ejor.2008.01.043.

Delisle, Sylvain, Josée St-Pierre, and Terry Copeck. 2006. “A Hybrid Diagnostic-Advisory System for Small and Medium-Sized Enterprises: a Successful AI Application.” Applied Intelligence 24 (2): 127–41. doi:10.1007/s10489-006-6934-z.

Dexter, Franklin, and Johannes Ledolter. 2005. “Bayesian Prediction Bounds and Comparisons of Operating Room Times Even for Procedures with Few or No Historic Data.” Anesthesiology: the Journal … 103 (6). The American Society of Anesthesiologists: 1259–67. doi:10.1097/00000542-200512010-00023.

EMC Digital. 2014. “Executive Summary: Data Growth, Business Opportunities, and the IT Imperatives | the Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things.” IDC. https://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm.

Everett, J, and J Watson. 1998. “Small Business Failure and External Risk Factors.” Small Business Economics. doi:10.1023/A:1008065527282.

Howson, Colin, and Peter Urbach. 2006. Scientific Reasoning. Open Court Publishing.

Jalilvand-Nejad, Amir, and Parviz Fattahi. 2013. “A Mathematical Model and Genetic Algorithm to Cyclic Flexible Job Shop Scheduling Problem.” Journal of Intelligent Manufacturing 26 (6). Springer US: 1085–98. doi:10.1007/s10845-013-0841-z.

Klein, Andy. 2017. “The Cost of Hard Drives Over Time.” July 11. https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/.

Kusiak, Andrew, and Jaekyoung Ahn. 1992. “Intelligent Scheduling of Automated Machining Systems.” Computer Integrated Manufacturing Systems 5 (1): 3–14. doi:10.1016/0951-5240(92)90013-3.

Luckman, J, and J Stringer. 1974. “The Operational Research Approach to Problem Solving.” British Medical Bulletin 30 (3): 257–61. doi:10.1093/oxfordjournals.bmb.a071212.

McCall, John. 2005. “Genetic Algorithms for Modelling and Optimisation.” Journal of Computational and Applied Mathematics 184 (1). North-Holland: 205–22. doi:10.1016/j.cam.2004.07.034.

Monfared, M A S, and J B Yang. 2007. “Design of Integrated Manufacturing Planning, Scheduling and Control Systems: a New Framework for Automation.” The International Journal of Advanced Manufacturing Technology 33 (5-6). Springer-Verlag: 545–59. doi:10.1007/s00170-006-0476-8.

Oh, Hyun Jung Alvarez, Ana Muriel, and Hari Balasubramanian. 2014. “A User-Friendly Excel Simulation for Scheduling in Primary Care Practices.” Winter Simulation Conference. IEEE, 1177–85. doi:10.1109/WSC.2014.7019975.

Polonski, Vyacheslav. 2017. “The Good, the Bad and the Ugly Uses of Machine Learning in Election Campaigns.” August 30. https://www.centreforpublicimpact.org/good-bad-ugly-uses-machine-learning-election-campaigns/.

Rajgopal, Jayant. 2018. “Principles and Applications of Operations Research.” Pitt.Edu. Accessed February 25. http://www.pitt.edu/~jrclass/or/or-intro.html.

SAS. n.d. “What Is Big Data and Why It Matters.”

Sparks, Andrea. 2017. “5 Companies That Are Dominating with Machine-Learning.” Domo.com. October 12. https://www.domo.com/blog/5-companies-dominating-machine-learning/.

Vermuyten, Hendrik, Stefan Lemmens, Inês Marques, and Jeroen Beliën. 2016. “Developing Compact Course Timetables with Optimized Student Flows.” European Journal of Operational Research 251 (2): 651–61. doi:10.1016/j.ejor.2015.11.028.