Today, applications draw on large amounts of data from various sources, and maintaining all of that data in one place or on one platform is difficult. Data integration and data services are vital to tackling this challenge. To integrate data from various sources, the data must be consistent, and this demo showcases techniques for achieving that consistency.
Data Integration Techniques for Maintaining Data in One Place or Platform
Data integration is a critical part of almost any application today. Because data now comes in high volumes and many varieties, not all of it can be stored where the application runs. The problem becomes even more challenging when developing a web application: web servers typically cannot hold large amounts of data, so the data must be stored elsewhere and fetched when the application needs it. One way to do this is to use APIs that provide data on demand (Genesereth, 2010).
APIs generally accept GET or POST requests and return data in JSON format, which can then be parsed and used in the application. Using these APIs has its own challenges, and one of the main concerns is security. Most APIs address this with API keys that are unique to each application.
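As a minimal sketch of this request and response cycle, the Python below builds a GET URL that carries an API key and parses a JSON reply. The endpoint https://api.example.com/stations, the "key" parameter name, the key value and the sample payload are all assumptions made for illustration; they do not describe a real service.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and key; a real provider issues its own values.
BASE_URL = "https://api.example.com/stations"
API_KEY = "demo-key-123"


def build_request_url(base_url, api_key, **params):
    """Return a GET URL with the query parameters and the API key appended."""
    query = urlencode({**params, "key": api_key})
    return f"{base_url}?{query}"


def parse_response(body):
    """Parse a JSON response body into Python objects."""
    return json.loads(body)


url = build_request_url(BASE_URL, API_KEY, region="Region A")

# Stand-in for the text a server might return for the request above.
sample_body = '[{"station": "North Station", "region": "Region A"}]'
stations = parse_response(sample_body)

print(url)
print(stations[0]["station"])
```

Against a live service, the body could be fetched with urllib.request.urlopen(url) instead of the hard-coded sample string.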
2.0 Key System Concepts
2.1 Data Merging and Cleaning
Data merging
"Data merging is a procedure that includes merging data from different sources and providing users with a single view of these data."
Data cleaning
"Data cleansing or data cleaning is the procedure of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database and refers to finding incomplete, incorrect, inaccurate or unrelated parts of the data and then exchanging, adjusting, or deleting the dirty or coarse data."
Issues faced during merging and cleaning:
- The 'Station_Locations.xml' file had to be converted from XML to CSV format before it could be merged. This was solved by reading the data in the XML file, writing it to a CSV file, and exporting that file with the Pandas library for future use.
- The entries in 'Station_Locations.xml' were all in uppercase. They were converted to title case using the titlecase() function for strings so that they were consistent with the other data frames for merging.
- The 'Fire Station' column in the 'Fire_Stations' file had leading and trailing spaces, which had to be removed before merging with other data frames. This was resolved by using the map function with a lambda function that strips these spaces using the strip() function for strings. The code is as follows: fsdf['Fire Station'] = fsdf['Fire Station'].map(lambda x: x.strip())
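The three fixes above can be sketched together as follows. This is an illustration under stated assumptions, not the demo's actual notebook: the XML layout, the element names (station, name, lat, lon), the station names and the coordinates are all made up, and pandas' .str.title() and .str.strip() accessors are swapped in for the titlecase() and map/lambda calls described in the list.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical stand-in for 'Station_Locations.xml'; the real element
# names and values are not given in this report.
SAMPLE_XML = """<stations>
  <station><name>NORTH STATION</name><lat>-27.1</lat><lon>153.0</lon></station>
  <station><name>SOUTH STATION</name><lat>-27.6</lat><lon>153.1</lon></station>
</stations>"""


def xml_to_frame(xml_text):
    """Read each <station> element into one row of a DataFrame."""
    root = ET.fromstring(xml_text)
    rows = [{child.tag: child.text for child in station}
            for station in root.iter("station")]
    return pd.DataFrame(rows)


locations = xml_to_frame(SAMPLE_XML)
# Upper case -> title case so the names match the other data frame.
locations["name"] = locations["name"].str.title()

# Hypothetical stand-in for the 'Fire_Stations' file, with stray spaces.
fsdf = pd.DataFrame({
    "Fire Station": ["  North Station ", "South Station  "],
    "Region": ["Region A", "Region B"],
})
fsdf["Fire Station"] = fsdf["Fire Station"].str.strip()

# With case and whitespace aligned, the two frames merge on the station name.
merged = fsdf.merge(locations, left_on="Fire Station", right_on="name", how="inner")

# Export as CSV; an in-memory buffer is used here instead of a file on disk.
buffer = io.StringIO()
merged.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Without the title-case and strip steps, the inner merge in this sketch would return no rows, because none of the keys would match exactly.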
Recommendations to QFES:
- Store data in a consistent format and letter case so that data sets are easier to merge and less time is spent cleaning them.
- Take care to remove non-ASCII characters and to ensure that the data has no leading or trailing spaces (Devi & Kalia, 2015).
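One way these recommendations could be applied when records are first stored is sketched below. The function name normalise_name is hypothetical, and the approach is an assumption rather than QFES practice: accented letters are reduced to their ASCII base letter, any remaining non-ASCII characters are dropped, whitespace is trimmed and collapsed, and the result is put in title case.

```python
import unicodedata


def normalise_name(value):
    """Return an ASCII-only, whitespace-trimmed, title-case version of value."""
    # NFKD splits accented letters into a base letter plus combining marks;
    # encoding to ASCII with errors="ignore" then drops everything non-ASCII.
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # split() with no arguments removes leading/trailing spaces and
    # collapses internal runs of whitespace.
    return " ".join(ascii_only.split()).title()


print(normalise_name("  ST. JOSÉ   station "))
```

Applying one such function to every source before storage means that later merges can compare keys directly.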
2.2 RESTful Web Services
Incorporation of REST in this demo
In the demo, a REST API is used to get data from a local server, which acts as the database. The server accepts GET requests and returns the corresponding responses. It supplies the Region Names, Region IDs, Station Names and other fields required by the web application we created, by querying the respective tables in the database (Hamad, Saad, & Abed, 2010).
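A server of this kind might look roughly like the sketch below, which uses Flask with an in-memory SQLite database. It is not the demo's data_services.py: the routes, table names, column names and rows are all hypothetical, since the real schema is not shown in this report.

```python
import sqlite3

from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical schema and rows standing in for the demo's database.
SCHEMA = """
CREATE TABLE regions (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE stations (station_no INTEGER PRIMARY KEY, station_name TEXT, region_id INTEGER);
INSERT INTO regions VALUES (1, 'Region A'), (2, 'Region B');
INSERT INTO stations VALUES (12, 'North Station', 1), (34, 'South Station', 1), (56, 'East Station', 2);
"""


def open_database():
    """Create a throwaway in-memory database for each request."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows can be converted to dicts
    conn.executescript(SCHEMA)
    return conn


@app.route("/regions", methods=["GET"])
def list_regions():
    """Return every region as a JSON array."""
    conn = open_database()
    rows = conn.execute(
        "SELECT region_id, region_name FROM regions ORDER BY region_id"
    ).fetchall()
    conn.close()
    return jsonify([dict(row) for row in rows])


@app.route("/regions/<int:region_id>/stations", methods=["GET"])
def list_stations(region_id):
    """Return the stations that belong to one region as a JSON array."""
    conn = open_database()
    rows = conn.execute(
        "SELECT station_no, station_name FROM stations "
        "WHERE region_id = ? ORDER BY station_no",
        (region_id,),
    ).fetchall()
    conn.close()
    return jsonify([dict(row) for row in rows])
```

Calling app.run() would start Flask's development server, after which a GET request to /regions/1/stations would return the two stations of the first region in this made-up data set.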
Problem with REST API
The main problem with the REST API used in this demo is security. Since no authentication is required to send and receive information from the server, any device on the network that has the URL can access it.
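One possible mitigation, in line with the API keys mentioned earlier, is to reject requests that do not carry a shared key. The sketch below is an assumption, not part of the demo: the X-API-Key header name, the key value and the is_authorised helper are hypothetical.

```python
import hmac

# Hypothetical shared secret; in practice it would be loaded from
# configuration rather than written in the source code.
EXPECTED_API_KEY = "change-me"


def is_authorised(headers, expected_key=EXPECTED_API_KEY):
    """Return True only if the request headers carry the expected API key."""
    supplied = headers.get("X-API-Key", "")
    # compare_digest takes the same time whether or not the keys match,
    # which avoids leaking information through response timing.
    return hmac.compare_digest(supplied.encode(), expected_key.encode())


print(is_authorised({"X-API-Key": "change-me"}))
print(is_authorised({}))
```

The server could call such a check before servicing each request and answer with HTTP status 401 when it fails. A key sent over plain HTTP can still be read by other devices on the network, so this would normally be combined with HTTPS.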
2.3 Mashups
Incorporation of mashups in the demo
In the demo, the mashup takes the form of the Google Maps API. The locations of the fire stations are identified by region and marked on Google Maps. Our HTML page has a drop-down containing the list of regions from our database. When a region is selected, we get the details of all the fire stations in it. We then iterate over the JSON object returned in the server's response to extract the coordinates (latitude and longitude) of each fire station along with its station number, and mark each station on the map with its station number as the label.
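In the demo this iteration runs in the browser; the same logic is sketched here in Python for consistency with the rest of the code. The field names (station_no, latitude, longitude) and the sample values are assumptions, and the lat/lng shape of each position simply mirrors the coordinate literal that the Google Maps JavaScript API accepts.

```python
import json

# Stand-in for the JSON returned by the server for one region;
# field names and values are hypothetical.
response_body = """[
  {"station_no": 12, "station_name": "North Station",
   "latitude": -27.1, "longitude": 153.0, "region": "Region A"},
  {"station_no": 34, "station_name": "South Station",
   "latitude": -27.6, "longitude": 153.1, "region": "Region A"}
]"""


def extract_markers(body):
    """Return one marker per station: a position plus the station number as label."""
    markers = []
    for station in json.loads(body):
        markers.append({
            "label": str(station["station_no"]),
            "position": {"lat": station["latitude"], "lng": station["longitude"]},
        })
    return markers


markers = extract_markers(response_body)
for marker in markers:
    print(marker["label"], marker["position"])
```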
Improvements for a better Mashup
The efficiency of the mashup in the demo can be increased by doing the processing on the server and sending only the required JSON, rather than sending the whole response and having the client parse it to extract the fields of interest. This reduces processing time since, generally, server-side processing is much faster than client-side processing (Summers & Smith, 2014).
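A minimal sketch of this improvement, assuming hypothetical field names and made-up rows: the server keeps only the fields the map needs before serialising, so the client receives a smaller payload and has nothing left to filter.

```python
import json

# Fields the map actually uses; the names are assumptions for illustration.
FIELDS_FOR_MAP = ("station_no", "latitude", "longitude")


def trim_rows(rows, fields=FIELDS_FOR_MAP):
    """Keep only the requested fields from each row."""
    return [{field: row[field] for field in fields} for row in rows]


# Made-up rows standing in for a full database result.
full_rows = [
    {"station_no": 12, "station_name": "North Station", "region": "Region A",
     "address": "1 Example Road", "latitude": -27.1, "longitude": 153.0},
    {"station_no": 34, "station_name": "South Station", "region": "Region A",
     "address": "2 Example Road", "latitude": -27.6, "longitude": 153.1},
]

full_json = json.dumps(full_rows)
trimmed_json = json.dumps(trim_rows(full_rows))

print(len(full_json), len(trimmed_json))
```

The saving grows with the number of stations and unused columns; for a handful of rows the difference is small.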
3.0 Demo Running Instructions
1. Download the zip file and extract it by right-clicking the file and selecting Extract All. This creates a folder with the same name as the zipped file and extracts the contents into it.
2. To run the .py file, Python must be installed on the machine. To do this, download Anaconda Navigator, which installs Python and various other dependencies needed in this demo.
3. To download Anaconda Navigator, visit https://www.anaconda.com/distribution/ and click Download. This should start a download of the .exe file used to install it.
4. Once downloaded, open the file and follow the instructions to install Anaconda Navigator. After the installation, click the Start button and search for Anaconda Prompt. Open it and enter the following commands to install the dependencies.
- pip install pandas
- pip install flask_cors
- pip install flask_jsonpify
- pip install flask_restful
- pip install Flask
5. Using Jupyter Notebook, open data integration.ipynb. Click the Cell menu at the top and choose Run All. This generates the CSV required for the next steps.
6. Now that all the dependencies are installed, go to the folder where the package was extracted, right-click 'data_services.py', select Properties and copy the path of its folder. Then go to the Anaconda Prompt, type cd followed by a space, and right-click anywhere in the window. This pastes the path; press Enter to change the current working directory to that folder.
7. Now type "data_services.py" in the prompt and press Enter (if the .py extension is not associated with Python, type "python data_services.py" instead). If all the dependencies are installed correctly, this starts a server. If not, follow the instructions shown in the prompt to install the remaining dependencies.
8. Once the server is running, open the "stations_map.html" file from the extracted directory. This opens the page in a browser. A drop-down at the top of the page lists the regions. Select a region and click the Display Stations button. A Google map then opens, marking the locations of all the fire stations in the selected region. A different region can be selected from the list and displayed whenever required.
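If the server does not start, a quick check such as the sketch below can show which of the packages installed with pip in the steps above are still missing. The helper name missing_modules is hypothetical; the module names are the import names of those packages.

```python
from importlib.util import find_spec

# Import names of the packages installed with pip for this demo.
REQUIRED_MODULES = ["pandas", "flask", "flask_cors", "flask_jsonpify", "flask_restful"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All dependencies are installed.")
```

Saved to a file and run from the Anaconda Prompt, this prints the modules that still need a pip install.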