- Data can be filtered or refined to ensure that only the relevant data is integrated.
- Describe a situation where data has been filtered to underpin a business objective.
A common technique in marketing is A/B testing where different customers are randomly given different offers on a product, e.g. one group are offered a discount whilst another is offered buy-one-get-one-free. This is particularly powerful in online stores where customers are not limited geographically (in bricks-and-mortar stores each store can only give one of the offers so it may be that a particular offer works or does not due to the area, useful information but it may mask other useful information) and the business is likely to have other data about the customer such as demographics. The business will filter the datasets on views and sales by which offer the customer was given to compare the impact on buying decisions vs other metrics such as age, gender &c. This can be used to tune future sales algorithms to offer the deal most likely to get the person to buy.
- Name 3 business objectives that could be better understood by filtering data ie improve customer satisfaction
I really struggled with this question, I don’t think I understand what it’s asking for. Reduce costs by identifying under performing business units and product lines and focusing on them.
Increase sales in an underperforming region by filtering datasets to just that region
Reduce customer complaints by filtering datasets on the best ten and worst ten sites into two separate datasets to allow you to focus on the differences that may be lost in a larger dataset.
Increase sales by comparing high performing regions with low.
Reduce staff absence by identifying trends.
- Data integration techniques can be categorized as follows:
- Manual Integration or the Common User Interface
- Application Based Integration
- Middleware Data Integration
- Virtual Integration or Uniform Data Access
- Physical Data Integration or Common Data Storage
- Explain the Manual data Integration technique and describe the following Common User Interfaces
- Dynamic charts
Manual integration involves a person manually combining, filtering and summarizing data from different systems. Commonly this will be taking spreadsheets, CSV files etc and loading them into a spreadsheet program (e.g. Excel) or desktop database (e.g. Access) then using lookups and formulas to process the data ready for visualization. This is most useful when the analysis is a one off so is not worth the effort of automating and for exploratory or pilot analysis where you are looking for options or the end user is unsure what they want and wants the data analyst to show them the ‘art of the possible’.
A dashboard is a collection of visualizations, typically on a single page, that provides a high level summary view of the data. The visualizations may be collated from one or more separate reports. The aim is to provide an overall indication of the situation, ‘At A Glance’, without going into detail, although an online dashboard may provide the ability to drill down to more detailed visualizations and the underlying data.
A score card is a form of dashboard that compares actual values to a target or predicted value. For example you may not be concerned about the specific sum spent per department each month, just if it is within a given tolerance of the budgeted amount so, if your tolerance is 5%, you might use a RAG status with Green for upto +/- 5% of budget, Amber for +/- 5-10% of budget and Red for +/- over 10% and apply the colours to arrows representing which direction. This quickly draws the readers attention to those areas most in need of attention.
A dynamic chart is a chart that is connected to some sort of data feed and updates either on a set frequency or when triggered by a change in the feed. They may show a current status or a rolling average. These are commonly used in IT to display the status of key servers and network links, often with colour coding to highlight potential problems. Similarly they are often used in call centres to show metrics important to the minute by minute management of the centre. Typically they may show a breakdown of number agents in calls vs in wrap up vs waiting for calls vs logged off, current call volumes vs callers in queue vs callers on hold, current longest call, average call length as a rolling average for the past hour &c.
- Explain the Virtual data integration technique
Virtual Data Integration uses direct links to the source systems, often via views or ‘canned queries’ that connect using database links or an API (Application Programming Interface), to get data for analysis when it is needed. These have the advantage that you are always looking at the most current data and do not need to keep extra copies of the data. Key disadvantages are that you may not be able to view historic data, version control can be challenging as the same report run by different people seconds apart may give different results and running reports will place extra load on the source system. This last can be a major issue on high volume transactional systems as they are tuned for transactional work, typically many small updates with few queries, and those queries are often using indexed key values, where are reporting and analysis will often involve long running queries that often use full table scans.
- Explain the Physical data integration technique and describe the following process
Physical data integration is taking datasets from source systems and moving it to a dedicated reporting/analysis platform, often a datawarehouse or datamart. This process is often referred to as ETL, Extract Transform Load. The Extract phase involves taking a copy of the data from the source systems and putting it in a staging area, the data may be filtered at this phase, commonly to just new data since the last extract as previous data may already be on the target platform. During the Transform phase the data is manipulated to make it more suitable for reporting. This often involves denormalization, creation of summary values and data conversions (e.g. a multinational company may convert all transactions to a single currency and store that alongside the actual transaction value in the original currency). The Load phase involves moving the dataset onto the reporting/analysis platform. This could be database tables, a spreadsheet, an OLAP (OnLine Application Processing) cube. Often this will also involve creating objects such as indexes and materialized views to support queries used in reports.