Why and how did you filter the database?
The raw data sheets from the government database can only be downloaded as entire Council’s and includes all EPC data including those that have expired. To cover the whole of Central London, I downloaded the separate data sheets for Southwark, Westminster and City of London. I then had to filter the data by postcode to remove EPC’s for places like Bermondsey that ae not considered to be ‘Central London’. I also filtered by Use Class, specifically B1 to align with my clients assets and ensured that any EPCS prior to August 2014 were removed as these had expired.
What format was the data in?
The original data was a very large excel spreadsheet of raw data that was difficult to read and included masses of data.
I then filtered the data to include relevant data only and put this into simple tables showing the 5 year average and yearly increases from 2014-2024.
How does COUNTIF function work?
It works by calculating how many of a particular selected work or number comes up in a pool of data. For this example I carried out the countif function for each EPC grading from A+ to G and for each year from 2014-2024.
COUNTIF(A1:A100 ‘A’)
- Type the word ‘COUNTIF’ in all capitals
- Highlight the column of data you want to select from
- Then in inverted commas, type the word/number you are looking for.
Once I had this for each year I put this into a table for each year and calculated the percentage of each rating used for each year to establish an average rating.
What did the data tell you about the Central London market?
That the average rating across Central London was a ‘B’ with 57% of EPCs at a B as of August 2024.
Average was a ‘D’ in 2020 with 33% at a D.
How did their portfolio compare?
Below average with an average rating of ‘C’
How did you present your findings?
I consolidated the original three raw data sets into one table
How do you define sensitive data?
What is enterprise-grade security:
Enterprise-grade security refers to a comprehensive, multi-layered strategy and set of tools designed to protect the entire information technology infrastructure of a large organisation (an “enterprise”). It is not a single product, but a holistic approach built to handle the scale, complexity, and high-stakes risk environment that large businesses face.
What were the benefits of using an internal database:
Are there any risks with using cloud servers?
Data breaches, misconfigurations, malware, denial-of-service (DoS) attacks, and data loss
For instance I’m aware that BOX had a data breach a few years ago and I’m aware that many companies, for instance Microsoft, have suffered from cyberattacks.
Did you consider any other methods of transferring data, if so what?
BOX - higher risk of data breach
DropBox - cannot edit once files moved over
WeTransfer - Links expire quickly, timeframes often don’t align with handover period requiring multiple links.
Email - cannot facilitate large files and high risk of interception.
Are there any limitations or disadvantages of using JLL SharePoint?
How does JLL SharePoint comply with GDPR?
✓ Transparency: Clear access permissions show who can view data
✓ Fairness: Controlled sharing prevents unauthorised access
⚠ Lawfulness: Must ensure appropriate legal basis documented for data sharing
✓ Specific folders/sites can be designated for particular handover purposes
✓ Access controls prevent data use beyond intended purpose
⚠ Risk: Broad SharePoint access could enable purpose creep
✓ Selective sharing - only relevant documents/folders shared
✓ Granular permissions prevent access to unnecessary data
⚠ Challenge: Easy to over-share entire document libraries
✓ Version control ensures latest, accurate data shared
✓ Real-time updates maintain data accuracy
✓ Single source of truth prevents outdated copies
✓ Automated retention policies can delete data after specified periods
✓ Access removal post-handover supports time-limited processing
⚠ Default settings may retain data longer than necessary
✓ Enterprise encryption protects data in transit and at rest
✓ Access controls maintain confidentiality
✓ Audit logs support integrity monitoring
✓ Multi-factor authentication strengthens security
✓ Comprehensive audit trails demonstrate compliance measures
✓ Permission management shows data protection controls
✓ Integration with JLL policies supports accountability framework
✓ Activity monitoring enables compliance demonstration
Overall Assessment: SharePoint generally supports GDPR principles when properly configured, but requires active management of permissions, retention, and purpose limitation to ensure full compliance.
What other data analysis methods could you have employed for this EPC analysis?
There are much more advanced options available including:
- software such as Phython or SQL or Tableau - but expensive, would require training as I haven’t used them before and also unnecessary for the scope of the task.
- AI - subject to error, should proceed with caution. AI continues to develop but definitely would not have been reliable in August 2024 and no audit trail.
Other excel functions such as:
- pivot tables (overly complex for the task and more suited to data visualisation rather that statistical analysis) the client needed simple percentage breakdowns of EPC ratings, not dynamic cross-tabulation capabilities)
- VLOOKUP/INDEX-MATCH: These lookup functions are designed for linking data between tables, not performing statistical analysis. They would only be relevant if combining EPC data with external property databases
- Statistical Functions (STDEV, PERCENTILE, QUARTILE): EPC ratings are categorical data (A, B, C, D, E, F, G), not continuous numerical data suitable for standard deviation or percentile analysis. These functions assume numeric data distributions, but EPC ratings are ordinal categories.
SUMPRODUCT for Multiple Criteria: While SUMPRODUCT offers more sophisticated conditional counting, the analysis primarily involved single-criterion counts (buildings with rating A, B, C, etc.) within each filtered dataset.
COUNTIF was suitable because the primary objective was establishing frequency distributions of EPC ratings rather than complex statistical relationships. The client needed straightforward comparative data for portfolio.
Excel with COUNTIF functions provided adequate analytical capability without requiring specialized software training or additional licensing costs. This matched the project scope and client expectations.
With categorical data (EPC ratings A-G), frequency counting was more appropriate than continuous variable analysis methods. The dataset size was manageable within Excel’s capabilities.
COUNTIF provides transparent, auditable calculations that clients can verify independently if required. Specialist software often provides the result but not the audit trail or formula applied which makes it harder to spot errors upon review.
How did you ensure data quality when removing duplicates from the three datasets?
How did you do it?
Step 1: Data Sorting for Visual Inspection Before using automated tools, sorting data by key fields (like address, then postcode, then lodgement date) groups potential duplicates together. This allows you to scan through and spot patterns - for example, seeing “123 High Street, London, EC1” appearing multiple times consecutively. This visual review helps identify whether apparent duplicates are genuine (same property, same assessment) or legitimate separate entries (same property but different floors, dates, or assessment purposes).
Step 2: Conditional Formatting for Duplicate Highlighting Excel’s conditional formatting can highlight duplicate values across selected columns. You select your data range, go to Home > Conditional Formatting > Highlight Cell Rules > Duplicate Values. This colours all cells containing duplicate information, making it immediately visible which records share common elements. You can apply this to individual columns (addresses, postcodes) or combinations to see overlapping data patterns before making removal decisions.
Step 3: Remove Duplicates Function Located under Data > Remove Duplicates, this tool allows you to specify which columns Excel should consider when identifying duplicates. You can select all columns (requiring exact matches across every field) or specific key columns (like address + floor area). Excel then shows you how many duplicates it found and removes them automatically, but only after you’ve verified through the previous steps that the removal criteria are appropriate.
Why This Layered Approach Works: The manual inspection prevents errors that automated removal might make. For instance, two EPC certificates for the same building might be legitimate if they cover different floors or time periods, but automated removal might incorrectly eliminate one. By sorting and highlighting first, you can identify these edge cases and either adjust your removal criteria or manually preserve records that appear duplicate but serve different purposes.
The EPC data came from a public government database. What data protection considerations did you need to account for when processing and presenting this information to your client?
I considered Crown Copyright as although the data is publicly available, there are restrictions on how the data can be used, processed and published.
To ensure adherence, I checked the licencing terms on the government website.
Open Government Licence (OGL): All data fields other than the address and postcode data (address, address 1, address 2, address 3, postcode) are licensed under the OGL. This allows you to re-use the information (excluding logos) in any format or medium, including for commercial purposes, with attribution.
Address and Postcode Data: The address and postcode data contain intellectual property rights belonging to Ordnance Survey and/or Royal Mail Group Limited.
You are permitted to use this address data for the specific purposes outlined in the copyright and database right notice on the Open Data Communities website (e.g., for research related to energy efficiency).
To reduce risk I ensure I was only provided key statistics to my client and not the property addresses or postcode.
How did you verify the accuracy and reliability of the government EPC database before basing your client advice upon it?
I am aware that there is an Opt-out Clause: Property owners or occupiers have the right to “opt out” of having their property’s EPC data publicly disclosed. If a property is opted-out, its data will be removed from public registers and future bulk data releases, and you must not use it.
Also, sometimes they are lodged incorrectly, e.g. with the wrong postcode.
Therefore, I caveated to my client that the data may not be 100% representative of the whole Central London office market. However, this was also the most reliable source available.