After reading another blog post about cross-tenant quirks of sensitivity-labeled content, I was curious to dig into the metadata of .docx files to see if I could remove a sensitivity label.
In today’s world full of Copilot and AI, Microsoft recommends classifying and labeling data that should not end up in training data. In such cases it may not feel necessary to encrypt every single file, but does that leave data exposed if the label can simply be removed?
For example, a confidential document may be labeled accordingly, with a DLP policy applied to prevent someone from uploading it to ChatGPT. Let’s see if that can be bypassed:
Create a .docx and apply a basic label that doesn’t encrypt the file (doesn’t have the little lock icon):

Download the file so I can confirm in Word that the label has been applied:


Extract the .docx file using any well-known file archive tool such as 7-Zip:

As we can see, a typical .DOCX file is made up of a bunch of .xml files. Since this is an unencrypted file, this is all plaintext. The labeling information can be found under \docProps\custom.xml:

As part of the label information, the SiteId matches the tenant ID.
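If you prefer scripting over a file archive tool, the label properties can also be listed with a few lines of Python. This is a minimal sketch using only the standard library. The function names are made up for illustration, and the `MSIP_Label_` prefix is an assumption about how the label properties are named, so adjust it to whatever your own custom.xml shows.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the <Properties>/<property> elements in docProps/custom.xml
CUSTOM_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/custom-properties}"


def read_custom_properties(docx_path):
    """Return {name: value} for every custom property in the document."""
    with zipfile.ZipFile(docx_path) as zf:
        # Match the part name case-insensitively
        part = next(
            (n for n in zf.namelist() if n.lower() == "docprops/custom.xml"),
            None,
        )
        if part is None:
            return {}
        root = ET.fromstring(zf.read(part))

    props = {}
    for prop in root.iter(CUSTOM_NS + "property"):
        # The value sits in the first child element (for example vt:lpwstr)
        value = next((child.text for child in prop), None)
        props[prop.get("name")] = value
    return props


def label_properties(docx_path, prefix="MSIP_Label_"):
    """Filter custom properties by an assumed label prefix (hypothetical default)."""
    return {
        name: value
        for name, value in read_custom_properties(docx_path).items()
        if name and name.startswith(prefix)
    }
```

Calling `label_properties("labeled.docx")` on an unencrypted, labeled file should return the same entries visible in custom.xml, including the SiteId.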
Now simply delete \docProps\custom.xml:

Finally, re-zip everything at the root again, and rename .zip to .docx – the label is now removed, and data is intact.
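The extract, delete and re-zip steps can be reproduced in one pass for testing your own policies. The sketch below is an assumption-laden equivalent of the manual steps, not an official tool: `strip_custom_properties` is a hypothetical helper name, and it copies every entry of the archive except docProps/custom.xml into a new file. Like the manual approach, it leaves every other part of the package untouched.

```python
import zipfile


def strip_custom_properties(src_path, dst_path):
    """Copy a .docx to dst_path, leaving out docProps/custom.xml.

    Returns the list of entry names that were skipped.
    """
    skipped = []
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            # Compare case-insensitively, since tools differ in casing
            if item.filename.lower() == "docprops/custom.xml":
                skipped.append(item.filename)
                continue
            # Reuse the original ZipInfo so names and timestamps carry over
            dst.writestr(item, src.read(item.filename))
    return skipped
```

An empty return value means the file had no custom.xml to begin with, which is worth checking before concluding anything about a DLP test.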



Note that the data was not corrupted, but the Confidential\Anyone label was removed, and Purview is now auto-labeling to a default General\All Employees label instead.
No justification was required, whereas a user who is allowed to change the label would otherwise be prompted for one:

Does this bypass DLP Policies?
If technically anyone can remove a label (as long as the file is unencrypted) – would that bypass a DLP policy configured for that particular label? Let’s test:

In the above DLP policy, a rule is created on the previously used Confidential\Anyone label to prevent copying to a removable USB device or network location.
When testing this on the workstation with the correct label applied:

I’m blocked from doing so.
However, taking the same steps as described above to remove the label:

I’m now able to copy the file without intervention from DLP. Whoops!
Co-authoring changes the label metadata
Depending on whether co-authoring for files with sensitivity labels is enabled for the tenant, the way some of the metadata is stored changes slightly. You can read more about this here.
So, if you’ve been following along and can’t find the labels in \docProps\, but do see an additional \docMetadata\ folder, this is probably why. Turning this setting on is recommended, since it allows multiple users to edit a labeled and encrypted document at the same time.
This blog post won’t cover this scenario; expect an update in the future with some further testing.
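To see quickly which of the two layouts a given file uses, the archive listing is enough. This is a small sketch with a hypothetical function name. It only reports whether docProps/custom.xml exists and whether anything sits under a docMetadata folder. It does not parse either one, and the presence of custom.xml alone does not prove a label is set.

```python
import zipfile


def label_metadata_locations(docx_path):
    """Report which candidate label locations exist inside the archive."""
    with zipfile.ZipFile(docx_path) as zf:
        names = [n.lower() for n in zf.namelist()]
    return {
        "docProps/custom.xml": "docprops/custom.xml" in names,
        "docMetadata/": any(n.startswith("docmetadata/") for n in names),
    }
```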
To Encrypt or not?
The fix to get around this is to enable encryption. Once you extract a file that’s encrypted, it will look like this:

There’s no way to figure out where the label is anymore, and this problem goes away.
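The difference is visible before extracting anything. An unencrypted .docx is a plain ZIP archive, while an encrypted Office document is wrapped in an OLE compound file, so the first bytes of the file already tell the two apart. The sketch below uses a hypothetical function name and only inspects the file signature. A result of "ole-compound" on a .docx suggests encryption, but it is a hint rather than proof, since legacy .doc files use the same container.

```python
ZIP_MAGIC = b"PK\x03\x04"                           # regular ZIP local file header
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")       # OLE compound file signature


def container_type(path):
    """Classify a file as 'zip', 'ole-compound' or 'unknown' by its signature."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head == OLE_MAGIC:
        return "ole-compound"
    return "unknown"
```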
But should you encrypt ALL files? I would argue probably not.
Encrypting the files using Purview will lock you into the Microsoft ecosystem even further. You won’t be able to pull out any encrypted files without the help of Purview, which becomes especially relevant in recovery or migration scenarios. While there are ways to mass-decrypt files using PowerShell (and similarly to remove labels), it’s an additional step that needs to be considered and planned for.
This is especially relevant in the current political landscape, so weigh it carefully before mass-applying encryption to all your labels.