Managing metadata is an important part of document automation, especially for applications that work with different document formats. In this article, we will look at how to manage metadata in Python applications, with a focus on extracting, modifying, and removing metadata.

In this guide, we will explore the following sections:

  1. What is GroupDocs.Metadata and how does it help?
  2. How to install
  3. Quick start guide
  4. Key Features
  5. Common Use Cases
  6. Getting Started with a Free Trial
  7. See Also

What is GroupDocs.Metadata and how does it help?

GroupDocs.Metadata for Python via .NET is a powerful library designed to help developers manage metadata across various document formats in Python applications. While Python provides native methods to extract basic metadata such as file size, type, and attributes, these methods fall short when dealing with more complex metadata associated with specific file formats.
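
For comparison, here is a minimal sketch of what the standard library alone can report about a file. The helper name `basic_file_info` and the keys of the returned dictionary are illustrative choices, not part of any library. Note that `mimetypes.guess_type` looks only at the file extension, so nothing here reads metadata stored inside the file.

```python
import mimetypes
import os
from datetime import datetime, timezone


def basic_file_info(path):
    """Return the file-system level details Python exposes without extra libraries."""
    stats = os.stat(path)
    # The MIME type is guessed from the extension only; the content is not inspected.
    mime_type, _ = mimetypes.guess_type(path)
    return {
        "size_bytes": stats.st_size,
        "modified_utc": datetime.fromtimestamp(stats.st_mtime, tz=timezone.utc),
        "mime_type": mime_type,
    }
```

Calling `basic_file_info("image.jpg")` would give the size, modification time, and a guessed MIME type, but no author, title, or EXIF data.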

GroupDocs.Metadata fills this gap by letting you retrieve, update, and remove metadata from PDFs, Microsoft Office documents, images, audio, video, and many other file types. It also exposes detailed information such as the author and title, as well as format-specific properties such as GPS location data. With a straightforward API, it covers many document automation and data processing scenarios.

How to install

GroupDocs.Metadata for Python via .NET is available as a pip package. To add it to your Python project, run the following command:

pip install groupdocs-metadata-net
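
To confirm that the install succeeded, one option is to ask Python's packaging metadata for the installed version. This is a small sketch using only the standard library; `installed_version` is a hypothetical helper name, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="groupdocs-metadata-net"):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string, or None when pip has not installed the package.
print(installed_version())
```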

Quick start guide

Getting started with GroupDocs.Metadata for Python via .NET is quick and straightforward with our examples project. Here’s a simple guide to help you set up:

  1. Clone the Repository: Clone the example repository to your local machine.

    git clone https://github.com/groupdocs-metadata/GroupDocs.Metadata-for-Python-via-.NET.git
    
  2. Navigate to the Examples Folder:

    cd ./GroupDocs.Metadata-for-Python-via-.NET/
    
  3. Install Necessary Packages:

    pip install groupdocs-metadata-net
    
  4. Run the Examples:

    python run_examples.py
    

This will execute a series of example scripts demonstrating various functionalities of the GroupDocs.Metadata library.

Key Features

GroupDocs.Metadata for Python via .NET is equipped with a range of features that make it a useful API for developers dealing with document metadata. Here are some of the key features:

  • Rich Set of Metadata Management Features: Read, update, and remove metadata from various document formats.
  • Targeted Property Manipulation: Search, update, and delete specific metadata properties that meet certain criteria.
  • Unified Property Management: Utilize tags for easy manipulation of common metadata properties in a consistent manner.
  • Support for Password-Protected Documents: Work with documents that are secured, ensuring flexibility even with sensitive data.
  • Insight into Document Properties: Extract detailed information about hidden pages, digital signatures, user comments, revisions, and more.
  • Compatibility with Popular Standards: Work with leading metadata standards such as IPTC, XMP, EXIF, and Image Resources.
  • Multimedia Metadata Handling: Manipulate native metadata properties in various formats and extract technical information from multimedia files.
  • Document Statistics Calculation: Calculate common document statistics like word count and character count for your documents.
  • File Format Detection: Identify the format and MIME type of a file based on its internal structure.
  • Support for Audio Tags: Manage various audio tags, including ID3, Lyrics, and APE.
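
To illustrate what detecting a format "based on its internal structure" means, the sketch below checks the leading bytes of a file against a few well-known signatures. It shows the general idea only and is not how GroupDocs.Metadata is implemented; `SIGNATURES` and `sniff_mime_type` are hypothetical names, and the table covers just a handful of formats.

```python
# Leading byte signatures for a few well-known formats (illustrative subset).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_mime_type(data):
    """Guess a MIME type from the first bytes of a file, ignoring its name."""
    for signature, mime_type in SIGNATURES:
        if data.startswith(signature):
            return mime_type
    return None  # unknown signature
```

Because the check reads content rather than the file name, a PDF renamed to `report.txt` would still be recognized as a PDF.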

Common Use Cases

Below are some common use cases for GroupDocs.Metadata in Python applications, along with corresponding code snippets.

Use Case 1: Retrieve metadata from an image

In this example, we will fetch metadata properties of an image file (e.g., image.jpg) based on specific criteria.

  1. Initialize the Metadata Object: Create a new instance of the Metadata class with your file.
  2. Define Search Criteria: Use a search specification to find particular properties.
  3. Fetch and Display Metadata: Retrieve the matching properties and log their names and values.

import groupdocs.metadata as gm

def run():
    with gm.Metadata("image.jpg") as metadata:
        # Proceed only if the file format is known and the document is not encrypted
        if metadata.file_format != gm.common.FileFormat.UNKNOWN and not metadata.get_document_info().is_encrypted:
            # Define a search specification that matches all properties
            specification = gm.search.AnySpecification()
            # Find properties based on the search specification
            properties = metadata.find_properties(specification)
            # Iterate through the found properties and print their names and values
            for prop in properties:
                # Prefer interpreted_value, which handles complex data types
                if prop.interpreted_value is not None:
                    print(f"Property name: {prop.name}, Property value: {prop.interpreted_value}")
                else:
                    print(f"Property name: {prop.name}, Property value: {prop.value}")

if __name__ == "__main__":
    run()
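
Building on the loop above, one possible way to collect the results instead of printing them is sketched below. `properties_to_dict` is a hypothetical helper, not part of the library. It assumes only the `name`, `value`, and `interpreted_value` attributes used in the example, and the sample objects are stand-ins so the snippet runs without GroupDocs.Metadata installed.

```python
from types import SimpleNamespace


def properties_to_dict(properties):
    """Map property names to interpreted values, falling back to raw values."""
    result = {}
    for prop in properties:
        interpreted = prop.interpreted_value
        result[prop.name] = interpreted if interpreted is not None else prop.value
    return result


# Stand-in objects that mimic the three attributes used above (made-up sample data).
sample = [
    SimpleNamespace(name="Author", value="Jane", interpreted_value=None),
    SimpleNamespace(name="Orientation", value=1, interpreted_value="Horizontal"),
]
print(properties_to_dict(sample))
```

If a file contains several properties with the same name, later ones overwrite earlier ones in this sketch, so a list of pairs may be a better fit in that case.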

Use Case 2: Clean up metadata from a document

Here we will clean up unwanted metadata from a PDF document.

  1. Load the PDF: Create a Metadata object for the PDF file.
  2. Sanitize Metadata: Remove detected metadata packages.
  3. Save the Cleaned Document: Save the sanitized document to a new file.

import groupdocs.metadata as gm

def run():
    with gm.Metadata("input.pdf") as metadata:
        # Remove detected metadata packages and log the properties removed
        affected = metadata.sanitize()
        print(f"Properties removed: {affected}")
        # Save the cleaned PDF to a new file, leaving input.pdf unchanged
        metadata.save("output.pdf")

if __name__ == "__main__":
    run()
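
When cleaning more than one file, it helps to derive the output name from the input so that originals are never overwritten. The sketch below is one way to do that, under stated assumptions: `cleaned_path` and `sanitize_file` are hypothetical helper names, the `.cleaned` naming scheme is arbitrary, and the library calls are limited to those shown in the example above.

```python
from pathlib import Path


def cleaned_path(source):
    """Build a sibling path such as report.cleaned.pdf for report.pdf."""
    source = Path(source)
    return source.with_name(f"{source.stem}.cleaned{source.suffix}")


def sanitize_file(source):
    """Sanitize one file and save the result next to it; returns the new path."""
    # Imported lazily so cleaned_path() can be used without the library installed.
    import groupdocs.metadata as gm

    target = cleaned_path(source)
    with gm.Metadata(str(source)) as metadata:
        metadata.sanitize()
        metadata.save(str(target))
    return target
```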

Use Case 3: Update document metadata

This example demonstrates how to update the copyright metadata of your documents, for example when a new year begins. The process involves three main steps:

  1. Load the Document: Initialize a Metadata object for the document file.
  2. Set New Properties: Update the copyright metadata of the document.
  3. Save the Updated Document: Save the changes to the file.

import os
import groupdocs.metadata as gm

# Placeholder folders; replace them with your own paths
input_dir_path = "./input/"
output_dir_path = "./output/"

def run():
    os.makedirs(output_dir_path, exist_ok=True)
    for file in os.listdir(input_dir_path):
        with gm.Metadata(os.path.join(input_dir_path, file)) as metadata:
            if metadata.file_format != gm.common.FileFormat.UNKNOWN and not metadata.get_document_info().is_encrypted:
                print()
                print(file)

                # Define a search specification to find the copyright property
                specification = gm.search.ContainsTagSpecification(gm.tagging.Tags.legal.copyright)
                # Set the copyright property to a new value
                affected = metadata.set_properties(specification, gm.common.PropertyValue("Copyright (C) 2011-2025 Your Company. All Rights Reserved."))
                print(f"Affected properties: {affected}")
                # Save under the original file name so files that share an extension do not overwrite each other
                metadata.save(os.path.join(output_dir_path, file))

if __name__ == "__main__":
    run()
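
Since the motivating case is a yearly update, the notice text can be generated rather than hard-coded. The following is a small sketch using only the standard library; `copyright_notice` is a hypothetical helper, and the wording of the notice simply mirrors the string used in the example above.

```python
from datetime import date


def copyright_notice(first_year, holder, current_year=None):
    """Build a notice such as 'Copyright (C) 2011-2025 Your Company. All Rights Reserved.'"""
    if current_year is None:
        current_year = date.today().year
    if current_year < first_year:
        raise ValueError("current_year must not precede first_year")
    # Use a single year when the range would be empty, e.g. '2025' instead of '2025-2025'.
    years = str(first_year) if current_year == first_year else f"{first_year}-{current_year}"
    return f"Copyright (C) {years} {holder}. All Rights Reserved."
```

The returned string can then be wrapped in `gm.common.PropertyValue(...)` and passed to `set_properties` as in the example.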

Getting Started with a Free Trial

Without a license, the library runs in trial mode, which lets you explore its basic features. You can download GroupDocs.Metadata for Python via .NET from the GroupDocs Releases page.

Additionally, you can request a temporary license from GroupDocs. The temporary license removes the trial restrictions so that you can fully explore the library's capabilities.

See Also

To delve deeper and gain more insights about GroupDocs.Metadata and its functionalities, refer to the following resources: