Digital File Formats: The always up-to-date guide for conversion projects
Scanning documents may sound like a boring project. But think of how liberating it can be to have years' worth of documents available at the click of a button. And consider the prospect of not having to rely on searching out papers from folders stored in dark, dingy storage areas. You'll be surprised at how excited you feel about your upcoming document digitization project!
But before you start on a digital conversion project, it's important to deliberate on the digital file format in which you want to save the scanned files.
Many organizations spend an immense amount of time and effort discussing the timelines and budgets of a scanning services project.
But they don't have clarity or consensus on the output format they will need.
For example, do you want to view (and maybe have the option to edit) information using word processing software such as Microsoft Word? Then, you need the output files in the .DOC or .DOCX format. On the other hand, if you simply want to be able to view the text and keep the format universal and neutral, then the .PDF format is a better choice.
Similarly, some image-based files may be best saved in formats like .JPG, which is ideal for photographic quality images such as posters, paintings, or even maps. On the other hand, schematics, circuit diagrams, and engineering drawings need high-resolution scanning and the .TIFF file format is good for such detailed images.
Decisions regarding the file format of the output files are critical and must be addressed early in the project planning stage.
These decisions will impact your team's ability to work without compatibility issues between the file formats and the software they use.
If you don't have a plan in place and your team faces file format compatibility issues later, it will hamper operations, and you may have to spend extra time and effort in file format conversion.
What is a digital file?
A digital file stores information in a sequence of 'bytes' that is accessible or readable via a software program. Software applications interpret the binary data from these files as text characters, image pixels, or audio samples.
A digital file has an extension at the end of the filename, such as .DOCX or .PDF. The extension signifies the format of the file and governs which applications you can use to view or edit it.
So depending on what software applications you use, you will need to save the scanned files in a digital format that is compatible with the applications you will be using. If not, you may have to convert file types before viewing or modifying them.
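As an illustration of how software can recognize a format from a file's bytes rather than from its extension, here is a minimal Python sketch. The function names are hypothetical, and the signature table covers only a few of the formats discussed in this guide; a real tool would check many more.

```python
# Leading "magic" bytes for a few of the formats discussed in this guide.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\xff\xd8\xff", "JPG"),
    (b"II*\x00", "TIFF"),  # little-endian byte order
    (b"MM\x00*", "TIFF"),  # big-endian byte order
    (b"PK\x03\x04", "ZIP container (used by DOCX, PPTX, XLSX)"),
]


def sniff_format(header: bytes) -> str:
    """Guess a format from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first few bytes of a file on disk and guess its format."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```

A check like this can catch a file whose extension does not match its actual content, for example a JPG that was renamed to .PDF.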
You can set access permissions for each digital file. You can also view the file's characteristics, such as file size (also known as file weight, measured in KB, MB, or GB), file creation date, and last modification date. Digital files can be read, modified, and copied.
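These characteristics can also be read programmatically. The sketch below uses Python's standard library; the function name `describe_file` is hypothetical. It leaves out the creation date because operating systems report that value differently.

```python
from datetime import datetime
from pathlib import Path


def describe_file(path: str) -> dict:
    """Return the name, size in KB, and last-modified time of a file."""
    file_path = Path(path)
    info = file_path.stat()
    modified = datetime.fromtimestamp(info.st_mtime)
    return {
        "name": file_path.name,
        "size_kb": round(info.st_size / 1024, 1),
        "modified": modified.isoformat(timespec="seconds"),
    }
```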
What are digital file formats?
Now that we've answered the question of 'what is a digital file,' let's talk about file formats and look at some digital file examples.
Every digital file has a file format, as indicated by its filename extension. In the context of a document digitization project, a digital file format is the output file or deliverable you want to receive at the end of your document scanning project.
It is essential to clearly state which digitized file format you wish to receive at the end of your scanning project, as it has an impact on the project scope, the scanning process details (such as whether you require OCR or not), and the post-scan processing applied to the output files. For example, image corrections are done on file formats such as .JPG or .TIFF.
Digital file types: Every output option available
You can choose to save the scanned output files in any of the standard digital formats listed below. If your business uses proprietary software that requires a specific file format or has to adhere to specific compliances that mandate a particular file type, be sure to mention the details to your scanning project manager at the planning stage.
Let's look at some file format definitions and the characteristics of some commonly used file types:
● PDF
PDF (Portable Document Format) is widely used because it is a near-universal format: free PDF readers are available for practically every device and operating system. The PDF format can also include metadata that is hidden from the viewer but contains keywords describing the file's content. These keywords make the file searchable via a document management system. PDF is the ideal output format if your documents contain a combination of formatted text and images, for example an instruction manual with text and illustrations. It is also a good choice for documents like invoices and contracts containing information that needs to be readily searchable with text or keywords.
● PDF/A
PDF/A is a subtype of the PDF format. It is specifically developed for archiving and long-term 'as-is' preservation of documents. The format is designed to prohibit reliance on external resources (fonts, for example, must be embedded in the file) and to prevent any dynamic changes to the document over time. It ensures that content stays unmodified over an extended period, and documents saved in PDF/A will be rendered in a consistent and predictable way in the future.
Other specialized subtypes exist as well: PDF/E is intended for exchanging engineering and technical documentation, and PDF/X for exchanging print-ready graphics and printed material.
An excellent example of PDF/A usage is insurance policy documents that must be retained without any modification to the information for several decades.
● TIF
TIF (or TIFF) stands for "Tagged Image File Format." This format is specifically meant for raster-based images. TIF files can be compressed to smaller file weights without any loss in image quality (lossless compression). So if you have a large number of records, it is efficient to store them in the TIF format, which can be compressed and requires less space for data storage.
Moreover, the TIFF format is suitable for multi-page documents, as it allows you to add and remove pages. This can help when individual pages must be removed from a record to meet healthcare privacy guidelines.
For example, it is ideally suited for saving employee HR documents as you can add more pages to the employee's TIFF file as the employee submits more documents.
Many companies also prefer the TIFF format over PDF because a TIFF stores each page as a fixed image of the original, whereas the text in a standard PDF can be edited with widely available software.
However, remember that if you want TIFF files to be searchable, you'll also have to store a separate text file containing the metadata (except if you are using a Document Management System).
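To illustrate the separate-text-file approach, here is a minimal Python sketch. It assumes a hypothetical naming convention in which each TIFF has a text file of the same name beside it (invoice_001.tif and invoice_001.txt); the function names are illustrative only and are not part of any particular product.

```python
from pathlib import Path


def sidecar_for(image_path: Path) -> Path:
    """Hypothetical convention: invoice_001.tif -> invoice_001.txt."""
    return image_path.with_suffix(".txt")


def search_scans(folder: str, keyword: str) -> list[str]:
    """Return names of .tif files whose sidecar text mentions the keyword."""
    needle = keyword.lower()
    matches = []
    # Only the ".tif" spelling is matched here; ".tiff" would need a second pattern.
    for image in sorted(Path(folder).glob("*.tif")):
        sidecar = sidecar_for(image)
        if not sidecar.exists():
            continue  # no text file, so this image is not searchable
        if needle in sidecar.read_text(encoding="utf-8").lower():
            matches.append(image.name)
    return matches
```

A Document Management System does the same job at scale by keeping the text and keywords in its own index instead of in loose files.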
● JPG
JPG (or JPEG), named after the Joint Photographic Experts Group that created it, is an image file format that uses 'lossy compression' to reduce file size. This means that the compression results in a loss in image quality. However, at typical settings the loss is minute enough to go unnoticed by the viewer and does not affect the file's content.
The JPG format is ideal for digital photographs or web-based images where reducing file weight is essential. It is not usually the best choice for text-heavy documents. But you can use it effectively to store marketing brochures, company logos, and website images.
● Other file format definitions
DOCX, PPTX, and other Microsoft Office files are suitable for documents that require updates or collaborative editing. However, they're not recommended for historical or archived documents that should not be modified, and certainly not for files containing personal or sensitive information!
Digital image format types
If you choose to save the output files in an image format, you will also need to decide on the image options you need:
● Bi-tonal or Black & White
A bi-tonal image means that each pixel is either black or white. These black & white images are a good choice if you want to keep the file size small. They are ideal when you want to store large volumes of images, and the information in the files doesn't require full color to accurately reproduce the original document. They help save the costs of digital storage space. Choose bi-tonal images only if your original documents are clear and in good physical condition, so the scanned images are legible.
● Grayscale
Grayscale images reproduce the original document using many shades of gray, not just black and white pixels like in bi-tonal images. Handwritten notes or complex markings like ink stamps or seals can be reproduced with higher clarity if you use grayscale images instead of bi-tonal ones. However, the output files will be larger. So, this format is a good choice if you have sufficient digital storage space or only a small volume of images to store.
● Full-color
Color scanning gives you an authentic, lifelike representation of the original document as it retains all the original colors, just like a photograph. Be aware that full-color images take up the most space of the three types discussed above. So, choose this format only if you truly need the full-color representation of the original. For example, use full-color to scan student records if you want to retain red correction marks on exam sheets: the red mark will not be seen as 'red' in a bi-tonal or grayscale image, and it may lose its significance.
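To give a rough sense of how the three options compare in storage terms, the Python sketch below estimates the uncompressed size of one scanned page. The bit depths (1 bit per pixel for bi-tonal, 8 for grayscale, 24 for full-color) are common values assumed here for illustration, and the function name is hypothetical. Real files are usually much smaller once compression is applied.

```python
# Bits stored per pixel for the three scan modes (assumed, common values).
BITS_PER_PIXEL = {"bi-tonal": 1, "grayscale": 8, "full-color": 24}


def uncompressed_bytes(width_px: int, height_px: int, mode: str) -> int:
    """Raw image size in bytes, before any compression is applied."""
    total_bits = width_px * height_px * BITS_PER_PIXEL[mode]
    return (total_bits + 7) // 8  # round up to whole bytes


# A US Letter page (8.5 x 11 inches) scanned at 300 DPI is 2550 x 3300 pixels.
for mode in BITS_PER_PIXEL:
    size_mb = uncompressed_bytes(2550, 3300, mode) / 1_000_000
    print(f"{mode:<11} {size_mb:5.1f} MB")
```

Under these assumptions, a grayscale page holds 8 times the raw data of a bi-tonal page, and a full-color page 24 times.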
Digital file format resolution guide
High-res or high-resolution is a term we often hear when we refer to high-quality images.
'High resolution' simply means more detail in an image. Resolution is measured as the number of 'pixels per inch' (PPI) or 'dots per inch' (DPI) in an image.
The standard resolution we use for most scanning projects is 300 DPI.
But we can opt for a resolution ranging anywhere from 200 DPI to 1200 DPI.
The higher the resolution you use for scanning, the heavier the output file.
Choose a high scanning resolution (600 DPI and above) if you need to reprint the scanned image or want to scan full-color photographs or important documents containing photos (such as a passport or ID card).
300 to 600 DPI is usually sufficient for most use cases.
Low-res images (200 DPI) are usually used on websites or other online channels when you want fast download times, and you can sacrifice some quality for a smaller size.
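The link between resolution and file weight follows from simple arithmetic: the pixel count of a scan is the page size in inches multiplied by the DPI in each direction. The Python sketch below works this out for a US Letter page (8.5 x 11 inches, an assumed page size) at the resolutions mentioned above; the function name is hypothetical.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


for dpi in (200, 300, 600, 1200):
    width_px, height_px = pixel_dimensions(8.5, 11, dpi)
    megapixels = width_px * height_px / 1_000_000
    print(f"{dpi:>4} DPI: {width_px} x {height_px} pixels ({megapixels:.1f} megapixels)")
```

Because the DPI applies to both width and height, doubling the resolution quadruples the number of pixels, which is why high-resolution scans become heavy so quickly.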
Delivery options for digital files
Delivery options refer to how you want to receive the output files. For example, do you want the scanning company to load the digital files on an external drive or USB? Or transfer them electronically to your server or intranet using FTP (File Transfer Protocol)? You may also opt to deploy a Document Management System (if you don't already have one) and route the digitized files into it for cloud storage.
● USB
USB, or Universal Serial Bus, is a 'big' name for a simple device: the USB drive. It is a convenient and widely used way to share or transfer digital files, and you can specify that you want us to deliver the scanned output files via a USB device. Since it is a small device, it can also be sent to your premises via courier. If your data is sensitive, a USB drive can also be encrypted for data security.
● FTP
You can also ask for your digitized files to be delivered via FTP.
FTP stands for File Transfer Protocol. Using this method, we upload the files over the internet to a secure FTP site and provide you with login credentials to access and download them.
● Cloud storage
Cloud file storage is ideal for sharing files and working collaboratively on documents from different locations. It is a good choice for hybrid workplaces, distributed teams, and off-site locations. It provides a centralized digital repository that is accessible and secure. It is also one of the most cost-efficient ways of storing digital files and enabling automated and frequent backups. Many applications like Dropbox and Google Drive provide basic file-sharing features with cloud storage for documents. But RDS offers a far better solution with OpenText AppEnhancer, a document management system that can be deployed on the cloud and provides powerful workflow automation for businesses of any scale.
Choosing the right digital file format for your needs
As you can see, many factors must be considered before deciding on the right format for saving your scanned files. The choice will depend on your industry, the type of data contained in your documents, the volume of documents you need to scan, and how you want to use the output files.
Talk to our document scanning experts to discuss your business requirements, and they can recommend the best format and delivery options for your scanning project.
Should I speak with digital transformation experts?
Scanning business documents is usually the first step in a digital transformation project. And even if you don't have a transformative initiative outlined yet, you can be sure that the next steps after document digitization will lead you to further automation that puts you on the path toward digitally transforming your business operations.
Talk to RDS if you have questions about scanning documents. Thousands of organizations trust us to meet their digital transformation goals—starting with the very first steps to digitization.
Our digital transformation consultants are just a call away.
Connect with RDS, and we'll be happy to set up a consultation.