DataScience Workbook / 07: Data Acquisition and Wrangling / 2. Data manipulation / 2.1. Manipulating Excel data sheets / 2.1.1. Create worksheet from multiple text files

Introduction

This tutorial has been upgraded to incorporate Python-based automation approach, suplementing the foundational guide on Excel macros originally created by Arun Seetharam.

TASK or CHALLENGE:
If you have a large number of text files that you need to import as separate worksheets into Excel, this tutorial will show you how to automate the process, making it faster and less prone to errors.

Import multiple text files as separate worksheets in Excel

Managing and analyzing data spread across multiple text files can be a cumbersome and time-consuming task, especially when each file needs to be imported as a separate worksheet in an Excel workbook.
Manually importing each file one by one is not only tedious but also prone to errors.
This tutorial provides a step-by-step guide to automate the process.

By the end of this tutorial, you will be able to import a large number of text files into Excel efficiently, with each file placed in its own worksheet and labeled based on the file name.

A. Create macro in Excel (Windows and macOS)

This section will guide you through the process of using Excel macros to automate the import of multiple text files into Excel.
By using macros, you can streamline the process, save time, and minimize the risk of errors (compared to manual data entry).
Easier for advanced Excel users, requiring no additional software or programming knowledge (beyond VBA coding).
Less transferable across different operating systems compared to the Python-based approach.

Excel macros are sequences of instructions recorded to automate repetitive tasks within Excel, enhancing efficiency and consistency. Activating the Developer tab in Excel is necessary to create, edit, and run these macros, as it provides the tools and interface for macro management and VBA (Visual Basic for Applications) coding.

  1. Microsoft Excel: Before you begin, ensure you have macros enabled in your Excel tool.

    Windows:
    1. Open Excel: Start Excel and open a new, empty workbook.
    2. Access Excel Options:
      • Click on the File tab in the top-left corner.
      • From the menu, select Options. This will open the Excel Options dialog box.
    3. Customize Ribbon:
      • In the Excel Options dialog box, click on Customize Ribbon from the list on the left.
      • On the right side, you’ll see a list of the main tabs. Look for the Developer checkbox.
      • Check the box next to Developer.
    4. Apply and Close: Click OK to apply the changes and close the Excel Options dialog box.
      The Developer tab should now be visible in the Excel ribbon.
    macOS:
    1. Open Excel: Start Excel and open a new, empty workbook.
    2. Access Preferences:
      • Click on Excel in the menu bar at the top of the screen.
      • Select Preferences from the dropdown menu.
    3. Ribbon & Toolbar:
      • In the Excel Preferences dialog box, click on Ribbon & Toolbar. (see an image in the next step)
    4. Customize Ribbon:
      • In the Customize Ribbon section, you’ll see a list of tabs. Look for the Developer checkbox.
      • Check the box next to Developer.
    5. Apply and Close: Close the Ribbon & Toolbar preferences dialog box.
    6. Enable Full Disk Access for Excel To ensure that Excel macros work correctly and can manage inputs and outputs on macOS, you need to grant Excel full disk access.
      • Open System Preferences.
      • Go to Security & Privacy.
      • Click the Privacy tab.
      • In the left sidebar, select Full Disk Access.
      • Click the lock icon in the bottom left corner and enter your password to make changes.
      • Click the + button and add the Microsoft Excel application to the list.
  2. Text Files: A folder containing all the text files you want to import.
    The files can be in any text format such as .txt, .tsv, .csv, etc., as long as they contain plain structured text separated with a unified delimiter. The VBA code for a macro can be adjusted to handle these different file formats appropriately.

1. Prepare your text files

A detailed guide for preparing text files, along with a set of simple example files, is provided in the Organize Your Text Files section (of the Python-based automation approach), and you can use the same settings for this Excel macros approach.

Note that the file name of the text file will be used to label the worksheet (tab), without the .txt extension.

2. Add VBA code of the macro

  1. Open Excel app (graphical user interface).
  2. Click on Developer tab in the Excel wirndow and then Macros button.
  3. In the pop-up window, type in some name for a new macro (e.g., import_text) and click create or + (on macOS). The new pop-up window with Microsoft Visual Basic will appear.
  4. MS Visual Basic window: Paste the below code in the Code panel on the right:

    The VBA code for this macro differs between the requirements for Excel on Windows and macOS. Please ensure you select the version that matches your operating system.
    -for Windows: Ensure that you have the Scripting.FileSystemObject available for file operations.
    -for macOS: The script uses MacScript for folder selection.

    VBA code for Windows:

    code originally by @Arun, upgraded by @Alex

     Sub CombineTextFiles()
         Dim FilesToOpen As Variant
         Dim x As Integer
         Dim wkbAll As Workbook
         Dim wkbTemp As Workbook
         Dim ws As Worksheet
         Dim sDelimiter As String
    
         On Error GoTo ErrHandler
         Application.ScreenUpdating = False
    
         sDelimiter = "," ' Change delimiter to comma
    
         FilesToOpen = Application.GetOpenFilename _
           (FileFilter:="Text Files (*.txt), *.txt", _
           MultiSelect:=True, Title:="Text Files to Open")
    
         If TypeName(FilesToOpen) = "Boolean" Then
             MsgBox "No Files were selected"
             GoTo ExitHandler
         End If
    
         x = 1
         Set wkbTemp = Workbooks.Open(FileName:=FilesToOpen(x))
         wkbTemp.Sheets(1).Copy
         Set wkbAll = ActiveWorkbook
         wkbTemp.Close (False)
         Set ws = wkbAll.Worksheets(x)
         ws.Columns("A:A").TextToColumns _
           Destination:=ws.Range("A1"), DataType:=xlDelimited, _
           TextQualifier:=xlDoubleQuote, _
           ConsecutiveDelimiter:=False, _
           Tab:=False, Semicolon:=False, _
           Comma:=True, Space:=False, _
           Other:=False, OtherChar:=sDelimiter
         x = x + 1
    
         While x <= UBound(FilesToOpen)
             Set wkbTemp = Workbooks.Open(FileName:=FilesToOpen(x))
             With wkbAll
                 wkbTemp.Sheets(1).Move After:=.Sheets(.Sheets.Count)
                 Set ws = .Worksheets(x)
                 ws.Columns("A:A").TextToColumns _
                   Destination:=ws.Range("A1"), DataType:=xlDelimited, _
                   TextQualifier:=xlDoubleQuote, _
                   ConsecutiveDelimiter:=False, _
                   Tab:=False, Semicolon:=False, _
                   Comma:=True, Space:=False, _
                   Other:=False
             End With
             x = x + 1
         Wend
    
     ExitHandler:
         Application.ScreenUpdating = True
         Set wkbAll = Nothing
         Set wkbTemp = Nothing
         Exit Sub
    
     ErrHandler:
         MsgBox Err.Description
         Resume ExitHandler
     End Sub
    
    VBA code for macOS:

    code by @Alex

     Sub CombineTextFilesFromFolder()
         Dim FolderPath As String
         Dim CorrectedFolderPath As String
         Dim FileName As String
         Dim FileContent As String
         Dim TextFile As Integer
         Dim NewWorkbook As Workbook
         Dim ws As Worksheet
         Dim LineText As String
         Dim LineArray As Variant
         Dim i As Integer
    
         ' Select folder using MacScript
         On Error Resume Next
         FolderPath = MacScript("(choose folder with prompt ""Select Folder Containing Text Files"") as string")
         On Error GoTo 0
    
         If FolderPath = "" Then
             MsgBox "No folder was selected."
             Exit Sub
         Else
             ' Remove trailing colon and correct path format
             If Right(FolderPath, 1) = ":" Then FolderPath = Left(FolderPath, Len(FolderPath) - 1)
             FolderPath = Replace(FolderPath, ":", "/")
             If Left(FolderPath, 12) = "Macintosh HD" Then
                 CorrectedFolderPath = Mid(FolderPath, 13)
             Else
                 CorrectedFolderPath = FolderPath
             End If
         End If
    
         ' Create a new workbook
         Set NewWorkbook = Workbooks.Add
    
         ' Loop through each text file in the folder
         FileName = Dir(CorrectedFolderPath & "/*.txt")
         If FileName = "" Then
             MsgBox "No text files found in the selected folder."
             Exit Sub
         End If
    
         Do While FileName <> ""
             ' Open the text file and read its content
             TextFile = FreeFile
             Open CorrectedFolderPath & "/" & FileName For Input As TextFile
    
             FileContent = ""
             Do While Not EOF(TextFile)
                 Line Input #TextFile, LineText
                 FileContent = FileContent & LineText & vbCrLf
             Loop
             Close TextFile
    
             ' Add a new sheet to the new workbook
             Set ws = NewWorkbook.Sheets.Add(After:=NewWorkbook.Sheets(NewWorkbook.Sheets.Count))
             ws.Name = Left(FileName, Len(FileName) - 4)
    
             ' Split file content into rows and write to cells
             LineArray = Split(FileContent, vbCrLf)
             For i = LBound(LineArray) To UBound(LineArray)
                 ws.Cells(i + 1, 1).Value = LineArray(i)
             Next i
    
             ' Convert text to columns using comma delimiter
             ws.Columns("A:A").TextToColumns _
                 Destination:=ws.Range("A1"), DataType:=xlDelimited, _
                 TextQualifier:=xlDoubleQuote, _
                 ConsecutiveDelimiter:=False, _
                 Tab:=False, Semicolon:=False, _
                 Comma:=True, Space:=False, _
                 Other:=False
    
             ' Get the next file name
             FileName = Dir
         Loop
    
         ' Remove default Sheet1 if it exists
         On Error Resume Next
         Application.DisplayAlerts = False
         NewWorkbook.Sheets("Sheet1").Delete
         Application.DisplayAlerts = True
         On Error GoTo 0
    
     ExitHandler:
         Application.ScreenUpdating = True
         Set NewWorkbook = Nothing
         Exit Sub
    
     ErrHandler:
         MsgBox "Error: " & Err.Description
         Resume ExitHandler
     End Sub
    

    Now, you can simply close the window with the macro code. It is saved automatically.

    You are all set to import files!

    You can adjust both macros to handle text files with different file extensions and data delimiters by modifying the appropriate sections of the macro code.

    windows macOS
    FilesToOpen = Application.GetOpenFilename _ (FileFilter:="Text Files (*.txt), *.txt", _
    FileName = Dir(CorrectedFolderPath & "/*.txt")
    ws.Columns("A:A").TextToColumns _ Destination:=ws.Range("A1"), DataType:=xlDelimited, _ TextQualifier:=xlDoubleQuote, _ ConsecutiveDelimiter:=False, _ Tab:=False, Semicolon:=False, _ Comma:=True, Space:=False, _ Other:=False, OtherChar:=sDelimiter ws.Columns("A:A").TextToColumns _ Destination:=ws.Range("A1"), DataType:=xlDelimited, _ TextQualifier:=xlDoubleQuote, _ ConsecutiveDelimiter:=False, _ Tab:=False, Semicolon:=False, _ Comma:=True, Space:=False, _ Other:=False

3. Run the macro

Excel spreadsheet: Go back to your Excel spreadsheet window.

Click on Macros again. This time, the pop-up Macro window has a list of your added macros. Select the one you’ve just created. Then, click the Run button. A window should pop up asking location of the text files. Navigate to the files location and select all text files. Click OK. All text files will be imported to a new spreadsheet. It might take a while to complete, if there are large number of files, so be patient.

B. Python-based automation

This section will walk you through the process of using Python to automate the import of multiple text files into Excel.
This approach will save you time and reduce the risk of errors compared to manual data entry.
It is a universal, cross-platform solution (Windows, macOS, Linux).
It is a command-line solution (Python script executed in the terminal window).

Before you begin, ensure you have the following:

  1. Microsoft Excel: Make sure you have Excel installed on your computer.
  2. Python: This tutorial uses Python to automate the process.
    Ensure you have Python installed. You can download it from python.org.
    If you are starting your Python adventure from scratch, it is worthwhile to develop your skills gradually strengthening them over time. In that case, it’s best to follow this tutorial in order.
  3. Libraries: You will need the following Python libraries:
    • pandas: For data manipulation and analysis. …from the tutorial in this workbook
    • openpyxl: For Excel file operations. …from the documentation

    You can install these libraries using pip (once you have Python installed):

     pip install pandas openpyxl
    
  4. Text Files: A folder containing all the text files you want to import.
    The files can be in any text format such as .txt, .tsv, .csv, etc., as long as they contain plain structured text separated with a unified delimiter. The Python script can be adjusted to handle these different file formats appropriately.

1. Organize your text files

Ensure all the text files you want to import are stored in a single directory.
For example, let’s say your files are stored in a folder named text_files.

text_files
    |- file1.txt
    |- file2.txt
    | ...

Here is a set of example text files that you can use for this exercise:

data_CA.txt data_IA.txt data_IL.txt data_OH.txt data_WI.txt
Name, Age, City Alice, 30, Los Angeles Bob, 25, San Francisco Charlie, 35, San Diego Name, Age, City David, 40, Des Moines Emma, 22, Cedar Rapids Frank, 29, Davenport Name, Age, City George, 33, Chicago Hannah, 27, Springfield Ian, 31, Naperville Name, Age, City Jack, 28, Columbus Kathy, 26, Cleveland Larry, 34, Cincinnati Name, Age, City Mike, 32, Milwaukee Nina, 23, Madison Oscar, 29, Green Bay

Each file contains sample data with a common structure for simplicity. You can place these contents into their respective .txt files in the text_files directory.

File Naming: Ensure your text files have unique names, as these will be used to label the worksheets.

Sheet Name Length: Excel has a limit of 31 characters for sheet names.
Ensure your file names (excluding the .txt extension) do not exceed this limit.

2. Write the Python script

Create a new Python script (e.g., import_txt_to_excel.py). In terminal you can use the following command:

touch import_txt_to_excel.py

Open this script in your preferred development environment (e.g., VSC or Atom) or simple code editor in terminal (e.g., nano or vim).

nano import_txt_to_excel.py

Then, follow the steps below to write all sections of the script.

  1. Import Necessary Libraries
    First, import the necessary libraries:
    import os
    import pandas as pd
    from openpyxl import Workbook
    
  2. Set the variables (paths and input delimiter)
    Set the path to the directory containing your text files and the expected location for outputs:
    directory = 'path/to/your/text_files'        # Adjust directory with input TXT files
    output_path = 'path/to/your/output.xlsx'     # Adjust output path and XLSX filename
    

    Set the delimiter used in your input files:

    sep = ','                                    # Adjust delimiter if necessary
    

    Delimiter: Adjust the delimiter parameter to match the delimiter used in your text files (e.g., comma, tab, semicolon). It is used in the next part of the script in the pd.read_csv() method.

    Set the extension of your text files:

    ext = '.txt'                                 # Adjust extension of your text files
    
  3. Initialize a New Workbook
    Initialize a new Excel workbook and save it at the output_path location in your file system:
    if not os.path.exists(output_path):
     workbook = Workbook()
     workbook.save(output_path)
    
  4. Loop Through Each Text File
    Loop through each text file in the directory, read the content and add it as a new worksheet in the workbook:
    with pd.ExcelWriter(output_path, engine='openpyxl', mode='a') as writer:
     # Loop through each text file in the directory
     for filename in os.listdir(directory):
         if filename.endswith(ext):
             filepath = os.path.join(directory, filename)
             try:
                 # Read the text file into a DataFrame
                 df = pd.read_csv(filepath, delimiter=sep)
                 # Remove the .txt extension for the sheet name
                 sheet_name = os.path.splitext(filename)[0]
                 # Add the DataFrame to the workbook as a new worksheet
                 df.to_excel(writer, sheet_name=sheet_name, index=False)
             except Exception as e:
                 print(f"Error processing file {filename}: {e}")
    
     # Remove the default sheet if it's still present and empty
     if 'Sheet' in writer.book.sheetnames and len(writer.book.sheetnames) > 1:
         std = writer.book['Sheet']
         writer.book.remove(std)
    

    NOTE: The workbook is saved automatically when the with block-code is exited.

    Error Handling: The error handling catches any exceptions that occur while processing a file and prints an error message specifying the file name and the error details. Consider adding error handling to manage other potential issues, such as reading files or writing to Excel.

3. Run the script

Here is the complete script:

filename: import_txt_to_excel.py

code by @Alex

import os
import pandas as pd
from openpyxl import Workbook

# Set variables: paths (directory input and output file) and input delimiter
directory = 'path/to/your/text_files'        # Adjust directory with input TXT files
output_path = 'path/to/your/output.xlsx'     # Adjust output path and XLSX filename
sep = ','                                    # Adjust delimiter if necessary
ext = '.txt'                                 # Adjust extension of your text files

#------ AUTOMATICALLY BELOW ------#
# Create an empty Excel file if it doesn't exist
if not os.path.exists(output_path):
    workbook = Workbook()
    workbook.save(output_path)

# Use ExcelWriter to manage writing DataFrames to the workbook
with pd.ExcelWriter(output_path, engine='openpyxl', mode='a') as writer:
    # Loop through each text file in the directory
    for filename in os.listdir(directory):
        if filename.endswith(".txt"):
            filepath = os.path.join(directory, filename)
            try:
                # Read the text file into a DataFrame
                df = pd.read_csv(filepath, delimiter=sep)

                # Remove the .txt extension for the sheet name
                sheet_name = os.path.splitext(filename)[0]

                # Add the DataFrame to the workbook as a new worksheet
                df.to_excel(writer, sheet_name=sheet_name, index=False)
            except Exception as e:
                print(f"Error processing file {filename}: {e}")

    # Remove the default sheet if it's still present and empty
    if 'Sheet' in writer.book.sheetnames and len(writer.book.sheetnames) > 1:
        std = writer.book['Sheet']
        writer.book.remove(std)
  1. Open Command Prompt or Terminal
    Navigate to the directory where you saved import_txt_to_excel.py using the command prompt or terminal.

  2. Run the Script
    Execute the script by typing the following command and pressing Enter:
     python import_txt_to_excel.py
    
  3. Locate and check the output file
    After the script finishes running, navigate to the directory specified in output_path. Open the generated output.xlsx file in Excel. You should see a separate worksheet for each text file, labeled with the file name (without the .txt extension). the preview of the output Excel file