CSV, Excel, or SQL? It Doesn't Matter with PowerShell

Structured data is stored in many different ways. It could be in a CSV file, an Excel workbook, a SQL database, a MySQL database, and so on. Each has its own purpose with individual strengths and weaknesses. When managing this data, you typically have to interface with it in completely different ways. A CSV file can be managed simply in Notepad, an Excel workbook requires having Excel installed, and you’d need to fire up SQL Server Management Studio to interface with a SQL database. But what if you didn’t have to? What if you had a single place where you could manage this data, and the tool you used would automatically take care of all the translation in the background? With PowerShell, you can.

Perhaps you have a CSV file that contains employee demographic data. You need to get this information into a SQL database table. How would you do this? You could try to manually copy and paste fields from the CSV into the SQL table, but you’d die before you finished. You could also try to export the CSV and import the data into the SQL database in one big chunk, but then you’d have to worry about SQL data types, matching up fields, and so on. It’d be nice to simply do something like Get-Employee -DataSource CSV | New-Employee -DataSource SQL. If the SQL table and CSV fields are preconfigured, you shouldn’t need to worry about matching them up. There are many other things that must happen in order to make this transition, but you shouldn’t have to worry about matching fields. Here’s how.

The trick is creating individual functions for the Get/Set/Add/Remove functions for each of the data sources you’ll be using. For example, if you need to manipulate data in a CSV file, you may have Get-CsvRow, Set-CsvRow, Add-CsvRow, and Remove-CsvRow. The same goes for a SQL table, Excel workbook, and so on. It’s all about compartmentalizing the messy code inside of these functions. Once this happens, you don’t have to worry about how the data gets manipulated in each data source.

I’ve created a starter module to help you get started. I’ve only built out Get-CsvRow and Add-SqlRow in order to demonstrate the concept. Please continue building this to make it more robust. Here’s an example of what you can do with this concept.

In this example, I’m querying rows from a CSV file. Then using PowerShell’s pipeline, I'm creating a SQL INSERT string from the CSV row’s values. The string is being echoed out here to the console, but inside the script, it has code to actually run this query on the SQL server.

Notice how this function can be coded so it only specifies the bare minimum of attributes required to get the data in the CSV into the SQL database. I’m not fooling around with loading SQL modules, tweaking Where-Object filters, or anything like that. I’m simply using Get-CsvRow and feeding that output directly into Add-SqlRow.

This module can be used for multiple purposes. In this instance, I’m using it as a syncing tool. But it could just as easily be used to query multiple data sources in a single script and then merge them into some common format, for example. The concept behind this module will work for numerous tasks when you need to manipulate data from one or more data sources.

If you have a specific kind of data you’re working with, such as employees, I encourage you to create separate PowerShell modules using the same naming convention I have here. This way, the syntax will be even simpler to understand. As mentioned previously, instead of using Get-CsvRow/New-SqlRow, you could have code like Get-Employee -DataSource CSV | New-Employee -DataSource SQL, if coded correctly.

Could you use something like this starter module for a current project? Please download it and start experimenting. Mold it to your needs. Just be sure to write your scripts in a way that can be reused. That way, you'll be able to use this powerful tool with every project that requires managing data from various data sources.

Comments

Plain text