JSI Tip 3530. How do I remove duplicate records from a file?


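To remove duplicate records from a file, we must first sort it, so that identical records end up on adjacent lines. If you want to write the least amount of code, you could include the following in your batch file, where Input_File and Output_File stand for the names of your own files: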

For /F "tokens=*" %%s in ('Sort^<Input_File') Do @Find "%%s" Output_File||@echo %%s>>Output_File

While the above statement might be elegant, it executes much more slowly than SortDup.bat, because it runs Find against the whole output file once for every record. The syntax for using SortDup.bat is:

SortDup FileName

where FileName is the full path to the file you want to sort and remove duplicate records from. Example:

SortDup "C:\Documents and Settings\Jerry\My Documents\MyFile.TXT"
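
If SortDup.bat is run from inside another batch file, it needs to be started with call, otherwise control does not return to the calling script. The following is only a minimal sketch: names.txt is a hypothetical file name, and it assumes SortDup.bat is in the current folder or on the PATH. It also assumes the current folder is not %TEMP%, because SortDup.bat keeps its work file there under the same file name.

```batch
@echo off
rem Hypothetical caller script; names.txt is an illustrative file name.
rem Assumes SortDup.bat is in the current folder or on the PATH,
rem and that the current folder is not the TEMP folder.
set list=names.txt

rem Build a small test file that contains one duplicate record.
>%list% echo pear
>>%list% echo apple
>>%list% echo pear

rem CALL runs SortDup.bat and then returns control to this script.
call SortDup %list%

rem Show the sorted, de-duplicated file.
type %list%
```

Because SortDup.bat deletes the input file and rewrites it in place, keep a copy of any file you cannot recreate before trying this.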

SortDup.bat contains:

@echo off
setlocal
if {%1} EQU {} goto syntax
if not exist %1 goto syntax
set file=%1
set file="%file:"=%"
set work=%TEMP%\%~nx1
set work="%work:"=%"
set work=%work:\\=\%
sort %file% /O %work%
del /f /q %file%
for /f "Tokens=*" %%s in ('type %work%') do set record=%%s&call :output 
endlocal
goto :EOF
:syntax
@echo ***************************
@echo Syntax: SortDup Input_File 
@echo ***************************
endlocal
goto :EOF
:output
if not defined prev_rec goto write
if "%record%" EQU "%prev_rec%" goto :EOF
:write
rem Redirection is placed first so a record ending in a lone digit is not read as a handle number.
>>%file% echo %record%
set prev_rec=%record%

NOTE: See tip 3538 for an amended batch file.



