How to: Query for Duplicate Files in a Directory Tree (LINQ)
This page is specific to .NET Framework version 3.5.

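Updated: November 2007

Sometimes files that have the same name are located in more than one folder. For example, under the Visual Studio installation folder, several folders contain a readme.htm file. This example shows how to query for such duplicate file names under a specified root folder. The second example shows how to query for files whose size and creation time also match.

Example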

Module QueryDuplicateFileNames

    Public Sub Main()

        Dim path As String = "C:\Program Files\Microsoft Visual Studio 9.0\Common7"
        ' Finds files that share a name. Uncomment to run this query instead:
        ' QueryDuplicates1(path)
        ' Finds files that share a name, creation time, and size.
        QueryDuplicates2(path)

    End Sub
    Sub QueryDuplicates1(ByVal root As String)

        Dim duplicates = From aFile In GetFiles(root) _
                                 Order By aFile.Name _
                                 Group aFile By aFile.Name Into newGroup = Group _
                                 Where newGroup.Count() >= 2 _
                                 Select newGroup

        ' Page the display so that the results can be read.
        Dim trimLength = root.Length
        PageOutput(duplicates, trimLength)

    End Sub
    Sub QueryDuplicates2(ByVal root As String)

        ' This time a composite key is used. This sub finds all files
        ' that have been copied into multiple subfolders.
        Dim duplicates = From aFile In GetFiles(root) _
                                 Order By aFile.Name _
                                 Group aFile By aFile.Name, aFile.CreationTime, aFile.Length Into newGroup = Group _
                                 Where newGroup.Count() >= 2 _
                                 Select newGroup

        ' Page the display so that the results can be read.
        Dim trimLength = root.Length
        PageOutput(duplicates, trimLength)

    End Sub
    ' Pages console display for large query results. No more than one group per page.
    ' This sub specifically works with group queries of FileInfo objects
    ' but can be modified for any type.
    Sub PageOutput(ByVal groupQuery, ByVal charsToSkip)

        ' Reserve 3 lines: 1 for a group header (not printed in this sample),
        ' 1 for the "Press any key" prompt, and 1 for the input cursor.
        Dim numLines As Integer = Console.WindowHeight - 3
        ' Flag to indicate whether there are more results to display
        Dim goAgain As Boolean = True

        For Each fg As IEnumerable(Of System.IO.FileInfo) In groupQuery
            ' Start each group of duplicates at the top of a new page.
            Dim currentLine As Integer = 0

            Do While (currentLine < fg.Count())
                Console.Clear()

                ' Get the next page of results
                ' No more than one filename per page
                Dim resultPage = From file In fg _
                                Skip currentLine Take numLines

                ' Execute the query. Trim the paths in the output.
                For Each line In resultPage
                    Console.WriteLine(vbTab & line.FullName.Substring(charsToSkip))
                Next

                ' Advance the current position
                currentLine = numLines + currentLine

                ' Give the user a chance to break out of the loop
                Console.WriteLine("Press any key for next page or the 'End' key to exit.")
                Dim key As ConsoleKey = Console.ReadKey().Key
                If key = ConsoleKey.End Then
                    goAgain = False
                    Exit For
                End If
            Loop
        Next
    End Sub

    ' Function to retrieve a list of files. Note that each FileInfo is a
    ' snapshot of the file information, not a live view of the file on disk.
    Function GetFiles(ByVal root As String) As System.Collections.Generic.IEnumerable(Of System.IO.FileInfo)
        Return From file In My.Computer.FileSystem.GetFiles _
                  (root, FileIO.SearchOption.SearchAllSubDirectories, "*.*") _
               Select New System.IO.FileInfo(file)
    End Function
End Module


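The first query uses a simple key to determine a match; this finds files that have the same name but whose contents might differ. The second query uses a compound key that matches on three properties of the FileInfo object: Name, CreationTime, and Length. This query is much more likely to find files that have the same name and similar or identical content.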
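
The difference between the two keys is easier to see on a small in-memory data set than on a real directory tree. The following is only an illustrative sketch, not part of the original sample: the CompositeKeyDemo module, the FileRecord class, and the sample records are all hypothetical stand-ins for FileInfo objects, and the code assumes a project that references System.Core.dll.

```vb
Option Infer On

Imports System
Imports System.Linq

Module CompositeKeyDemo

    ' Hypothetical stand-in for the FileInfo properties used as keys.
    Class FileRecord
        Public Folder As String
        Public Name As String
        Public Length As Long
        Public CreationTime As DateTime
    End Class

    Sub Main()
        Dim original As New DateTime(2007, 11, 1)
        Dim edited As New DateTime(2008, 2, 15)

        ' Made-up sample data: three files share a name, but only two
        ' of them also share a size and a creation time.
        Dim records() As FileRecord = { _
            New FileRecord With {.Folder = "A", .Name = "readme.htm", .Length = 100, .CreationTime = original}, _
            New FileRecord With {.Folder = "B", .Name = "readme.htm", .Length = 100, .CreationTime = original}, _
            New FileRecord With {.Folder = "C", .Name = "readme.htm", .Length = 250, .CreationTime = edited}, _
            New FileRecord With {.Folder = "A", .Name = "setup.exe", .Length = 5000, .CreationTime = original} _
        }

        ' Simple key: the file name alone.
        Dim byName = From r In records _
                     Group r By r.Name Into grp = Group _
                     Where grp.Count() >= 2 _
                     Select grp

        ' Composite key: name, creation time, and size together.
        Dim byAllThree = From r In records _
                         Group r By r.Name, r.CreationTime, r.Length Into grp = Group _
                         Where grp.Count() >= 2 _
                         Select grp

        For Each nameGroup In byName
            Console.WriteLine("Name only: " & nameGroup.Count() & _
                              " files named " & nameGroup.First().Name)
        Next

        For Each keyGroup In byAllThree
            Console.WriteLine("Composite key: " & keyGroup.Count() & _
                              " files named " & keyGroup.First().Name)
        Next
    End Sub
End Module
```

With this sample data, the name-only query groups all three readme.htm records, while the composite key keeps only the two records from folders A and B, whose size and creation time also agree. Even a composite-key match is a strong hint rather than proof that the contents are identical.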

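Compiling the Code

  • Create a Visual Studio project that targets the .NET Framework version 3.5. By default, the project has a reference to System.Core.dll and a using directive (C#) or imported namespace (Visual Basic) for the System.Linq namespace. In C# projects, add a using directive for the System.IO namespace.

  • Copy this code into your project.

  • Press F5 to compile and run the program.

  • Press any key to exit the console window.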

Robust Programming

For intensive query operations over the contents of multiple types of documents and files, consider using the Windows Desktop Search engine.
