Page 1 of 1

Making searchable SQL-dB of breached passwords

Posted: 2019-02-27 08:43am
by The Infidel
So, a few weeks ago, another breach was announced on https://haveibeenpwned.com/.

I checked my email addresses there, and sure thing, my main address was leaked. This, however, does not necessarily mean that my passwords have been breached, and thankfully, you can check it on the site above, and the process is probably fine. Your password is never sent to the site, but the first letters of the sha-1 hash and all hashes with those letters are then downloaded to your computer for a check.

However, if you're a bit paranoid, like me, this just won't do it. 8)
I made a local database to check my passwords offline. They can, of course, be breached but not be in this collection, but if it is in this collection, change it immediately. SHA-1 is now considered unsafe and it might only be a matter of time before it is being brute forced. This collection does not have login info connected to it as a security measure, but other lists do.

Make sure to have at least 250Gb free space on the drive you're using for the database. Database takes around 40Gb and log file will be up to 190Gb, but can be truncated later.

So... I downloaded and installed MS SQL Server Developer (free for non commercial single use), then SSMS to have the coding environment.

Then I downloaded the SHA-1 password base (9.8 Gb compressed, 24Gb uncompressed. (https://downloads.pwnedpasswords.com/pa ... hash-v4.7z) and unzipped it with 7-zip.

The rest was code:

First, to build the database:

Code: Select all

-- Make database

USE [master]
GO
drop database if exists passord
-- Location of DB. Make sure you have around 250Gb
    CREATE DATABASE [passord] ON  PRIMARY 
    ( NAME = N'passord_Data', FILENAME = N'C:\tmp\passord_Data.mdf' , SIZE = 20GB , MAXSIZE = 100GB, FILEGROWTH = 16384KB )
     LOG ON 
    ( NAME = N'passord_Log', FILENAME = N'C:\tmp\passord_Log.ldf' , SIZE = 1Gb , MAXSIZE = 250GB , FILEGROWTH = 16384KB )
    GO

use passord

drop table if exists dbo.collection1
create table dbo.collection1
(
pwd char(40) not null, -- always $40
ant int
) on [PRIMARY]
with (data_compression = page)

--Optimize for reading a lot of data
ALTER DATABASE passord SET RECOVERY SIMPLE;

--Now, bulk insert of total 550 000 000 #.
--7C4A8D09CA3762AF61E59520943DC26494F8941B:23174662
BULK INSERT dbo.collection1 FROM 'c:\tmp\pwned-passwords-sha1-ordered-by-hash-v4.txt'  
 WITH 
 (
 DATAFILETYPE    = 'char',
 FIELDTERMINATOR = ':',
 ROWTERMINATOR   = '\n'
 );
 
--Index. This is _very_ important. By making the index after the table, things should go faster
-- File is already sorted on pwd.
create unique clustered index idx on dbo.collection1(pwd) with (data_compression = page) on [PRIMARY];

--Checking
select top 100 * from passord.dbo.collection1
This might take a while. On my i7-4790K with 32Gb RAM and Samsung Evo 970 500Gb drive it took around 30 minutes.
Now, you can truncate the log. You really don't need it anymore.

Then the code to check my passwords locally. I do this by creating a table with my passwords, then encrypts them with SHA-1 and checks it against dbo.collection1. This is where you really see the difference between indexed and non indexed tables. This code takes less than a second to run:

Code: Select all

use passord

drop table if exists #PwdList
create table #PwdList
(
MyPwd varchar(100),
MyPwdH  char(40)
)

insert into #PwdList (MyPwd)
Values 
 ('password'),
 ('Password'),
 ('qwerty'),
 ('abc123'),
 ('123456'),
 ('abcdefgh')
 -- your passwords here. Make sure you don't have online backup tools that backs up the table or program.
;

update #PwdList
  Set MyPwdH = upper( sys.fn_varbintohexsubstring(0, HashBytes('SHA1', MyPwd), 1, 0));

--select * from #PwdList

select top 1000 * from passord.dbo.collection1 C
   inner join #PwdList M on C.pwd = M.MyPwdH
   order by ant desc

The result (I used some bleeding obvious passwords that people use for reference. "qwerty" is used 3 810 555 times according to this password breach.

Code: Select all

pwd	ant	MyPwd	MyPwdH
7C4A8D09CA3762AF61E59520943DC26494F8941B	23174662	123456	7C4A8D09CA3762AF61E59520943DC26494F8941B
B1B3773A05C0ED0176787A4F1574FF0075F7521E	3810555	qwerty	B1B3773A05C0ED0176787A4F1574FF0075F7521E
5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8	3645804	password	5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8
6367C48DD193D56EA7B0BAAD25B19455E529F5EE	2834058	abc123	6367C48DD193D56EA7B0BAAD25B19455E529F5EE
8BE3C943B1609FFFBFC51AAD666D0A04ADF83C9D	117316	Password	8BE3C943B1609FFFBFC51AAD666D0A04ADF83C9D
425AF12A0743502B322E93A015BCF868E324D56A	58653	abcdefgh	425AF12A0743502B322E93A015BCF868E324D56A
I guess I didn't have to post this, but the inner nerd in me drives me, and I know there are others here.