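1. Ask the developers to remove the CAPTCHA, or to provide a universal CAPTCHA that always passes
2. Recognize it automatically with OCR
OCR-based recognition is generally not very accurate, but it is good enough for simple CAPTCHAs.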
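Raw OCR accuracy on CAPTCHAs tends to be modest. A common trick, which is not part of the original workflow, is to convert the cropped image to pure black and white before handing it to Tesseract. The following is a minimal sketch that uses only java.awt.image. The class name CaptchaBinarizer and the threshold value are assumptions; tune the threshold for your own CAPTCHA.

```java
import java.awt.image.BufferedImage;

/**
 * Hypothetical preprocessing helper (not from the original article):
 * turns an image into pure black and white using a fixed luminance threshold.
 */
final class CaptchaBinarizer {

    /** Returns a new image; pixels darker than threshold become black, the rest white. */
    static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the ITU-R BT.601 luma weights
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, luma < threshold ? 0x000000 : 0xFFFFFF);
            }
        }
        return out;
    }
}
```

For example, `ImageIO.write(CaptchaBinarizer.binarize(ImageIO.read(in), 128), "png", out)` writes a cleaned-up copy. A fixed threshold works best when the CAPTCHA has little background noise.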
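Here we use Tesseract-OCR, which can be downloaded from: https://github.com/A9T9/Free-Ocr-Windows-Desktop/releases
How do you use it?
Go to the installation directory and run:
tesseract.exe test.png test -1
Here test.png is the input image and test is the output base name. Tesseract appends .txt to that name, so the recognized text ends up in test.txt.
Next, prepare a web page that displays the CAPTCHA: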
<html>
<head><title>Table test by Young</title></head>
<body>
  <br>
  <h2> Test </h2>
  <img src="http://csujwc.its.csu.edu.cn/sys/ValidateCode.aspx?t=1">
  <br>
</body>
</html>
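The complete code later in this post opens this page from a hard-coded file:///C:/Users/validation.html path. As an alternative, you can write the page to disk from Java and derive its file URL with Path.toUri(), which avoids building the URL by hand. This is only a hedged sketch. The helper class TestPageWriter and the file name validation.html are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: writes the CAPTCHA test page so WebDriver can open it via file://. */
final class TestPageWriter {

    // Same markup as the page above, using <br> rather than the invalid </br>
    static final String PAGE_HTML =
            "<html><head><title>Table test by Young</title></head><body>"
            + "<br><h2> Test </h2>"
            + "<img src=\"http://csujwc.its.csu.edu.cn/sys/ValidateCode.aspx?t=1\">"
            + "<br></body></html>";

    /** Writes validation.html into dir and returns a URL string for driver.get(...). */
    static String writePage(Path dir) throws IOException {
        Path page = dir.resolve("validation.html");
        Files.write(page, PAGE_HTML.getBytes(StandardCharsets.UTF_8));
        return page.toUri().toString();
    }
}
```

The returned string, for example `file:///C:/Users/validation.html` on Windows, can be passed straight to `driver.get(...)`.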
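To recognize a CAPTCHA, you first have to capture it. Here we do that by screenshotting just the page element. First, take a screenshot of the whole page.
Then find the element's coordinates and crop that region out of the screenshot: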
/**
 * Takes a screenshot of a single element.
 *
 * @param driver  the WebDriver instance
 * @param element the element to capture (here, the CAPTCHA image)
 * @param path    where to save the cropped PNG
 * @throws InterruptedException if the sleep is interrupted
 */
public static void screenShotForElement(WebDriver driver, WebElement element,
        String path) throws InterruptedException {
    // Screenshot of the whole visible page
    File scrFile = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
    try {
        // Position and size of the element on the page
        Point p = element.getLocation();
        int width = element.getSize().getWidth();
        int height = element.getSize().getHeight();
        Rectangle rect = new Rectangle(width, height);
        // Cut the element's region out of the full screenshot
        BufferedImage img = ImageIO.read(scrFile);
        BufferedImage dest = img.getSubimage(p.getX(), p.getY(),
                rect.width, rect.height);
        ImageIO.write(dest, "png", scrFile);
        Thread.sleep(1000);
        FileUtils.copyFile(scrFile, new File(path));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
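The method above assumes two things: the element's CSS coordinates match the screenshot's pixel coordinates, and the element lies fully inside the screenshot. Neither holds on a HiDPI display or when the element is partly scrolled out of view. BufferedImage.getSubimage throws a RasterFormatException when the requested region extends beyond the image, so the sketch below clips the region first.

This is a minimal sketch, assuming you know the page's devicePixelRatio (for example 2.0 on a HiDPI screen). It scales the element rectangle by that ratio and clamps it to the screenshot. ElementCropper is a hypothetical helper, not part of Selenium.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical helper: crops an element region out of a full-page screenshot. */
final class ElementCropper {

    /**
     * @param page             full-page screenshot
     * @param cssRect          element bounds in CSS pixels (location + size)
     * @param devicePixelRatio assumed screenshot pixels per CSS pixel
     */
    static BufferedImage crop(BufferedImage page, Rectangle cssRect, double devicePixelRatio) {
        // Convert CSS pixels to screenshot pixels
        int x = (int) Math.round(cssRect.x * devicePixelRatio);
        int y = (int) Math.round(cssRect.y * devicePixelRatio);
        int w = (int) Math.round(cssRect.width * devicePixelRatio);
        int h = (int) Math.round(cssRect.height * devicePixelRatio);

        // Clamp to the image so getSubimage never gets an out-of-range region
        Rectangle bounds = new Rectangle(0, 0, page.getWidth(), page.getHeight());
        Rectangle clipped = bounds.intersection(new Rectangle(x, y, w, h));
        if (clipped.isEmpty()) {
            throw new IllegalArgumentException("element lies outside the screenshot");
        }
        return page.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height);
    }
}
```

With Selenium, you could build the rectangle as `new Rectangle(p.getX(), p.getY(), width, height)` from `element.getLocation()` and `element.getSize()`.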
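Once the element has been cropped, call Tesseract-OCR to turn it into text:
// use Tesseract to get strings
Runtime rt = Runtime.getRuntime();
Process process = rt.exec("cmd.exe /C tesseract.exe D:\\Tesseract-OCR\\test.png D:\\Tesseract-OCR\\test -1 ");
// wait until Tesseract has finished writing test.txt
process.waitFor();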
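Runtime.exec starts the process and returns immediately. If you read test.txt right afterwards without waiting, you may run before Tesseract has written the file. The sketch below is a hedged alternative. It uses ProcessBuilder, passes the arguments separately so no cmd.exe wrapper is needed, and blocks on waitFor().

It assumes tesseract is on the PATH and uses the minimal `tesseract <image> <outputbase>` form. TesseractRunner is a hypothetical name.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: runs the Tesseract CLI and waits for it to exit. */
final class TesseractRunner {

    /** Builds the argument list; each element is passed as-is, without a shell. */
    static List<String> buildCommand(String executable, String imagePath, String outputBase) {
        return Arrays.asList(executable, imagePath, outputBase);
    }

    /** Runs Tesseract and returns the text file it writes (outputBase + ".txt"). */
    static File run(String imagePath, String outputBase)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand("tesseract", imagePath, outputBase));
        pb.inheritIO(); // show Tesseract's own messages in this console
        Process process = pb.start();
        int exitCode = process.waitFor(); // block until Tesseract has finished
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return new File(outputBase + ".txt");
    }
}
```

For example, `TesseractRunner.run("D:\\Tesseract-OCR\\test.png", "D:\\Tesseract-OCR\\test")` returns the File for `D:\Tesseract-OCR\test.txt` once recognition is complete.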
Next, read the resulting txt file from Java:
/**
 * Reads a TXT file and prints it line by line.
 *
 * @param filePath path of the text file
 */
public static void readTextFile(String filePath) {
    String encoding = "UTF-8"; // Tesseract writes its text output as UTF-8
    File file = new File(filePath);
    if (file.isFile()) { // check that the file exists
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), encoding))) {
            String lineTxt;
            while ((lineTxt = bufferedReader.readLine()) != null) {
                System.out.println(lineTxt);
            }
        } catch (IOException e) {
            System.out.println("Error while reading the file");
            e.printStackTrace();
        }
    } else {
        System.out.println("File not found");
    }
}
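Printing the file is enough for a demo. To type the result into a form, though, you usually want a single clean string. This is a minimal sketch, assuming the CAPTCHA consists only of letters and digits. It reads Tesseract's UTF-8 output with java.nio.file.Files and drops every other character, such as spaces, line breaks, and the form-feed character Tesseract may append at the end of a page. OcrTextReader is a hypothetical helper.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: reads Tesseract output and keeps only letters and digits. */
final class OcrTextReader {

    static String readCaptcha(Path txtFile) throws IOException {
        // Tesseract's text output is UTF-8
        String raw = new String(Files.readAllBytes(txtFile), StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (char c : raw.toCharArray()) {
            // Assumption: the CAPTCHA is alphanumeric, so everything else is OCR noise
            if (Character.isLetterOrDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, `OcrTextReader.readCaptcha(Paths.get("D:\\Tesseract-OCR\\test.txt"))` returns the recognized code as one string, ready for `sendKeys`.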
The complete code is as follows:
package com.dbyl.tests;

import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.concurrent.TimeUnit;

import javax.imageio.ImageIO;

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.Point;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

import com.dbyl.libarary.utils.DriverFactory;

public class TesseractTest {

    public static void main(String[] args) throws IOException,
            InterruptedException {

        WebDriver driver = DriverFactory.getChromeDriver();
        // set the timeout before loading, so it applies to this page load
        driver.manage().timeouts().pageLoadTimeout(30, TimeUnit.SECONDS);
        driver.get("file:///C:/Users/validation.html");
        WebElement element = driver.findElement(By.xpath("//img"));

        // take screen shot for element
        screenShotForElement(driver, element, "D:\\Tesseract-OCR\\test.png");

        driver.quit();

        // use Tesseract to get strings, and wait until it has finished
        Runtime rt = Runtime.getRuntime();
        Process process = rt.exec("cmd.exe /C tesseract.exe D:\\Tesseract-OCR\\test.png D:\\Tesseract-OCR\\test -1 ");
        process.waitFor();

        // Read text
        readTextFile("D:\\Tesseract-OCR\\test.txt");
    }

    /**
     * Reads a TXT file and prints it line by line.
     *
     * @param filePath path of the text file
     */
    public static void readTextFile(String filePath) {
        String encoding = "UTF-8"; // Tesseract writes its text output as UTF-8
        File file = new File(filePath);
        if (file.isFile()) { // check that the file exists
            try (BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(new FileInputStream(file), encoding))) {
                String lineTxt;
                while ((lineTxt = bufferedReader.readLine()) != null) {
                    System.out.println(lineTxt);
                }
            } catch (IOException e) {
                System.out.println("Error while reading the file");
                e.printStackTrace();
            }
        } else {
            System.out.println("File not found");
        }
    }

    /**
     * Takes a screenshot of a single element.
     *
     * @param driver  the WebDriver instance
     * @param element the element to capture
     * @param path    where to save the cropped PNG
     * @throws InterruptedException if the sleep is interrupted
     */
    public static void screenShotForElement(WebDriver driver,
            WebElement element, String path) throws InterruptedException {
        File scrFile = ((TakesScreenshot) driver)
                .getScreenshotAs(OutputType.FILE);
        try {
            Point p = element.getLocation();
            int width = element.getSize().getWidth();
            int height = element.getSize().getHeight();
            Rectangle rect = new Rectangle(width, height);
            BufferedImage img = ImageIO.read(scrFile);
            BufferedImage dest = img.getSubimage(p.getX(), p.getY(),
                    rect.width, rect.height);
            ImageIO.write(dest, "png", scrFile);
            Thread.sleep(1000);
            FileUtils.copyFile(scrFile, new File(path));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}