#1 2012-01-06 17:43:02

johnraff
nullglob
From: Nagoya, Japan
Registered: 2009-01-07
Posts: 4,148
Website

Access google translate from a terminal.

Google shut down their web api for translation in December because it was being "abused" (auto-generation of web page content for black-hat SEO?). Now there's a free web page for browsers and a fee-paying api. Here's a script that lets you send in a query to that web page from a terminal. Examples:

john@raffles3:~$ translate chien
dog
john@raffles3:~$ translate legs fr
jambes
john@raffles3:~$ translate legs fr en
legacy
john@raffles3:~$ translate 手紙
Letter
john@raffles3:~$ translate 手紙 zh-TW en
Toilet paper
john@raffles3:~$ translate --help
translate <text> [[<source language>] <target language>]
if target missing, use DEFAULT_TARGET_LANG
if source missing, use auto

You can set DEFAULT_TARGET_LANG to taste.
Make a symlink in ~/bin called translate pointing to translate.sh, or add a bash alias.
You might need to install curl and html2text.
Here's the script:

#!/bin/bash
# access translate.google.com from terminal

help='translate <text> [[<source language>] <target language>]
if target missing, use DEFAULT_TARGET_LANG
if source missing, use auto'

# adjust to taste
DEFAULT_TARGET_LANG=en

if [[ $1 = -h || $1 = --help ]]
then
    echo "$help"
    exit
fi

if [[ $3 ]]; then
    source="$2"
    target="$3"
elif [[ $2 ]]; then
    source=auto
    target="$2"
else
    source=auto
    target="$DEFAULT_TARGET_LANG"
fi

result=$(curl -s -i --user-agent "" -d "sl=$source" -d "tl=$target" --data-urlencode "text=$1" https://translate.google.com)

# after redirect (get both sets of headers) use last version of encoding:
encoding=$(awk '/Content-Type: .* charset=/ {sub(/^.*charset=["'\'']?/,""); sub(/[ "'\''].*$/,""); OUT=$0}END{print OUT}' <<<"$result")

iconv -f $encoding <<<"$result" |  awk 'BEGIN {RS="</div>"};/<span[^>]* id=["'\'']?result_box["'\'']?/' | html2text -utf8
exit
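
The three-way argument handling can be tried on its own, with no network involved. This is only a sketch: resolve_langs is a made-up name that is not part of translate.sh, and it prints the "source target" pair instead of setting variables.

```shell
# Same default as in translate.sh; adjust to taste.
DEFAULT_TARGET_LANG=en

# resolve_langs (hypothetical helper): $1 is the text, $2 and $3 are the
# optional language codes. Prints "source target".
resolve_langs() {
    if [ -n "${3:-}" ]; then
        # two codes given: source first, then target
        echo "$2 $3"
    elif [ -n "${2:-}" ]; then
        # one code given: it is the target, the source is detected
        echo "auto $2"
    else
        # no codes given: detect the source, use the default target
        echo "auto $DEFAULT_TARGET_LANG"
    fi
}

resolve_langs chien          # auto en
resolve_langs legs fr        # auto fr
resolve_langs legs fr en     # fr en
```

Note that a single language code is always taken as the target, which is why 'translate legs fr' gives French output.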

N.B. Google's terms of service refer to queries coming from a real user, and as long as we're using this to look up a word, phrase or passage from our desktop I don't see any problem. If someone started flooding the server from an automatic script they might block that IP, or make the web page harder to access, or even take it down, so please be nice!

Tomorrow or so I'll post a bit on how this was figured out and put together, for anyone who'd like to try with some other web page.

edit: Tweaked the awk commands in the last line a bit. Probably makes no practical difference.
edit2: Hadn't noticed that when switching from grep to awk + html2text a lot of that regex stuff had become unnecessary. It's no faster now, but looks a bit simpler. (Left the old line in for comparison.)
edit 140509: Changed protocol from http to https.

Note: This is just a hack - I tried to make the regular expressions as robust as possible, but it will probably break eventually when Google change the code. If you notice a problem please post and we can try and fix it...

-----------------------------
EDIT 150528 (pasted in from a later post)
A little wrapper script for translate.sh. Bind it to a keyboard shortcut and you can select a bit of text anywhere on your desktop and get a popup translation. (Select, then keyboard.)

#!/bin/bash
# translate-selection.sh - google translation of selected text
# needs xsel to read the current X selection (the highlighted text)

query=$(xsel)
#notify-send "Google Translate" "Query is ${query}"
translation=$($HOME/scripts/translate.sh "$query")
zenity --info --title "Translation" --text "$translation"
exit

Last edited by johnraff (2012-01-10 16:39:50)


John
--------------------
( a boring Japan blog , Japan Links, idle twitterings  and GitStuff )
#! forum moderator    BunsenLabs


#2 2012-01-06 20:34:55

rhowaldt
#!*$%:)
Registered: 2011-03-09
Posts: 4,396

Re: Access google translate from a terminal.

pretty cool johnraff, thanks!


#3 2012-01-07 17:35:14

johnraff
nullglob
From: Nagoya, Japan
Registered: 2009-01-07
Posts: 4,148
Website

Re: Access google translate from a terminal.

Interacting with web pages is a nice hobby, and in case you want to try some yourself here are a few points on how this was put together.

You need a basic idea of, or willingness to learn, html, regular expressions, grep, sed and probably awk. A basic idea will go a long way.

1) Have a look at the code. Your browser will show it. http://translate.google.com is a horrible porridge of html tags and lots of javascript, all packed into a mere 17 lines, broken apparently at random. I nearly gave up at this point, but try selecting the bit round the result box where the translated text appears and right-click "View selection source" (in Firefox anyway). That's more reasonable, though still huge:

<div id="gt-res-wrap"><div id="gt-res-content" class="almost_half_cell"><div dir="ltr" style=""><div id="tts_button" style="" class=""><div role="button" tabindex="0" dataattribute="http://www.gstatic.com/translate/sound_player2.swf" title="http://www.gstatic.com/translate/sound_player2.swf" style="background: url(&quot;chrome://flashblock/content/flash.png&quot;) no-repeat scroll center center transparent ! important; min-width: 32px ! important; min-height: 32px ! important; width: 32px; height: 32px; border: 1px solid rgb(223, 223, 223); cursor: pointer; overflow: hidden; display: inline-block; visibility: visible ! important; -moz-box-sizing: border-box;" bgactive="url(chrome://flashblock/content/flashplay.png) no-repeat center" bginactive="url(chrome://flashblock/content/flash.png) no-repeat center"></div></div><span id="result_box" class="short_text" lang="en"><span class="hps">dog</span></span></div></div><div id="spell-place-holder" style="height: 27px;"></div>

The translation, "dog" in this case, is near the end of this section, inside a span with an id of "result_box". That's good news because an id means nothing else on the page can share it - it's a unique identifier for our results. Hmm OK maybe it's possible.

2) Try it in a terminal. I used curl here, though wget is probably OK too.

john@raffles3:~$ curl http://translate.google.com
<!DOCTYPE html><html lang=en><meta charset=utf-8><title>Error 403 (Forbidden)!!1</title><style>*{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAKsAAADVCAMAAAAfHvCaAAAAGFBMVEVYn%2BH [... lots and lots of this stuff, probably a base64-encoded image ...] KJF%2FzufdnZtz7PrrevDZ03GPAaJDjbRA8dGsW6X6cgNmAhSEG%2FUiY%2Fsfiv02O7iVu1LunAAAAAElFTkSuQmCC);display:block;height:55px;margin:0 0 -7px;width:150px}* > #g{margin-left:-2px}#g img{visibility:hidden}* html #g img{visibility:visible}*+html #g img{visibility:visible}</style><a href=//www.google.com/ id=g><img src=//www.google.com/images/logo_sm.gif alt=Google></a><p><b>403.</b> <ins>That’s an error.</ins><p>Your client does not have permission to get URL <code>/</code> from this server.  (Client IP address: 61.211.133.253)<br><br>
  <ins>That’s all we know.</ins>

Anyway it's no good. It was OK in Firefox but Google don't like curl (or wget). Both of them are honest by default and tell the server who they are. We can fix it with the --user-agent option, just an empty string "" is enough, so

curl --user-agent "" http://translate.google.com > test.html

will get us a local copy of the page to play with without hitting Google over and over.

3) How to send the request? There's a handy perl script called formfind.pl from the curl people. Send it that Google porridge and it will tell you about the webform inside:

john@raffles3:~$ curl --user-agent "" http://translate.google.com | ./scripts/formfind.pl 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 37418    0 37418    0     0   217k      0 --:--:-- --:--:-- --:--:--  285k
--- FORM report. Uses POST to URL "/"
--- type: application/x-www-form-urlencoded
Select: NAME="sl"
  Option VALUE="auto" (SELECTED)
  Option VALUE="separator"
  Option VALUE="af"
  Option VALUE="sq"
  Option VALUE="ar"
  Option VALUE="hy"
  Option VALUE="az"
  Option VALUE="eu"
  Option VALUE="be"
  Option VALUE="bn"
  Option VALUE="bg"
  Option VALUE="ca"
  Option VALUE="zh-CN"
  Option VALUE="hr"
  Option VALUE="cs"
  Option VALUE="da"
  Option VALUE="nl"
  Option VALUE="en"
  Option VALUE="et"
  Option VALUE="tl"
  Option VALUE="fi"
  Option VALUE="fr"
  Option VALUE="gl"
  Option VALUE="ka"
  Option VALUE="de"
  Option VALUE="el"
  Option VALUE="gu"
  Option VALUE="ht"
  Option VALUE="iw"
  Option VALUE="hi"
  Option VALUE="hu"
  Option VALUE="is"
  Option VALUE="id"
  Option VALUE="ga"
  Option VALUE="it"
  Option VALUE="ja"
  Option VALUE="kn"
  Option VALUE="ko"
  Option VALUE="la"
  Option VALUE="lv"
  Option VALUE="lt"
  Option VALUE="mk"
  Option VALUE="ms"
  Option VALUE="mt"
  Option VALUE="no"
  Option VALUE="fa"
  Option VALUE="pl"
  Option VALUE="pt"
  Option VALUE="ro"
  Option VALUE="ru"
  Option VALUE="sr"
  Option VALUE="sk"
  Option VALUE="sl"
  Option VALUE="es"
  Option VALUE="sw"
  Option VALUE="sv"
  Option VALUE="ta"
  Option VALUE="te"
  Option VALUE="th"
  Option VALUE="tr"
  Option VALUE="uk"
  Option VALUE="ur"
  Option VALUE="vi"
  Option VALUE="cy"
  Option VALUE="yi"
[end of select]
Select: NAME="tl"
  Option VALUE="af"
  Option VALUE="sq"
  Option VALUE="ar"
  Option VALUE="hy"
  Option VALUE="az"
  Option VALUE="eu"
  Option VALUE="be"
  Option VALUE="bn"
  Option VALUE="bg"
  Option VALUE="ca"
  Option VALUE="zh-CN"
  Option VALUE="zh-TW"
  Option VALUE="hr"
  Option VALUE="cs"
  Option VALUE="da"
  Option VALUE="nl"
  Option VALUE="en" (SELECTED)
  Option VALUE="et"
  Option VALUE="tl"
  Option VALUE="fi"
  Option VALUE="fr"
  Option VALUE="gl"
  Option VALUE="ka"
  Option VALUE="de"
  Option VALUE="el"
  Option VALUE="gu"
  Option VALUE="ht"
  Option VALUE="iw"
  Option VALUE="hi"
  Option VALUE="hu"
  Option VALUE="is"
  Option VALUE="id"
  Option VALUE="ga"
  Option VALUE="it"
  Option VALUE="ja"
  Option VALUE="kn"
  Option VALUE="ko"
  Option VALUE="la"
  Option VALUE="lv"
  Option VALUE="lt"
  Option VALUE="mk"
  Option VALUE="ms"
  Option VALUE="mt"
  Option VALUE="no"
  Option VALUE="fa"
  Option VALUE="pl"
  Option VALUE="pt"
  Option VALUE="ro"
  Option VALUE="ru"
  Option VALUE="sr"
  Option VALUE="sk"
  Option VALUE="sl"
  Option VALUE="es"
  Option VALUE="sw"
  Option VALUE="sv"
  Option VALUE="ta"
  Option VALUE="te"
  Option VALUE="th"
  Option VALUE="tr"
  Option VALUE="uk"
  Option VALUE="ur"
  Option VALUE="vi"
  Option VALUE="cy"
  Option VALUE="yi"
[end of select]
Button: "Translate" (SUBMIT)
Input: NAME="js" VALUE="n" (HIDDEN)
Input: NAME="prev" VALUE="_t" (HIDDEN)
Input: NAME="hl" VALUE="en" (HIDDEN)
Input: NAME="ie" VALUE="ISO-8859-1" (HIDDEN)
Input: NAME="layout" VALUE="2" (HIDDEN)
Input: NAME="eotf" VALUE="1" (HIDDEN)
Textarea: NAME="text"
Input: NAME="file" (FILE)
--- end of FORM

So it looks as if "sl" is "source language", "tl" is "target language" and "text" might be our input. There are also some mysterious hidden values, but I haven't found that leaving them out does any harm. If anyone knows what they're for, please share the info! Now we can send a request with curl, something like:

curl --user-agent "" -d "sl=$source" -d "tl=$target" --data-urlencode "text=$1" http://translate.google.com

The text has to be sent with the --data-urlencode option because it might contain spaces etc, or multi-byte characters like Japanese. If you have problems with curl you can check what it's actually sending to the server by running netcat in a separate terminal:

nc -l -p 3333

then substitute 'http://localhost:3333' for the google url in your curl command.
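
To get a feel for what url-encoding does to the text before it goes out, here is a rough stand-alone sketch. The function name urlencode is made up, and this is an illustration of percent-encoding in general, not curl's own code: what curl actually sends may differ in detail (for instance in how a space is written), which is exactly what the netcat trick lets you check.

```shell
# urlencode (hypothetical helper): percent-encode every byte of $1 except
# letters, digits and the four characters - . _ ~
urlencode() {
    # od dumps the bytes as hex; tr puts one hex pair on each line
    printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' |
    while read -r hex; do
        case $hex in
            '') ;;   # empty line left over from od's layout
            2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
                # safe byte: print the character itself (via its octal code)
                printf "\\$(printf '%03o' "0x$hex")" ;;
            *)
                # anything else becomes %XX, in upper case
                printf '%%%s' "$hex" | tr 'a-f' 'A-F' ;;
        esac
    done
}

urlencode 'legs & arms'; echo    # legs%20%26%20arms
urlencode '手紙'; echo            # %E6%89%8B%E7%B4%99 if the text is UTF-8
```

Because it works byte by byte, multi-byte text simply turns into one %XX group per byte.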

4) Analyse what comes back. This is where those regular expressions come in, but I'll have to leave that a couple of days because it's time to crash and I'll be gone tomorrow.



#4 2012-01-07 23:18:12

rhowaldt
#!*$%:)
Registered: 2011-03-09
Posts: 4,396

Re: Access google translate from a terminal.

^ thanks johnraff, really nice of you to do such a detailed explanation. you are talking of a basic knowledge of this stuff, and i think i have a bit more than basic knowledge of at least html, but i still don't think i would've nailed it as properly as you did. you would probably make a pretty good detective


#5 2012-01-07 23:47:53

gensym
#! Junkie
Registered: 2011-10-17
Posts: 447

Re: Access google translate from a terminal.

Thanks for sharing this! I will definitely give this a go.

johnraff wrote:

Note: This is just a hack - I tried to make the regular expressions as robust as possible, but it will probably break eventually when Google change the code. If you notice a problem please post and we can try and fix it...

reminds me of this

Last edited by gensym (2012-01-07 23:50:30)


'Multiple exclamation marks,' he went on, shaking his head, 'are a sure sign of a diseased mind.', {Eric}


#6 2012-01-08 00:05:48

el_koraco
#!/loony/bun
From: inside Ed
Registered: 2011-07-25
Posts: 4,749

Re: Access google translate from a terminal.

This is awesome. Awesome.


#7 2012-01-10 17:42:38

johnraff
nullglob
From: Nagoya, Japan
Registered: 2009-01-07
Posts: 4,148
Website

Re: Access google translate from a terminal.

...continued from post 3.

4) Analyse what comes back. The first thing that comes up when playing with grep is that it doesn't work properly when the returned page's results are in Japanese. Turns out that instead of using UTF-8 for everything as you might expect, the translated page comes back with different encodings for each language - Shift_JIS for Japanese, ISOsomething for European... So the first job is to convert it to UTF-8 so we can work with it. 'iconv' will do it, but needs to be told what encoding to convert from. The encoding is near the top of the html as eg '<meta content="text/html; charset=Shift_JIS" http-equiv="content-type">' but I thought it would be cleaner to get it from the http headers sent by the server. Curl's -i option will add them to the top of the file, and grep will fetch the line we want: it starts "Content-Type" so this command will get it:

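encoding=$(grep -E -m1 -o "Content-Type: .* charset=[[:alnum:]_-]+" <<<"$result" | sed 's/.*charset=//')

(If the page is in $result. grep picks out the header, and the sed on the end trims it down to the bare encoding name, which is what iconv wants.)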

5) Now we can give iconv the right encoding to use:

iconv -f $encoding <<<"$result" | grep -E -o "<span[^>]* id=(['\"]?)result_box\1 [^>]*>( *<[^>]*>)*[^<]*" | html2text -utf8

Looks bad, and it is, really. Grep's -E option lets us do away with some backslashes for readability, -o means just output the matched part. The key is the 'id=result_box' bit in the html page code - id means it's unique on the page. Now attributes could also be enclosed in quotes, single or double, so Google could have written id="result_box" or id='result_box'. Any point in covering those possibilities? Probably not, but I did anyway.

[^>] is a handy construction. It means any character except '>', the one that closes a tag. (Likewise [^"] for things in quotes.)
<[^>]*> just means any tag at all, so ( *<[^>]*>)* means zero or more tags, possibly preceded by spaces. Finally the bit we want is represented by [^<]* . Sed could have been used to chop off all the tags after it, but there's also a need to convert html entities like &#39; back to apostrophes. html2text will do that, and strip out all the tags too.
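
To see those bracket expressions at work on their own, here is a rough stand-in for the html2text step. It is only a sketch: the name strip_tags is made up, it decodes just the one entity mentioned above, and the real html2text does a great deal more.

```shell
# strip_tags (hypothetical helper): delete anything that looks like a tag,
# then turn the &#39; entity back into an apostrophe.
strip_tags() {
    # <[^>]*> reads as: a "<", any run of characters that are not ">", then ">"
    sed -e 's/<[^>]*>//g' -e "s/&#39;/'/g"
}

printf '%s\n' '<span id="result_box"><span class="hps">the dog&#39;s legs</span></span>' | strip_tags
# prints: the dog's legs
```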

6) This now works, but awk turned out to be a lot faster. Set the Record Separator to '</div>' instead of the default linebreaks and we break the file up into little chunks, and line breaks no longer matter. I didn't get it at first, but now all we need to do is search for the record matching the regex:

<span[^>]* id=["'\'']?result_box["'\'']?

and pass it to html2text.
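
The record separator trick can be checked offline on a cut-down copy of the html fragment from post 3. This is a sketch under a few assumptions: extract_result is a made-up name, sed stands in for html2text so nothing extra needs installing, and a multi-character RS is an awk extension (gawk and mawk accept it, some other awks may not).

```shell
# Cut-down sample of the page fragment; the real page is far longer.
sample='<div id="gt-res-wrap"><div dir="ltr"><span id="result_box" class="short_text" lang="en"><span class="hps">dog</span></span></div></div><div id="spell-place-holder"></div>'

# extract_result (hypothetical helper): read html on stdin, print the result text.
extract_result() {
    # Each "</div>" ends a record, so line breaks in the page no longer matter.
    # Only the record that contains the result_box span is printed.
    awk 'BEGIN { RS = "</div>" } /<span[^>]* id=["'\'']?result_box["'\'']?/' |
        sed 's/<[^>]*>//g'    # crude tag stripper, standing in for html2text
}

printf '%s\n' "$sample" | extract_result
```

With gawk or mawk this should print just the word dog, since the first record is the only one holding the result_box span.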

Last edited by johnraff (2012-01-11 17:55:06)



#8 2012-01-10 17:47:19

johnraff
nullglob
From: Nagoya, Japan
Registered: 2009-01-07
Posts: 4,148
Website

Re: Access google translate from a terminal.

@gensym Yes the topic keeps coming up - Real Coders don't do this kind of stuff. Probably true (can anyone recommend an html parser?) but no astronauts' lives are at stake here...

@el_k You're too kind. This is just a quick mashup.



#9 2012-01-10 17:56:38

rhowaldt
#!*$%:)
Registered: 2011-03-09
Posts: 4,396

Re: Access google translate from a terminal.

^ for stuff like this, which will probably just be used by you and some others from the forum etc, and not by anyone letting astronauts' lives depend on it, i think using regex to parse html is just fine. it can also be more fun. a couple of years ago when i started doing html, i wanted to do an animation. i didn't know Flash, which i probably should've used. i did know html and animated gifs. so, i made an animated dude walking, and used a 'marquee'-tag (scroll something along the screen) to scroll the animated gif at just the right speed so it looked like it was walking. it felt awesome and was lots of fun to make. and it Just Worked. most important thing of all, imo.


#10 2012-01-11 18:08:24

johnraff
nullglob
From: Nagoya, Japan
Registered: 2009-01-07
Posts: 4,148
Website

Re: Access google translate from a terminal.

Wow! Sounds pretty neat.
Pity a lot of people switch off marquees now. Animated gifs don't get such a good press either...



#11 2012-01-11 19:10:29

rhowaldt
#!*$%:)
Registered: 2011-03-09
Posts: 4,396

Re: Access google translate from a terminal.

^ yeah, i was so amazed i could actually get that idea to work. all the cool kids were doing flash, i was doing 10-frame walking-dude animated gifs moving across the screen through configured-to-death marquee-tags. i just got a kick out of being able to do something like that just with, basically, native HTML, while all HTML was 'supposed to be' was markup-language.

i can have the same kick building stuff with the tools at hand, regex and html in this case, instead of using something pre-built that is supposed to be 'better'. i'm sure these dudes know what they're talking about, but i am too non-conformist to not enjoy doing it my way anyway


#12 2012-01-11 19:26:56

gensym
#! Junkie
Registered: 2011-10-17
Posts: 447

Re: Access google translate from a terminal.

@johnraff,rhowaldt

You have a valid point there, nobody's life depends on it, it is safe and yet it gives a weird feeling of satisfaction (and in this particular case it's way more practical). I just don't like the idea of parsing a context free formal language with regular expressions, but that may be something personal as regexes in programming languages are not that "reg". Regardless of this, I really like your script.

@parser
For python there is a built in one, for perl there are a bunch of them in cpan.

Last edited by gensym (2012-01-11 21:04:42)



#13 2012-01-12 16:25:50

johnraff
nullglob
From: Nagoya, Japan
Registered: 2009-01-07
Posts: 4,148
Website

Re: Access google translate from a terminal.

@gensym You're right of course, but in this case we're not really into parsing html pages in general, just extracting a particular bit of data from a particular page. While in this script we're proof against minor html-compliant changes like extra line-breaks or different tag attributes, if Google changed the id of that span even a "proper" parser would fail. Anyway, thanks for that python link. Now I have to learn python! edit: Just ran into this on a blog I follow, on parsing with python.

@rhowaldt Any chance of seeing that walking man?

@all A little wrapper script for translate.sh. Bind it to a keyboard shortcut and you can select a bit of text anywhere on your desktop and get a popup translation. (Select, then keyboard.)

#!/bin/bash
# translate-selection.sh - google translation of selected text
# needs xsel to read the current X selection (the highlighted text)

query=$(xsel)
#notify-send "Google Translate" "Query is ${query}"
translation=$($HOME/scripts/translate.sh "$query")
zenity --info --title "Translation" --text "$translation"
exit

Last edited by johnraff (2012-01-12 16:56:53)



#14 2012-01-12 17:25:33

rhowaldt
#!*$%:)
Registered: 2011-03-09
Posts: 4,396

Re: Access google translate from a terminal.

^ that popup-thingy is a nice addition to the script.

sadly, the walking man happened about 10 years ago, so no chance in hell of me having that lying around anywhere. i could show you some really nice animations i did from those days, if my HD hadn't crashed back then. oh, i can still remember the agony and anger i felt when that happened, my first HD crash. still not doing regular backups though. i'm stupid sometimes


#15 2012-01-13 04:39:52

johnraff
nullglob
From: Nagoya, Japan
Registered: 2009-01-07
Posts: 4,148
Website

Re: Access google translate from a terminal.

Even with a checker like this I still don't do it for another week or so.



#16 2012-05-21 11:13:47

henrynott
New Member
Registered: 2012-05-21
Posts: 2

Re: Access google translate from a terminal.

It does not work..
In urxvt, after I pressed ENTER for ". translate.sh chie", urxvt disappeared.

Last edited by henrynott (2012-05-21 11:15:13)


#17 2012-05-22 05:09:30

johnraff
nullglob
From: Nagoya, Japan
Registered: 2009-01-07
Posts: 4,148
Website

Re: Access google translate from a terminal.

Hmm... I don't know what that would be, henrynott. It's still working fine for me.

Have you tried it in a different terminal from urxvt? (I use urxvt though)
Try 'bash -x /path/to/translate.sh' for more detailed output on what exactly is happening.

...and welcome to Crunchland!

Last edited by johnraff (2012-05-22 05:10:09)



#18 2012-05-22 13:24:19

henrynott
New Member
Registered: 2012-05-21
Posts: 2

Re: Access google translate from a terminal.

Update: By using './translate.sh', it works! Thank you, John!
--------------------
Thank you, johnraff, for teaching me how to use 'bash -x' to find the problem. I found that my html2text did not have the utf8 patch, so I downloaded a utf8 patch from http://www.mbayer.de/html2text/downloads/ along with the source (html2text-1.3.2a.tar.gz). (I am a newbie, so if I did something wrong, please correct me.) I copied the patch into the extracted source folder and typed 'patch < patch-utf8-html2text-1.3.2a.diff'. The output:

patching file Area.C
patching file Area.h
patching file format.C
patching file html2text.C
patching file html.h
patching file sgml.C
patching file table.C

It compiled successfully, but translate.sh still does not work. (I also tried html2text-1.3.2.tar.gz, but that failed to compile.) Here is some of the output of 'bash -x /path/to/translate.sh':

bash -x translate.sh legs fr en
+ help='translate <text> [[<source language>] <target language>]
if target missing, use DEFAULT_TARGET_LANG
if source missing, use auto'
+ DEFAULT_TARGET_LANG=en
+ [[ legs = -h ]]
+ [[ legs = --help ]]
+ [[ -n en ]]
+ source=fr
+ target=en
++ curl -s -i --user-agent '' -d sl=fr -d tl=en --data-urlencode text=legs http://translate.google.com
+ result='HTTP/1.1 200 OK
Date: Tue, 22 May 2012 13:14:57 GMT
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, must-revalidate
Pragma: no-cache
X-Frame-Options: SAMEORIGIN
Content-Type: text/html; charset=ISO-8859-1
Content-Language: en
Set-Cookie: PREF=ID=e6ddc73d745d448e:TM=1337692497:LM=1337692497:S=A5-__lNuvHHhudtW; expires=Thu, 22-May-2014 13:14:57 GMT; path=/; domain=.google.com
Set-Cookie: NID=60=ItPUe2C2ixOuzf481HIC5KaJLCGuI_WpN1mwylgnKGS5waLAIuX3cfUqs2YD4MNTiuRdkeaWgXO-ZIGHKf4aVmcPO6U7qArUTuY9DluMgl2HcL1mdN6kk8PEyUxC_7il; expires=Wed, 21-Nov-2012 13:14:57 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
X-Content-Type-Options: nosniff
Server: HTTP server (unknown)
X-XSS-Protection: 1; mode=block
Transfer-Encoding: chunked

<!DOCTYPE html>/very/long/</html>'
++ awk '/Content-Type: .* charset=/ {sub(/^.*charset=["'\'']?/,""); sub(/[ "'\''].*$/,""); print}'
+ encoding=$'ISO-8859-1\r'
+ html2text -utf8
+ awk 'BEGIN {RS="</div>"};/<span[^>]* id=["'\'']?result_box["'\'']?/'
+ iconv -f $'ISO-8859-1\r'
legacy
+ exit

It did produce the result 'legacy', and I cannot find any errors in the output. When I type '. translate.sh legs fr en', urxvtc still shuts down... I guess there's something wrong with my urxvt? But I don't know what... Thank you for your time!

Last edited by henrynott (2012-05-22 23:42:47)


#19 2012-05-22 16:21:14

johnraff
nullglob
From: Nagoya, Japan
Registered: 2009-01-07
Posts: 4,148
Website

Re: Access google translate from a terminal.

henrynott wrote:

When I type '. translate.sh legs fr en', I still get my urxvtc shut down...

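Try typing './translate.sh legs fr en' (dot, slash, no space). With a space after the dot, '. translate.sh' sources the script into the shell you're typing in, so the 'exit' at the end of the script closes that shell and urxvt goes with it.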

btw I don't think there should have been any need to recompile html2text if you're running Debian stable. My version, from the repositories, is  1.3.2a-15
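
The difference between running a script and sourcing it can be seen safely with a throwaway file. This is just a sketch: demo_dot_vs_exec and the two-line stand-in script are made up for the demonstration, and a subshell plays the part of the terminal's shell so nothing real gets closed.

```shell
# demo_dot_vs_exec (hypothetical helper): compare running a script with sourcing it.
demo_dot_vs_exec() {
    demo=$(mktemp) || return 1
    # A two-line stand-in for translate.sh: print something, then exit.
    printf '%s\n' 'echo "translated"' 'exit' > "$demo"

    # Run as a program: the script gets its own process, and its "exit" ends only that.
    sh "$demo"
    echo "still here after running it"

    # Sourced with ".": the lines run inside the calling shell, so "exit" ends the caller.
    # The parentheses make a throwaway subshell play the part of the terminal's shell.
    ( . "$demo"; echo "never printed" )
    echo "the sourcing shell has gone"

    rm -f "$demo"
}

demo_dot_vs_exec
```

The "never printed" line stays unprinted because the sourced 'exit' ends the subshell before echo is reached.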



#20 2012-06-24 22:25:58

wuy
#! Member
Registered: 2009-02-05
Posts: 76

Re: Access google translate from a terminal.

This is very handy. I use mainly the pop-up thingy.
In Chrome there's the excellent dictionary extension, but for Firefox and everything else this one is great. It's also handy for a quick translation while writing in post boxes like this one (if I know a word in one language I just type it and then translate it into the language I'm writing in).
I didn't have my back-up with me so I came to fetch the scripts, and now I can belatedly thank johnraff.

Just a note: the first script has a commented line (#iconv -f $encoding....). It's no use, right (I can just delete it)?


#21 2012-06-26 16:19:03

johnraff
nullglob
From: Nagoya, Japan
Registered: 2009-01-07
Posts: 4,148
Website

Re: Access google translate from a terminal.

Hi wuy, yes apart from the initial "shebang" ('#!/bin/bash') any commented lines can be omitted. I left that #iconv... line in because it was in the original post and I wanted to show off how much shorter the replacement was...



#22 2014-05-08 09:13:59

pasckoch
New Member
Registered: 2014-05-08
Posts: 1

Re: Access google translate from a terminal.

Hi Johnraff,
I wrote a similar script in PHP in 2012. It would be better to write it in bash and call it from PHP, but I was working without Linux access.
Recently I switched the protocol to https, because otherwise Google responds with HTTP code 301 (page moved).
The same issue affects your code. Change:

result=$(curl -s -i --user-agent "" -d "sl=$source" -d "tl=$target" --data-urlencode "text=$1" http://translate.google.com)

to

result=$(curl -s -i --user-agent "" -d "sl=$source" -d "tl=$target" --data-urlencode "text=$1" https://translate.google.com)

Regards,

Last edited by pasckoch (2014-05-08 09:15:50)


#23 2014-05-09 13:36:11

johnraff
nullglob
From: Nagoya, Japan
Registered: 2009-01-07
Posts: 4,148
Website

Re: Access google translate from a terminal.

Hi pasckoch, thank you for catching that.
About a week ago I noticed that the page was being redirected but didn't realize it was because it needed an https protocol.

I had a fix ready to post, but yours is much simpler and better!

curl is very powerful and flexible so you can use the -L and --post302 options to access the redirected page. You will get back both sets of headers though, so the awk code for getting the encoding had to be modified to pick out the last time it was sent. Just for reference, this is what I did

result=$(curl -s -i -L --post302 --user-agent "" -d "sl=$source" -d "tl=$target" --data-urlencode "text=$1" http://translate.google.com)
# after redirect (get both sets of headers) use last version of encoding:
encoding=$(awk '/Content-Type: .* charset=/ {sub(/^.*charset=["'\'']?/,""); sub(/[ "'\''].*$/,""); OUT=$0}END{print OUT}' <<<"$result")

...but your fix just needs a single 's' to be added.

I'll edit the script in the first post.
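
For anyone who wants to poke at the "last set of headers wins" idea without touching Google, here is an offline sketch. The name last_charset and the canned response are made up for the test. Unlike the line in the script it also strips the carriage return that ends each header line, and it matches the header name in any letter case, which is an extra precaution and not something the original needed.

```shell
# last_charset (hypothetical helper): print the charset from the last
# Content-Type header found on stdin.
last_charset() {
    # Header lines end in CR LF; drop the CR so it cannot cling to the value.
    tr -d '\r' |
    awk 'tolower($0) ~ /^content-type:.*charset=/ {
             sub(/^.*charset=["'\'']?/, "")   # cut everything up to the value
             sub(/[ "'\'';].*$/, "")          # cut anything after the value
             charset = $0                     # later headers overwrite earlier ones
         }
         END { print charset }'
}

# Canned response: a redirect followed by the final page (not real Google output).
printf 'HTTP/1.1 301 Moved Permanently\r\nContent-Type: text/html; charset=UTF-8\r\n\r\nHTTP/1.1 200 OK\r\nContent-Type: text/html; charset=Shift_JIS\r\n\r\n<html>\n' | last_charset
# prints: Shift_JIS
```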



Copyright © 2012 CrunchBang Linux.