Cache JSON in PHP

function getJson($url) {

    // cache files are created like cache/abcdef123456...
    $cacheFile = dirname(__FILE__) . DIRECTORY_SEPARATOR . 'cache' . DIRECTORY_SEPARATOR . md5($url);

    if (file_exists($cacheFile)) {
        $fh = fopen($cacheFile, 'r');
        $cacheTime = trim(fgets($fh));

        // if data was cached recently, return cached data
        if ($cacheTime > strtotime('-60 minutes')) {
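            // read the rest of the file (everything after the timestamp line)
            $json = stream_get_contents($fh);
            fclose($fh);
            return $json;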
        }

        // else delete cache file
        fclose($fh);
        unlink($cacheFile);
    }

    // define your json here
    $data = array(/* define data */);
    $json = json_encode($data);

    $fh = fopen($cacheFile, 'w');
    fwrite($fh, time() . "\n");
    fwrite($fh, $json);
    fclose($fh);

    return $json;
}

WordPress shortcodes

First, create a function that takes $atts as an argument. $atts holds all the attributes set in the shortcode.

Simple shortcode

function foobar_func( $atts ){
  return "foo and bar";
}
add_shortcode( 'foobar', 'foobar_func' );

The shortcode to be used:

[foobar]

Shortcode with attributes

function bartag_func( $atts ) {
  $a = shortcode_atts( array(
    'foo' => 'something', // 'something' is default value
    'bar' => 'something else', // 'something else' is default value
  ), $atts );
  return "foo = {$a['foo']}";
}
add_shortcode( 'bartag', 'bartag_func' );

The shortcode to be used:

[bartag foo="value" bar="other value"]

Enclosing shortcode

Enclosing shortcodes take a $content argument in addition to $atts. $content holds whatever the shortcode encloses. Use $content = null so the function works even if there’s no content.

function caption_shortcode( $atts, $content = null ) {
  return '<span class="caption">' . $content . '</span>';
}
add_shortcode( 'captions', 'caption_shortcode' );

Example shortcode:

[captions]My Caption[/captions]

Output:

<span class="caption">My Caption</span>

Data scraping

Scraping data from HTML

Required gems:

require 'rubygems'
require 'nokogiri'
require 'open-uri'

Open HTML file:

base_url = 'http://www.example.com'
page = Nokogiri::HTML(URI.open(base_url))

Select method 1—using XPath:

page_content = page.xpath("//div[@class='page']/a")

Select method 2—using CSS selectors:

page_content = page.css(".page a")

This method is preferred, since selections can be chained to subselect from an earlier result:

page_content = Nokogiri::HTML(URI.open(base_url))
page_table = page_content.css("table")
table_special_row = page_table.css("td.special")

When many pages must be accessed, pause the script for a few seconds between requests so you don’t overburden the site.

sleep 4
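
A minimal sketch of how the pause might fit into a loop over several pages. The each_with_pause helper, its pause parameter, and the example URLs are hypothetical names chosen for illustration; the block is where each page would actually be fetched.

```ruby
# Hypothetical helper: yields each URL in turn and sleeps `pause` seconds
# between consecutive requests (there is no pause before the first one).
def each_with_pause(urls, pause: 4)
  urls.each_with_index do |url, index|
    sleep(pause) if index > 0
    yield url
  end
end

# Assumed example URLs; replace them with the pages you need.
urls = ['http://www.example.com/page1', 'http://www.example.com/page2']

# A short pause is used here only so the example finishes quickly.
each_with_pause(urls, pause: 0.1) do |url|
  # In a real script: page = Nokogiri::HTML(URI.open(url))
  puts "visiting #{url}"
end
```

Sleeping before every request except the first avoids a pointless wait at the start or end of the run.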

It’s also recommended to save a local copy of the page if it will be read many times:

File.write('page.html', URI.open(base_url).read)
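
The local copy can also be reused only while it is fresh, in the same spirit as the PHP cache at the top of these notes. This is a sketch under assumptions: cached_page and max_age are hypothetical names, and the block stands in for the real download.

```ruby
require 'tmpdir'

# Hypothetical helper: returns the contents of `path` if the file was
# modified less than `max_age` seconds ago; otherwise runs the block,
# saves its result to `path`, and returns it.
def cached_page(path, max_age: 3600)
  if File.exist?(path) && (Time.now - File.mtime(path)) < max_age
    return File.read(path)
  end

  html = yield
  File.write(path, html)
  html
end

# The blocks stand in for the real download, e.g. URI.open(base_url).read
Dir.mktmpdir do |dir|
  path = File.join(dir, 'page.html')
  first  = cached_page(path) { '<html>fetched</html>' }
  second = cached_page(path) { raise 'not called while the copy is fresh' }
  puts first == second
end
```

Using the file's modification time avoids storing a separate timestamp line inside the cached file.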

Saving files

Required gems:

require 'rubygems'
require 'open-uri'

Create a directory:

Dir.mkdir('pictures') unless Dir.exist?('pictures')

Download the image files whose URLs are listed in a text file:

# open the text file, strip its BOM, and loop through its lines
File.open('pictures.txt', "r:bom|utf-8").readlines.each do |line|
  # remove whitespace from the URL
  url = line.gsub(/\s+/, "")
  filename = url.gsub('http://example.com/', '')

  # use begin/rescue to handle 404 errors so the script is not aborted
  begin
    picture = URI.open(url)
    File.open("pictures/#{filename}", 'wb') do |f|
      f.write(picture.read)
      puts "saved #{filename}"
    end
  rescue OpenURI::HTTPError
    puts "error saving #{filename}"
  end
end

Useful resources

csv

Creates a CSV file. Each line represents a row:

require 'csv'

CSV.open("scrape.csv", "w") do |csv|
  csv << ["value 1", "value 2", "value 3", "value 4"]
  csv << ["value 5", "value 6", "value 7", "value 8"]
end

Values inserted from strings or arrays can cause encoding problems, so call the encode method on each value in the row:

csv << [string.encode('UTF-8'), hash['key'].encode('UTF-8')]
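
A small sketch of one way to make that conversion more forgiving. The to_utf8 helper is a hypothetical name, and the Latin-1 string is an assumed example of text that might come from a scraped page.

```ruby
require 'csv'

# Hypothetical helper: converts any value to a UTF-8 string. Characters that
# cannot be converted are replaced with "?", and scrub also replaces invalid
# bytes in strings that are already tagged as UTF-8.
def to_utf8(value)
  value.to_s
       .encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
       .scrub('?')
end

# Assumed example: a string whose bytes are Latin-1 rather than UTF-8.
latin1 = "caf\xE9".dup.force_encoding('ISO-8859-1')

line = CSV.generate do |csv|
  csv << [to_utf8(latin1), to_utf8(42)]
end

puts line
```

Calling to_s first means numbers and nil can be passed through the same helper as strings.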

strip

Removes leading and trailing whitespace:

"   hello   ".strip #=> "hello"
"\tgoodbye\r\n".strip #=> "goodbye"

gsub

Replaces values in string:

page_table_rows = page.css('tr')
page_table_rows.each do |row|
  row_string = row.to_s  # = <tr><td>Value 1</td><td>Value 2</td></tr>
  row_string.gsub!('<tr><td>', '')  # = Value 1</td><td>Value 2</td></tr>
  row_string.gsub!('</td><td>', ', ') # = Value 1, Value 2</td></tr>
  row_string.gsub!('</td></tr>', '') # = Value 1, Value 2
end
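
gsub also accepts a regular expression, which may help when rows have a varying number of cells. The sketch below assumes rows shaped like the one above; row_to_line is a hypothetical helper name.

```ruby
# Hypothetical helper: turns the HTML of one table row into a
# comma-separated line, whatever the number of cells.
def row_to_line(row_string)
  row_string
    .gsub(%r{</t[dh]>\s*<t[dh][^>]*>}, ', ') # cell boundaries become ", "
    .gsub(/<[^>]+>/, '')                      # drop any remaining tags
    .strip
end

puts row_to_line('<tr><td>Value 1</td><td>Value 2</td><td>Value 3</td></tr>')
```

When the row is still a Nokogiri node, reading the text of each cell with row.css('td').map(&:text) is usually a sturdier option than editing the HTML string.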

Regex

Use match and the MatchData object it returns, which can be indexed like an array, to get values:

"22/7/2014".match('(0[1-9]|[12][0-9]|3[01])[- /.]([1-9]|1[012])[- /.](19|20)\d\d') #=> #<MatchData "22/7/2014" 1:"22" 2:"7" 3:"20">
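
A sketch of reading the captured values back out. It uses the same pattern with named groups; the group names (day, month, year), DATE_PATTERN, and parse_date are names chosen for this example. Unlike the pattern above, the year group here captures all four digits rather than only the century.

```ruby
# Same date pattern as above, with named groups instead of numbered ones.
DATE_PATTERN = %r{(?<day>0[1-9]|[12][0-9]|3[01])[- /.](?<month>[1-9]|1[012])[- /.](?<year>(?:19|20)\d\d)}

# Hypothetical helper: returns [day, month, year] as integers, or nil
# when the string contains no date matching the pattern.
def parse_date(string)
  m = string.match(DATE_PATTERN)
  return nil unless m

  [m[:day].to_i, m[:month].to_i, m[:year].to_i]
end

p parse_date('22/7/2014')
p parse_date('no date here')
```

match returns nil when nothing matches, so the result should be checked before indexing into it.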
